Abstract
Gastric cancer and esophageal cancer are the second and sixth leading causes of cancer-related death worldwide. Multiple genomic alterations underlying gastric cancer and esophageal squamous cell carcinoma (ESCC) have been identified, but the full spectrum of genomic structural variations and mutations has yet to be uncovered. Here, we report the results of whole-genome sequencing of 30 samples comprising tumor and blood from 15 patients, four of whom presented with ESCC, seven with gastric cardia adenocarcinoma (GCA), and four with gastric noncardia adenocarcinoma. Analyses revealed that an A>C mutation was common in GCA, and in addition to the preferential nucleotide sequence of A located 5 prime to the mutation as noted in previous studies, we found enrichment of T in the 5 prime base. The A>C mutations in GCA suggested that oxidation of guanine may be a potential mechanism underlying cancer mutagenesis. Furthermore, we identified genes with mutations in gastric cancer and ESCC, including well-known cancer genes, TP53, JAK3, BRCA2, FGF2, FBXW7, MSH3, PTCH, NF1, ERBB2, and CHEK2, and potentially novel cancer-associated genes, KISS1R, AMH, MNX1, WNK2, and PRKRIR. Finally, we identified recurrent chromosome alterations in at least 30% of tumors in genes including MACROD2, FHIT, and PARK2; these alterations were often intragenic deletions. These structural alterations were validated using The Cancer Genome Atlas dataset. Our studies provide new insights into understanding the genomic landscape, genome instability, and mutation profile underlying gastric cancer and ESCC development. Cancer Res; 76(7); 1714–23. ©2016 AACR.
Introduction
Gastric cancer and esophageal cancer cause an estimated 783,000 and 407,000 deaths, respectively, each year, and represent the second and sixth leading causes of cancer-related death worldwide (1). In China, gastric cardia adenocarcinoma (GCA) and esophageal squamous cell carcinoma (ESCC) occur together in the Taihang Mountains of north central China, including Shanxi and Henan Provinces, at some of the highest rates reported for any cancer (2), and historically over 20% of all deaths in this region have been attributed to these diseases (3). However, the cause of the high rates and geographical overlap of these two anatomically adjacent but histologically distinct tumors has not been determined. Gastric cancers in this area occur primarily in the uppermost portion of the stomach and are referred to as GCA, whereas those in the remainder of the stomach are referred to as gastric noncardia adenocarcinoma (GNCA). In addition to being anatomically adjacent, GCA and ESCC share many of the same etiologic risk factors, and before the widespread use of endoscopy and biopsy, they were diagnosed as a single disease referred to as “esophageal cancer” (4). The reason for the high rates of GCA and ESCC in this geographic area and their relation to each other remains unclear, but there are almost certainly common etiologically important environmental exposures, and a recent genome-wide association study of germline DNA found that the same SNPs in the PLCE1 gene had the strongest associations with risk for both GCA and ESCC (5). This led to our concurrent examination of these two cancers plus GNCA in the current study.
Recent advances in next-generation sequencing technology have revolutionized how we study cancer genomes. The identification of IDH1/2 mutations, initially in glioma (6, 7) and more recently in many other cancers such as AML (8), has transformed our understanding of cancer by relating mutations to metabolic control and epigenetic regulation (9). IDH1/2 encode isocitrate dehydrogenases (IDH), which convert isocitrate to 2-oxoglutarate. Mutant IDHs, however, produce 2-hydroxyglutarate, which inhibits the methyl cytosine hydroxylase TET2 as well as H3K36 demethylases, thus changing the global epigenetic landscape. Whole genome sequencing (WGS) is particularly useful for elucidating complex genomic changes, including translocations, inversions, tandem duplications, and large deletions. The importance of these structural changes has been well documented in the case of BCR-ABL in leukemia, TMPRSS2-ERG in prostate cancer (10), and EML4-ALK in lung cancer (11).
For gastric adenocarcinoma and ESCC, several publications have reported genomic scale analyses of cancers using exome or WGS technology. Wang and colleagues performed exome sequencing of 22 gastric cancer samples and identified frequent mutations in ARID1A (12). Mutations in ARID1A were particularly high in gastric cancers with microsatellite instability (MSI; 83%) or with Epstein-Barr virus (EBV) infection (73%). Exome sequencing of 15 gastric adenocarcinomas and their matched normal DNAs by Zang and colleagues also identified frequent mutations of ARID1A (13). In addition, they found that 5% of gastric cancers contained FAT4 mutations. A study by Agrawal and colleagues reported exomic sequencing of 11 esophageal adenocarcinomas (EAC) and 12 ESCCs and found frequent NOTCH1 mutations in ESCC (14). Nagarajan and colleagues performed WGS analysis for two gastric cancer samples and found three mutational signatures (15). Dulak and colleagues did exome sequencing for 149 EAC tumor-normal pairs and WGS for 15 EACs and matched normals. They found a high prevalence of A>C transversions at AA dinucleotides (16). A recent study by Wang and colleagues analyzed 100 tumor–normal pairs of gastric cancer with WGS and identified MUC6, CTNNA2, GLI3, RNF43, and RHOA as significantly mutated driver genes. They found that RHOA mutations are specific for diffuse-type tumors (17). Recently, the International Cancer Genome Consortium research team published a study involving WGS of 17 ESCC cases and exome sequencing of 71 ESCC cases, which identified two novel cancer driver genes, ADAM29 and FAM135B (18). Lin and colleagues performed exome sequencing on 20 ESCC cases and targeted sequencing of 139 ESCC cases and identified several mutated genes that were previously unknown, including FAT1, FAT2, ZNF750, and KMT2D (19), and Gao and colleagues did exome sequencing on 113 ESCC cases and noted that histone modifier genes were frequently mutated, including KMT2D, KMT2C, KDM6A, EP300, and CREBBP (20). Zhang and colleagues recently reported exome sequencing of 90 ESCCs and WGS of 14 ESCCs and found an APOBEC-mediated mutational signature in 47% of tumors (21).
Despite this progress, the full spectrum of genomic alterations in gastric cancer and ESCCs, particularly genomic structural variations (SV) and intergenic and intronic mutations, remains to be characterized. To discover and characterize genomic alterations in GCA, GNCA, and ESCC, we analyzed whole-genome sequences of tumors and matched blood DNA samples generated by Complete Genomics Inc. (CGI). We present here our findings of a novel mutation substitution pattern, driver mutations, and recurrent SVs. Complete characterization of the genomic landscape of these cancers will hopefully provide new strategies for early diagnosis and therapy for these deadly diseases.
Materials and Methods
Study population
This study was approved by the Institutional Review Boards of the Shanxi Cancer Hospital, the Cancer Institute and Hospital of the Chinese Academy of Medical Sciences (CICAMS, Shanxi, PR China), and the U.S. National Cancer Institute (NCI, Bethesda, MD). Fifteen cases were analyzed with WGS. These cases came from a larger study sample and were selected on the basis of high-quality, sufficiently large amount of DNA (requiring at least 10 μg) available for WGS, and patients being deceased. Three cases with ESCC, seven cases with GCA, and four cases with GNCA diagnosed between 1998 and 2001 in the Shanxi Cancer Hospital in Taiyuan, Shanxi Province, PR China, were recruited to participate in this study. One ESCC case from Yaocun Commune Hospital in Linxian, Henan Province was also recruited. None of the cases had therapy before their surgical resection. After obtaining informed consent, cases were interviewed to obtain information on demographics, cancer risk factors (e.g., detailed family history of cancer), and clinical information. Only deceased cases were selected for study. Clinical data are described in Supplementary Table S1.
Biological specimen collection and processing
Venous blood (10 mL) was taken from each case before surgery and germline DNA from whole blood was extracted and purified using the standard phenol/chloroform method. Tumors obtained during surgery were snap-frozen in liquid nitrogen and stored at −130°C until used. The specimens were chosen for this study based on two criteria: (i) histologic diagnosis of ESCC or gastric cancer confirmed by pathologists at the Shanxi Cancer Hospital or CICAMS, and the NCI; (ii) availability of high purity tumor tissue (>75%).
Tissue DNA isolation
DNA from frozen tumors was extracted using AllPrep DNA/RNA/Protein Mini Kit (Qiagen, Inc.). DNA was dissolved in 100 μL Buffer BE. Concentrations of DNA were measured with the NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific) according to the manufacturer's instructions. DNAs were run on 0.7% agarose gel (UltraPure Agarose Powder, Invitrogen) in 1× TAE buffer to identify high-molecular weight genomic DNA (>20 Kb single band) and photographed using AlphaImager EC system (Biosciences, Inc.). DNA was quantified using Quant-iT PicoGreen Kit (Invitrogen).
Whole genome sequencing
DNA (10 μg) was sent to CGI for WGS. CGI delivered data for SVs, DNA copy number variations, and single nucleotide variations (SNV). A detailed description of WGS data can be found on the company's website (http://www.completegenomics.com/). A summary of the WGS data is provided in Supplementary Table S2. We focused on somatic alterations, that is, tumor-specific changes present in tumor but absent in blood. We used the cgatools calldiff and junctiondiff commands to generate somatic SNVs and somatic SVs, respectively. Somatic copy number alterations (CNA) data came from CGI reports. These three types of somatic changes for each patient are shown in Circos plots in Fig. 1.
Circos plots of somatic single nucleotide variants, copy number alterations, and structural variations in the 15 cancer genomes. The inner ring displays SVs: black for intrachromosomal SVs and red for interchromosomal SVs. The second ring next to SVs is CNA, shown in gray. The third ring is SNV, shown in green. The outside ring is the chromosome ideogram. The sample description can be found in Supplementary Table S1.
Genotyping with the Affymetrix SNP5.0 array
We performed genotyping experiments for 12 samples using the Affymetrix GeneChip Human Mapping 5.0 arrays. Briefly, 250 ng of DNA was digested with Nsp I or Sty I, ligated to an adaptor, and amplified by PCR. PCR products were processed for fragmentation and labeling; labeled DNA was hybridized onto the chips. The chip was scanned with the Affymetrix GeneChip Scanner 7G Plus using Affymetrix GeneChip Command Console software, and the data files were automatically generated. Genotype calls were generated by Genotyping Console v4.1 software (Affymetrix). SNP calls generated by the SNP array and WGS agreed very well, with a median concordance of 99.65% across all 12 genomes (Supplementary Table S3). We noted that agreement was slightly lower in tumors (median 98.85%); this reduced concordance was due to genomic regions containing SVs and CNAs. Concordances for CC0996, EC1475, and EC8413 were below 99%.
GEO accession number for the SNP array is GSE43470.
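The concordance between SNP array and WGS genotype calls described above can be illustrated with a minimal Python sketch. It assumes that concordance is the percentage of SNPs called on both platforms whose genotypes agree; the function name, the dictionary layout, and the toy SNP identifiers are hypothetical and are not taken from the study data or from the Genotyping Console output.

```python
def genotype_concordance(array_calls, wgs_calls):
    """Percent of shared SNPs with identical genotype calls on both platforms.

    Each argument maps a SNP identifier to a genotype string such as "AG".
    Only SNPs called on both platforms are compared; allele order is ignored.
    """
    shared = set(array_calls) & set(wgs_calls)
    if not shared:
        raise ValueError("no SNPs in common")
    matches = sum(
        sorted(array_calls[snp]) == sorted(wgs_calls[snp]) for snp in shared
    )
    return 100.0 * matches / len(shared)


# Toy data (hypothetical SNP ids and genotypes, not from the study).
array_calls = {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "AC"}
wgs_calls = {"rs1": "GA", "rs2": "CC", "rs3": "CT", "rs5": "GG"}
concordance = genotype_concordance(array_calls, wgs_calls)
```

In this toy example three SNPs are shared and two of them agree, so the concordance is two thirds.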
PCR and Sanger sequencing
To validate somatic SNVs and SVs identified from the WGS data, we used PCR to amplify genomic DNA spanning SNVs or SV junctions. We also amplified SV junction sequences from cDNAs. Target regions spanning SNVs and SVs were amplified by PCR using genomic DNA or cDNA as a template. The primers were designed using Primer3 (http://www.broadinstitute.org/genome_software/other/primer3.html), and they are summarized in Supplementary Tables S7 and S8. PCR products were purified using QiaQuick PCR Purification Kit (Qiagen), and sequenced using ABI BigDye Terminator BDT 3.1 (Applied Biosystems). Sequencing reactions were carried out at 96°C for 10 seconds, 50°C for 5 seconds, and 60°C for 2 minutes for 25 cycles, and the reaction product was analyzed in 3730 XL DNA Analyzer (Applied Biosystems). We carried out validation for a subset of somatic missense mutations in subjects CC0996 and EC0379 using Sanger sequencing. High-quality sequencing data were obtained for 62 mutations from CC0996; 51 (82%) validated and 11 did not. For EC0379, we examined 30 mutations, and 17 (63%) were validated. Mutations called by WGS but not validated by Sanger sequencing often had a low fraction of the mutant allele (below 15% of total sequence reads). The discrepancy between the two methods is partly attributable to the stringent criteria of mutation calling we used for the Sanger sequencing data, which usually require a minimum of 15% of the mutant allele. Validated mutations are summarized in Supplementary Table S4.
Comparison of somatic mutation call using blood control versus adjacent normal tissue control
In addition to blood DNA controls, we had WGS data for six adjacent normal tissues. We compared somatic variant call using blood DNA as control versus adjacent normal tissue as control. Two tumors with very low somatic mutation rates were excluded from this analysis because they had higher false-positive calls. We found that the concordance between blood DNA controls versus adjacent normal tissue controls ranged from 86% to 95%. The tumors with higher somatic mutation rates also had higher concordance. This is also consistent with the validation data from Sanger sequencing described in the previous section.
Statistical analysis
Whole-genome sequence analysis.
We used cgatools (http://www.completegenomics.com/sequence-data/cgatools/) and custom-built Perl scripts to analyze WGS data. The calldiff and junctiondiff commands (cgatools) were used to identify somatic SNVs and SVs, respectively; the junctions2events command (cgatools) was used to identify genomic events such as deletions, duplications, inversions, and complex events that underlie observed SV junction sequences. The Circos plots were generated as described (http://circos.ca/).
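The idea of a somatic call, a change present in the tumor but absent in the matched blood sample, can be shown with a deliberately simplified Python sketch. This is a conceptual illustration only: it treats variant calls as exact tuples and takes a set difference, and it does not reproduce the scoring or comparison logic of the cgatools commands. The function name and the toy coordinates are hypothetical.

```python
def somatic_variants(tumor_calls, blood_calls):
    """Return variants called in the tumor but not in the matched blood sample."""
    return sorted(set(tumor_calls) - set(blood_calls))


# Variants as (chromosome, position, reference allele, alternate allele).
# Toy coordinates, not taken from the study.
tumor = [
    ("chr1", 100, "A", "C"),
    ("chr2", 200, "C", "T"),
    ("chr3", 300, "G", "A"),
]
blood = [("chr2", 200, "C", "T")]
somatic = somatic_variants(tumor, blood)
```

Here the chr2 variant is germline because it is also seen in blood, so only the chr1 and chr3 variants are retained as somatic.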
Ingenuity variant analysis.
For functional interpretation of somatic mutations, we used Ingenuity Variant Analysis (IVA). IVA uses a cascade of filters to identify potential cancer driver mutations; these filters consist of selecting variants based on common variants databases (1000 Genomes, CGI database, and NHLBI ESP exomes), predicted deleterious effects (literature, SIFT, phyloP, etc.), genetic analysis (gain or loss of function, number of alleles, absence of variant in the control), and cancer driver variants [literature, Catalogue of Somatic Mutations in Cancer, and The Cancer Genome Atlas (TCGA)]. (For details, see http://www.ingenuity.com/products/ingenuity_variant_analysis.html). We applied the filters as follows: (i) selection of variants based on common variants databases (1000 Genomes, CGI database, and NHLBI ESP exomes); (ii) selection of variants based on predicted deleterious effects (literature, SIFT, phyloP, etc.). Variants were selected on the basis of phenotypes (pathogenic or unknown significance), gain of function (literature, Ingenuity Knowledge Base, or BSIFT), or loss of function (non-synonymous, damaging by SIFT, splicing sites, or affecting microRNA); (iii) selection of variants based on genetic analysis (gain or loss of function, number of alleles, absence of variant in the control). We excluded variants found in any blood sample. We included homozygous, hemizygous, or compound heterozygous variants; (iv) selection of variants based on cancer driver variants.
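The notion of a filter cascade can be sketched in Python as an ordered list of predicates applied one after another. This is a generic illustration, not the IVA implementation: the `Variant` fields, the 1% frequency cutoff, and the gene labels are all assumptions introduced here, and only the first three filter types are modeled.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    population_freq: float    # frequency in a common-variant database
    predicted_damaging: bool  # e.g. flagged deleterious by a prediction tool
    seen_in_blood: bool       # observed in any control (blood) sample


# Ordered (name, keep-predicate) pairs; the 1% cutoff is an assumption.
FILTERS = [
    ("not a common variant", lambda v: v.population_freq < 0.01),
    ("predicted deleterious", lambda v: v.predicted_damaging),
    ("absent from controls", lambda v: not v.seen_in_blood),
]


def run_cascade(variants, filters=FILTERS):
    """Apply each filter in turn; return survivors and the count left after each step."""
    remaining = list(variants)
    counts = []
    for name, keep in filters:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Hypothetical variants, not from the study.
variants = [
    Variant("GENE_A", 0.0, True, False),
    Variant("GENE_B", 0.2, True, False),    # common variant, removed at step 1
    Variant("GENE_C", 0.0, False, False),   # not predicted damaging, removed at step 2
    Variant("GENE_D", 0.001, True, True),   # present in a control, removed at step 3
]
survivors, counts = run_cascade(variants)
```

Recording the count after each step makes it easy to see which filter removes the most candidates.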
Miscellaneous analyses.
All statistical analyses and plots were generated using the R environment, and we used shell scripts and Perl scripts for general bioinformatics data processing.
Results
We analyzed 30 genomes from DNAs isolated from two tissues (tumor and blood) from 15 patients, seven with gastric cardia adenocarcinoma (GCA), four with gastric noncardia adenocarcinoma (GNCA), and four with ESCC, by WGS, generated by the Complete Genomics Inc. (CGI). Patient clinical data are described in Supplementary Table S1. Total sequences generated for each genome ranged from 193.7 Gb to 363.4 Gb (median = 321.7 Gb). Details of our WGS data for each genome are described in Supplementary Table S2.
We focused on identification and characterization of somatic changes, that is, changes in tumors that were absent in matched germline (blood) DNA. Figure 1 shows the Circos plots of three types of somatic changes of interest (SNVs, SVs, and CNAs) for the 15 cancer genomes. Gastric cancer, in particular GNCA, had more structural alterations than ESCC.
Characterization of somatic single nucleotide substitutions
Figure 2A shows somatic mutation rates per million bases. Mutation rates ranged between 12.7 and 70.9 per million bases (median=33.4) for intergenic regions, between 10.2 and 41.5 (median=21.7) for introns, and between 9.4 and 27.8 (median=17.3) for exons. These mutation rates were similar to those reported for colon cancer (22), pancreatic cancer (23), liver cancer (24), gastric cancer (12, 13), and esophageal cancer (14). Mutation rates were highest in intergenic regions, followed by introns and exons. This is consistent with the role of a transcription-coupled repair process in reducing intragenic mutations. The coding mutation rates ranged from 3.9 to 14.0 per million bases (median=8.6) for missense mutations, and from 0.13 to 0.64 (median=0.36) for nonsense mutations. The mutation rates of synonymous, missense, nonsense, and other changes are summarized in Fig. 2B.
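A rate per million bases is the mutation count divided by the size of the region examined, scaled by one million. The short Python sketch below shows the arithmetic; the counts and region sizes are hypothetical values chosen only for illustration and are not the study's data.

```python
def mutations_per_mb(n_mutations, region_bases):
    """Somatic mutations per million bases of the region examined."""
    if region_bases <= 0:
        raise ValueError("region_bases must be positive")
    return n_mutations * 1_000_000 / region_bases


# Hypothetical counts and region sizes, chosen only to show the arithmetic.
regions = {
    "intergenic": (3000, 100_000_000),
    "intron": (2000, 100_000_000),
    "exon": (150, 10_000_000),
}
rates = {name: mutations_per_mb(n, size) for name, (n, size) in regions.items()}
```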
Summary of somatic mutations and mutation spectrum in GNCA, GCA, and ESCC genomes and non-negative matrix factorization analysis of SNV substitution matrix. Somatic SNVs are summarized in Figure 2A and B. The numbers of somatic mutations per million bases in intergenic, intronic, and exonic regions of GCA, GNCA, and ESCC genomes are illustrated with bar graphs. Here, somatic mutations include single nucleotide variations and small indels. A, graph shows mutations in intergenic, intronic, and exonic regions. B, graph shows mutations of various types of amino-acid changes. SNV substitution patterns are summarized in C and D. C, graph shows the numbers of somatic mutations for the six types of base substitutions, summarized on the left. We also considered the context of 5 prime base and 3 prime base; there are 96 possible combinations (4 × 6 × 4). D, the heatmap shows the results for the 15 tumors.
We noted that BC5439, CC1649, CC1730, and EC1475 had much higher mutation rates, exceeding 40 mutations per Mb in intergenic regions. To investigate a potential mutagenic mechanism for the higher mutation rates, we calculated mutation rates for each type of single nucleotide substitution (Fig. 2C). Transition rates, A>G and C>T, were high as expected. However, we saw high A>C mutation rates in six of the 11 gastric cancers. This increased A>C mutation was also observed in several recent studies involving EAC (14, 16) and gastric cancer (17). We also calculated the nucleotide substitution rates with their dependency on the flanking nucleotide sequences, and displayed the result in a heatmap (Fig. 2D). We found that the A>C mutation showed a preference for A located 5 prime to the mutation (Fig. 2D).
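The 96 substitution contexts (4 × 6 × 4) can be enumerated and counted with a short Python sketch. It assumes that the six substitution classes are expressed with A or C as the reference base, as in the text (A>C, A>G, A>T, C>A, C>G, C>T), and that a mutation observed at G or T is mapped to the opposite strand. The function names and toy mutations are hypothetical, and this is not the analysis code used for Fig. 2.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Six substitution classes, referenced to A or C (an assumption based on the text).
SUBSTITUTIONS = [(ref, alt) for ref in "AC" for alt in BASES if alt != ref]

# 4 (5 prime base) x 6 (substitution) x 4 (3 prime base) = 96 contexts.
CONTEXTS = [
    (five, ref, alt, three)
    for five, (ref, alt), three in product(BASES, SUBSTITUTIONS, BASES)
]


def normalize(five, ref, alt, three):
    """Express a mutation on the strand where the reference base is A or C."""
    if ref in ("A", "C"):
        return (five, ref, alt, three)
    # Reverse complement: the 3 prime neighbor becomes the 5 prime neighbor.
    return (COMPLEMENT[three], COMPLEMENT[ref], COMPLEMENT[alt], COMPLEMENT[five])


def count_contexts(mutations):
    """Count mutations in each of the 96 contexts (zero for unobserved ones)."""
    counts = Counter({ctx: 0 for ctx in CONTEXTS})
    counts.update(normalize(*m) for m in mutations)
    return counts


# Toy mutations as (5 prime base, reference, alternate, 3 prime base).
toy = [("A", "A", "C", "G"), ("C", "T", "G", "T"), ("A", "G", "T", "C")]
counts = count_contexts(toy)
```

In the toy data, the T>G mutation flanked by C and T maps to A>C with a 5 prime A, so it is counted together with the first mutation.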
Characterization of cancer driver mutations
We characterized somatic mutations to identify potential cancer driver mutations by analyzing our WGS data using IVA. Our preliminary results from IVA analysis identified 200 genes. We further filtered the genes by requiring that they be mutated in at least three tumors. This resulted in a final list of 24 genes. Tumors with mutations in the 24 genes are illustrated in Fig. 3. Many of these genes are well-known cancer genes, including TP53, JAK3, BRCA2, FGF2, FBXW7, MSH3, PTCH, NF1, ERBB2, and CHEK2. We also identified multiple potentially novel cancer genes, including KISS1R, AMH, MNX1, WNK2, and PRKRIR. The full list of the 200 genes is summarized in Supplementary Table S5. This list is enriched for genes that affect diseases of the stomach (Supplementary Table S6).
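The recurrence requirement, that a gene be mutated in at least three tumors, can be sketched in Python as follows. The function name, tumor labels, and gene labels are hypothetical placeholders; the sketch counts distinct tumors per gene so that several mutations in one tumor are not counted more than once.

```python
from collections import defaultdict


def recurrent_genes(mutations, min_tumors=3):
    """Return genes mutated in at least `min_tumors` distinct tumors."""
    tumors_by_gene = defaultdict(set)
    for tumor_id, gene in mutations:
        tumors_by_gene[gene].add(tumor_id)
    return sorted(
        gene for gene, tumors in tumors_by_gene.items() if len(tumors) >= min_tumors
    )


# Hypothetical (tumor, gene) pairs, not from the study.
mutations = [
    ("T1", "GENE_A"), ("T2", "GENE_A"), ("T3", "GENE_A"),
    ("T1", "GENE_B"), ("T1", "GENE_B"), ("T2", "GENE_B"),
    ("T4", "GENE_C"),
]
recurrent = recurrent_genes(mutations)
```

GENE_B has three mutation records but only two distinct tumors, so it does not pass the filter.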
Summary of potential cancer driver mutations. The potential driver mutations were identified using IVA. We identified 24 genes that were frequently mutated in ESCC and gastric cancer. Mutations and affected tumors are shown in the heatmap.
Characterization of DNA structural variations
Compared with exome sequencing, an advantage of WGS is its ability to identify SVs. We focused our analysis on tumor-specific SVs (somatic events), and they are summarized in Fig. 4. Figure 4 depicts the numbers of SVs per tumor summarized with respect to gene location (Fig. 4A), relation to repeats (Fig. 4B), interchromosomal versus intrachromosomal location (Fig. 4C), genetic events (Fig. 4D), and recurrence in multiple tumors (Fig. 4E). Note that here the complex type also contained some interchromosomal SVs. We noted a wide range of SVs across the tumors. Such variation in frequency of SVs among different tumors is consistent with the presence of two types of cancers, those with high and those with low genomic instability, an observation we previously reported in another series of ESCC tumors from this same high-risk region evaluated by LOH and CNV (25).
Characterization of somatic SVs in gastric cancer and ESCC genomes. A, bar graph of SVs with respect to whether breakpoints are located within genes or not. The count on x-axis refers to the number of SVs. B, bar graph of SVs with respect to whether breakpoints are located within repeats or not. C, bar graph of interchromosomal versus intrachromosomal SVs. D, bar graph of SVs classified by the processes that generated these SVs. E, bar graph of recurrent SVs. Sample count refers to the number of tumors affected by the SVs, and gene count refers to the number of genes that showed recurrent SVs for a given sample count.
To validate SVs identified from the WGS data, we selected SVs with junction sequences derived from two different genes, but without containing DNA repeats. We successfully PCR-amplified 12 SV junction fragments and sequenced all of them by the Sanger method. All 12 SV junction sequences were confirmed, that is, they are identical to the WGS data (data not shown).
SVs that recur in multiple cases are of greatest interest. The genes across breakpoints are summarized in terms of frequency of tumors affected (Fig. 4E). Fourteen genes had SVs occurring in at least five tumors (Fig. 4E). The details of SVs (only deletions that span exonic regions are shown here) are shown for MACROD2, FHIT, and PARK2 (Fig. 5). Four tumors had a deletion of exon 6 of MACROD2 (NM_080676), which removed amino acids 140–180 and also caused a frame-shift (Fig. 5A). The change likely resulted in a loss-of-function mutation of MACROD2. Seven tumors had a deletion of exon 5 of FHIT (NM_002012), which removed the first 35 amino acids and also shifted the reading frame (Fig. 5B). The deletion likely inactivated the protein. The locations and sizes of PARK2 deletions varied (Fig. 5C). Two removed exons 3 and 4 (NM_004562), two removed exon 2, one deleted exon 4, and one deleted exons 2–4. The latter three deletions also caused a reading frame shift. These deletions likely also inactivated the protein.
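Whether an exonic deletion shifts the reading frame depends on the number of coding bases removed: the frame is preserved only when that number is a multiple of three. A minimal Python sketch of this check follows; the function name and exon lengths are hypothetical and are not the actual exon sizes of MACROD2, FHIT, or PARK2.

```python
def deletion_shifts_frame(deleted_exon_lengths):
    """True if removing the given coding exons (lengths in bases) shifts the frame."""
    return sum(deleted_exon_lengths) % 3 != 0


# Hypothetical coding exon lengths in bases.
single_exon = [100]       # 100 is not a multiple of 3: frame-shift
two_exons = [100, 50]     # 150 is a multiple of 3: in-frame deletion
```

This is why a deletion of two adjacent exons can remain in frame even when deleting either exon alone would not.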
IGV views of structural changes of recurrent SVs. IGV views for MACROD2, FHIT, and PARK2. Blue rectangles are exons. Red rectangles are deleted regions, and the number below the box refers to SV id. Tumors are labeled on the left. A, MACROD2; B, FHIT; C, PARK2.
To seek additional supporting evidence for these SVs, we analyzed 39 tumors from the TCGA gastric cancer dataset that had high coverage WGS data, which is the subset of the 441 STAD tumors with copy number data that will be discussed below. We used BreakDancer to identify SVs. All three genes, MACROD2, FHIT, and PARK2, contained deletions involving coding exons with 16, 9, and 6 tumors having deletions in MACROD2, FHIT, and PARK2, respectively. Since these SVs are caused by deletions, we reviewed the deletion analysis reports from TCGA gastric cancer data performed by the Broad GDAC team, which has 441 STAD tumor samples analyzed. FHIT, MACROD2, and PARK2 were in the 6th, 7th, and 12th most significantly deleted regions (http://gdac.broadinstitute.org/runs/analyses__2014_10_17/reports/cancer/STAD-TP/CopyNumber_Gistic2/nozzle.html).
Discussion
WGS data provide a unique opportunity to combine SNV with large SVs and to perform an integrated analysis. There are three major findings from this study. First, A>C mutations were common in GCA and GNCA. In addition to the 5 prime A noted in previous studies of esophageal adenocarcinoma and gastric adenocarcinoma (14, 16, 17), we found enrichment of 5 prime T, which was weakly associated with ESCC. Second, we identified 24 genes with potential driver mutations, including a subset that has not been previously reported. Third, we identified recurrent chromosome alterations that occurred in at least 30% of tumors in 14 genes, including CAMK1D, MACROD2, ANKRD30BL, FHIT, KCNB2, and PARK2.
A>C mutation rates are often low (range 2.6%–6.6% of all mutations; ref. 23). However, a recent exome sequencing study of esophageal cancer found higher A>C mutation rates in EAC than in ESCC (14). Another recent study of EAC also found a high A>C mutation rate, particularly with the 5 prime A (16). Our study extended this observation to gastric adenocarcinoma. We found high A>C mutations in both GCA and GNCA. In addition, we found the enrichment of A>C mutations with the 5 prime T in ESCC. The presence of A>C mutations in GCA, GNCA, and ESCC suggests the oxidation of guanine as a potential mutagen for gastric cancer and ESCC. In cancer cells under high oxidative stress, 8-oxo-dGTP accumulating in the DNA can result in G>T, which is equivalent to C>A. It is conceivable that a similar mechanism may contribute to the observed high A>C mutation rate.
A major interest of this study was to identify recurrent SVs. We found 14 genes with SVs occurring in at least five tumors (Fig. 4). The most frequent SVs appeared as deletions. Deletions that removed coding sequences were of greatest interest. Three examples of these genes are shown in Fig. 5. These deletions were clustered in small genomic regions. In the case of MACROD2, exon 6 of MACROD2 (NM_080676), spanning amino acids 140–180, was deleted. The deletion also caused a frame-shift. Similarly, exon 5 of FHIT (NM_002012), containing the first 35 amino acids, was deleted in multiple tumors. The mutation also shifted the reading frame. In the case of PARK2, the deletions were more heterogeneous, and varied from deletions involving one to three exons of exons 2, 3, and 4 (NM_004562). These deletions were very frequent, affecting about 50% of gastric cancer samples. They were supported by deletion events observed in gastric tumors from the TCGA dataset. It will be interesting to develop deletion-specific assays to screen a large number of gastric cancers and to associate the deletions with clinical phenotypes.
Our studies provide new insights into understanding the genomic landscape, genome instability, and mutation mechanisms for developing gastric cancer and ESCC. A limitation of the current study is its small sample size. Future studies should include validation of these findings in a larger panel of tumors that are selected from multiple tumor types and use combinations of exome sequencing, RNA-seq analysis, and target-resequencing for both SNVs and SVs.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: N. Hu, C.C. Abnet, N.D. Freedman, S.M. Dawsey, A.M. Goldstein, P.R. Taylor, M.P. Lee
Development of methodology: N. Hu, H. Wu, P.R. Taylor, M.P. Lee
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): N. Hu, M. Kadota, N.D. Freedman, S. Gere, A. Hutchinson, G. Song, T. Ding, Y.-L. Qiao, J. Koshiol, A.M. Goldstein, P.R. Taylor, M.P. Lee
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): H. Liu, C.C. Abnet, H.H. Yang, C. Yan, P.R. Taylor, M.P. Lee
Writing, review, and/or revision of the manuscript: N. Hu, C.C. Abnet, H. Su, N.D. Freedman, L. Wang, J. Koshiol, S.M. Dawsey, A.M. Goldstein, P.R. Taylor, M.P. Lee
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): N. Hu, M. Kadota, H. Su, C. Wang, L. Wang, Y. Wang, T. Ding, Y.-L. Qiao, C.G.C. Giffen, P.R. Taylor, M.P. Lee
Study supervision: N. Hu, P.R. Taylor, M.P. Lee
Other (validation of the results): H. Su
Other (validation of the results): L. Wang
Grant Support
This work was supported by the Intramural Research Program of the NIH and the National Cancer Institute, Division of Cancer Epidemiology and Genetics (DCEG), and Center for Cancer Research (CCR).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.