Gastric cancer and esophageal cancer are the second and sixth leading causes of cancer-related death worldwide. Multiple genomic alterations underlying gastric cancer and esophageal squamous cell carcinoma (ESCC) have been identified, but the full spectrum of genomic structural variations and mutations has yet to be uncovered. Here, we report the results of whole-genome sequencing of 30 samples comprising tumor and blood from 15 patients, four of whom presented with ESCC, seven with gastric cardia adenocarcinoma (GCA), and four with gastric noncardia adenocarcinoma. Analyses revealed that an A>C mutation was common in GCA, and in addition to the preferential nucleotide sequence of A located 5 prime to the mutation as noted in previous studies, we found enrichment of T in the 5 prime base. The A>C mutations in GCA suggested that oxidation of guanine may be a potential mechanism underlying cancer mutagenesis. Furthermore, we identified genes with mutations in gastric cancer and ESCC, including well-known cancer genes, TP53, JAK3, BRCA2, FGF2, FBXW7, MSH3, PTCH, NF1, ERBB2, and CHEK2, and potentially novel cancer-associated genes, KISS1R, AMH, MNX1, WNK2, and PRKRIR. Finally, we identified recurrent chromosome alterations in at least 30% of tumors in genes including MACROD2, FHIT, and PARK2; these alterations were often intragenic deletions. These structural alterations were validated using The Cancer Genome Atlas dataset. Our studies provide new insights into understanding the genomic landscape, genome instability, and mutation profile underlying gastric cancer and ESCC development. Cancer Res; 76(7); 1714–23. ©2016 AACR.

Gastric cancer and esophageal cancer cause an estimated 783,000 and 407,000 deaths, respectively, each year, and represent the second and sixth leading causes of cancer-related death worldwide (1). In China, gastric cardia adenocarcinoma (GCA) and esophageal squamous cell carcinoma (ESCC) occur together in the Taihang Mountains of north central China, including Shanxi and Henan Provinces, at some of the highest rates reported for any cancer (2), and historically over 20% of all deaths in this region have been attributed to these diseases (3). However, the cause of the high rates and geographical overlap of these two anatomically adjacent but histologically distinct tumors has not been determined. Gastric cancers in this area occur primarily in the uppermost portion of the stomach and are referred to as GCA, whereas those in the remainder of the stomach are referred to as gastric noncardia adenocarcinoma (GNCA). In addition to being anatomically adjacent, GCA and ESCC share many of the same etiologic risk factors, and before the widespread use of endoscopy and biopsy, they were diagnosed as a single disease referred to as “esophageal cancer” (4). The reason for the high rates of GCA and ESCC in this geographic area and their relation to each other remains unclear, but there are almost certainly common etiologically important environmental exposures, and a recent genome-wide association study of germline DNA found that the same SNPs in the PLCE1 gene had the strongest associations with risk for both GCA and ESCC (5). This led to our concurrent examination of these two cancers plus GNCA in the current study.

Recent advances in next-generation sequencing technology have revolutionized how we study cancer genomes. The identification of IDH1/2 mutations, initially in glioma (6, 7) and more recently in many other cancers such as AML (8), has transformed our understanding of cancer by relating mutations to metabolic control and epigenetic regulation (9). IDH1/2 encode isocitrate dehydrogenases (IDH), which convert isocitrate to 2-oxoglutarate. Mutant IDHs, however, produce 2-hydroxyglutarate, which inhibits the methyl cytosine hydroxylase TET2 as well as H3K36 demethylases, thereby changing the global epigenetic landscape. Whole genome sequencing (WGS) is particularly useful for elucidating complex genomic changes, including translocations, inversions, tandem duplications, and large deletions. The importance of these structural changes has been well documented in the case of BCR-ABL in leukemia, TMPRSS2-ERG in prostate cancer (10), and EML4-ALK in lung cancer (11).

For gastric adenocarcinoma and ESCC, several publications have reported genomic scale analyses of cancers using exome or WGS technology. Wang and colleagues performed exome sequencing of 22 gastric cancer samples and identified frequent mutations in ARID1A (12). Mutations in ARID1A were particularly frequent in gastric cancers with microsatellite instability (MSI; 83%) or with Epstein-Barr virus (EBV) infection (73%). Exome sequencing of 15 gastric adenocarcinomas and their matched normal DNAs by Zang and colleagues also identified frequent mutations of ARID1A (13). In addition, they found that 5% of gastric cancers contained FAT4 mutations. A study by Agrawal and colleagues reported exomic sequencing of 11 esophageal adenocarcinomas (EAC) and 12 ESCCs and found frequent NOTCH1 mutations in ESCC (14). Nagarajan and colleagues performed WGS analysis for two gastric cancer samples and found three mutational signatures (15). Dulak and colleagues did exome sequencing for 149 EAC tumor-normal pairs and WGS for 15 EACs and matched normals. They found a high prevalence of A>C transversions at AA dinucleotides (16). A recent study by Wang and colleagues analyzed 100 tumor–normal pairs of gastric cancer with WGS and identified MUC6, CTNNA2, GLI3, RNF43, and RHOA as significantly mutated driver genes. They found that RHOA mutations are specific for diffuse-type tumors (17). Recently, the International Cancer Genome Consortium research team published a study involving WGS of 17 ESCC cases and exome sequencing of 71 ESCC cases, which identified two novel cancer driver genes, ADAM29 and FAM135B (18). Lin and colleagues performed exome sequencing on 20 ESCC cases and targeted sequencing of 139 ESCC cases and identified several mutated genes that were previously unknown, including FAT1, FAT2, ZNF750, and KMT2D (19), and Gao and colleagues did exome sequencing on 113 ESCC cases and noted that histone modifier genes were frequently mutated, including KMT2D, KMT2C, KDM6A, EP300, and CREBBP (20). Zhang and colleagues recently reported exome sequencing of 90 ESCCs and WGS of 14 ESCCs and found an APOBEC-mediated mutational signature in 47% of tumors (21).

Despite this progress, the full spectrum of genomic alterations in gastric cancer and ESCC, particularly genomic structural variations (SV) and intergenic and intronic mutations, remains to be characterized. To discover and characterize genomic alterations in GCA, GNCA, and ESCC, we analyzed whole-genome sequences of tumors and matched blood DNA samples generated by Complete Genomics Inc. (CGI). We present here our findings of a novel mutation substitution pattern, driver mutations, and recurrent SVs. Complete characterization of the genomic landscape of these cancers may provide new strategies for early diagnosis and therapy for these deadly diseases.

Study population

This study was approved by the Institutional Review Boards of the Shanxi Cancer Hospital, the Cancer Institute and Hospital of the Chinese Academy of Medical Sciences (CICAMS, Shanxi, PR China), and the U.S. National Cancer Institute (NCI, Bethesda, MD). Fifteen cases were analyzed with WGS. These cases came from a larger study sample and were selected on the basis of the availability of a sufficient amount of high-quality DNA for WGS (at least 10 μg) and the patient being deceased. Three cases with ESCC, seven cases with GCA, and four cases with GNCA diagnosed between 1998 and 2001 in the Shanxi Cancer Hospital in Taiyuan, Shanxi Province, PR China, were recruited to participate in this study. One ESCC case from Yaocun Commune Hospital in Linxian, Henan Province was also recruited. None of the cases had therapy before their surgical resection. After obtaining informed consent, cases were interviewed to obtain information on demographics, cancer risk factors (e.g., detailed family history of cancer), and clinical information. Clinical data are described in Supplementary Table S1.

Biological specimen collection and processing

Venous blood (10 mL) was taken from each case before surgery, and germline DNA from whole blood was extracted and purified using the standard phenol/chloroform method. Tumors obtained during surgery were snap-frozen in liquid nitrogen and stored at −130°C until used. The specimens were chosen for this study based on two criteria: (i) histologic diagnosis of ESCC or gastric cancer confirmed by pathologists at the Shanxi Cancer Hospital or CICAMS, and the NCI; and (ii) availability of high purity tumor tissue (>75%).

Tissue DNA isolation

DNA from frozen tumors was extracted using the AllPrep DNA/RNA/Protein Mini Kit (Qiagen, Inc.). DNA was dissolved in 100 μL Buffer BE. Concentrations of DNA were measured with the NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific) according to the manufacturer's instructions. DNAs were run on a 0.7% agarose gel (UltraPure Agarose Powder, Invitrogen) in 1× TAE buffer to identify high-molecular weight genomic DNA (>20 Kb single band) and photographed using the AlphaImager EC system (Biosciences, Inc.). DNA was quantified using the Quant-iT PicoGreen Kit (Invitrogen).

Whole genome sequencing

Of note, 10 μg of DNA was sent to CGI for WGS. CGI delivered data for SVs, DNA copy number variations, and single nucleotide variations (SNV). A detailed description of WGS data can be found on the company's website (http://www.completegenomics.com/). A summary of the WGS data is provided in Supplementary Table S2. We focused on somatic alterations, that is, changes present in tumor but absent in blood. We used the cgatools calldiff and junctiondiff commands to generate somatic SNVs and somatic SVs, respectively. Somatic copy number alteration (CNA) data came from CGI reports. These three types of somatic changes for each patient are shown in Circos plots in Fig. 1.
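
The working definition of a somatic alteration used here (present in tumor, absent in matched blood) can be illustrated as a set difference. The following Python snippet is only a minimal sketch of that idea; it does not reproduce how cgatools compares genomes, and the `Variant` tuple, the function name, and the example calls are hypothetical.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int   # position on the chromosome (coordinate convention assumed)
    ref: str   # reference allele
    alt: str   # alternate allele


def somatic_variants(tumor: Set[Variant], blood: Set[Variant]) -> Set[Variant]:
    """Keep variants called in the tumor that are absent from matched blood."""
    return tumor - blood


# Hypothetical calls for one patient (not data from this study).
tumor_calls = {Variant("chr1", 1000, "A", "C"), Variant("chr2", 2000, "C", "T")}
blood_calls = {Variant("chr2", 2000, "C", "T")}
somatic = somatic_variants(tumor_calls, blood_calls)
```

A real comparison must also account for sites that are uncalled or poorly covered in the blood genome, which a plain set difference would wrongly report as somatic.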

Figure 1.

Circos plots of somatic single nucleotide variants, copy number alterations, and structural variations in the 15 cancer genomes. The inner ring displays SVs: black for intrachromosomal SVs and red for interchromosomal SVs. The second ring next to SVs is CNA, shown in gray. The third ring is SNV, shown in green. The outside ring is the chromosome ideogram. The sample description can be found in Supplementary Table S1.

Genotyping with the Affymetrix SNP5.0 array

We performed genotyping experiments for 12 samples using the Affymetrix GeneChip Human Mapping 5.0 arrays. Briefly, 250 ng of DNA was digested with Nsp I or Sty I, ligated to an adaptor, and amplified by PCR. PCR products were processed for fragmentation and labeling; labeled DNA was hybridized onto the chips. The chip was scanned with the Affymetrix GeneChip Scanner 7G Plus using Affymetrix GeneChip Command Console software, and the data files were automatically generated. Genotype calls were generated by Genotyping Console v4.1 software (Affymetrix). SNP calls generated by the SNP array and by WGS agreed very well, with a median concordance of 99.65% across all 12 genomes (Supplementary Table S3). We noted that agreement was slightly lower in tumors (median 98.85%); this reduced concordance was due to genomic regions containing SVs and CNAs. Concordances for CC0996, EC1475, and EC8413 were below 99%.
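
For readers who wish to compute a comparable statistic, the following Python snippet is a minimal sketch of a per-sample concordance calculation. It assumes that concordance is the percentage of SNPs called on both platforms with identical unordered genotypes; the exact definition used for Supplementary Table S3 is not stated here, and the SNP identifiers and genotypes below are hypothetical.

```python
def genotype_concordance(array_calls, wgs_calls):
    """Percent of SNPs called on both platforms whose genotypes agree.

    Each argument maps a SNP identifier to a two-letter genotype string.
    Allele order is ignored, so "AG" and "GA" count as the same genotype.
    """
    shared = [snp for snp in array_calls if snp in wgs_calls]
    if not shared:
        raise ValueError("no SNPs were called on both platforms")
    agree = sum(
        1 for snp in shared
        if sorted(array_calls[snp]) == sorted(wgs_calls[snp])
    )
    return 100.0 * agree / len(shared)


# Hypothetical genotypes for one sample (not data from this study).
array_calls = {"snp1": "AG", "snp2": "CC", "snp3": "TT", "snp4": "AA"}
wgs_calls = {"snp1": "GA", "snp2": "CC", "snp3": "CT", "snp5": "GG"}
concordance = genotype_concordance(array_calls, wgs_calls)
```

In this toy example three SNPs are shared and two agree, so the concordance is two thirds; SNPs called on only one platform are left out of the denominator.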

GEO accession number for the SNP array is GSE43470.

PCR and Sanger sequencing

To validate somatic SNVs and SVs identified from the WGS data, we used PCR to amplify target regions spanning SNVs or SV junctions from genomic DNA; SV junction sequences were also amplified from cDNA. The primers were designed using Primer3 (http://www.broadinstitute.org/genome_software/other/primer3.html) and are summarized in Supplementary Tables S7 and S8. PCR products were purified using the QiaQuick PCR Purification Kit (Qiagen) and sequenced using ABI BigDye Terminator BDT 3.1 (Applied Biosystems). Sequencing reactions were carried out for 25 cycles of 96°C for 10 seconds, 50°C for 5 seconds, and 60°C for 2 minutes, and the reaction products were analyzed on a 3730 XL DNA Analyzer (Applied Biosystems). We carried out validation for a subset of somatic missense mutations in subjects CC0996 and EC0379 using Sanger sequencing. High-quality sequencing data were obtained for 62 mutations from CC0996; 51 (82%) were validated and 11 were not. For EC0379, we examined 30 mutations, and 17 (63%) were validated. Mutations called by WGS but not validated by Sanger sequencing often had a low fraction of the mutant allele (below 15% of total sequence reads). The discrepancy between the two methods is partly attributable to the stringent mutation-calling criteria we used for the Sanger sequencing data, which usually require a minimum of 15% of the mutant allele. Validated mutations are summarized in Supplementary Table S4.
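
The relationship between mutant allele fraction and Sanger validation can be sketched in a few lines of Python. This is an illustration only: the 15% threshold and the CC0996 counts come from the text above, whereas the function names and the read counts in the example are hypothetical.

```python
def validation_rate(validated, tested):
    """Percent of tested mutations that were confirmed."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * validated / tested


def sanger_detectable(mutant_reads, total_reads, min_fraction=0.15):
    """True if the WGS mutant allele fraction reaches the Sanger threshold."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mutant_reads / total_reads >= min_fraction


# CC0996: 51 of 62 tested mutations were validated.
cc0996_rate = validation_rate(51, 62)

# Hypothetical read counts: 3 of 40 reads (7.5%) falls below the threshold,
# whereas 12 of 40 reads (30%) is above it.
low_fraction_call = sanger_detectable(3, 40)
high_fraction_call = sanger_detectable(12, 40)
```

Under this simplification, a WGS call supported by fewer than 15% of reads would be expected to fail Sanger validation even when the mutation is real.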

Comparison of somatic mutation call using blood control versus adjacent normal tissue control

In addition to blood DNA controls, we had WGS data for six adjacent normal tissues. We compared somatic variant call using blood DNA as control versus adjacent normal tissue as control. Two tumors with very low somatic mutation rates were excluded from this analysis because they had higher false-positive calls. We found that the concordance between blood DNA controls versus adjacent normal tissue controls ranged from 86% to 95%. The tumors with higher somatic mutation rates also had higher concordance. This is also consistent with the validation data from Sanger sequencing described in the previous section.

Statistical analysis

Whole-genome sequence analysis.

We used cgatools (http://www.completegenomics.com/sequence-data/cgatools/) and custom-built Perl scripts to analyze WGS data. The cgatools calldiff and junctiondiff commands were used to identify somatic SNVs and SVs, respectively; the cgatools junctions2events command was used to identify genomic events such as deletions, duplications, inversions, and complex events that underlie observed SV junction sequences. The Circos plots were generated as described (http://circos.ca/).

Ingenuity variant analysis.

For functional interpretation of somatic mutations, we used Ingenuity Variant Analysis (IVA). IVA uses a cascade of filters to identify potential cancer driver mutations (for details, see http://www.ingenuity.com/products/ingenuity_variant_analysis.html). We applied the following filters: (i) selection of variants based on common variant databases (1000 Genomes, CGI database, and NHLBI ESP exomes); (ii) selection of variants based on predicted deleterious effects (literature, SIFT, phyloP, etc.), in which variants were selected on the basis of phenotype (pathogenic or unknown significance), gain of function (literature, Ingenuity Knowledge Base, or BSIFT), or loss of function (non-synonymous, damaging by SIFT, splicing sites, or affecting microRNA); (iii) selection of variants based on genetic analysis (gain or loss of function, number of alleles, absence of variant in the control), in which we excluded variants found in any blood sample and included homozygous, hemizygous, or compound heterozygous variants; and (iv) selection of variants based on cancer driver variants [literature, Catalogue of Somatic Mutations in Cancer, and The Cancer Genome Atlas (TCGA)].

Miscellaneous analyses.

All statistical analyses and plots were generated using the R environment, and we used shell scripts and Perl scripts for general bioinformatics data processing.

We analyzed 30 genomes by WGS, generated by Complete Genomics Inc. (CGI), from DNAs isolated from two tissues (tumor and blood) from 15 patients: seven with gastric cardia adenocarcinoma (GCA), four with gastric noncardia adenocarcinoma (GNCA), and four with ESCC. Patient clinical data are described in Supplementary Table S1. Total sequences generated for each genome ranged from 193.7 Gb to 363.4 Gb (median = 321.7 Gb). Details of our WGS data for each genome are described in Supplementary Table S2.

We focused on identification and characterization of somatic changes, that is, changes in tumors that were absent in matched germline (blood) DNA. Figure 1 shows the Circos plots of three types of somatic changes of interest (SNVs, SVs, and CNAs) for the 15 cancer genomes. Gastric cancer, in particular GNCA, had more structural alterations than ESCC.

Characterization of somatic single nucleotide substitutions

Figure 2A shows somatic mutation rates per million bases. Mutation rates ranged between 12.7 and 70.9 per million bases (median = 33.4) for intergenic regions, between 10.2 and 41.5 (median = 21.7) for introns, and between 9.4 and 27.8 (median = 17.3) for exons. These mutation rates were similar to those reported for colon cancer (22), pancreatic cancer (23), liver cancer (24), gastric cancer (12, 13), and esophageal cancer (14). Mutation rates were highest in intergenic regions, followed by introns and exons. This is consistent with the role of a transcription-coupled repair process in reducing intragenic mutations. The coding mutation rates ranged from 3.9 to 14.0 per million bases (median = 8.6) for missense mutations, and from 0.13 to 0.64 (median = 0.36) for nonsense mutations. The mutation rates of synonymous, missense, nonsense, and other changes are summarized in Fig. 2B.
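
A rate per million bases is the mutation count scaled by the number of bases examined in the region class. The short Python sketch below illustrates the arithmetic only; the counts and region lengths are hypothetical and are not taken from Supplementary Table S2.

```python
def mutations_per_million_bases(mutation_count, region_length_bp):
    """Somatic mutation rate expressed per million bases of the region."""
    if region_length_bp <= 0:
        raise ValueError("region_length_bp must be positive")
    return mutation_count * 1_000_000 / region_length_bp


# Hypothetical example: 334 somatic mutations in 10 Mb of called
# intergenic sequence corresponds to 33.4 mutations per million bases.
intergenic_rate = mutations_per_million_bases(334, 10_000_000)
```

The denominator should be the number of bases that were actually callable in both tumor and blood, because uncalled regions cannot contribute mutations.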

Figure 2.

Summary of somatic mutations and mutation spectrum in GNCA, GCA, and ESCC genomes and non-negative matrix factorization analysis of SNV substitution matrix. Somatic SNVs are summarized in Figure 2A and B. The numbers of somatic mutations per million bases in intergenic, intronic, and exonic regions of GCA, GNCA, and ESCC genomes are illustrated with bar graphs. Here, somatic mutations include single nucleotide variations and small indels. A, graph shows mutations in intergenic, intronic, and exonic regions. B, graph shows mutations of various types of amino-acid changes. SNV substitution patterns are summarized in C and D. C, graph shows the numbers of somatic mutations for the six types of base substitutions, summarized on the left. We also considered the context of 5 prime base and 3 prime base; there are 96 possible combinations (4 × 6 × 4). D, the heatmap shows the results for the 15 tumors.

We noted that BC5439, CC1649, CC1730, and EC1475 had much higher mutation rates, exceeding 40 per million bases in intergenic regions. To investigate a potential mutagenic mechanism for the higher mutation rates, we calculated mutation rates for each type of single nucleotide substitution (Fig. 2C). Transition rates, A>G and C>T, were high as expected. However, we saw high A>C mutation rates in six of the 11 gastric cancers. This increased A>C mutation rate was also observed in several recent studies involving EAC (14, 16) and gastric cancer (17). We also calculated the nucleotide substitution rates with their dependency on the flanking nucleotide sequences, and displayed the result in a heatmap (Fig. 2D). We found that the A>C mutation showed a preference for an A located 5 prime to the mutation (Fig. 2D).
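
The 96 substitution contexts (4 five prime bases × 6 substitution types × 4 three prime bases) can be enumerated and counted as in the Python sketch below. It assumes that substitutions on the opposite strand are collapsed so that the reference base is always A or C, which matches the labels used in this article (A>C, A>G, C>T, and G>T treated as C>A) but is an assumption about the actual analysis; many other studies collapse onto C and T instead. All function and variable names are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def normalize(five_prime, ref, alt, three_prime):
    """Collapse a substitution and its flanks onto the strand where ref is A or C."""
    if ref in "AC":
        return five_prime, ref, alt, three_prime
    # Reverse complement: the flanks swap sides and every base is complemented.
    return (COMPLEMENT[three_prime], COMPLEMENT[ref],
            COMPLEMENT[alt], COMPLEMENT[five_prime])


def context_label(five_prime, ref, alt, three_prime):
    """Label such as 'A[A>C]G': 5 prime base, substitution, 3 prime base."""
    f, r, a, t = normalize(five_prime, ref, alt, three_prime)
    return f"{f}[{r}>{a}]{t}"


# 2 reference bases x 3 alternatives x 4 x 4 flanking bases = 96 contexts.
ALL_CONTEXTS = [
    f"{f}[{r}>{a}]{t}"
    for r in "AC" for a in BASES if a != r
    for f in BASES for t in BASES
]


def count_contexts(snvs):
    """Count SNVs per context; contexts that never occur are reported as 0."""
    counts = Counter({label: 0 for label in ALL_CONTEXTS})
    counts.update(context_label(*snv) for snv in snvs)
    return counts


# Hypothetical SNVs given as (5 prime base, ref, alt, 3 prime base).
counts = count_contexts([("A", "A", "C", "G"), ("C", "T", "G", "T"),
                         ("T", "G", "T", "A")])
```

In this toy input, the T>G change flanked by C and T is the reverse complement of an A>C change flanked by A and G, so both land in the same context, and the G>T change is counted as C>A. One such count vector per tumor gives the kind of matrix that a heatmap like Fig. 2D displays.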

Characterization of cancer driver mutations

We characterized somatic mutations to identify potential cancer driver mutations by analyzing our WGS data using IVA. Our initial results from the IVA analysis identified 200 genes. We further filtered the genes by requiring that they be mutated in at least three tumors. This resulted in a final list of 24 genes. Tumors with mutations in the 24 genes are illustrated in Fig. 3. Half of these genes are well-known cancer genes, including TP53, JAK3, BRCA2, FGF2, FBXW7, MSH3, PTCH, NF1, ERBB2, and CHEK2. We also identified multiple potentially novel cancer genes, including KISS1R, AMH, MNX1, WNK2, and PRKRIR. The full list of the 200 genes is provided in Supplementary Table S5. This list is enriched for genes that affect diseases of the stomach (Supplementary Table S6).
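
The recurrence requirement (a gene must be mutated in at least three tumors) can be sketched in Python as follows. The input format, the function name, and the tumor and gene labels are hypothetical and serve only to illustrate the filtering step, not the IVA output.

```python
from collections import defaultdict


def recurrently_mutated(gene_calls, min_tumors=3):
    """Return genes mutated in at least `min_tumors` distinct tumors.

    `gene_calls` is an iterable of (tumor_id, gene) pairs; several mutations
    of the same gene in one tumor count only once.
    """
    tumors_by_gene = defaultdict(set)
    for tumor_id, gene in gene_calls:
        tumors_by_gene[gene].add(tumor_id)
    return {
        gene: sorted(tumors)
        for gene, tumors in tumors_by_gene.items()
        if len(tumors) >= min_tumors
    }


# Hypothetical calls: GENE_A is hit in three tumors, GENE_B in two tumors
# (twice in T1), so only GENE_A passes the filter.
calls = [("T1", "GENE_A"), ("T2", "GENE_A"), ("T3", "GENE_A"),
         ("T1", "GENE_B"), ("T1", "GENE_B"), ("T2", "GENE_B")]
recurrent = recurrently_mutated(calls)
```

Counting distinct tumors rather than mutations prevents a single heavily mutated tumor from making a gene appear recurrent.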

Figure 3.

Summary of potential cancer driver mutations. The potential driver mutations were identified using IVA. We identified 24 genes that were frequently mutated in ESCC and gastric cancer. Mutations and affected tumors are shown in the heatmap.

Characterization of DNA structural variations

Compared with exome sequencing, an advantage of WGS is its ability to identify SVs. We focused our analysis on tumor-specific SVs (somatic events), which are summarized in Fig. 4. Figure 4 depicts the numbers of SVs per tumor summarized with respect to gene location (Fig. 4A), relation to repeats (Fig. 4B), interchromosomal versus intrachromosomal location (Fig. 4C), genetic events (Fig. 4D), and recurrence in multiple tumors (Fig. 4E). Note that here the complex type also contained some interchromosomal SVs. We noted a wide range in the number of SVs across the tumors. Such variation in the frequency of SVs among different tumors is consistent with the presence of two types of cancers, those with high and those with low genomic instability, an observation we previously reported in another series of ESCC tumors from this same high-risk region evaluated by LOH and CNV (25).

Figure 4.

Characterization of somatic SVs in gastric cancer and ESCC genomes. A, bar graph of SVs with respect to whether breakpoints are located within genes or not. The count on the x-axis refers to the number of SVs. B, bar graph of SVs with respect to whether breakpoints are located within repeats or not. C, bar graph of interchromosomal versus intrachromosomal SVs. D, bar graph of SVs classified by the processes that generated these SVs. E, bar graph of recurrent SVs. Sample count refers to the number of tumors affected by the SVs, and gene count refers to the number of genes that showed recurrent SVs for a given sample count.

To validate SVs identified from the WGS data, we selected SVs with junction sequences derived from two different genes, but without containing DNA repeats. We successfully PCR-amplified 12 SV junction fragments and sequenced all of them by the Sanger method. All 12 SV junction sequences were confirmed, that is, they are identical to the WGS data (data not shown).

SVs that recur in multiple cases are of greatest interest. The genes spanning breakpoints are summarized by the number of tumors affected (Fig. 4E); 14 genes had SVs in at least five tumors. The details of SVs (only deletions that span exonic regions are shown) are presented for MACROD2, FHIT, and PARK2 (Fig. 5). Four tumors had deletions of exon 6 of MACROD2 (NM_080676), which removed amino acids 140–180 and also caused a frame-shift (Fig. 5A). The change likely resulted in a loss-of-function mutation of MACROD2. Seven tumors had deletions of exon 5 of FHIT (NM_002012), which removed the first 35 amino acids and also shifted the reading frame (Fig. 5B). The deletion likely inactivated the protein. The locations and sizes of PARK2 deletions varied (Fig. 5C). Two removed exons 3 and 4 (NM_004562), two removed exon 2, one deleted exon 4, and one deleted exons 2–4. The latter three deletions also caused a reading frame shift. These deletions likely also inactivated the protein.
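
Whether an exon-level deletion shifts the reading frame depends on whether the total deleted coding length is a multiple of three. The Python sketch below illustrates this rule under simplifying assumptions: it ignores effects on splicing and on the start codon, and the exon lengths in the example are hypothetical rather than the actual lengths of the MACROD2, FHIT, or PARK2 exons.

```python
def causes_frameshift(deleted_coding_lengths_bp):
    """True if the summed coding length of the deleted exons is not a multiple of 3."""
    total = sum(deleted_coding_lengths_bp)
    if total <= 0:
        raise ValueError("at least one coding base must be deleted")
    return total % 3 != 0


# Hypothetical exon lengths in base pairs.
single_exon_shift = causes_frameshift([122])        # 122 % 3 == 2, frame shifts
single_exon_in_frame = causes_frameshift([120])     # 120 % 3 == 0, frame kept
two_exons_in_frame = causes_frameshift([100, 101])  # 201 % 3 == 0, frame kept
```

The last case shows that deleting two adjacent exons can preserve the reading frame even when deleting either one alone would not, which is why deletions covering different exon combinations of the same gene can differ in their predicted consequences.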

Figure 5.

IGV views of structural changes of recurrent SVs. IGV views for MACROD2, FHIT, and PARK2. Blue rectangles are exons. Red rectangles are deleted regions, and the number below the box refers to SV id. Tumors are labeled on the left. A, MACROD2; B, FHIT; C, PARK2.

To seek additional supporting evidence for these SVs, we analyzed 39 tumors from the TCGA gastric cancer dataset that had high coverage WGS data, which is the subset of the 441 STAD tumors with copy number data that will be discussed below. We used BreakDancer to identify SVs. All three genes, MACROD2, FHIT, and PARK2, contained deletions involving coding exons with 16, 9, and 6 tumors having deletions in MACROD2, FHIT, and PARK2, respectively. Since these SVs are caused by deletions, we reviewed the deletion analysis reports from TCGA gastric cancer data performed by the Broad GDAC team, which has 441 STAD tumor samples analyzed. FHIT, MACROD2, and PARK2 were in the 6th, 7th, and 12th most significantly deleted regions (http://gdac.broadinstitute.org/runs/analyses__2014_10_17/reports/cancer/STAD-TP/CopyNumber_Gistic2/nozzle.html).

WGS data provide a unique opportunity to combine SNVs with large SVs and to perform an integrated analysis. There are three major findings from this study. First, A>C mutations were common in GCA and GNCA. In addition to the 5 prime A noted in previous studies of esophageal adenocarcinoma and gastric adenocarcinoma (14, 16, 17), we found enrichment of 5 prime T, which was weakly associated with ESCC. Second, we identified 24 genes with potential driver mutations, including a subset that has not been previously reported. Third, we identified recurrent chromosome alterations that occurred in at least 30% of tumors in 14 genes, including CAMK1D, MACROD2, ANKRD30BL, FHIT, KCNB2, and PARK2.

A>C mutation rates are often low (range 2.6%–6.6% of all mutations; ref. 23). However, a recent exome sequencing study of esophageal cancer found higher A>C mutation rates in EAC than in ESCC (14). Another recent study of EAC also found a high A>C mutation rate, particularly with the 5 prime A (16). Our study extended this observation to gastric adenocarcinoma. We found high A>C mutation rates in both GCA and GNCA. In addition, we found enrichment of A>C mutations with the 5 prime T in ESCC. The presence of A>C mutations in GCA, GNCA, and ESCC suggests the oxidation of guanine as a potential mutagenic mechanism for gastric cancer and ESCC. In cancer cells under high oxidative stress, 8-oxo-dGTP accumulating in the DNA can result in G>T, which is equivalent to C>A. It is conceivable that a similar mechanism may contribute to the observed high A>C mutation rate.

A major interest of this study was to identify recurrent SVs. We found 14 genes with SVs occurring in at least five tumors (Fig. 4). The most frequent SVs appeared as deletions. Deletions that removed coding sequences were of greatest interest. Three examples of these genes are shown in Fig. 5. These deletions were clustered in small genomic regions. In the case of MACROD2, exon 6 (NM_080676), spanning amino acids 140–180, was deleted. The deletion also caused a frame-shift. Similarly, exon 5 of FHIT (NM_002012), containing the first 35 amino acids, was deleted in multiple tumors. The deletion also shifted the reading frame. In the case of PARK2, the deletions were more heterogeneous, involving one to three of exons 2, 3, and 4 (NM_004562). These deletions were very frequent, affecting about 50% of gastric cancer samples. They were supported by deletion events observed in gastric tumors from the TCGA dataset. It will be interesting to develop deletion-specific assays to screen a large number of gastric cancers and to associate the deletions with clinical phenotypes.

Our studies provide new insights into understanding the genomic landscape, genome instability, and mutation mechanisms for developing gastric cancer and ESCC. A limitation of the current study is its small sample size. Future studies should include validation of these findings in a larger panel of tumors that are selected from multiple tumor types and use combinations of exome sequencing, RNA-seq analysis, and target-resequencing for both SNVs and SVs.

No potential conflicts of interest were disclosed.

Conception and design: N. Hu, C.C. Abnet, N.D. Freedman, S.M. Dawsey, A.M. Goldstein, P.R. Taylor, M.P. Lee

Development of methodology: N. Hu, H. Wu, P.R. Taylor, M.P. Lee

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): N. Hu, M. Kadota, N.D. Freedman, S. Gere, A. Hutchinson, G. Song, T. Ding, Y.-L. Qiao, J. Koshiol, A.M. Goldstein, P.R. Taylor, M.P. Lee

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): H. Liu, C.C. Abnet, H.H. Yang, C. Yan, P.R. Taylor, M.P. Lee

Writing, review, and/or revision of the manuscript: N. Hu, C.C. Abnet, H. Su, N.D. Freedman, L. Wang, J. Koshiol, S.M. Dawsey, A.M. Goldstein, P.R. Taylor, M.P. Lee

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): N. Hu, M. Kadota, H. Su, C. Wang, L. Wang, Y. Wang, T. Ding, Y.-L. Qiao, C.G.C. Giffen, P.R. Taylor, M.P. Lee

Study supervision: N. Hu, P.R. Taylor, M.P. Lee

Other (validation of the results): H. Su

Other (validation of the results): L. Wang

This work was supported by the Intramural Research Program of the NIH and the National Cancer Institute, Division of Cancer Epidemiology and Genetics (DCEG), and Center for Cancer Research (CCR).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer 2010;127:2893–917.
2. Ke L. Mortality and incidence trends from esophagus cancer in selected geographic areas of China circa 1970–90. Int J Cancer 2002;102:271–4.
3. Li JY. Epidemiology of esophageal cancer in China. Natl Cancer Inst Monogr 1982;62:113–20.
4. Liu SF, Shen Q, Dawsey SM, Wang GQ, Nieberg RK, Wang ZY, et al. Esophageal balloon cytology and subsequent risk of esophageal and gastric-cardia cancer in a high-risk Chinese population. Int J Cancer 1994;57:775–80.
5. Abnet CC, Freedman ND, Hu N, Wang Z, Yu K, Shu XO, et al. A shared susceptibility locus in PLCE1 at 10q23 for gastric adenocarcinoma and esophageal squamous cell carcinoma. Nat Genet 2010;42:764–7.
6. Parsons DW, Jones S, Zhang X, Lin JC, Leary RJ, Angenendt P, et al. An integrated genomic analysis of human glioblastoma multiforme. Science 2008;321:1807–12.
7. Yan H, Parsons DW, Jin G, McLendon R, Rasheed BA, Yuan W, et al. IDH1 and IDH2 mutations in gliomas. N Engl J Med 2009;360:765–73.
8. Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N Engl J Med 2009;361:1058–66.
9. Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, Berman BP, et al. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell 2010;17:510–22.
10. Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 2005;310:644–8.
11. Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa S, et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature 2007;448:561–6.
12. Wang K, Kan J, Yuen ST, Shi ST, Chu KM, Law S, et al. Exome sequencing identifies frequent mutation of ARID1A in molecular subtypes of gastric cancer. Nat Genet 2011;43:1219–23.
13. Zang ZJ, Cutcutache I, Poon SL, Zhang SL, McPherson JR, Tao J, et al. Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes. Nat Genet 2012;44:570–4.
14. Agrawal N, Jiao Y, Bettegowda C, Hutfless SM, Wang Y, David S, et al. Comparative genomic analysis of esophageal adenocarcinoma and squamous cell carcinoma. Cancer Discov 2012;2:899–905.
15. Nagarajan N, Bertrand D, Hillmer AM, Zang ZJ, Yao F, Jacques PE, et al. Whole-genome reconstruction and mutational signatures in gastric cancer. Genome Biol 2012;13:R115.
16. Dulak AM, Stojanov P, Peng S, Lawrence MS, Fox C, Stewart C, et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat Genet 2013;45:478–86.
17. Wang K, Yuen ST, Xu J, Lee SP, Yan HH, Shi ST, et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet 2014;46:573–82.
18. Song Y, Li L, Ou Y, Gao Z, Li E, Li X, et al. Identification of genomic alterations in oesophageal squamous cell cancer. Nature 2014;509:91–5.
19. Lin DC, Hao JJ, Nagata Y, Xu L, Shang L, Meng X, et al. Genomic and molecular characterization of esophageal squamous cell carcinoma. Nat Genet 2014;46:467–73.
20. Gao YB, Chen ZL, Li JG, Hu XD, Shi XJ, Sun ZM, et al. Genetic landscape of esophageal squamous cell carcinoma. Nat Genet 2014;46:1097–102.
21. Zhang L, Zhou Y, Cheng C, Cui H, Cheng L, Kong P, et al. Genomic analyses reveal mutational signatures and frequently altered genes in esophageal squamous cell carcinoma. Am J Hum Genet 2015;96:597–611.
22. Bass AJ, Lawrence MS, Brace LE, Ramos AH, Drier Y, Cibulskis K, et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nat Genet 2011;43:964–8.
23. Jones S, Zhang X, Parsons DW, Lin JC, Leary RJ, Angenendt P, et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 2008;321:1801–6.
24. Li M, Zhao H, Zhang X, Wood LD, Anders RA, Choti MA, et al. Inactivating mutations of the chromatin remodeling gene ARID2 in hepatocellular carcinoma. Nat Genet 2011;43:828–9.
25. Hu N, Wang C, Ng D, Clifford R, Yang HH, Tang ZZ, et al. Genomic characterization of esophageal squamous cell carcinoma from a high-risk population in China. Cancer Res 2009;69:5908–17.