Abstract
Purpose: Somatic mutations occur at early stages of adenoma and accumulate throughout colorectal cancer progression. The aim of this study was to characterize the mutational landscape of stage II tumors and to search for novel recurrent mutations likely implicated in colorectal cancer tumorigenesis.
Experimental Design: The exomic DNA of 42 stage II, microsatellite-stable colon tumors and their paired mucosae were sequenced. Other molecular data available in the discovery dataset [gene expression, methylation, and copy number variations (CNV)] were used to further characterize these tumors. Additional datasets comprising 553 colorectal cancer samples were used to validate the discovered mutations.
Results: As a result, 4,886 somatic single-nucleotide variants (SNV) were found. Almost all SNVs were private changes, with few mutations shared by more than one tumor, thus revealing tumor-specific mutational landscapes. Nevertheless, these diverse mutations converged into common cellular pathways, such as cell cycle or apoptosis. Among this mutational heterogeneity, variants resulting in early stop codons in the AMER1 (also known as FAM123B or WTX) gene emerged as recurrent mutations in colorectal cancer. Losses of AMER1 by other mechanisms apart from mutations such as methylation and copy number aberrations were also found. Tumors lacking this tumor suppressor gene exhibited a mesenchymal phenotype characterized by inhibition of the canonical Wnt pathway.
Conclusions: In silico and experimental validation in independent datasets confirmed the existence of functional mutations in AMER1 in approximately 10% of analyzed colorectal cancer tumors. Moreover, these tumors exhibited a characteristic phenotype. Clin Cancer Res; 21(20); 4709–18. ©2015 AACR.
Exome sequencing analysis in colorectal cancer samples reveals that variants resulting in stop codons in AMER1 (also known as FAM123B or WTX) gene appeared in approximately 10% of the analyzed tumors. Moreover, although less commonly, AMER1 function may also be lost by other mechanisms different from mutations such as promoter hypermethylation and chromosome deletions. The subset of tumors lacking AMER1 expression showed Wnt pathway inhibition and, regarding molecular subtyping, could belong to type-C tumors. These results may enlighten about the mechanisms of carcinogenesis and biomarker discovery in those patients lacking AMER1.
Introduction
Colorectal cancer is the third most common cancer and the second leading cause of cancer death in the world (1). The classic adenoma-to-carcinoma model postulates that colorectal cancer tumorigenesis proceeds through a progressive accumulation of genetic alterations in oncogenes and tumor suppressors genes (2). However, colorectal cancer is currently considered a heterogeneous disease. While tumors fitting into the classic progression model (or chromosomal instability model, CIN) are the most frequent, other tumor phenotypes have been described, such as microsatellite instability (MSI) and CpG island methylator phenotypes (CIMP; ref. 3). Recent studies based on high-throughput technologies have addressed the issue of colorectal cancer molecular complexity, revealing high level of heterogeneity among tumors (4).
Among other biologic mechanisms, it is widely accepted that somatic mutations lead to tumor development in colorectal cancer. It is postulated that most mutations within a tumor are undamaging byproducts of tumorigenesis (passenger mutations) whereas only a few are responsible for driving the initiation and progression of the tumor (driver mutations; ref. 5). In colorectal cancer, a number of mutations have been proposed as drivers, such as those in the KRAS and BRAF oncogenes, or in the tumor suppressor genes APC and TP53 (6). However, the seminal study by Wood and colleagues revealed that the mutational landscapes of colorectal cancer genomes are composed of a few frequently mutated genes across patients, “mountains,” but are dominated by a much larger number of infrequently mutated genes, “hills” (7). Although still controversial, these rarely mutated genes may also contribute to tumor development, thus accounting for intertumor variability (8).
Next-generation sequencing technologies have revolutionized cancer genomics research by providing fast and accurate information about individual tumors, bringing us closer to personalized medicine (9). It has been reported that approximately 85% of cancer-associated mutations are located in protein-coding regions (10). In consequence, exome sequencing has been revealed as a useful technique for mutation discovery in cancer tissues. Indeed, several studies have successfully described the mutational background of different types of tumors by using this approach (11, 12). Here, we have performed an exome sequencing analysis aimed to explore the somatic genomic landscape of microsatellite-stable (MSS) stage II colorectal tumors.
Materials and Methods
Patients and samples
This study included a subset of 42 paired adjacent normal and tumor tissues (84 samples) from a previously described set of 100 patients with colon cancer diagnosed at stage II and MSS tumors (ref. 13; colonomics project, CLX-: www.colonomics.org; NCBI BioProject PRJNA188510; Supplementary Table S1). All patients were recruited at the Bellvitge University Hospital (Barcelona, Spain). Written informed consent was obtained from all patients and the Institution's Ethics Committee approved the protocol. Prior to DNA extraction, purity of the sample was assessed by a pathologist to ensure that at least 80% was tumoral. DNA was extracted using a standard phenol–chloroform protocol. To ensure that adjacent and tumor tissues were paired, dynamic arrays were used to genotype 13 SNPs in the 84 samples (see Validation of KRAS and TP53 point mutations). All 42 adjacent normal tissues correctly matched with their corresponding tumor (Supplementary Fig. S1). Tumor DNA from an additional series of 227 colorectal cancer patients from the same hospital was used for validation purposes (Supplementary Table S1). This extended series was not restricted regarding site, stage, and MSI phenotype.
In addition, raw exome sequencing data from 513 samples were downloaded from The Cancer Genome Atlas (TCGA) repository. TCGA discovery dataset comprised 239 colorectal cancer tumors and 100 adjacent mucosae and was used to expand the exome sequencing analysis. These are public samples available in TCGA repository but had not been used in the published work characterizing colorectal cancer exomes (14). Moreover, 87 matched nontumoral and tumoral colorectal samples, herein named TCGA validation dataset, were used as a validation cohort for AMER1 mutations (Supplementary Table S1). These second set of samples included 44 that already had been analyzed by the TCGA consortium (14), not all of available samples because we requested a paired germline sample to ensure that mutations were somatic. Finally, 224 tumors from the TCGA published work with suitable information about molecular characteristic of the samples were used to assess the relationship between AMER1 mutations and colorectal cancer molecular subtypes (MSS and CIMP status; ref. 14).
Exome sequencing pipeline
Genomic DNA from the set of 42 adjacent tumor paired samples was sequenced in the National Center of Genomic Analysis (Barcelona, Spain; CNAG) using the Illumina HiSeq-2000 platform. Exome capture was performed with the commercial kit Sure Select XT Human All Exon 50MB (Agilent). Tumor exomes were sequenced at 60× coverage (2 × 75 bp reads), and exomes from adjacent tissues were sequenced at 40× (2 × 75 bp reads). FastQ software was used to assess the quality of the sequences (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc). Bowtie 2.0 software was used to align sequences over the human reference genome HG19 (15). To refine data, reads unmapped, reads with unmapped mate, nonprimary alignments, and reads that were PCR or optical duplicates were discarded (http://picard.sourceforge.net/). We also executed a local realignment around indels defined in dbSNP (16) and 1000G (17) and also for the indels detected in this particular study. Variant calling was executed with GATK software, and low-quality variants (mapping quality below 30, read depth below 10 or frequency < 10%) were discarded (18). GATK has been proved to achieve higher sensitivity and specificity in exome variant calling than other softwares (19). Germline variants were also removed, that is, variants that were present in normal adjacent paired sequence for each tumor and variants reported in the 1000G project. Because of the high frequency of indels that were later considered as false-positive results, only single-nucleotide variants (SNV) were taken into account in this study. Finally, variants were annotated using the SeattleSeq Variant Annotation web tool (20). The same pipeline was applied to analyze exome data downloaded from TCGA repository. No correlation was observed between the number of mutations found per sample and the quality parameters “number of reads,” “number of no matched reads,” “percentage of unique aligned reads,” “number of duplicate reads,” and “coverage,” reinforcing the robustness of the analytic pipeline (Supplementary Fig. S2).
Raw exome sequencing data were also used to search for copy number variations (CNV). Coverage data were used to compare the amount of DNA in adjacent versus tumor samples. The Varscan2 copynumber algorithm was run on adjacent and tumor mpileup obtained from Samtools (21). Next, the copyCaller command was run to adjust raw copy number values for GC content. Finally, R-GADA package was used to perform the segmentation analysis (22). A region was considered lost if the log2 tumor to adjacent mucosa ratio was less than −0.5.
Functional and pathways analysis
Databases containing function and pathway information “KEGG,” “Biocarta,” “Reactome,” and “GO” were downloaded from MSigDB in gene set enrichment analysis (GSEA; ref. 23). Only the potentially functional SNVs, that is, coding nonsynonymous, stop gain, stop lost, splice-5′, splice-3′, coding synonymous near splice site, 3′-untranslated region (UTR), and 5′-UTR variants, were analyzed. For each gene set in each database, a score was calculated by dividing the number of mutations mapping into genes constituting the dataset by the number of genes in such dataset. The score was corrected by dividing the number of samples and multiplying by 100. A P value was calculated by randomly permuting the original matrix of SNVs. The number of permutations was calculated in each case to ensure that the minimum P value was at least as small as required by the Bonferroni correction at nominal 0.05 significance level.
Validation of KRAS and TP53 point mutations
KASPar genotyping assays (KASP-By-Design; LGC Group) on the Fluidigm genotyping platform (48.48 Dynamic Array IFG, Fluidigm) were used to validate 6 mutations in KRAS and 8 mutations in TP53. The same methodology was applied to genotype 13 SNPs used to ensure that adjacent and tumor tissues were matched (Supplementary Table S2). Each genotyping assay was previously validated and optimized on the Light-Cycler 480 real-time PCR detection system (Roche Diagnostics GmbH).
Functional prediction of mutations
MutSig software was used to identify the more likely cancer-associated genes from other less suspicious genes. Mutated genes were ranked using 3 criteria: (i) abundance of mutations relative to the background mutation rate; (ii) clustering of mutations in hotspots within the gene; and (iii) evolutionary conservation of the mutated positions (24). In addition, protein damage predictions of missense mutations in AMER1 were performed by using the in silico algorithms SIFT (25), PolyPhen-2 (26), and PMut (27). Possible alterations of the protein structure were evaluated using the Hope software (28).
Sanger sequencing
Sanger sequencing was used to perform a technical validation of exome sequencing. Mutations in KRAS, APC, and TP53 genes were sequenced using a standard protocol. Sanger sequencing was also used to validate the recurrent mutations found in AMER1. Two regions in exon 1 (covering 335 and 236, respectively) were sequenced. Sequencing was performed on an ABI Sequencer 3730 and data analyzed using Mutation Surveyor v.3.10. Primer sequences are shown in Supplementary Table S2.
Expression data
Gene expression data from GSE44076 dataset (deposited in GEO repository), which includes the 42 sequenced tumors, were used to search for phenotypic similarities among tumors exhibiting loss of AMER1 functionality. “Sub-type B score” and “Sub-type C score” were calculated for each tumor as the mean of absolute expression of those genes described in Roepman and colleagues (29) as B-type characteristic (53 genes) or C-type characteristic (102 genes). Supplementary Fig. S3 showed all molecular data used in this study.
AMER1 methylation and CIMP phenotype assessment
Tumor methylation in our discovery dataset (CLX data) was analyzed with the Illumina Infinium HumanMethylation450 BeadChip assay covering approximately 20,000 genes (99% of RefSeq genes). AMER1 promoter methylation was extracted from this large dataset. Also, this information has been used to asses CIMP phenotype by interrogating MLH1, RUNX3, CACNA1G, IGF2, NEUROG1, SOCS1, CRABP1, and CDKN2A promoters, as previously reported (30). A tumor was considered to be CIMP-high (CIMP-H) if at least 6 of these 8 genes were hypermethylated.
Immunohistochemistry
Xylene-dewaxed paraffin tissue sections (4-μm thick) were obtained from 4 AMER1-mutated tumors and from 3 wild-type AMER1 tumors. For antigen retrieval, the slides were boiled after deparaffinization in a pressure cooker for 2 minutes in sodium citrate buffer (10 mmol/L sodium citrate, 2 mmol/L citric acid, pH = 6). Endogen peroxidase was blocked by sample immersion in 3% H2O2 during 15 minutes. Blocking was carried out by applying goat serum 1:10 diluted in PBS for 60 minutes at room temperature (RT). Subsequently, the primary antibody against AMER1 (OAAB03558, Aviva Systems Biology, diluted 1:100 in blocking solution) was added and incubated overnight at 4°C in a humidified chamber. After rinsing, EnVision system-Goat secondary antibody (Dako) was applied for 60 minutes at RT and subsequently revealed with DAB substrate (Dako) exposed for 4 minutes. Slides were counterstained with hematoxylin.
Results
Somatic mutational landscape in stage II colon tumors
Exome analysis revealed a total of 11,122 somatic SNVs within the 42 analyzed tumors (Supplementary Table S3). Many were intergenic and intronic mutations (most of them likely to be false-positives due to the lower coverage). Indeed, approximately 50% of SNVs were located in intronic regions (away from canonical splice sites). As expected, all the tumors showed enrichment in C:G > T:A nucleotide changes (Fig. 1A). There was no concordance between the number of mutations and the age of the patients, even if only C>T nucleotide changes were taken into account (Supplementary Fig. S4). The number of SNVs identified in coding regions and flanking sequences was 4,725. The average number of mutations in coding regions per sample was 117 (range, 6–185; Fig. 1B). From those, 9.6% were coding synonymous, thus a priori not affecting the protein structure. Considering only potentially functional mutations, 22.4% of SNVs were missense, 5.9% in UTR, 2.1% stop gain or stop lost, and 0.7% splice-site variants (Supplementary Fig. S5).
Remarkably, the vast majority of the identified somatic SNVs (10,985 of 11,122 total and 4,699 of 4,725 located in coding regions—more than 99%-) were private events, whereas only 137 SNVs were shared by two or more tumors. Of those, 112 were intronic or intergenic. As expected, the KRAS G12D mutation was the most recurrent change, occurring in 8 of 42 samples (19%; Supplementary Table S4). As a methodologic validation of the overall discovery pipeline of SNVs, 6 mutations in KRAS (Q61H, A146T, G12V, G12D, G12S, G13D) and 7 in TP53 (G245D, R248Q, R237H, R273C, R175H, R282W, R213*, G245S) were tested using KASPar genotyping assays in the Fluidigm Biomark platform (dynamic arrays), achieving 65% concordance. Of note, 10 of 11 nonconcordant mutations were only found by exome sequencing confirming the better performance and higher sensitivity of this technique (Supplementary Table S5). To further validate the sensitivity of our mutation calling pipeline, 9 point mutations (including 4 not previously validated by dynamic arrays) in APC (1), KRAS (4), and TP53 (4) were validated by Sanger sequencing (the gold standard technique) in all the CLX tumor samples, achieving a 100% concordance (Supplementary Table S5).
After removal of intergenic SNVs (n = 962), the remaining 10,160 variants were located in 6,433 genes and 174 of them mapped to more than one gene. Across samples, a total of 723 genes were mutated in more than one tumor. TTN (the largest gene in the human genome) was the most frequently mutated, followed by APC, KRAS, and TP53. If only potentially functional mutations were taken into account, APC (22 tumors), KRAS (21 tumors), and TP53 (20 tumors) were the most mutated genes (Fig. 2). MutSig software was also used to rank mutated genes on the basis of recurrence and functional effect of mutations. As expected, spurious genes (like the well-known TTN) noticeably went down in the list whereas APC, KRAS, and TP53 continued standing out (Supplementary Table S6). These findings were in agreement with previous studies performed to discover mutated genes in colorectal cancer (Supplementary Fig. S6; refs. 14, 31, 32).
Next, a pathway analysis was performed, including 2,856 genes harboring 3,595 potentially functional SNVs. SNVs with putative functional impact accumulated in pathways and functions classically related to cancer such as cell cycle (P < 0.001), apoptosis (P < 0.001), or cell signaling (P < 0.001). Moreover, pathways and functions specifically related to colorectal cancer tumors, such as the Wnt pathway (P < 0.001), the NOTCH expression and translation (P < 0.001), the VEGF pathway (P < 0.001), or the TGFB pathway (P < 0.001); among others, also appeared enriched in mutated genes. The complete list of statistically significant functions is shown in Supplementary Table S7.
SNV analysis in TCGA data
Exome sequencing data from 239 tumors and 100 adjacent mucosae from TCGA (TCGA discovery) that had not been included in the TCGA Consortium analysis (14) were analyzed using the same pipeline. Only 3.6% of somatic SNVs discovered in our 42 tumors were also found in TCGA discovery data confirming the high heterogeneity of the colorectal cancer mutational landscape. On the other hand, 20 of 137 recurrent SNVs (15%) were present in the TCGA tumors analyzed (Table 1). Of these, 11 were intronic, 1 intergenic, and 1 occurred in the 5′-UTR of the MASP1 gene. Only 6 of these 20 recurrent mutations were predicted to have a functional effect at the protein level; 5 of them being the well-known KRAS G12C, KRAS G12D, TP53 R282W, APC R232*, and APC E1353*. A stop gain mutation in AMER1 c.1489C>T (R497*) was identified in 4 tumors, 2 from our series and 2 from the TCGA discovery subset, thus deserving further consideration. If only stage II tumors were taken into account, barely 1.8% of somatic SNVs discovered in our 42 tumors were validated, but these included 13 of 137 recurrent ones.
Gene . | Variant . | rs identifier . | Function/location . | CLX samples, n . | TCGA samples, n . |
---|---|---|---|---|---|
TP53 | chr17:7577094; G>A | rs28934574 | Missense | n = 4AF = 0.64; 0.6; 0.59; 0.63 | n = 16AF = 0.63; 0.87; 0.51; 0.84; 0.58; 0.73; 0.57; 0.48; 0.39; 0.93; 1 |
KRAS | chr12:25398284; C>A | rs121913529 | Missense | n = 5 AF = 0.67; 0.46; 0.4; 0.43; 0.46 | n = 25AF = 0.6; 0.41; 0.45; 0.46; 0.67; 0.45; 0.29; 0.43; 0.48; 0.3; 0.49; 0.67; 0.2; 0.21; 0.43; 0.37; 0.7; 0.44; 0.45; 0.31; 0.36; 0.35; 0.48; 0.33; 0.4 |
KRAS | chr12:25398284; C>T | rs121913529 | Missense | n = 8AF = 0.19; 0.46; 0.35; 0.33; 0.31; 0.24; 0.46; 0.49 | n = 17AF = 0.35; 0.29; 0.6; 0.25; 0.22; 0.3; 0.29; 0.32; 0.27; 0.76; 0.26; 0.35; 0.35; 0.42; 0.48; 0.35; 0.52; 0.49; 0.31 |
APC | chr5:112128191; C>T | rs0 | Stop gained | n = 4AF = 0.71; 0.33; 0.29; 0.23 | n = 3AF = 0.24; 0.34; 0.18; 0.8 |
APC | chr5:112175348; G>T | rs0 | Stop gained | n = 2AF = 0.71; 0.57 | n = 4AF = 0.24; 0.34; 0.18; 0.81 |
AMER1 | chrX:63411678; G>A | rs0 | Stop gained | n = 2AF = 0.31; 0.64 | n = 2AF = 0.76; 0.65 |
MASP1 | chr3:187009800; A>T | rs0 | 5′-UTR | n = 3AF = 0.38; 0.54; 0.41 | n = 1AF = 0.64 |
MS4A2 | chr11:59861311; A>G | rs113221333 | Intronic | n = 3AF = 0.20; 0.25; 0.2 | n = 3AF = 0.31; 0.21; 0.31 |
KIF7 | chr15:90173735; G>A | rs0 | Intronic | n = 3AF = 0.21; 0.25; 0.2 | n = 1AF = 0.25 |
KIF13A | chr6:17787892; C>A | rs62394104 | Intronic | n = 3AF = 0.21; 0.16; 0.27 | n = 1AF = 0.19 |
EMC2 | chr8:109468234; T>A | rs111255731 | Intronic | n = 3AF = 0.17; 0.27; 0.18 | n = 1AF = 0.38 |
HOXD10 | chr2:176983604; C>A | rs73974643 | Intronic | n = 2AF = 0.21; 0.5 | n = 3AF = 0.34; 0.21; 0.19 |
SETD2 | chr3:47143125; C>T | rs200952697 | Intronic | n = 2AF = 0.22; 0.22 | n = 3AF = 0.16; 0.44; 0.29 |
PDS5B | chr13:33253144; A>C | rs199860513 | Intronic | n = 2AF = 0.18; 0.18 | n = 3AF = 0.23; 0.74; 0.80 |
AGBL1 | chr15:87474796; T>G | rs0 | Intronic | n = 2AF = 0.23; 0.36 | n = 1AF = 0.36 |
NUP133 | chr1:229623415; A>T | rs0 | Intronic | n = 2AF = 0.2; 0.19 | n = 1AF = 0.3 |
DPP9 | chr19:4720039; A>C | rs0 | Intronic | n = 2AF = 0.23; 0.35 | n = 1AF = 0.54 |
PARL | chr3:183584713; C>T | rs199558489 | Intronic | n = 2AF = 0.25; 0.3 | n = 1AF = 0.23 |
VCL | chr10:75874194; T>C | rs0 | Intronic | n = 2AF = 0.23; 0.19 | n = 1AF = 0.35 |
Intergenic | chr3:195433152; G>A | rs76183393 | Intergenic | n = 2AF = 0.52; 0.33 | n = 3AF = 0.42; 0.41; 0.66 |
Gene . | Variant . | rs identifier . | Function/location . | CLX samples, n . | TCGA samples, n . |
---|---|---|---|---|---|
TP53 | chr17:7577094; G>A | rs28934574 | Missense | n = 4AF = 0.64; 0.6; 0.59; 0.63 | n = 16AF = 0.63; 0.87; 0.51; 0.84; 0.58; 0.73; 0.57; 0.48; 0.39; 0.93; 1 |
KRAS | chr12:25398284; C>A | rs121913529 | Missense | n = 5 AF = 0.67; 0.46; 0.4; 0.43; 0.46 | n = 25AF = 0.6; 0.41; 0.45; 0.46; 0.67; 0.45; 0.29; 0.43; 0.48; 0.3; 0.49; 0.67; 0.2; 0.21; 0.43; 0.37; 0.7; 0.44; 0.45; 0.31; 0.36; 0.35; 0.48; 0.33; 0.4 |
KRAS | chr12:25398284; C>T | rs121913529 | Missense | n = 8AF = 0.19; 0.46; 0.35; 0.33; 0.31; 0.24; 0.46; 0.49 | n = 17AF = 0.35; 0.29; 0.6; 0.25; 0.22; 0.3; 0.29; 0.32; 0.27; 0.76; 0.26; 0.35; 0.35; 0.42; 0.48; 0.35; 0.52; 0.49; 0.31 |
APC | chr5:112128191; C>T | rs0 | Stop gained | n = 4AF = 0.71; 0.33; 0.29; 0.23 | n = 3AF = 0.24; 0.34; 0.18; 0.8 |
APC | chr5:112175348; G>T | rs0 | Stop gained | n = 2AF = 0.71; 0.57 | n = 4AF = 0.24; 0.34; 0.18; 0.81 |
AMER1 | chrX:63411678; G>A | rs0 | Stop gained | n = 2AF = 0.31; 0.64 | n = 2AF = 0.76; 0.65 |
MASP1 | chr3:187009800; A>T | rs0 | 5′-UTR | n = 3AF = 0.38; 0.54; 0.41 | n = 1AF = 0.64 |
MS4A2 | chr11:59861311; A>G | rs113221333 | Intronic | n = 3AF = 0.20; 0.25; 0.2 | n = 3AF = 0.31; 0.21; 0.31 |
KIF7 | chr15:90173735; G>A | rs0 | Intronic | n = 3AF = 0.21; 0.25; 0.2 | n = 1AF = 0.25 |
KIF13A | chr6:17787892; C>A | rs62394104 | Intronic | n = 3AF = 0.21; 0.16; 0.27 | n = 1AF = 0.19 |
EMC2 | chr8:109468234; T>A | rs111255731 | Intronic | n = 3AF = 0.17; 0.27; 0.18 | n = 1AF = 0.38 |
HOXD10 | chr2:176983604; C>A | rs73974643 | Intronic | n = 2AF = 0.21; 0.5 | n = 3AF = 0.34; 0.21; 0.19 |
SETD2 | chr3:47143125; C>T | rs200952697 | Intronic | n = 2AF = 0.22; 0.22 | n = 3AF = 0.16; 0.44; 0.29 |
PDS5B | chr13:33253144; A>C | rs199860513 | Intronic | n = 2AF = 0.18; 0.18 | n = 3AF = 0.23; 0.74; 0.80 |
AGBL1 | chr15:87474796; T>G | rs0 | Intronic | n = 2AF = 0.23; 0.36 | n = 1AF = 0.36 |
NUP133 | chr1:229623415; A>T | rs0 | Intronic | n = 2AF = 0.2; 0.19 | n = 1AF = 0.3 |
DPP9 | chr19:4720039; A>C | rs0 | Intronic | n = 2AF = 0.23; 0.35 | n = 1AF = 0.54 |
PARL | chr3:183584713; C>T | rs199558489 | Intronic | n = 2AF = 0.25; 0.3 | n = 1AF = 0.23 |
VCL | chr10:75874194; T>C | rs0 | Intronic | n = 2AF = 0.23; 0.19 | n = 1AF = 0.35 |
Intergenic | chr3:195433152; G>A | rs76183393 | Intergenic | n = 2AF = 0.52; 0.33 | n = 3AF = 0.42; 0.41; 0.66 |
NOTE: Reference genome: hg19.
Abbreviation: AF, allelic frequency.
Somatic mutations in AMER1
In addition to the recurrent mutation R497*, 2 more tumors from our series showed stop gain mutations in AMER1 (also known as FAM123B or WTX): c.1891C>T (R631*) and c.1876C>T (R626*), with 31%, 64%, 58%, and 55% of allelic frequency, respectively. All 4 mutations were validated by Sanger sequencing (Supplementary Fig. S7A). In the TCGA discovery dataset, 25 of 239 tumors (10.5%) accumulated 26 different somatic mutations in AMER1, including the recurrent R497* (Supplementary Fig. S7B). From these, 10 were stop codon, 10 missense, 3 coding synonymous, and 3 were located in the UTR of the gene (Fig. 3; Supplementary Table S8).
Additional validation of AMER1 mutations was performed in independent series of colorectal cancer, including 87 TCGA cases (TCGA validation subset) and 227 colorectal cancer tumors recruited at the Bellvitge University Hospital. Exome sequencing data analysis of TCGA validation revealed that 15% (13 of 87) of the tumors carried somatic variants in AMER1. In all, 2 variants were stop gain, 4 missense, 2 synonymous, and 5 were located in the 3′-UTR (Supplementary Table S8). The most recurrently mutated regions of AMER1 were Sanger-sequenced in the 227 colon tumors from the Bellvitge University Hospital. In this series, 3 previously identified recurrent stop gain mutations, R497*, R631*, and R626*, were detected in 5 tumors (Supplementary Fig. S7C). In all mutated cases, adjacent mucosa exhibited a wild-type genotype supporting the somatic origin of the mutations (Supplementary Fig. S7A and S7C). Regarding the functional effect of the identified missense mutations, 10 of the 14 were predicted to be damaging by at least one of the prediction algorithms used (Supplementary Table S8).
Under the hypothesis that chromosome deletions can also cause the somatic loss of AMER1, exome sequencing data were used to detect CNVs in chromosome X. One of the 42 analyzed tumors exhibited the complete loss of one copy of such chromosome (Supplementary Fig. S8). Actually, the patient was a female whose tumor also carried a truncating mutation in AMER1, suggesting the complete inactivation of AMER1.
To assess whether other molecular mechanisms, apart from mutation or CNV, could inactivate AMER1, its methylation status was evaluated in 96 samples (including the 42 sequenced tumors, NCBI BioProject PRJNA188510). In females, several tumors were found to be hypo- and hypermethylated in the promoter region when compared with their paired adjacent samples. As expected, a negative correlation (Pearson r = −0.22, P = 0.03) between the level of methylation and AMER1 expression was found in this subset of patients. This trend suggests that some tumors may have AMER1-inactive by an epigenetic regulatory mechanism (Supplementary Fig. S9). Nevertheless, nor CNV neither methylation was as frequent inactivating events as mutations in our data.
Phenotypic features associated with AMER1 inactivation
To confirm the effect of AMER1 truncating mutations at the protein level, immunohistochemical staining was performed in the 4 (stop gain) mutated samples of the original series. As expected, no protein expression or reduction in expression was detected in 3 of 4 analyzed tumors, whereas strong staining was detected in all adjacent samples and tumors not harboring mutations in AMER1 (Supplementary Fig. S10). To decipher whether AMER1 mutational inactivation confers a characteristic tumor phenotype, we used expression data from the same tumors to assess whether AMER1-silenced tumors showed characteristic patterns of expression of related pathways and functions (data deposited in GEO repository with access code GSE44076 and project code PRJNA188510). In addition to AMER1-mutated and hypermethylated tumors, one male patient from the GSE44076 set showing the complete loss of chromosome X was included in this analysis (data not shown). As expected, tumors with altered AMER1 tended to cluster together when analyzing the genes related to β-catenin binding (Fig. 4A) and to the Wnt pathway (Fig. 4B). Regarding the latter, overall Wnt-related genes were underexpressed in cluster 1, grouping 7 of 9 tumors with aberrant AMER1. In fact, overexpressed genes in this cluster included Wnt inhibitors such as PRICKLE1, PRICKLE2, and DAAM2, Wnt antagonists such as the SFRP family and SOX17, and noncanonical Wnt pathway activators such as WNT5A. Moreover, the potential prognostic value of AMER1 was assessed, but no association was found with disease-free survival (Cox proportional hazards: P = 0.58).
We also evaluated the co-occurrence of AMER1 mutations with mutations in genes related to the Wnt pathway as described (14). Within 266 tumors (discovery dataset and TCGA), 72% of AMER1-mutated samples had also APC mutated. However, co-occurrence of AMER1 mutations with other Wnt genes were rare (i.e., CTNNB1, 1 of 12; DKK2, 1 of 7; TCF7L2, 3 of 19) or even mutually exclusive (i.e., LRP5; Supplementary Fig. S11).
We next assessed the relationship between AMER1 mutational status and MSI–MSS–CIMP molecular subtypes, as well as KRAS and BRAF mutational status. Samples from CLX and TCGA were used (n = 322). We found that the majority of AMER1-lacking tumors were MSS (70%), BRAF wild-type (91%), and CIMP-H–negative (88%; Supplementary Fig. S12). We also wanted to assess if AMER1-deficient tumors belong to any of the recently described molecular subtypes of colorectal cancer. We used the gene lists reported by Roepman and colleagues (29) to construct a score able to rank tumors according to subtype B or subtype C gene expression. Because our study only included MSS tumors, subtype A (which mainly comprises MSI tumors) was not included in the analysis. Subtype B mainly comprises epithelial tumors with active Wnt and better prognosis whereas those subtype C were mesenchymal tumors exhibiting worse prognosis. We observed that AMER1-deficient tumors tend to score higher in the C than in the B subtype (also if only mutated tumors were taking into account; Supplementary Fig. S13). This trend was even more marked when tumors from the 2 main clusters defined in Fig. 4B were compared. The lower score in B subtype shown by AMER1-mutant tumors was also validated in 224 tumors from published TCGA data (Fig. 4C). This result was supported by the level of expression of molecular markers associated with the C subtype: a decrease in proliferation markers and an increase in EMT, NOTCH, and VEGF markers (Supplementary Fig. S14).
Public data from cBio Cancer Genomics Portal was used to compare mutations and deletions of AMER1 gene across different cancer types (33). Colorectal was the tumor that accumulated more mutations (more than 10%) followed by lung, endometrial, and melanoma. On the contrary, other tumors such as leukemia, medulloblastoma, breast, or ovary showed none or low levels of AMER1 mutations. One study in prostate cancer found more than 20% of tumors with amplifications in this locus. However, 6 more prostate studies showed neither mutations nor CNV in AMER1 (Supplementary Fig. S15).
Discussion
In an attempt to better understand the colorectal cancer pathobiology at a genomic level, 42 colon tumors were profiled by means of exome sequencing. Despite the high mutational heterogeneity among tumors, mutations in AMER1 emerged as a recurrent feature in colorectal cancer. Reinforcing its putative role as a driver gene, MutSig software scored AMER1 between the top 10 functional genes (P = 0.018) along with TP53, APC, and KRAS. Although mutations in AMER1 have been observed in other studies, this gene has not received proper attention as a potential driver for colorectal cancer.
AMER1 (also known as FAM123B or WTX) is a gene located in chromosome X that codifies a highly conserved membrane protein that acts as scaffold for β-catenin degradation. AMER1 is associated with the plasma membrane via 2 N-terminal domains and forms complexes with APC, β-catenin, Axin, and β-TrCP. It can recruit APC from microtubules to the plasma membrane and it is also involved in stimulating LRP6 phosphorylation (34). In tumors, AMER1 is a negative regulator of the Wnt/β-catenin pathway by promoting β-catenin ubiquitination and degradation (35). Also, it has been reported as a repressor of Wnt signaling when cells establish cell–cell contacts. AMER1 maintains the integrity of cellular junctions by mediating the membrane localization of APC (36).
Truncating mutations in the AMER1 gene are frequent in Wilm tumors (30%), which are pediatric tumors of the kidney (37). Yoo and colleagues performed a mutational analysis of AMER1 in gastric, colorectal, and hepatocellular carcinomas and found no mutations in colorectal cancer tumors (0 of 141; ref. 38). However, the TCGA consortium reported AMER1 as a frequently mutated gene in colorectal cancer (14). Seshagiri and colleagues also found functional mutations in AMER1 gene in colorectal cancer tumors: R177C, E384*, G105D, and E244* (39). Recently, mutations in AMER1 have also been described in metastatic colorectal cancer samples and their paired primary tumors (40). Interestingly, somatic mutations affecting an X chromosome gene raise the possibility of one-hit inactivation of a tumor suppressor gene. Indeed, 6 of 7 mutations in our set occurred in male tumors, and the loss of the AMER1 wild-type copy of chromosome X was observed in the AMER1-mutated tumor developed by a female. Interestingly, Han and colleagues previously reported the deletion of the AMER1 locus (Xq11) in Wilm tumors, providing evidence of the inactivation of the gene via copy number changes (41). Regarding our analysis of the TCGA data, 10 of the 14 stop gain mutations were found in males as well as 10 of the 14 missense mutations. AMER1 has been classically catalogued as a Wnt signaling inhibitor due to its belonging to the β-catenin complex. In this scenario, loss of AMER1 would lead to activation of canonical Wnt pathway by β-catenin translocation into the nucleus. However, our results point to the inactivation of Wnt signaling or to the activation of a noncanonical Wnt pathway in AMER1-mutated tumors. These findings suggest that AMER1 has an alternative function to its role in the canonical Wnt signaling pathway in colorectal cancer tumors, as has been proposed by other authors (42). Indeed, it has been described that AMER1 inhibits or activates the Wnt pathway in Wilm disease, depending on the mesenchymal or epithelial origin of the tumor. AMER1 mutations in the mesenchymal lineage lead to nuclear accumulation of β-catenin and subsequent upregulation of Wnt targets. On the other hand, AMER1 mutations in the epithelial lineage are not associated with active Wnt (42). This observation agrees with our findings, where AMER1-mutated colorectal cancer (epithelial) tumors show inactive Wnt. Interestingly, more than 50% of tumors harboring AMER1 mutations were also APC mutants whereas they rarely co-occurred with CTNNB1 and other genes in the Wnt pathway, probably indicating a characteristic and exclusive pathway activation of AMER1 tumors. Regarding molecular classification, our results pointed to AMER1-mutant tumors as mainly MSS and in rare co-occurrence with the CIMP phenotype and BRAF mutations. Recently, molecular subtyping of colorectal cancer that takes into account different molecular features of the tumors has been proposed (4, 29). Our results suggested that the subset of tumors lacking AMER1 expression could belong to type C tumors. This subtype is characterized by the expression of EMT markers and shows lower proliferative ratio. Clinically, type C tumors exhibit poor prognosis and are chemotherapy-resistant. However, our results showed no association between mutational status of AMER1 and tumor relapse.
In concordance with previous observations (43, 44), the mutational patterns of colon tumors are highly heterogeneous. Our results indicate that the vast majority of SNVs were private (non recurrent). Indeed, an independent validation only found 3% of SNVs shared by more than one tumor, confirming the high heterogeneity of the colorectal cancer mutational landscape. Nevertheless, 15% of the small fraction of recurrent mutations found in our study were also observed in the TCGA tumors, the well-known driver mutation being KRAS G12D, the most recurrent one (14). Also in consistence with previous colorectal cancer genomic studies, mutational changes in colorectal cancer are predominated by C:G > T:A transitions (14, 32, 45). The background rate of somatic mutations in colorectal cancer has been reported to be approximately 1 mutation per megabase (46). However, mutation frequencies in colorectal cancer tumors are not homogeneous due to differences in the status of the mismatch repair machinery (MSI vs. MSS) or to the presence or absence of POLE mutations (14, 47). Because our sample did not include MSI tumors, the observed mutation rate and number of SNVs are similar to those previously reported in MSS colorectal tumors.
In our series, APC appeared as the most mutated gene followed by KRAS, TP53, and TTN. TTN mutations have been previously identified in colorectal cancer (14) and in other tumors (48), probably due to the fact that it codes for the longest human protein, increasing the likelihood of passenger mutations, most probably unrelated to cancer (24). Other recurrently mutated genes deserve further consideration: mutations in FBXW7, a gene encoding a protein implicated in Notch signaling, were identified in 12% of the tumors. Similar results have been reported in a recent colorectal cancer study comparing primary and metastatic colorectal tumors (49). Also, 9.5% of tumors showed mutations in CSMD1, which have been associated with poor prognosis in colorectal cancer (50, 51). Interestingly, in our study, all functional CSMD1 mutations occurred in patients who relapsed. Recently described as recurrently mutated genes in colorectal cancer, SYNE1, FAT4, ATM, and USH2A, (32), were also found in our study with more than one tumor mutated (Fig. 2).
The mutational patterns of colorectal cancer are highly heterogeneous among patients. However, mutations tend to accumulate in common pathways and functions crucial for tumorigenesis (i.e., apoptosis, cell cycle). This suggests that a broad range of equivalent genetic aberrations could deregulate key pathways in carcinogenesis, as has already been postulated (52). In other words, diverse molecular alterations could converge in similar phenotypes.
Exome sequencing is a useful technique to discover still unknown mutations that can lead us to a better understanding of the mechanisms underlying colorectal carcinogenesis. However, it also has technical limitations. Mutations are more easily detected in high-coverage regions, so regions at the extremes of the captured exons many suffer from smaller sensitivity. Also, capture kits do not cover equally all exons in the genome. Moreover, tumor heterogeneity and stromal contamination must be taken into account, as it may lead to difficulties in differentiating low-frequency mutations from technical artifacts. Our mutation-calling algorithm required a minimum number of reads with the mutation and total coverage to increase the likelihood of a correct mutation calling.
In conclusion, our exome sequencing approach has revealed that MSS stage II colon tumors exhibit a highly heterogeneous somatic mutational landscape. In concordance with previous studies, this finding clearly suggests that colorectal cancer is not a single disease and supports the necessity of pathway-directed treatments. We have also described that approximately 10% of colorectal cancer tumors often harbor AMER1 inactivation due to somatic mutation, loss of the chromosome X, or hypermethylation (in lower fraction). This subgroup of tumors with AMER1 deficiency also exhibited a particular gene expression pattern in Wnt signaling pathway genes and showed an overall gene expression phenotype similar to the molecular subtype C described by Roepman and colleagues(29). Although promising, further experimental work is required to clearly demonstrate the role of AMER1 as a tumor suppressor gene in colon cancer tumors.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: R. Sanz-Pamplona, L. Valle, V. Moreno
Development of methodology: R. Sanz-Pamplona, F. Bellido, M. Gut
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): E. Guinó, X. Sanjuan, A. Soriano, R. Salazar
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): R. Sanz-Pamplona, A. Lopez-Doriga, M.H. Alonso, S. Aussó, S. Beltrán, F. Castro-Giner, A. Closa, D. Cordero, F.D. Morón-Duran, V. Moreno
Writing, review, and/or revision of the manuscript: R. Sanz-Pamplona, A. Lopez-Doriga, M.H. Alonso, S. Beltrán, D. Cordero, F.D. Morón-Duran, R. Salazar, L. Valle, V. Moreno
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): L. Paré-Brunet, E. Guinó, S. Beltrán, M. Gut, D. Cordero
Study supervision: V. Moreno
Other (performed experiments): K. Lázaro
Acknowledgments
The authors thank Carmen Atencia, Pilar Medina, and Isabel Padrol for their help with the clinical annotation of the samples. They also thank Ana Ma Corraliza for helping with bioinformatics analysis and Gemma Aiza for technical assistance.
Grant Support
This study was supported by the Instituto de Salud Carlos III grants (FIS PI09-01037, PI11-01439, and PIE13-00022), CIBERESP CB07/02/2005, Spanish Ministry of Economy and Competitiveness (SAF2012-38885), the Spanish Association Against Cancer (AECC) Scientific Foundation, the Catalan Government DURSI grant 2014SGR647, and NIH grant U19CA148107 (CORECT). Sample collection was supported by the Xarxa de Bancs de Tumors de Catalunya sponsored by Pla Director d'Oncología de Catalunya (XBTC) and ICOBiobanc, sponsored by the Catalan Institute of Oncology.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.