Abstract
Purpose: The intratumoral heterogeneity (ITH) and the evolution of genomic architectures associated with the development of distant metastases are not well understood in colorectal cancers.
Experimental Design: We performed multiregion biopsies of primary and liver metastatic regions from five colorectal cancers with whole-exome sequencing and copy number profiling.
Results: In addition to a substantial level of genetic ITH, multiregion genetic profiling identifies the subclonal mutational architecture, leading to the region-based or spatial categorization of somatic mutations and the inference of intratumoral evolutionary history of cancers. The universal mutations (those observed in all the regional biopsies) are enriched in known cancer genes such as APC and TP53 with distinct mutational spectra compared with biopsy- or region-specific mutations, suggesting that major operative mutational mechanisms and their selective pressures are not constant across the metastatic progression. The phylogenies inferred from genomic data show branching evolutionary patterns where some primary biopsies are often segregated with metastatic lesions. Our analyses also revealed that copy number changes such as the chromosomal gains of c-MYC and chromothripsis can be region specific and the potential source of genetic ITH.
Conclusions: Our data show that the genetic ITH is prevalent in colorectal cancer serving as a potential driving force to generate metastasis-initiating clones and also as a means to infer the intratumoral evolutionary history of cancers. The paucity of recurrent metastasis-clonal events suggests that colorectal cancer distant metastases may not follow a uniform course of genomic evolution, which should be considered in the genetic diagnosis and the selection of therapeutic targets for the advanced colorectal cancer. Clin Cancer Res; 21(19); 4461–72. ©2015 AACR.
Metastasis, especially to the liver, is a major cause of cancer-related death in colorectal cancer. Knowledge of genomic evolution from primary to metastatic colorectal cancer would be crucial not only to understanding colorectal cancer pathogenesis but also to biomarker development for diagnosis and prognosis. In this study, we identified that colorectal cancer showed branching evolutionary patterns rather than linear patterns during the progression. Also, the genetic ITH was prevalent in colorectal cancer and might serve as a potential driving force to generate metastasis-initiating clones. The data suggest that a single regional biopsy may underestimate colorectal cancer mutations in the clinic. We observed a relative paucity of recurrent metastasis-clonal events, suggesting that colorectal cancer distant metastases may not follow a uniform course of genomic evolution, which should be considered in the genetic diagnosis and the selection of therapeutic targets for advanced colorectal cancer.
Introduction
Colorectal cancers are the third most common human malignancy worldwide with a ∼5% lifetime risk (1). Distant metastasis is a major cause of cancer-related morbidity and mortality of colorectal cancer, mainly due to liver metastasis that is observed in approximately 25% of primary diagnoses of colorectal cancer. In spite of the clinical successes in early detection and prevention of colorectal cancer that resulted in an overall reduction of colorectal cancer risks (2), metastatic colorectal cancer still remains one of the leading causes of cancer-related death, with few therapeutic options available. Therefore, a better understanding of the molecular and genetic mechanisms underlying the biological and phenotypic evolution of colorectal cancer during metastasis is crucial to reduce morbidity and mortality of this disease.
As previously proposed (3), cancer evolution is described as successive cycles of somatic mutation acquisition and the natural selection of the fittest subclones, eventually giving rise to metastatic subclones (4, 5). Assuming that the development and progression of distant metastasis are genetic events, there have been efforts to identify causal genetic determinants responsible for distant metastasis. Recently, massively parallel sequencing of breast cancer revealed that the majority of mutations observed in the metastatic lesions were also present, but often at low allele frequencies, in the primary tumors (6). This finding was also supported by a direct comparison of mutation profiles obtained from primary and metastatic breast cancers (7). These results support the presence of genetic evolution associated with the development of distant metastasis, also raising the possibility that the clonal ancestry of distant metastasis can be traced to primary tumors.
It is now well recognized that human cancers have a substantial level of intratumoral heterogeneity (ITH) and mutational ITH has been demonstrated across multiple human cancer types (8–15). Genetic ITH is the manifestation of a driving force in cancer genomes for the emergence of new subclones that have metastatic potential. Thus, ITH should be taken into account when identifying a segregation pattern of metastasis-related mutations in primary tumors (16). To account for the regional ITH of mutations, Yachida and colleagues (17) performed multiregion biopsies in both primary and metastatic pancreatic tumors and identified that a primary tumor mass is a mixture of geographically distinct subclones, some of which might specifically give rise to distant metastases. However, the extent of genetic ITH in primary and metastatic colorectal cancer lesions and the subclonal architectures specifically associated with the development of distant metastasis are largely unknown.
To identify genetic alterations associated with the metastatic progression of colorectal cancer, we performed multiregion biopsies in both primary and metastatic colorectal cancer lesions from five colorectal cancer cases, and analyzed somatic mutations and copy number alterations (CNA). In addition to the apparent genetic ITH in both primary and metastatic lesions, we found branched evolutionary patterns in the colorectal cancer genomes, in which preexisting subclones in primary lesions often appeared responsible for the development of distant metastases. The mutations showed distinct mutational spectra, enriched molecular functions and aberration types (e.g., chromothripsis) according to their regional distribution and intratumoral recurrences (i.e., spatial mutation categories) giving clues to the subclonal architectures and potential metastasis-associated genomic changes of the corresponding tumors.
Materials and Methods
Tumor specimen
Colectomy tissues from five colorectal cancer patients used for this study came from a university-affiliated hospital (Eujeongbu St. Mary Hospital, Korea). All of the patients were Koreans and we only collected sporadic cases without any positive family history of colorectal cancer. The hospital pathology department confirmed pathologic features of the colorectal cancer. After the surgery, two to six different tumor areas and one normal mucosal area were picked from each fresh colectomy specimen (Supplementary Fig. S1). Also, matched metastatic nodules in liver were picked from the same patients along with primary colorectal cancer. Three patients (CRC1, CRC3, CRC4) had liver metastases at initial surgeries. Two patients (CRC2, CRC5) had no liver metastasis at initial surgery, but later they were found to have liver metastasis. All of the picked tissues from tumor and normal areas were frozen and stained with hematoxylin and eosin. Two pathologists selected cases with rich tumor cell populations (at least 70%), which were subsequently used in the study. For genomic DNA extraction, we used the DNeasy Blood & Tissue Kit (Qiagen) according to the manufacturer's recommendation. To evaluate the status of microsatellite instability, we used five mononucleotide repeats (BAT25, BAT26, NR-21, NR-24, and MONO-27; ref. 18). For all of the five colorectal cancer cases examined, no markers showed instability; thus, all cases were classified as microsatellite-stable (MSS). The other clinical information of five colorectal cancer cases is available in Supplementary Table S1.
Whole-exome sequencing and mutation analyses
Using genomic DNA from tumor and matched normal samples, we performed exome-capture sequencing. Using Agilent SureSelect Human All Exome 50 Mb Kit (Agilent Technologies), whole-exome sequencing was performed with Illumina HiSeq2000 platform to generate 101 bp paired-end sequencing reads according to the manufacturer's instructions. Burrows-Wheeler aligner (19) was used to align the sequencing reads onto the human reference genome (hg19). We used Genome Analysis ToolKit for the local realignment and score recalibration of the sequencing reads (20). We employed Picard (http://picard.sourceforge.net) and Samtools (21) for the basic processing and the management of the sequencing data. MuTect (22) and SomaticIndelDetector (20) were used to call somatic point mutations and indels by comparing the sequencing reads of the tumor and matched normal genomes. ANNOVAR package was used to predict the functional consequences (such as silent or nonsilent variants) of somatic variants located in coding sequences (23). The evolutionary analyses based on mutation calls can be substantially flawed when the mutations are independently called from each of the intratumoral biopsies and simply merged. This is mainly due to false-negative calls of low-frequency mutations or those in sparsely covered regions, leading to the overestimation of the ITH. To minimize such biases, we first collected index mutations with mutant allele frequencies >5% and those called in the regions with sequencing coverage (>20×). For each of the index mutations, we examined the rest of the intratumoral biopsies for the presence of any sequencing reads that support the corresponding mutation in other regions of the given case. The sequencing information, including the depth of sequencing for 35 tumor and 5 normal genomes, is available in Supplementary Table S1. Sequences have been deposited to the SRA database and can be accessed under Project ID PRJNA271316.
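The index-mutation procedure described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the data structures (`calls`, `pileups`) and the function names are hypothetical, and we assume that per-biopsy read counts for each candidate site have already been extracted from the alignments.

```python
from typing import Dict, Set, Tuple

# A mutation key: (chromosome, position, reference allele, alternate allele)
Mutation = Tuple[str, int, str, str]
# Per-biopsy read counts: mutation -> (reads supporting the mutation, total depth)
Pileup = Dict[Mutation, Tuple[int, int]]

MIN_VAF = 0.05   # index mutations need a mutant allele frequency > 5%
MIN_DEPTH = 20   # ... and sequencing coverage > 20x

def collect_index_mutations(calls: Dict[str, Set[Mutation]],
                            pileups: Dict[str, Pileup]) -> Set[Mutation]:
    """Union of called mutations that pass both thresholds in the biopsy
    where they were called (hypothetical helper, not from the paper)."""
    index = set()
    for biopsy, mutations in calls.items():
        for m in mutations:
            alt, depth = pileups[biopsy].get(m, (0, 0))
            if depth > MIN_DEPTH and alt / depth > MIN_VAF:
                index.add(m)
    return index

def presence_matrix(index: Set[Mutation],
                    pileups: Dict[str, Pileup]) -> Dict[Mutation, Dict[str, bool]]:
    """Mark an index mutation as present in a biopsy if any read supports it."""
    return {m: {b: pileups[b].get(m, (0, 0))[0] > 0 for b in pileups}
            for m in index}
```

The resulting presence/absence matrix is the natural input for the spatial categorization and the parsimony analysis described in the following sections.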
Spatial mutation categories with respect to the regional distribution
We established schematics of mutation classification system as shown in Fig. 1A. The given example represents six regional biopsies from primary and metastatic lesions (P1–P3 and M1–M3, respectively) and the regional distribution of mutations that can be used to infer the phylogenetic relationship of regional biopsies. We classified the mutations into five spatial categories of “universal,” “metastasis-clonal,” “primary-private,” “metastasis-private” mutations along with “unclassified.” Universal mutations are those observed in all regional biopsies of a given tumor (red in Fig. 1A). The regionally clonal presentation (i.e., the regional commonality) of universal mutations suggests that these events have arisen earlier during tumor evolution and may contain potential cancer initiators. Metastasis-clonal mutations represent those that are regionally clonal only in the metastatic regions thereby excluding universal mutations. We used this mutation category in a broad sense to include the events with a regional clonality in metastatic lesions but partially present or absent in primary lesions (orange and yellow in Fig. 1A, respectively). The latter might have occurred after the physical separation of metastasis-originating clones; however, it is possible that not all mutations in primary masses can be identified in our experimental settings due to regional biases or low allele frequencies. Thus, we collectively classified these events as metastasis-clonal mutations. These events correspond to “metastasis progressors” in previous reports (17) and may include the genetic events responsible for the initiation and development of distant metastases. Hypothetically, the primary subclones from which the metastasis-originating clones arise can be identified (annotated as P2* with asterisk in Fig. 1A). The other mutational categories include “metastasis-private” events that are present but not in all of the metastatic lesions (green in Fig. 1A). 
These events may have occurred after the distant metastasis was established and may have been acquired during the expansion of metastatic clones. In addition, “primary-private” events are those present in primary but absent in metastatic lesions (blue in Fig. 1A), representing the changes independently acquired during the expansion of the primary tumor mass. Along with the four spatial mutational categories, we assigned the mutations that did not fit any of the categories discussed above to an additional category of “unclassified” (gray in Fig. 1A). We performed DAVID analyses (http://david.abcc.ncifcrf.gov; ref. 24) to identify the molecular functions enriched in the spatial category-specific mutations (i.e., genes harboring nonsilent mutations only in one spatial category but not in other categories). We selected the top three significant functional clusters and their functional annotations from the DAVID output for each spatial category.
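As an illustration, the five spatial categories can be expressed as a small decision rule over a presence/absence vector. This is a sketch under our reading of the definitions above. In particular, requiring metastasis-private mutations to be absent from every primary biopsy is an assumption, and the function name `classify` is hypothetical.

```python
from typing import Dict, List

def classify(presence: Dict[str, bool],
             primary: List[str],
             metastatic: List[str]) -> str:
    """Assign one mutation to a spatial category.

    presence   -- biopsy name -> True if the mutation is detected there
    primary    -- names of the primary biopsies of the case
    metastatic -- names of the metastatic biopsies of the case
    """
    in_primary = [presence[b] for b in primary]
    in_metastasis = [presence[b] for b in metastatic]

    if all(in_primary) and all(in_metastasis):
        return "universal"
    if all(in_metastasis):
        # clonal in metastases, partially present or absent in primary lesions
        return "metastasis-clonal"
    if any(in_metastasis) and not any(in_primary):
        # assumption: subclonal in metastases and absent from all primary biopsies
        return "metastasis-private"
    if any(in_primary) and not any(in_metastasis):
        return "primary-private"
    # e.g., subclonal in both primary and metastatic biopsies
    return "unclassified"
```

For example, a mutation detected in P2, M1, and M2 of a case with biopsies P1, P2, M1, and M2 would be labeled metastasis-clonal, whereas one detected only in P1 and M1 would be unclassified.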
Copy number profiling
We used Agilent SurePrint G3 Human CGH Microarray 180 K for copy number profiling as previously described (12). The log2 ratio (tumor biopsy/matched normal) was segmented using GLAD algorithm (25). We further employed ABSOLUTE algorithm to estimate the purity and ploidy for each copy number profile (26). The segmented log2 ratios and probe-level intensities were adjusted using the estimated purity and ploidy as described previously (27). The microarray data have been submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE58512. To compare the array-based copy number calls with those from exome sequencing, we used VarScan2 (28) to obtain the read depth differences between the tumor and normal sequencing data. The concordance between two platforms was estimated by calculating the Pearson correlation coefficient (r) between the GC-corrected, log2-transformed read depth differences and log2 ratio from microarray data. A good concordance level was observed (r = 0.75–0.98; median of 0.90), suggesting that DNA copy numbers estimated from two independent platforms were largely concordant.
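The platform concordance measure can be illustrated with a short sketch. It assumes that tumor and normal read depths have already been summarized per genomic bin and matched to array segments. The pseudocount and the function names are illustrative choices, and the GC correction and library-size normalization applied in practice are omitted here.

```python
import math
from typing import Sequence

def log2_depth_ratio(tumor_depth: float, normal_depth: float,
                     pseudocount: float = 1.0) -> float:
    """log2 read-depth ratio for one bin; the pseudocount (an assumption)
    avoids division by zero in uncovered bins."""
    return math.log2((tumor_depth + pseudocount) / (normal_depth + pseudocount))

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Here `pearson_r` would be applied once per biopsy, with the sequencing-derived log2 ratios as one vector and the microarray log2 ratios of the same regions as the other.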
Chromothripsis
The inference of chromothripsis was done as previously described (29). For each of the chromosomes with at least 10 breakpoints, we first evaluated whether the sizes of individual segments were not substantially different from those of their neighboring segments, given that the random occurrence of breakpoints in chromothripsis would generate multiple segments with sizes roughly the same as, or at least of the same order of magnitude as, those of their neighbors. If we suppose that a chromosome has n segments (sizes of s_1, s_2, …, s_n), we can use R_i = |log2(s_i/s_{i−1})| as a measure for the size difference between the neighboring segments. Then, we used the score S = R/R_e (R as the median of the R_i values and R_e as the expected value of R under the hypothesis that breakpoints are randomly distributed in the chromosome) to filter out the chromosomes that are unlikely to be involved with chromothripsis (S < 2.0). To further evaluate the oscillating pattern of copy number states, we counted the number of peak-and-valley segments (i.e., the segments showing a copy number difference of log2 > 0.5 with their neighbors). The permutation was performed 10,000 times and the P value was assigned as the proportion of the permutations whose peak-and-valley counts were greater than the observed count. P < 0.05 was used to select the chromosomes with the oscillating or alternating copy number changes.
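The two statistics above can be sketched in a few functions. Several details are assumptions made for illustration and are not specified in the text: the expected value of R is estimated by simulating uniformly placed breakpoints, a peak or valley is taken to be a segment that lies above (or below) both neighbors by more than 0.5 in log2 ratio, and the permutation shuffles the segment-level log2 ratios. All function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence

def neighbor_size_score(sizes: Sequence[float]) -> float:
    """R: median of R_i = |log2(s_i / s_{i-1})| over adjacent segments."""
    return statistics.median(
        [abs(math.log2(sizes[i] / sizes[i - 1])) for i in range(1, len(sizes))]
    )

def expected_score(chrom_length: float, n_segments: int, n_sim: int = 1000,
                   rng: Optional[random.Random] = None) -> float:
    """Simulation-based estimate of R under uniformly random breakpoints
    (an assumed way of obtaining the expected value)."""
    rng = rng or random.Random(0)
    scores: List[float] = []
    for _ in range(n_sim):
        cuts = sorted(rng.uniform(0, chrom_length) for _ in range(n_segments - 1))
        edges = [0.0] + cuts + [chrom_length]
        # floor segment sizes at 1 (base pair) so the log ratio is always defined
        sizes = [max(b - a, 1.0) for a, b in zip(edges, edges[1:])]
        scores.append(neighbor_size_score(sizes))
    return statistics.mean(scores)

def count_peaks_and_valleys(log2_ratios: Sequence[float],
                            min_diff: float = 0.5) -> int:
    """Count interior segments that differ from both neighbors by more than
    min_diff in the same direction."""
    count = 0
    for i in range(1, len(log2_ratios) - 1):
        left = log2_ratios[i] - log2_ratios[i - 1]
        right = log2_ratios[i] - log2_ratios[i + 1]
        if (left > min_diff and right > min_diff) or \
           (left < -min_diff and right < -min_diff):
            count += 1
    return count

def oscillation_p_value(log2_ratios: Sequence[float], n_perm: int = 10000,
                        rng: Optional[random.Random] = None) -> float:
    """Proportion of shuffles whose peak-and-valley count exceeds the observed one."""
    rng = rng or random.Random(0)
    observed = count_peaks_and_valleys(log2_ratios)
    shuffled = list(log2_ratios)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_peaks_and_valleys(shuffled) > observed:
            exceed += 1
    return exceed / n_perm
```

The score S for a chromosome would then be `neighbor_size_score(sizes) / expected_score(chrom_length, len(sizes))`, to be compared against the threshold of 2.0 quoted above.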
Phylogeny estimation from mutations and copy numbers
The phylogenetic relationship between the regional biopsies of a given sample was inferred from mutations and copy number profiles. For each colorectal cancer, maximum parsimony trees were inferred as previously described (11, 12) using the branch-and-bound algorithm implemented in the PHYLIP software package. The phylogenetic relationship of intratumoral biopsies was also inferred from the copy number changes using the TuMult package (30). Among the GLAD-based segments, we selected those with log2 ratios of >0.2 or <−0.2 and used their boundaries as chromosomal breakpoints for the input of the TuMult algorithm. As the reference for TuMult, we downloaded the segmentation data of 470 colorectal cancer genomes as available from The Cancer Genome Atlas consortium (31). The copy number calls were also used to test for a star topology (i.e., whether a phylogeny agrees with a linear or a branched evolution) using the MEDICCquant R package (32). The null hypothesis assuming a linear evolution was rejected (P < 0.05), so that the alternative hypothesis of a branched evolution was selected for four of the five cases examined (CRC1–CRC4, except for CRC5).
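The breakpoint selection step can be illustrated as below. The tuple layout of a segment and the function name are assumptions for this sketch. The actual input format expected by TuMult is not reproduced here.

```python
from typing import List, Tuple

# One segment: (chromosome, start, end, segmented log2 ratio)
Segment = Tuple[str, int, int, float]

def breakpoints_from_segments(segments: List[Segment],
                              threshold: float = 0.2) -> List[Tuple[str, int]]:
    """Collect the boundaries of segments whose log2 ratio is > threshold
    or < -threshold, returned as sorted (chromosome, position) pairs."""
    breakpoints = set()
    for chrom, start, end, ratio in segments:
        if ratio > threshold or ratio < -threshold:
            breakpoints.add((chrom, start))
            breakpoints.add((chrom, end))
    return sorted(breakpoints)
```

Segments with log2 ratios inside the interval from −0.2 to 0.2 contribute no breakpoints, so copy-neutral stretches do not influence the copy number-based phylogeny.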
Results
Multiregion sequencing and mutation analyses of colorectal cancer genomes
In this study, we adopted a scheme of multiregion exome sequencing combined with spatial mutational profiling (12, 17) to reveal the mutational ITH of primary and metastatic colorectal cancers. Somatic mutational events were classified into five spatial mutation categories of “universal,” “metastasis-clonal,” “metastasis-private,” “primary-private,” and “unclassified” (Fig. 1A) with respect to the regional distribution (i.e., the regional biases and the level of recurrences for given mutations). The details on the categorization system and the interpretation of different spatial mutation categories are available in Materials and Methods.
We analyzed 35 multiregion biopsies of the primary and metastatic lesions in five colorectal cancers (n = 2–5 and n = 2–6 for primary and metastatic biopsies per case, respectively; Supplementary Table S1). The geographical mapping of the biopsy spots in primary colorectal cancers and liver metastatic lesions is shown in Supplementary Fig. S1. Overall, whole-exome sequencing of the 35 primary and metastatic colorectal cancer genomes identified 76 to 244 coding mutations and indels per biopsy (median of 104 somatic variants). Mutation numbers between the primary and metastatic biopsies of each patient were not significantly different except one case with the largest number of biopsies (CRC3; P = 4.8 × 10−5; t test). The mutations observed from the five colorectal cancer cases are listed with the detailed information of spatial categories and functional consequences in Supplementary Tables S2. The regional mutation profiles for the five colorectal cancer cases with their spatial categories showed a substantial level of ITH (Fig. 1B). We observed that 19.8% to 53.9% of mutations in a given sample were universal, whereas 46.1% to 80.2% of mutations were subclonal. Among the subclonal mutations, metastasis-clonal, metastasis-private and primary-private categories comprised 0.7% to 15.6%, 2.4% to 40.7% and 13.8% to 56.0% of mutations, respectively. In addition, 1.4% to 37.2% of mutations per biopsy were not observed in any other regions of the given sample (biopsy-specific mutations, see Supplementary Fig. S2).
We next investigated recurrent nonsilent mutations (i.e., those observed in ≥3 of the five colorectal cancer cases). Figure 2A shows the nine genes with such mutations, including the well-known KRAS mutations. All of the APC, KRAS, and TP53 mutations were identified as universal events except for a metastasis-clonal APC mutation in one case. Eight APC mutations were all truncating (either nonsense substitutions or frameshift indels) and were often biallelic (e.g., CRC2 harbored one nonsense mutation and one frameshift mutation). Two KRAS missense mutations occurred at amino acid residue 12 (p.G12S and p.G12D). Two of five TP53 mutations were truncating and the other three TP53 missense mutations occurred at (p.R213X and p.R273C) or near the known mutation hotspots (p.R267P; ref. 33). See also Supplementary Tables S2 for the details on the functional consequences and the amino acid residues affected for other mutations. The other recurrent mutations besides APC, TP53, and KRAS were dispersed across different spatial categories. We also performed gene set enrichment analysis to infer the unifying molecular terms of the category-specific nonsilent mutations (i.e., those observed in only one of the five spatial categories). The functional annotations of “cell adhesion” were commonly enriched in the universal, metastasis-clonal, and primary-private categories (Fig. 2B).
Distinct mutational spectrum of spatial mutation categories
We evaluated the properties of mutations according to the spatial categories (Fig. 3). Figure 3A shows the number of mutations across five spatial categories for five colorectal cancer cases. We observed that frameshift and inframe indels were enriched in the metastasis-private category in two cases (CRC2 and CRC4; Fig. 3B). Although C:G>T:A transitions at CpG dinucleotides were significantly enriched in the universal category in three colorectal cancers (Fig. 3C), the enrichment of C:G>A:T and T:A>A:T transversions was only observed in lesion-specific (metastasis-private and primary-private) categories in more than two cases. Analyses of mutant allele frequencies revealed that mutations from metastasis biopsies had significantly higher allele frequencies compared with those from primary biopsies (P = 2.2 × 10−16; t test; Supplementary Fig. S3). This finding suggests that metastatic lesions were relatively clonal due to the evolutionary bottleneck and restriction of diversity. Regarding spatial mutation categories, universal mutations had the highest allele frequencies, whereas primary- and metastasis-private mutations had the lowest (Supplementary Fig. S3).
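For readers unfamiliar with the substitution notation used here, the following sketch shows how a single-nucleotide variant can be collapsed into one of the six strand-symmetric classes (e.g., C:G>T:A) and how a transition at a CpG dinucleotide can be recognized from the flanking reference bases. The function names are hypothetical, and the flanking bases are assumed to be given on the reference strand.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def substitution_class(ref: str, alt: str) -> str:
    """Collapse a substitution into pyrimidine-reference notation,
    e.g., G>A on the reference strand becomes 'C:G>T:A'."""
    if ref in ("A", "G"):
        # report the change from the strand carrying the pyrimidine
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"

def is_cpg_transition(ref: str, alt: str, prev_base: str, next_base: str) -> bool:
    """True for a C:G>T:A transition at a CpG dinucleotide.

    prev_base and next_base are the 5' and 3' neighbors on the reference strand.
    """
    if ref == "C" and alt == "T":
        return next_base == "G"   # the mutated C is followed by G
    if ref == "G" and alt == "A":
        return prev_base == "C"   # the mutated C sits on the opposite strand
    return False
```

Counting these classes separately for each spatial category gives the per-category spectra that are compared in this section.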
Phylogeny and copy number profiles of regional biopsies
Phylogenetic trees inferred from genomic data may represent the evolutionary life history of individual tumors as discussed in the previous literature (11, 12). In this study, the phylogenetic relationship between the regional biopsies of a given sample was inferred from mutation data using maximum parsimony and chromosomal breakpoints in DNA copy number profiles as described previously (11, 12). Figure 4 shows phylogenetic profiles (mutation- and copy number-based) of the five colorectal cancers along with the genome-wide heatmap of CNAs. The inferred evolutionary patterns appeared as a branched evolution rather than a constant or linear evolution. For example, statistical analysis of the phylogeny patterns (32) indicated that the phylogeny of four (CRC1–CRC4) of the five colorectal cancers was branched; that is, the null hypothesis of the test (a linear or constant evolution) was rejected for these four colorectal cancers except for CRC5.
In either mutation- or copy number–based phylogenies, most regional biopsies from metastases were cosegregated apart from those of primary lesions indicating that they were more genetically similar to each other than the primary lesions (Fig. 4). Most CNAs were shared by both primary and metastatic lesions, representing early or universal genomic events. Interestingly, in two cases (CRC2 and CRC3), mutation-based phylogenies segregated some primary subclones to those of metastatic tumors rather than those of other primary tumors (e.g., CRC2-P1 and CRC3-P3/P4; those with asterisks in Fig. 4). This phenomenon was consistently observed in copy number–based phylogenies (Fig. 4). These observations suggest that the origins of distant metastases could also be traceable in the subclonal architecture of primary colorectal cancers as previously reported in pancreatic cancers (17).
Copy number alterations containing well-known cancer genes
Well-known cancer genes that were reported to be frequently altered in colorectal cancer genomes such as APC, PTEN, IGF2, and SMAD4 (31) were found to be universally altered across the whole biopsy sites in this study (Fig. 5A). We also observed the universal focal deletion in between VTI1A and TCF7L2 in CRC2, which was reported to be responsible for the recurrent fusion between these genes in colorectal cancers (34). Most of the universal focal changes involving potential driver genes showed exactly coinciding boundaries across the regional biopsies indicative of their clonal origins (Fig. 5A). By contrast, genes located in fragile regions such as MACROD2, FHIT, WWOX (all on CRC2), and PARK2 (on CRC3) showed deletions whose boundaries were irregular across the regional biopsies of given tumors, although they were universal (Fig. 5B) indicative of ongoing genomic instability in these fragile loci. It is worth noting that the copy number gains involving c-MYC oncogene were observed in all five colorectal cancers analyzed (one focal and four 8q arm-level change cases; Fig. 5C). Universal or metastasis-clonal copy number gains of 8q encompassing c-MYC were observed in CRC1, CRC2, CRC4, and CRC5, whereas a focal gain of c-MYC as a metastasis-clonal event was observed in CRC3. In CRC2, in addition to arm-level metastasis-clonal 8q gain, a focal gain encompassing c-MYC was also observed in a primary lesion (CRC2-P4), suggesting that the evolutionary path to c-MYC copy number gains may be diverse even within a single cancer mass.
Evidence of region-specific chromothripsis
Of the five colorectal cancers examined, two candidate chromothripsis events were observed as one universal (on chromosome 2 of CRC3) and one metastasis-private event (on chromosome 16 of CRC5). In the chromothripsis of CRC3, breakpoints in deleted segments exactly overlapped across all the regional biopsies, suggesting that this event occurred early in tumor development (Fig. 5D). However, the intervening segments in between the copy number losses often showed neutral states or copy number gains in the primary tumors while they were consistently copy number gained in the metastatic lesions. This result suggests that additional copy number changes were acquired in the rearranged chromosomal segments during metastatic progression after the initial universal chromothripsis event. In CRC5, we observed a metastasis-private chromothripsis event that must have occurred after the divergence of metastatic clones from the primary cancer mass (Fig. 5E).
Potential genetic determinants of colorectal cancer metastases
Theoretically, metastasis-clonal events are potential candidates for metastasis determinants. We first investigated apparent loss-of-function events among the metastasis-clonal mutations, including frameshift indels (GDPD1, NPS, and SLK) and nonsense mutations (APC, ASCC3, CDH12, ENTPD4, GYLTL1B, NEDD4L, PCDHB5, and TK2). Through capillary sequencing of seven lesions (four primary and three metastatic) of the corresponding case (CRC1), the ITH of the NEDD4L nonsense mutation was confirmed (data not shown). In the subsequent immunohistochemical analysis of CRC1, primary lesions without the NEDD4L mutation showed positive NEDD4L immunostaining, whereas metastatic liver lesions with the NEDD4L mutation showed decreased NEDD4L immunostaining (Fig. 6A and B). Furthermore, in an independent set of 17 colorectal cancers with liver metastasis, seven liver metastases (41.2%) even without NEDD4L mutation showed decreased NEDD4L expression compared with the corresponding primary colorectal cancers (Fig. 6C and D), suggesting that liver metastasis-specific NEDD4L downregulation might be caused by nonmutational as well as mutational events. Among the focal CNAs, we selected metastasis-clonal deletions of PCDH9 as a candidate genetic event for metastasis. The ITH of PCDH9 was confirmed by real-time qPCR in the corresponding case (CRC2). In this case, primary colorectal cancer sites without the PCDH9 deletion showed positive immunostaining, whereas two liver metastasis lesions showed much weaker immunostaining in the cancer cells, indicating that the metastasis-clonal pattern of the PCDH9 deletion was recapitulated at the protein level (Supplementary Fig. S4).
Discussion
In this study, multiregion biopsies from primary colorectal cancer and liver metastases revealed a substantial level of ITH in both primary and matched liver metastases. Our results, 46% to 80% subclonal mutation fractions (i.e., those not detectable across all the regional biopsies), are similar to the previous estimates of other cancer types (10–12), further supporting the notion that ITH is a general phenomenon in cancers. Although ITH at the genomic level has been observed across many cancer types, including a report of colon adenomas (35), to our knowledge, our study is the first report on the ITH of colorectal cancer genomes with a detailed view on spatial mutation categories and phylogenetic structures of primary and metastatic lesions obtained from the same individuals.
There have been concerns that previous approaches to identifying metastasis-specific mutations might generate bias. For example, studies based on “index” mutations solely identified from metastatic tumors (17, 36) might not only ignore a substantial level of ITH, but also can be biased by the selection of metastatic biopsies used to generate the index mutations. In the present study, we demonstrate that multiregion sequencing across multiple primary and metastatic lesions provides more information (e.g., the extent of ITH, the categorization of the mutations according to the different evolutionary stages and the evolutionary relationship between the primary and metastatic lesions) compared with a simple comparison of primary versus metastatic lesions (37, 38).
The recurrent mutations on known cancer genes (e.g., APC, KRAS, and TP53; ref. 39) were almost exclusively observed in the universal category consistent with recent reports (37, 38). In case of APC mutations, three of five cases showed “second-hit” mutations on APC, probably representing biallelic mutations or the occasional third hit on APC locus (40). To get a clearer understanding, biallelic APC mutations in one case (one universal frameshift indel and one metastasis-clonal nonsense mutation of CRC2) were validated using Sanger sequencing (Supplementary Fig. S5). The nonsense APC mutation in this case was classified as metastasis-clonal because of the absence of the mutations in a primary region (CRC2-P1) that also exclusively harbors an entire loss of chromosome 5. Thus, it is expected that two APC mutations in this case were biallelic, but the subsequent chromosomal loss led to the loss of only one of the biallelic mutations (nonsense mutation) while retaining the other (frameshift indel). It is also notable that this lesion-specific chromosomal loss altered the spatial category of the mutation (universal to metastasis-clonal), which may be a potential source of unclassified mutations.
The mutations that cannot be explained according to the evolutionary model (Fig. 1A) were left in “unclassified” category. We propose several assumptions for these evolutionarily unexplainable mutations (e.g., those subclonal both in primary and metastatic biopsies). First, the false-negative calls of mutations due to a low sequencing depth or other technical issues may lead to a misclassification of mutations and can be the source of “unclassified” mutations. Second, unclassified mutations may have arisen with subsequent chromosomal changes such as loss-of-heterozygosity as shown in the example of biallelic APC mutations in CRC2. Third, these mutations may have been acquired independently in primary and metastatic lesions representing a convergent evolution (homoplasy). Although homoplasy events have been previously reported in terms of gene-level convergent events (e.g., SETD2 frameshift indels and splicing mutations independently acquired by distinct biopsies in a cancer mass; ref. 10), we believe that convergent mutations with identical genomic coordinates would be rare. But the mutations with known hotspots will be worthy of further investigation as the potential source of misclassification. Finally, a recent report about the colorectal cancer evolution proposed that the subclonal mutational architecture as configured early in the evolution of cancers, may persist during the growth of cancer mass (41). The presence of “pervasive” mutations (i.e., subclonal mutations present across the cancer mass), especially those with very low mutation frequencies, may be the source of error in the accurate inferring of the phylogeny.
The first issue may be overcome by using “high-confidence” somatic mutations (i.e., those called in genomic regions with sufficient sequencing coverage across all the regional biopsies). To test this, we selected somatic mutations called with sequencing depth >50× in all the regional biopsies and inferred the phylogeny (Supplementary Fig. S6). Although the phylogenies inferred with and without this stringent filtering are largely concordant, it remains to be investigated whether high-confidence mutations perform better in inferring the evolutionary relationship of intratumoral biopsies. Alternatively, low-confidence mutations in particular biopsies can be selectively ignored to preserve the number of mutations available as evolutionary markers.
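The depth-based filter described above can be sketched as follows. This is a minimal illustration, assuming a per-site table of sequencing depths keyed by biopsy; the function name, the site keys, and the depth values are hypothetical, and only the >50× threshold comes from the text.

```python
from typing import Dict, List

MIN_DEPTH = 50  # threshold quoted in the text (depth > 50x)


def high_confidence_sites(
    depth_by_site: Dict[str, Dict[str, int]],
    min_depth: int = MIN_DEPTH,
) -> List[str]:
    """Keep sites whose depth exceeds min_depth in every regional biopsy."""
    return [
        site
        for site, depths in depth_by_site.items()
        if depths and all(depth > min_depth for depth in depths.values())
    ]


# Hypothetical depths for two sites across four regional biopsies.
depths = {
    "chr5:1000": {"P1": 80, "P2": 95, "M1": 60, "M2": 120},
    "chr2:2000": {"P1": 75, "P2": 40, "M1": 90, "M2": 110},  # P2 is too shallow
}
print(high_confidence_sites(depths))  # ['chr5:1000']
```

A site is discarded as soon as one biopsy falls at or below the threshold, which is why this filter trades marker count for confidence in absence calls.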
We observed a preponderance of indels among late (metastasis-private and primary-private) genomic events, suggesting that colorectal cancer genomes may be vulnerable to different types of somatic variants at different points in their evolution. It is also possible that these unique genomic footprints reflect different fixation rates of the mutation types; for example, highly deleterious events such as indels may be negatively selected in the early evolutionary phase. We also identified mutation spectra associated with colorectal cancer development and progression stages (e.g., C:G>T:A transitions enriched in universal mutations), indicative of different mutational processes dominating the early and late evolutionary phases of colorectal cancer progression.
The overall higher allele frequencies of mutations in the metastatic tumors compared with those in primary tumors were indicative of an evolutionary bottleneck. However, the allele frequencies of the five spatial mutation categories revealed that primary- or metastasis-private mutations had the lowest allele frequencies, indicating that these lesion-specific mutations could be largely late events. Care should be taken with this interpretation because this comparison, like other analyses based on allele frequency, can be biased by the purity and ploidy of the selected tumor lesions. For example, low tumor purity with a high level of normal tissue contamination will lead to underestimation of the tumor allele frequencies of mutations. Although we selected tumor-rich regions (>70%) for regional biopsy in this study, additional sources of low tumor purity (e.g., the selection of tumor invasive fronts) may still exist.
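The effect of purity on observed allele frequency can be illustrated with the standard mixture expectation for a clonal mutation. This is a minimal sketch, not an analysis performed in this study; the function name and parameter defaults are assumptions, and subclonality is deliberately ignored.

```python
def expected_vaf(
    purity: float,
    mutant_copies: int = 1,
    tumor_cn: int = 2,
    normal_cn: int = 2,
) -> float:
    """Expected variant allele frequency of a clonal mutation.

    Mutant alleles come only from tumor cells (fraction `purity`), whereas
    the total allele pool mixes tumor and contaminating normal cells.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    mutant = purity * mutant_copies
    total = purity * tumor_cn + (1.0 - purity) * normal_cn
    return mutant / total


# Heterozygous clonal mutation in a diploid region:
print(round(expected_vaf(1.0), 3))  # pure tumor  -> 0.5
print(round(expected_vaf(0.7), 3))  # 70% purity  -> 0.35
```

Even at 70% purity, a fully clonal heterozygous mutation is expected at an allele frequency of about 0.35 rather than 0.5, so differences in purity between biopsies can mimic differences in clonality.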
The phylogenetic trees inferred from somatic mutations and DNA copy number changes were largely concordant and were informative about the evolutionary history. First, the metastatic clones cosegregated apart from those of primary lesions, indicative of their genetic similarity. Second, the mutation profiles of some primary lesions were genetically more similar to those of metastases than to those of the other primary lesions, suggesting that they might be a potential metastatic source. In spite of the overall similarities, some discrepancies were observed between the trees from somatic mutations and DNA copy number changes. Potential causes of the discrepancies include the “unclassified” mutations discussed above as well as technical issues in calling the CNAs. Among the unclassified mutations, those in CRC1 were all observed on chromosome 2 (i.e., four unclassified mutations that are present in all of the regional biopsies except for one metastatic biopsy, CRC1-M3). We speculate that those mutations were universal events ab initio but that subsequent chromosomal events such as loss-of-heterozygosity in CRC1-M3 led to the biopsy-specific loss of the mutation calls. Moreover, chromosome 2 of CRC1-M3 does not show apparent copy number changes, suggesting that this chromosomal event is likely to be copy number-neutral loss-of-heterozygosity. Supporting our speculation, all three universal mutation events on this chromosome showed significantly higher mutant allele frequencies (P < 0.05; paired t test) in CRC1-M3 than in the other regional biopsies.
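The reasoning about copy number-neutral loss-of-heterozygosity can be made explicit with a small calculation. This is an idealized sketch assuming a clonal event in a diploid region; the function name and the purity value are hypothetical and are not estimates for CRC1-M3.

```python
from typing import Tuple


def cn_neutral_loh_vafs(purity: float) -> Tuple[float, float, float]:
    """Expected allele frequencies around copy number-neutral LOH.

    Returns (before, on_retained_allele, on_lost_allele) for a clonal
    heterozygous mutation. Total copy number stays at 2 in both tumor and
    normal cells, so only the number of mutant copies changes.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    total_copies = 2.0
    before = purity * 1 / total_copies    # one mutant copy of two
    retained = purity * 2 / total_copies  # retained allele is duplicated
    lost = purity * 0 / total_copies      # mutant allele is removed
    return before, retained, lost


before, retained, lost = cn_neutral_loh_vafs(0.8)
print(round(before, 3), round(retained, 3), round(lost, 3))  # 0.4 0.8 0.0
```

In this idealized setting, mutations on the retained homolog double in allele frequency while those on the lost homolog disappear, which matches the pattern of higher-frequency universal calls alongside biopsy-specific dropouts on the same chromosome.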
Although a prevailing view has supported the early occurrence of chromothripsis, especially for events that are cancer drivers (42), we found a colorectal cancer case that harbored a metastasis-private chromothripsis event, supporting another report that chromothripsis does not necessarily represent an initiating event (43). Our results suggest that chromothripsis may occur at any developmental stage in colorectal cancer genomes and that its incidence may be underestimated when assessed in primary colorectal cancer cases alone. The additional complexity of chromothripsis in terms of the CNAs of the affected chromosomal segments further suggests that this chromosomal event may be accompanied by subsequent chromosomal changes.
Systematic sequencing analyses of primary and metastatic cancer genomes have shown that a small number of mutations are required for an invasive colorectal cancer to metastasize (36). Moreover, even the mutations newly acquired during metastatic progression do not overlap among individual patient samples (17). Consistent with these findings, metastasis-clonal mutations accounted for only 0.7% to 15.6% of the total mutations in the cases of our study. These data suggest that the mutations essential for cancer progression may be acquired in primary cancer genomes before metastasis. In line with this hypothesis, colon adenoma genomes were reported to be nearly as old as those of invasive cancers (44), suggesting that the time required for malignant transformation may be relatively short in the entire life history of colorectal cancer. The driving force for colorectal cancer metastases may lie beyond the scope of our study, for example, in mutations occurring in noncoding sequences (45) or in epigenetic alterations (46).
Finally, we validated some of the metastasis-clonal mutations, including a NEDD4 nonsense mutation and a PCDH9 deletion (both as singletons), by resequencing or qPCR in our test sets along with immunostaining in an independent cohort. The metastasis-clonal nonsense mutation of NEDD4L is interesting because the encoded protein NEDD4L interacts with Smad proteins (47) and can regulate TGFβ signaling, which has been implicated in cancer invasion and metastasis (48). However, the potential roles of NEDD4 and PCDH9 as metastatic determinants require further investigation. Although we did not find novel driver genes of colorectal cancer metastasis in the metastasis-clonal category, integration of these isolated findings will be needed to derive biologically relevant insights that improve the diagnosis and treatment of colorectal cancer patients with metastasis.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: S.-H. Lee, Y.-J. Chung
Development of methodology: Y.-J. Chung
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S.-H. Jung, S.H. Lee, M.S. Kim, S.-W. Park, Y.-J. Chung
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): T.-M. Kim, S.-H. Jung, S.H. Lee, I.-P. Baek, J.-K. Rhee, Y.-J. Chung
Writing, review, and/or revision of the manuscript: T.-M. Kim, S.-H. Jung, J.-K. Rhee, S.-H. Lee, Y.-J. Chung
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S.-W. Park
Study supervision: C.H. An, S.-H. Lee, Y.-J. Chung
Grant Support
This study was supported by a grant from the National Research Foundation of Korea (2012 R1A5A2047939).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.