Determining the evolutionary history of metastases is a key problem in cancer biology. Several recent studies have presented inferences regarding the origin of metastases based on phylogenies of cancer lineages. Many of these studies have concluded that the observed monophyly of metastatic subclones favored metastasis-to-metastasis spread (“a metastatic cascade” rather than parallel metastases from the primary tumor). In this article, we argue that identifying a monophyletic clade of metastatic subclones does not provide sufficient evidence to unequivocally establish a history of metastatic cascades. In the absence of a complete phylogeny of the subclones within the primary tumor, a scenario of parallel metastatic events from the primary tumor is an equally plausible interpretation. Future phylogenetic studies on the origin of metastases should obtain a complete phylogeny of subclones within the primary tumor. This complete phylogeny may be obtainable by ultra-deep sequencing and phasing of large sections or by targeted sequencing of many small, spatially heterogeneous sections, followed by phylogenetic reconstruction using well-established molecular evolutionary models. In addition to resolving the evolutionary history of metastases, a complete phylogeny of subclones within the primary tumor facilitates the identification of driver mutations by application of phylogeny-based tests of natural selection. Cancer Res; 75(19); 4021–5. ©2015 AACR.

Cancer progression, from the formation of the first neoplastic cell to metastasis, occurs through the origin and proliferation of subclonal lineages characterized by unique somatic mutations. Provided sufficient numbers of mutations occur during the branching process of cancer lineage evolution (1–3), this process generates a hierarchical genealogy that can be analyzed using the phylogenetic tools of evolutionary biology. Numerous studies have leveraged the reconstructed phylogenies of tumor subclones to infer the genetic history of cancer progression, including the origin of metastases. For example, metastasis-to-metastasis spread (a “metastatic cascade”), as opposed to parallel metastatic events, has been inferred from the tree-like topology of tumor subclone phylogenies (4–7). Central to this conclusion was the assumption that parallel metastatic events from a single primary tumor would necessarily result in a star topology with a polyphyletic group of metastases, while a metastatic cascade would result in a tree topology with a monophyletic clade of metastases. This assumption, and others like it, requires careful scrutiny to ensure that the inferred phylogenies truly support proposed models of cancer progression and evolution. In this commentary, we investigate the roles such assumptions play in these studies. For instance, star phylogenies are not the only possible outcomes of parallel metastases. We argue that the chronology of metastatic events cannot be established without complete information on the phylogeny of subclonal lineages within the primary tumor. Below, we show why this is the case and suggest approaches that are more likely to provide insights into the chronology of metastatic events.

To illustrate how parallel metastatic events can generate a monophyletic subclade of genotypes from different metastatic tumors, consider the tumor progression histories and phylogenies shown in Figs. 1A and B. In both examples, subclones A and E represent the most common and the only recognized subclones in the primary tumor, while subclones B, C, and D represent metastases. Figure 1A illustrates a single metastatic event from the primary tumor followed by a metastatic cascade, whereas Fig. 1B shows three parallel metastases from the primary tumor. Despite two fundamentally different histories of metastatic events, both scenarios generate the same phylogenetic tree, with a monophyletic clade of metastases. Figure 1C shows a third scenario where parallel metastatic events can be unambiguously inferred from the polyphyly of metastatic genotypes.

Figure 1.

Demonstration of the necessity of complete sampling of primary tumor subclones when inferring the mode of metastasis spread from phylogenetic data. Assuming that lineages F and G in the primary tumor are not sampled, monophyly of metastases in a phylogenetic tree can arise from a single metastatic event from the primary tumor followed by a metastatic cascade (A) or from parallel metastases of subclones that are descended from unsampled portions of the primary tumor (B). C, in contrast, polyphyly of metastases unambiguously indicates a history of parallel metastatic events, even without complete sampling of the primary tumor.

In the absence of a complete phylogeny of all cell lineages within the primary tumor, the most that can be inferred from a monophyletic clade of metastases is that all metastases descended from a single subclone within the primary tumor, rather than descending from different sampled subclones within the primary tumor (Fig. 1C). The monophyly of metastatic genotypes in itself provides no means of distinguishing between a single metastatic event and parallel metastatic events.
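
The equivalence of the two histories can be checked with a small computational sketch. The Python code below is illustrative only: the two nested-tuple topologies are hypothetical stand-ins for a cascade history and a parallel history (they are not transcribed from Fig. 1), and the helper names `prune`, `canonical`, and `is_monophyletic` are introduced here for the example. It shows that removing the unsampled primary lineages F and G from either history leaves the same tree, in which metastases B, C, and D form a clade.

```python
def leaves(tree):
    """Return the set of leaf names of a rooted tree written as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def prune(tree, keep):
    """Restrict the tree to the leaves in `keep`, collapsing single-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)

def canonical(tree):
    """Order-independent form, so that two topologies can be compared."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)

def clades(tree):
    """Yield the leaf set of every subtree, including single leaves and the root."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)

def is_monophyletic(tree, group):
    """True if some subtree contains exactly the members of `group`."""
    return frozenset(group) in set(clades(tree))

# Hypothetical complete histories (assumed topologies, for illustration only).
# A, E, F, G are primary tumor subclones; B, C, D are metastases.
cascade_full = ((("A", "F"), ("E", "G")), ("B", ("C", "D")))
parallel_full = (("A", "E"), (("F", "B"), (("G", "C"), "D")))

sampled = {"A", "B", "C", "D", "E"}   # F and G are not sampled
metastases = {"B", "C", "D"}

cascade_seen = prune(cascade_full, sampled)
parallel_seen = prune(parallel_full, sampled)

same_topology = canonical(cascade_seen) == canonical(parallel_seen)
mono_in_sampled_parallel = is_monophyletic(parallel_seen, metastases)
mono_in_full_parallel = is_monophyletic(parallel_full, metastases)
```

Under these assumed topologies, `same_topology` and `mono_in_sampled_parallel` are both true, whereas `mono_in_full_parallel` is false: the metastases appear monophyletic only because F and G are missing from the sampled tree.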

The relative plausibility of each of these three scenarios (Fig. 1A, B, or C) depends on the biology of metastasis. If most cells in the primary tumor have a high potential for generating metastases, we should expect parallel metastatic events from the primary tumor that will result in unambiguous polyphyly of metastatic genotypes (i.e., some metastases will be more closely related to primary tumor subclones than to other metastases). In contrast, if we assume a linear model of cancer progression in which there are a limited number of mutations that permit cells to metastasize, then we should expect all metastatic cells to arise from a single subclonal lineage. Nevertheless, it is crucial to note that the set of shared mutations present among all members of a metastatic subclade is not equivalent to a metastatic phenotype; mutations instigating metastasis might have occurred later in evolution and might be specific to subsets of lineages. This distinction between shared mutations that define a lineage and the unique phenotypes of its extant descendants is analogous to a problem in the molecular phylogenies of higher organisms: it cannot be assumed that the phenotypic traits defining extant “crown taxa” were present at the times when the stem lineages diverged from their common ancestor (8, 9). For example, it is unlikely that defining morphologic traits, such as upright posture/locomotion, appeared in the most recent common ancestor of humans and australopithecines at the moment this monophyletic lineage split from the shared common ancestor with chimpanzees. Similarly, we cannot assume that a monophyletic lineage of metastatic subclones necessarily had a metastatic cell as a most recent common ancestor.

Consequently, only a phylogeny replete with a complete representation of subclones within the primary tumor will make it possible to distinguish parallel metastases of primary tumor origin from a metastatic cascade. Unless we can rule out the possibility of clonal diversification within the primary tumor followed by parallel metastases, we cannot claim clonal diversification within metastases. The challenge is made even worse if we admit a possibility of subclones within the primary tumor going extinct as a consequence of natural immune surveillance or of early therapeutic intervention, in which case no purely topologic inference will differentiate between a scenario of parallel metastases and a scenario of a metastatic cascade.

Examples from the recent literature

Two recent publications have wrestled with the problem of inferring the presence or absence of metastatic cascades without gathering complete representations of subclonal heterogeneity within the primary tumor. Schwarz and colleagues (5) inferred primarily metastasis-to-metastasis spread (consistent with Fig. 1A) based on the monophyly of metastatic genotypes in 8 out of 9 subjects with high-grade serous ovarian cancer. However, their analyzed samples include only one major clone from the primary tumor of each subject, and thus do not differentiate between the metastatic cascade scenario and the alternative hypothesis of parallel metastatic events (as in Fig. 1B), because of the absence of necessary information on spatially distinct subclonal heterogeneity within the primary tumor.

In another recent example, Hong and colleagues (6) infer, from the monophyly of metastatic genotypes in a subject with prostate cancer, an evolutionary history that includes a complex metastatic cascade, with cross-seedings across metastatic sites and re-seedings into the surgical bed (site of primary tumor resection). In the phylogeny of the first subject in Hong and colleagues (Fig. 2A), four subclones (A, B, C, and D) were identified in the primary tumor through extensive spatial sampling from seven different locations. Subclone E was unique to the single clinically identifiable metastasis and was clearly derived from a single metastatic event from the primary. In contrast, the primary tumor was sequenced in only two locations in the second subject, yielding three subclones (Fig. 2B). It is perhaps no coincidence that a complex metastatic cascade is inferred in this subject, based on the monophyly of metastatic genotypes with primary tumor subclones as an outgroup. Because the sampling of the primary tumor in the second subject was limited in comparison with the sampling of the primary tumor in the first subject, there is no way to formally exclude the possibility of parallel metastatic events from unsampled subclones in the primary tumor (e.g., Fig. 1B), as opposed to a metastatic cascade (e.g., Fig. 1A). In the third subject (Fig. 2C), a single primary tumor subclone was inferred from a single sample. In this case, because the metastatic subclones are polyphyletic (i.e., metastatic subclone B shares a more recent common ancestor with primary tumor subclone A than with the other metastatic subclones), the inference of parallel metastatic events is unambiguous (e.g., Fig. 1C).

Figure 2.

Phylogenies of the first three subjects from Hong and colleagues (A, subject 299; B, subject 498; C, subject 177; ref. 6). These phylogenetic trees are not the result of a new analysis. They are a topologically equivalent graphical representation that we prefer to the one presented in Hong and colleagues (6). In all panels, observed subclones are represented as terminal taxa (leaves) rather than as direct ancestors (internal nodes).

A number of recent studies have more or less explicitly acknowledged the need for a complete characterization of subclonal heterogeneity within the primary tumor in order to unambiguously estimate the location of the genetic divergence of metastases. In these studies, the most frequent approach has been to perform deep next-generation sequencing on as large a portion of the tumor as possible, followed by the execution of phasing algorithms for genotyping (10–12). This thorough approach was applied by Gundem and colleagues (7), who marshaled evidence for the presence of metastatic cascades in subjects with lethal prostate cancer, and by Yates and colleagues (13), who investigated subclonal diversification within primary breast cancer tissue. Gundem and colleagues (7) point out that “we cannot formally exclude an alternative explanation for the observed patterns, that each of these metastases has seeded from an undetected subclone in the primary tumor. However, targeted resequencing of a subset of mutations failed to detect any such subclones, despite a median sequencing depth of 471×.” Care must be taken, however, to differentiate between depth of sequencing and extent of spatial sampling. Sequencing depth is irrelevant to the inference of genetic variation within the primary tumor unless it is accompanied by broad spatial sampling of the tumor, because genetic diversity in tumors is spatially partitioned and increasing sequence depth for a small section of the tumor will not necessarily capture any additional subclones. For the question of discerning parallel metastases from metastatic cascades, sequencing a sample of limited spatial extent at great depth is the proverbial equivalent of rereading the same paragraph of a newspaper article to validate its truth.
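
The distinction between depth and spatial extent can be illustrated with a deliberately simple toy model. In the Python sketch below, the tumor layout, the subclone fractions, and the detection rule (a subclone is called when its expected number of supporting reads, depth × fraction, reaches a hypothetical threshold `min_reads`) are all assumptions made for illustration, not values taken from any of the cited studies.

```python
# Hypothetical tumor: each spatial region contains only a few subclones.
# The fractions are made-up cellular fractions within each region.
TUMOR = {
    "region_1": {"A": 0.9, "E": 0.1},
    "region_2": {"A": 0.6, "F": 0.4},
    "region_3": {"E": 0.7, "G": 0.3},
    "region_4": {"F": 0.5, "G": 0.5},
}

def detected_subclones(regions, depth, min_reads=5):
    """Subclones whose expected supporting reads reach `min_reads`
    in at least one of the sampled regions (toy detection rule)."""
    found = set()
    for region in regions:
        for subclone, fraction in TUMOR[region].items():
            if depth * fraction >= min_reads:
                found.add(subclone)
    return found

# One region, increasing depth: detection saturates at the local subclones.
shallow_single = detected_subclones(["region_1"], depth=30)
deep_single = detected_subclones(["region_1"], depth=500)
ultra_deep_single = detected_subclones(["region_1"], depth=50_000)

# Four regions at modest depth: all subclones in the toy tumor are found.
shallow_broad = detected_subclones(list(TUMOR), depth=30)
```

In this toy setting, raising the depth in `region_1` from 500 to 50,000 adds nothing, because subclones F and G are simply absent from that section, while sampling all four regions at a depth of 30 recovers A, E, F, and G.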

Two approaches can be applied to address these issues and better resolve the question of parallel metastases versus metastatic cascades. First, one can sample a large number of small, spatially separated sections of the tumor, under the assumption that genetic diversity is spatially partitioned and that each small portion is relatively homogeneous. If the assumptions are valid, this approach eliminates the need for especially deep sequencing and avoids the challenging task of haplotype inference from multiple genotypes. Alternatively, one can perform highly parallelized single-cell sequencing from entire tumors or from spatially heterogeneous samples of a single tumor to provide the same information. Even with single-cell sequencing, it remains a requirement that single cells from sufficiently many tumor sections are sampled to capture the full diversity of the primary tumor. The number of cells needed to be sequenced would depend—in a way that is not yet worked out—on the extent of subclonal heterogeneity and on the quality of each single-cell exome or genome sequence.

To be clear, next-generation sequencing of tumor sections (14–16) and single-cell sequencing (17–21) have successfully reconstructed plausibly complete subclone phylogenies within the primary tumor. Importantly, results of these studies only provided evidence for the polyphyly of metastatic genotypes. There were many instances in these studies where metastases shared unique somatic mutations with subclones in the primary tumor that were not shared with other metastases. Moreover, the metastases and their sister lineages in the primary tumor were typically descendants of single, comparatively recent subclones in the tumor, as depicted in Fig. 1B. In the absence of what appears to have been a complete phylogeny within the primary tumor, each of these studies could easily have erroneously identified metastatic tumors as monophyletic and arising as a consequence of a metastatic cascade.

Currently, whether one interprets the monophyly of metastatic subclones—when observed—as the result of a metastatic cascade or as the result of parallel metastases depends to a large degree on one's assumptions about whether the capacity for metastasis is innate in most primary tumor cells or a derived trait found in a uniquely derived subpopulation of cells. In a detailed, spatially explicit, and spatially thorough sequencing study of metastatic pancreatic cancer (15), it was argued that the capacity for metastasis is acquired later in the course of tumor progression because none of the metastatic tumor genotypes were derived from the ancestral tumor genotype. However, Yachida and colleagues (16) found no evidence for metastatic cascades: the two distinct liver metastases and the lung metastases are derived from different (albeit recent) subclones, while the subclones associated with local invasion (metastasis to the peritoneum) occurred comparatively early in the subclone genealogy. The latter result is consistent with studies that indicate that the capacity for invasive migration is innate to many cancer types (22–24). It implies that early seeding of metastatic lineages is likely in many cases, and argues that with sufficient sampling of subclones, we should generally expect to find phylogenies resembling Fig. 1C rather than Fig. 1A.

Completely characterizing subclonal variation within tumors and correctly inferring their phylogeny provides a framework for addressing many key aspects of cancer evolutionary genomics, such as distinguishing somatic natural selection from neutral evolution. The implications of the subclonal phylogeny are wide-ranging. For instance, in Schwarz and colleagues (4), a “clonal expansion (C.E.) index” that compares the distribution of subclones within and among tumors to a uniform distribution is proposed as a model for distinguishing selection from the neutral effects of mutation and genetic drift. However, the uniform null for tumor genotypes is not justified by neutral models of genome evolution in tumor cells. A uniform distribution is not the null expectation for any population evolving along a phylogenetic tree, nor is it the null expectation for the distributions of genotypes of organisms in a population, all for the same reasons: genealogic relatedness, finite time, and spatial structure. This realization is the basis for comparative method (25) statistics in evolutionary biology. Attempts to validate causal relations between traits that map to subclonal genotypes must make use of similar approaches applied to tumor subclones. Specifically, reconstructing the genealogy of tumor subclones provides a foundation for deriving null distributions of subclone frequencies, potentially by use of a coalescent model of neutral evolution. Under a neutral null model, it is assumed that all changes in the frequency distribution of tumor subclones result from mutations and stochastic genetic drift, in contrast to any deterministic process of natural selection favoring some subclones over others. A neutral distribution would be the correct null on which to base tests of natural selection on driver genes, as is used in tests based on the distribution of pairwise genetic distances among genotypes that are now standard in population genetics research (26, 27). 
Finally, a complete phylogeny would also facilitate tests of natural selection—such as comparisons of silent versus nonsynonymous nucleotide substitutions (e.g., ref. 28)—that are performed on the basis of genealogic information rather than under an arbitrary assumption of null uniformity.
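
As one concrete instance of a test based on pairwise genetic distances, the sketch below computes Tajima's D (26) from a set of aligned binary genotypes. It is a minimal illustration using the standard formulas from the population genetics literature; the function name `tajimas_d` and the 0/1 encoding of subclone genotypes are assumptions for this example, and applying the statistic to tumor subclones presupposes that the sampled genotypes can be treated as a sample from a single population.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(genotypes):
    """Tajima's D for equal-length sequences of 0/1 alleles.

    Returns None when D is undefined (fewer than 4 sequences or
    no segregating sites).
    """
    n = len(genotypes)
    if n < 4:
        return None
    # S: number of sites at which the sample is polymorphic.
    s = sum(1 for site in zip(*genotypes) if len(set(site)) > 1)
    if s == 0:
        return None
    # pi: mean number of differences between pairs of sequences.
    pairs = list(combinations(genotypes, 2))
    pi = sum(sum(a != b for a, b in zip(g1, g2)) for g1, g2 in pairs) / len(pairs)
    # Constants that depend only on the sample size n.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between two estimators of diversity, scaled by its
    # estimated standard deviation under neutrality.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

# Made-up genotypes: every mutation private to one sequence (excess of rare variants).
singletons = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
# Made-up genotypes: every mutation at intermediate frequency.
balanced = [(1, 1), (1, 0), (0, 1), (0, 0)]

d_singletons = tajimas_d(singletons)
d_balanced = tajimas_d(balanced)
```

With these made-up inputs, `d_singletons` is negative and `d_balanced` is positive, the two directions of departure from the neutral expectation that the statistic is designed to detect.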

We have illustrated the challenges inherent to inferring the mode of metastasis spread from cancer phylogenies in the absence of a complete characterization of subclone diversity within the primary tumor. Specifically, we have demonstrated how parallel metastatic events can result in the monophyly of metastases. Without a complete accounting of subclonal heterogeneity within the primary tumor, monophyly of metastases therefore does not necessarily imply metastasis-to-metastasis spread. A complete phylogeny of subclones within the primary tumor requires extensive sampling of the subclone diversity; it also requires the application of algorithms that can reliably infer the evolutionary history of subclone-defining mutations. We remark that many of the studies of tumor metastases referenced above use either distance-based clustering methods or heuristic reconstructions of genealogy done by inspection, and we argue that these approaches need to be refined by the application of well-established character-based phylogeny reconstruction methods based on molecular evolutionary models.

Applying phylogenetic analysis to the study of cancer with an understanding of its power and limitations will be a great boon to cancer analytic methodology. Inferring the mode of metastasis spread is just one area in which longstanding research in evolutionary biology can be applied to the evolution of cancer; for example, the timing of key mutations underlying tumorigenesis can also be inferred using ancestral state inference and mutation mapping. It has long been recognized that genetic heterogeneity within and among tumors in a patient affects clinical outcomes such as response to therapy. In such cases, regardless of the originating basis of that heterogeneity, the degree of genealogic affinity among tumor lineages should be predictive of the degree of response at dispersed sites. A reliable inference of the evolutionary history of genetic variants will inform clinical decision-making as well as provide insights into the basic biology of tumors and metastases. Ultimately, phylogenetic information has significant potential to be useful in the clinic, though the relation of cancer phylogenies to clinical trajectories has yet to be demonstrated. Phylogenies will likely exhibit predictive value for a subject's prognosis and the cancer's responses to therapy: they are the natural conceptual framework for conveying an understanding of tumor heterogeneity, which is the putative source of emergent tumor resistance to current pharmaceutical therapeutics. As sampling multiple sites becomes increasingly feasible with decreasing sequencing costs, and as clinicians become more aware of the relevance and interpretation of phylogenetic analysis, we expect to see studies that map clinical trajectories to phylogenies, infer ancestral states and the timing of mutations, and even describe the full temporal history of cancer within subjects.
Such studies will provide a complex but manageable scheme of how cancer evolves and will lead to testable hypotheses regarding the optimal means of providing personalized care informed by phylogenetic and evolutionary analysis.

J.P. Townsend reports receiving a commercial research grant from Gilead Sciences. No potential conflicts of interest were disclosed by the other authors.

Conception and design: M. Shpak, J.P. Townsend

Development of methodology: M. Shpak

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M. Shpak, J.P. Townsend

Writing, review, and/or revision of the manuscript: W.S. Hong, M. Shpak, J.P. Townsend

Study supervision: J.P. Townsend

W.S. Hong was supported by Yale School of Medicine Medical Student Research Fellowship. M. Shpak was supported by the St. David's Foundation impact fund. J.P. Townsend was supported by the Notsew Orm Sands Foundation and by NIH NCI 1U01CA176067-01A1.

1. Shpak M, Churchill GA. The information content of a character under a Markov model of evolution. Mol Phylogenet Evol 2000;17:231–43.
2. Townsend JP. Profiling phylogenetic informativeness. Syst Biol 2007;56:222–31.
3. Townsend JP, Su Z, Tekle YI. Phylogenetic signal and noise: predicting the power of a data set to resolve phylogeny. Syst Biol 2012;61:835–49.
4. Schwarz RF, Trinh A, Sipos B, Brenton JD, Goldman N, Markowetz F. Phylogenetic quantification of intra-tumour heterogeneity. PLOS Comput Biol 2014;10:e1003535.
5. Schwarz RF, Ng CKY, Cooke SL, Newman S, Temple J, Piskorz AM, et al. Spatial and temporal heterogeneity in high-grade serous ovarian cancer: a phylogenetic analysis. PLOS Med 2015;12:e1001789.
6. Hong MKH, Macintyre G, Wedge DC, Van Loo P, Patel K, Lunke S, et al. Tracking the origins and drivers of subclonal metastatic expansion in prostate cancer. Nat Commun 2015;6:6605.
7. Gundem G, Van Loo P, Kremeyer B, Alexandrov LB, Tubio JMC, Papaemmanuil E, et al. The evolutionary history of lethal metastatic prostate cancer. Nature 2015;520:353–7.
8. Budd G. Does evolution in body patterning genes drive morphological change—or vice versa? Bioessays 1999;21:326–32.
9. Valentine JW. On the origin of phyla. Chicago, IL: University of Chicago Press; 2004.
10. Fischer A, Vazquez-Garcia I, Illingworth CJR, Mustonen V. High-definition reconstruction of clonal composition in cancer. Cell Rep 2014;7:1740–52.
11. Martinez F, Lafforgue G, Morelli MJ, Gonzalez-Candelas F, Chua N-H, Daros J-A, Elena SF. Ultradeep sequencing analysis of population dynamics of virus escape mutants in RNAi-mediated resistant plants. Mol Biol Evol 2012;29:3297–307.
12. Zagordi O, Bhattacharya A, Eriksson N, Beerenwinkel N. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequence data. BMC Bioinformatics 2011;12:119.
13. Yates LR, Gerstung M, Knappskog S, Desmedt C, Gundem G, Van Loo P, et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat Med 2015;21:751–9.
14. Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, et al. Intratumor heterogeneity and branched evolution revealed by multi-region sequencing. N Engl J Med 2012;366:883–92.
15. Tao Y, Ruan J, Yeh S-H, Lu X, Wang Y, Zhai W, et al. Rapid growth of hepatocellular carcinoma and the driving mutations revealed by cell-population genetic analysis of whole-genome data. Proc Natl Acad Sci U S A 2011;108:12042–7.
16. Yachida S, Jones S, Bozic I, Antal T, Leary R, Fu B, et al. Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 2010;467:1114–7.
17. Navin N, Kendall J, Troge J, Andrews P, Rodgers L, McIndoo J, et al. Tumour evolution inferred by single-cell sequencing. Nature 2011;472:90–4.
18. Xu X, Hou Y, Yin X, Bao L, Tang A, Song L, et al. Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell 2012;148:886–95.
19. Hou Y, Song L, Zhu P, Zhang B, Tao Y, Xu X, et al. Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. Cell 2012;148:873–85.
20. Li Y, Xu X, Song L, Hou Y, Li Z, Tsang S, et al. Single-cell sequencing analysis characterizes common and cell-lineage-specific mutations in a muscle-invasive bladder cancer. GigaScience 2012;1:1–14.
21. Navin N, Hicks J. Future medical applications of single-cell sequencing in cancer. Genome Med 2011;3:31.
22. Nguyen DX, Bos PD, Massague J. Metastasis: from dissemination to organ-specific colonization. Nat Rev Cancer 2009;9:274–84.
23. Giese A, Bjerkvig R, Berens ME, Westphal M. Cost of migration: invasion of malignant gliomas and implications for treatment. J Clin Oncol 2003;21:1625–36.
24. Bernards R, Weinberg RA. A progression puzzle. Nature 2002;418:823.
25. Felsenstein J. Phylogenies and the comparative method. Am Nat 1985;125:1–15.
26. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 1989;123:585–95.
27. Fay JC, Wu CI. The neutral theory in the genomic era. Curr Opin Genet Dev 2001;11:642–6.
28. Yang Z, dos Reis M. Statistical properties of the branch-site test of positive selection. Mol Biol Evol 2011;28:1217–28.