Abstract
In a recent article, Wu and colleagues (Nature 2016;529:43–47) review previous studies and present new estimates for the contribution of extrinsic factors to cancer development. The new estimates are generally close to 100%, even for bone and brain cancers that have no known associations with lifestyle and are typically not considered to be preventable. We find that the results of Wu and colleagues are incompatible with previous estimates derived from epidemiological and genetic data. We further argue that their methods are fundamentally flawed because they overlook important effects of tissue type on cancer risk. We therefore conclude that their results give a misleading view of cancer etiology and preventability. Cancer Prev Res; 9(10); 773–6. ©2016 AACR.
Quantifying how much cancer risk may be avoided through changes in lifestyle or environment is important for guiding prevention efforts. In a recent article, Wu and colleagues (1) review previous studies and present new estimates for the contribution of extrinsic factors to cancer development. The authors define extrinsic factors as “environmental factors that affect mutagenesis rates (such as ultraviolet (UV) radiation, ionizing radiation and carcinogens)” that are “highly modifiable and thus preventable.” They derive their new estimates by reanalyzing a previously published data set using two different methods: an “intrinsic risk line” (IRL) and a multistage model (MM). The estimates of extrinsic risk are generally close to 100%, even for bone and brain cancers that have no known associations with lifestyle and are typically not considered to be preventable. We question the reliability of these results for the following reasons: (i) the two methods produce contradictory estimates of extrinsic contributions for individual cancer types, which are also inconsistent with evidence from epidemiological studies and mutational signatures; (ii) the IRL method relies on the biologically implausible assumption that the intrinsic risk per stem cell division is a constant; and (iii) results from the MM method are undermined if one considers plausible higher mutation rates. These considerations cast doubt on whether the new estimates of Wu and colleagues (1) accurately characterize cancer etiology or preventability.
If the IRL and MM methods of Wu and colleagues (1) were reliable, then we would expect them to produce consistent estimates when applied to the same cancer types. We might also expect them to be consistent with estimates derived from epidemiological studies (Table 1) and from an analysis of mutational signatures (2) cited by Wu and colleagues (1). In fact, whereas the estimates derived from prior studies are correlated with each other, the generally much higher IRL and MM estimates are uncorrelated both with each other and with the two other data sets (Fig. 1). These inconsistencies have two potential explanations. First, the estimates derived from epidemiological studies and analysis of mutational signatures might be unreliable, and the true contribution of extrinsic risk might be close to 100% for almost all cancer types; however, given the significant correlation between these two independent data sets and our technical concerns about the other two methods (see below), this appears unlikely. Second, at least one, and potentially both, of the methods proposed by Wu and colleagues (1) may be unreliable. This latter possibility warrants a careful examination of the assumptions underlying the two novel methods.
Table 1. Categorization of cancer types by relative contributions of extrinsic factors to risk, according to epidemiological data
Cancer types with high (>50%) extrinsic risk | Reference |
---|---|
Basal cell carcinoma | 1 |
Colorectal adenocarcinoma | 1, 19 |
Colorectal adenocarcinoma with FAP | * |
Colorectal adenocarcinoma with Lynch syndrome | * |
Duodenum adenocarcinoma with FAP | * |
Esophageal squamous cell carcinoma | 1, 19 |
Head and neck squamous cell carcinoma | 1, 19 |
Head and neck squamous cell carcinoma with HPV-16 | * |
Hepatocellular carcinoma | 1 |
Hepatocellular carcinoma with HCV | * |
Lung adenocarcinoma (smokers) | * |
Melanoma | 1, 19 |
Small intestine adenocarcinoma | 1 |
Thyroid papillary/follicular carcinoma | 1 |
Cancer types with low (<50%) extrinsic risk | |
Acute myeloid leukemia | 19 |
Chronic lymphocytic leukemia | 19 |
Gallbladder non-papillary adenocarcinoma | 19 |
Glioblastoma | 21 |
Osteosarcoma | 20 |
Osteosarcoma of the arms | 20 |
Osteosarcoma of the head | 20 |
Osteosarcoma of the legs | 20 |
Osteosarcoma of the pelvis | 20 |
Ovarian germ cell | 19 |
Pancreatic ductal adenocarcinoma (acinar) | 19 |
Risks for cancer types labeled * apply to subpopulations and are more than double the risks among the general population. FAP, familial adenomatous polyposis.
Figure 1. Lack of consistency between different sets of estimates of extrinsic cancer risks. A, comparison between the results of the two new analyses by Wu et al. (1), and comparisons of these results with estimates derived from mutational signatures. B, comparisons with estimates derived from epidemiological data, categorized as “low” (less than 50%) or “high” (more than 50%). Each point corresponds to a cancer type. ICC is the intraclass correlation coefficient; r is the Pearson correlation coefficient. Epidemiological data are from Extended Data Table 2 in Wu et al. (1) and recent reviews (refs. 19–21; see Table 1); estimates from mutational signatures are from Extended Data Table 3 in Wu et al. (1), derived from Alexandrov et al. (2). Wu et al. (1) also present estimates derived from two-hit and four-hit models of carcinogenesis, which are likewise uncorrelated with the estimates from the IRL method, epidemiological data, and analysis of mutational signatures (data not shown). ICC values were calculated using the irr package for the R programming language (22). In the calculation of ICC values, the “low” and “high” categories were assigned values of 0.25 and 0.75, respectively.
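The two agreement statistics reported in Fig. 1 can be illustrated with a minimal Python sketch. It assumes a one-way, single-measure ICC; which ICC form was computed with the irr package is not stated here, so that choice is an assumption, and the functions `pearson_r` and `icc_oneway` are hypothetical stand-ins, not the irr implementation. The input values are invented for illustration and are not the data plotted in Fig. 1. Following the caption, the “low” and “high” epidemiological categories are coded as 0.25 and 0.75.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def icc_oneway(ratings):
    """One-way, single-measure intraclass correlation coefficient.

    `ratings` is an n x k array: rows are subjects (here, cancer types) and
    columns are raters (here, estimation methods). This ICC form is an
    assumption made for illustration.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    row_means = x.mean(axis=1)
    grand_mean = x.mean()
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))
    return float((msb - msw) / (msb + (k - 1) * msw))


# Coding of the epidemiological categories used for the ICC in Fig. 1B.
CATEGORY_CODE = {"low": 0.25, "high": 0.75}

# Hypothetical values for five cancer types (NOT the data in Fig. 1):
# extrinsic fractions from one method, and epidemiological categories.
method_estimates = [0.95, 0.90, 0.99, 0.97, 0.92]
epi_categories = ["high", "low", "high", "low", "high"]
epi_coded = [CATEGORY_CODE[c] for c in epi_categories]

ratings = np.column_stack([method_estimates, epi_coded])
print("ICC:", round(icc_oneway(ratings), 3))
print("r:", round(pearson_r(method_estimates, epi_coded), 3))
```

Because this ICC form counts any systematic offset between methods as disagreement, uniformly high extrinsic estimates can agree poorly with the 0.25/0.75 coding even where r is moderate; r alone is insensitive to such offsets.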
For their IRL analysis, Wu and colleagues (1) used simple linear regression to estimate how cancer risk scales with the lifetime number of stem cell divisions (lscd) across diverse tissue types, based on the data set of Tomasetti and Vogelstein (3). Underlying their method (and the prior analysis by Tomasetti and Vogelstein; ref. 3) is the assumption that the intrinsic risk of a given cancer type over a lifetime can be expressed as

$$\text{lifetime intrinsic risk} \approx \text{risk per stem cell division} \times \text{lscd} \tag{1}$$
This equation is not exact (because risk must eventually saturate at 100%), but it is a reasonable approximation when the risk per stem cell division is much less than 1 and the lifetime risk is less than 10% (which holds for almost all of the data set). It follows that if the risk per stem cell division is the same for a subset of cancer types, then the lifetime risk of each type will simply be proportional to the lscd of the affected tissue. However, in any multistage model of carcinogenesis (i.e., any model requiring more than one mutational hit), the risk per stem cell division is not constant, because it depends on the extent to which cancer-promoting mutations have already accumulated. Given that cancer is a rare event, a simple multistage model requiring M mutational hits predicts that (4)

$$\text{lifetime intrinsic risk} \approx C\,(uK)^{M} \tag{2}$$
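where C is the number of stem cells in the tissue, u is the probability of mutation at any one of the M sites per stem cell division, and K is the lifetime number of divisions per stem cell, so that lscd = CK. Equation 2 can therefore be rewritten as

$$\text{lifetime intrinsic risk} \approx \frac{(uK)^{M}}{K} \times \text{lscd} \tag{3}$$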
so that the risk per division in Equation 1 is replaced by (uK)^M/K, which is the long-term average risk per division. If M is constant across a subset of tissues and they undergo division at similar rates (similar K), then this average risk per division is also constant, and lifetime risk scales linearly with tissue size (C). For example, because leg bones contain approximately twice as many cells as arm bones, we would expect osteosarcoma to be about twice as common in legs as in arms. Equation 3 implies that for such a subset,

$$\log(\text{lifetime intrinsic risk}) \approx \log(\text{lscd}) + \log\!\left(\frac{(uK)^{M}}{K}\right) \tag{4}$$
Therefore, assuming the long-term average risk per division is constant, we would predict an approximately linear relationship between log(lifetime intrinsic risk) and log(lscd) with slope approximately equal to 1. Any deviations above this line would represent the contributions of extrinsic factors. However, if extrinsic factors may affect all of the observed risks, how can we know where to draw the line? One approach is to position the line so that it passes through the minimum observed risk, in which case deviations from the line correspond to minimum estimates of the contribution of extrinsic factors. What is essential is that, wherever the line is drawn, its slope must equal 1, so that it represents the assumed linear relationship between intrinsic cancer risk and lscd. The IRL method of Wu and colleagues (1) follows this procedure, except that it estimates the contributions of extrinsic factors to cancer risk by measuring deviations from a regression line with a slope much less than 1 (Fig. 2A). This departure has no theoretical basis, and the resulting estimates are therefore unreliable.
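The baseline procedure described above can be sketched in Python. On log-log axes a line of slope 1 corresponds to a constant risk per division, so we read “passes through the minimum observed risk” as taking the smallest observed ratio of risk to lscd as the assumed intrinsic risk per division; that reading is our interpretation. Each type's minimum extrinsic fraction is then 1 minus (baseline risk / observed risk). The function names and all data values are hypothetical.

```python
import numpy as np


def slope_one_baseline(lscd, risk):
    """Per-division risk of a slope-1 line (log-log axes) through the
    lowest-lying point, taken here as the minimum of risk / lscd."""
    lscd = np.asarray(lscd, dtype=float)
    risk = np.asarray(risk, dtype=float)
    return float(np.min(risk / lscd))


def min_extrinsic_fraction(lscd, risk):
    """Minimum extrinsic fraction per cancer type, assuming the intrinsic
    risk per stem cell division is the same for all types (Equation 1)."""
    lscd = np.asarray(lscd, dtype=float)
    risk = np.asarray(risk, dtype=float)
    intrinsic = slope_one_baseline(lscd, risk) * lscd  # baseline risk on the line
    return 1.0 - intrinsic / risk


# Hypothetical cancer types (NOT the Tomasetti and Vogelstein data).
lscd = np.array([1e7, 1e8, 1e9, 1e10, 1e11])
risk = np.array([3e-4, 1e-3, 2e-3, 1e-2, 3e-2])

# Ordinary least-squares slope on log-log axes, for comparison.
ols_slope, ols_intercept = np.polyfit(np.log10(lscd), np.log10(risk), 1)

print("OLS slope:", round(ols_slope, 3))
print("Minimum extrinsic fractions:", np.round(min_extrinsic_fraction(lscd, risk), 3))
```

With these made-up values the fitted slope is about 0.5. Measuring deviations from that shallower fitted line, instead of from the slope-1 baseline, is the step the text argues lacks a theoretical basis. The sketch also inherits Equation 1's assumption of a common risk per division across cancer types.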
Figure 2. Alternative models for explaining variation in cancer risk. For clarity, only part of the Tomasetti and Vogelstein (3) data set is shown. A, Wu et al. (1) assume that a single regression line can be used to quantify intrinsic cancer risk, and any additional risk must be due to extrinsic factors. B, if we allow that the intrinsic cancer risk per stem cell division varies between tissues, then it is invalid to draw only a single regression line through the data for diverse cancer types. Instead, separate regression lines must be drawn for each cancer type or for sets of cancer types with similar features (such as anatomical site, ref. 6). The vertical displacement of these regression lines corresponds to the tissue effect, which may be due to a combination of intrinsic factors (such as the number of mutational “hits” necessary to initiate cancer) and extrinsic factors (such as diet or smoking).
A more fundamental problem is that the IRL method (similar to that of Tomasetti and Vogelstein, ref. 3) relies on the assumption that the long-term average risk per division is the same among all cancer types. This assumption is false, because the risk varies considerably depending on tissue organization (5), the number of mutational “hits” necessary to initiate cancer (M), the rate of stem cell division, and other intrinsic biological factors (4). As noted above, whenever the number of mutational hits (M) required to induce cancer is greater than one, cancer risk scales linearly with the number of stem cells but depends more strongly on the number of divisions per cell (to the power M; ref. 4). As a result, no simple regression method can discriminate extrinsic from intrinsic effects across diverse cancer types, and estimates derived in this way will tend to overestimate extrinsic contributions, because variation interpreted as extrinsic risk (Fig. 2A) can disappear when the data are subdivided into comparable groups (Fig. 2B), as pointed out in recent critiques of Tomasetti and Vogelstein's analysis (6–10). Unreliability of the IRL method would explain the apparently anomalous results of Wu and colleagues (1), such as the inconsistent estimates for bone cancer (for example, an estimated extrinsic contribution of approximately 0% to the risk of arm, head, and pelvis osteosarcoma but >77% for leg osteosarcoma).
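The point illustrated in Fig. 2 can be made concrete with a toy simulation. Here every risk is purely intrinsic and proportional to lscd (Equation 3) within each of two hypothetical tissue groups, but the groups differ in their long-term average risk per division, as they might if they required different numbers of hits. Reading deviations above a pooled regression line as “extrinsic” fractions is our assumed formalization of Fig. 2A; all values are invented.

```python
import numpy as np


def loglog_fit(lscd, risk):
    """OLS fit of log10(risk) on log10(lscd); returns (slope, intercept)."""
    slope, intercept = np.polyfit(np.log10(lscd), np.log10(risk), 1)
    return float(slope), float(intercept)


# Two hypothetical tissue groups. Within each group the risk per division is
# constant, so risk is purely intrinsic and proportional to lscd (Equation 3).
lscd_a = np.array([1e7, 1e8, 1e9])
lscd_b = np.array([1e10, 1e11, 1e12])
risk_a = 1e-11 * lscd_a  # higher risk per division
risk_b = 1e-14 * lscd_b  # lower risk per division

slope_a, _ = loglog_fit(lscd_a, risk_a)
slope_b, _ = loglog_fit(lscd_b, risk_b)

# A single regression line through the pooled data (as in Fig. 2A).
lscd_all = np.concatenate([lscd_a, lscd_b])
risk_all = np.concatenate([risk_a, risk_b])
slope_all, intercept_all = loglog_fit(lscd_all, risk_all)

# Treat deviations above the pooled line as "extrinsic" (an assumed reading),
# even though no extrinsic risk was built into these data.
fitted = intercept_all + slope_all * np.log10(lscd_all)
apparent_extrinsic = np.clip(1.0 - 10 ** (fitted - np.log10(risk_all)), 0.0, None)

print("Within-group slopes:", round(slope_a, 3), round(slope_b, 3))
print("Pooled slope:", round(slope_all, 3))
print("Apparent extrinsic fractions:", np.round(apparent_extrinsic, 2))
```

Within each group the fitted slope is 1, yet the pooled slope is well below 1 (about 0.23 with these values), and some points lie far enough above the pooled line to be read as mostly “extrinsic.” Fitting each group separately, as in Fig. 2B, removes that spurious signal.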
In their second analysis, Wu and colleagues (1) estimated extrinsic contributions to cancer risk using a mathematical model of multistage carcinogenesis and the same previously published data set (3). They concluded that the risks predicted by the multistage model are much lower than observed cancer rates, and therefore that extrinsic factors must make a very large contribution. This analysis rests on two assumptions: first, that all tissues require the same number of driver mutations to initiate cancer (i.e., M is constant across all tissues), and second, that the effective somatic mutation rate is between 10⁻¹⁰ and 10⁻⁶ per stem cell division. The first assumption is known to be false (e.g., comparing retinoblastoma, where M = 2, with cancers of larger tissues, where M is larger; ref. 11). At first sight, the second assumption appears plausible; however, higher mutation rates may be more realistic given that (i) rates of cancer-promoting mutations per driver gene are generally considered to lie at the high end of this range (e.g., refs. 12 and 13), and (ii) some driver mutations result in a burst of cell proliferation leading to clonal expansion (14). Although more complex models are required to examine the effect of clonal expansion in detail (for example, refs. 15–17), it can be approximated more simply as an increase in the “effective” mutation rate (4). With these concerns in mind, Nunney and Muir (7) found the same data set to be consistent with a similar multistage model, provided that M was allowed to vary among tissue types and was estimated from the data. As expected, their analysis indicated a higher effective driver mutation rate of about 8 × 10⁻⁶ per stem cell division. Of course, this higher mutation rate could be partly the result of extrinsic factors, but in their present form the data do not permit us to distinguish between intrinsic and extrinsic effects. As a result, we cannot assign percent causality between these two sources, or even exclude the possibility that the variation in risk is largely explained by intrinsic differences between tissues, such as the number of mutational hits required to initiate cancer (7).
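The sensitivity of the multistage model to the assumed mutation rate follows directly from Equation 2: predicted intrinsic risk scales as u^M, so raising u by a factor f multiplies the risk by f^M. The Python sketch below evaluates Equation 2 under stated assumptions; the values of C, K, and M are hypothetical, chosen only to keep risks small, and are not estimates for any tissue, whereas the rates 10⁻⁶ and 8 × 10⁻⁶ are those quoted above. The helper names are illustrative.

```python
import math


def multistage_risk(C, u, K, M):
    """Approximate lifetime intrinsic risk C * (u*K)**M (Equation 2).

    C: number of stem cells; u: mutation probability per site per division;
    K: lifetime divisions per stem cell; M: number of hits required.
    Only meaningful when u*K is small and the result is well below 1.
    """
    return C * (u * K) ** M


def rate_change_factor(u_old, u_new, M):
    """Factor by which Equation 2's risk changes when u changes, C and K fixed."""
    return (u_new / u_old) ** M


# Hypothetical tissue parameters, chosen only so that all risks stay small.
C, K = 1e5, 1e2

for M in (2, 3, 4):
    low = multistage_risk(C, 1e-6, K, M)
    high = multistage_risk(C, 8e-6, K, M)
    factor = rate_change_factor(1e-6, 8e-6, M)
    print(f"M={M}: risk {low:.1e} at u=1e-6, {high:.1e} at u=8e-6 ({factor:.0f}-fold)")

# Equation 3 form: (uK)**M / K per division, times lscd = C*K.
lscd = C * K
print("Equation 3 agrees:", math.isclose(multistage_risk(C, 1e-6, K, 3), (1e-6 * K) ** 3 / K * lscd))
```

With M = 3, the eightfold higher rate multiplies the predicted risk by 8³ = 512, so the size of any gap between predicted and observed risk depends heavily on the assumed effective mutation rate. Because C enters Equation 2 linearly, doubling C (as for leg versus arm bones) doubles the predicted risk.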
Wu and colleagues' (1) attribution of nearly all cancer risk to “highly modifiable and thus preventable extrinsic factors” demands scrutiny because of its implications for prevention strategies. We agree that epidemiological studies (18, 19) and analyses of mutational signatures (2) suggest that extrinsic factors contribute substantially to the risk of many cancer types. We also share concerns about the methodology underlying the recent, very different estimates of Tomasetti and Vogelstein (3). However, the methods of Wu and colleagues (1) similarly fail to account for the effects of intrinsic, tissue-dependent factors that could explain much of the observed variation in cancer risk, and the resulting estimates are therefore likewise questionable. Moreover, neither Wu and colleagues (1) nor Tomasetti and Vogelstein (3) consider the realistic possibility that, within a given cell or tumor, some mutational “hits” may be ascribed to the environment whereas others may be due to bad luck, and that extrinsic factors may further influence the selective value of mutations by changing the microenvironment (17). Taken together, these considerations suggest that estimating relative contributions, even for particular cancer types, will be a daunting challenge. We argue that, beyond improved knowledge of transformation pathways toward specific cancers, progress will require a greater understanding of how mutational processes undermine evolved cancer prevention and suppression in different cell types, and how they interact with tumor microenvironments and with our external environments and lifestyles.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: R. Noble, M.E. Hochberg, L. Nunney
Development of methodology: R. Noble, L. Nunney
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): R. Noble
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): R. Noble, O. Kaltz
Writing, review, and/or revision of the manuscript: R. Noble, O. Kaltz, L. Nunney, M.E. Hochberg
Study supervision: M.E. Hochberg
Grant Support
M.E. Hochberg acknowledges support from the Agence Nationale de la Recherche (EvoCan ANR-13-BSV7-0003-01), L'Institut thématique multi-organismes Cancer (‘Physique Cancer’; CanEvolve PC201306), and the Institut National du Cancer (2014-1-PL BIO-12-IGR-1).