Background:

Incidence of early-onset colorectal cancer (EOCRC; e.g., diagnosed before age 50) in the United States has increased substantially since the 1990s, but the underlying reasons remain unclear.

Methods:

We examined the ecologic associations between dietary factors and EOCRC incidence in adults aged 25–49 during 1977–2016 in the United States, using negative binomial regression models, accounting for age, period, and race. The models also incorporated an age-mean centering (AMC) approach to address potential confounding by age. We stratified the analysis by sex and computed the incidence rate ratio (IRR) for each study factor. Study factor data (for 18 variables) came from repeated national surveys; EOCRC incidence data came from the Surveillance, Epidemiology, and End Results Program.

Results:

Results suggest that confounding by age on the association with EOCRC likely existed for certain study factors (e.g., calcium intake), and that AMC can alleviate the confounding. EOCRC incidence was positively associated with smoking [IRR (95% confidence interval (CI)), 1.17 (1.10–1.24) for men; 1.15 (1.09–1.21) for women] and alcohol consumption [IRR (95% CI), 1.08 (1.04–1.12) for men; 1.08 (1.04–1.11) for women]. No strong associations were found for most other study factors (e.g., fiber and calcium).

Conclusions:

Alcohol consumption was positively associated with EOCRC and has increased among young adults since the 1980s, which may have contributed to the EOCRC incidence increases since the 1990s. The AMC approach may help alleviate age confounding in similar ecologic analyses.

Impact:

Increases in alcohol consumption may have contributed to the recent increases in colorectal cancer incidence among young adults.

See related commentary by Ni et al., p. 164

This article is featured in Highlights of This Issue, p. 155

Recent studies have shown increases in early-onset colorectal cancer (EOCRC, e.g., those diagnosed before age 50) incidence in the United States since roughly the 1990s (1–3). Studies have also projected a further increase in EOCRC incidence (e.g., >90% higher by 2030 compared with 2010; ref. 4), if this trend continues. Thus, accurate identification of modifiable risk factors of EOCRC is urgently needed to inform effective prevention in younger adults.

While there is a body of risk factor research on colorectal cancer primarily based on cases 50 years and older (5, 6), important research gaps on EOCRC exist. Exposures during early life and critical developmental periods are widely believed to be important in EOCRC development (7, 8), yet studies of such exposures are largely absent. In typical cohorts, exposure measurements start when participants are in their 40s, the age of cohort recruitment. Moreover, risk factors of EOCRC and older cases may differ. Compared with older colorectal cancer cases, EOCRC is associated with more aggressive pathology and late diagnosis (7, 8). As such, current risk-classification tools based on family history and inflammatory bowel disease could wrongly classify many EOCRC cases as average risk, resulting in late diagnosis (7). It is also challenging to study risk factors of EOCRC using traditional cohort and/or nested case–control designs. Because the absolute EOCRC risk is relatively low, prohibitively large sample sizes would be needed to provide sufficient statistical power. For example, assuming an incidence rate equal to that among U.S. women aged 25–49 during 2011–2016 (i.e., 12.9/100,000; ref. 9), a cohort of 0.78 million would be needed to observe 500 cases over five years.
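The cohort size quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming the rate of 12.9/100,000 is an annual rate that stays constant and that there is no loss to follow-up; the function name is illustrative only.

```python
def required_cohort_size(target_cases, rate_per_100k, years):
    """Cohort size needed to observe `target_cases`, assuming a constant
    annual incidence rate and no attrition (simplifying assumptions)."""
    expected_cases_per_person = rate_per_100k / 100_000 * years
    return target_cases / expected_cases_per_person


n = required_cohort_size(target_cases=500, rate_per_100k=12.9, years=5)
print(f"{n / 1e6:.2f} million")  # 0.78 million
```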

Given the above research gap and challenges, we conducted an ecologic analysis to examine the association of EOCRC incidence with a range of dietary factors, which are of major interest in EOCRC etiology and amenable to public health interventions. We focused on the U.S. population aged 25–49 [i.e., age groups shown to experience substantial EOCRC incidence increases (2, 3, 10, 11)] during 1977–2016. We also proposed a set of regression models to address two common challenges in similar ecologic analyses (i.e., time lag from exposure to disease and confounding by age). The proposed ecologic approach allows efficient and low-cost investigations of various exposures at different life stages and could be used to study other early-onset cancers with similar rapid increases in recent decades (10).

Study design

In a previous study, Pfeiffer and colleagues examined ecologic associations between concurrent exposures and breast cancer incidence in the United States across population groups defined by age, period, race, and sex (12). Here, to further account for the potential latent period from exposure to cancer diagnosis, we propose two strategies: regressing the outcome on (i) the exposure 10 years earlier (equivalent to lagging the outcome) or (ii) the cumulative exposure over the 10 years before the outcome.

Another challenge in our study is potential confounding by age, because both the outcome (here, cancer incidence) and the exposure can be associated with age, sometimes in opposite directions (see, e.g., fat intake in Fig. 1A). For such exposures, including age as a covariate in the model may not adequately handle the discordant associations of age with the outcome versus the exposure. Moreover, for exposures that, like colorectal cancer, are positively associated with age, residual age confounding is also possible. To address this, we propose an age-mean centering (AMC) approach. Briefly, we remove the association between an exposure and age by subtracting the age-specific mean exposure for each age, and use these age-removed exposure data in the models (see details below). In so doing, we decouple the age association with the exposure and allow the covariate age to account for its association with the outcome alone (Fig. 1). This approach is similar to a strategy in behavioral sciences that disaggregates between-person and within-person effects (13). Here, we tested five regression models combining the above strategies.

Study factors

Data source

We obtained study factor data from the National Health and Nutrition Examination Surveys (NHANES; ref. 14), the National Health Interview Surveys (NHIS; ref. 15), and the Behavioral Risk Factor Surveillance System (BRFSS; ref. 16). These three programs conduct repeated national cross-sectional studies in the United States over several decades (see Table 1 for survey designs and included survey cycles; refs. 17–19). We included the following dietary factors: smoking, the intake of alcohol, tea, coffee, caffeine, whole fruit, fruit juice, total fruit (whole fruit and fruit juice combined), cholesterol, protein, fiber, calcium, magnesium, fat, saturated fat, total energy, and carbohydrate, and serum folate (see Supplementary Table S1 for the availability of the study factors in the surveys and sample sizes; see Supplementary Table S2 for the measurements). Further details on compiling study factor data and handling of periods with no data are described in the Supplementary Methods and Supplementary Figs. S1 and S2.

Computing study factor levels

We harmonized study factor data from the different surveys and computed the weighted prevalence for each study factor for each population group defined by age, period, race, and sex (20, 21). The study population (whites and blacks aged 25–49 during 1977–2016) was divided into 160 subgroups: Five 5-year age groups (25–29, 30–34, …, 45–49) × eight 5-year periods (1977–1981, 1982–1986, …, 2012–2016) × two race groups (whites and blacks) × two sexes (men and women). In addition, we computed weighted prevalence for the population groups aged 20–24 and during 1972–1976 for use in the lagged or cumulative models. See Supplementary Table S3 for specific age and period groups used in each model. Because of small sample sizes, we did not include races other than whites and blacks; we also did not stratify by ethnicity (Hispanic/non-Hispanic), as such information was unavailable from the surveys (e.g., NHANES I and II) or cancer surveillance programs (see below) for earlier periods.
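The subgroup count can be reproduced by enumerating the stratification described above. This is an illustrative sketch only; the labels and variable names are not taken from the study's code.

```python
from itertools import product

# Five 5-year age groups: 25-29 ... 45-49
age_groups = [f"{lo}-{lo + 4}" for lo in range(25, 50, 5)]
# Eight 5-year periods: 1977-1981 ... 2012-2016
periods = [f"{start}-{start + 4}" for start in range(1977, 2017, 5)]
races = ["white", "black"]
sexes = ["men", "women"]

subgroups = list(product(age_groups, periods, races, sexes))
print(len(subgroups))  # 160
```

Because the models are fitted separately for men and women, each sex-specific model sees 80 of these subgroups when data are complete.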

For the no-lag and lagged models (see below), all exposures were categorized by quintiles, as done in Pfeiffer and colleagues (12). The quintiles were determined on the basis of all population groups (i.e., 80 subgroups for men/women when data were complete). For the other three models (AMC no-lag, AMC lagged, and AMC cumulative; see below), exposures were analyzed as continuous variables, because the AMC-processed exposures no longer spanned a wide range of quintile categories for different age groups and could lead to unstable model estimates using quintiles.
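As a rough illustration of the quintile step, the sketch below pools all subgroup values, computes four cut points, and assigns each subgroup a quintile index from 0 to 4. The quantile convention (Python's default in `statistics.quantiles`) and the synthetic values are assumptions; the source does not state which quantile definition was used.

```python
from bisect import bisect_right
from statistics import quantiles


def to_quintiles(values):
    """Map each value to a quintile index 0-4, using cut points computed
    from all values pooled together."""
    cuts = quantiles(values, n=5)  # four cut points
    return [bisect_right(cuts, v) for v in values]


# Synthetic exposure levels for 80 hypothetical subgroups (not study data).
synthetic = [float(i) for i in range(80)]
q = to_quintiles(synthetic)
print([q.count(k) for k in range(5)])  # [16, 16, 16, 16, 16]
```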

EOCRC incidence

We obtained EOCRC incidence data from the Surveillance, Epidemiology, and End Results (SEER) Program using SEER*Stat (9, 22, 23). To match with the exposures, the EOCRC incidence data were aggregated to the same 160 groups specified above. As the coverage of SEER expanded over time, we used SEER data in two ways. In the main analysis, we used SEER 9, which included nine registries, covered 9.4% of the U.S. population, and provided EOCRC incidence throughout our study period (1977–2016). As a sensitivity analysis, we combined SEER 9 with SEER 13 (13 registries; 13.5% coverage; 1992–2016) and SEER 18 (18 registries; 27.8% coverage; 2000–2016). The SEER program, albeit covering a subset of the U.S. population, is representative of the general U.S. population (24); in addition, SEER started in 1973, earlier than many other national cancer surveillance programs (vs. e.g., the National Program of Cancer Registries starting in 1992).

Statistical analysis

Using the population groups defined above, we applied five negative binomial regression models to examine the association between EOCRC incidence and each study factor for men and women, separately.

No-lag model

The no-lag model is similar to the model of Pfeiffer and colleagues (12) but differs in two ways: (i) we included race as a covariate in order to include subgroups from both races (whites and blacks) in the same model to increase sample sizes; (ii) we considered age–period interaction. The no-lag model equation is:
[Equation (A)]
where |${\lambda }_{a,p,r}$| is the expected EOCRC incidence rate for age |$a$|⁠, period |$p$|⁠, and race |$r$|⁠. |${\beta }_a,\ {\gamma }_p,\ {\pi }_r$|⁠, and |${\delta }_{a,p}$| represent the effects of age, period, race, and age-period interaction, respectively; |$\mu $| is the intercept. |${\delta }_{a,p}$| was included only when the term was significant (P < 0.05) in a model including age, period, race, and interaction terms for all age-period combinations. |${Z}_{a,p,r,q}$| is an indicator variable that is 1 for age a, period p, race r, and exposure quintile q, and 0 otherwise; |$a$| ( |$a\ = \ 0,1, \ldots ,A$|⁠) represents the five 5-year age groups: 25–29, …, 45–49; |$p$| ( |$p\ = \ 0,1, \ldots ,P$|⁠) represents the eight 5-year periods: 1977–1981, …, 2012–2016; |$r$| ( |$r\ = \ 0,1$|⁠) represents white and black; and |$q$| ( |$q\ = \ 0,1,2,3,4$|⁠) represents the quintile of exposure.
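One log-linear form consistent with the definitions above is sketched below. The exposure coefficients θ_q, the use of the lowest quintile (q = 0) as the reference category, and the log link are assumptions made for illustration and are not taken from the source.

```latex
% Assumed form: theta_q and the reference category q = 0 are illustrative.
\log \lambda_{a,p,r}
  = \mu + \beta_a + \gamma_p + \pi_r + \delta_{a,p}
  + \sum_{q=1}^{4} \theta_q \, Z_{a,p,r,q}
```

Under this assumed form, the IRR for exposure quintile q relative to the lowest quintile would be exp(θ_q).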

Lagged model

The lagged model used the same structure as Eq (A), except that |${Z}_{a,p,r,q}$| was replaced by |${Z}_{a - 2,p - 2,r,q}$|. That is, the exposure occurred 10 years before EOCRC diagnosis (i.e., two 5-year periods ago, hence p-2 in the subscript) when the EOCRC cases were 10 years younger (i.e., two 5-year age intervals ago, hence a-2). The 10-year lag was chosen, given the likely induction time (5) and data availability (note the youngest age group, i.e., 25–29, can no longer be included due to a lack of earlier measurements; details in Supplementary Table S3). In addition, we also tested models with a 5-year or 15-year lag to explore patterns across different lags.
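The index shift can be sketched as a lookup from an outcome group to the exposure group two 5-year steps earlier. This is a minimal illustration; the group lists follow the age and period ranges described in the text (including the extra 20–24 and 1972–1976 groups), while the function and constant names are hypothetical.

```python
# Exposure groups available for lagging, identified by their starting year/age.
AGE_STARTS = list(range(20, 50, 5))          # 20-24 ... 45-49
PERIOD_STARTS = list(range(1972, 2017, 5))   # 1972-1976 ... 2012-2016


def lagged_group(age_start, period_start, steps=2):
    """Return the (age, period) group `steps` five-year intervals earlier,
    or None when no earlier exposure group is available."""
    a = AGE_STARTS.index(age_start) - steps
    p = PERIOD_STARTS.index(period_start) - steps
    if a < 0 or p < 0:
        return None
    return AGE_STARTS[a], PERIOD_STARTS[p]


print(lagged_group(45, 2012))  # (35, 2002)
print(lagged_group(25, 2012))  # None
```

The second call returns None because a 10-year lag for the 25–29 group would require exposure data at ages 15–19, which is why that age group drops out of the lagged model.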

AMC no-lag model

We used AMC to address potential confounding by age. For each study factor, we calculated the age-mean centered exposure per:
|${R}_{a,p,r} = {Z}_{a,p,r} - \overline {{Z}_{a,r}}$| (B)
where |${Z}_{a,p,r}$| represents the exposure (on continuous scales) for age |$a$|⁠, period |$p$|⁠, and race |$r$|⁠; |$\overline {{Z}_{a,r}} $| is the mean of |${Z}_{a,p,r}$| for age |$a$| and race |$r$| across all study periods. By subtracting the age-, race-specific mean, the residuals, |${R}_{a,p,r}$|⁠, would still retain the time trend, which is of interest here, but remove the association with age (Fig. 1). The AMC no-lag model equation is:
[Equation (C)]
using the same notations as Eqs (A) and (B). Of note, for all three AMC models (the AMC no-lag model here and the others below) where continuous exposures were used, we standardized the age-removed exposure (mean = 0; SD = 1) before regression, which allows comparison of the estimates across different study factors and models.
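The centering and standardization steps can be sketched as follows: subtract the age- and race-specific mean across periods from each exposure value, then rescale the residuals to mean 0 and SD 1. This is an illustrative sketch, not the authors' implementation; the data layout, the function name, and the use of the population SD are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def age_mean_center(exposure):
    """exposure: dict mapping (age, period, race) -> continuous exposure level.
    Returns standardized residuals after removing the age-, race-specific mean."""
    by_age_race = defaultdict(list)
    for (age, _period, race), z in exposure.items():
        by_age_race[(age, race)].append(z)
    group_mean = {key: mean(vals) for key, vals in by_age_race.items()}

    # Residual R[a, p, r] = Z[a, p, r] - mean over periods of Z[a, ., r]
    residual = {(a, p, r): z - group_mean[(a, r)]
                for (a, p, r), z in exposure.items()}

    # Standardize to mean 0, SD 1 (population SD; the SD convention is an assumption).
    values = list(residual.values())
    mu, sd = mean(values), pstdev(values)
    return {key: (val - mu) / sd for key, val in residual.items()}


# Toy data (not study data): exposure falls with age and rises over time.
toy = {(a, p, "white"): 10.0 - a + 0.5 * p for a in range(3) for p in range(4)}
centered = age_mean_center(toy)
print(centered[(0, 3, "white")] == centered[(2, 3, "white")])  # True
```

In the toy data the age gradient disappears after centering, while the time trend is preserved: groups in the same period share the same centered value regardless of age.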

AMC lagged model

The AMC lagged model extends the AMC no-lag model to include the time-lag from exposure to cancer diagnosis. The AMC lagged model equation is the same as Eq (C) except that |${R}_{a,p,r}$| is replaced by |${R}_{a - 2,p - 2,r}$|⁠.

AMC cumulative model

The AMC cumulative model uses exposures summed over the 10 years before cancer diagnosis. The model equation is the same as Eq (C) except that |${R}_{a,p,r}$| is replaced by |${R}_{a - 1,p - 1,r} + {R}_{a - 2,p - 2,r}$|.
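For orientation, the three AMC models can be written in one assumed log-linear form that differs only in the exposure term. The coefficient symbol θ, the inclusion of the age-period interaction term, the tilde notation for the standardized exposure, and the point at which standardization is applied are assumptions made for illustration; they are not taken from the source.

```latex
% Assumed common form for the three AMC models (illustrative notation).
\log \lambda_{a,p,r}
  = \mu + \beta_a + \gamma_p + \pi_r + \delta_{a,p} + \theta \, \tilde{R}_{a,p,r},
\qquad
\tilde{R}_{a,p,r} \text{ is the standardized version of }
\begin{cases}
  R_{a,p,r} & \text{(AMC no-lag)} \\
  R_{a-2,\,p-2,\,r} & \text{(AMC lagged)} \\
  R_{a-1,\,p-1,\,r} + R_{a-2,\,p-2,\,r} & \text{(AMC cumulative)}
\end{cases}
```

Under this assumed form, exp(θ) would be the IRR per one-SD increase in the age-removed exposure.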

Examine the association between age and each study factor

For men and women, separately, we regressed each study factor upon age using 12 groups defined by age and race: six 5-year age groups (20–24, 25–29, …, 45–49) × two race groups (whites and blacks).

Assess the association between each study factor and EOCRC

All models estimated the incidence rate ratio (IRR) of EOCRC in relation to each study factor, including the mean, 95% confidence interval (95% CI), and P value (see Table 2 and Supplementary Table S4). In addition, we used the Bayesian information criterion (BIC) to assess the strength of estimated associations (25). Specifically, for each study factor and model (one of the five described above), we also tested a corresponding null model with all covariates but the study factor. We calculated the BIC for both models and the difference ∆BIC = BIC0 − BICf (BICf for the full model including the study factor and BIC0 for the null model). ∆BIC>0 indicates the EOCRC data are better explained when the study factor is included, thus supporting the association between the study factor and EOCRC. The evidence was deemed weak, positive, strong, and very strong for ∆BICs in the ranges of 0–2, 2–6, 6–10, and >10, respectively (25). ∆BIC<0 implies an absence of such evidence. All data processing and analyses were conducted using R (https://www.r-project.org).
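The ∆BIC grading can be expressed as a small lookup. This is a sketch in Python rather than the R used for the study; the function names are hypothetical, and assigning boundary values (exactly 2, 6, or 10) to the lower category is an assumption, since the text does not say how ties are handled.

```python
def delta_bic(bic_null, bic_full):
    """Difference in BIC: null model (without the study factor) minus full model."""
    return bic_null - bic_full


def evidence_category(dbic):
    """Grade the evidence for an association from a Delta-BIC value.
    Boundary values go to the lower category (an assumption)."""
    if dbic < 0:
        return "no evidence"
    if dbic <= 2:
        return "weak"
    if dbic <= 6:
        return "positive"
    if dbic <= 10:
        return "strong"
    return "very strong"


print(evidence_category(6.1))   # strong
print(evidence_category(24.9))  # very strong
```

The two example values are the ends of the ∆BIC range reported for smoking under the AMC models, which the text describes as strong to very strong support.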

Method validation

To test the models, we performed two sets of model validation. First, we tested the models on model-generated synthetic data, for which the underlying associations are known and thus can be compared with model estimates. Second, we applied the models to older age groups (i.e., 35–59-year-olds) and a subset of well-studied exposures (smoking, alcohol consumption, and calcium intake; refs. 6, 26). For details, see Supplementary Methods, Supplementary Tables S5–S7, and Supplementary Figs. S3–S5.

Data availability statement

The study factor data are publicly available at the websites of NHANES (14), NHIS (15), and BRFSS (16). The EOCRC incidence data are available at the SEER website (9, 22, 23).

Method validation

As detailed in the Supplementary Methods, synthetic testing showed that both the lagged and AMC-lagged models were able to accurately identify the true direction of association in most tests (overall accuracy: 79% and 80% by the lagged and AMC-lagged models, respectively; Supplementary Figs. S4 and S5). When the association between EOCRC and exposure was close to the null (i.e., IRRs close to 1), the AMC-lagged model was more accurate than the lagged model (71% vs. 65% accuracy; Supplementary Fig. S5), suggesting the AMC approach may alleviate potential biases and thereby estimate the true association more accurately. Furthermore, model results for those aged 35–59 were generally consistent with findings in the literature [i.e., positive associations of colorectal cancer with smoking and alcohol consumption and a negative association with calcium intake, primarily based on cases 50 years and older (6, 26)]; see the red cells (representing positive association) for smoking and alcohol and blue cells (negative association) for calcium in Supplementary Fig. S6 and Supplementary Table S7 for specific estimates.

Effect of AMC on estimated associations

We designed AMC to address potential age confounding between study factors and EOCRC. In multiple instances, changes in the estimated associations after AMC were consistent with expectations. For instance, as calcium intake decreased with age (Supplementary Fig. S7) while colorectal cancer incidence increased with age, age confounding could bias the estimated association between calcium intake and EOCRC towards the negative (Supplementary Table S5). Indeed, without removing the negative association between calcium intake and age, the no-lag and lagged models estimated negative associations (see blue cells in Fig. 2) with larger ∆BICs (∆BIC>6 except for men using the lagged model; Table 2), indicating stronger evidence for this association. In comparison, the AMC models, designed to remove the age association with calcium intake, generally estimated negative associations with lower ∆BICs, indicating weaker evidence for this association.

For tea, coffee, and caffeine, intake generally increased with age (Supplementary Fig. S7), which could nudge the estimated association with EOCRC towards the positive (Supplementary Table S5). Indeed, without removing the age association with these exposures, in multiple instances, the no-lag and lagged models estimated positive associations for these exposures (see red cells in Fig. 2). In contrast, with AMC, the models in general estimated negative or no association (see light blue or white cells in Fig. 2).

Association between study factors and EOCRC

For smoking, the models found a positive association with EOCRC for both men and women (Fig. 2). For 25–49-year-old men, the no-lag model estimated that IRRs were 1.11 (95% CI, 0.95–1.29) and 1.26 (95% CI, 1.04–1.53) for the top two quintiles (Table 2). When smoking prevalence 10 years before EOCRC diagnosis was used, the lagged model estimated that IRRs increased from 1.12 (95% CI, 1.02–1.24) for the second quintile to 1.33 (95% CI, 1.14–1.55) for the fifth. Consistently, estimated IRRs were 1.14 (95% CI, 1.05–1.23) per the AMC no-lag, 1.17 (95% CI, 1.10–1.24) per the AMC lagged, and 1.20 (95% CI, 1.13–1.29) per the AMC cumulative models. For the three AMC models, comparison with the corresponding null models showed strong to very strong support for this association (∆BIC ranged from 6.1 to 24.9; Table 2).

For alcohol consumption, the models also generally found a positive association with EOCRC for both young men and women (Fig. 2). For 25–49-year-old men, the lagged model estimated the IRRs increased from 1.10 (95% CI, 0.97–1.24) for the second quintile to 1.28 (95% CI, 1.13–1.46) for the fifth; for the AMC lagged and AMC cumulative models, estimated IRRs were 1.08 (95% CI, 1.04–1.12) and 1.06 (95% CI, 1.03–1.09), respectively (Table 2). The three models incorporating the time-lag also outperformed their corresponding null models (∆BICs ranged from 8.1 to 14.5; Table 2), further supporting the association. Models without the time-lag generally found no association for alcohol consumption (except the AMC no-lag model for men).

For the intake of whole fruit, fruit juice, and total fruit, the estimated associations with EOCRC tended to be negative, but the overall evidence was not strong (Fig. 2). For the intake of cholesterol, protein, fiber, and magnesium, the estimated associations with EOCRC were either nonsignificant or too inconsistent across models for meaningful interpretation (Fig. 2).

The estimated associations between a few study factors and EOCRC were unexpected: negative associations for fat, total energy, and carbohydrate intake, and a positive association for serum folate (Fig. 2).

Model results using EOCRC data combining SEER 9, 13, and 18 were similar to those above using SEER 9 data alone (Supplementary Fig. S8; Supplementary Table S8). Results from models using different lags were also similar to the main analyses using a 10-year lag; we did not find any clear pattern (Supplementary Fig. S9) except for alcohol, for which the IRRs were the largest with a 10-year lag.

To explore reasons underlying the recent increases in EOCRC incidence, we have examined the ecologic association between EOCRC and 18 dietary factors. Given the ecologic nature of the study, model results represent a first assessment to generate hypotheses regarding potential risk factors to inform more in-depth investigation. Overall, we found that smoking and alcohol consumption starting in young adulthood were positively associated with EOCRC. While these exposures are long-established carcinogens for many cancers including colorectal cancer (26, 27), most studies are based on older populations and mid to late life exposure (26, 28). Given the likely long induction time (5), our findings suggest that primary prevention strategies for EOCRC, which are urgently needed, should incorporate tobacco and alcohol control measures targeting younger populations. The findings also suggest smoking and alcohol consumption may be important risk factors for identifying young adults for early screening and detection of EOCRC in clinical settings.

The contributions of smoking and alcohol consumption to the recent increases in EOCRC, however, likely differ. As shown in Fig. 3, smoking prevalence has been decreasing significantly in recent decades (see details of the break-point trend analysis in the Supplementary Methods), suggesting changes in smoking are likely not the reason behind the recent EOCRC increases. In contrast, alcohol consumption decreased significantly from 1971 to around 1980, consistent with the decrease of EOCRC incidence from 1973 to the early 1990s; alcohol consumption then increased from the 1980s, albeit not statistically significantly, followed by the increases in EOCRC incidence since the 1990s (Fig. 3). These lagged, concordant trends of alcohol consumption and EOCRC incidence resemble the parallel trends in smoking and lung cancer that have strongly supported smoking as a main cause of lung cancer (26). Consistently, using the approach in Fig. 3 of Pfeiffer and colleagues (12), we showed that, compared with the adjusted EOCRC incidence setting alcohol consumption at the lowest quintile, for both men and women, the observed EOCRC incidence was higher from 1992 onwards and the gap reached the maximum during recent periods (e.g., 2012–2016), when alcohol consumption levels were the highest (see Supplementary Fig. S10 and details in Supplementary Methods). Given these analyses, we hypothesize that the increase in alcohol consumption is a key contributor to the recent EOCRC incidence increases. Further investigation is warranted while teasing out the effect of other potential risk factors.

We found some, albeit weak, evidence for negative associations of caffeine, whole fruit, fruit juice, and total fruit intake with EOCRC (18/24 of the IRRs in the range of 0.95–0.99 after AMC). The literature on biological effects of these dietary factors also suggests negative associations (29–31). More in-depth investigation into the potential role of fruit and caffeine using stronger epidemiologic designs may thus prove fruitful for EOCRC prevention.

For fiber, calcium, and magnesium intake, we found either no or weak negative association with EOCRC. In contrast, epidemiologic studies among older adults suggest these nutrients are protective against colorectal cancer (6, 32). For instance, an umbrella review of meta-analyses of cohort studies found convincing evidence for a negative association of colorectal cancer with fiber and calcium intake, separately, and some evidence of a negative association with magnesium (6). Unlike previous studies using cohorts, we used aggregated population-level data, due to the challenges of studying EOCRC as noted in the Introduction. This ecologic design may be less powered to identify milder risk factors, especially for younger populations (e.g., aged 25–49 here). Moreover, unlike other study factors (e.g., smoking), fiber and magnesium data were unavailable during 1972–1987, further reducing the sample sizes and statistical power. Nonetheless, the direction of our estimates for calcium and magnesium (see the blue cells indicating negative associations in Fig. 2) is consistent with previous findings.

Importantly, we note that fiber, calcium, and magnesium intake among blacks was significantly lower than that among whites (P < 0.001, paired t test; Supplementary Figs. S11–S13), and also considerably lower than the recommended levels per current dietary guidelines (33). Supporting the disparities in intake of these nutrients and potential impact on EOCRC, models including these nutrients partly explained the higher incidence for blacks than whites (e.g., estimated IRRs for black compared with white men: 1.03–1.16 vs. 1.19–1.25 using the lagged model with vs. without one of these nutrients; Supplementary Table S9). Given the higher EOCRC incidence among blacks and multiple health benefits of these nutrients, these findings suggest increasing the intake of these nutrients may help mitigate EOCRC risk among blacks.

Some of our findings were at odds with the literature. In particular, for colorectal cancer, past studies found positive associations with high fat diet and total energy intake (32, 34, 35), no association with carbohydrate intake (36), and negative associations with folate intake (37, 38). Model estimates here were inconsistent with these previous findings, particularly for young men, which highlights limitations in this ecologic analysis. Nonetheless, we note that while EOCRC increased during the latter part of our study period (from the 1990s onwards), fat intake had been decreasing among young men (Supplementary Fig. S14). Similar time-trends were observed for total energy and carbohydrate (Supplementary Figs. S15 and S16). These trends suggest that, at the population level, the changes in fat, total energy, and carbohydrate intake are likely not associated with the recent increases in EOCRC. For folate intake, serum folate concentration increased during 1987–2016 likely due to the folic acid fortification program implemented in 1998 (ref. 39; Supplementary Fig. S17; see Supplementary Table S2 for reasons for excluding earlier serum folate data); this coincided with the increases in EOCRC during the time period. The positive association between serum folate and EOCRC may have been an artefact of such concurrent changes. We thus caution readers about the above limitations, even though ecologic studies can be invaluable for examining potential risk factors by taking advantage of long-term population data. Furthermore, we advocate for comprehensive result interpretation combining ecologic modeling results, findings from the literature, and careful inspection of underlying data, as demonstrated here.

We note several study limitations, apart from the ecologic design. First, while our analysis included cigarette smoking, other forms of tobacco consumption were not included due to a lack of long-term data. For example, e-cigarettes have gained popularity among youth and young adults in the United States in the 2010s. The potential impacts of such exposure, particularly during critical development periods, warrant future investigations. Second, due to challenges in converting and harmonizing intake of various vegetable items (e.g., inconsistent classification/inclusion schemes and definitions of serving size; refs. 40, 41), we were unable to analyze the association of EOCRC with total vegetable intake. Third, this study focused on testing the proposed methods and dietary factors. Future work will extend to non-dietary factors, including those that have been found to affect colorectal cancer risks among older adults (e.g., body weight and physical exercise; ref. 5). Fourth, this study estimated the marginal effect of each study factor, as done in Pfeiffer and colleagues (12). Future work considering potential interactions among various study factors is under way. Fifth, while our models accounted for and estimated the age and period effects, they were formulated to incorporate the risk factor data and estimate their associations with EOCRC rather than as conventional age-period-cohort models (42), and thus do not enable estimation of birth cohort effects.

In sum, we found that alcohol consumption was strongly associated with EOCRC incidence and has increased since the 1980s, which may have contributed to recent EOCRC increases among U.S. adults aged 25–49. We have also proposed an AMC approach, which may be applied in ecologic studies of risk factors and other diseases where large-cohort data are unavailable.

J. Chen reports grants from Data Science Institute and Irving Institute for Cancer Dynamics Seed Funds Program at Columbia University and grants from NCI during the conduct of the study. I.L. Zhang reports grants from Data Science Institute and Irving Institute for Cancer Dynamics Seed Funds Program and grants from NCI during the conduct of the study. W. Yang and M.B. Terry report grants from the Data Science Institute and Irving Institute for Cancer Dynamics Seed Funds Program at Columbia University and NCI during the conduct of the study.

J. Chen: Resources, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing. I.L. Zhang: Resources, data curation, software, formal analysis, validation, investigation, visualization, writing–original draft, writing–review and editing. M.B. Terry: Conceptualization, methodology, writing–review and editing. W. Yang: Conceptualization, resources, data curation, supervision, funding acquisition, investigation, methodology, project administration, writing–original draft.

This study was supported by the Data Science Institute and Irving Institute for Cancer Dynamics Seed Funds Program at Columbia University and the NCI (grant number: R01CA257971).

The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).

1.
Siegel
RL
,
Fedewa
SA
,
Anderson
WF
,
Miller
KD
,
Ma
J
,
Rosenberg
PS
, et al
.
Colorectal cancer incidence patterns in the United States, 1974–2013
.
J Natl Cancer Inst
2017
;
109
:
djw322
.
2.
Kehm
RD
,
Lima
SM
,
Swett
K
,
Mueller
L
,
Yang
W
,
Gonsalves
L
, et al
.
Age-specific trends in colorectal cancer incidence for women and men, 1935–2017
.
Gastroenterology
2021
;
161
:
1060
2
.
3.
Yang
W
,
Kehm
RD
,
Terry
MB
.
Survival model methods for analyses of cancer incidence trends in young adults
.
Stat Med
2020
;
39
:
1011
24
.
4.
Bailey
CE
,
Hu
CY
,
You
YN
,
Bednarski
BK
,
Rodriguez-Bigas
MA
,
Skibber
JM
, et al
.
Increasing disparities in the age-related incidences of colon and rectal cancers in the United States, 1975–2010
.
JAMA Surg
2015
;
150
:
17
22
.
5.
Brenner
H
,
Kloor
M
,
Pox
CP
.
Colorectal cancer
.
Lancet
2014
;
383
:
1490
502
.
6.
Veettil
SK
,
Wong
TY
,
Loo
YS
,
Playdon
MC
,
Lai
NM
,
Giovannucci
EL
, et al
.
Role of diet in colorectal cancer incidence: umbrella review of meta-analyses of prospective observational studies
.
JAMA Netw Open
2021
;
4
:
e2037341
.
7.
Stoffel
EM
,
Murphy
CC
.
Epidemiology and mechanisms of the increasing incidence of colon and rectal cancers in young adults
.
Gastroenterology
2020
;
158
:
341
53
.
8. Hofseth LJ, Hebert JR, Chanda A, Chen H, Love BL, Pena MM, et al. Early-onset colorectal cancer: initial clues and current views. Nat Rev Gastroenterol Hepatol 2020;17:352–64.
9. Surveillance, Epidemiology, and End Results. Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat Database: Incidence - SEER Research Data, 9 Registries, Nov 2020 Sub (1975–2018) - Linked To County Attributes - Time Dependent (1990–2018) Income/Rurality, 1969–2019 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2021, based on the November 2020 submission [cited 2022 Sep 14]. Available from: https://seer.cancer.gov/data-software/.
10. Yang W, Terry MB. Do temporal trends in cancer incidence reveal organ system connections for cancer etiology? Epidemiology 2020;31:595–8.
11. Kehm RD, Yang W, Tehranifar P, Terry MB. 40 years of change in age- and stage-specific cancer incidence rates in US women and men. JNCI Cancer Spectr 2019;3:pkz038.
12. Pfeiffer RM, Webb-Vargas Y, Wheeler W, Gail MH. Proportion of U.S. trends in breast cancer incidence attributable to long-term changes in risk factor distributions. Cancer Epidemiol Biomarkers Prev 2018;27:1214–22.
13. Curran PJ, Bauer DJ. The disaggregation of within-person and between-person effects in longitudinal models of change. Annu Rev Psychol 2011;62:583–619.
14. Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data. Hyattsville, MD: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention [cited 2022 Sep 14]. Available from: https://wwwn.cdc.gov/nchs/nhanes/default.aspx.
15. Blewett LA, Drew JAR, King ML, Williams KCW. IPUMS Health Surveys: National Health Interview Survey, Version 6.4. Minneapolis, MN: IPUMS; 2019. Available from: https://nhis.ipums.org/nhis-action/variables/group.
16. Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System Survey Data. Atlanta, GA: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. Available from: https://www.cdc.gov/brfss/.
17. Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Examination Protocol. Hyattsville, MD: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. Available from: https://www.cdc.gov/nchs/nhanes/index.htm.
18. NHIS sample design. IPUMS. Available from: https://nhis.ipums.org/nhis/userNotes_sampledesign.shtml.
19. Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System. Survey Data & Documentation. Hyattsville, MD: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. Available from: https://www.cdc.gov/brfss/data_documentation/index.htm.
20. Centers for Disease Control and Prevention (CDC). National Health and Nutrition Examination Survey. Tutorial Module 3: Weighting. U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. Available from: https://wwwn.cdc.gov/nchs/nhanes/tutorials/Module3.aspx.
21. Korn EL, Graubard BI. Chapter 8: Analyses using multiple surveys. In: Analysis of health surveys. Hoboken, NJ: John Wiley & Sons, Inc; 1999. p. 278–84.
22. Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat Database: Incidence - SEER Research Data, 13 Registries, Nov 2020 Sub (1992–2018) - Linked To County Attributes - Time Dependent (1990–2018) Income/Rurality, 1969–2019 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2021, based on the November 2020 submission [cited 2022 Sep 14]. Available from: https://seer.cancer.gov/data-software/.
23. Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat Database: Incidence - SEER Research Data, 18 Registries, Nov 2020 Sub (2000–2018) - Linked To County Attributes - Time Dependent (1990–2018) Income/Rurality, 1969–2019 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2021, based on the November 2020 submission [cited 2022 Sep 14]. Available from: https://seer.cancer.gov/data-software/.
24. Duggan MA, Anderson WF, Altekruse S, Penberthy L, Sherman ME. The surveillance, epidemiology, and end results (SEER) program and pathology: toward strengthening the critical relationship. Am J Surg Pathol 2016;40:e94–e102.
25. Raftery AE. Bayesian model selection in social research. Sociol Methodol 1995;25:111–63.
26. The Health Consequences of Smoking-50 Years of Progress: A Report of the Surgeon General. Atlanta, GA: Centers for Disease Control and Prevention; 2014. p. 197–203.
27. Baan R, Straif K, Grosse Y, Secretan B, El Ghissassi F, Bouvard V, et al. Carcinogenicity of alcoholic beverages. Lancet Oncol 2007;8:292–3.
28. Cho E, Smith-Warner SA, Ritz J, van den Brandt PA, Colditz GA, Folsom AR, et al. Alcohol intake and colorectal cancer: a pooled analysis of 8 cohort studies. Ann Intern Med 2004;140:603–13.
29. Cui WQ, Wang ST, Pan D, Chang B, Sang LX. Caffeine and its main targets of colorectal cancer. World J Gastrointest Oncol 2020;12:149–72.
30. Steinmetz KA, Potter JD. Vegetables, fruit, and cancer. II. mechanisms. Cancer Causes Control 1991;2:427–42.
31. Aune D, Lau R, Chan DS, Vieira R, Greenwood DC, Kampman E, et al. Nonlinear reduction in risk for colorectal cancer by fruit and vegetable intake based on meta-analysis of prospective studies. Gastroenterology 2011;141:106–18.
32. Song M, Garrett WS, Chan AT. Nutrients, foods, and colorectal cancer prevention. Gastroenterology 2015;148:1244–60.
33. DietaryGuidelines.gov [Internet]. U.S. Department of Agriculture and U.S. Department of Health and Human Services. Dietary Guidelines for Americans, 2020–2025. 9th Edition. December 2020 [cited 2022 Sep 14]. Available from: https://www.dietaryguidelines.gov/sites/default/files/2020-12/Dietary_Guidelines_for_Americans_2020-2025.pdf.
34. Beyaz S, Mana MD, Roper J, Kedrin D, Saadatpour A, Hong SJ, et al. High-fat diet enhances stemness and tumorigenicity of intestinal progenitors. Nature 2016;531:53–58.
35. Sun Z, Liu L, Wang PP, Roebothan B, Zhao J, Dicks E, et al. Association of total energy intake and macronutrient consumption with colorectal cancer risk: results from a large population-based case-control study in Newfoundland and Labrador and Ontario, Canada. Nutr J 2012;11:18.
36. Aune D, Chan DS, Lau R, Vieira R, Greenwood DC, Kampman E, et al. Carbohydrates, glycemic index, glycemic load, and colorectal cancer risk: a systematic review and meta-analysis of cohort studies. Cancer Causes Control 2012;23:521–35.
37. Liu Y, Yu Q, Zhu Z, Zhang J, Chen M, Tang P, et al. Vitamin and multiple-vitamin supplement intake and incidence of colorectal cancer: a meta-analysis of cohort studies. Med Oncol 2015;32:434.
38. Bollheimer LC, Buettner R, Kullmann A, Kullmann F. Folate and its preventive potential in colorectal carcinogenesis. How strong is the biological and epidemiological evidence? Crit Rev Oncol Hematol 2005;55:13–36.
39. Yetley EA, Johnson CL. Folate and vitamin B-12 biomarkers in NHANES: history of their measurement and use. Am J Clin Nutr 2011;94:322S–31S.
40. Patterson BH, Block G, Rosenberger WF, Pee D, Kahle LL. Fruit and vegetables in the American diet: data from the NHANES II survey. Am J Public Health 1990;80:1443–9.
41. Branum AM, Rossen LM. The contribution of mixed dishes to vegetable intake among US children and adolescents. Public Health Nutr 2014;17:2053–60.
42. Holford TR. Understanding the effects of age, period, and cohort on incidence and mortality rates. Annu Rev Public Health 1991;12:425–57.
