Background:

Parental smoking is implicated in the etiology of acute lymphoblastic leukemia (ALL), the most common childhood cancer. We recently reported an association between an epigenetic biomarker of early-life tobacco smoke exposure at the AHRR gene and increased frequency of somatic gene deletions among ALL cases.

Methods:

Here, we further assess this association using two epigenetic biomarkers for maternal smoking during pregnancy—DNA methylation at AHRR CpG cg05575921 and a recently established polyepigenetic smoking score—in an expanded set of 482 B-cell ALL (B-ALL) cases in the California Childhood Leukemia Study with available Illumina 450K or MethylationEPIC array data. Multivariable Poisson regression models were used to test the associations between the epigenetic biomarkers and gene deletion numbers.

Results:

We found an association between DNA methylation at AHRR CpG cg05575921 and deletion number among 284 childhood B-ALL cases with MethylationEPIC array data, with a ratio of means (RM) of 1.31 [95% confidence interval (CI), 1.02–1.69] for each 0.1 β value reduction in DNA methylation, an effect size similar to our previous report in an independent set of 198 B-ALL cases with 450K array data [meta-analysis summary RM (sRM) = 1.32; 95% CI, 1.10–1.57]. The polyepigenetic smoking score was positively associated with gene deletion frequency among all 482 B-ALL cases (sRM = 1.31 for each 4-unit increase in score; 95% CI, 1.09–1.57).

Conclusions:

We provide further evidence that prenatal tobacco-smoke exposure may influence the generation of somatic copy-number deletions in childhood B-ALL.

Impact:

Analyses of deletion breakpoint sequences are required to further understand the mutagenic effects of tobacco smoke in childhood ALL.

This article is featured in Highlights of This Issue, p. 1453

Acute lymphoblastic leukemia (ALL) is the most common childhood malignancy in the United States, with approximately 2,700 incident cases diagnosed under age 15 each year (1). Although survival rates for ALL have improved dramatically in recent decades, with overall 5-year survival now upwards of 90% (2), ALL remains a leading cause of disease-related mortality in children and current treatments still carry long-term health consequences (3–5). Therefore, prevention remains a top priority (6). In addition to known ALL risk factors with large effects, namely ionizing radiation and genetic predisposition syndromes (7–9), several environmental exposures have been associated with ALL etiology, including tobacco smoke, pesticides, paint, and air pollution (6, 10); however, the causal mechanisms remain largely unclear (11).

Childhood ALL, in particular B-cell ALL (B-ALL), is thought to follow a “two-hit” model of leukemogenesis (12, 13), with in utero development of a preleukemic clone (14, 15) that progresses to overt leukemia following postnatal acquisition of secondary genetic changes (16). Deletions of genes involved in cell-cycle control, B-lymphocyte development, and hematopoiesis (17–19), most commonly CDKN2A, ETV6, PAX5, and IKZF1 (18), comprise a large proportion of the secondary alterations in childhood B-ALL.

We recently reported a positive association between early-life tobacco smoke exposure and somatic gene deletions in childhood ALL cases, suggesting a potential etiologic role for parental and/or household smoking. In 559 childhood ALL cases in the California Childhood Leukemia Study (CCLS), self-reported maternal and paternal smoking were associated with an increased number of gene deletions (20). In a subset of 198 B-ALL cases for whom genome-wide DNA methylation data were available from Illumina HumanMethylation450 BeadChip (450K) arrays, we validated this association using an epigenetic biomarker for maternal smoking during pregnancy at the AHRR gene (20–22).

In this study, we examine the association between DNA methylation at the AHRR CpG cg05575921 and gene deletions in an expanded set of 482 B-ALL cases in the CCLS, including an additional 284 B-ALL cases with DNA methylation data now available from Illumina Infinium MethylationEPIC BeadChip (EPIC) arrays. Further, we sought to expand our analysis of the impact of prenatal tobacco smoke exposure on gene deletion burden in childhood B-ALL by using a recently established polyepigenetic smoking score of in utero tobacco smoke exposure (23).

Ethics statement

This study was reviewed and approved by the Institutional Review Boards at the University of Southern California, the University of California, Berkeley, the California Department of Public Health, and all participating hospitals. Written informed consent was obtained from all study participants. This study was conducted in accordance with the Declaration of Helsinki.

Study population

The CCLS is a population-based case–control study conducted from 1995 to 2015 to examine the relationships between various environmental exposures, genetic factors, and childhood leukemia (10). Cases were identified within 72 hours after diagnosis at hospitals across California. Eligibility criteria were (i) age under 15 years, (ii) no prior cancer diagnosis, (iii) residence in California at the time of diagnosis, and (iv) an English- or Spanish-speaking biological parent available for interview. Controls were not included in the current case-only analysis. Newborn dried bloodspots (DBS) were obtained from the California Biobank Program Genetic Disease Screening Program. The current analysis included 482 B-ALL cases with available genome-wide DNA methylation array data and gene deletion frequency data (Fig. 1). Of those with available race/ethnicity, 228 (58.9%) self-identified as Latino, 102 (26.4%) as non-Latino White, and 57 (14.7%) as other non-Latino races/ethnicities (including African American, Native American, Asian, and mixed/other groups; Table 1).

Somatic copy-number data

Copy-number at eight commonly deleted gene regions [CDKN2A, ETV6, IKZF1, PAX5, BTG1, EBF1, RB1, and genes within the pseudoautosomal region (PAR1) of the sex chromosomes (CRLF2, CSF2RA, IL3RA)] was assayed in tumor DNA using multiplex ligation-dependent probe amplification (MLPA), as described previously (20, 24).

Genome-wide DNA methylation arrays

For 198 B-ALL cases, genome-wide DNA methylation data were already available from Illumina 450K arrays (20, 25). For an additional 284 B-ALL cases, germline DNA was isolated from newborn DBS and bisulfite-treated as described previously (25), and subsequently assayed on Illumina EPIC arrays. EPIC arrays include >850,000 CpG probes, comprising >90% of CpGs on 450K arrays plus an additional 413,743 CpGs. CpG β values were normalized to remove batch effects according to the approach by Fortin and colleagues (26). Functional normalization was performed with noob background correction (27) by using the “preprocessFunnorm” function in the minfi package (28) through the Bioconductor project (29, 30).

The AHRR CpG cg05575921 is included on both the 450K and EPIC arrays; we extracted β values for this CpG for all 284 B-ALL cases assayed on the EPIC array to test for association with the number of gene deletions, as previously performed for the 198 B-ALL cases in the 450K dataset (20).

We also calculated a new polyepigenetic smoking score for sustained maternal smoking during pregnancy, in both the 450K and EPIC datasets (23). In brief, for 450K data, we computed a DNA methylation-based smoking score as the linear combination of 28 previously selected maternal smoking-associated CpGs using their corresponding logistic LASSO regression coefficients (Supplementary Table S1; ref. 23). For EPIC data, we calculated a score using 26 of the 28 CpGs that are included on the EPIC array, including the AHRR CpG cg05575921 (23). Cases with missing data for any of these CpGs (due to detection P values >0.01) were excluded from analyses involving polyepigenetic smoking scores.
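The score construction described above can be sketched as a weighted sum of β values with complete-case handling. This is a minimal illustration only: the coefficient values, the placeholder CpG names other than cg05575921, and the optional intercept are hypothetical and are not the published coefficients from ref. 23 (Supplementary Table S1).

```python
from typing import Mapping, Optional


def smoking_score(
    betas: Mapping[str, Optional[float]],
    coefs: Mapping[str, float],
    intercept: float = 0.0,  # assumption: whether the published score has an intercept is not stated here
) -> Optional[float]:
    """Return intercept + sum(coef * beta) over the score CpGs.

    Returns None when any required CpG is missing (e.g., a probe that failed
    the detection P value filter), mirroring the exclusion of such cases.
    """
    total = intercept
    for cpg, coef in coefs.items():
        beta = betas.get(cpg)
        if beta is None:
            return None
        total += coef * beta
    return total


# Hypothetical coefficients for illustration; the real score uses 28 (450K) or 26 (EPIC) CpGs.
example_coefs = {"cg05575921": -10.0, "cg_placeholder_1": 4.0, "cg_placeholder_2": -2.5}

complete_case = {"cg05575921": 0.80, "cg_placeholder_1": 0.50, "cg_placeholder_2": 0.20}
missing_case = {"cg05575921": 0.80, "cg_placeholder_1": None, "cg_placeholder_2": 0.20}

score_complete = smoking_score(complete_case, example_coefs)
score_missing = smoking_score(missing_case, example_coefs)  # None: case would be excluded
```

Restricting `example_coefs` to the CpGs present on a given array corresponds to the 26-CpG version of the score used for EPIC data.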

AHRR DNA methylation quantitative trait locus (mQTL) genotype data

To account for potential genetic effects on DNA methylation at the AHRR CpG cg05575921, for 198 B-ALL cases (450K) we had available genotype data for SNP rs148405299, which was identified as an mQTL for cg05575921, as described previously (25). In addition, for the new set of 284 B-ALL cases (EPIC) we genotyped SNP rs77111113, which is in perfect linkage disequilibrium with rs148405299 across all populations in LDlink (R² = 1.0; ref. 31), using a predesigned TaqMan SNP genotyping assay (Thermo Fisher Scientific, Assay ID: C__25986435_10). Hereafter, we refer to either rs148405299 or rs77111113 as the AHRR-mQTL SNP.

Self-reported smoking exposures

Among B-ALL cases with available parent interview data, we tested the association between the DNA methylation-based biomarkers of maternal tobacco smoking in pregnancy and self-reported tobacco exposures as assessed by parent interviews (10, 20). Dichotomous smoking variables (“yes” or “no”) included maternal/paternal ever smoking, maternal/paternal smoking 3 months before conception (preconception), maternal/paternal smoking at the time of the interview, maternal smoking during pregnancy, maternal smoking during breastfeeding, maternal prenatal smoking (during either preconception or pregnancy), maternal smoking during the year after birth, and child postnatal passive smoking. Continuous measures (number of cigarettes, pipes, or cigars per day) included maternal/paternal smoking preconception, maternal smoking during pregnancy, maternal smoking during breastfeeding, and maternal prenatal smoking (average of maternal smoking during preconception and during pregnancy). We also used combinations of responses from parental interviews to infer: (i) which mothers smoked throughout the entire duration of pregnancy, and (ii) which mothers were never exposed to tobacco smoke from any source during pregnancy, allowing us to compare the DNA methylation-based biomarker levels of maternal smoking in pregnancy at these two extremes of self-reported smoking.

Statistical analyses

All statistical analyses were performed in R v 4.0.0 (32). Two-sided P values below 0.05 were considered statistically significant. All analyses were performed separately in the 450K and EPIC datasets, including 198 and 284 B-ALL cases, respectively. Means and SDs were summarized to describe the distribution of continuous characteristics, and frequencies and proportions were computed for categorical characteristics.

We calculated Spearman rank correlation among self-reported tobacco smoking exposures, DNA methylation at the AHRR CpG cg05575921, and the polyepigenetic smoking scores. Linear regression models were additionally used to test for association between DNA methylation at the AHRR CpG cg05575921 or the polyepigenetic smoking scores and self-reported tobacco smoke exposures. To obtain independent effects of paternal smoking or maternal smoking on DNA methylation, maternal smoking was adjusted for paternal smoking in linear regression models, and vice versa (Supplementary Table S2). In addition, associations between the joint exposures of prenatal and postnatal tobacco smoking and DNA methylation were measured by fitting linear regression models for composite variables that were newly derived from paternal smoking preconception, maternal prenatal smoking, and child postnatal passive smoking (Supplementary Table S3). Linear regression models were adjusted for cell-type heterogeneity using principal components (PC) derived from ReFACTor (33), and genetic ancestry using PCs derived from EPISTRUCTURE (34).
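For readers unfamiliar with the rank correlation used above, the following is a minimal sketch of Spearman's ρ computed as the Pearson correlation of ranks, with tied values assigned their average rank. The handling of ties and the example data are assumptions made for illustration and are not taken from the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical data: cigarettes per day versus AHRR cg05575921 beta value.
cigarettes_per_day = [0, 0, 5, 10, 20]
ahrr_beta = [0.85, 0.83, 0.80, 0.78, 0.70]
rho = spearman_rho(cigarettes_per_day, ahrr_beta)  # negative: more smoking, lower methylation
```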

Linear regression models were used to assess whether the DNA methylation-based biomarkers were significantly associated with the child's birth year; models for DNA methylation at the AHRR CpG cg05575921 were additionally adjusted for the AHRR-mQTL SNP (25).

Poisson regression models were used to test the association between DNA methylation at the AHRR CpG cg05575921 and deletion numbers in 284 cases in the EPIC dataset, and between the polyepigenetic smoking scores and deletion numbers in both the 450K and EPIC datasets. Models were adjusted for ReFACTor and EPISTRUCTURE PCs and additionally adjusted for the AHRR-mQTL SNP to control for potential confounding (25). Ratios of means (RM) were calculated for every 0.1 β-value decrease (20) in AHRR cg05575921 methylation, and for every 4-unit increase in polyepigenetic scores. We also assessed the association between deletion numbers and the polyepigenetic scores computed without the AHRR CpG cg05575921. A sensitivity analysis was conducted in which the Poisson regression models were adjusted for self-reported race/ethnicity (i.e., Latino, non-Latino White, and non-Latino other), instead of EPISTRUCTURE PCs, in the subset of cases with available data (Table 1).
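Because a Poisson model with a log link is linear in the log of the expected deletion count, a coefficient estimated per one unit of exposure can be rescaled to an RM for any increment Δ as exp(coefficient × Δ). The sketch below shows this rescaling with a Wald-type 95% CI; the coefficient and standard error are hypothetical values chosen for illustration, not estimates from the study.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def ratio_of_means(coef: float, se: float, delta: float) -> Tuple[float, float, float]:
    """Convert a log-link coefficient (per 1-unit change) to an RM per `delta` units.

    Returns (RM, lower 95% limit, upper 95% limit) using a Wald interval.
    """
    log_rm = coef * delta
    half_width = Z_95 * abs(delta) * se
    return (
        math.exp(log_rm),
        math.exp(log_rm - half_width),
        math.exp(log_rm + half_width),
    )


# Hypothetical coefficient per 1.0 beta-value unit of cg05575921 methylation.
# delta = -0.1 expresses the RM per 0.1 beta-value decrease.
rm, lower, upper = ratio_of_means(coef=-2.7, se=1.3, delta=-0.1)

# For the polyepigenetic score, delta = 4 would express the RM per 4-unit increase.
```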

Fixed-effect meta-analysis models were used to test for heterogeneity between 450K and EPIC datasets, and to generate summary effect estimates accounting for the variance of each dataset, using R packages tidymeta and metafor (35, 36). Study heterogeneity was characterized with I² statistics and their corresponding P values (37).
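As a worked illustration of inverse-variance fixed-effect pooling, the sketch below combines the dataset-specific RMs for the polyepigenetic score reported in the Results (450K: 1.36; 95% CI, 1.05–1.76; EPIC: 1.26; 95% CI, 0.97–1.64). It is not the authors' metafor code: standard errors are back-calculated from the rounded confidence limits, which is an approximation, and the function names are ours.

```python
import math
from typing import List, Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def log_estimate(rm: float, lower: float, upper: float) -> Tuple[float, float]:
    """Recover log(RM) and its standard error from a reported 95% CI."""
    return math.log(rm), (math.log(upper) - math.log(lower)) / (2 * Z_95)


def fixed_effect(estimates: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Pool (log RM, SE) pairs; return pooled log RM, its SE, Cochran's Q, and I^2."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se_pooled, q, i2


studies = [log_estimate(1.36, 1.05, 1.76), log_estimate(1.26, 0.97, 1.64)]
pooled, se_pooled, q, i2 = fixed_effect(studies)
srm = math.exp(pooled)
ci = (math.exp(pooled - Z_95 * se_pooled), math.exp(pooled + Z_95 * se_pooled))
print(f"sRM = {srm:.2f} (95% CI, {ci[0]:.2f}-{ci[1]:.2f}), I2 = {i2:.0%}")
```

With these rounded inputs the pooled estimate agrees with the reported summary RM of 1.31 (95% CI, 1.09–1.57).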

Finally, we repeated the epigenetic biomarker and gene deletions analyses stratified by: (i) self-reported race/ethnicity, in the subset of B-ALL cases with available data (Table 1) and limited to Latinos and non-Latino Whites due to sample size; and (ii) age of diagnosis, limited to ≥2 years of age, 0 to 5 years of age, and >5 years of age [as the number of ALL cases diagnosed <1 year of age in our study (n = 8) was small].

Demographic characteristics of the 482 B-ALL cases are summarized in Table 1, and the study design is illustrated in Fig. 1. The distributions of deletions among the 198 B-ALL cases in the 450K dataset and the additional 284 B-ALL cases in the EPIC dataset were similar (Supplementary Fig. S1; Supplementary Table S4). In the 450K dataset, 125 of 198 cases (63.1%) harbored at least one gene deletion, compared with 162 of 284 cases (57.0%) in the EPIC dataset (Supplementary Fig. S1).

Association between self-reported smoking variables and DNA methylation-based biomarkers of maternal smoking in B-ALL cases

The median AHRR cg05575921 β-value among B-ALL cases was 0.82 [interquartile range (IQR): 0.79–0.85] in the 450K dataset and 0.81 (IQR: 0.78–0.84) in the EPIC dataset. The median polyepigenetic smoking score was −0.52 among 194 cases (4/198 cases excluded due to missing data) in the 450K dataset (IQR: −1.83–0.95) and 0.83 (IQR: −0.34–2.11) among 284 cases in the EPIC dataset (Supplementary Fig. S2). Mean methylation β values of CpGs that were used to generate the polyepigenetic scores are summarized in Supplementary Table S1. β values of most of the CpGs were significantly different between B-ALL cases in the 450K dataset and the EPIC dataset (Supplementary Fig. S3), although a significant difference was not found for DNA methylation at the AHRR CpG cg05575921.

Self-reported tobacco smoking exposure data were available for all 198 B-ALL cases in the 450K dataset and 189 out of 284 cases in the EPIC dataset (Supplementary Table S5). The distributions of smoking variables were similar between cases in the 450K and EPIC datasets, although in general more cases in the 450K dataset were reported to be exposed to tobacco smoke. We did not find any evidence that DNA methylation at the AHRR CpG cg05575921 or the polyepigenetic smoking scores were associated with child's birth year (Supplementary Fig. S4).

Maternal smoking variables were strongly correlated with each other (ρ range: 0.36–1.00) and had relatively lower correlations with paternal smoking variables (ρ range: 0.03–0.48; Supplementary Fig. S5). The two DNA methylation-based biomarkers were significantly correlated (450K: ρ = 0.54; EPIC: ρ = 0.60). Decreased DNA methylation at the AHRR cg05575921 was correlated with maternal prenatal smoking exposures in both the 450K and EPIC datasets; it was additionally correlated with maternal smoking during breastfeeding and child passive smoking (via parental smoking) in the EPIC dataset. Increased polyepigenetic scores were significantly correlated with the majority of the self-reported parental smoking exposures in both 450K and EPIC data.

In both the 450K and EPIC datasets, polyepigenetic smoking scores were associated with nearly all of the self-reported smoking exposures in multivariable linear regression models (Fig. 2). Decreased AHRR cg05575921 β-value was mainly associated with maternal smoking exposures. Furthermore, joint exposures of maternal or paternal smoking and child postnatal passive smoking were significantly associated with the two epigenetic biomarkers.

Independent maternal and paternal smoking effects on DNA methylation were obtained from multivariable linear regression models (Fig. 2). Maternal smoking exposures remained associated with polyepigenetic smoking scores and AHRR cg05575921 methylation while adjusting for paternal smoking preconception. In addition, paternal smoking during preconception remained associated with the polyepigenetic smoking score when controlling for maternal prenatal smoking. Notably, we found a −0.091 difference in the mean cg05575921 β value and a ∼4-unit difference for the polyepigenetic smoking score in the 450K dataset for mothers who smoked throughout pregnancy compared with mothers who were never exposed to tobacco smoke. The −0.091 difference is comparable to the previously reported −0.1 difference in AHRR cg05575921 β-value of neonates of mothers with high cotinine levels versus mothers with undetectable cotinine levels (38). Therefore, we considered the corresponding 4-unit coefficient estimate in the same model for the polyepigenetic score to be biologically relevant, and subsequently computed RMs of deletion numbers for every 4-unit increase of the polyepigenetic score in both 450K and EPIC datasets.

DNA methylation-based biomarkers of tobacco smoke exposure are associated with gene deletion burden in childhood ALL

In the new EPIC dataset of 284 B-ALL cases, we found a 1.31-fold increase in the mean number of deletions with every 0.1 β-value decrease in cg05575921 [95% confidence interval (CI), 1.02–1.69; Fig. 3]. After stratifying by sex, the association was stronger in males (RM, 1.41; 95% CI, 0.99–2.02) than in females (RM, 1.30; 95% CI, 0.87–1.95; Supplementary Table S6); however, the difference was not significant in a test for heterogeneity (Phet = 0.599). In a meta-analysis of the 450K and EPIC datasets, the summary RM (sRM) was 1.32 (95% CI, 1.10–1.57; Fig. 3).

We further extended our original analysis by constructing a DNA methylation-based smoking score, including the AHRR CpG and over 20 additional CpGs. In the meta-analysis of the 450K and EPIC datasets, the polyepigenetic score was also significantly associated with an increased number of deletions with a 1.31-fold increase in mean number of deletions for every 4-unit increase in the score (95% CI, 1.09–1.57; Fig. 3). Similar effect sizes were seen for the association between the polyepigenetic score and the number of deletions in the 450K dataset (RM = 1.36; 95% CI, 1.05–1.76) and the EPIC dataset (RM = 1.26; 95% CI, 0.97–1.64), although the latter did not reach statistical significance (Fig. 3).

We next explored whether removal of the AHRR CpG cg05575921 would impact the association between the polyepigenetic score and ALL patient gene deletion burden. A significant association between the modified polyepigenetic score and the number of deletions was still observed in the 450K dataset (RM = 1.44; 95% CI, 1.06–1.95), with a slightly attenuated effect in the EPIC dataset (RM = 1.24; 95% CI, 0.91–1.69; Fig. 3). In the meta-analysis, the polyepigenetic smoking score excluding the AHRR CpG remained significantly positively associated with number of gene deletions (sRM = 1.34; 95% CI, 1.08–1.66; Fig. 3).

No significant interaction effects were detected between the DNA methylation-based biomarkers and B-ALL cytogenetic subtypes [high-hyperdiploidy (HD-ALL) and ETV6–RUNX1 fusion] on deletion numbers (Supplementary Table S7).

Effect sizes from Poisson regression models adjusting for self-reported race/ethnicity were very similar to those adjusting for EPISTRUCTURE PCs (Supplementary Fig. S6). In analyses stratified by self-reported race/ethnicity, stronger associations between the DNA methylation-based biomarkers and gene deletions presented in non-Latino White compared with Latino B-ALL cases (Supplementary Fig. S7), although the differences were not significant in tests for heterogeneity (Phet > 0.10). Finally, we assessed potential effects of patient age-at-diagnosis on our results, and observed similar associations between the epigenetic biomarkers and gene deletion burden after excluding cases diagnosed <2 years of age to those found in the overall B-ALL cases (Supplementary Fig. S8); the association between the polyepigenetic smoking score and gene deletions was slightly stronger among B-ALL cases diagnosed >5 years of age than those diagnosed ≤5 years of age (Phet > 0.10).

Somatic copy-number loss of lymphoid transcription factor and cell-cycle control genes is an important driver of leukemogenesis in childhood ALL. Aberrant recombination-activating gene (RAG) activity, which normally drives antibody diversification as part of the adaptive immune system, is thought to underlie the formation of gene deletions in some patients with ALL (39, 40). However, few epidemiology studies have explored whether extrinsic factors influence the generation of somatic copy-number alterations in developing lymphocytes. Here, we provide further evidence that prenatal exposure to tobacco smoke may induce leukemia-causing gene deletions in patients with ALL (20, 41).

We recently reported that decreased DNA methylation at the AHRR CpG cg05575921, a biomarker for maternal smoking during pregnancy (38, 42), was associated with an increased frequency of somatic gene deletions among childhood B-ALL cases (20). We have replicated this association in a larger, independent set of childhood B-ALL cases, assayed on Illumina EPIC DNA methylation arrays, and found a remarkably similar effect size with a ratio of means of 1.31 in this study compared with 1.32 in our previous report. Further, we found a similar positive association between gene deletion frequency and increased in utero tobacco smoke exposure in ALL cases as measured by a recently established polyepigenetic smoking score (23). This association remained after removal of the AHRR CpG from the smoking score and, thus, we were able to confirm our findings using an independent epigenetic biomarker.

Previous case–control studies based on questionnaire data reported a significant association between paternal preconception smoking and childhood ALL risk, but no association between maternal smoking and childhood ALL risk (10, 43, 44). The discrepancy between our findings and the epidemiologic literature on maternal smoking and childhood ALL risk could have several explanations. First, case–control studies limited to self-reported data may be affected by recall bias, and smoking exposures may be underreported (45) due to a perceived social stigma (46), in particular for maternal smoking. In addition, the two epigenetic biomarkers examined in this study primarily reflect sustained maternal smoking throughout pregnancy (10, 23, 47), which is difficult to assess using single survey questions.

Second, we cannot rule out that these epigenetic biomarkers may also be proxies for nonmaternal and/or postnatal tobacco smoke exposures that may impact the generation of gene deletions in childhood ALL. In our study, AHRR cg05575921 methylation was strongly associated with self-reported maternal prenatal smoking, consistent with previous findings that decreased methylation at cg05575921 was definitively associated with in utero exposure to maternal smoking (21, 25), and not overtly connected to paternal smoking or secondhand smoke exposure (22). However, in contrast, the polyepigenetic smoking scores were associated with both self-reported maternal and paternal smoking exposures, suggesting that some CpGs included in this score may be associated with multiple sources of tobacco exposure. Moreover, both biomarkers were associated with the cumulative self-reported smoking exposures and the joint exposures of parental prenatal smoking with child postnatal passive smoking. These composite variables are indicative of smoking exposures in the household or residual prenatal smoking exposures that were not captured by single survey questions.

Third, the potential leukemogenic effects of tobacco smoke exposure on gene deletions in patients with ALL may not translate to overall ALL risk in case–control studies, perhaps due to varying effects in different molecular subtypes. This is supported by the finding that the combination of paternal prenatal smoking with child postnatal passive smoking was significantly associated with ETV6-RUNX1 fusion ALL, but not with HD-ALL (10). HD-ALL is associated with a lower frequency of somatic gene deletions relative to other ALL subtypes (20) and, in our previous study, self-reported tobacco exposures were no longer associated with gene deletion frequency in ALL cases when restricted to HD-ALL (20), although we did not formally test for interaction. In this study, we did not find significant interaction between ALL subtype and the DNA methylation-related biomarkers, but this may be due to a lack of power and warrants further investigation.

To our knowledge, this is the first study to (i) compute a polyepigenetic smoking score using neonatal DNA methylation data from EPIC arrays, and (ii) test whether self-reported smoking exposures were associated with polyepigenetic scores derived from both 450K and EPIC data. The smoking score was developed by Reese and colleagues using 450K array data from newborn cord blood samples (23) and, in our study, we found largely consistent mean methylation β values for the CpGs used to generate the score (Supplementary Table S1). Excluding the two CpGs present on 450K but not EPIC arrays caused little loss of performance in the 450K data; however, the predictive performance of the score using EPIC array data requires further investigation. We found that the majority of CpGs in the consensus smoking score showed significantly different average β values between 450K and EPIC array data (21/26 CpGs in overall newborns; 19/26 CpGs in newborns not exposed to tobacco smoke during pregnancy; Supplementary Fig. S3). Further, the consensus score was significantly lower in the 450K dataset [0.26 (IQR = −1.03–1.80)] than the EPIC dataset [0.83 (IQR = −0.34–2.11); Wilcoxon test P = 0.003], despite more newborns in the 450K dataset being exposed to parental tobacco smoke according to interview data (Supplementary Table S5). The interarray differences in the smoking score CpGs did not correspond consistently with their association with maternal smoking during pregnancy (38), nor were they likely explained by changes in individuals' smoking behaviors over time (i.e., cohort effect; Supplementary Fig. S4). They may instead be due to probe cross-reactivity or a shifted distribution of methylation values caused by increased Type II probe measurements on EPIC arrays (48, 49).

The polyepigenetic smoking score was developed in a homogeneous population from Norway (23), a country with different smoking habits (e.g., more prevalent use of hand-rolled cigarettes with higher nicotine and tar content) than the United States (50). This may hamper the performance and generalizability of the score in our study, in which over 50% of cases were of non-white race/ethnicity, with a particularly large number of Latinos. In analyses stratified by self-reported race/ethnicity, both epigenetic biomarkers of tobacco smoke exposure showed a stronger association with the frequency of ALL gene deletions in non-Latino Whites than in Latinos. This might be attributable to a potentially superior performance of these biomarkers in predicting prenatal tobacco smoke exposure in non-Latino Whites compared with Latinos, but this warrants further evaluation as the number of non-Latino White cases in our study was relatively small. Nonetheless, the transferability of epigenetic biomarkers developed in largely European ancestry individuals across ancestrally diverse populations should be determined.

Our study has several limitations that warrant consideration. Importantly, the DNA methylation-based biomarkers were derived from newborn DBS; thus, we were not able to assess the potential effects of postnatal tobacco smoke exposure. Preleukemic clones may be present at birth, but at very low clonal frequencies in whole blood (51, 52) and, thus, are unlikely to have influenced our DNA methylation results. This was supported by the minimal effects on our results after excluding B-ALL cases diagnosed <2 years of age. An additional limitation was our limited ability to study the effects of tobacco smoke exposure across different cytogenetic subtypes of ALL, due to sample size and a lack of information on subtypes beyond HD-ALL and ETV6-RUNX1 fusion. Further, our analyses were limited to the eight commonly deleted genes targeted by the MLPA assays. These assays do not provide information on deletion breakpoint locations; hence, we could not explore the molecular mechanisms underlying the formation of deletions in our ALL cases. Given that aberrant RAG-mediated V(D)J recombination underlies a large proportion of somatic gene deletions in ALL (39, 40, 53), it is compelling that cord blood lymphocytes in newborns of mothers exposed to tobacco smoke have been found to harbor a significantly higher frequency of off-target RAG recombination-mediated deletions than those in newborns of mothers who were not exposed to tobacco (41, 54); however, this remains to be examined in the setting of childhood ALL.

In summary, we provide further evidence that prenatal tobacco smoke exposure may influence the generation of somatic copy-number deletions in childhood B-ALL cases. Future epidemiologic studies that combine information on early-life exposure to tobacco smoke with whole-genome sequencing of ALL tumors, enabling analysis of mutational signatures and deletion breakpoint sequences, are required to investigate the potential mutagenic effects of tobacco smoke in childhood ALL.

C. Metayer reports grants from University of California Berkeley during the conduct of the study. No disclosures were reported by the other authors.

K. Xu: Formal analysis, visualization, methodology, writing–original draft, writing–review and editing. S. Li: Data curation, formal analysis, validation, writing–review and editing. T.P. Whitehead: Data curation, writing–original draft, writing–review and editing. P. Pandey: Formal analysis, writing–review and editing. A.Y. Kang: Data curation, project administration, writing–review and editing. L.M. Morimoto: Data curation, writing–review and editing. S.C. Kogan: Writing–review and editing. C. Metayer: Resources, funding acquisition, investigation, writing–original draft, project administration, writing–review and editing. J.L. Wiemels: Conceptualization, supervision, funding acquisition, writing–original draft, writing–review and editing. A.J. de Smith: Conceptualization, data curation, formal analysis, supervision, funding acquisition, investigation, writing–original draft, writing–review and editing.

The biospecimens and/or data used in this study were obtained from the California Biobank Program, (SIS request #26), Section 6555(b), 17 CCR. The California Department of Public Health is not responsible for the results or conclusions drawn by the authors of this publication. For recruitment of subjects enrolled in the California Childhood Leukemia Study (CCLS), the authors gratefully acknowledge the families for their participation. We also thank the clinical investigators at the following collaborating hospitals for help in recruiting patients: University of California Davis Medical Center (Dr. Jonathan Ducore), University of California San Francisco (Drs. Mignon Loh and Katherine Matthay), Children's Hospital of Central California (Dr. Vonda Crouse), Lucile Packard Children's Hospital (Dr. Gary Dahl), Children's Hospital Oakland (Drs. James Feusner and Carla Golden), Kaiser Permanente Roseville (formerly Sacramento; Drs. Kent Jolly and Vincent Kiley), Kaiser Permanente Santa Clara (Drs. Carolyn Russo, Alan Wong, and Denah Taggart), Kaiser Permanente San Francisco (Dr. Kenneth Leung), Kaiser Permanente Oakland (Drs. Daniel Kronish and Stacy Month), California Pacific Medical Center (Dr. Louise Lo), Cedars-Sinai Medical Center (Dr. Fataneh Majlessipour), Children's Hospital Los Angeles (Dr. Cecilia Fu), Children's Hospital Orange County (Dr. Leonard Sender), Kaiser Permanente Los Angeles (Dr. Robert Cooper), Miller Children's Hospital Long Beach (Dr. Amanda Termuhlen), University of California, San Diego Rady Children's Hospital (Dr. William Roberts), and University of California, Los Angeles Mattel Children's Hospital (Dr. Theodore Moore). We thank Robin Cooley and Steve Graham at the California Biobank Program, California Department of Public Health, for assistance in retrieval of neonatal dried bloodspot specimens for CCLS subjects.

This research was supported by the California Tobacco-Related Disease Research Program, Grant No. 26IR-0005A (to A.J. de Smith, C. Metayer, and A.Y. Kang); by the National Institute of Environmental Health Sciences grants R01ES009137 and P42ES004705 (to C. Metayer, A.Y. Kang, J.L. Wiemels, and A.J. de Smith), R24ES028524 (to C. Metayer, A.Y. Kang, T.P. Whitehead, and L.M. Morimoto), P01ES018172 (to C. Metayer, T.P. Whitehead, A.Y. Kang, L.M. Morimoto, J.L. Wiemels, and A.J. de Smith), and P50ES018172 (to C. Metayer, T.P. Whitehead, A.Y. Kang, L.M. Morimoto, J.L. Wiemels, S.C. Kogan, and A.J. de Smith); by the United States Environmental Protection Agency under assistance agreements RD83451101 (to C. Metayer, T.P. Whitehead, A.Y. Kang, L.M. Morimoto, J.L. Wiemels, and A.J. de Smith) and RD83615901 (to C. Metayer, T.P. Whitehead, A.Y. Kang, L.M. Morimoto, J.L. Wiemels, S.C. Kogan, and A.J. de Smith); and in part by Children with Cancer UK (to C. Metayer). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or TRDRP. The contents of this document do not necessarily reflect the views and policies of the Environmental Protection Agency, nor does the EPA endorse trade names or recommend the use of commercial products mentioned in this document.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Ward E, DeSantis C, Robbins A, Kohler B, Jemal A. Childhood and adolescent cancer statistics, 2014. CA Cancer J Clin 2014;64:83–103.
2. Cancer Facts & Figures. American Cancer Society; 2020.
3. Essig S, Li Q, Chen Y, Hitzler J, Leisenring W, Greenberg M, et al. Estimating the risk for late effects of therapy in children newly diagnosed with standard risk acute lymphoblastic leukemia using an historical cohort: a report from the Childhood Cancer Survivor Study. Lancet Oncol 2014;15:841–51.
4. Winther JF, Schmiegelow K. How safe is a standard-risk child with ALL? Lancet Oncol 2014;15:782–3.
5. Mody R, Li S, Dover DC, Sallan S, Leisenring W, Oeffinger KC, et al. Twenty-five–year follow-up among survivors of childhood acute lymphoblastic leukemia: a report from the Childhood Cancer Survivor Study. Blood 2008;111:5515–23.
6. Whitehead TP, Metayer C, Wiemels JL, Singer AW, Miller MD. Childhood leukemia and primary prevention. Curr Probl Pediatr Adolesc Health Care 2016;46:317–52.
7. Preston DL, Kusumi S, Tomonaga M, Izumi S, Ron E, Kuramoto A, et al. Cancer incidence in atomic bomb survivors. Part III. Leukemia, lymphoma and multiple myeloma, 1950–1987. Radiat Res 1994;137:S68–97.
8. Doll R, Wakeford R. Risk of childhood cancer from fetal irradiation. Br J Radiol 1997;70:130–9.
9. Pui C-H, Nichols KE, Yang JJ. Somatic and germline genomics in paediatric acute lymphoblastic leukaemia. Nat Rev Clin Oncol 2019;16:227–40.
10. Metayer C, Zhang L, Wiemels JL, Bartley K, Schiffman J, Ma X, et al. Tobacco smoke exposure and the risk of childhood acute lymphoblastic and myeloid leukemias by cytogenetic subtype. Cancer Epidemiol Biomarkers Prev 2013;22:1600–11.
11. Greaves M. Infection, immune responses and the aetiology of childhood leukaemia. Nat Rev Cancer 2006;6:193–203.
12. Greaves M. Childhood leukaemia. BMJ 2002;324:283–7.
13. Greaves M. A causal mechanism for childhood acute lymphoblastic leukaemia. Nat Rev Cancer 2018;18:471–84.
14. Wiemels J, Cazzaniga G, Daniotti M, Eden O, Addison G, Masera G, et al. Prenatal origin of acute lymphoblastic leukaemia in children. Lancet 1999;354:1499–503.
15. Greaves MF, Maia AT, Wiemels JL, Ford AM. Leukemia in twins: lessons in natural history. Blood 2003;102:2321–33.
16. Bateman CM, Colman SM, Chaplin T, Young BD, Eden TO, Bhakta M, et al. Acquisition of genome-wide copy number alterations in monozygotic twins with acute lymphoblastic leukemia. Blood 2010;115:3553–8.
17. Mullighan CG, Phillips LA, Su X, Ma J, Miller CB, Shurtleff SA, et al. Genomic analysis of the clonal origins of relapsed acute lymphoblastic leukemia. Science 2008;322:1377–80.
18. Schwab CJ, Chilton L, Morrison H, Jones L, Al-Shehhi H, Erhorn A, et al. Genes commonly deleted in childhood B-cell precursor acute lymphoblastic leukemia: association with cytogenetics and clinical features. Haematologica 2013;98:1081–8.
19. Mullighan CG, Goorha S, Radtke I, Miller CB, Coustan-Smith E, Dalton JD, et al. Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 2007;446:758–64.
20. de Smith AJ, Kaur M, Gonseth S, Endicott A, Selvin S, Zhang L, et al. Correlates of prenatal and early-life tobacco smoke exposure and frequency of common gene deletions in childhood acute lymphoblastic leukemia. Cancer Res 2017;77:1674–83.
21. Joubert BR, Felix JF, Yousefi P, Bakulski KM, Just AC, Breton C, et al. DNA methylation in newborns and maternal smoking in pregnancy: genome-wide consortium meta-analysis. Am J Hum Genet 2016;98:680–96.
22. Joubert BR, Håberg SE, Bell DA, Nilsen RM, Vollset SE, Midttun Ø, et al. Maternal smoking and DNA methylation in newborns: in utero effect or epigenetic inheritance? Cancer Epidemiol Biomarkers Prev 2014;23:1007–17.
23. Reese SE, Zhao S, Wu MC, Joubert BR, Parr CL, Håberg SE, et al. DNA methylation score as a biomarker in newborns for sustained maternal smoking during pregnancy. Environ Health Perspect 2017;125:760–6.
24. Walsh KM, de Smith AJ, Welch TC, Smirnov I, Cunningham MJ, Ma X, et al. Genomic ancestry and somatic alterations correlate with age at diagnosis in Hispanic children with B-cell ALL. Am J Hematol 2014;89:721–5.
25. Gonseth S, de Smith AJ, Roy R, Zhou M, Lee S-T, Shao X, et al. Genetic contribution to variation in DNA methylation at maternal smoking-sensitive loci in exposed neonates. Epigenetics 2016;11:664–73.
26. Fortin J-P, Labbe A, Lemire M, Zanke BW, Hudson TJ, Fertig EJ, et al. Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biol 2014;15:503.
27. Triche TJ, Weisenberger DJ, Van Den Berg D, Laird PW, Siegmund KD. Low-level processing of Illumina Infinium DNA Methylation BeadArrays. Nucleic Acids Res 2013;41:e90.
28. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al. Minfi: a flexible and comprehensive bioconductor package for the analysis of infinium DNA methylation microarrays. Bioinformatics 2014;30:1363–9.
29. Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 2015;12:115–21.
30. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004;5:R80.
31. Machiela MJ, Chanock SJ. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 2015;31:3555–7.
32. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2020. https://www.R-project.org/
33. Rahmani E, Zaitlen N, Baran Y, Eng C, Hu D, Galanter J, et al. Sparse PCA corrects for cell type heterogeneity in epigenome-wide association studies. Nat Methods 2016;13:443–5.
34. Rahmani E, Shenhav L, Schweiger R, Yousefi P, Huen K, Eskenazi B, et al. Genome-wide methylation data mirror ancestry information. Epigenetics Chromatin 2017;10:1.
35. Barrett M. tidymeta: Tidy and Plot Meta Analyses. R package version 0.1.0.9000; 2020.
36. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw 2010;36:1–48.
37. Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539–58.
38. Joubert BR, Håberg SE, Nilsen RM, Wang X, Vollset SE, Murphy SK, et al. 450K epigenome-wide scan identifies differential DNA methylation in newborns related to maternal smoking during pregnancy. Environ Health Perspect 2012;120:1425–31.
39. Papaemmanuil E, Rapado I, Li Y, Potter NE, Wedge DC, Tubio J, et al. RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia. Nat Genet 2014;46:116–25.
40. Mendes RD, Sarmento LM, Canté-Barrett K, Zuurbier L, Buijs-Gladdines JGCAM, Póvoa V, et al. PTEN microdeletions in T-cell acute lymphoblastic leukemia are caused by illegitimate RAG-mediated recombination events. Blood 2014;124:567–78.
41. Finette BA, O'Neill JP, Vacek PM, Albertini RJ. Gene mutations with characteristic deletions in cord blood T lymphocytes associated with passive maternal exposure to tobacco smoke. Nat Med 1998;4:1144–51.
42. Markunas CA, Xu Z, Harlid S, Wade PA, Lie RT, Taylor JA, et al. Identification of DNA methylation changes in newborns related to maternal smoking during pregnancy. Environ Health Perspect 2014;122:1147–53.
43. Orsi L, Rudant J, Ajrouche R, Leverger G, Baruchel A, Nelken B, et al. Parental smoking, maternal alcohol, coffee and tea consumption during pregnancy, and childhood acute leukemia: the ESTELLE study. Cancer Causes Control 2015;26:1003–17.
44. Milne E, Greenop KR, Scott RJ, Bailey HD, Attia J, Dalla-Pozza L, et al. Parental prenatal smoking and risk of childhood acute lymphoblastic leukemia. Am J Epidemiol 2012;175:43–53.
45. Rhomberg LR, Chandalia JK, Long CM, Goodman JE. Measurement error in environmental epidemiology and the shape of exposure-response curves. Crit Rev Toxicol 2011;41:651–71.
46. Rebagliato M. Validation of self reported smoking. J Epidemiol Community Health 2002;56:163–4.
47. Klimentopoulou A, Antonopoulos CN, Papadopoulou C, Kanavidis P, Tourvas A-D, Polychronopoulou S, et al. Maternal smoking during pregnancy and risk for childhood leukemia: a nationwide case-control study in Greece and meta-analysis. Pediatr Blood Cancer 2012;58:344–51.
48. Pidsley R, Zotenko E, Peters TJ, Lawrence MG, Risbridger GP, Molloy P, et al. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol 2016;17:208.
49. Logue MW, Smith AK, Wolf EJ, Maniates H, Stone A, Schichman SA, et al. The correlation of methylation levels measured using Illumina 450K and EPIC BeadChips in blood samples. Epigenomics 2017;9:1363–71.
50. Rolke HB, Bakke PS, Gallefoss F. Relationships between hand-rolled cigarettes and primary lung cancer: a Norwegian experience. Clin Respir J 2009;3:152–60.
51. Schäfer D, Olsen M, Lähnemann D, Stanulla M, Slany R, Schmiegelow K, et al. Five percent of healthy newborns have an ETV6-RUNX1 fusion as revealed by DNA-based GIPFEL screening. Blood 2018;131:821–6.
52. Mori H, Colman SM, Xiao Z, Ford AM, Healy LE, Donaldson C, et al. Chromosome translocations and covert leukemic clones are generated during normal fetal development. Proc Natl Acad Sci U S A 2002;99:8242–7.
53. Mullighan CG, Miller CB, Radtke I, Phillips LA, Dalton J, Ma J, et al. BCR–ABL1 lymphoblastic leukaemia is characterized by the deletion of Ikaros. Nature 2008;453:110–4.
54. Grant SG. Qualitatively and quantitatively similar effects of active and passive maternal tobacco smoke exposure on in utero mutagenesis at the HPRT locus. BMC Pediatr 2005;5:20.

Supplementary data