Abstract
Diet has been recognized as a modifiable risk factor for breast cancer. Highlighting predictive diet-related biomarkers would be of great public health relevance to identify at-risk subjects. The aim of this exploratory study was to select diet-related metabolites discriminating women at higher risk of breast cancer using untargeted metabolomics.
Baseline plasma samples of 200 incident breast cancer cases and matched controls, from a nested case–control study within the Supplémentation en Vitamines et Minéraux Antioxydants (SU.VI.MAX) cohort, were analyzed by untargeted LC-MS. Diet-related metabolites were identified by partial correlation with dietary exposures, and best predictors of breast cancer risk were then selected by Elastic Net penalized regression. The selection stability was assessed using bootstrap resampling.
A total of 595 ions were selected as candidate diet-related metabolites. Fourteen of them were selected by Elastic Net regression as breast cancer risk discriminant ions. A lower level of piperine (a compound from pepper) and higher levels of acetyltributylcitrate (an alternative plasticizer to phthalates), pregnene-triol sulfate (a steroid sulfate), and 2-amino-4-cyano butanoic acid (a metabolite linked to microbiota metabolism) were observed in plasma from women who subsequently developed breast cancer. This metabolomic signature was related to several dietary exposures such as a “Western” dietary pattern and higher alcohol and coffee intakes.
Our study suggested a diet-related plasma metabolic signature involving exogenous, steroid metabolites, and microbiota-related compounds associated with long-term breast cancer risk that should be confirmed in large-scale independent studies.
These results could help to identify healthy women at higher risk of breast cancer and improve the understanding of nutrition and health relationship.
Introduction
Cancer is a multifactorial disease (1). Although a large proportion of cancers is explained by intrinsic factors, between 30% and 50% of all cancer cases are estimated to be preventable (2). In France in 2015, over 30% of diagnosed incident breast cancers, the most common cancer in women worldwide in terms of incidence (3), were attributable to nutrition-related factors, including alcohol, physical inactivity contributing to excess adiposity, and weight status (4). Beyond these three factors, a protective role of other dietary factors in breast carcinogenesis has been suggested for nonstarchy vegetables [estrogen receptor–negative (ER−) breast cancer; ref. 3], dairy products (among premenopausal women; ref. 3), foods containing carotenoids (3), foods high in calcium (3), or dietary fiber (5). In contrast, an increased risk has been suggested for some types of lipids, such as plasma levels of trans-fatty acids produced by industrial processing (ER− breast cancer; ref. 6) and saturated fatty acid intake (7). Diet impacts both the endogenous metabolome and the food metabolome (i.e., the metabolites derived from ingested foods and their subsequent metabolism in the human body; ref. 8). The identification of a set of metabolites that are both (i) influenced by diet, and therefore modifiable through dietary interventions, and (ii) related to increased or decreased breast cancer risk would open new perspectives in terms of prevention and potentially provide insights into the underlying mechanisms.
High-throughput technologies allow the exploration of thousands of molecules resulting from complex system-wide biological interactions. An in-depth untargeted investigation that is not hypothesis-driven holds promise for the discovery of new biomarkers. In comparison with approaches focused on a restricted number of a priori–selected biomarkers, untargeted metabolomics can highlight combinations of metabolites (potentially acting with synergistic and antagonistic effects) associated with disease risk, which could improve the sensitivity and specificity of predictive models (9). Applied to the nutrition field, this approach may highlight breast cancer–related metabolomic signatures combining exogenous metabolites resulting directly from dietary exposure and endogenous or microbial metabolites, and therefore reflecting the impact of diet on the metabolism of the host and its microbiota. So far, to our knowledge, only one previous study has used nutritional semiuntargeted metabolomics to examine diet-related serum metabolite associations with long-term breast cancer risk (10). That study highlighted 22 diet-related metabolites, mainly related to alcohol, vitamin E, and animal fat intakes, associated with breast cancer risk. However, it was conducted in postmenopausal women only, so its results cannot be generalized to premenopausal women. Moreover, the unknown compounds were not considered in statistical analyses, limiting the potential discovery of new biomarkers.
The aim of our study was to select a small subset of diet-related metabolites that best predicted long-term breast cancer risk, using an up-to-date statistical method particularly suited to large-scale omics data (Elastic Net regression) applied to untargeted mass spectrometry metabolomic data.
Materials and Methods
Study population
This study involved participants from the Supplémentation en Vitamines et Minéraux Antioxydants (SU.VI.MAX) prospective cohort (clinicaltrials.gov; NCT00272428), which initially aimed to investigate the effect of a daily antioxidant supplementation in nutritional doses on the incidence of cardiovascular diseases and cancers. This population-based, double-blinded, placebo-controlled, randomized trial was conducted over 8 years, and observational follow-up of health events was subsequently maintained for 5 years (13 years of follow-up in total). The study design and methods have been previously detailed (11, 12). A total of 13,017 participants were recruited in 1994–1995 and were invited to provide their written informed consent. The trial was approved by the Ethics Committee for Studies with Human Subjects of Paris-Cochin Hospital (CCPPRB 706/2364) and the “Commission Nationale de l'Informatique et des Libertés” (CNIL 334641/907094), and was conducted according to the Declaration of Helsinki guidelines. This work focused on a nested case–control study, including SU.VI.MAX participants with a first incident invasive breast cancer diagnosed after baseline (N = 215), matched with controls (1:1 ratio, using the density sampling method; ref. 13) for the following characteristics at baseline: age, menopausal status, body mass index (BMI), intervention group of the trial, smoking status, and season of blood draw. Details about the nested case–control study are presented in Supplementary Methods and the flowchart is presented in Fig. 1.
Baseline data collection and case ascertainment
Baseline data collection and case ascertainment are described in Supplementary Methods. Briefly, at enrollment, participants were invited to complete self-administered questionnaires about sociodemographic characteristics, smoking status, medication use, health status and family history of cancer, and underwent anthropometric measurements as well as a fasting blood draw. During the trial phase, participants were asked to complete computerized 24-hour dietary records every 2 months, including >990 food items (14). Dietary habits at baseline were estimated by averaging intakes from all dietary records collected during the first 2 years of participation in the SU.VI.MAX study. Self-reported health events were reviewed by a physician expert committee. Pathological reports were used to validate cases, and cancers were classified using the International Classification of Diseases, 10th Revision, Clinical Modification (15).
Metabolomic analyses
Untargeted (rather than targeted) metabolomics was used to discover new biomarkers and new potential metabolites associated with breast cancer risk. It was performed on plasma samples (N = 430) following a slightly modified version of the procedure described by Pereira and colleagues (16). Profiling was conducted using high-performance liquid chromatography/mass spectrometry with the Metabolic profiler platform (Bruker Daltonique). To monitor the stability of the analytical system, a blank sample was injected three times at the beginning of each sequence to equilibrate the column, followed by a QC sample (pooled participants' plasma samples). One QC sample was then injected after each set of 12 participants' plasma samples. All samples (blank, QC, and plasma samples) were analyzed using the same analytical method. Details on chemicals and reagents, biological sample preparation, metabolic profiling, raw data extraction, quality controls, and metabolite identification are available in Supplementary Methods and Supplementary Table S1. Data were processed on the Galaxy web-based platform Workflow4Metabolomics (https://workflow4metabolomics.org/), first using the XCMS module for peak detection, followed by quality checks and signal drift correction, to yield a data matrix containing variables (retention times, masses) and peak intensities corrected for batch effects. After a quick review of all chromatograms, some individuals were excluded because of problems during sample preparation, multiple ions with null intensity, or contamination visible on chromatograms (a few serum collection tubes were contaminated with a PEG-like material), leaving samples from 200 cases and 200 controls for both positive and negative mode analyses. Highly correlated ions (correlation threshold at 0.9) from the same metabolite within the same retention time cluster were removed using the metabolite correlation analysis (MCA) Galaxy module.
After these processing steps, 528 and 690 ions (detected in positive and negative ionization modes, respectively) from plasma metabolome remained in datasets for statistical analyses. This metabolomic discovery approach allows semiquantitative measurements, representing relative ion intensities (no absolute concentration). However, linearity between ion intensities and their concentration level was previously validated and matrix effect was previously studied (16). In particular, Pearson correlation was checked between ion intensities and sample dilutions.
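The injection order described above (three blank injections, one QC sample, then one QC sample after each set of 12 plasma samples) can be illustrated with a minimal sketch. The function name, the sample labels, and the handling of an incomplete final set (no trailing QC is added) are assumptions made for illustration; the actual sequences are detailed in Supplementary Methods.

```python
def injection_sequence(sample_ids, qc_every=12, n_blanks=3):
    """Return an illustrative injection order for one analytical sequence.

    Assumption: a QC injection follows each complete set of `qc_every`
    plasma samples; an incomplete final set gets no trailing QC.
    """
    # Blanks equilibrate the column, then a first pooled QC sample is run.
    sequence = ["blank"] * n_blanks + ["QC"]
    for position, sample in enumerate(sample_ids, start=1):
        sequence.append(sample)
        if position % qc_every == 0:
            sequence.append("QC")
    return sequence


# Hypothetical sample labels S001..S024 (two complete sets of 12).
seq = injection_sequence([f"S{i:03d}" for i in range(1, 25)])
print(seq[:6])
print(seq.count("QC"))
```

With 24 samples, the sketch yields 30 injections in total, 3 of which are QC samples.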
Statistical analysis
Participants' baseline characteristics were compared between cases and controls using conditional logistic regressions.
To cover different aspects of diet, dietary exposure was assessed using four complementary methods. First, two dietary scores were computed: the mPNNS-GS (modified “Program National Nutrition Santé-Guideline Score”; ref. 17), reflecting adherence to the 2001 French nutritional recommendations and including 12 components, eight referring to food-serving recommendations and four referring to moderation in consumption; and the DQI-I (Diet Quality Index-International; ref. 18), including four components (variety, adequacy, moderation, and overall balance) with adapted cutoff values corresponding to French recommendations (19); higher scores represent a higher dietary quality. Then, we computed the average daily intake of 74 specific food groups (g/day) by considering several parameters such as the level of 24-hour recalls, the food groups of official French dietary guidelines and the current knowledge about nutritional biomarkers of dietary consumption. Finally, a principal component analysis (PCA) with varimax rotated factors by orthogonal transformation was performed. This PCA highlighted two dietary patterns representing a “Western diet” (mainly characterized by higher intakes of alcohol, bread, processed and red meat, animal fat, cheese, and potatoes) and a “Healthy diet” (with higher intakes of fruits, vegetables, whole grain, yoghurt, and vegetable oil). Details of the computation of these dietary exposures are presented in Supplementary Methods and Supplementary Tables S2 and S3. Two statistical analyses were performed independently: correlation analysis was used to select ions related to diet (first step), whereas Elastic Net regression was used to select breast cancer risk–related ions (second step). In the current broad exploratory approach, dietary factors (either new or already suggested) were considered altogether in the same analysis, and the metabolites that were the best breast cancer risk predictors were selected.
Identification of ions potentially associated with diet
First, we estimated correlations between the 1,218 ions (both positive and negative modes) and dietary exposures using partial Spearman correlations adjusted for potential confounding factors specifically associated with diet, that is, for age, BMI, menopausal status, smoking status, season at time of blood draw, number of 24-hour dietary records, and mean daily energy intake during the 2 years following the blood draw. Given that ions were tested individually in the models, no prior normalization was performed. In this exploratory study, the aim of this first step was to collect as many candidate ions potentially associated with diet as possible; thus, all ions associated with diet at the threshold of P < 0.05 and |r| > 0.15 were selected for further analysis. Benjamini–Hochberg (BH) correction (20) was only tested before this selection (on 1,218 ions).
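As an illustration of this first step, the sketch below applies the selection thresholds to precomputed partial Spearman results and computes BH-adjusted P values. It is a minimal sketch, not the study code: the function names are hypothetical, the rule "retain an ion if at least one dietary exposure meets both thresholds" is an assumption about how the criterion was applied, and the P values in the example are invented (only the r values 0.19 and 0.17 come from the Results).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_diet_related(results, p_max=0.05, r_min=0.15):
    """Keep ions with P < p_max and |r| > r_min for at least one exposure.

    `results` is an iterable of (ion, exposure, r, p) tuples holding
    precomputed partial Spearman coefficients and their P values.
    """
    selected = set()
    for ion, exposure, r, p in results:
        if p < p_max and abs(r) > r_min:
            selected.add(ion)
    return sorted(selected)


results = [
    ("M286T989", "alcoholic drinks", 0.19, 0.0002),  # r from the text, P hypothetical
    ("M425T1158", "coffee", 0.17, 0.001),            # r from the text, P hypothetical
    ("M000T000", "coffee", 0.08, 0.03),              # hypothetical ion, fails |r| > 0.15
    ("M111T111", "tomatoes", -0.20, 0.20),           # hypothetical ion, fails P < 0.05
]
print(select_diet_related(results))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Filtering on the unadjusted P value keeps the first step deliberately permissive, which matches the stated aim of collecting as many candidate ions as possible before the penalized regression.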
Subset selection of diet-related breast cancer risk discriminant ions
Among the ions potentially associated with diet highlighted in the first step, the best subset of predictors of breast cancer risk was selected using penalized logistic regression with optimized α and λ parameters (no statistical test was performed in this procedure). As metabolomic data are highly correlated, we used the Elastic Net method (21) implemented in the R glmnet package. This method allows variable selection by forcing the coefficients of the less predictive variables to be exactly zero. Our model considered all diet-related ions simultaneously, and the following confounding factors specifically associated with breast cancer risk were included as unpenalized explanatory variables: age, BMI, season, menopausal status, smoking status, height, physical activity, education level, alcohol intake, use of hormone replacement therapy for menopause, number of children and family history of breast cancer at blood draw (baseline), and intervention group of the initial SU.VI.MAX trial after dietary data collection. Because all diet-related ions were considered simultaneously in this second step, their intensities were first unit-variance scaled. The Elastic Net penalization parameters α and λ (α = 0 implies no variable selection; α = 1 is equivalent to LASSO regression; λ defines the strength of regularization) were optimized using 5-fold cross-validation repeated 100 times, to account for the variance of cross-validation. Details on these steps are given in the footnote of Fig. 2.
The Spearman correlation matrix of the selected metabolites was computed. After Elastic Net selection, biological plausibility and plots of (ion intensity) × (dietary exposure) were checked. To determine the stability of the selected ions, we applied a bootstrap resampling Elastic Net method (22) by repeating the selection process 1,000 times on resampled data and recording the percentage of times each ion was selected. The direction of the associations (OR) for the selected ions was estimated using logistic regression with all the selected ions in the same model, adjusted for the confounding factors cited above. Although P values are generated by this model, they cannot be used for inference given the prior Elastic Net–based selection. Complementary analyses were performed using logistic regression models for each selected ion, adjusted for the same confounding factors, to check the associations between each individual ion and breast cancer risk.
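For reference, the penalized criterion can be written in the parametrization used by glmnet for logistic regression. This is a sketch under stated notation, all of which is introduced here for illustration: y_i is the case status of participant i, x_i the vector of unit-variance scaled ion intensities (coefficients β, penalized), and z_i the vector of confounding factors (coefficients γ, left unpenalized).

```latex
\[
(\hat\beta_0,\hat\beta,\hat\gamma)
= \arg\min_{\beta_0,\beta,\gamma}\;
-\frac{1}{N}\sum_{i=1}^{N}\Bigl[y_i\,\eta_i-\log\bigl(1+e^{\eta_i}\bigr)\Bigr]
+\lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2+\alpha\,\lVert\beta\rVert_1\Bigr],
\qquad
\eta_i=\beta_0+x_i^{\top}\beta+z_i^{\top}\gamma .
\]
```

With α = 1 only the L1 term remains (LASSO), and with α = 0 only the squared L2 term remains (ridge), in which case no coefficient is set exactly to zero. Intermediate values of α combine selection with the grouping of correlated ions.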
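The bootstrap stability assessment can be sketched as follows. This is an illustration only: `toy_select` is a hypothetical stand-in for the Elastic Net selection step, the data are invented, and resampling individual participants with replacement (rather than matched pairs) is an assumption, since the resampling unit is not specified here.

```python
import random


def selection_frequencies(rows, select, n_boot=1000, seed=0):
    """Percentage of bootstrap resamples in which each ion is selected.

    `rows` is a list of (is_case, {ion: intensity}) tuples and `select`
    is any function mapping such a list to the names of selected ions.
    """
    rng = random.Random(seed)
    n = len(rows)
    counts = {}
    for _ in range(n_boot):
        # Draw n rows with replacement from the original data.
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        for ion in set(select(resample)):
            counts[ion] = counts.get(ion, 0) + 1
    return {ion: 100.0 * c / n_boot for ion, c in counts.items()}


def toy_select(rows, threshold=0.5):
    """Hypothetical selector: keep ions whose mean intensity differs
    between cases and controls by more than `threshold`."""
    cases = [x for y, x in rows if y == 1]
    controls = [x for y, x in rows if y == 0]
    if not cases or not controls:
        return []
    kept = []
    for ion in cases[0]:
        mean_cases = sum(x[ion] for x in cases) / len(cases)
        mean_controls = sum(x[ion] for x in controls) / len(controls)
        if abs(mean_cases - mean_controls) > threshold:
            kept.append(ion)
    return kept


# Invented data: ion_A separates cases from controls, ion_B does not.
rows = [(1, {"ion_A": 1.0, "ion_B": 0.0})] * 20 + [(0, {"ion_A": 0.0, "ion_B": 0.0})] * 20
freqs = selection_frequencies(rows, toy_select, n_boot=200)
print(freqs)
```

In the study, the selector would be the full Elastic Net procedure, and an ion's frequency across the 1,000 resamples is what is reported as its selection stability.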
A flow chart of the different statistical steps is shown in Fig. 3.
Because of the untargeted approach, only ions of interest were annotated according to the procedure described in Supplementary Methods. As proposed by Sumner and colleagues (23), the metabolites were classified according to levels of confidence in the identification process: identified (level 1), putatively annotated (level 2), putatively characterized compound classes (level 3), and unknown compound (level 4).
Analyses were performed using SAS (v9.3, Cary, NC) and R (v3.5.2) software.
Results
Baseline characteristics of breast cancer cases and controls in the study population are summarized in Table 1. Among the 1,218 detected ions, 595 were selected as candidate diet-related ions and were considered in the penalized logistic regression analyses (associations between these ions and dietary exposure and FDR values are presented in Supplementary Table S4, N = 1,085).
Characteristic | Breast cancer cases (N = 200) | Controls (N = 200) | Pb |
---|---|---|---|
Age at baseline (y) | 48.8 ± 5.8 | 48.7 ± 5.9 | 0.4 |
BMI (kg/m²) | 23.1 ± 3.9 | 23.5 ± 4.2 | 0.08 |
BMI category | | | Not applicable |
<18.5 kg/m² (underweight) | 8 (4) | 6 (3) | |
≥18.5–<25 kg/m² (normal weight) | 143 (71.5) | 145 (72.5) | |
≥25 kg/m² (overweight) | 49 (24.5) | 49 (24.5) | |
Height (cm) | 162.9 ± 6.1 | 160.9 ± 5.9 | 0.001 |
Intervention group | | | Not applicable |
Placebo | 99 (49.5) | 99 (49.5) | |
Antioxidants | 101 (50.5) | 101 (50.5) | |
Smoking status | | | Not applicable |
Never and former | 159 (79.5) | 159 (79.5) | |
Current smoker | 41 (20.5) | 41 (20.5) | |
Physical activity | | | 0.8 |
Irregular | 64 (32) | 58 (29) | |
<1 h/d walking equivalent | 63 (31.5) | 67 (33.5) | |
≥1 h/d walking equivalent | 73 (36.5) | 75 (37.5) | |
Educational level | | | 0.1 |
Primary | 35 (17.5) | 43 (21.5) | |
Secondary | 75 (37.5) | 86 (43) | |
Superior | 90 (45) | 71 (35.5) | |
Number of biological children | 1.9 ± 1.2 | 2 ± 1.2 | 0.3 |
Hormonal treatment for menopause (yes) | 69 (34.5) | 70 (35) | 0.9 |
Menopausal status at baseline | | | Not applicable |
Premenopausal | 127 (63.5) | 127 (63.5) | |
Postmenopausal | 73 (36.5) | 73 (36.5) | |
Menopausal status at diagnosis | | | Not applicable |
Premenopausal | 77 (38.5) | 77 (38.5) | |
Postmenopausal | 123 (61.5) | 123 (61.5) | |
Family history of breast cancerc (yes) | 34 (17) | 21 (10.5) | 0.05 |
Alcohol intake (g/day) | 10.8 ± 11.2 | 11.6 ± 13.3 | 0.5 |
Month of blood draw | | | Not applicable |
March–April–May | 74 (37) | 74 (37) | |
October–November | 30 (15) | 31 (15.5) | |
December–January–February | 96 (48) | 95 (47.5) | |
Abbreviation: BMI, body mass index.
aValues are means ± SDs or n (%).
bP value for the comparison between breast cancer cases and controls using conditional logistic regression. Not applicable for matching factors except for more precise variables (e.g., age and BMI).
cAmong first-degree female relatives.
Fourteen ions resulted from the Elastic Net penalized regression. Among these, two were identified with a high level of confidence in annotation: piperine (M286T989) and acetyltributylcitrate (ATBC; M425T1158); two were putatively annotated: 2-amino-cyano-butanoic acid (M153T116) and pregnene-triol sulfate (M413T967); and the others were unknown compounds (M192T181, M265T186, M335T864, M364T125, M97T134, M166T144, M201T1091, M415T1344, M475T122, and M587T121). Identification details of the 14 ions selected by Elastic Net are presented in Supplementary Methods. The Spearman correlation matrix of these ions is provided in Supplementary Table S5. 2-amino-cyano-butanoic acid was highly correlated (P < 0.0001) with three unknown compounds, including two NaCOOH adducts.
In particular, piperine was positively associated with alcoholic drinks intake (r = 0.19) and with a “Western” dietary pattern (r = 0.22); ATBC was positively associated with coffee intake (r = 0.17); 2-amino-cyano-butanoic acid was negatively associated with cake and biscuit intakes (r = −0.16); and pregnene-triol sulfate was positively associated with alcoholic drinks intake (r = 0.17). The unknown compounds were associated with several dietary exposures such as pasta and cereals, salty products, processed meat, tomatoes, citrus fruit, and pressed cooked cheese intakes (see Supplementary Table S4). Table 2 displays the associations between the 14 selected ions and breast cancer risk from adjusted logistic regression, including the 14 ions altogether or one ion at a time. Lower levels of piperine and six unknown compounds (M335T864, M364T125, M166T144, M415T1344, M475T122, and M587T121) and higher levels of 2-amino-4-cyano butanoic acid, ATBC, pregnene-triol sulfate, and four unknown compounds (M192T181, M265T186, M97T134, and M201T1091) were found in plasma from women who subsequently developed breast cancer during follow-up.
Ions (annotationb) | Mass (m/z)/retention time (min) | Mode of detection | All selected ions together: OR (95% CI) | All selected ions together: Pc | Selected ions one by one: OR (95% CI) | Selected ions one by one: Pc |
---|---|---|---|---|---|---|
M153T116 (level 2: 2-amino-cyano-butanoic acid) | 153.0162/1.93 | ESI Positive | 1.23 (0.93–1.62) | 0.1 | 1.08 (1.03–1.14) | 0.002 |
M192T181 (level 4: molecular formula; C9H6O4N) | 192.0305/3.01 | ESI Positive | 1.37 (1.05–1.79) | 0.02 | 1.08 (1.03–1.13) | 0.003 |
M265T186 (level 4: unknown) | 265.3932/3.1 | ESI Positive | 1.28 (0.93–1.76) | 0.1 | 1.09 (1.04–1.14) | 0.0008 |
M286T989 (level 1: piperine) | 286.143/16.48 | ESI Positive | 0.76 (0.58–0.99) | 0.04 | 0.94 (0.89–0.99) | 0.01 |
M335T864 (level 4: unknown) | 335.2404/14.4 | ESI Positive | 0.77 (0.61–0.99) | 0.04 | 0.94 (0.89–0.98) | 0.009 |
M364T125 (level 4: NaCOOH adduct) | 363.9292/2.09 | ESI Positive | 0.74 (0.58–0.94) | 0.01 | 0.93 (0.89–0.98) | 0.007 |
M425T1158 (level 1: ATBC) | 425.2172/19.31 | ESI Positive | 1.39 (1.08–1.79) | 0.01 | 1.07 (1.02–1.12) | 0.006 |
M97T134 (level 4: NaCOOH adduct) | 96.9218/2.24 | ESI Positive | 1.23 (0.94–1.60) | 0.1 | 1.10 (1.04–1.15) | 0.0003 |
M166T144 (level 4: unknown) | 166.0387/2.4 | ESI Negative | 0.80 (0.62–1.03) | 0.09 | 0.94 (0.89–0.98) | 0.01 |
M201T1091 (level 4: unknown) | 201.149/18.18 | ESI Negative | 1.21 (0.94–1.57) | 0.1 | 1.10 (1.04–1.15) | 0.0003 |
M413T967 (level 2: pregnene-triol sulfate) | 413.2002/16.11 | ESI Negative | 1.40 (1.08–1.84) | 0.01 | 1.08 (1.03–1.14) | 0.004 |
M415T1344 (level 4: unknown) | 415.2078/22.4 | ESI Negative | 0.83 (0.64–1.07) | 0.2 | 0.92 (0.87–0.96) | 0.0006 |
M475T122 (level 4: unknown) | 474.7265/2.04 | ESI Negative | 0.89 (0.70–1.14) | 0.4 | 0.93 (0.88–0.97) | 0.003 |
M587T121 (level 4: unknown) | 586.6522/2.02 | ESI Negative | 0.69 (0.54–0.89) | 0.004 | 0.93 (0.89–0.98) | 0.005 |
Abbreviation: CI, confidence interval.
aThe principal logistic regression model considered all the 14 selected ions at the same time. Logistic regression models from complementary analyses considered the selected ions separately (one model for each of the 14 ions). All these models were adjusted for age (continuous), BMI (continuous), season (a priori–defined periods: October–November/December–January–February/March–April–May), menopausal status (pre/postmenopause status), smoking status (current, former and nonsmokers), height (continuous), physical activity (low, moderate, intense), education level (primary, secondary, superior), alcohol intake (continuous), use of hormone replacement therapy for menopause, number of children and family history of breast cancer at blood draw (baseline), and intervention group of the initial SU.VI.MAX trial (placebo/supplemented). Tests for linear trend were performed using the continuous variables. ORs are presented for a 1 SD increase of the continuous variable (semiquantification).
bLevels of confidence for every identification were given according to Sumner and colleagues (23): level 1, formally identified compound (confirmed with analysis of authentic standard); level 2, putatively identified compound (based upon spectral similarity with public/commercial spectral libraries or reference compound in the literature and/or physicochemical properties); level 3, putatively characterized compound classes; and level 4, unknown compound. Among the unknown compounds, two ions were NaCOOH adducts. The parent ions of the two adducts resulting from Elastic Net regression were not detected because data were acquired in positive and negative ion modes with a scan range from 50 to 1,000 mass-to-charge ratio (m/z). To observe the parent ions, it would be necessary to acquire data with a lower mass range, but this is not achievable with this kind of instrument.
cAlthough P values are generated by this model, they cannot be used for inference given the prior Elastic Net–based selection.
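Because the ORs in Table 2 are expressed per 1 SD increase in ion intensity, the underlying log-odds coefficient and its standard error can be approximately recovered from a reported OR and its 95% CI. The sketch below assumes Wald-type intervals computed as exp(β ± 1.96 × SE), which is an assumption rather than something stated in the text; the function names are illustrative, and the example uses the values reported for ATBC in the model including all selected ions.

```python
import math


def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Recover the log-odds coefficient and its SE from an OR and its CI.

    Assumes a Wald interval of the form exp(beta +/- z * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def or_with_ci(beta, se, z=1.96):
    """Return (OR, lower bound, upper bound) for a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# ATBC (M425T1158), all selected ions together: OR 1.39 (1.08-1.79).
beta, se = log_or_and_se(1.39, 1.08, 1.79)
odds_ratio, low, high = or_with_ci(beta, se)
print(round(beta, 3), round(se, 3))
print(round(odds_ratio, 2), round(low, 2), round(high, 2))
```

The round trip reproduces the published interval to two decimals, which is consistent with, but does not prove, the Wald assumption.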
Concerning the stability of the selected 14 ions on original data, their selection frequencies were ≥50% on the 1,000 bootstraps except for 2-amino-cyano-butanoic acid (27%). It was even >70% for piperine (M286T989), ATBC (M425T1158) and pregnene-triol sulfate (M413T967) and 8 unknown compounds (see Fig. 2 presenting the frequency of selection of ions ≥40% over 1,000 bootstraps). Further investigation was carried out to annotate ions with selection frequencies >70% but not selected on the original dataset (i.e., M114T115, M498T1040, M230T171, M423T120, M454T1064, M196T953, M485T1437, M491T120, and M581T123); however, none of them were identified. The associations of these ions with dietary exposures are available in Supplementary Table S4 and with breast cancer risk (from adjusted logistic regression) in Supplementary Table S6.
A summary of the results is shown in Fig. 3.
Discussion
This exploratory study used untargeted metabolomics coupled with multivariable penalized regression to screen for a limited set of ions that were potentially associated with various dietary exposures and that maximized breast cancer risk discrimination.
Comparison with the literature remains difficult because of the large diversity of study designs, types of biofluid used, and statistical analyses performed. Nevertheless, the magnitude of the dietary associations highlighted in our study seems comparable with that of similar studies based on Food Frequency Questionnaires (FFQ), such as Playdon and colleagues (10), and seems relatively low for dietary patterns and nutritional scores, as in Playdon and colleagues (24). However, some associations with individual foods such as citrus and coffee appear much higher than those with other dietary exposures and could correspond to metabolites derived directly from the consumption of these foods. For instance, several correlations between ions and coffee intake were over 0.4, including one at 0.54, which seems higher than in some FFQ-based studies such as that of Guertin and colleagues (25), despite their use of serum, which is more concentrated in metabolites than plasma (26). Few studies have investigated the associations between diet-related metabolomic signatures and cancer risk. They observed associations between metabolites and, on the one hand, nutritional exposures (in particular coffee, alcohol, fiber, vitamin E and fried food intakes, BMI, physical activity), and on the other hand, risk of hepatocellular carcinoma (HCC; refs. 27–29) and colorectal cancer (30, 31), with only two dealing with breast cancer (10, 32). Most of these studies found metabolites related to alcohol intake that were associated, in particular, with breast cancer (10) or HCC (27–29) risk. In our study, among the 14 selected ions, 3 were positively associated with alcohol intake (M286T989, M335T864, M413T967), including piperine and pregnene-triol sulfate. Moreover, the ion M230T171, frequently selected during the bootstrap resampling but not selected on the original dataset, was also associated with alcoholic drinks intake.
Compared with the literature, some associations were newly identified in this study (such as the positive association between ATBC, coffee intake and breast cancer risk), whereas several others were not replicated. These differences across studies are probably explained by differences in analytical techniques, study design, and statistical approaches, as well as study populations with heterogeneous underlying diets and cancer sites.
In this study, piperine, an exogenous active alkaloid with no endogenous origin reported so far, was highlighted as a potential predictor of breast cancer risk (inverse association), with high stability across penalized models (selection frequencies >90% on the 1,000 bootstraps). Piperine is contained in black and long pepper (33) and is used as a feed additive in animal feed, in particular for poultry (34). Human exposure to piperine via this latter source has been estimated at 0.93 μg/metabolic body weight per day (34). Several animal or cellular studies suggested a promising spectrum of properties for piperine such as antioxidant (35, 36), anti-inflammatory (33, 35, 36), immunomodulatory, bioavailability and absorption promoter for many active molecules (33, 37), antiasthmatic, anticonvulsant, antimutagenic, antimycobacterial, and anticancer activities (ref. 33; chemopreventive properties, including inhibition of angiogenesis and increased cell apoptosis), especially in breast cancer models (38–40). In our study, piperine was in particular positively associated with alcohol intake and with a “Western” dietary pattern. These associations are probably explained by the fact that several foods containing piperine, as a feed additive or via pepper (e.g., processed meat, sauces, industrial cheese, poultry), are either part of a “Western” diet or consumed in association with alcoholic drinks. The association with alcohol intake could also be explained by the higher solubility of alkaloids in ethanol (41). Consistent with our findings, Playdon and colleagues (10) found a positive association between piperine and liquor intake and a decreased risk of breast cancer. A direct association was also found between piperine and wine intake in blood of female twins (42). However, the origin of circulating piperine is not restricted to Western-type foods and can also result from adding black pepper to food (e.g., for salad or fish seasoning).
Unfortunately, the level of detail of the SU.VI.MAX dietary questionnaire did not allow us to estimate pepper intake.
ATBC is an alternative plasticizer to phthalates (43) commonly used in polyvinyl resins and permitted as a food additive and food contact substance (44). Migration of ATBC from food packaging material into food has been observed for cheese, wrapped cake, microwaved soup, and microwaved peanut-containing cookies (44, 45), and its leaching rate from medical equipment was found to be 10 times faster than that of the potent endocrine disruptor di-2-ethylhexyl phthalate (44, 46, 47). In our study, we found a positive association between plasma ATBC and coffee intake. This association should be confirmed in independent human observational studies as well as in vivo or in vitro animal intervention studies; however, it may reflect contaminant migration from plastic cups into coffee. Milk added to coffee may facilitate this migration because one study suggested that ATBC is prone to migrate into protein-containing liquids, such as aqueous skim milk solution (48). Its increase in plasma from women who have developed breast cancer could also come from other exposures that we were unable to detect in this study. Recent studies found potential biological activity of ATBC on tissue growth (49) and a potential disruption of ovarian function in female mice exposed to ATBC at a low dosage imitating human exposure (50).
Currently, alcohol intake is the only dietary risk factor for breast cancer established with strong evidence (3). Some underlying mechanisms have already been described; however, others remain to be elucidated (3). In our study, pregnene-triol sulfate, a steroid sulfate hormone belonging to the progestin family, was positively associated with both alcohol intake and breast cancer risk. Consistent with our results, a recent metabolomic study found positive correlations between alcohol intake and several serum steroids, notably pregnene-diol sulfate, which were associated with an increased breast cancer risk (10). Moreover, increased levels of sex steroids seem strongly associated with the risk of postmenopausal breast cancer (51). Alcohol intake may increase circulating levels of steroid hormones, which could affect susceptibility to transformation or promote cancer growth (52). An interventional study in postmenopausal women showed that alcohol consumption increased the serum level of dehydroepiandrosterone (DHEA) sulfate (53), a precursor to androstenedione, testosterone, and, ultimately, estrone and estradiol. Furthermore, the pregnene-triol sulfate found in our study may derive from 17-hydroxy-pregnenolone, which can produce DHEA by losing its side chain. Other factors, such as BMI and lactation, could influence the level of sex steroids (3). Furthermore, it has been shown that ATBC strongly activates the human and rat Steroid and Xenobiotic Receptor (SXR) and may alter the metabolism of endogenous steroid hormones (43).
In our study, an increased level of 2-amino-4-cyano butanoic acid was found in plasma from women who subsequently developed breast cancer during follow-up. This metabolite, also called alpha-amino-gamma-cyanobutanoic acid, is a non-proteinogenic alpha-amino acid (2-aminobutanoic acid substituted at position 4 by a cyano group) and an aliphatic nitrile. It may derive from butyrate (54), a short-chain fatty acid synthesized through the fermentation of fibers by colon bacteria (55). Butyrate has recently received growing attention for its beneficial effects on intestinal homeostasis and energy metabolism. With its anti-inflammatory properties, butyrate improves intestinal barrier function and mucosal immunity (56). In our study, we found an inverse association between plasma 2-amino-4-cyano butanoic acid and cake and biscuit intake. To our knowledge, no direct association between plasma 2-amino-4-cyano butanoic acid and dietary exposure has been established to date. The production and effects of butyrate appear to be related to diet, notably to the type of dietary fiber and of fat consumed, respectively (57). Butyrate is present in various types of foods, including vanilla, oats, peanut, and several fruits (58). Because no data on plasma butyrate levels were available in our study, we cannot conclude on a possible link between the observed variations and a disturbance of butyrate metabolism, and therefore of the microbiota. Our results need to be replicated in other independent observational studies, and intervention studies in animal models would provide a better understanding of the origin of its variations and of their relation to food. Its association with the risk of breast cancer and a potential causal relationship could be investigated through cellular mechanistic studies.
Unfortunately, despite additional analyses (including different fragmentation experiments using ultra-high-resolution MS, fraction collection, preconcentration of the biological samples, H/D exchange, and, where relevant, additional analyses such as GC/MS), the other metabolites selected by Elastic Net regression for breast cancer risk analyses could not be identified, mainly because of limited signal intensities, the lack of commercial standards, and the incompleteness of the available databases. However, sharing putatively annotated compounds or unknowns within the scientific community could be of great interest. Indeed, when no commercial standard is available, it could be worthwhile, if several studies and consortia shared the same hypothesis, to synthesize the compound or isolate it from a biological medium. When signal abundance is too low, unknowns could be elucidated with new instruments in the future, as MS technologies are becoming increasingly sensitive.
The major strengths of this study pertained to the very sensitive untargeted MS metabolomic analysis, the prospective design, and the statistical approach based on Elastic Net penalized regression, a method suited to high-dimensional correlated data. Penalized regression techniques were applied to control the variance of estimates (which increases in the presence of many predictors or multicollinearity) by shrinking coefficients toward zero (59). Some penalized regression techniques, such as the Least Absolute Shrinkage and Selection Operator (LASSO; ref. 59) and Elastic Net (21), simultaneously perform automatic variable selection by shrinking the coefficients of irrelevant predictors to exactly zero. The latter tends to better select important variables when high correlations are present, as in metabolomic data. Several studies have applied these methods in multiple fields (such as youth violence, genetics, and cardiovascular disease risk; refs. 22, 60–62); however, to our knowledge, no previous study focusing on nutritional metabolomics and cancer prevention has used penalized techniques. Nevertheless, several limitations should be acknowledged for this study. First, despite the use of cross-validation, iterations, and bootstrap resampling, these results need to be validated in an independent study sample. Unbiased predictive performance could not be investigated in this study due to the lack of an independent dataset. Investigating the performance gain obtained by adding a diet-related metabolic signature to a model including already known breast cancer risk factors would be useful to improve the discrimination of women at higher risk of breast cancer. However, to have a positive impact in terms of prevention, this signature must be modifiable following a change of diet. This issue, as well as several others (e.g., replication, quantification), should be investigated before considering an application in public health.
Second, associations may have been missed due to a lack of power, self-reporting bias, or the analytical protocol (LC-MS), which did not allow the detection of all categories of metabolites. Indeed, although our analytical method was optimized to detect as many metabolites as possible, the huge chemical diversity of compounds in blood makes it impossible to cover all metabolites using a single analytical method, even with an untargeted approach. Thus, some metabolites of interest may not have been detected under our analytical conditions. However, UPLC-MS is one of the most sensitive methodologies available. Complementary GC-MS analyses could be useful to cover additional types of metabolites. Moreover, a larger sample size would have allowed stratification of the population according to several parameters such as time prior to cancer diagnosis and menopausal status. Third, the possibility of residual or unmeasured confounding cannot be ruled out in this observational study. However, many potential confounders were accounted for. Finally, our study was based on a single blood draw, which limits the investigation of the stability of metabolomic profiles over time. Nevertheless, several studies have shown good reproducibility of metabolomic measurements for most metabolites (63, 64). Several blood draws during follow-up would allow a finer detection of the metabolic pathways that are disrupted during carcinogenesis.
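As an illustration only, the following pure-Python sketch shows how an Elastic Net penalty can shrink some coefficients to exactly zero and how selection frequencies can be tallied over bootstrap resamples, the two ideas discussed above. It is not the analysis code of this study: for brevity it assumes a continuous outcome with a squared-error loss rather than a model for case–control data, and the function names (`soft_threshold`, `elastic_net`, `selection_frequency`), the simulated data, and the penalty values (`lam`, `alpha`) are all hypothetical.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; returns exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=100):
    """Coordinate descent minimizing
    (1/2n)*||y - Xb||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2).
    Assumes X columns and y are already centered (no intercept term)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of column j with the residual that excludes column j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta

def center(X, y):
    """Subtract column means from X and the mean from y."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    return Xc, [v - y_mean for v in y]

def selection_frequency(X, y, n_boot=100, lam=0.1, alpha=0.5, seed=0):
    """Fraction of bootstrap resamples in which each coefficient is nonzero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        Xb, yb = center([X[i] for i in idx], [y[i] for i in idx])
        beta = elastic_net(Xb, yb, lam, alpha)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]

# Simulated example (hypothetical): only the first two predictors affect y.
rng = random.Random(1)
n, p = 60, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]
freq = selection_frequency(X, y, n_boot=50, lam=0.3, alpha=0.9)
print([round(f, 2) for f in freq])
```

In this sketch, a predictor whose selection frequency stays high across resamples would be regarded as stably selected. In practice, predictors would usually be standardized and the penalty parameters tuned by cross-validation rather than fixed by hand as they are here.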
In conclusion, this exploratory prospective study identified a diet-related plasma metabolic signature of long-term breast cancer risk involving exogenous, steroid, and microbiota-related metabolites. The hypotheses highlighted in this study should be further investigated in future large-scale independent studies. In the future, such signatures could help to better understand the relationship between nutrition and breast cancer etiology and to identify key metabolites associated with both modifiable nutritional behavior and breast cancer risk.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Disclaimer
The funders had no role in the design, analysis, or writing of this article.
Authors' Contributions
Conception and design: L. Lécuyer, P. Micheau, P. Galan, S. Hercberg, C. Manach, M. Touvier
Development of methodology: L. Lécuyer, P. Micheau, C. Samieri, P. Ferrari, M. Touvier
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): P. Micheau, A. Rossary, A. Demidem, M. Petera, M. Lagree, D. Centeno, P. Galan, S. Hercberg, E. Kesse-Guyot, S. Durand, M. Touvier
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): L. Lécuyer, C. Dalle, S. Lefevre-Arbogast, P. Micheau, B. Lyan, C. Samieri, N. Assi, V. Viallon, M. Deschasaux, V. Partula, B. Srour, E. Kesse-Guyot, N. Druesne-Pecollo, M.-P. Vasson, S. Durand, E. Pujos-Guillot, C. Manach, M. Touvier
Writing, review, and/or revision of the manuscript: L. Lécuyer, S. Lefevre-Arbogast, P. Micheau, A. Rossary, A. Demidem, M. Petera, P. Galan, C. Samieri, N. Assi, V. Viallon, M. Deschasaux, V. Partula, P. Latino-Martel, E. Kesse-Guyot, N. Druesne-Pecollo, S. Durand, E. Pujos-Guillot, C. Manach, M. Touvier
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): L. Lécuyer, P. Micheau, E. Kesse-Guyot, E. Pujos-Guillot, M. Touvier
Study supervision: P. Micheau, P. Galan, S. Hercberg, M. Touvier
Other (mass spectrometry analysis and metabolites identification): B. Lyan
Acknowledgments
The authors thank Younes Esseddik, Frédéric Coffinieres, Thi Hong Van Duong, Paul Flanzy, Régis Gatibelza, Jagatjit Mohinder, and Maithyly Sivapalan (computer scientists); Rachida Mehroug and Frédérique Ferrat (logistic assistants); Nathalie Arnault, Véronique Gourlet, PhD, Fabien Szabo, PhD, Julien Allegre, and Laurent Bourhis (data managers/statisticians); and Cédric Agaesse (dietitian) for their technical contribution to the SU.VI.MAX study. We also thank Nathalie Druesne-Pecollo, PhD (operational coordination) as well as all participants of the SU.VI.MAX study. This work was conducted in the framework of the French network for Nutrition And Cancer Research (NACRe network), www.inra.fr/nacre, and received the NACRe Partnership Label. Metabolomic analysis was performed within the metaboHUB French infrastructure (ANR-INBS-0010). This work was supported by the French National Cancer Institute (grant number INCa_8085 for the project, and PhD grant number INCa_11323, to L. Lécuyer); the Federative Institute for Biomedical Research IFRB Paris 13; and the Cancéropôle Ile-de-France/Région Ile de France (PhD grant for M. Deschasaux).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.