Abstract
Chronic inflammation is an established risk factor for colorectal cancer. To study the influence of reactive products of gut inflammation and redox signaling on colorectal cancer development, we used untargeted adductomics to detect adduct features in prediagnostic serum from the EPIC Italy cohort. We focused on modifications to Cys34 in human serum albumin, which is responsible for scavenging small reactive electrophiles that might initiate cancers. Employing a combination of statistical methods, we selected seven Cys34 adducts associated with colorectal cancer, as well as body mass index (BMI; a well-known risk factor). Five adducts were more abundant in colorectal cancer cases than controls and clustered with each other, suggesting a common pathway. Because two of these adducts were Cys34 modifications by methanethiol, a microbial–human cometabolite, and crotonaldehyde, a product of lipid peroxidation, these findings further implicate infiltration of gut microbes into the intestinal mucosa and the corresponding inflammatory response as causes of colorectal cancer. The other two associated adducts were Cys34 disulfides of homocysteine that were less abundant in colorectal cancer cases than controls and may implicate homocysteine metabolism as another causal pathway. The selected adducts and BMI ranked higher as potentially causal factors than variables previously associated with colorectal cancer (smoking, alcohol consumption, physical activity, and total meat consumption). Regressions of case–control differences in adduct levels on days to diagnosis showed no statistical evidence that disease progression, rather than causal factors at recruitment, contributed to the observed differences. These findings support the hypothesis that infiltration of gut microbes into the intestinal mucosa and the resulting inflammation are causal factors for colorectal cancer.
Infiltration of gut microbes into the intestinal mucosa and the resulting inflammation are causal factors for colorectal cancer.
Introduction
Colorectal cancer is a major cause of human mortality, accounting for about 9% of all cancer deaths (1); however, the etiology of colorectal cancer is poorly understood. Because studies of families and twins have shown that heritable genetics contribute less than 15% to colorectal cancer incidence (2, 3), nongenetic factors must be important. Indeed, many studies have implicated diet and lifestyle factors in colorectal cancer risk (reviewed in ref. 4). Interestingly, some associations pointed to increased risks (notably, consumption of fat and red meat, smoking, and alcohol use), whereas others suggested reduced risks, namely, consumption of fish, fish oil, and fiber, plus regular exercise and intake of vitamin D, calcium, and aspirin. Because most of these risk factors implicate dietary exposures, recent interest has focused on the interplay between the diet and gut microbiota as a contributor to colorectal cancer (5, 6). In particular, evidence is accumulating that the shift away from fiber-rich foods in the “Westernized diet” has discouraged gut fermentation that enhances colonic health.
An emerging theme from this collection of risk factors is the hypothesis that colorectal cancer results from chronic promotion of gut dysbiosis “… creating a microclimate that promotes inflammation, proliferation and neoplastic progression” (5). Certainly, chronic colonic inflammation is a hallmark of inflammatory bowel disease and colitis-associated cancer and is an established risk factor for colorectal cancer. A critical adjunct to gut inflammation is production of reactive oxygen species (ROS) by neutrophils and macrophages that are mobilized in response to infiltration of microbiota into the intestinal mucosa. ROS can damage DNA and thereby initiate tumors; they can react with polyunsaturated fatty acids to produce reactive carbonyl species (RCS) that modify proteins and promote cancers; and they are important modulators of redox-signaling pathways that are activated by gut inflammation (7).
Despite their potential importance to cancer causation, ROS, RCS, and other reactive electrophilic products of human and microbial metabolism cannot generally be measured in vivo. This has motivated investigators to study the dispositions of reactive metabolites by monitoring adducts of these species with abundant proteins, particularly hemoglobin (Hb) and human serum albumin (HSA). Although most assays have targeted particular modifications of Hb and HSA selected a priori (8), recent work has explored untargeted avenues for characterizing adductomes at particular nucleophilic loci (9–11). Our laboratory developed an adductomics pipeline to investigate modifications at the highly nucleophilic Cys34 residue of HSA (11). We focused on Cys34, not only because it efficiently scavenges small reactive electrophiles (12) but also because its oxidation by ROS generates a host of reversible sulfoxidations that act as redox switches in homeostatic processes (13–16). Indeed, oxidation of HSA-Cys34 to the reactive sulfenic acid (Cys34-SOH) serves as an intermediate in formation of mixed Cys34 disulfides that are also sentinels of redox biology during the 1-month residence time of HSA (17).
Given evidence that reactive products of gut inflammation and modulation of redox signaling pathways are potential contributors to colorectal cancer, we conducted Cys34 adductomics with archived serum from incident colorectal cancer cases and matched controls from the European Prospective Investigation into Cancer and Nutrition (EPIC; ref. 18). This exploratory study is intended to discover discriminating adducts that can motivate hypotheses and follow-up of potentially important exposures or pathways leading to colorectal cancer. Results point to the associations of colorectal cancer with several adducts, some of which further implicate the gut microbiota and redox biology as potential causes.
Materials and Methods
Colorectal cancer cases and controls
Serum samples were obtained at recruitment from 95 pairs of incident colorectal cancer cases and matched controls (68 male pairs and 27 female pairs), collected between 1993 and 1997 from subjects in Turin, Italy, as part of the EPIC cohort study (18). Written informed consent was obtained from all participants and the study was conducted in accordance with recognized ethical guidelines (e.g., Declaration of Helsinki, CIOMS, Belmont Report, U.S. Common Rule). The study protocol was approved by an Institutional Review Board of the Human Genetics Foundation (Turin, Italy). Controls were sampled from within the cohort (a sample of the general population) and matched by age, gender, and enrollment year and season. The cohort was regularly followed up and, at diagnosis of colorectal cancer, cases were confirmed by colonoscopy and biopsy; matched controls were healthy and with few exceptions, did not undergo colonoscopies. Information related to the diet, body mass index (BMI), and lifestyle factors was obtained by questionnaire (19). Serum samples were obtained in cryostraws from the central biorepository of the International Agency for Research on Cancer (Lyon, France) where they had been stored in liquid nitrogen prior to shipment to our laboratory with further storage at −80°C for approximately 2 years prior to analysis. Upon processing of the serum, 59 samples had a gelled consistency, which was traced to an additive in the cryostraws (20). Because these gelled samples affected adductomic profiles, they were excluded, as were two subjects with large percentages of missing adducts, leaving 129 samples for downstream statistical analysis (57 cases and 72 controls), including 47 matched case–control pairs. Table 1 provides summary statistics for these subjects and relevant covariates (smoking, physical activity, consumption of alcohol and meat, and BMI). 
Out of these covariates, BMI was the most different between cases and controls (nominal P value = 0.026 from a two-sample t test), with cases having a higher average BMI.
| | Total, n = 129 | CRC cases, n = 57 | Controls, n = 72 | Pᵃ |
|---|---|---|---|---|
| Gender | Male | 39 | 49 | |
| | Female | 18 | 23 | |
| Age at enrollment (y) | Mean | 55.30 | 55.05 | |
| | Median | 57.02 | 56.4 | |
| | Minimum | 35.48 | 35.46 | |
| | Maximum | 64.68 | 63.58 | |
| Years to diagnosis | Mean | 6.86 | - | |
| | Median | 6.99 | - | |
| | Minimum | 0.02 | - | |
| | Maximum | 14.41 | - | |
| BMI (kg/m²) | Mean | 27.06 | 25.52 | 0.026 |
| | Median | 26.71 | 25.01 | |
| | Minimum | 19.68 | 18.73 | |
| | Maximum | 40.68 | 33.57 | |
| Smoking status | Current | 10 | 17 | |
| | Former | 25 | 26 | |
| | Never | 17 | 25 | |
| | NA | 5 | 4 | |
| Alcohol consumption (mL/d) | Mean | 21.94 | 19.78 | 0.585 |
| | Median | 13.47 | 11.77 | |
| | Minimum | 0.0 | 0.0 | |
| | Maximum | 80.57 | 93.54 | |
| Physical activityᵇ | Active | 9 | 11 | |
| | Moderately active | 10 | 20 | |
| | Moderately inactive | 23 | 20 | |
| | Inactive | 10 | 17 | |
| | NA | 5 | 4 | |
| Total meat consumption (g/d) | Mean | 80.24 | 72.76 | 0.386 |
| | Median | 75.30 | 63.45 | |
| | Minimum | 2.60 | 0.0 | |
| | Maximum | 189 | 201.3 | |
| Total vegetable consumptionᶜ (g/d) | Mean | 259.1 | 255.6 | 0.849 |
| | Median | 227.9 | 241.5 | |
| | Minimum | 74.5 | 80.7 | |
| | Maximum | 739.7 | 593.6 | |
Abbreviations: CRC, colorectal cancer; NA, not available.
ᵃ Nominal P values from a two-sided t test.
ᵇ For definitions and validation, see ref. 50.
ᶜ Sum of leafy vegetables (raw and cooked), other vegetables, tomatoes (raw and cooked), root vegetables, cabbages, mushrooms, onion, garlic, mixed salad, mixed vegetables, and legumes.
Chemicals and reagents
With the following exceptions, all of the chemicals used in this study were the same as described previously (11). For the current investigation, sodium thiomethoxide (≥95%) and iodine (≥99%) were from Sigma-Aldrich, and hydrogen peroxide (30 wt. % aqueous solution) and formic acid (Optima, LC/MS grade) were from Thermo Fisher Scientific.
Sample processing and nLC-HRMS data acquisition
Sample processing and analysis by nano-liquid chromatography high-resolution mass spectrometry (nLC-HRMS) were performed as previously described (11). The order of analyses was randomized except that each case–control pair was analyzed on the same day, also with random order. Briefly, HSA was purified (≥75%) by precipitating other serum proteins and residual Hb with 60% methanol. HSA was digested with trypsin at 37°C with high-pressure cycling for 30 minutes (NEP2320, Pressure BioSciences Inc.) and without prior reduction of disulfide bonds. Adducts were located on the triply charged “T3 peptide” (²¹ALVLIAFAQYLQQC³⁴PFEDHVK⁴¹, m/z 811.7593). Prior to nLC-HRMS, 1 μL of an internal standard, consisting of the isotopically labeled T3 peptide modified at Cys34 with iodoacetamide (IAA-iT3, 20 pmol/μL), was added to normalize data for instrument performance. One microliter of each digest was injected into the nLC-HRMS, consisting of a Dionex Ultimate 3000 nanoflow LC system equipped with a Dionex monolithic column (100 μm internal diameter × 25 cm) and connected via a Flex Ion nano-ESI source to an LTQ Orbitrap XL hybrid mass spectrometer (Thermo Fisher Scientific) that was operated in positive-ion mode. After duplicate injections of a sample, a blank sample was injected to reduce carryover effects, and after analysis of three samples, the LC column was washed with 1 μL of a solution containing 80% acetonitrile, 10% acetic acid, 5% DMSO, and 5% water to stabilize the chromatography.
Adducts were located on the T3 peptide based on the monoisotopic mass (MIM) within 10 ppm as described previously (11). By performing nLC-HRMS in data-dependent mode, the MS2 spectra for all triply charged precursor ions were first interrogated for b⁺- and y²⁺-series ions that are signatures of the T3 peptide and its modifications. Spectra displaying the requisite fragment ions were designated as putative T3 modifications. The corresponding precursor ions were then extracted from the total ion chromatogram (TIC) to obtain an MIM for each adduct feature. To normalize peak areas for the amount of HSA in each tryptic digest, the MIM was also extracted for the doubly charged HSA peptide (⁴²LVNEVTEFAK⁵¹, m/z 575.3111) adjacent to T3 and referred to as the “housekeeping peptide” (HKP). As shown previously (11), the peak area ratio (PAR), representing the ratio of the adduct-peak abundance to the HKP peak abundance, is a robust measure of the adduct concentration. Peaks representing the selected ion chromatogram (SIC) for the internal standard (IAA-iT3) were used to normalize for instrument performance. Peak picking and integration were performed using the Xcalibur Processing Method (version 3.0, Thermo Fisher Scientific) based on the average MIMs and retention times. Peak integration employed the Genesis algorithm after normalizing for instrument performance via IAA-iT3. Added masses relative to the Cys34 thiolate ion were estimated as $M_{\text{adduct}} = (m/z_{\text{adduct}} - m/z_{\text{T3-peptide}}) \times 3 + 1.0078$, where $m/z_{\text{adduct}}$ and $m/z_{\text{T3-peptide}}$ are the observed m/z values for the triply charged MIMs of a given precursor ion for an adduct and the unmodified T3 peptide, respectively, and 1.0078 is the mass of a hydrogen atom. All data processing utilized in-house software written in R.
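The added-mass calculation can be illustrated with a short Python sketch (the study itself used in-house R software, so this is not the authors' code). The constants are taken from the text; the function names `added_mass` and `ppm_error` are hypothetical.

```python
# Illustrative sketch only; names are hypothetical, constants come from the text.
T3_MZ = 811.7593   # m/z of the triply charged, unmodified T3 peptide
H_MASS = 1.0078    # mass of a hydrogen atom (Da)
CHARGE = 3         # adducts are measured on the triply charged peptide


def added_mass(mz_adduct, mz_t3=T3_MZ):
    """Added mass (Da) relative to the Cys34 thiolate ion."""
    return (mz_adduct - mz_t3) * CHARGE + H_MASS


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million, as used for MIM matching tolerances."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# S-methanethiol adduct reported at m/z 827.0890
print(round(added_mass(827.0890), 3))
```

For the S-methanethiol adduct this gives an added mass of about 46.997 Da, close to the monoisotopic mass of an SCH3 group (about 46.996 Da), which is consistent with the annotation as a methanethiol disulfide.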
Synthesis of reference standards
The identities of several adducts were verified by synthetic reference standards that had been prepared previously (11, 21, 22). A new reference standard for the Cys34 S-methanethiol adduct was prepared as follows. Two microliters of 25 mmol/L sodium thiomethoxide were diluted with 0.25 mL of water with and without 10 μL of 1 mmol/L hydrochloric acid and incubated at room temperature for 1 hour. Purified HSA from 15 μL of serum from a volunteer subject was diluted with 0.2 mL of digestion buffer and mixed with the thiomethoxide solution plus 1 μL of 30% H2O2 and 0.5 μL of 35 mmol/L of iodine. A negative control was also prepared with HSA from an additional 15 μL of serum that was processed with all reagents except sodium thiomethoxide. After incubation at room temperature with constant agitation for 24 hours, the reagents were removed with 30K MWCO spin columns and the modified HSA was digested with trypsin and analyzed by nLC-HRMS as described previously. The MIM for the Cys34 S-methanethiol adduct (m/z 827.0890) was extracted from TICs using a mass tolerance of 5 ppm.
Statistical analysis
Duplicate injections were averaged for each adduct peak, ignoring missing values, in order to reduce technical variation. Eight adducts were detected in only one or two serum samples and were excluded from further analyses. Using a cutoff of 15% for missing values across adducts, two subjects were excluded. Missing values were imputed using the k-nearest-neighbor method (23), with k = 5 adduct neighbors. Data were normalized using the Bioconductor R package “scone” (24), which employs linear regression models on scaled and logged feature abundances to adjust for various combinations of factors of unwanted variation (25). The “scone” package then evaluates each candidate normalization scheme with metrics that gauge the removal of unwanted variation and retention of wanted variation (e.g., case–control status) to help users select an appropriate normalization scheme. The top-ranking normalization scheme according to “scone” used DESeq scaling (26) and adjusted for unwanted variation due to digested HSA and instrument performance. Here, “digested HSA” was quantified by the abundance of the HKP and “instrument performance” was indicated by the drift in abundance of the internal standard (IAA-iT3) peak over time. All quantified T3 peptides were clustered using the partitioning around medoids (PAM) method (k = 6, “pam” function in R) using Spearman correlations on the normalized abundances.
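The first two preprocessing steps (averaging duplicates while ignoring missing values, then excluding subjects with too many missing adducts) can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names are hypothetical, missing values are represented as `None`, and treating the 15% cutoff as inclusive is an assumption.

```python
# Minimal sketch; names are hypothetical and None marks a missing value.

def average_duplicates(injection_1, injection_2):
    """Mean of two injections, ignoring missing values; None if both are missing."""
    values = [v for v in (injection_1, injection_2) if v is not None]
    return sum(values) / len(values) if values else None


def missing_fraction(abundances):
    """Fraction of adducts with no value for one subject."""
    return sum(v is None for v in abundances) / len(abundances)


def keep_subjects(profiles, cutoff=0.15):
    """Keep subjects whose missing fraction does not exceed the cutoff.

    profiles maps a subject ID to a list of averaged adduct abundances.
    Whether the cutoff is inclusive is an assumption made here.
    """
    return {s: v for s, v in profiles.items() if missing_fraction(v) <= cutoff}
```

The remaining missing values would then be imputed (k-nearest neighbors in the study) before normalization.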
A combination of regression and classification methods was used to select adducts that were associated with colorectal cancer cases and controls. Because BMI was greater on average in colorectal cancer cases than controls (Table 1), this variable was also investigated. (Because of missing values of BMI, five cases and seven controls were excluded, leaving 117 subjects for analysis.) First, the following multivariate linear regression model was fitted:

$$Y_{ij} = \beta_0 + \beta_1 X_{case} + \beta_2 X_{sex} + \beta_3 X_{age} + \beta_4 X_{HKP} + \beta_5 X_{IS} + \varepsilon_i \qquad (1)$$

where $Y_{ij}$ represents logged and DESeq-scaled abundance for the jth adduct (or BMI) in the ith subject, $X_{case}$ and $X_{sex}$ are binary indicators, $X_{age}$ is a continuous variable, $X_{HKP}$ is the vector of HKP abundances, $X_{IS}$ is the vector of internal standard abundances, and $\varepsilon_i$ is a random error term for the ith subject. The nominal P value corresponding to the coefficient $\beta_1$ was used to rank each variable by its association with case–control status. The mean case/control fold change in adduct levels was calculated as $\exp(\beta_1)$, and $\beta_1$ was used to represent the difference in average BMI between cases and controls, adjusting for sex and age.
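To see why exponentiating the case coefficient yields a fold change, consider the simplest unadjusted situation, in which the model contains only the case indicator. The coefficient then equals the difference in mean log abundance between cases and controls, and its exponential is the ratio of geometric means. The sketch below illustrates this special case only; it omits the covariate adjustment used in the study, assumes natural logarithms, and uses hypothetical function names.

```python
import math
from statistics import fmean

# Unadjusted special case only: no sex, age, HKP, or internal-standard terms.

def beta1_unadjusted(log_cases, log_controls):
    """Difference in mean log abundance (cases minus controls)."""
    return fmean(log_cases) - fmean(log_controls)


def fold_change(beta1):
    """Case/control fold change implied by a coefficient on the natural-log scale."""
    return math.exp(beta1)


# Toy data (made up): every case value is 1.2 times a control value.
controls = [math.log(v) for v in (10.0, 20.0, 40.0)]
cases = [math.log(1.2 * v) for v in (10.0, 20.0, 40.0)]
print(round(fold_change(beta1_unadjusted(cases, controls)), 3))
```

With covariates in the model, the coefficient has the same interpretation conditional on those covariates.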
Second, a regularized logistic regression (LASSO; ref. 27) of colorectal cancer case–control status on normalized adduct abundances and BMI along with sex and age (matching variables) was performed to find groups of variables associated with colorectal cancer. The logistic LASSO regression was performed on 500 bootstrapped datasets to provide stability (28), using the number of times a given adduct was selected in 500 iterations as a measure of its importance. A concordance plot was used to evaluate agreement between ranked P values from Model (1) and the bootstrapped LASSO variable importance measures. Variables that were top-ranking for both methods were selected.
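The bootstrap stability measure can be sketched independently of the selection method: resample subjects with replacement, run a variable selector on each resample, and count how often each variable is chosen. In the sketch below the selector is a deliberately simple stand-in based on a mean-difference threshold, not a logistic LASSO; all names are hypothetical, and in practice the `select` argument would wrap a LASSO fit from a statistics library.

```python
import random

def bootstrap_selection_counts(rows, labels, select, n_boot=500, seed=0):
    """Count how often each variable is selected across bootstrap resamples.

    rows: list of per-subject feature lists; labels: 1 = case, 0 = control;
    select: callable returning the indices of the selected variables.
    """
    rng = random.Random(seed)
    n = len(rows)
    counts = [0] * len(rows[0])
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample subjects with replacement
        for j in select([rows[i] for i in idx], [labels[i] for i in idx]):
            counts[j] += 1
    return counts


def mean_difference_selector(threshold):
    """Stand-in selector (NOT LASSO): picks variables whose case/control means differ."""
    def select(rows, labels):
        cases = [r for r, y in zip(rows, labels) if y == 1]
        ctrls = [r for r, y in zip(rows, labels) if y == 0]
        if not cases or not ctrls:
            return []  # a resample may contain only one class
        picked = []
        for j in range(len(rows[0])):
            diff = sum(r[j] for r in cases) / len(cases) - sum(r[j] for r in ctrls) / len(ctrls)
            if abs(diff) > threshold:
                picked.append(j)
        return picked
    return select
```

Variables selected in a large share of the resamples are considered stable, which is the role of the selection count in the concordance plot.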
Finally, variable importance measures from random forest classification of colorectal cancer case–control status were used to provide a nonlinear index of association (29). A random forest of 500 trees was used to predict case–control status on the basis of adduct abundances and BMI, and all variables were ranked in importance on the basis of their mean decrease in Gini index (30, 31). Variables with large increases in random forest variable importance were also considered for addition to the list of selected variables.
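The quantity underlying this importance measure is the decrease in Gini impurity produced by a split. A random forest accumulates such decreases over the splits made on each variable across its trees. The sketch below computes the decrease for a single threshold split on binary case–control labels; it is an illustration of the impurity calculation only, not a random forest, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity for binary labels (1 = case, 0 = control)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)  # equals 1 - p**2 - (1 - p)**2


def gini_decrease(values, labels, threshold):
    """Impurity decrease from splitting subjects at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# A split that separates cases from controls perfectly removes all impurity.
print(gini_decrease([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1], threshold=2.5))  # 0.5
```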
To investigate factors that could potentially drive relationships between selected adducts and colorectal cancer status, the covariates BMI (kg/m²), smoking (current vs. former/never), alcohol consumption (mL/d), physical activity (active/moderately active vs. moderately inactive/inactive), and total meat consumption (g/d) were evaluated because these variables have been implicated as risk factors for colorectal cancer (4). First, to determine whether the aforementioned covariates might have influenced selection of adducts in our ensemble variable selection method, a random forest classifier was used to rank all selected adducts and covariates by their importance in classifying colorectal cancer case status (31–33). Then, to obtain additional information about potential associations between adducts and covariates, regardless of case–control status, random forest classifiers were used to rank all measured adducts in terms of their predictive power for each particular covariate.
Because colorectal cancer cases and matched controls were evaluated up to 14 years after recruitment, we tested associations between adduct abundances and days (from recruitment) to diagnosis to discern whether they represent potentially causal effects or reactive effects of disease progression (34). If a significant linear trend in the log fold change for a given feature were detected with increasing days to diagnosis, the adduct would be regarded as potentially reactive.
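A minimal sketch of such a trend test is an ordinary least-squares regression of the case–control log fold change on days to diagnosis, with a t statistic for the slope. The function name and input layout below are assumptions; converting the t statistic to a P value (t distribution with n − 2 degrees of freedom) is left to a statistics library.

```python
import math

def linear_trend(days, log_fold_changes):
    """Least-squares slope, its standard error, and t statistic (n - 2 df)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(log_fold_changes) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, log_fold_changes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(days, log_fold_changes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:
        t_stat = 0.0 if slope == 0.0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se
    return slope, se, t_stat
```

A slope that differs significantly from zero would suggest that the case–control difference changes as diagnosis approaches, that is, a potentially reactive adduct.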
Results
Adducts detected
A total of 55 modifications to the T3 peptide were detected in colorectal cancer cases and controls (Supplementary Table S1). Peak abundances covered a 2,250-fold range (PAR × 1,000: 0.09–203). On the basis of ANOVA of duplicate injections across blood specimens for the 46 adducts with sufficient data (Supplementary Table S2), the median intraclass correlation coefficient was 0.777 (range: 0.345–0.982), indicating that technical variation typically accounted for 23% of the total variance of adduct abundances. Coefficients of variations across duplicate injections for adducts ranged from 0.134 to 0.758, with a median value of 0.283, consistent with previous applications of the assay (11).
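For duplicate injections, an intraclass correlation coefficient can be obtained from a one-way ANOVA with specimens as the grouping factor. The sketch below uses the standard one-way random-effects estimator for balanced data; whether the authors used exactly this estimator is an assumption, and the function name is hypothetical. One minus the ICC estimates the share of variance attributable to technical (injection-to-injection) variation, which for the median ICC of 0.777 is roughly 22% to 23%.

```python
def icc_from_duplicates(pairs):
    """One-way random-effects ICC for complete duplicate measurements.

    pairs: list of (injection_1, injection_2) tuples, one per specimen.
    Assumes no missing values and some variation in the data.
    """
    n = len(pairs)
    k = 2  # injections per specimen
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    means = [(a + b) / 2 for a, b in pairs]
    # Between-specimen and within-specimen mean squares
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Perfectly reproducible duplicates give an ICC of 1.
print(icc_from_duplicates([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]))  # 1.0
```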
Accurate masses for 51 adducts led to reasonable elemental compositions added to the Cys34-S− ion within 3 ppm of theoretical values from −46 Da to 510 Da (negative added masses refer to deletions and truncations). A subset of 30 modifications to the T3 peptide was annotated including truncations [e.g., 796.43 (Cys34→Gly) and 805.76 (Cys34→oxoalanine or formylglycine)], Cys34 sulfoxidation products (816.42, 822.42, and 827.76, representing addition of 1, 2, and 3 oxygens to Cys34, respectively), RCS (i.e., crotonaldehyde, 835.11), and a host of mixed Cys34 disulfides, notably those of methanethiol (827.09), Cys (851.43), homocysteine (hCys, 856.10), CysGly (870.44), GluCys (894.44), and glutathione (913.45). About one-third of the T3 adducts were unannotated, including several whose MS2 spectra indicated T3 modifications at sites other than Cys34, including methylation (816.43).
We had previously detected 43 of these adducts in at least one of four studies with serum/plasma from diverse populations (11, 22, 35, 36), and 17 of these adducts were common to all four studies. This points to a pool of modifications of the T3-peptide that arises from a set of precursor molecules, including ROS, RCS, and small thiols from metabolic pathways involving common nutrients. Twelve adducts were unique to this study, and none of these modifications was annotated (Supplementary Table S1). The MS2 spectra and SICs/MS1 spectra of these 12 new adducts are reproduced in Supplementary Figs. S1 and S2, respectively.
Adducts associated with colorectal cancer
Results of our variable selection strategy are summarized in Fig. 1. The concordance of linear regression (Model 1) and bootstrapped LASSO logistic regression was 100% for the eight highest ranked variables, including seven adducts and BMI (Fig. 1A and B). In addition to BMI, five of the selected adducts were present at higher levels in colorectal cancer cases, namely, 853.78 (unknown), 835.11 (crotonaldehyde), 805.76 (Cys34→oxoalanine or formylglycine), 827.09 (S-methanethiol), and 811.76 (a “T3-labile adduct” detected with the same MIM as the T3 peptide but a different retention time, suggesting truncation of the adduct in the ESI source), whereas two adducts representing hCys disulfides were more abundant in controls, namely, 860.77 [S-hCys (+CH3)] and 850.10 [S-hCys (−H2O)]. Many of the adducts and BMI that had been selected by both linear regression and LASSO logistic regression were among the top-ranked variables determined by random forest (Fig. 1C). In fact, the S-methanethiol adduct was the only adduct to demonstrate a marked increase in Gini index by random forest (Fig. 1C).
Clusters of adducts and BMI resulting from the PAM algorithm are shown in Supplementary Fig. S3 (k = 6 resulted in the highest average silhouette width among k = 2,…, 8). Of the six clusters identified, the most informative was cluster 2, which included all of the five adducts that were more abundant in colorectal cancer cases than in controls (from Fig. 1A). Other adducts in this cluster included the Cys34 sulfonic acid (827.75), the T3 dimer (811.42), and two unknowns (847.77 and 815.44). The two selected adducts that were less abundant in colorectal cancer cases were disulfides of hCys that had been either methylated (860.77) or dehydrated (850.10). These adducts were grouped with each other and with the parent hCys disulfide (856.10) in cluster 5 of Supplementary Fig. S3. Interestingly, BMI did not cluster with any of the seven adducts associated with colorectal cancer. Spearman correlations between these selected adducts and BMI were ≤ |0.11| except for the dehydration product of hCys (850.10), which was −0.21.
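The Spearman correlation used for clustering and for the adduct–BMI comparisons is the Pearson correlation of ranks. A minimal pure-Python sketch, with average ranks for ties and hypothetical function names, is given below; it is not the authors' R code.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation between two abundance vectors."""
    return pearson(ranks(x), ranks(y))
```

Because it depends only on ranks, the measure is insensitive to monotone transformations such as the log scaling applied during normalization.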
Effects of covariates on associations between adducts and colorectal cancer
Variables ranked for importance by a random forest classification of case–control status are shown in Fig. 2 to compare the selected adducts (Fig. 1) and covariates previously associated with colorectal cancer (BMI, smoking, alcohol consumption, physical activity, and total meat consumption). Aside from BMI (second ranked), the other covariates ranked below the selected adducts as classifiers of colorectal cancer case status, suggesting that these covariates are probably not responsible for the potential associations shown in Fig. 1. When random forest models were constructed to investigate the variable importance of all 46 adducts as predictors of each covariate individually, the seven selected adducts were not top ranking for most of the covariates, further suggesting that the covariates are not driving the associations shown in Fig. 1. Nonetheless, there were some interesting results (Supplementary Fig. S4). For example, among smokers the variables with greatest importance were adducts of acrylonitrile (829.43) and ethylene oxide (826.43; Supplementary Fig. S4A), both of which had been previously associated with smoking in our adductomics pipeline (11). Also, the top three variables for BMI were Cys34 sulfoxidation products [(−H2+O), 816.42; (+CH3O2), 827.10; and (+HO2), 822.42] (Supplementary Fig. S4B). Top-ranking adducts for total meat consumption included two unknowns (847.77 and 815.44) and the Cys34→Gly truncation (796.43; Supplementary Fig. S4C); top-ranking adducts for physical activity included three unknowns (981.50, 894.13, and 879.13) plus the T3-labile adduct (811.76) and S-Cys (NH2→OH; 851.76; Supplementary Fig. S4D); and top-ranking adducts for alcohol consumption included S-Cys (NH2→OH; 851.76), S-hCys (856.10), and S-glutathione (913.45; Supplementary Fig. S4E). Thus, of the seven adducts selected as associated with colorectal cancer (Fig. 1) only the T3-labile adduct (811.76) had high-ranking variable importance for any of the tested covariates (i.e., physical activity and alcohol consumption), further suggesting that the underlying colorectal cancer associations were largely free of confounding by these variables.
Discussion
This is the first study to apply our adductomics pipeline to prospective analysis of colorectal cancer or cancer generally. We had validated our adductomics methodology with archived serum/plasma from healthy smoking and nonsmoking subjects (11) and subsequently applied it to populations with and without high exposures to indoor combustion products (36) or benzene (22) and to subjects with and without lung or heart disease (35). This led to measurement of more than 75 adducts, several of which were significantly associated with particular exposures or diseases, notably Cys34 modifications of reactive oxygen and carbonyl species and disulfides of small thiols derived from redox processes. This combination of results points to separate windows that Cys34 modifications provide for viewing exposure-specific electrophiles and global characteristics of the redox proteome (14).
Using an ensemble of regression and classification methods developed initially for untargeted metabolomics (20, 31), we selected seven adducts as potentially associated with colorectal cancer in 57 cases and 72 control subjects from the EPIC cohort. The fact that our adductomics pipeline had previously detected all seven of the selected adducts in various human populations (11, 22, 35, 36) suggests that the reactive precursors of these modifications represent common exposures of everyday life that are modulated by pathways leading to colorectal cancer.
Using results from two large cohort studies, Morikawa and colleagues argued that high BMI increased colorectal cancer risk through a combination of obesity, insulin, and insulin-like growth factor-1 that modulated the Wnt-β-catenin (WNT-CTNNB1; official HGNC ID: HGNC:2514) signaling pathway in unspecified ways (37). Based on a review of recent literature, Liu and colleagues concluded that the WNT-CTNNB1 signaling pathway was modulated by production of ROS (38). Thus, it is interesting that the top-ranking adducts in terms of random forest variable importance for BMI were all sulfoxidation products of Cys34 (Supplementary Fig. S4B) that are formed by Cys34 reactions with ROS. However, because these Cys34 sulfoxidation products were not among the seven adducts selected as associated with colorectal cancer (Fig. 1), it seems that multiple pathways are involved in the etiology of this cancer.
Five of the seven adducts associated with colorectal cancer incidence were more abundant in cases than controls, with fold changes between 1.11 and 1.20 (Fig. 1A). Interestingly, all five of these adducts clustered with each other (Supplementary Fig. S3, cluster 2), suggesting a common pathway. Yet, the sources and biochemistry underlying production of these adducts are varied. Perhaps the most informative of these five adducts is the S-methanethiol modification of Cys34 (827.09) that was also observed in two previous studies (11, 36). This modification results from oxidation of Cys34 to the sulfenic acid (Cys34-SOH), which subsequently binds with circulating methanethiol, with loss of H2O, to form the corresponding Cys34 disulfide (17). Methanethiol is a product of microbial–human cometabolism that is mediated by the gut microbiota via catabolism of methionine and/or methylation of hydrogen sulfide (39). Interestingly, methanethiol was found to be more abundant in feces from patients with colorectal cancer than from controls (40). Thus, we speculate that the Cys34 adduct of methanethiol is a biomarker of human enteric bacteria and that the increased abundance of this adduct in colorectal cancer cases further implicates the gut microbiota as a risk factor, consistent with formal hypotheses (5, 6). It is also worth noting that Bae and colleagues reported a positive association between colorectal cancer risk in postmenopausal women and plasma trimethylamine-N-oxide, which is another human-microbial cometabolite (41).
Another Cys34 adduct with mechanistic significance is the crotonaldehyde modification (835.11). Crotonaldehyde is a reactive α, β-unsaturated aldehyde produced by ROS oxidation of membrane lipids (42). We had previously shown that workers exposed to high levels of benzene—a strong promoter of ROS—had elevated serum levels of this crotonaldehyde adduct compared with controls (22). In their review of redox biology and colorectal cancer, Liu and colleagues linked lipid peroxidation with COX2 expression and two subsequent pathways toward colorectal cancer, one involving production of prostaglandins and the other involving reduced degradation of CTNNB1 (38). The fact that the crotonaldehyde adduct (835.11) clustered with the S-methanethiol adduct (827.09; cluster 2 of Supplementary Fig. S3) lends credibility to the hypothesis that invasion of gut microbiota into the intestinal mucosa initiates a chain of events involving an inflammatory response, followed by production of ROS, RCS, and subsequent damage to DNA and proteins as well as modulation of redox signaling pathways (7).
Potential origins of the other three adducts in cluster 2 (Supplementary Fig. S3) that are associated with colorectal cancer are more difficult to characterize. Adduct 805.76 represents conversion of Cys34 to oxoalanine or formylglycine with a mass loss of 18 Da. Although oxidative cleavage of the sulfhydryl group from protein cysteine residues to produce dehydroalanine (−34 Da) and serine (−16 Da) has been reported (43, 44), we have not found evidence of modifications yielding the observed mass shift of −18 Da. It also seems unlikely that 805.76 represents conversion of Cys34 to formylglycine by human or microbial sulfatase metabolism because Cys34 is not embedded in the sequence motif (CXPXR) recognized by sulfatases (45). Regarding unknown adduct 853.78, we had previously detected this modification in two studies and suspected that it was a Cys34 disulfide of a small thiol because it disappeared after treatment of HSA with tris(2-carboxyethyl)phosphine hydrochloride (TCEP), a reagent that selectively cleaves disulfide bonds (11, 36). However, none of the putative elemental compositions that include a sulfur atom (C7H11S, C5H7N2S, or C6H9NS) resulted in a plausible added mass relative to that observed (127.077 Da). On the basis of analysis of MS2 fragmentation spectra, it seems that the same precursor ion (853.7834) can generate two different sets of fragment ions, which suggests rearrangement during collision-induced dissociation in the mass spectrometer. Finally, the T3-labile adduct (811.76) seems to represent a T3 modification(s) that is cleaved in the ESI source to yield the unadducted T3 peptide, albeit with a different retention time. Although this modification has been observed in all previous studies, we have no information about its identity.
The other two adducts potentially associated with colorectal cancer in our samples were Cys34 disulfides of hCys that were either methylated (860.77) or dehydrated (850.10) at another site on the T3 peptide. Unlike the other five associated adducts, these hCys modifications were less abundant in colorectal cancer cases than in controls (Fig. 1A) and clustered with each other and the unmodified Cys34-hCys disulfide (856.10; cluster 5 of Supplementary Fig. S3). As a key intermediate in one-carbon metabolism, hCys is remethylated to produce methionine, and subsequently S-adenosylmethionine, which plays an important role in DNA methylation that has been linked to colorectal cancer and other cancers (46). However, recent meta-analyses of many case–control studies (46, 47) and a combination of case–control and cohort studies (48) point to colorectal cancer risks that increase with hCys blood concentrations, which is the reverse of what we observed. Nonetheless, estimated effect sizes were smaller in cohort studies than in case–control studies, and numerous dietary and lifestyle factors also increased colorectal cancer risks (reduced intake of fiber, methionine, vitamin B9 or folate, and vitamin B6; increased intake of vitamin B12 and alcohol; and smoking; ref. 48). In addition, hCys levels have been shown to increase with age among people older than 65 years but not among younger people (49), and the mean age across the 16 studies in which Xu and colleagues (47) linked colorectal cancer with increasing hCys was 61.4 years (SD = 3.7 years). This suggests that many subjects in the meta-analysis (47) were older than 65 years, which may have contributed to increased levels of hCys. In contrast, the mean age at phlebotomy of the 57 cases in our study was 55.3 years. Thus, although it is difficult to entirely reconcile our findings about adducts 860.77 and 850.10 with the current epidemiologic literature, we cannot rule out their connections to a potentially causal pathway involving hCys metabolism.
Finally, to determine whether modulation in levels of selected adducts was the result of disease progression rather than a causal factor, we examined the relationships between log fold changes of adduct abundances in colorectal cancer case–control pairs and days from recruitment to diagnosis (34). Results are presented in Supplementary Fig. S5 as individual plots for the selected adducts. In each case, the P value for the slope of the linear relationship was large, indicating that there is little statistical evidence that disease progression (reverse causality), rather than a causal pathway(s), leads to differential adduct abundances between cases and matched controls.
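The reverse-causality check described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis used in the study: the data values and function names are hypothetical, and because the software and test behind the reported P values are not stated here, a permutation test on the least-squares slope stands in for the usual parametric test of a zero slope.

```python
import random
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P value for the null hypothesis slope = 0.

    Assumption: a permutation test replaces the parametric slope test;
    the original analysis may have used a different procedure.
    """
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between x and y
        if abs(ols_slope(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical values for one adduct: one entry per case-control pair.
days_to_diagnosis = [200, 650, 1100, 1800, 2400, 3100, 3900, 4600]
log_fold_change = [0.12, -0.05, 0.20, 0.08, 0.15, -0.02, 0.18, 0.10]

slope = ols_slope(days_to_diagnosis, log_fold_change)
p_value = slope_permutation_p(days_to_diagnosis, log_fold_change)
print(f"slope per day = {slope:.3e}, permutation P = {p_value:.3f}")
```

Under this reading, a slope that is indistinguishable from zero means the case–control difference does not shrink or grow as the interval between blood collection and diagnosis lengthens, which is the pattern expected if the difference reflects conditions at recruitment rather than subclinical disease.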
Our study had several limitations. The initial sample size was small (95 cases and matched controls) and was further reduced by the exclusion of gelled samples from cryostraw storage and by missing information about BMI for some subjects. Also, we had no information about aspirin use or family histories of colorectal cancer among cases, two factors that have been associated with colorectal cancer (4). The storage of biological specimens for decades can lead to artifacts, but in our study all specimens were collected within 4 years, and cases and controls were matched by year of enrollment to minimize potential effects of sample storage on case–control differences. Three of the seven adducts selected for associations with colorectal cancer were unannotated and, therefore, of limited utility in the discovery of causal factors. Another limitation was our inability to examine possible connections between adducts and advanced neoplasms (precursors of colorectal cancer) or between adducts and advanced-stage versus early-stage cancers.
In summary, we used untargeted adductomics to detect 51 adduct features in HSA from incident cases and controls from the EPIC cohort, of which seven were found to be associated with colorectal cancer (Fig. 1). Two of the five adducts that were more abundant in colorectal cancer cases than in controls represent Cys34 modifications by methanethiol and crotonaldehyde, which jointly implicate infiltration of gut microbes into the intestinal mucosa and the corresponding inflammatory response as potential causes of colorectal cancer. Two other associated adducts were disulfides of hCys that were both less abundant in colorectal cancer cases than in controls and may implicate hCys metabolism as a contributor to colorectal cancer. These adducts should be targeted for validation in independent samples of colorectal cancer cases and controls and should motivate mechanistic hypotheses about the underlying causal exposures and pathways. For example, the methanethiol and crotonaldehyde adducts could be measured in colorectal cancer cases and controls in conjunction with metagenomics of fecal samples to determine whether particular strains of microbiota may be responsible for the observed effects. It would also be interesting to determine whether there are associations between Cys34 adducts and DNA adducts or mutations in oncogenes or tumor suppressor genes in colorectal cancer cases.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: H. Grigoryan, S.M. Rappaport
Development of methodology: H. Grigoryan, S. Dudoit, S.M. Rappaport
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): H. Grigoryan, M.J. Gunter, A. Naccarati, S.M. Rappaport
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): H. Grigoryan, C. Schiffman, M.J. Gunter, S. Dudoit, S.M. Rappaport
Writing, review, and/or revision of the manuscript: H. Grigoryan, C. Schiffman, M.J. Gunter, A. Naccarati, S. Polidoro, S. Dudoit, S.M. Rappaport
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): H. Grigoryan
Study supervision: S.M. Rappaport
Acknowledgments
Financial support for this work was provided by the U.S. National Institutes of Health through grants R33CA191159 from the NCI (to S.M. Rappaport, C. Schiffman, and S. Dudoit) and P42ES04705 from the National Institute of Environmental Health Sciences (to S.M. Rappaport), and by grant agreement 308610-FP7 from the European Commission (Project Exposomics; to P. Vineis and S.M. Rappaport). The authors appreciate the assistance of Kelsi Perttula and William Edmands, who extracted serum from the cryostraws; Anthony Iavarone, who assisted with mass spectrometry; and Carlotta Sacerdote, who provided information regarding covariates.