Abstract
Lung cancer remains the most common cause of cancer deaths worldwide, yet there is currently a lack of diagnostic noninvasive biomarkers that could guide treatment decisions. Small molecules (<1,500 Da) were measured in urine collected from 469 patients with lung cancer and 536 population controls using unbiased liquid chromatography/mass spectrometry. Clinical putative diagnostic and prognostic biomarkers were validated by quantitation and normalized to creatinine levels at two different time points and further confirmed in an independent sample set, which comprises 80 cases and 78 population controls, with similar demographic and clinical characteristics when compared with the training set. Creatine riboside (IUPAC name: 2-{2-[(2R,3R,4S,5R)-3,4-dihydroxy-5-(hydroxymethyl)-oxolan-2-yl]-1-methylcarbamimidamido}acetic acid), a novel molecule identified in this study, and N-acetylneuraminic acid (NANA) were each significantly (P < 0.00001) elevated in non–small cell lung cancer and associated with worse prognosis [HR = 1.81 (P = 0.0002), and 1.54 (P = 0.025), respectively]. Creatine riboside was the strongest classifier of lung cancer status in all and stage I-II cases, important for early detection, and also associated with worse prognosis in stage I-II lung cancer (HR = 1.71, P = 0.048). All measurements were highly reproducible with intraclass correlation coefficients ranging from 0.82 to 0.99. Both metabolites were significantly (P < 0.03) enriched in tumor tissue compared with adjacent nontumor tissue (N = 48), thus revealing their direct association with tumor metabolism. Creatine riboside and NANA may be robust urinary clinical metabolomic markers that are elevated in tumor tissue and associated with early lung cancer diagnosis and worse prognosis. Cancer Res; 74(12); 3259–70. ©2014 AACR.
Introduction
Lung cancer is the leading cause of cancer deaths in men and women in the United States (1, 2) and worldwide (3), and survival rates are dismal. When the disease is detected while it is still localized, the 5-year survival rate is 53%, but that rate drops to 24% for regional disease and, even more significantly, to <5% for distant tumors (4). However, these survival rates could be improved substantially with the identification of biomarkers to support the accurate and reliable diagnosis and prognosis of lung cancer.
Current clinically accepted methods for detecting lung cancer include low-dose spiral computed tomography (LDCT) scanning in smokers between the ages of 55 and 74 years with a smoking history of at least 30 pack-years (5, 6). However, LDCT scanning yields a high rate of false positives: 96.4% overall and 24% in combination with invasive testing (7). Moreover, LDCT scanning may be of concern due to an increased lung cancer risk associated with radiation exposure (8). As a result, the medical community requires a concordant biomarker to better identify patients who should be screened or who should undergo invasive diagnostic work-ups. However, to date, no molecular biomarker for early-stage lung cancer has been validated (9, 10).
Several biomarkers currently support the assessment of overall prognosis and guide therapy decisions. For example, the KRAS mutation in non–small cell lung cancer (NSCLC) confers a significantly shorter survival (HR = 1.21) in stage IV disease (11), and the presence of an ALK or EGF receptor mutation indicates a tumor that is responsive to targeted therapies and is associated with longer survival (12–15). However, these biomarkers for lung cancer outcomes are based on tumor assays, an invasive approach that can be hindered by the limited availability of tissue.
Urine is now attracting increased attention as a biospecimen for detecting cancer biomarkers (16), not only because it is collected noninvasively, but also because it is abundant and requires minimal preparation. For instance, one urinary cancer biomarker, PCA3, is currently applied clinically to detect prostate cancer (17). No clinically applied biomarkers exist yet for lung cancer. Nonetheless, promising urinary biomarkers include modified nucleosides (18–21), whose high levels indicate an increased RNA turnover and degradation and whose utility is being evaluated in clinical trials. However, modified nucleosides are elevated in many different tumor types, and therefore may not be cancer-type specific (22).
Mass spectrometry (MS)-based metabolomic approaches are increasingly used for uncovering new biomarkers for diagnosis (23–28) and customized treatment (29), as well as for evaluating pathologic characteristics of metastatic cells (30) and carcinogenic tobacco-smoke constituents (31, 32). The reliability and reproducibility of such approaches are robust (33) and the technologies are currently in place in clinical practice (34), making them strong candidates for uncovering potential biomarkers. Unfortunately, most studies suffer from limited sample sizes, poor quality control, and a lack of technical and biologic validation.
To address these current limitations, we have taken a comprehensive approach utilizing state-of-the-art methodology and a large sample size, and have uncovered robust and technically validated biomarkers that can aid diagnosis and guide therapeutic decisions in NSCLC. Initially, we measured small (<1,500 Da) urinary molecules from 1,005 individuals with and without lung cancer (training set) to uncover metabolites that most strongly distinguished the two groups. We found that levels of four metabolites were elevated in patients with lung cancer and best predicted their lung cancer status, independent of their gender and race: creatine riboside (a novel molecule identified in our study), N-acetylneuraminic acid (NANA), cortisol sulfate, and an as-yet-unidentified glucuronidated compound referred to as 561+. These results were confirmed in a validation set comprising 158 individuals, and abundances of significant metabolites were further validated through absolute quantitation, with values normalized to urinary creatinine levels to control for kidney function. The applicability of these findings to lung cancer diagnosis in clinical practice is primarily focused on two of the urinary metabolites, creatine riboside and NANA, which were significantly more abundant in stage I tumors when compared with adjacent nontumor lung tissues. This association in the tissue provides a direct link to altered tumor metabolism and, importantly, elevated levels of these metabolites can be noninvasively detected in the urine. Notably, elevated levels of these metabolites are also associated with worse prognosis.
Patients and Methods
Study subjects
Urine samples from 469 patients with NSCLC and 536 population controls collected from 1998 to 2007 from the greater Baltimore, MD, area were used as a training set (Table 1). Patients were recruited from pathology departments and from pulmonary and thoracic clinics with the cooperation of attending physicians in seven hospitals: Baltimore Veterans Administration Medical Center (Baltimore, MD), Bon Secours Hospital (Baltimore, MD), MedStar Harbor Hospital (Baltimore, MD), Sinai Hospital (Baltimore, MD), Johns Hopkins Bayview Medical Center, The Johns Hopkins Hospital, and University of Maryland Medical Center (Baltimore, MD). Population controls were identified from the Department of Motor Vehicles (DMV) lists and frequency matched to cases by age, gender, and self-reported race. Patients with lung cancer were not diagnosed with other cancer types. Findings from the training set were replicated in an additional set of 80 recently diagnosed cases (years of diagnosis 2008–2010) and 78 population controls (recruited through the DMV), a sample set we refer to as a validation set (Table 1). These validation set samples have a similar distribution of demographic and clinical characteristics when compared with the training set. We also utilized 48 tumor and adjacent nontumor stage I tissue pairs, of which 20 were a subset of the training set. Survival times were calculated as time of diagnosis to time of death or to follow-up (2010); death due to cancer was determined from the National Death Index extraction of the death certificates. This study was approved by the Institutional Review Boards of the seven institutions. Urine samples were collected at the time of interview when possible. If collected at a different time, a brief intake questionnaire was administered including recent smoking information.
In each case, urine was collected in a plain, sterile 50 mL container and transported to the University of Maryland where it was split into 10 mL aliquots and stored at −80°C until used. Urines were thawed on wet ice at the time of use. Subjects were not required to fast or undergo any other preparatory procedure before urine collection. The time of interview and subsequent urine collection was recorded with the questionnaire data.
Sample characteristics of all sample sets presented in the study
Characteristic | Training set: All (N = 1,005) | Training set: Cases (N = 469) | Training set: Population controls (N = 536) | Validation setᵃ: All (N = 158) | Validation setᵃ: Cases (N = 80) | Validation setᵃ: Population controls (N = 78) | Tissue set: Tumor/adjacent normal pairs (N = 48) |
---|---|---|---|---|---|---|---|
Age | (mean = 66.4) | (mean = 66.2) | (mean = 66.6) | (mean = 66.7) | (mean = 64.2) | (mean = 68.7) | (mean = 68.9) |
>Mean | 519 | 240 | 279 | 82 | 35 | 47 | 27 |
≤Mean | 486 | 229 | 257 | 76 | 45 | 31 | 21 |
Smoking statusᵇ | | | | | | | |
Ever | | | | | | | 10 |
Current | 293 | 222 | 71 | 46 | 38 | 8 | 17 |
Former | 463 | 214 | 249 | 73 | 31 | 42 | 17 |
Never | 249 | 33 | 216 | 39 | 11 | 28 | 4 |
Histology | | | | | | | |
ADC | | 216 | | | 51 | | 31 |
SCC | | 122 | | | 14 | | 16 |
NSCLC | | 131 | | | 10 | | 1 |
Gender | | | | | | | |
Female | 492 | 232 | 260 | 81 | 46 | 35 | 24 |
Male | 513 | 237 | 276 | 77 | 34 | 43 | 24 |
Raceᵇ | | | | | | | |
African-American | 366 | 127 | 239 | 70 | 35 | 35 | 9 |
Caucasian | 639 | 342 | 297 | 88 | 45 | 43 | 39 |
Stageᶜ | | | | | | | |
I–II | | 213 | | | 31 | | 48 |
III–IV | | 103 | | | 41 | | 0 |
ᵃ Five samples are missing histology, and eight samples are missing stage information.
ᵇ Self-reported smoking status and race.
ᶜ Only pathologically staged cases, according to the seventh edition of the Cancer Staging Manual of the American Joint Committee on Cancer, were utilized for stratified analyses.
Detailed clinical information derived from extensive questionnaires is available for each patient, including age, gender, self-reported race, self-reported smoking status (never smokers, having smoked less than 100 cigarettes in their lifetime; former smokers, having quit smoking at least 6 months before the interview date), pack years, histology, American Joint Committee on Cancer (AJCC) staging, and survival (Table 1). Lung cancer diagnosis was pathologically determined. Staging was performed by a pathologist using the seventh edition of the AJCC's Cancer Staging Manual (35).
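The smoking categories defined above can be expressed as a small decision rule. The sketch below is only illustrative: the function and parameter names are hypothetical, treating every remaining subject as a current smoker is an assumption (the text defines only never and former smokers explicitly), and the pack-year formula is the conventional definition (packs per day multiplied by years smoked) rather than one stated in the text.

```python
from typing import Optional


def smoking_status(lifetime_cigarettes: int,
                   months_since_quit: Optional[float]) -> str:
    """Classify a subject as 'never', 'former', or 'current' smoker.

    never  : fewer than 100 cigarettes smoked in a lifetime (from the text)
    former : quit at least 6 months before the interview (from the text)
    current: everyone else (assumption, not stated explicitly in the text)
    """
    if lifetime_cigarettes < 100:
        return "never"
    if months_since_quit is not None and months_since_quit >= 6:
        return "former"
    return "current"


def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Conventional pack-year exposure: packs per day times years smoked."""
    return packs_per_day * years_smoked


if __name__ == "__main__":
    print(smoking_status(50, None))    # fewer than 100 cigarettes
    print(smoking_status(5000, 12.0))  # quit a year before interview
    print(smoking_status(5000, 2.0))   # quit too recently to count as former
    print(pack_years(1.5, 20))
```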
Study design
All initial analyses were performed in a training set comprising 1,005 samples (Table 1). Results from Random Forest (36, 37) classifications and univariate Cox analysis were combined to identify four metabolites that were predictive of both lung cancer diagnosis and prognosis. Results were then confirmed in a quantitation set (N = 198) comprising a subset of the training set samples, and a validation set of 158 urine samples independent of the training set samples (Table 1). Finally, the four metabolites of interest were measured in 48 matched tumor and adjacent nontumor tissue pairs. The overall study design is depicted in Supplementary Fig. S1.
Untargeted metabolite profiling using UPLC-ESI-QTOFMS
We analyzed urine samples using a quadrupole time-of-flight (QTOF) mass spectrometer (Premier, Waters), in positive (ESI+) and negative (ESI−) electrospray ionization modes, using a 50 × 2.1 mm Acquity 1.7 μm C18 column (Waters Corp). Urine samples were diluted with an equal volume of 50% aqueous acetonitrile containing debrisoquine (ESI+ internal standard) and 4-nitrobenzoic acid (ESI− internal standard). Samples were centrifuged at 14,000 × g for 20 minutes at 4°C to precipitate proteins. Five μL was chromatographed on a 50 × 2.1 mm Acquity BEH 1.7 μm C18 column (Waters) using an Acquity UPLC system (Waters). The gradient mobile phase consisted of 0.1% formic acid (A) and acetonitrile containing 0.1% formic acid (B). A typical 10-minute sample run (at 0.5 mL/minute) consisted of 0.5 minute of 100% solvent A followed by a linear gradient to 80% A at 4 minutes, to 5% A at 8 minutes. After a 0.5-minute wash step, the column was equilibrated to initial conditions for 1.5 minutes. The eluent was introduced by electrospray ionization into the QTOF mass spectrometer (Premier, Waters) operating in ESI+ or ESI−. The capillary and sampling cone voltages were set to 3,000 and 30 V, respectively. Source and desolvation temperatures were set to 120°C and 350°C, respectively, and the cone and desolvation gas flows were set to 50.0 and 650.0 L/hour, respectively. To maintain mass accuracy, sulfadimethoxine at a concentration of 300 pg/μL in 50% aqueous acetonitrile was used as a lock mass and injected at a rate of 50 μL/minute. For MS scanning, data were acquired in centroid mode from 50 to 850 m/z and for tandem MS, the collision energy was ramped from 5 to 35 V.
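The 10-minute gradient described above can be written as a piecewise-linear schedule, which makes the timing easy to check. This is a minimal sketch: the breakpoints at 0.5, 4, and 8 minutes come from the text, whereas holding 5% A during the 0.5-minute wash and stepping directly back to 100% A for the 1.5-minute equilibration are assumptions. The names `GRADIENT` and `percent_a` are hypothetical.

```python
# (time in minutes, % solvent A). Solvent B is the remainder (100 - %A).
GRADIENT = [
    (0.0, 100.0),
    (0.5, 100.0),   # 0.5 minute at 100% A (from the text)
    (4.0, 80.0),    # linear gradient to 80% A at 4 minutes (from the text)
    (8.0, 5.0),     # linear gradient to 5% A at 8 minutes (from the text)
    (8.5, 5.0),     # 0.5-minute wash; holding 5% A is an assumption
    (8.5, 100.0),   # step back to initial conditions (assumption)
    (10.0, 100.0),  # 1.5-minute equilibration
]


def percent_a(t_min: float) -> float:
    """Return the % of solvent A at time t_min by linear interpolation."""
    if not 0.0 <= t_min <= 10.0:
        raise ValueError("time must lie within the 10-minute run")
    for (t0, a0), (t1, a1) in zip(GRADIENT, GRADIENT[1:]):
        # Skip the zero-length segment that encodes the step change.
        if t1 > t0 and t0 <= t_min <= t1:
            fraction = (t_min - t0) / (t1 - t0)
            return a0 + fraction * (a1 - a0)
    return GRADIENT[-1][1]


if __name__ == "__main__":
    for t in (0.25, 2.25, 4.0, 6.0, 9.0):
        a = percent_a(t)
        print(f"t = {t:5.2f} min: {a:5.1f}% A, {100.0 - a:5.1f}% B")
```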
To avoid artifacts based on sample injection order, the order was randomized. Five different quality control sets were included with the runs to assess machine sensitivity and sample carry over. First, 169 “pooled” samples, containing aliquots from 108 randomly selected urine samples were processed randomly throughout the run. Second, a standard cocktail containing theophylline, caffeine, hippuric acid, 4-nitrobenzoic acid, and nortriptyline (designated as MetMix) was injected every 100 samples. Third, 32 blanks were randomly injected to assess sample carryover. Fourth, 48 samples with four high-purity nicotine metabolite standards, including cotinine, nicotine-N'-oxide, anabasine, and trans-3′-hydroxycotinine (Sigma-Aldrich), were spiked into urine. Fifth, 10% of the samples were randomly selected and processed in duplicate at the end of the run to evaluate chromatogram consistency. Finally, debrisoquine and 4-nitrobenzoic acid were spiked into samples for runs in ESI+ and ESI− modes, respectively.
Raw chromatograms along with extracted and normalized ion counts can be accessed in the MetaboLights database with study identifier MTBLS28.
Metabolite quantitation
Urine samples were processed with an equal volume of 50% aqueous acetonitrile containing chloropropamide and aminopimelic acid as internal standards and chromatographed on a 50 × 2.1 mm Acquity BEH 1.7 μm C18 column using an Acquity UPLC system (Waters). MRM transitions were monitored using a Xevo TQMS (Waters). In addition, samples were analyzed using hydrophilic interaction chromatography (HILIC) columns (Acquity UPLC BEH Amide 1.7 μm 50 × 2.1 mm) for the quantitation of creatine riboside and NANA. HILIC columns improve retention, separation, and detection of highly polar metabolites.
Tissue metabolite extraction and quantitation
Tumor and matched adjacent nontumor tissues were pulverized by cryogenic grinding (Cryomill, Retsch GmbH) using a 5-mm stainless steel ball per sample. Average sample weight was 15 mg (with a range between 3 mg and 30 mg). A monophasic mixture of ice-cold chloroform:methanol:water (2:5:2, v:v:v) was used for extraction. Samples were centrifuged at 14,000 × g for 15 minutes at 4°C, dried down using a vacuum evaporator (SpeedVac), and reconstituted in 70% aqueous acetonitrile, of which 5 μL was injected onto the Xevo TQMS system for analysis.
Statistical analyses
Samples were classified as lung cancer or healthy controls using an R package Random Forests (36, 37). For additional details about Random Forests parameters used in data processing, please see Supplementary Materials and Methods.
Unconditional logistic regression was performed in STATA (Stata Statistical Software Release 11.2), while controlling for race, gender, interview year, smoking status, pack years, and urine collection time. NANA levels do show some diurnal variation (Supplementary Fig. S7), and therefore all analyses were also adjusted for the time of day urine was collected. Unconditional logistic regression analysis was performed on categorical variables calculated by dichotomizing metabolite abundances into high (≥75th percentile) and low (< 75th percentile) based on the distribution of metabolite abundances in the population control subjects. Unconditional logistic regression models were used to estimate ORs and 95% confidence intervals (CI) for both univariate and multivariate models adjusted for race, gender, interview year, smoking status, pack years, and urine collection time. False discovery rates (FDR) were calculated using the Benjamini and Hochberg method (38).
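To make the dichotomization step concrete, the following sketch splits abundances at the 75th percentile of the control distribution and computes an unadjusted odds ratio with a Wald 95% confidence interval from the resulting 2 × 2 table. It is an illustration under stated assumptions, not the STATA analysis used in the study: the percentile uses linear interpolation (the interpolation rule is not given in the text), the function names are hypothetical, and no covariate adjustment is performed.

```python
import math
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def odds_ratio_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Odds ratio and Wald 95% CI.

    a = cases high, b = cases low, c = controls high, d = controls low.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("all four cells must be nonzero for a Wald interval")
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - 1.96 * se)
    upper = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, lower, upper


def dichotomized_odds_ratio(cases: Sequence[float],
                            controls: Sequence[float],
                            q: float = 75.0) -> Tuple[float, float, float]:
    """Dichotomize at the q-th percentile of controls (high if >= cutoff)."""
    cutoff = percentile(controls, q)
    a = sum(v >= cutoff for v in cases)
    c = sum(v >= cutoff for v in controls)
    return odds_ratio_2x2(a, len(cases) - a, c, len(controls) - c)


if __name__ == "__main__":
    # Made-up abundances, for illustration only.
    controls = [1, 2, 3, 4, 5, 6, 7, 8]
    cases = [5, 6, 7, 8, 9, 10, 3, 7]
    print(dichotomized_odds_ratio(cases, controls))
```

Because the cutoff is taken from the controls, roughly 25% of controls fall in the high group by construction, which is why the control column of such a table is constant across metabolites.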
Survival analyses were performed on categorical variables of dichotomized metabolite abundances in SAS Enterprise Guide, version 4.2 (SAS Institute Inc.), and all reported P values are two sided. Cox models with left truncation were performed to account for the lag time between diagnosis and urine collection dates (up to 2 years). Multivariate Cox models were adjusted for urine collection time, histology, stage, race, gender, interview year, pack years, smoking status, chemotherapy/radiation, and surgery status. The proportional hazards assumption (39) was tested, and if it was not met, the HR function was calculated separately before and after a given time point. This cutoff was determined by the time at which the survival curves started to diverge/converge and by ensuring that the β coefficients of the signal-time term before and after were no longer significant.
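Left truncation changes who counts as being at risk: a patient whose urine was collected some time after diagnosis enters the risk set only from the collection date onward. The sketch below illustrates that risk-set adjustment with a Kaplan–Meier-type estimator rather than the Cox model used in the study; the function name and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def km_left_truncated(
    subjects: Sequence[Tuple[float, float, int]]
) -> List[Tuple[float, float]]:
    """Product-limit survival estimate with delayed entry.

    Each subject is (entry, exit, event): entry is the time from diagnosis
    to urine collection, exit is the time from diagnosis to death or
    censoring, and event is 1 for death and 0 for censoring.
    Returns a list of (event time, estimated survival) pairs.
    """
    for entry, exit_, _ in subjects:
        if not entry < exit_:
            raise ValueError("each subject needs entry < exit")

    event_times = sorted({exit_ for _, exit_, ev in subjects if ev})
    survival = 1.0
    curve = []
    for t in event_times:
        # A subject is at risk at t only after entering and before leaving.
        at_risk = sum(1 for entry, exit_, _ in subjects if entry < t <= exit_)
        deaths = sum(1 for _, exit_, ev in subjects if ev and exit_ == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Times in months; made-up values for illustration only.
    toy = [(0, 5, 1), (2, 8, 1), (6, 10, 0), (0, 3, 0)]
    print(km_left_truncated(toy))
```

In the toy data, the subject entering at month 6 does not contribute to the risk set for the death at month 5, which is exactly the correction that ignoring the lag between diagnosis and urine collection would miss.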
Receiver operating characteristic (ROC) analyses were conducted in STATA 11.2 to assess the predictive value of identified metabolites in lung cancer diagnosis using the roctab and roccomp functions. Models were built using logistic regression on the continuous abundances of each metabolite individually, and on the combination of the four metabolites. For the comparison of ROC curves, the rocreg function in STATA 11.2 was used.
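For a single continuous marker, the area under the ROC curve equals the probability that a randomly chosen case has a higher value than a randomly chosen control, with ties counted as one half. The sketch below computes the AUC this way in plain Python; it is not the STATA routine used in the study, and the function name and example values are hypothetical.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of case/control pairs ranked correctly.

    Ties contribute 0.5, which matches the Mann-Whitney U statistic
    divided by the number of case/control pairs.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Made-up abundances, for illustration only.
    print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

Because the AUC depends only on ranks, a single-metabolite logistic model with a positive coefficient gives the same AUC as the raw abundance; the combined four-metabolite model would instead require the fitted linear predictor as the score.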
Nonparametric Wilcoxon test in STATA 11.2 was utilized to assess abundance differences of four metabolites, as detected in the urine of patients with lung cancer when compared with population controls, for three sets (training, validation, and quantitation sets).
A paired Student t test in STATA 11.2 was used to assess abundance differences between 48 tumor and 48 adjacent nontumor tissue samples. All reported P values are two sided.
Results
Quality control assessment of the metabolomics data
Initially, abundances of possible small (<1,500 Da) urinary molecules in a training set comprising 1,005 urine and 521 quality control samples (Table 1 and Supplementary Fig. S1) were measured using ultraperformance liquid chromatography-electrospray-ionization-quadrupole time-of-flight (UPLC-ESI-QTOF) MS. After signal filtering (see Supplementary Materials and Methods for additional detail), a total of 1,807 signals were detected in the positive and 1,359 in the negative ionization mode, which represents a comprehensive pool of small urinary molecules. Signals here refer to unique m/z and retention time pairs and not unique metabolites. It is possible that a metabolite could be represented by multiple signals due to adduct formation and/or fragmentation occurring in the mass spectrometer.
The quality and robustness of our measurements were assessed using a variety of internal controls. First, the expected clustering of quality control samples (blanks, MetMix, pools, nicotine standards) apart from the lung cancer and population control urine samples was observed in the multidimensional scaling analysis (see Materials and Methods for additional detail; Supplementary Fig. S2A). Second, measurement reproducibility within a run was assessed by processing 169 (∼15%) randomly selected, duplicate samples, and a strong correlation was observed with Pearson correlation coefficients >0.85 for the large majority of samples (Supplementary Fig. S2B). Third, the distribution of coefficients of variation (CV) was assessed to ensure a small variation in quality control measurements. As expected, coefficients of variation were considerably smaller for the quality control samples compared with the study subject samples (P < 0.00001; Supplementary Fig. S2C).
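The two reproducibility measures mentioned above are simple to compute. The sketch below shows a coefficient of variation for repeated measurements of one signal and a Pearson correlation between a sample and its duplicate injection; the function names and the example values are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption.

```python
import math
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance) / mean


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long signal vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # One signal measured in three pooled QC injections (made-up counts).
    print(coefficient_of_variation([9.0, 10.0, 11.0]))
    # Ion counts for four signals in a sample and its duplicate (made-up).
    print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```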
Predictions of smoking status
As a proof of principle, we aimed to classify individuals by their smoking status (smokers vs. nonsmokers of self-reported smoking status) to ensure that known metabolites related to tobacco smoke were detectable and strongly predictive of the self-reported smoking status. Random Forests (36, 37) was applied to the training set comprising 469 lung cancer cases and 536 population controls and 87% correct classification by smoking status was obtained (Supplementary Fig. S3A). The three most highly associated metabolites, ranked according to the importance score given by Random Forests, were well-known nicotine metabolites: cotinine, nicotine-N'-oxide, and trans-3′-hydroxycotinine. When stratified by smoking status, it became evident that there was a global increase of these nicotine metabolites in current smokers compared with those who had formerly or never smoked (Supplementary Fig. S3B). This finding established the quality of measurements and the utility of our classification approach in identifying diagnostic metabolites of lung cancer.
Predictions of lung cancer status
Classification of our training set samples using Random Forests resulted in 78.1% accuracy [true positive rate (TPR) = 76.5%, false positive rate (FPR) = 18.4%], by using top predictive signals (Supplementary Table S1; see Supplementary Materials and Methods for details about analysis). To account for possible differences in smoking habits across gender and race, additional classifications of cases and controls were performed on samples stratified by self-reported race and gender. Using top predictive signals, we accurately categorized the following proportion of samples as lung cancer cases or controls: 77.7% for Caucasian males, 78.6% for Caucasian females, 84.9% for African-American males, and 82.3% for African-American females. TPRs and FPRs ranged from 70.0% to 81.7% and from 9.5% to 23.3%, respectively (Supplementary Table S1). Four metabolites contributed strongly to the classifications, independent of race and gender (Supplementary Fig. S4): NANA; cortisol sulfate; creatine riboside, a novel metabolite identified in this study; and 561+, an unidentified metabolite with a mass/charge ratio of 561.3432 detected in ESI+ that was confirmed to be a glucuronidated compound. We have conducted extensive validation methods to confirm the identity of the novel creatine riboside, including UPLC coupled to tandem mass spectrometry (UPLC/MS-MS) and two-dimensional nuclear magnetic resonance (Supplementary Figs. S5 and S6).
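The accuracy, TPR, and FPR quoted above are all derived from the same confusion matrix. As a reminder of how they relate, the following sketch computes them from raw counts; the function name is hypothetical and the counts in the example are made up rather than taken from the study.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy, true positive rate, and false positive rate.

    tp/fn: cases classified as cases/controls.
    fp/tn: controls classified as cases/controls.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "tpr": tp / (tp + fn),  # sensitivity
        "fpr": fp / (fp + tn),  # 1 - specificity
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 cases and 100 controls.
    print(classification_metrics(tp=80, fn=20, fp=10, tn=90))
```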
This study utilized a case control rather than a cohort setting and, as a result, could not be used for risk assessment. However, we took into account possible confounding factors of lung cancer classification, performing logistic regression in all cases and in stage I–II cases (Table 2), adjusting for race, gender, interview year, smoking status, pack years, and urine collection time (accounting for diurnal effects; Supplementary Fig. S7). Metabolite levels were dichotomized into high and low categorical variables based on the 75th percentile of population control abundances. As expected, associations with diagnosis were confirmed after adjusting for these potential confounders. ROC analysis resulted in areas under the curve ranging from 0.63 to 0.76 for all cases, and from 0.59 to 0.70 for stage I-II cases (Fig. 1), using individual metabolites. Models using creatine riboside or all four biomarkers in all cases and in stage I-II cases were significantly more predictive (P < 0.00001) than models using the other three metabolites individually, and these associations were independent of histology. Of note, lung cancer cases presented in this study were staged according to the latest seventh edition of the AJCC (35); however, 153 of 469 cases could not be restaged because of missing pathology reports, as reflected in the numbers of staged cases in Table 1.
ROC analysis of individual metabolites (creatine riboside, blue; NANA, green; cortisol sulfate, orange; 561+, maroon) and their combination (All, violet) in the training set in all cases (top) and stage I-II cases (bottom).
Association of top four metabolites with lung cancer diagnosis (unconditional logistic regression) in the training set in all cases and cases of stages I–II
Metaboliteᵇ | Controls (%)ᶜ | Cases (%)ᶜ | Univariate OR (95% CI) | Univariate P | Univariate FDRᵈ | Multivariateᵃ OR (95% CI) | Multivariateᵃ P | Multivariateᵃ FDRᵈ |
---|---|---|---|---|---|---|---|---|
All cases (N = 469) | | | | | | | | |
Creatine riboside | 134 (25.0) | 304 (64.8) | 5.50 (4.21–7.26) | 8.35E−35 | 2.64E−31 | 5.05 (3.57–7.14) | 4.93E−20 | 1.56E−16 |
Cortisol sulfate | 134 (25.0) | 227 (48.4) | 2.84 (2.17–3.71) | 1.69E−14 | 2.68E−11 | 2.56 (1.83–3.58) | 3.52E−08 | 2.79E−05 |
N-acetylneuraminic acid | 134 (25.0) | 213 (45.4) | 2.50 (1.91–3.26) | 1.87E−11 | 5.38E−09 | 2.13 (1.52–2.98) | 1.11E−05 | 1.25E−03 |
561+ | 134 (25.0) | 201 (42.9) | 2.25 (1.72–2.94) | 2.90E−09 | 4.37E−07 | 1.89 (1.34–2.67) | 3.17E−04 | 0.01 |
Stage I–II cases (N = 213) | | | | | | | | |
Creatine riboside | 134 (25.0) | 116 (54.5) | 3.59 (2.57–5.01) | 5.59E−14 | 1.77E−10 | 3.34 (2.07–5.39) | 7.85E−07 | 0.002 |
Cortisol sulfate | 134 (25.0) | 88 (41.3) | 2.11 (1.51–2.95) | 1.26E−05 | 0.003 | 1.84 (1.14–2.98) | 0.013 | 0.295 |
N-acetylneuraminic acid | 134 (25.0) | 74 (34.7) | 1.60 (1.13–2.25) | 0.007 | 0.076 | 1.72 (1.05–2.81) | 0.030 | 0.347 |
561+ | 134 (25.0) | 76 (35.7) | 1.66 (1.18–2.34) | 0.003 | 0.046 | 1.30 (0.80–2.12) | 0.296 | 0.728 |
NOTE: Bold data designate significant associations with a P value < 0.05.
ᵃ Adjusted for race, gender, interview year, smoking status, pack years, and urine collection time.
ᵇ Levels dichotomized to high and low based on the 75th percentile of population control abundances (low = referent).
ᶜ Numbers of controls and cases with high levels of the corresponding metabolite.
ᵈ FDR based on Benjamini and Hochberg.
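The Benjamini and Hochberg adjustment referenced in the table can be sketched in a few lines. This is a generic implementation with a hypothetical function name, not the code used in the study. The tabulated FDR values cannot be reproduced from the four P values shown, because the correction depends on the total number of tests; the reported values appear consistent with correction across all measured signals, although the text does not state the denominator.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is multiplied by m / rank (rank 1 = smallest), and a
    running minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up P values, for illustration only.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```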
Association with tobacco smoke exposure
To investigate whether the urinary metabolomic markers are correlated with tobacco smoke exposure, metabolite levels stratified by cigarettes per day (cpd) were investigated. We observed that the number of cpd was not associated with urinary levels of any of the four metabolites: creatine riboside, NANA, cortisol sulfate, or 561+ (Supplementary Fig. S8). A correlation between abundances of each metabolite and cotinine (an accepted indicator of exposure to tobacco smoke) was also investigated and no correlation was observed (data not shown). In addition, logistic regression classification was stratified by smoking status: all four metabolites were also significantly associated with lung cancer status in never smokers (data not shown), further supporting the conclusion that these metabolites are not associated with smoking.
Association with prognosis
We next investigated whether the four metabolites found to be most robust in predicting lung cancer status are associated with prognosis, and whether they, therefore, may have utility in predicting patient outcome. Metabolite levels were dichotomized into high and low categorical variables based on the 75th percentile of the population control abundances. After adjusting for gender, race, stage, histology, smoking status, pack years, interview year, urine collection time, chemotherapy and/or radiation, and surgery status, we found that high levels of NANA [HR = 1.54 (P = 0.025) in the first 15 months], cortisol sulfate [HR = 1.63 (P = 0.0001)], creatine riboside [HR = 1.81 (P = 0.0002) in the first 45 months], and 561+ [HR = 1.95 (P = 0.0001) in the first 20 months] were associated with worse survival (Table 3; Fig. 2A). In stage I–II cases, creatine riboside [HR = 1.71 (P = 0.048)] and 561+ [HR = 8.63 (P = 0.001) in the first 15 months] were also associated with worse survival, independent of putative clinical cofactors (Table 3 and Supplementary Fig. S9A). The time cutoffs presented here were chosen to meet the proportional hazards assumption test (39), details of which can be found in the Materials and Methods.
A, Kaplan–Meier survival estimates in the training set are depicted for the top four predictive metabolites in all patients with lung cancer. The P values reported in the Kaplan–Meier plots reflect the maximum likelihood estimates generated using a univariate Cox model, taking into account left truncation (the lag time between diagnosis and time of urine collection). B, the combination of the top four predictive metabolites is shown for all cases. Only metabolites that showed statistically significant associations with survival, independent of clinical cofactors (see Materials and Methods), were combined. Metabolite levels were dichotomized into high and low based on the 75th percentile of population controls abundances.
Table 3. Association of the top four metabolites with lung cancer survival (Cox proportional hazards regression) in the training set, in all cases and in stage I–II cases
| | Univariate | | | Multivariateᵃ | | |
|---|---|---|---|---|---|---|
| Metaboliteᵇ | HR (95% CI) | P | FDRᶜ | HR (95% CI) | P | FDRᶜ |
| All cases (N = 469) | | | | | | |
| N-acetylneuraminic acid | | | | | | |
| ≤15 mo | 1.74 (1.22–2.48) | 0.002 | 0.06 | 1.54 (1.06–2.25) | 0.025 | 0.09 |
| >15 mo | 1.14 (0.82–1.57) | 0.44 | | 1.27 (0.90–1.80) | 0.17 | |
| Cortisol sulfate | 1.53 (1.21–1.94) | 0.0004 | 0.01 | 1.63 (1.27–2.08) | 0.0001 | 0.02 |
| Creatine riboside | | | | | | |
| ≤45 mo | 2.05 (1.54–2.71) | <0.0001 | 0.0005 | 1.81 (1.33–2.45) | 0.0002 | 0.002 |
| >45 mo | 0.86 (0.38–1.95) | 0.72 | | 0.78 (0.34–1.83) | 0.57 | |
| 561+ | | | | | | |
| ≤20 mo | 2.32 (1.70–3.15) | <0.0001 | 0.001 | 1.95 (1.39–2.74) | 0.0001 | 0.009 |
| >20 mo | 1.05 (0.70–1.55) | 0.83 | | 0.86 (0.56–1.32) | 0.48 | |
| Stage I–II cases (N = 213) | | | | | | |
| NANA | 0.70 (0.41–1.19) | 0.18 | 0.89 | 0.56 (0.32–1.00) | 0.052 | 0.80 |
| Cortisol sulfate | 1.45 (0.90–2.32) | 0.12 | 0.89 | 1.39 (0.84–2.29) | 0.20 | 0.84 |
| Creatine riboside | 1.78 (1.08–2.93) | 0.02 | 0.81 | 1.71 (1.01–2.92) | 0.048 | 0.67 |
| 561+ | | | | | | |
| ≤15 mo | 7.83 (2.23–27.51) | 0.001 | 0.60 | 8.63 (2.40–31.05) | 0.001 | 0.27 |
| >15 mo | 0.83 (0.45–1.52) | 0.54 | | 0.84 (0.43–1.67) | 0.63 | |
NOTE: Bold data designate significant associations with a P value < 0.05.
ᵃAdjusted for gender, race, stage (unless stratified), histology, smoking status, pack years, interview year, urine collection time, chemotherapy and/or radiation status, and surgery status.
ᵇLevels dichotomized into high and low based on the 75th percentile of population control abundances (low = referent).
ᶜFDR based on the Benjamini and Hochberg method.
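For readers unfamiliar with the FDR column, the Benjamini and Hochberg adjustment can be sketched as follows. This is a generic pure-Python version of the step-up procedure, not the authors' code. The function name and the example p-values are made up for illustration, and the table's FDR values cannot be recomputed from the p-values shown without knowing the full set of features that were tested.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, not taken from the study.
example = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(example))
```

A feature is then called significant at a chosen FDR level (for example, 0.05) when its adjusted value falls below that level.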
Notably, when these metabolites were combined, their associations with survival were independent and additive (Fig. 2B, Supplementary Fig. S9B, and Supplementary Table S2), suggesting that, in combination, these four markers may be of value in therapy decisions and may thereby improve patient outcomes. Although this study was limited in the representation of African-Americans, stratification by self-reported race highlighted cortisol sulfate as most strongly associated with survival in African-Americans (Supplementary Table S3).
Validation in independent sample sets and assessment of metabolite stability
Consistent with the training set, creatine riboside, NANA, and 561+ were confirmed to be elevated in the urine of patients with lung cancer in an independent validation set comprising 80 more recently diagnosed cases and 78 population controls (P < 0.0007; Fig. 3A and B). Although cortisol sulfate was not significantly elevated in cases, possibly because of insufficient power, its levels showed the expected trend toward higher values in patients with lung cancer. Measurements of these metabolites were technically validated on a quantitative Xevo triple quadrupole mass spectrometer in a subset (N = 198) of the training set with distributions of age, gender, and race similar to those of the full training cohort (P < 0.00001; Fig. 3C). Because measurement reproducibility is important, especially in clinical laboratory practice, we studied the stability of the metabolites in storage over time and after a freeze-thaw cycle. A second quantitation carried out 2 years later on the same samples yielded intraclass correlation coefficients (ICC) from 0.82 to 0.99 (Supplementary Table S4). These high ICCs strongly suggest that these metabolites are sufficiently stable, and their measurements sufficiently reproducible, for use as biomarkers of lung cancer diagnosis in clinical practice.
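To make the reproducibility metric concrete, the sketch below computes a one-way random-effects intraclass correlation coefficient, often written ICC(1,1), from repeated measurements of the same samples. The source does not state which ICC form was used, so this choice is an assumption. The function name and the example values are hypothetical.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC(1,1).

    `measurements` is a list with one entry per sample; each entry is a
    list of k repeated measurements of that sample (k >= 2, equal for all).
    """
    n = len(measurements)
    k = len(measurements[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in measurements):
        raise ValueError("need >= 2 samples with the same number (>= 2) of repeats")

    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    row_means = [sum(row) / k for row in measurements]

    # Between-sample and within-sample sums of squares.
    ss_between = k * sum((rm - grand_mean) ** 2 for rm in row_means)
    ss_within = sum(
        (x - rm) ** 2 for row, rm in zip(measurements, row_means) for x in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical first and second quantitation of three samples.
pairs = [[10.0, 11.0], [20.0, 19.0], [30.0, 31.0]]
print(round(icc_oneway(pairs), 3))
```

Values near 1 indicate that most of the variance lies between samples rather than between repeated measurements of the same sample, which is the sense in which ICCs of 0.82 to 0.99 indicate reproducible quantitation.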
Figure 3. Abundance and validation of metabolites that were top contributors in the classification of subjects as lung cancer cases or healthy controls. Untargeted and MSTUS-normalized UPLC/MS abundances (mean and SEM) are depicted for (A) the training set containing 469 lung cancer cases and 536 controls and (B) the validation set comprising 80 cases and 78 controls. Quantitated UPLC/MS-MS abundances (mean and SEM) are depicted for (C) a subset of the training set containing 92 cases and 106 controls. FC, fold-change.
Link to tumor metabolome
We next assessed the presence of creatine riboside, NANA, cortisol sulfate, and metabolite 561+ in 48 tumor tissues resected from stage I adeno- and squamous cell carcinoma patients. Their detection in tissue would indicate a direct relationship to lung tumor metabolism. Creatine riboside and NANA were significantly more abundant in tumor than in adjacent nontumor tissue. Creatine was also elevated in tumor compared with nontumor tissue (Fig. 4A) and correlated with creatine riboside (Fig. 4B), further supporting the formation of creatine riboside from creatine. These important findings suggest that creatine riboside and NANA are products of altered lung tumor metabolism that can be detected in noninvasively obtained urine.
Figure 4. Linking urinary metabolites to the lung cancer tissue metabolome. A, levels of creatine riboside, NANA, and creatine in a paired tumor/adjacent nontumor tissue set containing 48 stage I adenocarcinoma and squamous cell carcinoma tumors and 48 adjacent nontumor samples. B, correlation between creatine riboside and creatine quantitated in tumor tissue samples.
Discussion
A paucity of noninvasive biomarkers for detection and prognostic assessment plagues the lung cancer field, and most preclinical studies aimed at identifying putative biomarkers suffer from limited sample sizes (10). Our assessment of 469 cases and 536 population controls revealed two urinary biomarkers for the detection and prognosis of NSCLC: creatine riboside and NANA. Although we also identified cortisol sulfate and 561+ as robust putative biomarkers predictive of lung cancer status, independent of race and gender, creatine riboside and NANA were also elevated in tumor compared with adjacent nontumor tissue, thereby providing a direct link with metabolic changes in the tumor and allowing for noninvasive detection of these tumor-specific metabolites in easily obtainable urine. This finding may eventually help guide therapeutic decisions and improve lung cancer patient outcomes. However, the utility of these metabolites has not been evaluated in other cancers, and their potential to aid early diagnosis of lung cancer remains to be further evaluated. Although there are currently accepted technologies for early detection of lung cancer, such as LDCT, a complementary biomarker is needed: LDCT has a very high sensitivity, and almost no lung lesion goes undetected, but it performs poorly in distinguishing benign from malignant nodules. We speculate that creatine riboside and NANA may aid in the early detection of lung cancer, possibly as an adjunct to LDCT, and may perhaps decrease its high FPR of 96.4% (7). Of note, creatine riboside was the strongest classifier of lung cancer status not only in all cases but also in stage I-II lung cancer. Pending future studies addressing the mechanism of creatine riboside generation and its potential causal relationship to lung cancer, this novel metabolite may eventually serve as a therapeutic target in clinical practice.
Therapeutic decisions, including surgery for earlier stages of cancer, adjuvant chemotherapy, and/or radiation therapy, are based on tumor size, molecular biomarkers, morphologic features, and gross tumor characteristics (40). However, the assessment of high risk requires refinement, especially for completely resected stage I NSCLC, where no trial has shown any significant survival benefit in stage IB (41, 42) and where there is a possibly detrimental effect of adjuvant chemotherapy for stage IA patients (43). We propose that these metabolites could be useful in guiding such therapy decisions. In particular, the association of creatine riboside with worse prognosis in stage I-II lung cancer patients and its elevated levels in tumors make creatine riboside a candidate for aiding in therapeutic decisions. Furthermore, the combination of all metabolites should be explored, as the combination of all four metabolites was most strongly associated with prognosis in all stages, and the combination of creatine riboside and 561+ was most strongly associated with prognosis in stage I-II NSCLC patients.
Creatine riboside is also of special interest, as it has not been previously reported. Markedly higher serum levels of the creatine kinase isoenzyme BB, an enzyme responsible for the conversion of creatine into phosphocreatine, an important energy reserve, have been observed in patients with lung cancer (44, 45). In addition, cancer cells have a higher energy requirement than quiescent normal cells (46); as a result, creatine riboside may be a product of both high creatine within the tumor, as reported in our study, and high phosphate flux. Although creatine riboside as a compound has not been described until now, increased mutagenicity of creatine and ribose pyrolysis products in cooked foods has been reported (47), suggesting a possible functional role of creatine riboside in tumorigenesis. Because creatine riboside is the strongest predictor of lung cancer diagnosis in our study, including in stage I-II cases, its abundance may be a useful complement to LDCT in further distinguishing malignant from benign nodules detected at screening and in preventing unnecessary and invasive diagnostic work-ups.
NANA and cortisol sulfate have been previously reported in the context of cancer. NANA is one of the two most common forms of sialic acid and plays a role in cell signaling, binding and transportation of positively charged molecules, attraction and repulsion of cells and molecules, and immunity (48). In cancer, these sialylated conjugates protect malignant cells from cellular defense systems. Elevated levels of NANA have been found in various cancer types, including lung cancer (49). Sialic acid as a blood biomarker for prognosis has been assessed with mixed results, although, to our knowledge, not in lung cancer. Because of the role that NANA plays on the cell surface of mammalian cells, this marker may not be lung cancer specific, allowing for a possibility of its utility in other cancers. As for cortisol sulfate, high urinary levels were reported in breast cancer (50), and deregulated cortisol metabolism was reported in critical illness (51), which may, in part, be due to the induction of proinflammatory cytokines, activators of cortisol production (52, 53).
This study and the conclusion that these metabolites may have clinical applications for the diagnosis and prognosis of lung cancer are notable for several reasons. First, urine is abundant, allows for noninvasive sampling, and does not require extensive processing (54). Second, MS-based approaches are cost-effective on a per-sample basis and allow for fast screening with minimal processing, making them suitable for clinical settings. Third, measurements of the metabolites reported here are highly reproducible, indicating their stability in urine over time, despite freeze-thaw cycles (ICCs >0.82). And finally, the robustness of these biomarkers against age, gender, and race points to their universal applicability.
The current study, however, is not without its limitations. Although metabolism can vary with dietary and drug intake (55, 56), we were unable to adjust for these factors. In addition, we were unable to rule out biases arising from selection, type of controls, and participation rates. An evaluation of these putative biomarkers in a prospective setting and of their utility for risk assessment also remains to be carried out. The majority of the patients (323) had urine specimens collected before the administration of chemotherapy and/or radiation. We determined that there were no differences in metabolite levels between patients who had received treatment and those who had not (Supplementary Fig. S10A). Furthermore, only 37 of 469 patients had undergone surgery before urine collection, with no significant differences in metabolite levels between the two groups (Supplementary Fig. S10B). The Cox regression survival analysis was adjusted for treatment and surgery status to guard against confounding by these variables. Finally, normalization to urinary creatinine levels is expected to eliminate the potential of altered kidney function to affect metabolite levels.
Overall, our findings indicate that creatine riboside and NANA may be useful in the diagnosis and prognosis of NSCLC, as they showed strong associations with these outcomes and were deregulated in tumor tissue. Measurement of these metabolites in urine using MS has great potential for the detection of lung cancer in the clinic and may lead to the identification of novel therapeutic strategies and targets. In addition, the results of this study lay the groundwork for assessing the direct impact of these metabolites in lung tumorigenesis (and possibly other cancers).
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: E.A. Mathé, A.D. Patterson, M. Haznadar, P.G. Shields, J.R. Idle, F.J. Gonzalez, C.C. Harris
Development of methodology: E.A. Mathé, A.D. Patterson, M. Haznadar, S.K. Manna, K.W. Krausz, F.J. Gonzalez
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): E.A. Mathé, A.D. Patterson, M. Haznadar, K.W. Krausz, E.D. Bowman, P.G. Shields, P.B. Smith, E. Hatzakis, F.J. Gonzalez
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): E.A. Mathé, A.D. Patterson, M. Haznadar, S.K. Manna, K.W. Krausz, P.G. Shields, D. Kazandjian, E. Hatzakis, F.J. Gonzalez, C.C. Harris
Writing, review, and/or revision of the manuscript: E.A. Mathé, A.D. Patterson, M. Haznadar, S.K. Manna, E.D. Bowman, P.G. Shields, J.R. Idle, P.B. Smith, E. Hatzakis, F.J. Gonzalez, C.C. Harris
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): E.A. Mathé, A.D. Patterson, M. Haznadar, E.D. Bowman, K. Anami, D. Kazandjian, F.J. Gonzalez
Study supervision: E.A. Mathé, A.D. Patterson, M. Haznadar, F.J. Gonzalez, C.C. Harris
Acknowledgments
The authors thank Dr. Raymond Jones, John Cottrell, and Audrey Salabes at the University of Maryland and Baltimore Veterans Administration Medical Center for tissue and data collection, Leoni Leondaridis of Advance Medical Systems Consultants for the coordination of data from the NDI, and the Proteomics and Metabolomics Shared Resource at the Georgetown Lombardi Comprehensive Cancer Center, part of Georgetown University Medical Center and MedStar Georgetown University Hospital—specifically, Marc Bourbeau and Dr. Amrita Cheema. We utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD.
Grant Support
This work was partially funded by the NIH grant # ES022186.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.