We report on the development of a monitoring test for recurrent breast cancer, using metabolite-profiling methods. Using a combination of nuclear magnetic resonance (NMR) and two-dimensional gas chromatography–mass spectrometry (GC×GC-MS) methods, we analyzed the metabolite profiles of 257 retrospective serial serum samples from 56 previously diagnosed and surgically treated breast cancer patients. One hundred sixteen of the serial samples were from 20 patients with recurrent breast cancer, and 141 samples were from 36 patients with no clinical evidence of the disease during ∼6 years of sample collection. NMR and GC×GC-MS data were analyzed by multivariate statistical methods to compare identified metabolite signals between the recurrence samples and those with no evidence of disease. Eleven metabolite markers (seven from NMR and four from GC×GC-MS) were shortlisted from an analysis of all patient samples by using logistic regression and 5-fold cross-validation. A partial least squares discriminant analysis model built using these markers with leave-one-out cross-validation provided a sensitivity of 86% and a specificity of 84% (area under the receiver operating characteristic curve = 0.88). Strikingly, 55% of the patients could be correctly predicted to have recurrence 13 months (on average) before the recurrence was clinically diagnosed, representing a large improvement over the current breast cancer–monitoring assay CA 27.29. To the best of our knowledge, this is the first study to develop and prevalidate a prediction model for early detection of recurrent breast cancer based on metabolic profiles. In particular, the combination of two advanced analytical methods, NMR and MS, provides a powerful approach for the early detection of recurrent breast cancer. Cancer Res; 70(21); 8309–18. ©2010 AACR.

Breast cancer remains the leading cause of cancer death among women worldwide. It is the second leading cause of cancer death among women in the United States, with nearly 255,000 new cases and 40,000 deaths expected in the year 2010 (1). Although breast cancer survival has improved over the past few decades owing to better diagnostic screening methods (2), the disease often recurs anywhere from 2 to 15 years following initial treatment, either locally in the same or contralateral breast or as a distant recurrence (metastasis). Recent studies by Brewster and colleagues (3) of nearly 3,000 breast cancer patients showed that the recurrence rates 5 and 10 years after completion of adjuvant treatment were 11% and 20%, respectively. Numerous factors such as stage, grade, and hormone receptor status have been shown to be associated with recurrence. Higher-stage tumors have an increased propensity to recur. For example, recurrence rates of 7%, 11%, and 13% after 5 years have been reported for stage I, II, and III tumor cases, respectively (3). In addition, conditions such as lymph node invasion and the absence of estrogen receptors (ER) are associated with a higher relapse rate and shorter disease-free survival (4). Studies have shown that early detection of locally recurrent breast cancers can significantly improve the survival rate (5, 6).

Common methods of routine surveillance for recurrent breast cancer include periodic mammography, self- or physician-performed physical examination, and blood tests. The performance of such tests is limited, and extensive investigations for surveillance have not proven effective (7). Mammography often misses small local recurrences or leads to false positives, resulting in suboptimal sensitivity and specificity and unnecessary biopsies. In view of the unmet need for more sensitive and earlier detection methods, the last decade or so has witnessed the development of a number of new approaches for detecting recurrent breast cancer and monitoring disease progression by using blood-based tumor markers or genetic profiles (8–11). In vitro diagnostic (IVD) markers include carcinoembryonic antigen (CEA), cancer antigens (CA 15-3, CA 27.29), tissue polypeptide antigen, and tissue polypeptide specific antigen. Such molecular markers are thought to be promising because a diagnosis based on them is independent of the expertise and experience of the clinician, and their use potentially avoids the sampling errors commonly associated with conventional pathologic tests such as histopathology. Currently, however, these markers lack the desired sensitivity and/or specificity and often respond late to recurrence, underscoring the need for alternative approaches (12, 13).

A new approach is to use metabolite profiling (or metabolomics), which can detect disease based on a panel of small molecules derived from the global or targeted analysis of metabolic profiles of samples such as blood and urine; this approach is attracting increasing interest (14, 15). Metabolite profiling utilizes high-resolution analytical methods such as nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) for the quantitative analysis of hundreds of small molecules (less than ∼1,000 Da) present in biological samples (16). Owing to the complexity of the metabolic profile, multivariate statistical methods are extensively used for data analysis. The high sensitivity of metabolite profiles to even subtle stimuli can provide the means to detect the early onset of various biological perturbations in real time. Metabolite profiling has applications in a growing number of areas, including early disease diagnosis, investigation of metabolic pathways, pharmaceutical development, toxicology, and nutritional studies (16–21). Moreover, the ability to link the metabolome, which constitutes the downstream products of cellular functions, to genotype and phenotype can provide a better understanding of complex biological states and promises routes to new therapy development.

In the present study, we apply metabolite profiling methods to investigate blood serum metabolites that are sensitive to recurrent breast cancer. We utilize a combination of NMR and two-dimensional gas chromatography resolved MS (GC×GC-MS) methods to build and verify a model for early breast cancer recurrence detection based on a set of 257 retrospective serial samples. We compare the performance of the derived 11-metabolite biomarker model with that of the currently used molecular marker, CA 27.29, in particular as a sensitive test for follow-up surveillance of treated breast cancer patients. This is the first metabolomics study that combines the information-rich analytical methods of NMR and MS to derive a sensitive and specific model for the early detection of recurrent breast cancer. The results indicate that such an approach may provide a new window for earlier treatment and its benefits.

Sample collection

Following initial studies (data unpublished), 56 breast cancer patients treated between 1997 and 2003 at The University of Texas M.D. Anderson Cancer Center (Houston, TX) were enrolled through collaboration with Third Nerve, LLC. The cases studied averaged five longitudinal blood draws, for a total of 257 retrospective serum samples (each ∼400 μL). Follow-up of these female patients included routine monitoring for recurrence as indicated by factors including CA 27.29, CEA, and/or CA125 IVD results; patient symptoms; initial breast cancer stage; hormone receptor status; and lymph node status. Twenty patients recurred within the sampling period, whereas 36 remained with no evidence of disease (NED). A total of 116 serum samples were obtained from recurrent breast cancer patients: 67 were collected >3 months before clinically determined recurrence (Pre), 18 were collected within 3 months before/after recurrence (Within), and 31 were collected more than 3 months after recurrence diagnosis (Post). The remaining 141 samples came from patients who remained NED for at least 2 years beyond their sample collection period. Nearly all samples were evaluated for CA 27.29 values at the time of collection and therefore could be used for comparison. Study samples were maintained at −80°C from collection until their transfer over dry ice to the evaluation laboratory at Purdue University, where they were again stored frozen at −80°C until this study was conducted. Serum samples and accompanying clinical data were appropriately deidentified before transfer into this study. Table 1 summarizes the clinical parameters and demographic characteristics of the cancer patients.
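The Pre/Within/Post grouping described above can be illustrated with a minimal sketch. This is not the authors' code: the function name, the example collection times, and the treatment of samples collected exactly 3 months from recurrence (counted here as Within) are assumptions made for illustration only.

```python
def classify_sample(months_from_recurrence):
    """Label a serum sample by collection time relative to clinical recurrence.

    months_from_recurrence: negative = before diagnosis, positive = after.
    Assumption: samples at exactly -3 or +3 months are counted as "Within";
    the text does not specify how the boundary was handled.
    """
    if months_from_recurrence < -3:
        return "Pre"
    if months_from_recurrence > 3:
        return "Post"
    return "Within"


# Hypothetical collection times (months) for one recurrence patient.
draws = [-26.0, -14.5, -2.0, 1.0, 9.0]
labels = [classify_sample(m) for m in draws]
```

Samples from patients who never recurred have no reference date and would simply be labeled NED.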

Table 1.

Summary of clinical and demographic characteristics of the patients used in this study

Clinical diagnosis        Control               Recurrence
                          Samples (patients)    Samples (patients)
NED                       141 (36)              -
Pre                       -                     67 (20)
Within                    -                     18 (18)
Post                      -                     31 (20)
Age, mean {range}         53 {37–75}            55 {36–69}
Breast cancer stage
    I                     47 (11)               7 (1)
    II                    59 (16)               21 (5)
    III                   10 (3)                34 (6)
    Unknown               25 (6)                54 (8)
ER status
    ER+                   65 (15)               67 (11)
    ER−                   64 (18)               33 (7)
    Unknown               12 (3)                16 (2)
PR status
    PR+                   52 (13)               71 (11)
    PR−                   77 (20)               29 (7)
    Unknown               12 (3)                16 (2)
CA 27.29                  140 (36)              92 (19)
Site of recurrence
    Bone                  -                     37 (6)
    Breast                -                     13 (2)
    Liver                 -                     11 (2)
    Lung                  -                     10 (6)
    Skin                  -                     6 (2)
    Brain                 -                     15 (2)
    Lymph                 -                     6 (1)
    Multiple sites        -                     18 (3)

1H NMR spectroscopy

All NMR experiments were carried out at 25°C on a Bruker DRX 500-MHz spectrometer equipped with a cryogenic probe and triple-axis magnetic field gradients. Two 1H NMR spectra were measured for each sample, using a standard one-dimensional nuclear Overhauser effect spectroscopy pulse sequence and a Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence, each coupled with water presaturation (22). For each spectrum, 32 transients were collected using 32,000 data points and a spectral width of 6,000 Hz. Data were processed using Bruker XWINNMR software version 3.5 and saved in ASCII format for further analysis. Relative peak integrals were calculated using the total spectral sum and used for the analysis (see Supplementary Materials for more NMR methodology details).
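Normalization to the total spectral sum can be sketched as follows. This is an illustrative reading of the procedure, not the authors' processing script: the function name and the toy intensity values are hypothetical, and the sketch assumes each peak integral is simply divided by the sum of all spectral intensities.

```python
def relative_integrals(peak_integrals, spectrum_intensities):
    """Divide each peak integral by the total spectral sum.

    peak_integrals: dict mapping a metabolite name to its raw peak integral.
    spectrum_intensities: all intensity values of the processed spectrum.
    """
    total = sum(spectrum_intensities)
    if total == 0:
        raise ValueError("total spectral sum is zero; cannot normalize")
    return {name: value / total for name, value in peak_integrals.items()}


# Toy spectrum and peak integrals (hypothetical numbers).
spectrum = [1.0, 3.0, 4.0, 2.0]
raw_peaks = {"lactate": 4.0, "choline": 1.0}
rel = relative_integrals(raw_peaks, spectrum)
```

Normalizing in this way makes integrals comparable across samples that differ in overall signal intensity.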

GC×GC-MS

Two-dimensional GC×GC-MS analysis of derivatized samples was performed using a Pegasus 4D system (LECO) consisting of an Agilent 6890 gas chromatograph (Agilent Technologies) coupled to a Pegasus time-of-flight mass spectrometer. LECO ChromaTOF software (version 4.10) was used for automatic peak detection and mass spectrum deconvolution. The NIST MS database (NIST MS Search 2.0, NIST/EPA/NIH Mass Spectral Library; NIST 2002) was used for data processing and peak matching. The identified biomarker candidates were confirmed from the mass spectra and retention times of authentic commercial samples. Relative peak integrals were calculated with respect to the same metabolites in a pooled blood sample that was used as a quality control (see Supplementary Materials for more details).

Metabolite identification and selection

The NMR spectrum from each sample was aligned with reference to the 3-(trimethylsilyl) propionic-(2,2,3,3-d4) acid sodium salt signal at 0 ppm. Spectral regions within the range of 0.5 to 9.0 ppm were analyzed after excluding the region between 4.5 and 6.0 ppm that contained the residual water peak and urea signal. Twenty-two metabolite signals, corresponding to potential biomarkers initially identified in a study on early breast cancer detection involving over 400 patient samples (Asiago, unpublished results) were selected as biomarker candidates for further analysis. The statistical significance of each metabolite in the selected regions was determined by calculating the P values, using the Student's t test, in the training set.
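The spectral window described above (0.5 to 9.0 ppm, excluding 4.5 to 6.0 ppm) can be expressed as a simple filter. The sketch below is illustrative only: the function name and the example data are hypothetical, and treating the window limits as inclusive is an assumption.

```python
def select_regions(ppm, intensity, lo=0.5, hi=9.0, exclude=(4.5, 6.0)):
    """Keep points within [lo, hi] ppm, dropping the excluded band.

    Assumption: both the analysis window and the excluded water/urea band
    are treated as closed intervals.
    """
    kept = []
    for shift, value in zip(ppm, intensity):
        in_window = lo <= shift <= hi
        in_excluded = exclude[0] <= shift <= exclude[1]
        if in_window and not in_excluded:
            kept.append((shift, value))
    return kept


# Hypothetical chemical shifts (ppm) and intensities.
ppm_axis = [0.2, 0.5, 3.0, 4.5, 5.0, 6.0, 6.1, 9.0, 9.5]
signal = [9.0, 1.0, 2.0, 8.0, 8.0, 8.0, 3.0, 4.0, 9.0]
kept_points = select_regions(ppm_axis, signal)
```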

To further enhance the pool of metabolites identified by NMR, additional metabolites were selected from MS analysis. From the nearly 300 compounds identified by similarity to known compounds in the NIST database, additional and unique metabolites were selected (Supplementary Table S1). Eighteen metabolites that (a) showed statistically significant differences between normal and primary breast cancer (Asiago, unpublished results), (b) were related to the identified NMR putative biomarker candidates by pathway analysis, or (c) in a few cases had high similarity scores to the NIST database and had high chemical similarity to the other biomarker candidates were selected for further analysis. A software program was developed in-house to extract these metabolite signals from the GC×GC-MS datasets. Based on the input value of m/z and a retention time range, the program integrates chromatography peaks for each metabolite after the spectrum of the metabolite was matched to the characteristic experimental mass spectrum from the standard NIST library.

Development of prediction model and validation

To select the metabolites with the highest scores for developing the prediction model, samples from the NED, Post, and Within recurrence groups were used as described in Supplementary Fig. S1. Pre samples were omitted to avoid any ambiguity in determining the correct disease status before clinical diagnosis. The samples were divided into five cross-validation (CV) groups of patients. Initial multivariate analysis was performed using logistic regression modeling of the 22 NMR and 18 GC×GC-MS detected metabolite signals, applied to four of the CV groups. The resulting model was used to predict the class membership of the fifth CV group. The output of the logistic regression procedure is a ranked set of markers (23–25).
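Dividing samples into CV groups of patients, so that serial samples from one patient never appear in both the fitting and the held-out group, can be sketched as below. This is a minimal illustration under stated assumptions: the function names, patient labels, and random assignment are hypothetical, and the logistic regression fit itself is omitted.

```python
import random


def patient_folds(patient_ids, n_folds=5, seed=0):
    """Assign each distinct patient to one of n_folds groups at random."""
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    return {pid: k % n_folds for k, pid in enumerate(unique)}


def cv_splits(sample_patients, n_folds=5, seed=0):
    """Yield (fit_indices, held_out_indices) with patients kept intact."""
    fold_of = patient_folds(sample_patients, n_folds, seed)
    for held_out in range(n_folds):
        fit = [i for i, p in enumerate(sample_patients) if fold_of[p] != held_out]
        test = [i for i, p in enumerate(sample_patients) if fold_of[p] == held_out]
        yield fit, test


# One entry per serum sample, giving the (hypothetical) patient it came from.
sample_patients = ["P01", "P01", "P02", "P03", "P03",
                   "P03", "P04", "P05", "P05", "P06"]
splits = list(cv_splits(sample_patients))
```

In each split, a model would be fitted on the four fitting groups and used to predict the held-out group, as described above.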

Based on their ranked performance, 11 metabolite markers (seven from NMR and four from GC×GC-MS) were then shortlisted for model building. NMR and MS data for these markers were imported into Matlab software (Mathworks) installed with the PLS toolbox (Eigenvector Research, Inc., version 4.0) for partial least squares discriminant analysis (PLS-DA) modeling. Leave-one-patient-out CV was chosen, and five latent variables were selected according to the root mean square error of CV procedure. The R statistical package (version 2.8.0) was used to generate receiver operating characteristic (ROC) curves. The sensitivity, specificity, and area under the ROC curve (AUROC) of the PLS-DA weighted linear model were calculated and compared.

The scores from the 11-marker model were scaled to yield a range of 0 to 100, and the cutoff threshold value (score = 48) for recurrence status was chosen to balance sensitivity and specificity. The performance of the model with reference to the initial stage of the breast cancer, ER/PR status, and the site of recurrence was also assessed.
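The scaling and thresholding step can be illustrated with a short sketch. The text does not state how the scores were scaled, so linear min-max scaling is assumed here; the function names and example values are hypothetical, and a sample is called positive when its score is strictly above the cutoff.

```python
def scale_scores(raw_scores):
    """Linearly rescale raw model scores to the range 0-100 (assumed min-max)."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        raise ValueError("all scores are identical; cannot rescale")
    return [100.0 * (s - lo) / (hi - lo) for s in raw_scores]


def sensitivity_specificity(scores, is_recurrence, cutoff=48.0):
    """Return (sensitivity, specificity) for calls of score > cutoff."""
    tp = sum(1 for s, r in zip(scores, is_recurrence) if r and s > cutoff)
    fn = sum(1 for s, r in zip(scores, is_recurrence) if r and s <= cutoff)
    tn = sum(1 for s, r in zip(scores, is_recurrence) if not r and s <= cutoff)
    fp = sum(1 for s, r in zip(scores, is_recurrence) if not r and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical raw scores and true classes (True = recurrence).
raw = [-1.0, -0.5, 0.0, 0.5, 1.0]
truth = [False, False, True, False, True]
scaled = scale_scores(raw)
sens, spec = sensitivity_specificity(scaled, truth, cutoff=48.0)
```

Raising the cutoff trades sensitivity for specificity, which is the choice the threshold of 48 reflects.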

Finally, the performance of the NMR and MS metabolite markers was also tested using a second statistical approach as described in Supplementary Fig. S2. In this case, the samples were split randomly by patients into two parts, a “training set” consisting of 11 recurrence patients (17 Post, 6 Within, and 43 Pre samples) and 19 NED patients (74 samples), and a slightly smaller testing set consisting of 9 recurrence patients (14 Post, 12 Within, and 24 Pre samples) and 17 NED patients (67 samples), and analyzed. Multivariate logistic regression of the 22 NMR and 18 GC×GC-MS detected metabolite signals was applied to the training data set to optimize variable selection. Ten-fold CV was used during this procedure. The derived model was then validated on the testing set of samples, all from different patients than were used for variable selection and model building.
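A random split by patient with the class sizes given above (11 of 20 recurrence patients and 19 of 36 NED patients for training) might look like the following sketch. The patient identifiers, function name, and seed are hypothetical; only the group sizes come from the text.

```python
import random


def split_patients(patients_by_class, n_train_by_class, seed=0):
    """Randomly split patients into training and testing sets, per class."""
    rng = random.Random(seed)
    train, test = set(), set()
    for cls, ids in patients_by_class.items():
        shuffled = sorted(ids)
        rng.shuffle(shuffled)
        k = n_train_by_class[cls]
        train.update(shuffled[:k])
        test.update(shuffled[k:])
    return train, test


# Hypothetical patient identifiers; counts follow the study design.
patients = {
    "recurrence": [f"R{i:02d}" for i in range(1, 21)],
    "NED": [f"N{i:02d}" for i in range(1, 37)],
}
train_ids, test_ids = split_patients(patients, {"recurrence": 11, "NED": 19})
```

Because the split is made on patients rather than samples, all serial samples from a given patient fall on one side only.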

Analysis of 1H NMR and GC×GC-MS spectra

NMR spectra of breast cancer serum samples obtained using the CPMG sequence were devoid of signals from macromolecules and clearly showed signals for a large number of small molecules, including sugars, amino acids, amines, and carboxylic acids. A representative NMR spectrum from a Post patient is shown in Supplementary Fig. S3A. Individual metabolites were identified using NMR databases (26, 27). Supplementary Fig. S3B shows a typical GC×GC-MS spectrum for the same recurrent breast cancer patient as shown in Supplementary Fig. S3A. Identification of the metabolites in the GC×GC-MS spectra was based on the comparison of the experimental mass spectrum with that in the NIST database, and the assignments were further confirmed by comparing with the GC×GC-MS spectrum of the authentic commercial compounds for the 18 MS detected metabolites of interest. An example of this validation procedure for glutamic acid is shown in Supplementary Fig. S4.

Biomarker selection and validation

Initial data analysis was focused on testing the performance of the 22 metabolites detected by NMR (Asiago, unpublished results) and 18 biomarker candidates selected from the mass spectra (as described above in Materials and Methods). From these data, we selected the markers with the highest rank to maximize diagnostic accuracy. Making use of the logistic regression variable selection protocol described above, a set of 11 markers (7 from NMR and 4 from MS) was selected based on the highest ranking and predictive accuracy. Table 2 shows the list of 11 markers and their P values for Pre versus NED and Within and Post versus NED comparisons using all samples. In general, the P values of these markers were low for Within and Post versus NED, except for four markers that were nevertheless highly ranked by logistic regression. In two of those four cases, the metabolites showed low P values for either Within versus NED or Post versus NED but not both.

Table 2.

P values for all markers, seven NMR (nos. 1–7) and four GC×GC-MS markers (nos. 8–11) for different groups using all samples

No.  Metabolite                            Within and Post vs NED (P)   Pre vs NED (P)
1    Formate                               0.0022                       0.2
2    Histidine                             0.000041                     0.18
3    Proline                               0.018                        0.9
4    Choline                               0.000022                     0.77
5    Tyrosine                              0.25                         0.1
6    3-Hydroxybutyrate                     0.86                         0.96
7    Lactate                               0.96                         0.54
8    Glutamic acid                         0.000018                     0.74
9    N-acetyl-glycine                      0.01                         0.96
10   3-Hydroxy-2-methyl-butanoic acid      0.0004                       0.35
11   Nonanedioic acid                      0.4                          0.089

NOTE: Within and Post versus NED, and Pre versus NED as determined by univariate Student's t test.

The performance of the 11 metabolite markers in classifying the recurrence of breast cancer was tested both individually and collectively. Box-and-whisker plots for the individual markers, expressed in terms of the relative peak integrals, are shown in Fig. 1. The predictive model derived from PLS-DA analysis using Post and Within versus NED samples discriminated well between the two classes (Fig. 2A), with an AUROC of 0.88, a sensitivity of 86%, and a specificity of 84% at the selected cutoff value. Further comparison of the discrimination power of the model between recurrent breast cancer and NED is shown in the box-and-whisker plots in Fig. 2B, which are drawn by using the scores of the model for all Post and Within versus NED samples. A comparison of the metabolite profiling results with the CA 27.29 IVD data that had been obtained for the same samples is made in Table 3, indicating a large difference in sensitivity.
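For readers less familiar with the AUROC, it equals the probability that a randomly chosen recurrence sample scores higher than a randomly chosen NED sample. The sketch below computes it by direct pairwise comparison; it is a generic illustration with hypothetical scores, not the R procedure used in the study.

```python
def auroc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores for recurrence and NED samples.
recurrence_scores = [60.0, 75.0, 40.0]
ned_scores = [30.0, 45.0, 20.0, 50.0]
area = auroc(recurrence_scores, ned_scores)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two classes.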

Figure 1.

Box-and-whisker plots illustrating discrimination between Post and Within recurrence versus NED patients for all samples for the seven NMR and the four GC×GC-MS markers. Horizontal line in the middle portion of the box, mean. Bottom and top boundaries of boxes, 25th and 75th percentiles, respectively. Lower and upper whiskers, 5th and 95th percentiles, respectively. Open circles, outliers.

Figure 2.

A, ROC curve generated from the PLS-DA model described in the text using Post and Within recurrence versus NED, and the performance of CA 27.29 on the same samples. B, box-and-whisker plots for the two sample classes showing discrimination of Within and Post recurrence from the NED patients by using the model-predicted scores. C, ROC curve generated from the PLS-DA prediction model by using the testing sample set based on the second statistical approach described in Supplementary Figure S2. D, box-and-whisker plots for the two sample classes showing discrimination of Post and Within recurrence from the NED patients by using the predicted scores from the testing set.

Table 3.

Comparison of the diagnostic performance of the breast cancer recurrence metabolite profile (at cutoff values of 48 and 54) and CA 27.29

                            Sensitivity (%)   Specificity (%)
BCR profile (cutoff 48)     86                84
BCR profile (cutoff 54)     68                94
CA 27.29                    35                96

Abbreviation: BCR, breast cancer recurrence.

Subsequently, the predictive power of the model for early recurrence detection was evaluated. All samples from recurrent breast cancer patients were grouped together with respect to the time of diagnosis for each patient. Samples within 5 months of one another were grouped, and an average value in months was assigned to each group. The number of months and its sign represent the average time at which the samples were collected before (negative time) or after (positive time) the clinical diagnosis. The percentage of patients for whom the recurrence was correctly diagnosed was calculated using the model as a function of time (Fig. 3A). For this graph, a patient is considered to have recurred the first time the patient's score rises above the threshold value of 48. For comparison, the results for the conventional cancer antigen marker, CA 27.29, which were obtained at the time of sample collection, are also shown in Fig. 3A. Here, the recommended cutoff value for CA 27.29 of 37.7 U/mL was used for calculating the clinical sensitivity and specificity for the same set of samples (28). As seen in the figure, for both the metabolite profiling model and CA 27.29, the number of patients correctly diagnosed increases as time progresses. However, at the time of clinical diagnosis, our model based on metabolite profiling detects 75% of the recurring patients, whereas the CA 27.29 marker detects only 16%. In addition, 55% of the recurrence patients were identified by the model 13 months before they were clinically diagnosed, compared with ∼5% for CA 27.29. A similar comparison of the results for NED patients indicates that nearly 90% of the patients were correctly diagnosed as true negatives throughout the period of sample collection, and the performance of the metabolite profiling model was comparable with that of CA 27.29 (Fig. 3B), although the specificity fell off somewhat with time. Increasing the threshold value to 54 led to an increase in specificity to ∼94% (Fig. 3D) and concomitantly a decrease in sensitivity to 68% (Fig. 3C). The threshold value for 98% specificity was 65 and for 94% sensitivity was 41.
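The first-crossing rule used for Fig. 3 can be sketched as follows. This is an illustrative simplification: the function names and patient data are hypothetical, and the grouping of samples collected within 5 months of one another is not reproduced.

```python
def first_detection(samples, cutoff=48.0):
    """Return the time of the first sample whose score exceeds the cutoff.

    samples: list of (months_relative_to_diagnosis, score) pairs, where
    negative months are before clinical diagnosis. Returns None if the
    score never rises above the cutoff.
    """
    for month, score in sorted(samples):
        if score > cutoff:
            return month
    return None


def fraction_detected_by(patients, month, cutoff=48.0):
    """Fraction of patients whose first detection occurs at or before month."""
    detected = 0
    for samples in patients.values():
        t = first_detection(samples, cutoff)
        if t is not None and t <= month:
            detected += 1
    return detected / len(patients)


# Hypothetical serial scores for three recurrence patients.
patients = {
    "R01": [(-20, 35.0), (-13, 52.0), (-2, 70.0)],
    "R02": [(-15, 30.0), (-4, 44.0), (1, 61.0)],
    "R03": [(-10, 20.0), (0, 40.0)],
}
early = fraction_detected_by(patients, -13)
late = fraction_detected_by(patients, 3)
```

Once a patient has crossed the threshold, that patient remains counted as detected at all later time points, which is why the curves in Fig. 3A rise monotonically.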

Figure 3.

A, percentage of recurrence patients correctly identified using the 11-marker model (red squares) and CA 27.29 (blue triangles) as a function of time for all recurrence patients using a cutoff threshold of 48. B, percentage of NED patients correctly identified using the same model (red squares) and CA 27.29 (blue triangles) as a function of time using the same threshold of 48. C, same information as in A, but using a threshold of 54. D, same information as in B, but using a threshold of 54.


Separately, the model was also tested on the recurrent breast cancer patients based on the stage of the cancer at the initial diagnosis, the type of recurrence, and ER and progesterone receptor (PR) status. The results showed some differences between ER+ and ER− patients and between PR+ and PR− patients (see Fig. 4). Whereas the performance of the model for ER+ and PR+ patients was comparable with that obtained when all the samples were tested together (Fig. 3A), nearly 40% of the ER− and PR− patients were detected as early as 28 months before the clinical diagnosis. However, the percentage of ER− and PR− patients detected at later time points remained 10% to 20% lower than that of ER+ and PR+ patients. The performance of the model for initial stage III cancer was slightly better than that for stage II. In addition, the performance of the model by site of recurrence was comparable for bone and lung recurrences and was slightly better than when all the recurrent patients were tested together (data not shown).

Figure 4.

Percentage of recurrence patients correctly identified based on their ER (A) and PR (B) status as a function of time by using the same 11-marker model and a cutoff threshold of 48.


A second statistical analysis, in which the model was derived from variable selection on a training sample set (see Supplementary Figs. S2 and S5) and used to predict the class membership of samples from an independent sample set (testing set), also provided good performance. The same 11 markers were top ranked by logistic regression, with the exception of nonanedioic acid, which was ranked 13th overall. However, it was included as part of the 11-marker model in this second analysis for consistency and comparison purposes. As shown in Fig. 2C, the testing set of samples yielded an AUROC of 0.84 with a sensitivity of 78% and a specificity of 85%. The ROC plot for the testing set was thus comparable with that obtained by the first statistical analysis (Fig. 2A). Moreover, the average scores for both recurrent breast cancer and NED (Fig. 2D) compared well with those shown in Fig. 2B. The difference between the scores for recurrence and NED was highly statistically significant for both the training (P = 1.40E−05) and testing (P = 2.25E−04) sets. The results of this second statistical analysis provide evidence that the data set of samples and the metabolite profile derived from our statistical analysis are consistent.

In this study, we present the development of a metabolomics-based profile for the early detection of breast cancer recurrence. The investigation makes use of a combination of analytical techniques, NMR and MS, and advanced statistics to identify a group of metabolites that are sensitive to the recurrence of breast cancer. We have shown that the new method distinguishes recurrence patients from NED patients with significantly improved sensitivity compared with CA 27.29. Using the predictive model, the recurrence in more than 55% of the patients was detected as early as 13 months before the recurrence was diagnosed based on conventional methods.

Breast cancer recurs in more than 20% of patients after treatment. Up to nearly 50% improvement in the relative survival of patients can be achieved by detecting at least local recurrence at the asymptomatic phase, underscoring the need to develop reliable markers indicative of secondary tumor cell proliferation (6). Currently, a number of rapid and noninvasive tests based on circulating tumor markers such as CEA and cancer antigens are commercially available. However, the performance of these markers may be too poor to be of significant value for improving early detection because the levels of these markers are also elevated in numerous other malignant and nonmalignant conditions unconnected with breast cancer (29). Considering such limitations, the American Society of Clinical Oncology guidelines recommend the use of these markers only for monitoring patients with metastatic disease during active therapy, in conjunction with numerous other examinations and investigations (30). The results presented here, based on the detection of multiple metabolites in the patients' blood, provide a new approach for earlier detection.

Although perturbations in the metabolite levels were detected for nearly all of the 40 metabolites that were used in the initial analysis (Supplementary Table S1), the use of smaller numbers of metabolites provided improved models. In particular, the group of 11 metabolites (7 from NMR and 4 from GC×GC-MS; Table 2) contributed significantly to distinguishing recurrence from NED. Further, the predictive model derived from these 11 metabolites performed significantly better in terms of both sensitivity and specificity than models derived using individual metabolites or a group of metabolites from a single analytical method, NMR or MS, alone (Figs. 1 and 2). Evaluation of other models with fewer metabolites indicated that they could also provide useful profiles. The AUROC for an eight-metabolite profile (four detected by NMR and four by GC×GC-MS) was 0.86, whereas a seven-marker model detected by NMR alone had an AUROC of 0.80. Nevertheless, the model based on 11 metabolites had the best performance and clearly outperformed CA 27.29, the accepted assay currently used for monitoring patients (Fig. 3). These results promise a significant improvement for early detection and potentially better treatment options for recurring patients.

A number of studies to date have used NMR or MS methods to detect altered metabolic profiles in different types of malignancy, owing to the ability of these analytical techniques to analyze a large number of metabolites in a single experiment (16, 20, 21, 31–35). In particular, several investigations have focused on establishing breast cancer biomarkers by using a metabolomics approach. Numerous metabolites, including glucose, lactate, lipids, choline, and amino acids, were shown to correlate with breast cancer (36–42). A sensitivity of 100% and a specificity of 82% in the classification of tumor and noninvolved tissues were achieved from the analysis of NMR data (37). A majority of these investigations focused on either breast cancer tumors or cell lines, and all used NMR methods alone, except for a recent study that utilized a combination of NMR and MS methods (42).

Of the 11 metabolites, all except four (3-hydroxybutyrate, N-acetyl-glycine, 3-hydroxy-2-methyl-butanoic acid, and nonanedioic acid) have been identified in previous studies as associated with breast cancer (36–42). The 11 serum metabolites represent some of the changes in metabolic activity of several pathways associated with breast cancer (Supplementary Fig. S6), including amino acid metabolism (glutamic acid, histidine, proline, and tyrosine), glycolysis (lactate), phospholipid metabolism (choline), and fatty acid metabolism (nonanedioic acid). Numerous investigations of metabolic aspects of tumorigenesis have shown the association of a majority of these metabolites with breast cancer. For example, studies on breast cancer cells as well as in vivo and ex vivo studies on breast tumors have shown that choline differentiates breast cancer from normal cells or tissue (36, 40, 42, 43). Choline is one of the most prominent metabolites in cell biology and is invariably associated with increased tumor cell proliferation in breast cancer. Increased lactate is one of the earliest reported metabolic changes in breast tumors. Similarly, the association of a number of amino acids, fatty acids, and organic acids with breast cancer has been established earlier (39, 40, 44). As shown in Fig. 1, the mean concentrations of a number of these metabolites, including formate, histidine, proline, N-acetyl-glycine, and 3-hydroxy-2-methyl-butanoic acid, decrease significantly in recurrent cancer. Numerous studies report a decrease in amino acid levels in the plasma of patients with malignant tumors (44). Increased demand for amino acids in the presence of a tumor is thought to be one of the reasons for this decrease. This is inferred from numerous independent investigations conducted between 1978 and 2003, which showed that metabolic profiles, specifically those of free amino acids, can differ between early-stage and late-stage cancer. It is thus not too surprising that such metabolites can be used, in part, to develop a sensitive predictive model that distinguishes patients with recurrence from NED patients. We also note that an increase in tyrosine has been observed in two liver cancer studies (45, 46).

Correlation of the metabolites with clinical parameters such as cancer stage and ER and PR status contributes to the extent to which the disease can be detected early. Recently, a link between tumor metabolites and ER and PR status was shown with prediction accuracies of 88% and 78%, respectively, indicating that the metabolic profile varies with the ER and PR status of the patient (31). These results support our observations and suggest that inclusion of such parameters may help advance further development of early-detection metabolite profiles.

In conclusion, we present the development of a new tool for the surveillance of breast cancer recurrence based on the metabolic profiling of serially obtained blood samples from patients. The performance of the model was optimal when metabolites detected by both NMR and MS were combined. This multiple-metabolite model outperforms the current diagnostic methods used for breast cancer patients, including the tumor marker CA 27.29, for which data on the same samples were available for direct comparison. Metabolic profiling of blood serum by NMR and MS can detect breast cancer relapse before it is clinically diagnosed, opening a window of opportunity for patients and oncologists to improve treatment.

Daniel Raftery reports holding equity and an executive role in Matrix-Bio, Inc.

We thank Heather Heiligstedt for her help in constructing the pathway diagram (Supplementary Fig. S6).

Grant Support: NIH (National Institute of General Medical Sciences R01GM085291-02 and R25CA128770, Cancer Prevention Internship Program), Purdue University Center for Cancer Research, Purdue Research Foundation, and Oncological Sciences Center in Discovery Park at Purdue University. Partial funding for this work was received from Matrix-Bio, Inc. Third Nerve, LLC, provided access to specimen materials and data through its ongoing survey study of both new and established cancer markers.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. American Cancer Society. Breast cancer facts and figures 2009-2010. Atlanta (GA): American Cancer Society; 2009.
2. Jemal A, Siegel R, Ward E, Hao Y, Xu J, Thun MJ. Cancer statistics, 2009. CA Cancer J Clin 2009;59:225–49.
3. Brewster AM, Hortobagyi GN, Broglio KR, et al. Residual risk of breast cancer recurrence 5 years after adjuvant therapy. J Natl Cancer Inst 2008;100:1179–83.
4. Megha T, Neri A, Malagnino V, et al. Traditional and new prognosticators in breast cancer: Nottingham index, Mib-1 and estrogen receptor signaling remain the best predictors of relapse and survival in a series of 289 cases. Cancer Biol Ther 2010;9:266–73.
5. Lu WL, Jansen L, Post WJ, Bonnema J, Van de Velde JC, De Bock GH. Impact on survival of early detection of isolated breast recurrences after the primary treatment for breast cancer: a meta-analysis. Breast Cancer Res Treat 2009;14:403–12.
6. Houssami N, Ciatto S, Martinelli F, Bonardi R, Duffy SW. Early detection of second breast cancers improves prognosis in breast cancer survivors. Ann Oncol 2009;20:1505–10.
7. Pivot X, Asmar L, Hortobagyi GN, Theriault R, Pastorini F, Buzdar A. A retrospective study of first indicators of breast cancer recurrence. Oncology 2000;58:185–90.
8. Ebeling FG, Stieber P, Untch M, et al. Serum CEA and CA 15-3 as prognostic factors in primary breast cancer. Br J Cancer 2002;86:1217–22.
9. Dhafir A, Gabrielle K, Eddie M, et al. CA 15-3 is predictive of response and disease recurrence following treatment in locally advanced breast cancer. BMC Cancer 2006;6:220.
10. Gion M, Mione R, Leon AE, Dittadi R. Comparison of the diagnostic accuracy of CA 27.29 and CA15.3 in primary breast cancer. Clin Chem 1999;45:630–7.
11. Duffy MJ. Serum tumor markers in breast cancer: are they of clinical value? Clin Chem 2006;52:345–51.
12. Lumachi F, Basso SMM, Basso U. Breast cancer recurrence: role of serum tumor markers CEA and CA 15-3. In: Hayat MA, editor. Methods of cancer diagnosis, therapy and prognosis. Netherlands: Springer; 2008, p. 109–15.
13. Lumachi F, Ermani M, Brandes AA, Basso S, Basso U, Boccagni P. Predictive value of different prognostic factors in breast cancer recurrences: multivariate analysis using a logistic regression model. Anticancer Res 2001;6:4105–8.
14. Nicholson JK, Lindon JC, Holmes E. “Metabonomics”: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 1999;29:1181–9.
15. Odunsi K, Wollman RM, Ambrosone CB, et al. Detection of epithelial ovarian cancer using 1H-NMR-based metabonomics. Int J Cancer 2005;113:782–8.
16. Gowda GAN, Zhang S, Gu H, Asiago V, Shanaiah N, Raftery D. Metabolomics-based methods for early disease diagnostics: a review. Expert Rev Mol Diagn 2008;8:617–33.
17. Pan Z, Raftery D. Combining NMR spectroscopy and mass spectrometry in metabolomics. Anal Bioanal Chem 2007;387:525–7.
18. Holmes E, Wilson ID, Nicholson JK. Metabolic phenotyping in health and disease. Cell 2008;134:714–7.
19. Clayton TA, Lindon JC, Cloarec O, et al. Pharmaco-metabonomic phenotyping and personalized drug treatment. Nature 2006;440:1073–7.
20. Denkert C, Budczies J, Kind T, et al. Mass spectrometry-based metabolic profiling reveals different metabolite patterns in invasive ovarian carcinomas and ovarian borderline tumors. Cancer Res 2006;66:10795–804.
21. Sreekumar A, Poisson LM, Rajendiran TM, et al. Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression. Nature 2009;457:910–4.
22. Shanaiah N, Zhang S, Desilva MA, Raftery D. NMR-based metabolomics for biomarker discovery. In: Wang F, editor. Methods in pharmacology and toxicology: biomarker methods in drug discovery and development. Totowa (NJ): Humana Press; 2008, p. 341–68.
23. Bursac Z, Gauss CH, Williams DK, Hosmer DW. Purposeful selection of variables in logistic regression. Source Code Biol Med 2008;3:17.
24. Kiezun A, Lee ITA, Shomron N. Evaluation of optimization techniques for variable selection in logistic regression applied to diagnosis of myocardial infarction. Bioinformation 2009;3:311–3.
25. Chen MH, Dey DK. Variable selection for multivariate logistic regression models. J Stat Plan Inference 2003;111:37–55.
26. Markley JL, Anderson ME, Cui Q, et al. New bioinformatics resources for metabolomics. Pacific Symp Biocomp 2007;12:157–68.
27. Wishart DS, Tzur D, Knox C, et al. HMDB: the human metabolome database. Nucleic Acids Res 2007;35:D521–6.
28. Chan DW, Beveridge RA, Muss H, et al. Use of Truquant BR radioimmunoassay for early detection of breast cancer recurrence in patients with stage II and stage III disease. J Clin Oncol 1997;15:2322–8.
29. Cheung KL, Graves CR, Robertson JF. Tumor marker measurements in the diagnosis and monitoring of breast cancer. Cancer Treat Rev 2000;26:91–102.
30. Harris L, Fritsche H, Mennel R, et al. American Society of Clinical Oncology 2007 update of recommendations for the use of tumor markers in breast cancer. J Clin Oncol 2007;25:5287–312.
31. Giskeødegård GF, Grinde MT, Sitter B, et al. Multivariate modeling and prediction of breast cancer prognostic factors using MR metabolomics. J Proteome Res 2010;9:972–9.
32. Griffin JL, Lehtimaki KK, Valonen PK, et al. Assignment of 1H nuclear magnetic resonance visible polyunsaturated fatty acids in BT4C gliomas undergoing ganciclovir-thymidine kinase gene therapy-induced programmed cell death. Cancer Res 2003;63:3195–201.
33. Hakumaki JM, Poptani H, Puumalainen AM, et al. Quantitative 1H nuclear magnetic resonance diffusion spectroscopy of BT4C rat glioma during thymidine kinase-mediated gene therapy in vivo: identification of apoptotic response. Cancer Res 1998;58:3791–9.
34. Griffiths JR, McSheehy PM, Robinson SP, et al. Metabolic changes detected by in vivo magnetic resonance studies of HEPA-1 wild-type tumors and tumors deficient in hypoxia-inducible factor-1β (HIF-1β): evidence of an anabolic role for the HIF-1 pathway. Cancer Res 2002;62:688–95.
35. Florian CL, Preece NE, Bhakoo KK, Williams SR, Noble MD. Cell type-specific fingerprinting of meningioma and meningeal cells by proton nuclear magnetic resonance spectroscopy. Cancer Res 1995;55:420–7.
36. Gribbestad IS, Sitter B, Lundgren S, Krane J, Axelson D. Metabolite composition in breast tumors examined by proton nuclear magnetic resonance spectroscopy. Anticancer Res 1999;19:1737–46.
37. Sitter B, Lundgren S, Bathen TF, Halgunset J, Fjosne HE, Gribbestad IS. Comparison of HR MAS MR spectroscopic profiles of breast cancer tissue with clinical parameters. NMR Biomed 2006;19:30–40.
38. Sitter B, Sonnewald U, Spraul M, Fjösne HE, Gribbestad IS. High-resolution magic angle spinning MRS of breast cancer tissue. NMR Biomed 2002;15:327–37.
39. Whitehead TL, Kieber-Emmons T. Applying in vitro NMR spectroscopy and 1H NMR metabonomics to breast cancer characterization and detection. Prog NMR Spectrosc 2005;47:165–74.
40. Cheng LL, Chang IW, Smith BL, Gonzalez RG. Evaluating human breast ductal carcinomas with high-resolution magic-angle spinning proton magnetic resonance spectroscopy. J Magn Reson 1998;35:194–202.
41. Bathen TF, Jensen LR, Sitter B, et al. MR-determined metabolic phenotype of breast cancer in prediction of lymphatic spread, grade, and hormone status. Breast Cancer Res Treat 2006;104:181–9.
42. Yang C, Richardson AD, Smith JW, Osterman A. Comparative metabolomics of breast cancer. Pac Symp Biocomput 2007;12:181–92.
43. Gribbestad IS, Singstad TE, Nilsen G, et al. In vivo 1H MRS of normal breast and breast tumors using a dedicated double breast coil. J Magn Reson Imaging 1998;8:1191–7.
44. Lai HS, Lee JC, Lee PH, Wang ST, Chen WJ. Plasma free amino acid profile in cancer patients. Semin Cancer Biol 2005;15:267–76.
45. Wanatabe A, Higashi T, Sakata T, Nagashima H. Serum amino acid levels in patients with hepatocellular carcinoma. Cancer 1984;54:1875–82.
46. Kubota A, Meguid MM, Hitch DC. Amino acid profiles correlate diagnostically with organ site in three kinds of malignant tumors. Cancer 1992;69:2343–8.

Supplementary data