Abstract
Previous studies have suggested that serum glycan profiles are altered in patients with lung cancer. Here, we aimed to determine the predictive value of serum glycans to distinguish non–small cell lung cancer (NSCLC) cases from controls in prediagnostic samples, using a previously validated predictive protein marker, pro-SFTPB, as an anchor. Blinded prediagnostic serum samples were obtained from the Carotene and Retinol Efficacy Trial (CARET) and included a discovery set of 100 NSCLC cases and 199 healthy controls. A second test set consisted of 108 cases and 216 controls. Cases and controls were matched for age at baseline (5-year groups), sex, smoking status (current vs. former), study enrollment cohort, and date of blood draw. Serum glycan profiles were determined by mass spectrometry. Twelve glycan variables were identified to have significant discriminatory power between cases and controls in the discovery set (AUC > 0.6). Of these, four were confirmed in the independent test set. A combination marker yielded AUCs of 0.74 and 0.64 in the discovery and test sets, respectively. Four glycan variables exhibited significant incremental value when combined with pro-SFTPB compared with pro-SFTPB alone, with AUCs of 0.73, 0.72, 0.72, and 0.72 in the test set, indicating that serum glycan signatures have relevance to risk assessment for NSCLC. Cancer Prev Res; 9(4); 317–23. ©2016 AACR.
Introduction
Lung cancer is the leading cause of cancer-related death and despite the reduction of smoking incidence in the United States, 29% of cancer-related deaths in men and 26% in women are attributed to lung cancer (1). When lung cancer is diagnosed at a localized stage, survival rates are much higher than when the disease has metastasized (1). The use of imaging techniques, especially CT scanning, has shown good potential for early diagnosis of lung cancer. The National Lung Screening Trial (NLST) in the United States demonstrated an overall decrease in lung cancer mortality of approximately 20% when individuals at risk for lung cancer were screened yearly using low-dose spiral CT (2). However, efficient implementation of lung cancer screening strategies would benefit from the development of means to assess risk of harboring lung cancer. The development of a blood-based biomarker panel that could be used in a test to complement CT screening either for identifying subjects at increased risk or for improving CT screening performance would provide a more effective path to early diagnosis and reduced mortality of lung cancer.
We previously reported on the identification of circulating pro-surfactant protein B (pro-SFTPB) as a promising blood-based biomarker for lung cancer risk assessment (3, 4). We performed an initial study based on the Carotene and Retinol Efficacy Trial (CARET) cohort, which consisted of prediagnostic NSCLC cases and controls, in which pro-SFTPB yielded an AUC of 0.683, indicative of its potential relevance for early detection of lung cancer together with other markers (3). It was recently shown that pro-SFTPB in combination with the metabolic marker diacetylspermine can provide good diagnostic potential (5).
Glycomics represents a novel paradigm for biomarker discovery (6, 7) and has the potential to provide additional biomarkers for lung cancer early detection. Protein glycosylation is the enzymatic addition of oligosaccharide structures to proteins and generally occurs in two forms: N-glycans and O-glycans. In this study, we focus on N-glycans. N-glycans are attached to an asparagine residue that is present as part of an N-X-S/T motif and are typically highly branched structures (8). They consist of a core that contains five monosaccharides and can be expanded in a nontemplate-driven way, resulting in substantial heterogeneity. Prior studies have suggested a potential of serum N-glycomics signatures to distinguish subjects diagnosed with lung cancer from controls (9–12). However, the potential contribution of glycomics to the identification of subjects at risk for lung cancer in the prediagnostic setting has not been assessed in a blinded validation study using the PRoBE design, which addresses intended applications and is recommended by the Early Detection Research Network (13, 14).
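As a small illustration of the N-X-S/T motif mentioned above, the following sketch scans a protein sequence for candidate N-glycosylation sites. It assumes the commonly used convention that X is any residue except proline, which is not spelled out in the text; the function name and the example sequence are made up for illustration.

```python
import re

# N-glycosylation sequon: Asn, then any residue except Pro (assumed
# convention), then Ser or Thr. The lookahead lets overlapping motifs
# such as "NNST" be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(protein_sequence):
    """Return 1-based positions of asparagines that start an N-X-S/T motif."""
    return [m.start() + 1 for m in SEQUON.finditer(protein_sequence.upper())]

example = "MKNGSAPNPTLLNNSTQ"  # made-up sequence, not a real protein
positions = find_sequons(example)
print(positions)  # N-P-T at position 8 is skipped because X is proline
```

A sequon marks only a potential site; whether it is actually occupied by a glycan has to be determined experimentally.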
We have utilized an in-depth N-glycan analysis method to generate glycan signatures from prediagnostic serum samples from NSCLC cases and matched controls. We aimed to identify candidate glycan markers for NSCLC in a discovery set and determine in a test set whether glycan markers can improve the performance of the previously validated protein marker pro-SFTPB in the prediagnostic setting.
Materials and Methods
Clinical samples
Participants in this nested case–control study were selected from the CARET cohort study. CARET was a multicenter, randomized, double-blinded, placebo-controlled trial aimed to assess the safety and efficacy of daily supplementation with 30 mg of β-carotene plus 25,000 IU of retinyl palmitate in reducing lung cancer incidence in persons at high risk for the disease (15). The study comprised two high-risk populations: heavy smokers (N = 14,254) and asbestos-exposed workers (N = 4,060). Eligible participants for the heavy smoker population were men and women, 50 to 69 years of age, who were either current or former smokers (quit within the previous 6 years) with at least 20 pack-years of cigarette smoking. Eligible participants for the asbestos-exposed population were men ages 45 to 69 years who were smoking at baseline or quit within 15 years prior and had a substantial history of asbestos exposure. Participants were enrolled from 1985 to 1994 and participant follow-up for cancer and mortality outcomes continued until 2005. Blood draws were conducted at baseline and every other year thereafter through 1996 for most participants and a common blood collection and processing protocol was used at all of the study centers. Serum samples were created and stored at −20°C for up to two weeks and then transferred to central −70°C freezers for long-term storage. All CARET participants provided informed consent at recruitment and throughout follow-up, and the Institutional Review Boards at each of the six study centers approved all study procedures.
For this study, two independent sets of 100 (discovery) and 108 (test) NSCLC cases were selected for which a serum sample was available from a blood draw that occurred within 12 months prior to diagnosis. For each lung cancer case, sera from two control subjects who were free of lung cancer during the period of follow-up were selected. Cases and controls were matched for age at baseline (5-year groups), sex, baseline smoking status (current vs. former), study enrollment cohort, and date of blood draw (same follow-up collection time point). For one of the cases in the discovery set, only one control could be assigned, resulting in 299 samples in this set. For both the discovery and the test sets, samples were blinded and randomized by matched case–control triplets, and the sample preparation and analysis of the two sample sets were performed independently, with a 1-year interval between them. The clinical characteristics of the two sample sets are provided in Table 1.
Table 1. Participant characteristics of CARET NSCLC cases and controls
Characteristic | Discovery set, cases (N = 100) | Discovery set, controls (N = 199) | Test set, cases (N = 108) | Test set, controls (N = 216) |
---|---|---|---|---|
Age^a, mean (SD) | 61.1 (5.8) | 60.9 (5.9) | 61.9 (5.7) | 61.9 (5.9) |
Pack-years^b, mean (SD) | 57 (23) | 47 (22) | 54 (23) | 49 (20) |
Age at diagnosis, mean (SD) | 66.2 (6.2) | | 65.1 (6.3) | |
Sex^a | | | | |
Male | 75 | 149 | 75 | 150 |
Female | 25 | 50 | 33 | 66 |
Race | | | | |
White | 94 | 185 | 99 | 200 |
Black | 3 | 5 | 6 | 8 |
Other | 3 | 9 | 3 | 8 |
Exposure population | | | | |
Asbestos-exposed worker | 31 | 56 | 35 | 53 |
Heavy smoker | 69 | 143 | 73 | 163 |
Smoking status at baseline^a | | | | |
Current | 61 | 121 | 72 | 144 |
Former | 39 | 78 | 36 | 72 |
Histology | | | | |
Adenocarcinoma | 40 | | 40 | |
Squamous cell | 30 | | 38 | |
Other/unspecified NSCLC | 30 | | 30 | |
Stage | | | | |
I–II | 14 | | 26 | |
III–IV | 69 | | 64 | |
Unknown | 17 | | 18 | |
Months from blood collection to diagnosis | | | | |
<6 months | 48 | | 40 | |
6–12 months | 52 | | 68 | |
^a Matching variables.
^b The case versus control difference in pack-years is statistically significant in both the discovery and test sets (Wilcoxon test, P = 0.0005 and P = 0.009, respectively).
N-Glycomics assay
The total serum N-glycomics profiles of the CARET samples were obtained using mass spectrometry (MS), as previously described (12), with slight modifications. The N-glycan release of the discovery set was performed in microcentrifuge tubes, while the glycan release of the test set was performed in 96-well plates. Both methods were shown to perform similarly (Supplementary Fig. S3). Briefly, proteins in 25 μL of serum were denatured using dithiothreitol prior to enzymatic release of the N-glycans using PNGaseF. Upon protein precipitation, the N-glycans were purified by porous graphitized carbon SPE and dried in vacuo prior to MS analysis.
N-glycans were analyzed using an Agilent 6200 series nanoHPLC-chip-TOF-MS; the stationary phase in the microfluidic chip used in the analysis was porous graphitized carbon (PGC), in both the trapping and the analytical column. N-glycan samples were reconstituted in 100 μL of water and 1 μL was injected. Glycans were then separated using a gradient of 3% acetonitrile (ACN) with 0.1% formic acid (FA; solvent A) and 90% ACN with 0.1% FA (solvent B). Mass spectrometric detection was performed in the positive ionization mode, and the instrument was calibrated prior to the start of the analysis of both sample sets. Glycan features were identified and extracted using MassHunter Qualitative Analysis (Agilent) in combination with our previously developed retrosynthetic N-glycan library, consisting of 332 glycans (16). Glycan compositions and peak areas were exported to CSV format for further processing and statistical evaluation. A more detailed description of the N-glycomics analysis procedure is provided in the Supplementary Information.
We have previously evaluated the performance of this method for biomarker discovery and the instrument variation was shown to be very limited (17). To evaluate instrument performance during the runs, one standard sample was run every 12 (discovery set) or 10 (test set) samples; similarly, standard samples were included to evaluate the stability of the sample preparation.
For the discovery set, samples were prepared in batches of 23. To evaluate the stability of the analytic process, standard serum samples were included every 12 (instrument variation) or 23 (sample preparation variation) samples. Batch adjustments were made as needed to compensate for batch effects (Supplementary Fig. S1). To this end, the percent-of-total glycan values were median-centered by subtracting the median value of the batch in which the sample was run.
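The batch adjustment described above can be sketched as follows. This is a minimal illustration under the assumption that each sample carries a batch label and a single percent-of-total value per glycan; the function name and the numbers are hypothetical and do not come from the study data.

```python
from collections import defaultdict
from statistics import median

def median_center_by_batch(values, batches):
    """Subtract each batch's median from the values measured in that batch."""
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_median = {b: median(vs) for b, vs in by_batch.items()}
    return [value - batch_median[batch] for value, batch in zip(values, batches)]

# Hypothetical percent-of-total values for one glycan in two batches.
pct = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]
batch = ["A", "A", "A", "B", "B", "B"]
centered = median_center_by_batch(pct, batch)
print(centered)
```

After centering, values are expressed relative to their batch median, so they can be negative even though the underlying percentages are not.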
Statistical analysis
For statistical analysis, percentile rank scores were calculated for each of the glycans and all further statistical evaluation was performed using these scores. Furthermore, 18 additional glycan features (see Supplementary Table S1 for calculation of these features) were calculated on the basis of structural glycan characteristics. The glycans together with the glycan features will be referred to as glycan variables for the rest of this article. For the discovery set, performance of the 92 glycan variables as markers for lung cancer was assessed with ROC curve analysis. For each glycan variable, total area under the ROC curve (AUC) was calculated to evaluate overall performance. Partial area under the ROC curve (pAUC) estimates were calculated separately for specificity ≥ 90% to assess accuracy at high levels of specificity (18). Permutation tests were conducted to obtain false discovery rates (FDR). Permutation datasets (N = 1,000) were generated by randomly permuting case–control status from the original dataset. Total AUC and pAUC for specificity ≥ 90% were then calculated on the permuted datasets to obtain a distribution of AUC/pAUCs under the null hypothesis that the markers have no association with cancer. The study set AUC and P value were evaluated against the distributions of AUCs from the permuted datasets to calculate FDRs. FDR and AUC criteria were established to reduce the marker set to a small number of the most promising candidates for validation. Specifically, glycan variable validation candidates were identified as those with AUC > 0.6 and FDR < 0.05. Performance of the individual candidate markers identified in the discovery set was assessed in the test set with ROC curve analysis and Wilcoxon rank-sum tests. A logistic regression model using backward elimination (P < 0.1) was used to determine a combination marker panel.
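The AUC and permutation steps above can be illustrated with a short sketch. It assumes that higher scores indicate cases and computes an empirical two-sided permutation P value for a single marker; how the permutation results were converted to FDRs across all 92 variables is not detailed here, so that step is left out. Function names and data are hypothetical.

```python
import random

def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control
    (ties count one half); this equals the area under the ROC curve."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def permutation_p_value(case_scores, control_scores, n_perm=1000, seed=0):
    """Share of label permutations whose AUC lies at least as far from 0.5
    as the observed AUC (two-sided, with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = abs(auc(case_scores, control_scores) - 0.5)
    pooled = list(case_scores) + list(control_scores)
    n_cases = len(case_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case-control status at random
        permuted = abs(auc(pooled[:n_cases], pooled[n_cases:]) - 0.5)
        if permuted >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical percentile-rank scores for one glycan variable.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.5, 0.3, 0.2, 0.1]
print(auc(cases, controls), permutation_p_value(cases, controls))
```

For a marker that is lower in cases, the same function returns an AUC below 0.5; reporting 1 minus that value gives the discriminatory power in the reversed direction.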
A likelihood ratio test was used to determine whether the addition of individual glycan variables to pro-SFTPB significantly improved the performance over pro-SFTPB alone, while a nonparametric approach was used to determine statistical significance in the ROC model comparisons of the combination glycan marker panel in the risk model (19).
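A likelihood ratio test for adding one variable to a nested logistic model can be sketched as below. The sketch assumes that both models have already been fitted elsewhere and that only their log-likelihoods are available; it covers the one-degree-of-freedom case (one added glycan variable), for which the chi-square tail probability has a closed form. The function names are illustrative and are not taken from any statistics package.

```python
import math

def bernoulli_log_likelihood(outcomes, probabilities):
    """Log-likelihood of 0/1 outcomes under fitted case probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, probabilities)
    )

def likelihood_ratio_test_1df(ll_reduced, ll_full):
    """Return (statistic, P value) for a nested model with one extra parameter.

    For a chi-square variable with 1 degree of freedom,
    P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = max(0.0, 2.0 * (ll_full - ll_reduced))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log-likelihoods: reduced = pro-SFTPB only, full = pro-SFTPB + glycan.
stat, p = likelihood_ratio_test_1df(ll_reduced=-190.0, ll_full=-185.0)
print(stat, p)
```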
Results
Serum N-glycomics biomarker discovery
Glycomics analysis was performed by nano-scale liquid chromatography/mass spectrometry using a porous graphitized carbon stationary phase and time-of-flight (TOF) detection. This method has been shown previously to provide good stability over longer run-times (17) and was therefore considered well suited for biomarker discovery. Using this method, N-glycomics analysis was performed on each of the samples in the discovery set and satisfactory N-glycan signals were obtained for 292 samples (98 cases and 194 controls). An overview of a typical N-glycan chromatogram as obtained in this study is depicted in Supplementary Fig. S2. Seventy-four glycans that were detected in at least 75% of the samples were included in the analysis and intensities relative to the total glycan content were determined for further statistical analysis. On average, these glycans accounted in total for 99% of the overall intensity observed in the runs.
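The two data-reduction steps mentioned above, keeping glycans detected in at least 75% of samples and expressing intensities relative to the total glycan content, can be sketched as follows. The data layout (one dictionary of peak areas per sample), the function names, and the values are assumptions for illustration; the order in which the two steps were applied in the study is not stated here.

```python
def relative_abundances(peak_areas):
    """Convert one sample's peak areas {glycan: area} to fractions of the total."""
    total = sum(peak_areas.values())
    return {glycan: area / total for glycan, area in peak_areas.items()}

def frequently_detected(samples, min_fraction=0.75):
    """Glycans with a nonzero peak area in at least min_fraction of samples."""
    counts = {}
    for sample in samples:
        for glycan, area in sample.items():
            if area > 0:
                counts[glycan] = counts.get(glycan, 0) + 1
    return sorted(g for g, c in counts.items() if c / len(samples) >= min_fraction)

# Hypothetical peak areas for four samples.
samples = [
    {"H5N4F1": 50.0, "H3N4": 30.0, "H9N2": 20.0},
    {"H5N4F1": 60.0, "H3N4": 40.0},
    {"H5N4F1": 70.0, "H3N4": 10.0, "H9N2": 20.0},
    {"H5N4F1": 100.0},
]
kept = frequently_detected(samples)
fractions = relative_abundances(samples[0])
print(kept, fractions)
```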
To assess which individual glycans may provide predictive value for NSCLC, AUCs and pAUCs were calculated for the individual glycans. Four glycans were found to meet the significance criteria of an AUC > 0.60 and an FDR < 0.05. Their compositions, median values, AUCs, and P values of the AUC are listed in Table 2. Interestingly, all individual glycans that exhibited significance were nonsialylated and values of two nongalactosylated glycans (H3N4 and H3N4F1) were increased in NSCLC cases, while levels of two fully galactosylated glycans (H5N4F1 and H6N5F1) were decreased. No influence of fucosylation was observed.
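The relationship between a composition such as H5N4F1 and its galactosylation status can be made explicit with a short sketch. It reads H as hexose, N as N-acetylhexosamine, and F as fucose, and assumes a complex-type glycan built on the five-monosaccharide core (three mannoses and two GlcNAc residues) with one GlcNAc per antenna and no bisecting GlcNAc. The letter S for sialic acid and both function names are assumptions added for illustration.

```python
import re

# H = hexose, N = N-acetylhexosamine, F = fucose; "S" (sialic acid) is an
# assumed extra code, since no sialylated composition is spelled out in the text.
TOKEN = re.compile(r"([HNFS])(\d+)")

def parse_composition(name):
    """Turn a composition string such as 'H5N4F1' into monosaccharide counts."""
    counts = {"H": 0, "N": 0, "F": 0, "S": 0}
    for letter, number in TOKEN.findall(name):
        counts[letter] = int(number)
    return counts

def galactosylation(name):
    """Return (galactoses, antennae) for a complex-type N-glycan, assuming a
    Man3GlcNAc2 core, one GlcNAc per antenna, and no bisecting GlcNAc."""
    c = parse_composition(name)
    return max(c["H"] - 3, 0), max(c["N"] - 2, 0)

for glycan in ["H3N4", "H3N4F1", "H5N4F1", "H6N5F1"]:
    print(glycan, galactosylation(glycan))
```

Under these assumptions, H3N4 and H3N4F1 carry two antennae and no galactose, whereas H5N4F1 and H6N5F1 carry one galactose on every antenna, which matches their description as nongalactosylated and fully galactosylated.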
Table 2. Glycan features from the discovery sample set meeting performance criteria^a for test set evaluation, together with their putative structure, and median, AUC, and P values in both the discovery and test sample sets
Glycan variable^a | Type | Discovery NSCLC median (N = 98) | Discovery control median (N = 194) | Discovery AUC | Discovery P^b | Discovery FDR | Discovery pAUC (0.10)^c | Test NSCLC median (N = 108) | Test control median (N = 216) | Test AUC | Test P^b | Test FDR |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Gal_1 | Glycan feature | −6.34E−02 | 2.46E−02 | 0.66 | >0.001 | 0.017 | 6.43E−06 | −7.06E−02 | 9.24E−03 | 0.61 | 1.06E−03 | 0.007 |
Gal_3 | Glycan feature | −3.42E−02 | 1.89E−02 | 0.66 | 0.001 | 0.014 | 1.35E−05 | −6.18E−02 | 1.57E−02 | 0.61 | 1.65E−03 | 0.007 |
Gal_4 | Glycan feature | −7.05E−02 | 1.50E−02 | 0.65 | 0.001 | 0.015 | 1.97E−05 | −7.42E−02 | 3.78E−02 | 0.58 | 1.56E−02 | 0.040 |
Gal_5 | Glycan feature | −7.64E−02 | 1.59E−02 | 0.65 | 0.001 | 0.011 | 2.37E−05 | −1.21E−01 | 3.67E−02 | 0.54 | 1.94E−01 | 0.195 |
Sia_2 | Glycan feature | 2.85E+00 | −9.04E−01 | 0.65 | 0.001 | 0.009 | 2.59E−05 | 1.51E+00 | −2.77E−01 | 0.57 | 4.34E−02 | 0.056 |
H6N5F1 | Glycan | −4.40E−05 | 2.20E−05 | 0.64 | 0.001 | 0.007 | 6.71E−05 | −1.30E−05 | 1.60E−07 | 0.57 | 3.17E−02 | 0.056 |
Sia_1 | Glycan feature | 5.05E+01 | −1.02E+01 | 0.64 | 0.002 | 0.006 | 1.54E−04 | 2.10E+01 | −4.51E+00 | 0.58 | 1.85E−02 | 0.040 |
Tr | Glycan feature | 5.80E−03 | −2.23E−03 | 0.63 | 0.004 | 0.013 | 2.73E−04 | 6.27E−03 | −1.18E−05 | 0.57 | 4.21E−02 | 0.056 |
Gal_2 | Glycan feature | −1.56E−01 | 5.63E−02 | 0.63 | 0.005 | 0.012 | 3.84E−04 | −1.17E−01 | 2.15E−02 | 0.60 | 2.45E−03 | 0.008 |
H5N4F1 | Glycan | −1.30E−03 | 4.90E−04 | 0.62 | 0.005 | 0.006 | 5.40E−04 | −2.00E−03 | 1.20E−03 | 0.64 | 3.61E−05 | >0.001 |
H3N4F1 | Glycan | 3.20E−03 | −1.50E−03 | 0.61 | 0.010 | 0.014 | 1.38E−03 | 3.10E−03 | −8.70E−04 | 0.56 | 6.86E−02 | 0.081 |
H3N4 | Glycan | 6.70E−04 | −1.40E−04 | 0.60 | 0.025 | 0.009 | 3.82E−03 | 5.90E−04 | −2.00E−03 | 0.54 | 1.95E−01 | 0.195 |
^a Glycan variables identified for testing were those with AUC > 0.60 and an FDR < 0.05.
^b P values calculated using the Wilcoxon test.
^c pAUC associated with a false positive rate (FPR) upper bound of 0.10 (i.e., AUC for the FPR range of 0–0.10).
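The partial AUC defined in the last footnote can be illustrated with a minimal sketch that integrates the empirical ROC curve over the FPR range 0 to 0.10 using the trapezoidal rule. It assumes that higher scores indicate cases and applies no standardization of the partial area; the function names and example scores are hypothetical.

```python
def roc_points(case_scores, control_scores):
    """Empirical ROC curve as (FPR, TPR) points; higher score = more case-like."""
    thresholds = sorted(set(case_scores) | set(control_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in case_scores) / len(case_scores)
        fpr = sum(s >= t for s in control_scores) / len(control_scores)
        points.append((fpr, tpr))
    return points

def partial_auc(case_scores, control_scores, max_fpr=0.10):
    """Trapezoidal area under the ROC curve for FPR between 0 and max_fpr."""
    area = 0.0
    points = roc_points(case_scores, control_scores)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the final segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical, perfectly separated scores: the pAUC reaches its maximum, max_fpr.
print(partial_auc([3.0, 4.0], [1.0, 2.0]))
```

With an FPR bound of 0.10, the partial area can be at most 0.10, and a marker with no discriminatory power yields about 0.005.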
As glycans are products of the activity of several glycosidases and glycosyltransferases with stringent specificities, the biosynthetic pathway of glycans is well defined. To assess specific biosynthetic features, a subset of glycans was generated, which is enriched for differential potential by using inclusion criteria of AUC > 0.55 and FDR < 0.5. This resulted in a set of 36 glycans (Supplementary Table S2), and based on their structural features, 18 glycan features were defined: one glycan feature each addressed high mannose type glycans (HM), hybrid type glycans (Hyb), truncated nongalactosylated glycans (Tr), and biantennary galactosylated (BA) glycans, seven glycan features addressed the levels of fucosylation (Fuc_#), five glycan features addressed the level of galactosylation (Gal_#), and two glycan features addressed sialylation (Sia_#; Supplementary Table S1).
Eight glycan features met the significance criteria of an AUC > 0.60 and an FDR < 0.05 in the differential analysis (Table 2). These included Gal_1, Gal_2, Gal_3, Gal_4, Gal_5, Sia_1, Sia_2, and Tr. Of the seven glycan features that addressed the levels of fucosylation, none met the criteria for significance, indicating that the overall fucosylation of serum proteins is not altered in NSCLC. On the other hand, all five of the features addressing galactosylation and both features addressing sialylation met the criteria for significance, indicating differential galactosylation and sialylation on serum proteins in NSCLC. Differential galactosylation on the high-abundance protein IgG has previously been implicated in multiple types of cancer (20–23) and autoimmune diseases (24–26).
Validation of candidate glycan markers in a test set
To further assess the predictive power of N-glycosylation, N-glycomics analysis was performed on blinded samples from an independent test set which consisted of prediagnostic serum samples from 108 NSCLC cases and 216 controls, also from the CARET study. The characteristics of the discovery and test set subjects were similar as shown in Table 1.
After normalization and batch correction, AUCs were calculated for the 12 glycan variable candidate markers (four glycans and eight glycan features) with significant differences in their levels between cases and controls in the discovery set (Table 2). Nine of the 12 candidate markers had significant P values (<0.05) for total AUC, indicating that the differential potential of these glycan variables was verified in the independent test set. Of these nine glycan variables, four had AUC > 0.60, indicating high potential for these variables.
Development of a biomarker combination
The 12 glycan variables (four glycans and eight glycan features) that were statistically significant in the discovery set were used to develop a combined marker panel. Using a logistic regression model with backward elimination, an optimal combination marker was developed on the basis of the discovery set. The combination marker contained four glycan variables (H5N4F1, H6N5F1, Sia_2, and Gal_4) and provided a combined AUC of 0.74, with a 95% confidence interval of 0.68–0.80 (Fig. 1).
Figure 1. ROC curves for the prediction of NSCLC. ROC curves are shown for the combination glycan panel for the discovery set (green, AUC 0.74) and the test set (red, AUC 0.64).
The combination marker panel, which was developed on the basis of the discovery set, was then applied to the independent test set. Both the glycan variables and their coefficients were locked down based on the discovery set and applied to the test set. The β coefficients for the glycan variables in the model are reported in Supplementary Table S3. Using this approach, an AUC of 0.64 was obtained in the test set, with a 95% confidence interval of 0.58–0.71, indicating that the combination marker could be validated in a second, independent sample set.
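Applying a locked-down logistic model to new samples amounts to evaluating a fixed linear predictor and passing it through the logistic function, with no refitting. The sketch below shows this for the four panel variables, spelled as in Table 2. The intercept and coefficient values are placeholders chosen for illustration only; the actual β coefficients are those reported in Supplementary Table S3, which is not reproduced here.

```python
import math

def risk_score(features, intercept, coefficients):
    """Case probability from a logistic model with fixed (locked) coefficients."""
    linear = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear))

# Placeholder values, NOT the study's coefficients (see Supplementary Table S3).
LOCKED_INTERCEPT = 0.0
LOCKED_COEFFICIENTS = {"H5N4F1": -1.0, "H6N5F1": -1.0, "Sia_2": 1.0, "Gal_4": -1.0}

# Hypothetical test-set sample, expressed as percentile rank scores.
test_sample = {"H5N4F1": 0.2, "H6N5F1": 0.3, "Sia_2": 0.8, "Gal_4": 0.25}
print(risk_score(test_sample, LOCKED_INTERCEPT, LOCKED_COEFFICIENTS))
```

The resulting scores for all test-set samples can then be used to compute the test-set AUC without any information from the test set influencing the model.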
Combination of glycan markers with pro-SFTPB
We previously reported an AUC of 0.683 for pro-SFTPB in distinguishing CARET study samples collected from subjects diagnosed with NSCLC within a year following blood draw from matched controls (3). As protein glycosylation is likely to reflect the biologic aspects of the disease independent of circulating protein markers, we hypothesized that the combination of glycosylation markers and pro-SFTPB would provide improved performance compared with either alone. Therefore, the AUC was calculated for models containing pro-SFTPB with each of the individual glycan variables that provided AUC > 0.6 with FDR < 0.05 in the discovery set; a likelihood ratio test was used to estimate the P value relative to pro-SFTPB alone (Table 3). In the discovery set, the inclusion of each of the 12 glycan variables significantly improved the predictive value of pro-SFTPB. Good concordance was observed between the discovery and the test set, as significantly improved AUCs were obtained in the test set for four of the glycan variables: H5N4F1, Gal_1, Gal_2, and Gal_3, with AUCs of 0.732, 0.724, 0.723, and 0.721, respectively (Table 3).
Table 3. Performance of the glycan markers in combination with the protein marker pro-SFTPB
Marker | Discovery AUC | Discovery P^a | Discovery FDR | Test AUC | Test P^a | Test FDR |
---|---|---|---|---|---|---|
H3N4 | 0.660 | 0.0066 | 0.0077 | 0.704 | 0.2756 | 0.2756 |
H3N4F1 | 0.668 | 0.0055 | 0.0072 | 0.710 | 0.1141 | 0.1349 |
H5N4F1 | 0.664 | 0.0121 | 0.0121 | 0.732 | 0.0004 | 0.0057 |
H6N5F1 | 0.679 | 0.0003 | 0.0006 | 0.708 | 0.0864 | 0.1337 |
Tr | 0.680 | 0.0005 | 0.0009 | 0.709 | 0.0948 | 0.1337 |
Gal_1 | 0.695 | 0.0001 | 0.0003 | 0.724 | 0.0048 | 0.0206 |
Gal_2 | 0.675 | 0.0006 | 0.0010 | 0.723 | 0.0046 | 0.0206 |
Gal_3 | 0.688 | 0.0003 | 0.0006 | 0.721 | 0.0071 | 0.0229 |
Gal_4 | 0.695 | 0.0000 | 0.0002 | 0.717 | 0.0508 | 0.1100 |
Gal_5 | 0.688 | 0.0003 | 0.0006 | 0.704 | 0.2704 | 0.2756 |
Sia_1 | 0.680 | 0.0022 | 0.0031 | 0.711 | 0.0206 | 0.0536 |
Sia_2 | 0.692 | 0.0001 | 0.0005 | 0.707 | 0.1028 | 0.1337 |
pro-SFTPB | 0.634 | — | — | 0.699 | — | — |
^a P value obtained from the likelihood-ratio test indicating the significance of marker + pro-SFTPB compared with pro-SFTPB alone.
We then assessed the combination of the glycan panel (consisting of H5N4F1, H6N5F1, Sia_2, and Gal_4) with pro-SFTPB. Using a combination of these five variables, an AUC of 0.756 with a 95% confidence interval of 0.695–0.815 was obtained in the discovery set, indicating substantially improved accuracy of prediction. The glycan and pro-SFTPB combination panel was then applied to the independent test set. The coefficients of the glycan variables and pro-SFTPB were locked down based on the discovery set and applied to the test set. Using this approach, a combined AUC of 0.697 with a 95% confidence interval of 0.638–0.757 was obtained, which is similar to the predictive power of pro-SFTPB alone in the test set.
To assess whether the combined model improves the risk assessment of lung cancer, we assessed the known risk markers for which data are available in the CARET datasets, including pro-SFTPB. The NSCLC risk-associated variables included in the model were age, gender, smoking status, pack-years, and BMI. To assess the effect of the glycan panel, AUCs were calculated for the risk-associated variables both with and without the glycan panel in the discovery set. For the risk factors alone, not including pro-SFTPB, an AUC of 0.61 was observed. AUCs of 0.73 and 0.77 were obtained for the model including the risk factors and pro-SFTPB and for the model including the risk factors, pro-SFTPB, and the combination glycan panel, respectively (Supplementary Table S4). Using the model containing both the risk factors and pro-SFTPB as a reference model, the glycan marker panel significantly improved the AUC value of the model (P = 0.00068, likelihood ratio test). When the final model, including the risk markers, pro-SFTPB, and the glycan marker panel, was applied to the test set, an AUC of 0.71 with a 95% confidence interval of 0.65–0.77 was obtained. These results indicate that glycans have the potential to improve the risk assessment for NSCLC.
We further explored performance in relation to time to diagnosis. Two subsets were generated, one for samples collected 0–6 months prior to diagnosis and another for samples collected 6–12 months prior to diagnosis. Using the fixed coefficients obtained from the whole discovery set (not stratified by time to diagnosis), the combination marker panel yielded AUCs of 0.775 and 0.721 in the discovery and test sets, respectively, for samples collected 0–6 months prior to diagnosis, and AUCs of 0.721 and 0.648 in the discovery and test sets, respectively, for samples collected 6–12 months prior to diagnosis (Supplementary Fig. S4), suggesting that glycosylation changes tracked the development and progression of NSCLC.
Discussion
Our study was intended to critically assess the potential of glycomic analysis to contribute to the identification of markers that inform about lung cancer. The experimental design consisted of the use of prediagnostic samples that minimize potential biases between cases and controls, given that at the time of sample collection disease status was not different between the two groups in a manner that impacts sample collection. Moreover the analysis was done in a blinded fashion both in the discovery and validation sets. We provide evidence of differential N-glycosylation in prediagnostic serum samples from NSCLC cases, common to adenocarcinoma and squamous cell carcinoma, compared with healthy controls. Twelve glycan variables (four glycans and eight glycan features) were identified as candidate markers in a discovery set, of which 9 could be confirmed in a second, independent test set. A model using a combination of 4 glycan variables was developed that yielded an AUC of 0.74 in the discovery set. Application of this combination marker on the test set using coefficients obtained from the discovery set yielded an AUC of 0.64, indicative of the potential relevance of the glycan signature in identifying subjects at risk for NSCLC. We also obtained evidence indicating that the combination of glycosylation markers and the previously characterized NSCLC protein marker pro-SFTPB provides increased disease prediction compared with pro-SFTPB alone. Addition of each of the 12 markers to pro-SFTPB significantly improved performance in the discovery set. These results were validated for four markers in the test set, thus providing evidence for the contribution of glycan signatures to assessment of lung cancer risk.
Addition of the four-glycan marker panel as a whole to pro-SFTPB significantly improved performance in the discovery set, but improvements were limited in the test set when the same panel with locked down coefficients were used. Our results indicate the potential of the use of protein glycosylation in a biomarker panel, and encourage the development of methodology and assays for glycomics research that would withstand the rigor required for clinical assays.
The samples used in this study are prediagnostic samples, and therefore the results presented here provide evidence for the potential use of glycans as markers for early detection of lung cancer. However, additional studies will be necessary to further evaluate the clinical potential. These studies include, but are not limited to, larger case–control studies to evaluate the candidate markers in multiple risk groups and prognostic studies in risk populations. Most of the subjects included in this study were diagnosed at later stage (III and IV), which would likely now also be screened positive in LDCT screening. Therefore, further studies should also focus on the detection of these markers in individuals with early-stage lung cancer to better assess the efficacy of the glycan markers for early detection.
Of the four glycans that provided significant predictive value in the discovery set, the two non-galactosylated glycans (H3N4 and H3N4F1) were increased in NSCLC, whereas the two galactosylated glycans (H5N4F1 and H6N5F1) were decreased, indicating an overall degalactosylation. This is further supported by the significant decrease in the five galactosylation features (Gal_1 to Gal_5) in the discovery set. In a small set of plasma samples obtained from NSCLC patients and healthy controls, we previously observed that the level of IgG galactosylation was decreased (12). Another recent study focused on the MS-based differential analysis of glycosylation profiles of serum samples from patients with lung cancer compared with controls (11). Increased levels of tri- and tetra-antennary structures and decreased levels of galactosylated glycans were observed, which is concordant with our findings using prediagnostic samples. Degalactosylation of IgG has often been reported in disease states, including cancer (22, 27), rheumatoid arthritis (25), and HIV infection (28), and is possibly associated with a host immune response and inflammation. The glycosylation profiles studied here are dominated by those of high-abundance serum proteins such as immunoglobulins and acute-phase proteins. The signature may therefore not be very specific for lung cancer, but further studies will be necessary to draw final conclusions. Interestingly, the levels of galactosylated as well as nongalactosylated biantennary glycans are not significantly affected by smoking status (11), indicating that degalactosylation, as a risk marker for NSCLC, may not be related to smoking.
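As an illustration of how the composition codes relate to galactosylation, the sketch below parses codes such as H5N4F1 and estimates the number of galactose residues. It rests on assumptions not stated in this section: that H, N, and F denote hexose, N-acetylhexosamine, and fucose; that S (not used here) would denote sialic acid; and that each glycan is a complex-type N-glycan whose core carries three hexoses, so that any additional hexoses are galactoses. The rule does not hold for high-mannose or hybrid structures, and the function names are hypothetical.

```python
import re

# Assumption: complex-type N-glycans carry a core with three hexoses (mannoses);
# any hexose beyond the core is counted as galactose.
CORE_HEXOSES = 3


def parse_composition(code):
    """Parse a composition code such as 'H5N4F1' into residue counts."""
    counts = {"H": 0, "N": 0, "F": 0, "S": 0}
    for letter, number in re.findall(r"([HNFS])(\d+)", code):
        counts[letter] = int(number)
    return counts


def galactose_estimate(code):
    """Estimated galactose count under the complex-type assumption above."""
    return max(parse_composition(code)["H"] - CORE_HEXOSES, 0)


# The four glycans named in the text, mapped to their estimated galactose count.
estimates = {
    code: galactose_estimate(code)
    for code in ["H3N4", "H3N4F1", "H5N4F1", "H6N5F1"]
}
```

Under these assumptions, the two glycans described as non-galactosylated receive an estimate of zero, and the two described as galactosylated receive a positive estimate, consistent with the grouping used in the text.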
The mechanism behind the decreased levels of galactosylation and the nature of the proteins that display the altered glycan signature observed in this study remain to be determined. Given the relatively high concentration of the involved glycans in circulation, it is likely that either high-abundance proteins or a multitude of proteins are affected, which may occur as a result of a host response. Initial results from a glycan profiling study in diseased and adjacent healthy tissue samples from NSCLC patients point to decreased levels of galactosylation in tumor tissue, potentially providing further mechanistic insights.
Overall, our findings suggest that glycan signatures in biologic fluids may have predictive value for assessing risk of lung cancer. Glycan profiling likely complements profiling using other platforms, as we have demonstrated in our study by comparing the performance of a previously validated biomarker, pro-SFTPB, alone and in combination with glycan variables. By evaluating pro-SFTPB together with the glycan signature, we have further characterized the prospects for the development of predictive signatures that may have utility for early detection of lung cancer.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: G.E. Goodman, S. Miyamoto, D.R. Gandara, Z. Feng, C. Lebrilla, S.M. Hanash
Development of methodology: L.R. Ruhaak, D.R. Gandara, Z. Feng, C. Lebrilla
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): L.R. Ruhaak, G.E. Goodman, C. Lebrilla
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): L.R. Ruhaak, J. Dai, M.J. Barnett, A. Taguchi, G.E. Goodman, S. Miyamoto, Z. Feng, C. Lebrilla
Writing, review, and/or revision of the manuscript: L.R. Ruhaak, M.J. Barnett, A. Taguchi, G.E. Goodman, D.R. Gandara, Z. Feng, C. Lebrilla, S.M. Hanash
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C. Stroble, M.J. Barnett, S. Miyamoto
Study supervision: Z. Feng
Acknowledgments
The authors thank Dr. Kyoungmi Kim for assistance in the assignment of samples to batches.
Grant Support
This work was financially supported by the Department of Defense (DOD) grant no. CDMRP LCRP W81XWH1010635 (to all authors), NIH #R21CA135240 (to S. Miyamoto), the Canary Foundation (to S. Hanash), the LUNGevity Foundation (to S. Miyamoto), the Thomas G. Labrecque Foundation 201118739 (to S. Miyamoto) and the Rubenstein Family Foundation (to S. Hanash).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.