Abstract
Purpose: We aimed to identify candidate proteins for tumor markers to predict the response to gefitinib treatment.
Experimental Design: We did two-dimensional difference gel electrophoresis to create the protein expression profile of lung adenocarcinoma tissues from patients who showed a different response to gefitinib treatment. We used a support vector machine algorithm to select the proteins that best distinguished 31 responders from 16 nonresponders. The prediction performance of the selected spots was validated by an external sample set, including six responders and eight nonresponders. The results were validated using specific antibodies.
Results: We selected nine proteins that distinguish responders from nonresponders. The predictive performance of the nine proteins was validated by examining an additional six responders and eight nonresponders, resulting in positive and negative predictive values of 100% (six of six) and 87.5% (seven of eight), respectively. The differential expression of one of the nine proteins, heart-type fatty acid–binding protein, was successfully validated by ELISA. We also identified 12 proteins as a signature to distinguish tumors based on their epidermal growth factor receptor gene mutation status.
Conclusions: Study of these proteins may contribute to the development of personalized therapy for lung cancer patients.
Non–small cell lung carcinoma (NSCLC) accounts for ∼85% of lung cancer cases (1). Biomarker(s) that predict the response to gefitinib (Iressa; AstraZeneca, Macclesfield, United Kingdom), an epidermal growth factor receptor (EGFR) tyrosine kinase inhibitor, may help to improve the choice of therapeutic strategy in patients with NSCLC. Gefitinib improves NSCLC-related symptoms and quality of life in some patients with advanced NSCLC who do not respond to platinum-based chemotherapy. However, the response rate for gefitinib remains <20% in patients with NSCLC (2–4), and treatment with gefitinib is associated with serious adverse effects, such as severe acute interstitial pneumonia in 5.4% of the patients who received the treatment (5, 6). Thus, it is imperative to select appropriate patients for treatment with gefitinib and exclude patients in whom gefitinib is unlikely to exhibit any clinical benefit. Women, patients who have never smoked, patients with adenocarcinoma, and East Asians are major subgroups of responders (3, 4, 6–8). Recently, gain-of-function somatic mutation in the tyrosine kinase domain of the EGFR has been correlated with the response to gefitinib (9, 10). However, other studies have revealed that correction of the phenotype arising from EGFR mutation may not account for all of the clinical benefits of gefitinib (11, 12), and both preclinical and clinical studies have reported that the efficacy of gefitinib is independent of EGFR expression level (11, 13–15). Although molecular features of the EGFR gene, including mutation and high copy number, (16, 17) are associated with response to gefitinib, other molecular markers in the tumor, such as HER2 overexpression (18), Akt phosphorylation (19), and other EGFR downstream molecules (20), also correlate with response. These observations suggest a role for unknown, but important, factors in gefitinib sensitivity. 
Identification and elucidation of such factors will improve existing therapeutic protocols and contribute to further understanding of the mechanisms of gefitinib sensitivity.
To identify the gene products correlated with the efficacy of gefitinib, genome-wide screening was done recently for NSCLC. A global mRNA expression study using DNA microarrays and biopsy samples identified 51 genes associated with the sensitivity to gefitinib and established a numerical scoring system to predict the response (21). This expression study also led to the establishment of ELISA assays for the identified gene products in serum. Preclinical studies involving mRNA profiling of NSCLC xenografts resulted in the identification of a set of genes that were differentially expressed between tumors that were sensitive and insensitive to gefitinib treatment (22, 23). These studies will lead to the identification of novel biomarkers to predict the response to gefitinib treatment. However, mRNA expression does not necessarily correlate with protein level, and posttranslational modifications, such as phosphorylation, cannot be predicted from the amount of RNA or from the DNA sequence (24). With this background, comprehensive expression studies at the protein level, an approach called proteomics, have been conducted in patients with lung cancer to develop biomarkers that predict clinical outcomes (25). However, no global protein expression study has yet been done on the mechanism of response to gefitinib.
To identify the proteomic signature for sensitivity to gefitinib and to use that signature as a tumor marker to predict the response to gefitinib, we analyzed global protein expression levels in lung adenocarcinoma tissues from patients for whom we have detailed information on EGFR gene status. The surgical specimens were obtained at the time of surgery from patients who subsequently had recurrence and received gefitinib monotherapy. We then used two-dimensional difference gel electrophoresis (2D-DIGE) covering ∼2,000 proteins to identify a set of proteins whose expression was associated with sensitivity to gefitinib and with EGFR mutation. The predictive performance of the protein set was validated with an independent data set and compared with that of EGFR mutation.
Materials and Methods
Patients and tissue samples. We examined tumor tissues from patients who relapsed after surgery and received gefitinib monotherapy. Two hundred seventy-nine patients who received gefitinib at the National Cancer Center Hospital from July 2002 to December 2004 were evaluated for inclusion in this study. Ninety-two patients relapsed after surgical resection of primary NSCLC and started to receive monotherapy with gefitinib 250 mg/d for 14 days (n = 92). We used tumor tissues obtained at the time of surgery and stored in nitrogen vapor. Fifteen patients were excluded from our study for the following reasons: frozen tissues were not available (n = 10) and tumor histology showed squamous cell carcinoma (n = 4) or pleomorphic carcinoma (n = 1). This left 77 patients for analysis. The histologic features of the tissues were reviewed by two board-certified pathologists (Y.M. and K.T.) and diagnosis was based on the latest WHO classification of lung adenocarcinoma (8, 26–28). The tumor responses were classified into complete response (CR), partial response (PR), and progressive disease (PD) using standard bidimensional measurements (29). In this study, patients without a marked reduction of tumor size were subdivided into minor response (MR) and stable disease (SD) groups. MR was defined as a 25% decrease in the sum of the products of perpendicular diameters of all measurable lesions at any point during gefitinib treatment. SD was defined as a <25% decrease in tumor size after treatment. The clinical information is summarized in Table 1, and further information, including EGFR mutation status, is summarized in Supplementary Table S1. Consent was obtained from all patients and the protocol was approved by the institutional review board of the National Cancer Center.
Table 1. Patient characteristics
| Characteristic | No. patients | % |
|---|---|---|
| Gender | | |
| Female | 33 | 43 |
| Male | 44 | 57 |
| Age (y) | | |
| Median (range) | 62.2 (32-80) | — |
| Histologic type | | |
| Adenocarcinoma | 100 | |
| Papillary/acinar/bronchioloalveolar/solid | 30/16/9/6 | 49/26/15/10 |
| Smoking history* | | |
| Never smokers | 37 | 48 |
| Former smokers | 12 | 16 |
| Current smokers | 28 | 36 |
| ECOG performance status† | | |
| 0/1/2/3 | 24/39/9/5 | 31/51/12/6 |
| Prior chemotherapy | | |
| Yes | 30 | 39 |
| No | 47 | 61 |
| Response to gefitinib | | |
| CR/PR/MR/SD/PD/NE | 2/35/2/8/24/6 | 3/45/3/10/31/8 |
| EGFR gene status | | |
| Mutation L858R | 18 | 23.4 |
| DEL‡ | 18 | 23.4 |
| G719§ | 2 | 2.6 |
| Wild-type | 35 | 45.4 |
| Unknown | 4 | 5.2 |
Abbreviation: NE, not evaluated.
*Never smokers: those who had never had a smoking habit; former smokers: those who had stopped smoking at least 1 yr before diagnosis; and current smokers: active smokers at diagnosis of NSCLC or those who had stopped smoking less than 1 yr before diagnosis.
†ECOG performance status was monitored according to the previous report (44).
‡Deletional mutations in exon 19.
§G719S and G719C.
To identify the proteins associated with response to gefitinib, we compared the protein expression profiles of responders (CR and PR) and nonresponders (PD). Of the 77 samples available, six were excluded from this study because the treatment was not completed and the effects of gefitinib could not be evaluated. We constructed two sample sets (Table 2): a training set comprising 31 responders (2 CRs + 29 PRs) and 16 nonresponders (16 PDs), and a test set comprising six responders (6 PRs) and eight nonresponders (8 PDs) from whom samples were obtained between June and December 2004. As no significant differences were observed between CRs and PRs (Supplementary Fig. S1A), we grouped CRs and PRs together in the responder group.
Table 2. Training and test sets to develop the classifier for the response to gefitinib
| Characteristic | Training set: responders, n = 31 (%) | Training set: nonresponders, n = 16 (%) | Training set: P | Test set: responders, n = 6 (%) | Test set: nonresponders, n = 8 (%) | Test set: P |
|---|---|---|---|---|---|---|
| Age | | | | | | |
| Mean ± SD | 64.0 ± 8.9 | 60.5 ± 12.0 | 0.330 | 57.5 ± 12.8 | 62.8 ± 6.1 | 0.386 |
| Gender | | | | | | |
| Male | 17 (55) | 9 (56) | 0.927 | 3 (50) | 5 (62.5) | 0.640 |
| Female | 14 (45) | 7 (44) | | 3 (50) | 3 (37.5) | |
| Smoking history | | | | | | |
| Never smokers | 17 (55) | 9 (56) | 0.286 | 4 (67) | 4 (50) | 0.054 |
| Former smokers | 7 (22.5) | 1 (6) | | 2 (33) | 0 (0) | |
| Current smokers | 7 (22.5) | 6 (38) | | 0 (0) | 4 (50) | |
| EGFR gene status | | | | | | |
| Mutation | 27 (87) | 1 (6) | <0.001 | 4 (66) | 0 (0) | 0.006 |
| Wild type | 3 (10) | 13 (81) | | 1 (17) | 8 (100) | |
| Unknown | 1 (3) | 2 (13) | | 1 (17) | 0 (0) | |
| Prior chemotherapy | | | | | | |
| (+) | 12 (39) | 5 (31) | 0.614 | 6 (100) | 0 (0) | <0.001 |
| (−) | 19 (61) | 11 (69) | | 0 (0) | 8 (100) | |
| Performance status | | | | | | |
| 0 | 11 (35.5) | 6 (37.5) | 0.945 | 2 (33) | 1 (12.5) | 0.347 |
| 1 | 11 (35.5) | 10 (62.5) | | 4 (67) | 7 (87.5) | |
| 2 | 6 (19) | 0 (0) | | 0 (0) | 0 (0) | |
| 3 | 3 (10) | 0 (0) | | 0 (0) | 0 (0) | |
Protein extraction and protein expression profiling. The frozen tumor tissues were crushed to frozen powder with a Multi-Beads Shocker (Yasui-kikai, Osaka, Japan) under cooling with liquid nitrogen. The frozen powder was then treated with urea lysis buffer (7 mol/L urea, 2 mol/L thiourea, 3% CHAPS, 1% Triton X-100) for 30 min on ice. After centrifugation at 15,000 rpm for 30 min, the supernatant was recovered as cellular protein for the protein expression study.
Protein samples were labeled with CyDye DIGE Fluor saturation dye (GE Healthcare Amersham Biosciences, Uppsala, Sweden) according to our previous report (30). We prepared an internal control consisting of a mixture of small portions of all protein samples obtained before May 2004 (31). The internal control sample and the individual experimental samples were labeled with Cy3 and Cy5 CyDye DIGE Fluor saturation dyes, respectively. Five micrograms of Cy3- or Cy5-labeled protein were mixed and coseparated by two-dimensional PAGE. The first-dimension separation was achieved on an Immobiline pH gradient gel (isoelectric point range, 4-7; 24 cm length) with a Multiphor II (GE Healthcare Amersham Biosciences). The second-dimension separation was done with an EttanDalt II (GE Healthcare Amersham Biosciences) with a 9% to 15% gradient polyacrylamide gel. After electrophoresis, the gels were scanned at appropriate wavelengths for Cy3 and Cy5 (Supplementary Fig. S2A). The ratio between Cy5 and Cy3 intensity was calculated for all protein spots in identical gels by the use of DeCyder software (GE Healthcare Amersham Biosciences; ref. 31). The standardized spot intensities were then logarithmically transformed and subjected to a data-mining package (Impressionist; GeneData, Basel, Switzerland). We ran triplicate gels for each sample and calculated the averaged standardized spot intensity.
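The standardization step described above can be sketched as follows. This is a minimal illustration and not the DeCyder implementation: the function names and the example intensities are hypothetical, the logarithm base (2) is an assumption because the text does not state it, and averaging on the log scale is likewise an assumption.

```python
import math
from statistics import mean


def standardized_log_ratio(cy5_intensity, cy3_intensity):
    """Log2 ratio of a sample spot (Cy5) to the same spot in the internal control (Cy3)."""
    if cy5_intensity <= 0 or cy3_intensity <= 0:
        raise ValueError("spot intensities must be positive")
    return math.log2(cy5_intensity / cy3_intensity)


def average_over_gels(gel_pairs):
    """Average the standardized log ratios of one spot across replicate gels."""
    return mean(standardized_log_ratio(cy5, cy3) for cy5, cy3 in gel_pairs)


# Hypothetical (Cy5, Cy3) intensities for one spot on triplicate gels.
triplicate = [(2000.0, 1000.0), (4000.0, 1000.0), (1000.0, 1000.0)]
spot_value = average_over_gels(triplicate)  # mean of log2 ratios 1, 2, and 0
```

Dividing by the Cy3 signal of the shared internal control is what makes spot intensities comparable between gels; the log transform then treats up- and down-regulation symmetrically.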
To assess the reproducibility of the proteomic data with the internal control in our analyses, we generated triplicate protein profiles from identical samples (case 9; Supplementary Table S1) and compared the standardized intensity of the paired spots (Supplementary Fig. S2B). Scattergrams with 1,980, 1,646, and 1,873 spots showed that the intensities of 1,916 (93.7%), 1,599 (94.7%), and 1,770 (94.5%) spots, respectively, were scattered within a 2-fold difference, and the correlation values were also high (r values > 0.93; Supplementary Fig. S2B).
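The two reproducibility measures used here, the share of paired spots within a 2-fold difference and the correlation coefficient, can be computed as in the sketch below. The function names and the five paired values are hypothetical; the inputs are assumed to be log2-transformed standardized intensities, so a 2-fold difference corresponds to an absolute difference of 1.

```python
import math


def fraction_within_fold(xs, ys, fold=2.0):
    """Fraction of paired spots whose log2 values differ by at most log2(fold)."""
    limit = math.log2(fold)
    within = sum(1 for x, y in zip(xs, ys) if abs(x - y) <= limit)
    return within / len(xs)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 standardized intensities of five paired spots on two replicate gels.
gel_a = [0.0, 1.0, -1.0, 2.0, 0.5]
gel_b = [0.2, 0.9, -0.8, 0.5, 0.4]
share = fraction_within_fold(gel_a, gel_b)  # four of the five pairs differ by <= 1
r = pearson_r(gel_a, gel_b)
```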
Data analysis. A bioinformatic approach based on a support vector machine (SVM) algorithm and a leave-one-out cross-validation was used to identify proteins of which expression was associated with tumor characteristics, including therapeutic response to gefitinib and the presence of EGFR mutation (32).
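The leave-one-out scheme can be illustrated with the sketch below. To keep the example self-contained, a nearest-centroid rule stands in for the SVM used in the study; this substitution, the function names, and the two-spot profiles are all assumptions made for illustration only.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose mean profile is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_error(xs, ys, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and return the error rate."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


# Hypothetical two-spot expression profiles for six samples.
profiles = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0],
            [-1.0, -0.9], [-1.1, -1.2], [-0.9, -1.0]]
labels = ["responder"] * 3 + ["nonresponder"] * 3
loo_error = leave_one_out_error(profiles, labels)
```

Because each held-out sample never contributes to the model that classifies it, the resulting error rate is a less optimistic estimate than the error measured on the training data itself.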
Protein identification. Proteins corresponding to the protein spots of interest were identified by mass spectrometry (30). The proteins were recovered in a gel plug by using an automated spot collector (SpotPicker; GE Healthcare Amersham Biosciences) and digested with sequence grade trypsin (Promega, Madison, WI; ref. 30). Trypsin digests were applied to liquid chromatography coupled with tandem mass spectrometry (LTQ, Thermo, Waltham, MA). A database search against Swiss-Prot was done with Mascot software. Peptides with a Mascot score of 35 or more were used for protein identification. When multiple proteins were identified in a single spot, the proteins with the highest number of peptides were considered as those corresponding to the spot.
Mutations in the EGFR gene. EGFR mutations in the samples obtained between July 2002 and May 2004 were examined as described in our previous report (8). Analysis of samples obtained between June 2004 and December 2004 was done by high-resolution melting analysis with a LightCycler HR-1 system (Idaho Technology Inc., Salt Lake City, UT).
ELISA. The expression level of heart-type fatty acid–binding protein (H-FABP) in protein samples from 55 lung adenocarcinoma patients (2 CRs, 28 PRs, 6 SDs, 1 MR, and 18 PDs) was measured in a clinical laboratory (SRL, Tokyo, Japan) with a commercially available ELISA kit (MARKIT-M H-FABP, Dainippon Pharmaceutical, Tokyo, Japan) according to the manufacturer's instructions (Supplementary Table S1). All these 55 samples were included in a 2D-DIGE analysis set in this study.
Results
Proteomic signature for the response to gefitinib. We first selected 1,685 protein spots that appeared in at least 80% of the images of Cy3-labeled internal control. We further selected 87 protein spots that showed different intensities between responder and nonresponder groups (P < 0.05, Wilcoxon test). Although potentially resulting in a loss of information, this trimming process decreased the possibility that the classifier would be significantly influenced by irrelevant expression data. We then selected protein sets whose expression was associated with response to gefitinib by using an SVM algorithm. Accuracy, plotted as a function of spot number, was constant until the number of spots decreased to less than nine, showing that accurate classification did not require all protein spots (Fig. 1A). The location on the two-dimensional map is shown for the selected nine spots (Fig. 1B; Supplementary Fig. S3). Mass spectrometry revealed that these nine spots corresponded to nine gene products (Table 3). Overall similarity of the selected spots is shown in Supplementary Fig. S1B and C. As the responder group in the training set consisted mainly of PRs, the obtained proteomic signature would presumably be more reflective of PR than CR.
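The first trimming step, keeping only spots detected in at least 80% of the internal-control images, can be sketched as below. The data layout (one boolean per gel image for each spot) and all names are hypothetical and are not taken from the software used in the study.

```python
def spots_present_in_enough_gels(detections, min_fraction=0.8):
    """Return spot IDs detected in at least `min_fraction` of the gel images.

    `detections` maps a spot ID to one boolean per gel image (True = spot detected).
    """
    kept = []
    for spot_id, flags in detections.items():
        if sum(flags) / len(flags) >= min_fraction:
            kept.append(spot_id)
    return sorted(kept)


# Hypothetical detection calls for three spots across five control images.
detections = {
    "spot_A": [True, True, True, True, True],    # 5 of 5 images
    "spot_B": [True, True, True, True, False],   # 4 of 5 images, exactly 80%
    "spot_C": [True, False, True, False, True],  # 3 of 5 images
}
kept_spots = spots_present_in_enough_gels(detections)
```

Spots that pass this filter would then go on to the group comparison and ranking steps described in the text.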
Fig. 1. Data-mining procedure to develop the prediction model for the response to gefitinib. A, a spot ranking method selected the small set of protein spots for which the cumulative error rate of a leave-one-out cross-validation was minimal. The error rate was minimal when the prediction model was constructed from a particular set of nine protein spots. B, localization of the selected nine protein spots on the two-dimensional map. An enlarged two-dimensional image is shown in Supplementary Fig. S2. C, hierarchical clustering analysis of the samples in the learning set using the selected nine protein spots. Black bars, the presence of EGFR mutations within exons 18 to 21. D, principal component analysis of the samples in the learning set using the selected nine protein spots. Comp.1, 2, and 3, principal components 1, 2, and 3, respectively.
Table 3. List of proteins for the response to gefitinib
| Spot no.* | Rank | Accession no.† | Identified protein† | MW (Da)‡ | pI‡ | Ion charge state (+) | m/z (obs)§ | Mass∥ | δ¶ | Miss** | Mascot ions score†† | Peptide sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 384 | 5 | Q96RP9 | Ig mu chain C region | 49,557 | 6.35 | 2 | 810.3 | 1,617.7 | 0.91 | 0 | 74 | QVGSGVTTDQVQAEAK |
| | | | | | | 2 | 640.1 | 1,277.5 | 0.63 | 0 | 47 | YAATSQVLLPSK |
| 671 | 1 | P01876 | Ig α-1 chain C region | 37,655 | 6.08 | 2 | 919.2 | 1,836.0 | 0.32 | 0 | 68 | QEPSQGTTTFAVTSILR |
| | | | | | | 2 | 771.8 | 1,540.7 | 0.91 | 0 | 54 | DASGVTFTWTPSSGK |
| 1090 | 7 | Q9UNH7 | SNX 6 | 46,649 | 5.81 | 2 | 636.5 | 1,270.5 | 0.55 | 0 | 73 | NLVELAELELK |
| | | | | | | 2 | 577.0 | 1,152.2 | −0.33 | 0 | 39 | SLVDYENANK |
| 1182 | 8 | P50453 | Cytoplasmic antiproteinase 3 | 42,404 | 5.61 | 2 | 816.4 | 1,629.8 | 0.95 | 0 | 82 | IEELLPGSSIDAETR |
| | | | | | | 2 | 626.6 | 1,249.4 | 1.66 | 0 | 75 | AFQSLLTEVNK |
| | | | | | | 2 | 591.0 | 1,179.5 | 0.47 | 0 | 63 | LVLVNAIYFK |
| | | | | | | 2 | 757.5 | 1,513.6 | −0.56 | 0 | 47 | LQEDYDMESVLR + Oxidation (M) |
| 1292 | 6 | P40121 | Macrophage capping protein | 38,518 | 5.88 | 2 | 633.8 | 1,264.4 | 1.18 | 0 | 85 | VSDATGQMNLTK |
| | | | | | | 2 | 676.8 | 1,351.4 | 0.05 | 0 | 79 | YQEGGVESAFHK |
| | | | | | | 2 | 932.1 | 1,861.1 | 1.11 | 0 | 50 | MQYAPNTQVEILPQGR + Oxidation (M) |
| | | | | | | 2 | 659.8 | 1,317.3 | 0.23 | 0 | 41 | EGNPEEDLTADK |
| 1711 | 3 | Q8NBJ7 | Sulfatase modifying factor 2 | 33,857 | 7.78 | 2 | 792.5 | 1,581.7 | 1.32 | 0 | 112 | MGNTPDSASDNLGFR |
| | | | | | | 2 | 779.9 | 1,557.6 | 0.15 | 0 | 95 | GASWIDTADGSANHR |
| | | | | | | 2 | 740.0 | 1,477.6 | 0.36 | 0 | 83 | LPTEEEWEFAAR |
| | | | | | | 2 | 613.2 | 1,224.4 | −0.02 | 0 | 66 | FLMGTNSPDSR |
| | | | | | | 2 | 629.9 | 1,256.5 | 1.27 | 0 | 55 | SVLWWLPVEK |
| | | | | | | 2 | 818.0 | 1,633.8 | 0.12 | 1 | 55 | RLPTEEEWEFAAR |
| | | | | | | 2 | 837.7 | 1,672.9 | 0.48 | 0 | 47 | LEHPVLHVSWNDAR |
| 2091 | 9 | P09211 | Glutathione S-transferase P | 23,225 | 5.44 | 2 | 647.5 | 1,292.5 | 0.44 | 0 | 36 | MLLADQGQSWK + Oxidation (M) |
| 2182 | 4 | P02794 | Ferritin heavy chain | 21,094 | 5.30 | 2 | 823.4 | 1,643.8 | 1.04 | 0 | 91 | MGAPESGLAEYLFDK + Oxidation (M) |
| | | | | | | 2 | 648.3 | 1,294.5 | 0.03 | 0 | 53 | NVNQSLLELHK |
| 2478 | 2 | P05413 | Fatty acid–binding protein, heart | 14,727 | 6.34 | 2 | 735.2 | 1,467.5 | 0.81 | 0 | 103 | LGVEFDETTADDR |
| | | | | | | 2 | 798.7 | 1,595.7 | −0.32 | 1 | 73 | LGVEFDETTADDRK |
| | | | | | | 2 | 603.3 | 1,204.3 | 0.26 | 0 | 70 | WDGQETTLVR |
| | | | | | | 2 | 455.0 | 907.0 | 1.04 | 0 | 67 | SLGVGFATR |
| | | | | | | 2 | 774.7 | 1,546.8 | 0.56 | 0 | 61 | QVASMTKPTTIIEK |
| | | | | | | 2 | 438.0 | 873.0 | 0.88 | 0 | 54 | NGDILTLK |
| | | | | | | 1 | 889.6 | 889.0 | −0.41 | 0 | 45 | SIVTLDGGK |
Abbreviation: pI, isoelectric point.
*Spot numbers refer to those in Fig. 1B (Supplementary Fig. S3).
†Accession nos. of proteins were derived from Swiss-Prot and National Center for Biotechnology Information nonredundant databases.
‡Theoretical molecular weight and isoelectric point were obtained from Swiss-Prot and the ExPASy database (http://au.expasy.org).
§Experimental m/z value.
∥Relative molecular mass calculated from the peptide sequence.
¶Difference (error) between the experimental and calculated masses.
**Number of missed cleavage sites.
††Mascot ions score (http://www.matrixscience.com/search_form_select.html).
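The relationship between the experimental m/z, the charge state, and the mass difference δ in Table 3 can be illustrated with the short sketch below. It assumes that δ is the neutral mass implied by the observed m/z minus the calculated mass, which is the usual convention but is not spelled out in the footnotes; the function names are hypothetical.

```python
PROTON_MASS = 1.00728  # Da, mass of a proton


def observed_neutral_mass(mz_observed, charge):
    """Neutral peptide mass implied by an observed m/z at a given positive charge."""
    return charge * (mz_observed - PROTON_MASS)


def mass_error(mz_observed, charge, calculated_mass):
    """Difference between the observed and the calculated neutral mass, in Da."""
    return observed_neutral_mass(mz_observed, charge) - calculated_mass


# Singly charged H-FABP peptide SIVTLDGGK from Table 3: m/z 889.6, calculated mass 889.0.
delta = mass_error(889.6, 1, 889.0)
```

For this peptide the result rounds to −0.41, the value listed in the table. For doubly charged ions the tabulated m/z is rounded to one decimal place and the rounding error is doubled, so a value recomputed this way can differ slightly from the listed δ.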
The classification performance of the selected nine protein spots was validated by unsupervised classification. Hierarchical clustering showed that all tumor samples in the training set, except for cases 5, 20, and 37, were grouped according to their sensitivity to gefitinib based on the expression pattern of the nine proteins (Fig. 1C). In principal component analysis, all 47 samples seemed to be separated into two groups, although the border between these groups was not clear (Fig. 1D). Although hierarchical clustering and principal component analysis are crude methods of validation of classification, the results obtained using them were consistent.
To validate the predictive performance of the nine protein spots, we investigated a newly enrolled test sample set that was completely independent of the learning set. Based on the expression level of the nine protein spots, the distance of each sample from the hyperplane created by the SVM algorithm, defined as the SVM value, was calculated. Samples with a positive SVM value were grouped as responders and samples with a negative SVM value were grouped as nonresponders. As a consequence, all training set samples were correctly classified in accordance with their clinical response to gefitinib (Fig. 2). All responders (six PRs) and seven of eight nonresponders (eight PDs) in the test set were also correctly classified. For unknown reasons, the expression pattern of the nine protein spots in the misclassified nonresponder (case 75) was more similar to that of the responder group. We also applied the classifier to samples from patients who showed MR and SD. The two patients showing MR were categorized as responders; of the eight patients showing SD, three were classified into the responder group and five into the nonresponder group. We did a leave-one-out cross-validation for all 61 samples in the training and test sets using the nine protein spots with 1,000 random permutations. All but two cases, cases 37 and 75, were correctly classified according to their response to the treatment, giving an overall misclassification error rate of 3.3% (2 of 61). Consequently, the model predicted the response to gefitinib in 13 of the 14 (92.9%) newly enrolled samples from the responders and nonresponders.
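The decision rule and the resulting tally can be sketched as follows. The SVM values below are hypothetical numbers arranged only to mirror the reported test-set outcome (six responders and seven of eight nonresponders correctly classified); they are not the measured values, and treating a value of exactly zero as a nonresponder is an assumption.

```python
def classify_by_svm_value(svm_value):
    """Positive distance from the hyperplane -> responder; otherwise nonresponder."""
    return "responder" if svm_value > 0 else "nonresponder"


def tally(svm_values, true_labels):
    """Count correct calls per true class and compute the overall accuracy."""
    correct = {"responder": 0, "nonresponder": 0}
    total = {"responder": 0, "nonresponder": 0}
    for value, truth in zip(svm_values, true_labels):
        total[truth] += 1
        if classify_by_svm_value(value) == truth:
            correct[truth] += 1
    accuracy = sum(correct.values()) / len(true_labels)
    return correct, total, accuracy


# Hypothetical SVM values: six responders, then eight nonresponders
# (the last nonresponder has a positive value and is therefore misclassified).
svm_values = [0.9, 0.7, 1.2, 0.4, 0.8, 0.3,
              -0.6, -1.1, -0.2, -0.9, -0.5, -0.7, -1.3, 0.2]
true_labels = ["responder"] * 6 + ["nonresponder"] * 8
correct, total, accuracy = tally(svm_values, true_labels)
```

With these inputs, 13 of the 14 samples are classified correctly, which is the 92.9% overall accuracy discussed in the text.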
Fig. 2. Predictive performance of the nine spots was validated by examining the SVM value of each sample in the group.
Proteomic signature for EGFR gene mutation. We next examined whether protein spots could predict EGFR mutation status. We constructed a training sample set of 58 samples (34 mutation-positive and 24 mutation-negative samples; Supplementary Table S2) and found 12 protein spots whose expression was highly correlated with EGFR mutation (Supplementary Data; Supplementary Figs. S4-6). The classification and prediction performance of the selected 12 protein spots was validated using an external validation sample set comprising four mutation-positive and 11 mutation-negative samples (Supplementary Fig. S7). Only one protein, sulfatase modifying factor 2, was shared between the signatures for the response and for the mutation (Table 3; Supplementary Table S3).
Expression of H-FABP measured by ELISA. We validated the differential expression of the identified proteins by the use of a widely available clinical assay. The expression level of H-FABP in the same tumor samples as those used in 2D-DIGE was measured with a commercially available ELISA kit intended for serum assays (Fig. 3). H-FABP expression measured by ELISA was highly correlated with that measured by 2D-DIGE (Pearson correlation, 0.76295; P < 0.0001). The ELISA study also showed that the expression level of H-FABP was significantly different between the responder (PR and CR) and nonresponder (PD) groups (P = 0.0031, Mann-Whitney U test) and also between the patients with MR or SD and the nonresponder group (P = 0.0047, Mann-Whitney U test). These results indicate that up-regulation of H-FABP in tumor tissues can be monitored by routine clinical methods.
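The statistic behind the Mann-Whitney U test used for these group comparisons can be computed by direct pair counting, as in the sketch below. The H-FABP levels are made-up numbers in arbitrary units and the direction of the difference is arbitrary; the sketch returns only the U statistic, not a P value.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: number of (a, b) pairs with a > b, ties counted as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical H-FABP levels (arbitrary units) in two groups.
responders = [5.1, 6.3, 4.8, 7.0]
nonresponders = [2.2, 3.1, 4.8]
u_resp = mann_whitney_u(responders, nonresponders)
u_non = mann_whitney_u(nonresponders, responders)
```

The two U values always sum to the number of pairs (here 4 × 3 = 12); the further the smaller one lies below half that number, the stronger the separation between the groups.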
Fig. 3. ELISA for H-FABP. The differential expression level of H-FABP was validated by ELISA.
Discussion
We identified 87 protein spots whose intensity differed significantly between samples from the responder (CR and PR) and nonresponder (PD) groups in the training set. Application of a data-mining procedure identified a set of nine protein spots that accurately distinguished responders from nonresponders. The expression levels of these nine protein spots classified 13 of 14 of our test PR and PD cases in accordance with their clinical response to gefitinib. These protein spots classified cases showing a minor response (MR) to gefitinib into the responder group, whereas the intermediate cases, SD, were split between the responder and nonresponder groups. The usefulness of our findings needs to be validated in a larger clinical data set.
We identified the proteins whose expression was correlated with response to gefitinib and found associations with the EGFR signal pathway and with the biology of lung cancer. Sorting nexin (SNX) 6 is a member of the SNX family, which functions in the intracellular trafficking of plasma membrane receptors (33). SNXs form complexes with other SNXs and with plasma membrane receptors. In complexes with SNX1, SNX2, and SNX4, SNX6 interacts with the intracellular portion of the EGFR as well as with transforming growth factor-β receptor, insulin receptor, leptin receptor, and platelet-derived growth factor receptor (34). By binding to the kinase domain of the transforming growth factor-β receptor, SNX6 perturbs transforming growth factor-β signal transduction (34). Another SNX family member, SNX1, decreases the expression of EGFR by activating the endosome-to-lysosome pathway with enterophilin-1 (35), although the functions of the complex of SNX6 and EGFR have not yet been reported. The functional association of SNX6 with the oncogene product Pim-1, which has been implicated in the development of hematopoietic (36), gastric (37), and prostatic (38) malignancies, suggests the involvement of SNX6 in cancer biology. Kakiuchi et al. (21) reported that another SNX family member, SNX13, was correlated with the response to gefitinib in patients with NSCLC. These reports suggest that SNX6 might play an important role in signal transduction pathways that affect the phenotypes of lung cancer.
We also sought to identify the proteins whose expression was associated with EGFR mutation. Because gefitinib is a specific inhibitor of EGFR and mutation of EGFR is considered to be a predictive marker for gefitinib sensitivity, we had expected some similarity between the set of proteins predicting sensitivity to gefitinib and the set of proteins reflecting EGFR mutation status. However, only sulfate modifying factor 2 was common to the two sets. A search of the PubMed database revealed no association of sulfate modifying factor 2 with the EGFR pathway and no evidence for its involvement in resistance to chemotherapy. Similarly, the other proteins correlated with EGFR mutation status had no obvious involvement in the EGFR pathway. Functional studies on these proteins will contribute to further understanding of EGF signaling in cells and to discovery of novel therapeutic targets in lung cancer.
2D-DIGE is a high-performance proteomic technology and a powerful tool to develop candidate biomarkers. However, 2D-DIGE requires expensive fluorescent dyes and well-trained operators to run the gels. Thus, routine clinical studies with multiple large-format two-dimensional gels and a 2D-DIGE protocol are unlikely to be practical. Application of our results requires a simple and cost-effective method that can be used routinely in the clinic. In addition, as we need to examine the expression of multiple proteins, a practical tool for simultaneously measuring the amount of the other proteins is required. With that in mind, we validated measurement of the differential expression of H-FABP using a commercially available ELISA kit (MARLIT-M H-FABP) that is routinely used in hospitals for the early diagnosis of acute myocardial infarction using serum samples. The expression level of H-FABP in tumor tissues as measured by ELISA was highly correlated with that measured by 2D-DIGE, and a significant difference in H-FABP expression was observed among responders (CR + PR), minor responders (MR + SD), and nonresponders (PD). Thus, our results can provide a simple and direct method to predict the response to gefitinib.
H-FABP functions in intracellular lipid transport, storage, and metabolism. As H-FABP is highly expressed in heart and released into plasma after myocardial injury, it has been used as a plasma marker for early diagnosis of acute myocardial infarction and stroke. However, many lines of evidence also suggest an association of H-FABP with cancer biology. Higher expression of H-FABP was observed in a more tumorigenic small-cell lung cancer cell line (39) compared with its counterpart. Increased expression of H-FABP is associated with tumor aggressiveness, metastasis, and poor prognosis of gastric cancer (40). In contrast, H-FABP is known to have growth-inhibitory activity in breast cancer cells (41), and breast cancer does not express H-FABP because of gene silencing by hypermethylation (42). These observations suggest complexity in the way that H-FABP is involved in the progression of cancer. Recently, Loeffler-Ragg et al. (43) reported that another FABP family member, E-FABP, is up-regulated in gefitinib-resistant colon cancer cell lines compared with gefitinib-sensitive cell lines. Further study on the contribution of the FABP family to cancer phenotypes, including resistance to chemotherapy, will provide novel insights into cancer biology.
In conclusion, our proteomic study has identified proteins whose expression can predict the response to gefitinib in patients with recurrence of lung adenocarcinoma. Large-scale validation of the present results and functional analysis to elucidate the contribution and synergies of the identified proteins in the response to gefitinib will assist in developing novel therapeutic strategies for lung cancer.
Grant support: ‘Third-Term Comprehensive Control Research for Cancer’ conducted by the Ministry of Health, Labor, and Welfare and by the Program for Promotion of Fundamental Studies in Health Sciences in the National Institute of Biomedical Innovation of Japan. Tetsuya Okano is the recipient of a Research Resident Fellowship from the Foundation for Promotion of Cancer Research (Japan).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
Current address for K. Fujii: Proteome Bioinformatics Project, National Cancer Center Research Institute, Tokyo, Japan.