Abstract
Background: Basal-like breast cancer (BLBC) is a rare aggressive subtype that is less likely to be detected through mammographic screening. Identification of circulating markers associated with BLBC could have promise in detecting and managing this deadly disease.
Methods: Using samples from the Polish Breast Cancer study, a high-quality population-based case–control study of breast cancer, we screened 10,000 antigens on protein arrays using 45 BLBC patients and 45 controls, and identified 748 promising plasma autoantibodies (AAbs) associated with BLBC. ELISA assays of promising markers were performed on a total of 145 BLBC cases and 145 age-matched controls. Sensitivities at 98% specificity were calculated and a BLBC classifier was constructed.
Results: We identified 13 AAbs (CTAG1B, CTAG2, TP53, RNF216, PPHLN1, PIP4K2C, ZBTB16, TAS2R8, WBP2NL, DOK2, PSRC1, MN1, TRIM21) that distinguished BLBC from controls with 33% sensitivity and 98% specificity. We also discovered a strong association of TP53 AAb with its protein expression (P = 0.009) in BLBC patients. In addition, MN1 and TP53 AAbs were associated with worse survival [MN1 AAb marker HR = 2.25, 95% confidence interval (CI), 1.03–4.91; P = 0.04; TP53, HR = 2.02, 95% CI, 1.06–3.85; P = 0.03]. We found limited evidence that AAb levels differed by demographic characteristics.
Conclusions: These AAbs warrant further investigation in clinical studies to determine their value for further understanding the biology of BLBC and possible detection.
Impact: Our study identifies 13 AAb markers associated specifically with BLBC and may improve detection or management of this deadly disease. Cancer Epidemiol Biomarkers Prev; 24(9); 1332–40. ©2015 AACR.
This article is featured in Highlights of This Issue, p. 1299
Introduction
Breast cancer is known to encompass multiple clinically, molecularly, and pathologically defined subtypes that have very different etiologies, clinical presentations, and outcomes (1–3). Mammographic screening has reduced mortality for breast cancer overall but not for all cancer subtypes; specifically, interval cancers that are estrogen receptor (ER) negative tend to be aggressive and are not well detected by mammography (4–6).
There is great interest in identifying circulating markers associated with aggressive ER-negative basal-like breast cancers (BLBC), which, although rarer than other subtypes, are more common in BRCA1 mutation carriers and African-Americans, and occur at younger ages, when most mammographic screening programs have shown poorer performance in detecting cancer (7, 8). In cancer studies, autoantibodies (AAb) represent a promising class of biomarkers for early detection. AAbs are created by the immune system in response to host proteins and have been shown to be elevated in cancer patients (9–13). Large-scale proteomic screening is an important approach to identify AAb markers, and we have previously identified AAbs associated with different diseases, including cancers, using nucleic acid programmable protein arrays (NAPPA; refs. 14–18). The advantage of NAPPA over other proteomic screening methods is that it displays thousands of in vitro–expressed full-length human proteins on glass slides without the need for laborious protein purification methods (19, 20).
Few studies trying to identify AAbs associated with breast cancer have focused on identifying markers for specific molecular subtypes, primarily because of limited access to highly characterized samples of sufficient numbers. Although many immunohistochemical (IHC) signatures have been described to classify BLBC, the proposed IHC panel by Nielsen and colleagues is the most robust with 100% specificity and 76% sensitivity to classify BLBC when compared with molecular profiling methods (21). The Nielsen IHC panel defines BLBC as those lacking estrogen receptor (ER) and HER2 expression, with positive expression of EGFR or basal cytokeratin 5/6 (CK5/6). Here, we carried out a study to identify AAbs associated with BLBC using samples collected from a high quality, population-based case–control study in Poland.
Materials and Methods
Study samples
Subjects were selected from a population-based breast cancer case–control study of 2,386 cases and 2,502 age and study site–matched controls, between ages 20 and 74 years, who resided in Warsaw or Łódź, Poland from 2000 to 2003 (22). Breast cancer risk factors were obtained from a questionnaire as previously described (22). We specifically evaluated age at blood collection, age at menarche, parity, menopausal status, current BMI, family history of breast cancer, and history of benign breast disease. Pathology for all the study cases was reviewed centrally as previously described to provide standardized classification. The basal-like subtype was defined by PAM50 signature when mRNA expression profiles were available (n = 18); the rest (n = 127) were identified by IHC staining for the five markers (ER, PR, HER2, CK5/6, EGFR) as previously described (23, 24). The Luminal A (LumA), Luminal B (LumB), and HER2-enriched subtypes were classified using PAM50 signature. We identified 145 cases with tumor tissues and plasma samples available (25). Similar to previous reports (21), we observed an 80% concordance rate between the five IHC marker panel and the mRNA expression profiles. Each case was individually matched on age (within 5 years) and study site with population-based controls. To determine the specificity of AAbs identified for BLBC, we selected an additional set of age-matched nonbasal cases (age matched to sample sets 2 and 3) classified by mRNA expression profiles (30 LumA, 22 LumB, and 18 HER2-enriched). The majority of blood samples among cases were collected prior to treatment (52% of BLBC, 57% of LumA, 86% of LumB, and 78% of HER2-enriched). All subjects provided informed consent and the study was approved by Institutional Review Boards in Poland and NCI (Bethesda, MD).
Protein array experiments
All open reading frames were obtained from DNASU (https://dnasu.org/). Production of the protein array and array quality control experiments were performed as previously described (26, 27). In brief, arrays displaying 10,000 human proteins (distributed evenly on five array sets) were manufactured. A common control plasma sample was repeated in every experiment to assess reproducibility. Consistency among experiments was determined with scatter plots comparing spot intensity measurements of the same plasma sample tested on different experiments. Details are described in Supplementary Materials and Methods.
Protein array image analysis and quantification
We measured the spot intensity of the scanned slides using ArrayPro Analyzer (MediaCybernetics). Raw intensity values were normalized by subtracting the background signal for the slide, which was estimated by the first quantile of signal intensity in spots with no printed DNA, and then dividing by the median of background-subtracted intensity from noncontrol spots. In addition, to capture diffused signal (ring) that cannot be quantified by the image analysis software, one researcher qualitatively examined all images to identify and confirm positive responses, as described previously (28). Briefly, the researcher adjusted raw images to extreme contrast and brightness using ArrayPro Analyzer, and graded each spot on a scale of 0 to 5 based on the ring's intensity and morphology.
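The normalization described above can be sketched in a few lines. This is an illustration only, not the authors' pipeline: the function name and the boolean spot flags are hypothetical, and "first quantile" is read here as the first quartile (25th percentile), which is an assumption.

```python
from statistics import median, quantiles

def normalize_slide(raw, is_empty_spot, is_control_spot):
    """Background-subtract and median-scale the spot intensities of one slide.

    raw             -- raw intensity per spot
    is_empty_spot   -- True where no DNA was printed (used for background)
    is_control_spot -- True for any control spot (excluded from the scale)
    """
    empty = [v for v, e in zip(raw, is_empty_spot) if e]
    # ASSUMPTION: "first quantile" is interpreted as the first quartile.
    background = quantiles(empty, n=4, method="inclusive")[0]
    subtracted = [v - background for v in raw]
    # Scale by the median background-subtracted signal of noncontrol spots.
    noncontrol = [s for s, c in zip(subtracted, is_control_spot) if not c]
    scale = median(noncontrol)
    return [s / scale for s in subtracted]
```

With this scaling, a normalized value of 1.0 corresponds to the median noncontrol spot on the slide, so values are comparable across slides with different overall brightness.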
Antigen selection for focused array
Using the normalized array data from the screening, we calculated sensitivity at 95% specificity based on the data generated from printing batch 1 of each array set, area under receiver operating characteristic curve (AUC), partial AUC above 95% specificity (pAUC), as well as Welch t test P value for each tested protein antigen. In addition, we designed a novel metric named K to measure antigens with strong antibody responses in a fraction of BLBC patients while remaining consistent in controls to aid the candidate selection. K was empirically defined using the formula below.
where $q_{cases}$ and $q_{controls}$ denote the empirical quantile functions of normalized data from cases and controls, respectively. Specifically, $q_{cases}(0.975) - q_{cases}(0.800)$ was chosen to quantify the separation of the top 20 percentile of the cases' signals, as K was designed to be sensitive to markers with moderate sensitivity yet better separation. The K value is useful at identifying biomarkers that work well for a subset of the true cases, even if the marker does not show a response in other cases. To demand that the selected marker works for more than one case, we chose the top boundary at 97.5 percentile. For antibodies with the same classification performance, a higher K value indicated greater separation of seroreactivity of positive cases and negative controls.
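A minimal sketch of how a K-like score could be computed from normalized signals. The numerator follows the description above. The denominator used here (the central 95% range of control signals) is an assumption made purely for illustration, as are the function names and the linear-interpolation quantile.

```python
def empirical_quantile(values, p):
    """Empirical quantile with linear interpolation, for 0 <= p <= 1."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def k_metric(cases, controls):
    """K-like separation score for one antigen (illustrative only)."""
    # Numerator from the text: spread within the top 20% of case signals.
    top_case_spread = (empirical_quantile(cases, 0.975)
                       - empirical_quantile(cases, 0.800))
    # ASSUMPTION: denominator is the central 95% range of control signals.
    control_spread = (empirical_quantile(controls, 0.975)
                      - empirical_quantile(controls, 0.025))
    return top_case_spread / control_spread
```

Under this assumed form, an antigen whose strongest case responders sit far above the rest of the cases, while controls stay tightly clustered, receives a high score. The sketch does not guard against a zero control spread.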
We created focused protein arrays for stringent evaluation of antigens that met at least one of the following criteria: (i) antigens ranked in approximately the top 2% of antigens on each array set based on any of these metrics: sensitivity at 95% specificity (n = 228), AUC (n = 185), pAUC (n = 197), or P value of Welch t test (n = 197); (ii) antigens with K > 1.2 and sensitivity of at least 15% at 95% specificity (n = 63); (iii) antigens with greater prevalence in cases than in controls by visual analysis (n = 198); specifically, frequency in cases minus frequency in controls was greater than or equal to 2, and frequency in cases divided by frequency in controls was greater than or equal to 1.5; and (iv) antigens with greater prevalence in controls than in cases by visual analysis (n = 16); specifically, frequency in controls minus frequency in cases was greater than or equal to 5, and frequency in controls divided by frequency in cases was greater than 1.5. In total, 748 proteins were included for manufacturing the focused array.
Power analysis for the biomarker discovery
We calculated the power for antigen selection using a homogeneous disease model and a heterogeneous disease model (29). Using each model and Monte Carlo simulation, we calculated the proportion of markers with 20% sensitivity and 95% specificity that met criteria 1 or 3 in above section “Antigen selection for focused array.” The visual inspection criterion was not considered in the power analysis. Under the homogeneous disease model, 95% of markers with 20% sensitivity and 95% specificity met the selection criteria, and 5% of nonmarkers with 5% sensitivity at 95% specificity met the criteria. Under the heterogeneous model, 73% of markers and 6% of nonmarkers met the criteria. Hence, nearly all such markers with 20% sensitivity and 95% specificity would be selected by our screening process if basal-like subtype is homogeneous, and if it is itself heterogeneous, our process would still be expected to select 73% of these markers.
Antigen selection for ELISA verification
Protein antigens were selected for subsequent ELISA verification when they showed higher prevalence in BLBC in sample set 1 based on visual analysis. Specifically, they had to meet all of the following criteria: (i) frequency in BLBC minus frequency in controls greater than or equal to 3; (ii) frequency in cases divided by frequency in controls greater than or equal to 2; and (iii) frequency in cases greater than or equal to 4. In total, 82 unique proteins met these criteria, and we successfully developed programmable ELISA assays for 71 of them. Two pairs of samples (PBCS-1243, PBCS-2930; PBCS-1754, PBCS-1325) were not measured in ELISA verification experiments because of the limited amount of plasma available at the time of the experiment.
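The three selection criteria above can be expressed as a simple filter on visual-analysis counts. This is a sketch with a hypothetical function name; how a control frequency of zero was handled is not stated in the text, so treating the ratio criterion as satisfied in that case is an assumption.

```python
def passes_elisa_selection(freq_cases, freq_controls):
    """Apply the three visual-analysis criteria to one antigen.

    freq_cases / freq_controls -- number of positive subjects in each group.
    """
    difference_ok = freq_cases - freq_controls >= 3
    # ASSUMPTION: with no positive controls the ratio is treated as passing.
    ratio_ok = freq_controls == 0 or freq_cases / freq_controls >= 2
    count_ok = freq_cases >= 4
    return difference_ok and ratio_ok and count_ok
```

For example, an antigen positive in 6 cases and 3 controls passes all three criteria, whereas one positive in 8 cases and 5 controls fails the ratio criterion (8/5 = 1.6).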
ELISA assays
ELISA assays were performed to verify selected AAb responses towards protein antigens using freshly produced human proteins as previously described (30). In brief, 96-well highbind ELISA plates (Corning) were coated with the goat anti-GST antibody (GE Healthcare) at 10 μg/mL in 0.2 mol/L sodium bicarbonate buffer, pH 9.4, overnight at 4°C one day prior to the experiment. All high-throughput liquid handling was performed using a BioMek NxP Laboratory Automation Workstation (Beckman Coulter). See Supplementary Materials and Methods for additional details.
Statistical analysis
Frequencies of tumor characteristics and demographics between cases were compared using χ2 tests. Associations with known breast cancer risk factors were determined using multivariable logistic regression models as previously described (22).
ROC analysis was performed without feature selection (to avoid overfitting) using leave-one-out cross-validation. The 13-AAb classifier was developed by classifying samples as positive if they exceeded antigen-specific cutoffs for at least 2 of the 13 antigens. Antigen-specific cutoffs were set to achieve 98% classifier specificity by adjusting the specificity at the antigen level to 98.7%. In this cross-validation, for a given antigen-level specificity, we calculated the cutoffs for each antigen using the remaining samples and used these cutoffs to classify the held-out sample. The ROC curve was calculated by adjusting the antigen-level specificity. The 95% confidence intervals of the ROC curve and AUC were estimated by bootstrapping within BLBC or controls.
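The "at least 2 of 13" rule with leave-one-out cutoffs can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: all names are hypothetical, each sample is assumed to be a dict mapping antigen name to ELISA signal, and cutoffs are assumed to be percentiles of the control distribution, so holding out a case leaves the cutoffs unchanged.

```python
import math

def cutoff_at_specificity(control_values, specificity):
    """Smallest control value with at least `specificity` of controls at or below it."""
    xs = sorted(control_values)
    idx = min(len(xs) - 1, max(0, math.ceil(specificity * len(xs)) - 1))
    return xs[idx]

def classify(sample, cutoffs, min_positive=2):
    """Call a sample positive if it exceeds the cutoff for >= min_positive antigens."""
    hits = sum(1 for antigen, cutoff in cutoffs.items() if sample[antigen] > cutoff)
    return hits >= min_positive

def loocv_performance(cases, controls, antigen_specificity=0.987, min_positive=2):
    """Leave-one-out sensitivity and specificity of the multi-antigen rule."""
    antigens = list(controls[0].keys())

    def cutoffs_from(ctrls):
        return {a: cutoff_at_specificity([c[a] for c in ctrls], antigen_specificity)
                for a in antigens}

    # ASSUMPTION: cutoffs depend on controls only, so cases use all controls.
    all_control_cutoffs = cutoffs_from(controls)
    case_calls = [classify(s, all_control_cutoffs, min_positive) for s in cases]
    # Each control is classified with cutoffs from the remaining controls.
    control_calls = [
        classify(c, cutoffs_from(controls[:i] + controls[i + 1:]), min_positive)
        for i, c in enumerate(controls)
    ]
    sensitivity = sum(case_calls) / len(case_calls)
    specificity = 1 - sum(control_calls) / len(control_calls)
    return sensitivity, specificity
```

Sweeping `antigen_specificity` over a range of values and recording the resulting (1 − specificity, sensitivity) pairs traces out a cross-validated ROC curve of the kind described above.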
We assessed the association of AAb levels with demographic characteristics using the Kruskal–Wallis test. To determine AAb responders from ELISA analysis, we categorized subjects as responders to specific antigens of interest using the 95th percentile of control subjects as the cut-point. This method was used to determine the association of AAb responses with tissue abundance of TP53 protein, as well as with overall survival.
The Kaplan–Meier method was used to generate survival curves for categories of the AAb responders/nonresponders (31). HR and 95% confidence intervals (CI) associated with AAb markers adjusted for age, tumor size, grade, and node status, were estimated using Cox proportional hazard models (32). Survival analysis was performed using Stata/SE v11.2 for Windows.
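For readers unfamiliar with the Kaplan–Meier estimator, a minimal standalone sketch follows. It is not the Stata implementation used in the study; the function name and input layout are hypothetical, and it returns only the survival estimate, without confidence bands or a log-rank test.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time per subject
    events -- True if the subject died at that time, False if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set after t.
        at_risk -= leaving
    return curve
```

Computing one curve for AAb responders and one for nonresponders gives the kind of comparison summarized by the survival figures in this study.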
Results
Sample tumor characteristics and risk factors
We identified 145 BLBC cases with plasma, together with individually age-matched controls, in the Polish breast cancer study; their characteristics are presented in Table 1. Sample set 1, used for protein array screening, comprised 45 cases and 45 controls (Fig. 1; Table 1). The remaining 100 BLBC cases were randomly assigned to sample sets 2 and 3 for follow-up studies of promising AAb markers. Sample set 4 comprised age-matched nonbasal subtypes selected to determine the specificity of AAbs identified for BLBC. Analysis of tumor characteristics showed that cases in sample set 1 were more likely to be of higher grade and less likely to be node positive compared with sample sets 2 and 3. Evaluation of established breast cancer risk factors for the 145 BLBC cases and 145 age-matched controls showed that early age at menarche, positive family history of breast cancer, history of benign breast disease, and ever having a screening mammogram were associated with increased breast cancer risk (Supplementary Table S1).
Characteristic | Set 1: Basal-like (n = 45) | Set 1: Control (n = 45) | Set 2: Basal-like (n = 50) | Set 2: Control (n = 50) | Set 3: Basal-like (n = 50) | Set 3: Control (n = 50) | Set 4: LumA (n = 30) | Set 4: LumB (n = 22) | Set 4: HER-2 (n = 18) |
---|---|---|---|---|---|---|---|---|---|
Age, mean (SD) | 53.1 (9.8) | 51.6 (9.6) | 54.1 (10.4) | 54.1 (10.4) | 52.6 (11.2) | 52.6 (11.2) | 54.3 (11.0) | 56.5 (9.6) | 58.0 (7.1) |
Age, min–max | 32–74 | 30–70 | 31–72 | 31–72 | 34–74 | 34–74 | 36–73 | 39–73 | 46–71 |
Parous (%) | 93 | 84 | 84 | 82 | 84 | 82 | 87 | 82 | 83 |
Menopausal (%) | 64 | 56 | 74 | 58 | 58 | 52 | 70 | 73 | 83 |
Poorly differentiated (%) | 76 | | 43 | | 48 | | 3 | 23 | 34 |
Node positive (%)a | 29 | | 53 | | 42 | | 47 | 45 | 56 |
>2 mm size (%) | 62 | | 62 | | 62 | | 33 | 50 | 72 |
a χ2 P < 0.05 between sample sets.
AAb screen of 10,000 antigens using NAPPA array
We performed comprehensive profiling of 10,000 full-length human proteins in sample set 1 to identify promising AAb markers associated with BLBC using both quantitative and visual analysis (Supplementary Fig. S1). Quality of protein array and plasma experiments was assured with consistent protein display (Fig. 2A–C) and high reproducibility (Fig. 2D and E; Supplementary Fig. S2). Across the 10,000 antigen array, the median number of AAb responses was similar between cases and controls (Supplementary Fig. S3, Supplementary Table S2, and Table S3). The protein array data have been deposited in the National Center for Biotechnology Information's Gene Expression Omnibus (GEO) and are accessible through GEO series accession number GSE68119 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE68119). To assess protein characteristics that trigger AAbs in general, we performed gene set enrichment analysis (GSEA) of potential common biochemical protein properties, such as protein length, isoelectric point, aromaticity, and fractions of predicted secondary structures, as well as predicted antigenicity by common algorithms, using proteins that elicit AAb responses in both BLBC cases and controls (see Supplementary Materials and Methods). We found significant overrepresentation of proteins predicted to have high antigenicity (Supplementary Fig. S4). AAb responses were positively associated with protein length, isoelectric point, and fraction of turns, and negatively associated with aromaticity and fraction of α helices (Supplementary Fig. S5). Gene ontology cellular component analysis indicated that autoantigens identified by visual analysis were significantly enriched in the nucleus and centrosome, and significantly depleted in the plasma membrane, extracellular region, and endoplasmic reticulum membrane (Supplementary Fig. S6).
To identify candidate AAbs among the 10,000 antigens that distinguished BLBC from controls, we took two complementary approaches (Fig. 1). In the first approach, using the array data from sample set 1, we selected 748 antigens (see Materials and Methods; Supplementary Table S4; Supplementary Fig. S7; Supplementary Fig. S8) to produce focused arrays for testing using sample set 2 (Table 1). From these, fourteen antigens with sensitivities above 10% at 98% specificity and K > 0.9 in sample set 2 were selected for blinded replication testing with sample set 3 (Supplementary Table S5). In our second approach, using visual analysis, we selected 82 antigens on the 10,000 array that distinguished BLBC compared with controls in sample set 1 for ELISA (see Materials and Methods). ELISA assays were successfully developed for 71 of the 82 antigens (Supplementary Table S6). After analysis of sample set 1, 15 antigens with sensitivities above 10% at 98% specificity and K > 0.9 (Supplementary Table S7) were identified. Using these two approaches, focused arrays and ELISA, AAbs against 26 unique proteins were carried forward for a subsequent blind test by ELISA.
Blind test of 26 AAb markers associated with BLBC
To test these 26 candidate antigens from our staged screen, an ELISA study using sample sets 1 and 2 as a training set was performed to set threshold values (98% specificity) for each antigen. We then blindly tested the performance of these 26 markers using sample set 3 (Fig. 1). In the training set, plasma antibodies against CTAG1B and CTAG2 proteins demonstrated the best ability to differentiate patients from controls, with sensitivities of 21% and 19%, respectively, at 98% specificity (Table 2). In the blinded test, all 26 antigens were assessed using cut-off values defined by the control samples in the training study. AAbs against TP53, CTAG1B, PPHLN1, WBP2NL, and DOK2 showed sensitivities above 5% at 98% specificity in both training and test sets. AAbs against CTAG2 showed lower specificity (96%) in the test set, but their sensitivity remained high at 18% (Table 2).
Antigen | Training sensitivity (sample sets 1 and 2: basal, n = 95; control, n = 95) | Training specificity | Cutoffs a | Blinded test sensitivity (sample set 3: basal, n = 50; control, n = 50) | Blinded test specificity |
---|---|---|---|---|---|
CTAG1B | 0.213 | 0.979 | 1.606 | 0.200 | 1.000 |
CTAG2 | 0.191 | 0.979 | 1.149 | 0.180 | 0.960 |
TRIM21 | 0.158 | 0.979 | 1.208 | 0.140 | 0.860 |
RNF216 | 0.110 | 0.978 | 1.369 | 0.043 | 0.956 |
MN1 | 0.105 | 0.979 | 1.311 | 0.060 | 0.920 |
PIP4K2C | 0.105 | 0.979 | 1.200 | 0.020 | 1.000 |
TP53 | 0.084 | 0.979 | 3.171 | 0.200 | 1.000 |
ZBTB16 | 0.084 | 0.979 | 1.393 | 0.040 | 0.980 |
DOK2 | 0.074 | 0.979 | 1.164 | 0.060 | 1.000 |
PPHLN1 | 0.063 | 0.979 | 3.394 | 0.080 | 1.000 |
TAS2R8 | 0.063 | 0.979 | 1.064 | 0.080 | 0.940 |
SSMEM1 | 0.063 | 0.979 | 1.562 | 0.060 | 0.960 |
DYRK3 | 0.063 | 0.979 | 1.462 | 0.040 | 0.940 |
KRT8 | 0.053 | 0.979 | 1.645 | 0.060 | 0.960 |
LMO4 | 0.053 | 0.979 | 1.199 | 0.020 | 0.980 |
WBP2NL | 0.053 | 0.979 | 1.991 | 0.060 | 0.980 |
JUNB | 0.042 | 0.979 | 1.165 | 0.020 | 0.960 |
TSGA13 | 0.042 | 0.979 | 1.313 | 0.020 | 0.980 |
PVRL4 | 0.042 | 0.979 | 0.899 | 0.020 | 0.920 |
CCDC68 | 0.042 | 0.979 | 2.438 | 0.000 | 0.940 |
BCL2 | 0.042 | 0.979 | 1.160 | 0.000 | 1.000 |
SNRK | 0.032 | 0.979 | 4.127 | 0.020 | 0.960 |
PSRC1 | 0.032 | 0.979 | 1.372 | 0.120 | 0.960 |
KCNIP3 | 0.032 | 0.979 | 0.973 | 0.000 | 0.960 |
POU4F1 | 0.032 | 0.979 | 0.992 | 0.080 | 0.940 |
RNF32 | 0.021 | 0.979 | 1.445 | 0.040 | 0.980 |
a ELISA relative absorbance at the 98th percentile of controls.
Of the 26 AAbs, 13 antigens had sensitivity ≥5% and specificity ≥98%, and receiver operating characteristic (ROC) analysis showed that the 13-AAb classifier had an AUC of 0.68 (95% CI, 0.67–0.70) to distinguish BLBC from controls (Supplementary Table S8). The curve was computed under leave-one-out cross-validation by varying the cut-off values in the prediction model (see Materials and Methods; Fig. 3). This plasma 13-AAb panel distinguished BLBC from controls at a sensitivity of 33% and a specificity of 98%. Assessment of these 13 AAb markers by demographic characteristics among controls found limited evidence of associations, with none meeting the Bonferroni P value threshold of <5 × 10⁻⁴. The most significant relationships were higher PPHLN1 levels with younger ages at menarche (P = 0.009) and higher RNF216 levels among subjects with BMI > 30 compared with lower BMIs (P = 0.008, data not shown).
AAb marker responses in other breast cancer subtypes
To examine the specificity of the 13 AAbs relative to other nonbasal subtypes, we performed ELISA using sample set 4 comprising 30 LumA, 22 LumB, and 18 HER2-enriched patients, and compared them with BLBC. Results indicated that AAbs targeting CTAG1B, CTAG2, and TP53 were significantly higher in BLBC patients' plasma (Fig. 4; Supplementary Table S9) relative to other breast cancer subtypes. Other markers showed some responses in other breast cancer subtypes including RNF216, PPHLN1, PIP4K2C, WBP2NL, DOK2, and MN1 (Supplementary Table S9).
Immunohistochemical data on TP53 and mRNA expression for 13-AAb targets
AAbs to TP53 were among the most significant markers associated with BLBC, and because disease-specific AAbs are usually associated with presence of the corresponding antigens in the tumor tissue (33), we examined whether higher TP53 protein expression, assessed by immunohistochemistry on tumor microarrays, was associated with AAb levels. Among the 79 (54%) BLBC cases with TP53 IHC data, positive TP53 AAb responses were observed in 30% (16/54) of cases with positive TP53 IHC staining, compared with 4% (1/25) of cases that were negative for TP53 expression (P = 0.009, Supplementary Table S10). For the remaining 12 AAb biomarkers, using TCGA breast cancer data we found that CTAG1B, RNF216, and PSRC1 showed significantly elevated mRNA levels in BLBC compared with other subtypes (ref. 34; Supplementary Fig. S9). Other markers did not show any significant changes in mRNA expression in TCGA.
Association of AAb and survival
Survival analysis of the 13 AAb markers showed two markers associated with overall survival: TP53 and MN1 (Supplementary Table S11). Among BLBC patients, those who displayed AAbs against TP53 protein had shorter survival than those without responses (P = 0.03; Supplementary Fig. S10A). Patients with AAbs against MN1 protein also had worse survival than those without responses (P = 0.04; Supplementary Fig. S10B); in multivariable models the HR was similar, but the association was no longer statistically significant (Supplementary Table S11).
Discussion
Using plasma samples from 145 patients with BLBC and equal number of age-matched controls derived from a large population-based case–control study of breast cancer, our proteomic screen of 10,000 antigens with NAPPA technology identified a 13 AAb signature that distinguishes BLBC from controls with 33% sensitivity and 98% specificity (CTAG1B, CTAG2, TP53, RNF216, PPHLN1, PIP4K2C, ZBTB16, TAS2R8, WBP2NL, DOK2, PSRC1, MN1, TRIM21). Some of these markers are likely related to overexpression of their corresponding mRNA/protein targets in BLBC themselves (TP53, CTAG1B, RNF216, and PSRC1). Our analysis also suggests AAb markers may be important to consider for prognosis/survival of BLBC patients (specifically, TP53 and MN1) and warrant further investigation in future clinical studies.
In general, the NAPPA arrays used to discover AAb responses for BLBC were highly reproducible, and our 10,000 antigen panel tested about half of the proteins encoded by the human genome. We did not find overall AAb responses to differ significantly between cases and controls, and there was considerable variability between subjects. Our data from NAPPA arrays showed that AAbs tend to develop in response to proteins that have a high fraction of turns, a low fraction of helices, and low aromaticity, and that tend to be located in the nucleus or centrosome. These biochemical properties and subcellular localization preferences hint at protein characteristics likely to be autoimmunogenic. However, a more definitive study of autoimmunogenicity at the proteome scale would require a larger sample size. It would be particularly interesting to evaluate whether such biochemical properties are specific to cancer.
Previous work on breast cancer–associated AAbs has had mixed results (17, 35–38), likely because breast cancer encompasses multiple diseases and few studies have addressed specific subtypes (29). A handful of studies have evaluated blood markers for triple-negative breast cancers (TNBC, defined as ER, progesterone receptor, and HER2 negative through IHC analysis; refs. 35, 38), which is a heterogeneous group that includes the BLBC subtype. Among the 13 AAbs we identified to be associated with BLBC, sensitivities ranged from 6% to 21% with 98% specificity for all markers. AAbs to CTAG1B, CTAG2, and TP53 were specific to BLBC compared with other molecular subtypes. The top AAb marker we identified was against the antigen encoded by the CTAG1B/CTAG1A genes (the protein product known as NY-ESO-1), which has been identified in other cancer sites and was first discovered in a study of esophageal cancer (39). CTAG2 AAb was the second best performer with 18.8% sensitivity at 98% specificity, and CTAG2 has 91% sequence homology with the coding region of CTAG1B/CTAG1A. Our data are consistent with a recent small study of TNBC by Ademuyiwa and colleagues, in which they reported positive plasma AAb responses against NY-ESO-1 in 8 of 11 TNBC patients who had elevated tissue protein levels of NY-ESO-1 (40). That observational study did not include controls or any validation.
Among the other markers we identified, data suggest these to be biologically plausible and relevant markers. For example, PPHLN1 encodes periphilin-1, which is involved in epithelial differentiation and shown to elicit AAbs in gastric and breast cancers (41). PSRC1 encodes the mitotic proline/serine rich coiled-coil protein 1 and data support the presence of its AAb in blood to indicate the transition of the precursor lesion ductal carcinoma in situ to invasive disease (11). AAbs to TRIM21, an E3 ubiquitin ligase that promotes p27 degradation, were initially associated with autoimmune rheumatic disease (42), but their appearance in cancer patients' sera was observed in subsequent studies (43). It also participates in destabilization of TP53 protein according to a recent study (44).
Of all the AAb markers we identified, TP53 is by far the most widely studied one. In our analysis, TP53 AAbs had 12.4% sensitivity and 98% specificity for BLBC. Interestingly, AAbs against TP53 and CTAG1B proteins have also been reported in patients with tumors from other organ sites, such as ovarian cancer (14, 45, 46), colorectal cancer (33, 45, 46), and lung cancer (33, 45–47). The rare occurrence of these two AAbs in nonbasal subtypes (LumA, LumB, HER2-enriched) is encouraging for their potential use as a diagnostic tool for these aggressive breast cancers. Furthermore, the fact that multiple tumors with aggressive phenotypes also show associations with these AAbs suggests this marker might have value in detection of multiple cancers (33). This is also consistent with recent Cancer Genome Atlas (TCGA) transcriptomic profiles showing similarities between breast, ovarian, and lung tumors (3).
The mechanism of AAb generation remains unclear. Possible explanations include high protein abundance in tumor tissues and mutations (33). IHC staining of TP53 and CTAG1B proteins in previous studies have shown increased tumor tissue expression in TNBC (40, 48, 49), and our IHC data in relation to AAb responses for TP53 are consistent. Using TCGA data, we found CTAG1B, RNF216, and PSRC1 to show significant evidence of elevated mRNA levels in BLBC compared with other subtypes. Although other markers did not show altered mRNA levels, it is widely accepted that mRNA expression does not always predict protein levels, and future proteomic profiling could help clarify the underlying molecular underpinnings that elicit AAb responses.
Our data also suggested that AAb markers may have prognostic value, consistent with previous reports in other cancer studies (11, 50). In particular, we found that AAbs against TP53 and MN1 proteins were associated with worse survival among BLBC patients, consistent with other cancer studies (33, 48). MN1 encodes meningioma 1, a probable tumor suppressor protein of unknown function. MN1 mRNA is a negative prognostic marker in acute myeloid leukemia (51) and its low protein expression is associated with better treatment response (52). These markers warrant investigation in future clinical studies for their value as markers of prognosis or response to treatment in BLBC.
Strengths of our study include use of a large number of BLBC patient samples and age-matched controls collected within a population-based case–control study of breast cancer, with detailed data on tumor characteristics, demographics, treatment, and survival. To identify AAb markers, we used highly reproducible NAPPA arrays with validation in independent sample sets and validation of promising AAb markers using more clinically relevant ELISA assays. A limitation of this study is that, although we screened for proteins encoded by approximately 50% of the human genome, these arrays do not display many proteins with posttranslational modifications that might also be important AAb targets for distinguishing cases from controls (53–57). Moreover, given that we performed this analysis in a case–control study with a few samples collected after treatment, it is unclear how early these markers are present with respect to clinical diagnosis. Future studies evaluating these markers in prospective cohorts are needed.
In summary, we have performed the largest proteomic screen using NAPPA technology and identified 13 AAb biomarkers associated with BLBC. With further validation, these markers might contribute to improved detection of BLBC, an aggressive subtype that afflicts younger women, in whom mammography is less sensitive (4–6). The AAbs associated with BLBC are promising markers for early detection because: (i) their sensitivity is not dependent on visualization, so young women with poorly imaged dense breasts may still benefit; and (ii) blood testing can be performed repeatedly without risk of radiation exposure or the need for expensive techniques such as MRI, making this a good approach for those who may require frequent testing. Future work in clinical and prospective observational studies is needed to determine the value of these markers for early detection, prognosis, and response to treatment.
Disclosure of Potential Conflicts of Interest
K.S. Anderson has ownership interest (including patents) and is a consultant/advisory board member for Provista Diagnostics. J. LaBaer is a consultant/advisory board member for Provista Diagnostics and ATCC. No potential conflicts of interest were disclosed by the other authors.
Disclaimer
Samples from the Polish Breast Cancer Study (PBCS) were collected under IRB protocol# OH99CN040.
Authors' Contributions
Conception and design: J. Wang, J.D. Figueroa, G. Wallstrom, K.S. Anderson, J. Qiu, J. LaBaer
Development of methodology: J. Wang, K.S. Anderson, J. Qiu, J. LaBaer
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J. Wang, J.D. Figueroa, G. Demirkan, J. Lissowska, K. Barker
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J. Wang, J.D. Figueroa, G. Wallstrom, J.G. Park, J. Lissowska, K.S. Anderson, J. Qiu
Writing, review, and/or revision of the manuscript: J. Wang, J.D. Figueroa, G. Wallstrom, K. Barker, J. Lissowska, K.S. Anderson, J. Qiu, J. LaBaer
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J.D. Figueroa, J. Lissowska, J. Qiu, J. LaBaer
Study supervision: J. Qiu, J. LaBaer
Acknowledgments
The authors thank Michael Fiacco for maintaining blinded data and Carlos Morales Betanzos for advice on preparing the figures. The PBCS would like to thank Louise Brinton, Montserrat Garcia-Closas, and Mark Sherman for their efforts in launching and conducting the study. The authors also thank the patients and their families for taking part in this study.
Grant Support
J. Wang, G. Wallstrom, K. Barker, J.G. Park, G. Demirkan, K.S. Anderson, J. Qiu, and J. LaBaer were supported by a grant from the Early Detection Research Network (5U01CA117374-02; to J. LaBaer and K.S. Anderson). J.D. Figueroa and J. Lissowska were supported by the Intramural Research Funds of the NCI, Department of Health and Human Services (to J.D. Figueroa).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.