Background:

Approximately 85% of the U.S. military active duty population is male and less than 50 years of age, with elevated levels of known risk factors for oropharyngeal squamous cell carcinoma (OPSCC), including smoking, excessive use of alcohol, and greater numbers of sexual partners, and elevated prevalence of human papilloma virus (HPV). Given the recent rise in incidence of HPV-related OPSCC, the Department of Defense Serum Repository provides an unparalleled resource for longitudinal studies of OPSCC in the military for the identification of early detection biomarkers.

Methods:

We identified 175 patients diagnosed with OPSCC with 175 matched healthy controls and retrieved a total of 978 serum samples drawn at the time of diagnosis, 2 and 4 years prior to diagnosis, and 2 years after diagnosis. Following immunoaffinity depletion, serum samples were analyzed by targeted proteomics assays for multiplexed quantification of a panel of 146 candidate protein biomarkers from the curated literature.

Results:

Using a Random Forest machine learning approach, we derived a 13-protein signature that distinguishes cases from controls based on longitudinal changes in serum protein concentration. The abundance of each of the 13 proteins remained constant over time in control subjects. The AUC for the derived Random Forest classifier was 0.90.

Conclusions:

This 13-protein classifier is highly promising for detection of OPSCC prior to overt symptoms.

Impact:

Use of longitudinal samples has significant potential to identify biomarkers for detection and risk stratification.

Head and neck squamous cell carcinomas (HNSCC) can arise from any tissue of the upper aerodigestive tract (e.g., mouth, nose, pharynx, larynx, sinuses, or salivary glands; ref. 1). It ranks among the top 5 most prevalent cancers worldwide with approximately 835,000 incident cases and 431,000 deaths estimated in 2018 (2). Traditionally, HNSCC is a disease associated with tobacco and alcohol use (3); however, infection of the oropharynx with human papilloma virus (HPV) is rapidly emerging as a significant new risk factor for oropharyngeal-specific HNSCC and results in a disease clinically different from traditional HNSCC (3). HPV-associated oropharyngeal squamous cell carcinoma (OPSCC) has become a significant health issue in the U.S. military population (4).

Early-stage HNSCC, including OPSCC, generally responds well to therapy, but two thirds of new OPSCC cases are first diagnosed at advanced stage III or IV with lymph node metastases (3), when the overall 5-year survival rate is approximately 50% (3, 5). This statistic has not improved significantly for decades (except for HPV-associated disease), and recurrence is a major concern for stage III/IV HNSCC associated with elevated mortality (5). Therefore, there is an urgent need for effective biomarkers for early detection, risk stratification, and therapeutic prognosis of HNSCC.

Previous attempts to identify biomarkers for early detection of HNSCC in biofluids (saliva, serum, and/or plasma) have produced hundreds of candidate protein biomarkers (6–9), but there are currently no FDA-approved biomarkers for early detection of HNSCC. Common issues with previous studies include limited ability to analyze the number of clinical samples required to avoid overfitting, and the reliance on comparisons between cases and controls at a single (diagnostic) time point, resulting in substantial overlap in the observed protein abundances between cases and controls. The focus on comparison at the time of diagnosis may further limit utility for early detection, as tumor burden is already significant at the time of diagnosis. Longitudinal samples from the same individual over time allow each patient to serve as his/her own control, enabling the detection of early changes within individual patient physiology as well as alleviating across patient heterogeneity and providing lower variation.

The Department of Defense Serum Repository (DODSR) was initiated in 1989 and is comprised of serum samples from active and reserve military personnel drawn at enlistment, biennially for routine HIV testing, and predeployment and postdeployment throughout the service members’ participation in the Military Health System, accompanied by the service member's electronic health records. Previous studies have used the DODSR to investigate infectious diseases, autoimmune diseases, multiple sclerosis, multiple myeloma, and other cancers such as lymphoma and breast and testicular cancers (10). Thus, the DODSR represents a unique resource for longitudinal studies of cancer risk, progression, and response to therapy, especially for head and neck cancers.

OPSCC represents an ideal target for further study not only in terms of the growing demographic with potential effects on military readiness (4), but also because of the growing understanding of cancer genomics related to HPV-associated and nonassociated cancers. Candidate biomarkers for OPSCC reported in the literature are generally based on small opportunistic studies and have not been evaluated simultaneously in a large clinical cohort. Reviews of the literature confirmed multiple proteins known to be mutated in head and neck cancer, with a specific subset seen in HPV-positive oropharynx cancer, outlined in the 2015 article by Hayes and colleagues (11). A recent study suggested several targets that can be seen in early oropharyngeal cancer (12). Herein we employed an unbiased multiplexed mass spectrometry (MS)-based targeted proteomics platform for precise identification of serum protein biomarkers, selected from a rich resource of reported HNSCC biomarkers, and tested in a large cohort of longitudinal DODSR serum specimens for early detection of HNSCC.

DODSR serum specimens

All DODSR serum samples used in this study were collected between 2003 and 2013, and processed, aliquoted, and stored using established protocols adopted by the DODSR (13). We identified 3,160 cases that fit the OPSCC primary site and diagnosis date requirements, and this number was reduced to 175 subjects who met the Active Duty requirement. We analyzed at least two and up to four serum samples from each case. The reference specimen, designated “Dx,” is the routine DODSR serum sample drawn closest to the time of the initial OPSCC diagnosis for cases (within 1 year following the diagnosis date). The DODSR was searched for the routine blood draw prior to the OPSCC diagnosis (PreB), and the routine blood draw before that one (PreA). On average, this represented 4 ± 1 years prior to diagnosis (PreA) and 2 ± 1 years prior to diagnosis (PreB), but the interval is not exact, due to logistic issues such as deployment. When available, the routine blood draw subsequent to diagnosis was also retrieved (PostD). Corresponding samples were also obtained from 175 healthy controls, matched by (i) age at the time of diagnosis, (ii) gender, and (iii) time of the blood draw (within ±1 year of each case specimen). In total, there were 978 serum samples. The experimental design and overall distribution of serum collection time points per case and control groups are shown in Fig. 1 and Table 1; Supplementary Table S1 describes the clinical information of the serum samples in more detail. The average patient age was 45 (ranging from 21 to 64) with 171 males and 4 females. This distribution reflects both the difference in incidence of OPSCC in males and females, and the distribution of males and females in the military population. Diagnoses were limited to ICD-O codes corresponding to oropharyngeal primary tumors, irrespective of HPV status. 
Of the 175 patients, HPV status was confirmed in 35 cases; 18 were confirmed HPV-positive and 17 were confirmed HPV-negative (Supplementary Table S1).

Figure 1.

The experimental design of the OPSCC study using serum samples from the DODSR. Histograms represent the number of subjects (y-axis) over time of sample collection (x-axis). Cases were binned into four time points for the comparative analysis. Controls were matched to cases based on age at the time of diagnosis and the availability of serum samples matched on time of blood draw. Each bin represents a 2-month period.

Table 1.

Summary of the overall distribution of serum collection time points per case and control groups.

                                  PreA         PreB         Dx              PostD
                                  −4 years     −2 years     within 1 year   +2 years
Samples                           (±1 year)    (±1 year)    (+1 year)       (±1 year)

Cases
  Number of samples               158          157          77              97
  Years, min                      −3.0         −1.1                         1.1
  Years, avg ± SD                 −4.0 ± 0.5   −2.0 ± 0.4   0.4 ± 0.3       1.9 ± 0.5
  Years, max                      −4.9         −3.0         1.0             2.9

Controls (±1 year)
  Number of samples               158          157          77              97
  Years, min                      −2.9         −0.4         −0.9            0.4
  Years, avg ± SD                 −4.1 ± 0.5   −2.1 ± 0.6   0.4 ± 0.6       1.9 ± 0.7
  Years, max                      −5.7         −3.5         1.8             3.2

Selection of candidate OPSCC biomarkers for targeted proteomics

Candidate targets were selected by searching PubMed and Web of Science for publications containing the keywords “biomarkers, HNSCC, oral cancer, oropharyngeal cancer, head and neck cancer” and curating the results using a genescraper R package (14). A total of 277 candidate biomarkers associated with either HNSCC specifically (131) or identified as general cancer-associated proteins (146) were identified in the curated literature. Studies that focused on serum, saliva, and tissue biomarkers in the diagnosis of HNSCC compared with controls were considered. Thirty-six studies were identified and 209 proteins were initially selected as OPSCC candidate biomarkers (Supplementary Table S2). Members of six signaling pathways (i.e., cell-cycle deregulation by HPV, TGF-β pathway, MAPK pathway, PI3K/AKT/mTOR pathway, NF-κB pathway, and Wnt pathway) reported to be closely related to OPSCC were also included. These candidate markers were further evaluated by three clinician coauthors with expertise in head and neck cancer. The clinical experts also added protein targets directly related to HPV pathogenesis in OPSCC. We next evaluated the detectability of these protein markers in human serum or plasma based on a recent in-depth plasma proteome dataset (15) and unpublished in-house plasma proteome datasets, and found 166 proteins that were detected in either or both plasma proteome datasets. Candidate biomarkers were further downselected in a stepwise workflow that included both expert assessment of clinical relevance and actual MS detectability of the proteins in a mixture of the DODSR serum samples. The final panel consisted of 146 candidate protein biomarkers and targeted proteomic assays were developed for analysis in the large cohort of DODSR samples using liquid chromatography coupled to selective reaction monitoring (LC-SRM). Detailed description of SRM assay development and characterization can be found in Supplementary Materials. 
The final assay conditions for scheduled LC-SRM analysis of the 146 candidate protein biomarkers are provided in Supplementary Table S3.

Preparation of serum samples

The serum samples were subjected to immunoaffinity depletion for removal of 14 high abundance proteins, followed by protein digestion, sample cleanup, and the addition of heavy isotope–labeled internal peptide standards for targeted proteomics analysis (Fig. 2; Supplementary Data). Two types of unblinded quality control (QC) were implemented to validate the stability and reproducibility of the entire workflow: (i) external QC with the use of commercially available serum [purchased from Biochemed Services (Winchester, VA)], and (ii) internal QC with spiking of two exogenous bovine and yeast proteins into each immunoaffinity-depleted serum sample, including the external QC serum samples. Detailed descriptions of the external and internal QC standards and procedures, and of the process for sample randomization and order of analysis, are provided in Supplementary Materials.

Figure 2.

Experimental workflow for identification of serum protein biomarkers for early detection of OPSCC using multiplexed targeted proteomics. Both internal and external QCs were used throughout the LC-SRM analysis to ensure robust, high-quality proteomics analysis.

LC-SRM measurements and data analysis

LC-SRM analysis was performed on a Waters nanoACQUITY UPLC system interfaced to a Thermo Fisher Scientific TSQ Altis triple quadrupole mass spectrometer for simultaneous quantification of a total of 308 peptides from the 146 candidate proteins and two exogenous internal standard proteins across 978 serum samples in the retention time-scheduled SRM mode. Detailed descriptions of the LC-SRM analysis can be found in Supplementary Materials.

SRM data were analyzed using Skyline software. Peak detection and integration were determined on the basis of (i) same retention time and (ii) approximately same relative SRM peak intensity ratios across multiple transitions between light (L) peptides and heavy (H) peptide standards (16). The L/H peak area ratios were automatically calculated by Skyline software (17) and exported to a csv file. All data were manually inspected to ensure correct peak detection and accurate integration. Signal to noise ratio (S/N) was calculated by the peak apex intensity over the mean background noise in a retention time region of ±15 seconds for the target peptides. The background noise levels were conservatively estimated by visually inspecting chromatographic peak regions. If the S/N ratios of light peptides were <3 or there was significant interference, the target was reported as “Not Analyzed” using the abbreviation “NA.” If the endogenous light peptide could not be detected, resulting in an L/H ratio of “zero” due to the lack of endogenous light peptide signal, the target was designated as “Not Detectable” (ND). Data labeled as NA and ND were not used for the statistical analysis. There were no significant outliers in this analysis, as determined by principal component analysis; specifically, there were no outliers attributable to a high number of missing values (Supplementary Fig. S1). All the annotated Skyline output files have been uploaded to Panorama website and are available via https://panoramaweb.org/1MV2jg.url.
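The labeling rules described above can be illustrated with a minimal sketch. It is not the Skyline workflow used in the study: the function name `classify_measurement`, its parameters, and the use of a simple mean over supplied background intensities are assumptions made for illustration (the study estimated background noise by visual inspection of the chromatograms).

```python
from statistics import mean


def classify_measurement(apex_intensity, background, light_area, heavy_area,
                         interference=False, sn_cutoff=3.0):
    """Label one peptide measurement as 'ND', 'NA', or 'OK' (hypothetical helper).

    apex_intensity: peak apex intensity of the light (endogenous) peptide
    background:     background intensities near the peak (assumed non-empty, positive)
    light_area, heavy_area: integrated peak areas (heavy_area assumed > 0)
    """
    # No endogenous light signal: the L/H ratio would be zero -> Not Detectable.
    if light_area == 0:
        return "ND", None
    # S/N = peak apex intensity over mean background noise.
    sn = apex_intensity / mean(background)
    # Low S/N or significant interference -> Not Analyzed.
    if sn < sn_cutoff or interference:
        return "NA", None
    return "OK", light_area / heavy_area
```

Measurements labeled NA or ND would then be excluded before any statistical analysis, as stated in the text.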

Data processing and statistical analysis

The L/H ratios were log2-transformed and zero-centered, followed by batch correction using ComBat (18), and peptide-to-protein rollup using RRollup (19). Quality of protein quantification was estimated on the basis of variances across clinical DODSR samples and external QC serum replicate samples (detailed information on QC serum samples can be found in Supplementary Materials). Total variance represents both technical noise, due to variability in measurement, and biological noise, due to biological differences between samples (i.e., interindividual variability). The variance measured in the technical replicates of the QC samples provides a good estimate of technical noise. Thus, the ratio of the experimental sample variance to the QC sample variance provides a reasonable estimate of the S/N. By eliminating any proteins from further analysis with a S/N <2, we ensured that most of the contribution to protein abundance differences from sample to sample was due to biological differences rather than technical noise. This resulted in 86 proteins passing the threshold.
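As a rough illustration of the variance-ratio filter, the sketch below computes S/N per protein as the variance across study samples divided by the variance across QC replicates and keeps proteins at or above the cutoff of 2. The function names and the dictionary layout are hypothetical, and the inputs are assumed to be the processed (log2, batch-corrected) protein abundances.

```python
from statistics import variance


def signal_to_noise(sample_values, qc_values):
    """Ratio of total variance (study samples) to technical variance (QC replicates)."""
    return variance(sample_values) / variance(qc_values)


def filter_proteins(sample_data, qc_data, cutoff=2.0):
    """Return proteins whose S/N is not below the cutoff (hypothetical helper).

    sample_data, qc_data: dicts mapping protein name -> list of abundances.
    """
    kept = []
    for protein, values in sample_data.items():
        # Proteins with S/N < cutoff are eliminated from further analysis.
        if signal_to_noise(values, qc_data[protein]) >= cutoff:
            kept.append(protein)
    return kept
```

A protein whose spread across clinical samples is no larger than its spread across replicate QC injections carries little biological information, which is why it is dropped.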

For statistical significance testing, we used protein relative abundances at individual time points (PreA, PreB, Dx, and PostD). In addition, we used the longitudinal changes between the time points (PreA→PreB, PreB→Dx, and PreA→Dx). The PostD time point was not used for longitudinal analysis as it reflects treatment effect, rather than early detection. The t tests were performed with the limma R package (20), which uses an empirical Bayes approach for better estimation of protein variances. The tests were two-tailed and unpaired. To control for multiplicity of hypothesis testing, the P values were adjusted using the Benjamini–Hochberg approach (21), with significance assessed at the 0.05 level; this adjustment is likely to be conservative as it does not account for the correlation structure of the data (22).
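For readers unfamiliar with the Benjamini–Hochberg adjustment, a minimal sketch of the step-up procedure follows. It is a plain-Python illustration rather than the R code used in the study, and the function name `benjamini_hochberg` is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A protein would then be called significant when its adjusted P value does not exceed the chosen threshold.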

A Random Forest approach was used to develop classifiers between case and control groups. Features used for classification were based on protein relative abundances at each individual time point (PreA, PreB, Dx, and PostD) and the longitudinal changes observed comparing the different time points: PreA→PreB, PreA→Dx, and PreB→Dx. Age at diagnosis and sex of a subject were also considered as covariates. Prediction accuracy was estimated using leave-one-out cross-validation (LOOCV). During each round of LOOCV, we selected proteins out of the entire set of 86 using the “boruta” algorithm (Boruta R package; ref. 23) on N-1 samples. Then, we trained a Random Forest classifier model (randomForest R package; ref. 24) using relative abundances or log2 longitudinal change of selected proteins as features. For some analyses, we augmented protein data with age at diagnosis and sex as additional predictive features. The trained Random Forest model was then used to compute the probability that the hold-out sample belongs to the case class. Accuracy of the classifier was assessed using the AUC (ROCR R package; ref. 25) computed from the LOOCV predictions. The proteins selected into the models in >50% of LOOCV rounds were carried into the final suggested signature.
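The structure of this procedure (feature selection repeated inside every LOOCV round, one hold-out prediction per round, an AUC over all hold-out predictions, and a signature defined by selection frequency) can be sketched in plain Python. To stay self-contained, the sketch swaps in two deliberately simple stand-ins that are not the methods used in the study: an effect-size filter replaces Boruta, and a nearest-centroid score replaces the Random Forest probability. All function names and the `min_effect` parameter are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def select_features(X, y, names, min_effect=1.0):
    """Stand-in for Boruta: keep features whose class means differ by at
    least min_effect overall standard deviations."""
    selected = []
    for j in range(len(names)):
        cases = [row[j] for row, label in zip(X, y) if label == 1]
        ctrls = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(cases + ctrls) or 1.0
        if abs(mean(cases) - mean(ctrls)) / spread >= min_effect:
            selected.append(j)
    return selected


def score(train_X, train_y, test_row, features):
    """Stand-in for a Random Forest probability: larger values mean the
    hold-out sample lies closer to the case centroid than to the control one."""
    d_case = d_ctrl = 0.0
    for j in features:
        case_mean = mean(r[j] for r, l in zip(train_X, train_y) if l == 1)
        ctrl_mean = mean(r[j] for r, l in zip(train_X, train_y) if l == 0)
        d_case += (test_row[j] - case_mean) ** 2
        d_ctrl += (test_row[j] - ctrl_mean) ** 2
    return d_ctrl - d_case


def auc(scores, labels):
    """AUC as the fraction of case/control pairs ranked correctly (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loocv(X, y, names):
    """Run LOOCV; return (AUC, features selected in >50% of rounds)."""
    scores, counts = [], Counter()
    for i in range(len(X)):
        # Hold out sample i; select features and fit on the remaining N-1 samples.
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        features = select_features(train_X, train_y, names)
        counts.update(names[j] for j in features)
        scores.append(score(train_X, train_y, X[i], features))
    signature = [n for n in names if counts[n] / len(X) > 0.5]
    return auc(scores, y), signature
```

Repeating the selection inside each round keeps the hold-out sample from influencing which features are chosen, which is why the selection frequency across rounds has to be examined afterwards.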

Multiplex quantification of candidate markers in the DODSR serum samples

Robust targeted proteomics assays were established for multiplex quantification of 146 candidate protein biomarkers across 489 serum samples from 175 OPSCC cases and 489 serum samples from 175 matched controls. Supplementary Table S4 provides the quantitation results for all 146 candidates and the two exogenous QC proteins across the entire set of 978 DODSR serum samples and the external QC serum samples. The dispersion of SRM signals for surrogate peptides derived from the 146 protein markers was relatively low [coefficient of variation (CV): 25.41%] across all the external QC samples, whereas the same measurement in the DODSR samples (both cases and controls) showed much higher variation (CV: 42.81%; Supplementary Fig. S2A, top). Because the external QC samples are replicates, their lower CV reflects good reproducibility of the entire workflow. This was further supported by highly stable SRM signals for surrogate peptides derived from the two internal QC proteins across all the OPSCC (CV: 14.27%) and external QC samples (CV: 16.09%; Supplementary Fig. S2A, bottom); note that the internal QC proteins were added after the immunoaffinity depletion step (Fig. 2), and hence show even lower variation than the target protein measurements. As expected, there was no statistical difference in the two internal QC proteins between the case and control groups (Supplementary Fig. S2B).

Identification of proteomic classifiers for OPSCC

Relative protein abundances obtained from the L/H ratios of surrogate peptides across different serum samples were first analyzed by t test to identify differentially abundant proteins (Supplementary Table S5). Between cases and controls, the numbers of statistically significant (adjusted P values ≤0.05) proteins for static time points were 24 and 19 for the Dx and PostD time points, respectively; none of the proteins in the PreA and PreB time points were statistically significant. For the longitudinal abundance changes, the numbers of statistically significant proteins were 13 and 8 for the PreB→Dx and PreA→Dx comparisons, respectively (Supplementary Table S5). All of the proteins consistently selected for Dx or PreB→Dx classification models were significant in t test. We were unable to detect any statistically significant differences between confirmed HPV-positive and confirmed HPV-negative cases, in part due to the limited number of patients with confirmed HPV status. Similarly, we were unable to detect any statistically significant differences correlating with smoking status.

Attempts to use the slope of protein abundance changes computed using actual sample draw dates to compare between case and control groups were complicated by the variability in the timing between the actual clinical diagnosis and the timing of the Dx blood draw. Thus, we decided to model PreB versus Dx as two discrete states.

A Random Forest machine learning approach was used to classify between cases and controls based on relative abundances at each time point and longitudinal changes over time. The AUCs for individual time points were 0.44 for PreA, 0.46 for PreB, 0.76 for Dx, and 0.73 for PostD (Fig. 3A). For the longitudinal changes, the AUC values were 0.48 for PreA→PreB, 0.80 for PreA→Dx, and 0.90 for PreB→Dx (Fig. 3B). Because features were selected independently in each round of LOOCV, we evaluated how often each protein was selected into the Random Forest model (Fig. 4). For the best performing classification, based on the PreB→Dx change (AUC, 0.90), 8 proteins (SPARC, SERPINE1, SERPIND1, SELE, LRG1, HPX, GSN, CP) were selected into the model in 100% of rounds. Two more proteins (CTSH and CKM) were selected in >97% of rounds. Three proteins (AHSG, SAA4, CD44) had minor contributions and were selected in <20% of rounds (Fig. 4). The 10 proteins that were consistently selected into the classification models, and thus passed the 50% threshold, were carried over for further signature validation. Of the models based on just one time point, the Dx time point (closest to diagnosis) was the most predictive (AUC, 0.76). Proteins consistently (>50% LOOCV iterations) selected into the classification models based on Dx abundance (SPARC, PKM, LRG1, GSN, CKM, HPX, MDH1, KNG1, SERPINE1) had substantial overlap with the consistently selected proteins from the PreB→Dx models (Fig. 4). Six proteins (SPARC, SERPINE1, LRG1, GSN, CKM, HPX) are common between PreB→Dx and Dx models; four proteins (CP, CTSH, SELE, SERPIND1) are unique to PreB→Dx models; three proteins (KNG1, MDH1, PKM) are unique to Dx models. In total, we found 13 proteins that consistently contribute to classification models based on longitudinal or cross-sectional data. All of the proteins consistently selected for Dx and PreB→Dx classification models were significant in t test (see Supplementary Table S5 and Fig. 4). Of the six common proteins, four (SPARC, SERPINE1, GSN, CKM) are lower in cases compared with controls at the Dx time point and two (LRG1 and HPX) are higher. The directionality of abundance change between the PreB and Dx time points is the same. The correlation structure of protein abundances at the Dx time point and PreB→Dx changes is shown in Supplementary Fig. S3. The overlap between proteins significant in t test and those selected for the Random Forest classifier is substantial, but not 100%.
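The counts quoted for the overlap can be checked directly from the two protein lists given in the text. The short sketch below uses set operations; the variable names are illustrative only.

```python
# Proteins consistently selected (>50% of LOOCV rounds), as listed in the text.
preb_to_dx = {"SPARC", "SERPINE1", "SERPIND1", "SELE", "LRG1",
              "HPX", "GSN", "CP", "CTSH", "CKM"}
single_time_point = {"SPARC", "PKM", "LRG1", "GSN", "CKM",
                     "HPX", "MDH1", "KNG1", "SERPINE1"}

common = preb_to_dx & single_time_point              # shared by both model types
only_longitudinal = preb_to_dx - single_time_point   # unique to PreB->Dx models
only_single = single_time_point - preb_to_dx         # unique to single-time-point models
signature = preb_to_dx | single_time_point           # combined signature

print(len(common), len(only_longitudinal), len(only_single), len(signature))
```

The intersection, the two differences, and the union give 6, 4, 3, and 13 proteins, matching the numbers stated above.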

Figure 3.

Receiver operating characteristic (ROC) curves of the Random Forest classifiers for the single time points (A) and for longitudinal changes between different time points (B). Both the AUC and the ROC curve are provided with 95% confidence intervals for each comparison between the case and control groups.

Figure 4.

Selection frequency of the differential proteins in the Random Forest classifiers for the single time points (top row) and for longitudinal changes between different time points (bottom row).

Supplementary Table S6 presents the analytical performance of the SRM assays, including the lower limit of quantitation and measurement reproducibility, for the 10 proteins consistently selected (in over 97% of LOOCV rounds) for the PreB→Dx longitudinal change. The overall achievable reproducibility of the measurements over time for these 10 proteins was mostly <20% CV. The details of these experiments are provided in Supplementary Materials.

We evaluated the stability of abundance of the 13 proteins that consistently contribute to classification models based on longitudinal or cross-sectional data in the 36 control subjects that have all four time points available. To test for changes over time, we applied regression analysis of protein abundance versus actual date of blood draw with the subject being modeled as a fixed effect. None of the proteins had a statistically significant association of abundance with time in the control group (Supplementary Table S7). In addition, we computed the CV as a measure of temporal stability. In the control group, the CVs over time for the 13 proteins were in the range of 6%–36% (Supplementary Fig. S4).
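As an illustration of the CV as a stability measure, the sketch below computes a within-subject CV across time points and averages it over subjects. The text does not specify how the CVs were aggregated, so the averaging step, the function names, and the assumption that abundances are on a linear (not log) scale are all illustrative choices.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: 100 * SD / mean (linear-scale values)."""
    return 100.0 * stdev(values) / mean(values)


def temporal_cv(subject_series):
    """Average within-subject CV over time for one protein (hypothetical helper).

    subject_series: dict mapping subject ID -> abundances at successive time points.
    """
    return mean(cv_percent(series) for series in subject_series.values())
```

A low value indicates that a subject's own earlier measurements are a stable baseline against which later changes can be judged.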

Considerations for patient stratification

Besides protein abundances, we considered age at diagnosis, sex, and HPV status as covariates in the analysis of stratification factors. Only a minority of case subjects had a confirmed HPV diagnosis. Specifically, 17 cases were HPV-negative, 18 were HPV-positive, and 140 had no diagnosis. Thus, most likely due to low power, no statistical difference was detected in protein abundances between HPV-positive and HPV-negative cases. Given the male:female sex ratio of the military population, the distribution of sex as a confounding factor was very skewed. Out of 175 case/control pairs, only four pairs are female. Thus, including sex as a confounding factor in the linear model or Random Forest classifier training had no effect on the results. Age at diagnosis had no statistically significant effect on the results of differential abundance testing and was not selected as a predictive feature by the Boruta algorithm in the Random Forest machine learning approach.

Many HNSCC protein biomarkers have been reported in tissues, saliva, and serum/plasma (6–9), but none of them has been translated into clinical practice. A contributing factor in this failure to validate and convert candidates into robust, FDA-approved biomarkers is the relatively small set of clinical samples analyzed. MS-based targeted proteomics is highly promising to bridge the gap between discovery and validation phases because it allows quantification of hundreds of proteins simultaneously with high specificity and reproducibility, without the need to generate affinity reagents. In this study, a stepwise approach was used to transition from the curated literature to the finalized list based on detectability using LC-SRM assays in real samples; the final 146 protein markers were measured by LC-SRM across 978 serum samples from OPSCC cases and matched controls. The robustness of the entire LC-SRM workflow was monitored using two types of QCs to confirm precise quantification of target proteins in the large cohort of DODSR serum samples. All these results demonstrated that the entire LC-SRM workflow is robust for reliable quantification of the 146 protein biomarkers across the 978 serum samples.

An important distinction of this study, compared with most biomarker discovery studies, is the emphasis on longitudinal comparisons across time points, as compared with a focus on differential protein expression between cases and controls at one single time point. Our longitudinal study using the DODSR samples allowed us to identify biomarkers that differed within the same individual across multiple time points. This approach has significant advantages over conventional biomarker studies in terms of precise identification of biomarkers for early detection of OPSCC and effectively addressing issues with human heterogeneity. Statistical analysis of SRM data from the 978 DODSR serum samples supports this conclusion: classifiers comparing cases and controls at a single time point gave AUC values of 0.76 and 0.73 for the Dx and PostD time points, respectively, compared with an AUC of 0.90 for the classifier based on the longitudinal PreB→Dx change within the same individual. For the PreA→Dx comparison, the AUC value is 0.80, suggesting that changes may be discernible as early as 4 years prior to diagnosis, but may not be sufficiently robust for clinical utility. An important observation in this study is that the abundance of the proteins in the classifier was extremely stable over time in the control samples, indicating that a strategy of comparing biomarker abundance measurements with the patient's own serum protein profile over time may have significant clinical utility. The observation of statistically significant changes across the longitudinal OPSCC serum samples from the same individual appears to be a reliable indicator of true abundance changes, which further supports our biomarker discovery.

Among the best classifiers for PreA→Dx and PreB→Dx, there are seven overlapping proteins: hemopexin (HPX), ferritin light chain (FTL), leucine-rich alpha-2-glycoprotein (LRG1), plasminogen activator inhibitor 1 (SERPINE1), creatine kinase M-type (CKM), gelsolin (GSN), and ceruloplasmin (CP). This suggests that they are more robust than the other protein markers and highly promising for early detection of OPSCC. For example, increased expression levels of FTL and ferritin heavy chain (FTH1) have been observed in HNSCC tumor and HNSCC tumor with lymph-node metastasis tissue samples using the chemiluminescent immunoassay method (26). Leucine-rich alpha-2-glycoprotein was downregulated in HNSCC tissues regardless of the grade of the tumor (27), and plasminogen activator inhibitor 1 was detected in the secretome of head and neck/oral squamous cell carcinoma cell lines (28). The level of creatine kinases such as creatine kinase M-type is a useful criterion for several diseases related to HNSCC (29, 30). Gelsolin was one of five proteins on a diagnostic panel derived from comparing 61 oral squamous cell carcinoma (OSCC) and 58 control saliva samples (31). Hemopexin was demonstrated to be differentially expressed between lymph-node metastatic and nonmetastatic OSCC serum samples (31). Ceruloplasmin was upregulated in plasma samples of hypopharyngeal squamous cell carcinoma (32).

Another important aspect of this study is the power of the combination of multiplexed targeted proteomics and high-quality longitudinal clinical specimens for unbiased precise prioritization of protein biomarkers starting with a large biomarker list curated from the literature. The multiplexed biomarker discovery workflow can be easily implemented with simultaneous quantification of approximately 500 candidate proteins for any type of cancer by full utilization of the existing biomarker resource without the need to conduct additional discovery studies. Furthermore, targeted proteomics is more effective than global proteomics for deep biomarker discovery in serum due to the low concentration of potential biomarkers and the wide dynamic concentration range in human serum or plasma. However, targeted proteomic studies may miss other important OPSCC biomarkers because they may rely on an existing list of protein biomarkers from the literature rather than an unbiased, in-depth global proteomics analysis with extensive sample fractionation.

One important consideration in the generalization of these results is a comparison of the characteristics of our study population to the general population. To the best of our knowledge, there have been no published studies directly comparing the incidence rate or prevalence of OPSCC in the general U.S. population and the U.S. military populations. However, it is relevant to point out that the U.S. military population has a higher incidence of high-risk HPV serotypes, specifically HPV-16, than the general population, and also a higher prevalence of other risk factors including smoking and alcohol use (4, 33). The study reported here is a retrospective study of all available OPSCC cases in the DOD serum repository, and thus reflects the incidence of OPSCC in the military population, both male and female. However, females appear to be significantly underrepresented in this study, compared with the expected incidence in females in the general population (approximately one-quarter that in males; ref. 34). This appears to reflect the underrepresentation of females in the military population (15%; ref. 35), rather than any differences in incidence within female members of the military. Thus, it is likely that this study will be most applicable to screening of high-risk males in the general population. It is worth noting that we saw no statistically significant changes when we eliminated females from the analysis.
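
As a rough check of this reasoning, the expected share of female cases can be worked out from the two figures cited above: a female incidence of approximately one-quarter that in males (34) and a 15% female share of the military population (35). The sketch below is illustrative only. The 50% female share used for the general population, the assumption that the same rate ratio applies in both populations, and the function name are assumptions introduced here, not values or methods from this study.

```python
def expected_female_case_fraction(female_share, female_to_male_rate_ratio):
    """Expected fraction of cases that are female.

    female_share: fraction of the source population that is female.
    female_to_male_rate_ratio: female incidence divided by male incidence.
    Male incidence is normalized to 1.0, so only the ratio matters.
    """
    female_cases = female_share * female_to_male_rate_ratio
    male_cases = (1.0 - female_share) * 1.0
    return female_cases / (female_cases + male_cases)


# 15% female (ref. 35) and a one-quarter rate ratio (ref. 34).
military = expected_female_case_fraction(0.15, 0.25)

# Assumed 50% female share for the general population (illustrative).
general = expected_female_case_fraction(0.50, 0.25)

print(f"Expected female share of cases, military: {military:.1%}")
print(f"Expected female share of cases, general:  {general:.1%}")
```

Under these assumptions, roughly 4% of cases would be expected to be female in a military cohort, compared with 20% in a population with equal numbers of men and women. This is consistent with the interpretation that the low number of female cases reflects the composition of the military rather than a difference in incidence.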

In conclusion, access to high-quality serum sample cohorts from the DODSR enabled targeted proteomics discovery of promising biomarkers for early detection of OPSCC based on longitudinal data from the same patient. Random Forest analysis of abundance data for 146 candidate proteins, simultaneously measured by robust SRM assays across 978 DODSR serum samples, indicated that comparisons across time in the same patient are superior to single time point comparisons across patients. While direct comparison of protein abundances at the Dx time point for cases and controls showed statistically significant differences in candidate biomarker abundance with AUC >0.70, analysis of longitudinal samples provided substantially improved AUC values of 0.90 for the PreB→Dx classifier and 0.80 for the PreA→Dx classifier. This improvement arises because OPSCC patients displayed statistically significant changes in protein abundance as early as 2 years prior to diagnosis, effectively captured in the 13-protein classifier, whereas the abundance of these proteins remained constant over time in the matched controls.
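
The advantage of within-patient comparisons can be illustrated with a small simulation. The sketch below is not the Random Forest analysis used in this study. It uses a single synthetic protein and a rank-based AUC to show why a longitudinal change score can separate cases from controls better than a single time point value when baseline abundance varies widely between individuals. All parameter values (group size, shift, standard deviations, seed) and function names are hypothetical choices made for illustration, and none of the numbers are study data.

```python
import random


def auc(case_scores, control_scores):
    """Probability that a random case scores above a random control.

    Ties count as one half. This equals the area under the ROC curve.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def simulate(n_per_group=175, shift=0.5, between_sd=1.0, assay_sd=0.1, seed=0):
    """Compare a single time point score with a within-subject change score.

    Each synthetic subject has a stable personal baseline (between_sd) plus
    measurement noise (assay_sd). Cases gain `shift` at the Dx time point;
    controls stay flat. All values are hypothetical log-abundance units.
    """
    rng = random.Random(seed)

    def subject(is_case):
        baseline = rng.gauss(0.0, between_sd)
        pre = baseline + rng.gauss(0.0, assay_sd)
        dx = baseline + (shift if is_case else 0.0) + rng.gauss(0.0, assay_sd)
        return pre, dx

    cases = [subject(True) for _ in range(n_per_group)]
    controls = [subject(False) for _ in range(n_per_group)]

    # Single time point: compare Dx values across subjects.
    single = auc([dx for _, dx in cases], [dx for _, dx in controls])
    # Longitudinal: compare each subject's own change (Dx minus earlier draw).
    delta = auc([dx - pre for pre, dx in cases],
                [dx - pre for pre, dx in controls])
    return single, delta


auc_single, auc_delta = simulate()
print(f"AUC, single time point (Dx only): {auc_single:.2f}")
print(f"AUC, longitudinal change (Dx - pre): {auc_delta:.2f}")
```

Subtracting each subject's earlier measurement removes the stable between-subject component, so the change score is limited mainly by measurement noise. In this toy setting the longitudinal AUC is therefore higher than the single time point AUC, which mirrors the qualitative pattern reported above.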

While the use of the longitudinal DODSR serum samples enabled the identification of these highly promising markers for early detection of OPSCC, there may be some limitations on the generalizability of this study. First, the military population is both younger and more predominantly male than the general population at risk of OPSCC. Conversely, the military population has an increased incidence of known risk factors for OPSCC, including increased tobacco and alcohol use and an increased incidence of HPV infection (33). p16 has been universally adopted as a surrogate marker for HPV-driven OPSCC; however, because the serum samples date back to 2003, the majority of the cohort was not tested for HPV. The 34 reported cases were tested with in situ hybridization for low- and high-risk strains of HPV; nevertheless, based on demographics alone, this cohort would be considered to have a high prevalence of HPV-driven OPSCC, perhaps with intermediate risk factors due to concomitant tobacco use (36). Absence of HPV-specific markers such as E6 and E7 has been documented elsewhere and postulated to be a possible marker of more advanced disease (12). Another possibility is that the presence of antibodies to E6 and E7 viral proteins may not correlate with the actual proteins if they are expressed transiently and subsequently degraded. Specific data on smoking and alcohol use within this cohort are also incomplete, so there is insufficient power to analyze these confounding variables separately. Variations in the time interval between clinical diagnosis and obtaining the Dx serum sample may have introduced variability related to treatment effects, but this variation appears to be captured in the 95% confidence interval. Validation in a large independent clinical cohort is needed to determine the general utility of these markers in clinical practice, and thus to justify the use of longitudinal serum analyses as a screening strategy for the general population.

No potential conflicts of interest were disclosed.

Conception and design: T. Shi, V.A. Petyuk, W. Cardoni, G. Coppit, S. Srivastava, J.F. Goodman, C.D. Shriver, T. Liu, K.D. Rodland

Development of methodology: J.Y. Lee, T. Shi, V.A. Petyuk, Y.-T. Wang, J.F. Goodman, C.D. Shriver

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J.Y. Lee, A.A. Schepmoes, C.D. Shriver

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J.Y. Lee, V.A. Petyuk, W. Cardoni, G. Coppit, J.F. Goodman, C.D. Shriver, T. Liu, K.D. Rodland

Writing, review, and/or revision of the manuscript: J.Y. Lee, T. Shi, V.A. Petyuk, G. Coppit, J.F. Goodman, C.D. Shriver, T. Liu, K.D. Rodland

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): T.L. Fillmore, S. Srivastava, J.F. Goodman

Study supervision: J.F. Goodman, C.D. Shriver, T. Liu, K.D. Rodland

This work was supported by Federal Award No. HU0001-16-2-0014 (Subaward No. 3879, to K.D. Rodland and T. Liu).

The authors thank the clinical and laboratory staff at the Uniformed Services University of the Health Sciences and Pacific Northwest National Laboratory (PNNL). Portions of the research were performed in the Environmental Molecular Sciences Laboratory (grid.436923.9), a U.S. Department of Energy (DOE) Office of Biological and Environmental Research national scientific user facility on the PNNL campus. PNNL is a multiprogram national laboratory operated by Battelle for the DOE under contract no. DE-AC05-76RL01830. The contents of this publication are the sole responsibility of the author(s) and do not necessarily reflect the views, opinions, or policies of Uniformed Services University of the Health Sciences; The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc.; the Department of Defense; or the Departments of the Army, Navy, or Air Force. Mention of trade names, commercial products, or organizations does not imply endorsement by the U.S. Government.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Marcu LG, Yeoh E. A review of risk factors and genetic alterations in head and neck carcinogenesis and implications for current and future approaches to treatment. J Cancer Res Clin Oncol 2009;135:1303–14.
2. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68:394–424.
3. Leemans CR, Braakhuis BJ, Brakenhoff RH. The molecular biology of head and neck cancer. Nat Rev Cancer 2011;11:9–22.
4. Feinstein AJ, Shay SG, Chang E, Lewis MS, Wang MB. Treatment outcomes in veterans with HPV-positive head and neck cancer. Am J Otolaryngol 2017;38:188–92.
5. Schaaij-Visser TB, Brakenhoff RH, Leemans CR, Heck AJ, Slijper M. Protein biomarker discovery for head and neck cancer. J Proteomics 2010;73:1790–803.
6. Li SX, Yang YQ, Jin LJ, Cai ZG, Sun Z. Detection of survivin, carcinoembryonic antigen and ErbB2 level in oral squamous cell carcinoma patients. Cancer Biomark 2016;17:377–82.
7. Allegra E, Trapasso S, La Boria A, Aragona T, Pisani D, Belfiore A, et al. Prognostic role of salivary CD44sol levels in the follow-up of laryngeal carcinomas. J Oral Pathol Med 2014;43:276–81.
8. Pereira LHM, Reis IM, Reategui EP, Gordon C, Saint-Victor S, Duncan R, et al. Risk stratification system for oral cancer screening. Cancer Prev Res 2016;9:445–55.
9. Hsiao YC, Chi LM, Chien KY, Chiang WF, Chen SF, Chuang YN, et al. Development of a multiplexed assay for oral cancer candidate biomarkers using peptide immunoaffinity enrichment and targeted mass spectrometry. Mol Cell Proteomics 2017;16:1829–49.
10. Perdue CL, Cost AA, Rubertone MV, Lindler LE, Ludwig SL. Description and utilization of the United States Department of Defense Serum Repository: a review of published studies, 1985–2012. PLoS One 2015;10:e0114857.
11. Hayes DN, Van Waes C, Seiwert TY. Genetic landscape of human papillomavirus-associated head and neck cancer and comparison to tobacco-related tumors. J Clin Oncol 2015;33:3227–34.
12. Tuhkuri A, Saraswat M, Mäkitie A, Mattila P, Silén R, Dickinson A, et al. Patients with early-stage oropharyngeal cancer can be identified with label-free serum proteomics. Br J Cancer 2018;119:200–12.
13. Perdue CL, Eick-Cost AA, Rubertone MV. A brief description of the operation of the DoD serum repository. Mil Med 2015;180:10–2.
14.
15. Keshishian H, Burgess MW, Specht H, Wallace L, Clauser KR, Gillette MA, et al. Quantitative, multiplexed workflow for deep analysis of human blood plasma and biomarker discovery by mass spectrometry. Nat Protoc 2017;12:1683–701.
16. Song E, Gao Y, Wu C, Shi T, Nie S, Fillmore TL, et al. Targeted proteomic assays for quantitation of proteins identified by proteogenomic analysis of ovarian cancer. Sci Data 2017;4:170091.
17. MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B, et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010;26:966–8.
18. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007;8:118–27.
19. Polpitiya AD, Qian WJ, Jaitly N, Petyuk VA, Adkins JN, Camp DG, et al. DAnTE: a statistical tool for quantitative analysis of -omics data. Bioinformatics 2008;24:1556–58.
20. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47.
21. Benjamini Y, Hochberg Y. Controlling the false discovery rate - a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B 1995;57:289–300.
22. Stevens JR, Al Masud A, Suyundikov A. A comparison of multiple testing adjustment methods with block-correlation positively-dependent tests. PLoS One 2017;12:e0176124.
23. Miron Kursa WR. Feature selection with the boruta package. J Stat Softw 2010;36:1–13.
24. randomForest: Breiman and Cutler's Random Forests for Classification and Regression; [about 2 screens]. Available from: https://cran.r-project.org/web/packages/randomForest/index.html.
25. ROCR: Visualizing the Performance of Scoring Classifiers; [about 2 screens]. Available from: https://cran.r-project.org/web/packages/ROCR/index.html.
26. Hu Z, Wang L, Han Y, Li F, Zheng A, Xu Y, et al. Ferritin: a potential serum marker for lymph node metastasis in head and neck squamous cell carcinoma. Oncol Lett 2019;17:314–22.
27. Wang Y, Chen C, Hua Q, Wang L, Li F, Li M, et al. Downregulation of leucine-rich-alpha-2-glycoprotein 1 expression is associated with the tumorigenesis of head and neck squamous cell carcinoma. Oncol Rep 2017;37:1503–10.
28. Ralhan R, Masui O, Desouza LV, Matta A, Macha M, Siu KW. Identification of proteins secreted by head and neck cancer cell lines using LC-MS/MS: strategy for discovery of candidate serological biomarkers. Proteomics 2011;11:2363–76.
29. Yamauchi K, Kogashiwa Y, Nagafuji H, Kohno N. Head and neck cancer with dermatomyositis: a report of two clinical cases. Int J Otolaryngol 2010;2010:401825.
30. Cinamon U. Exceptionally elevated creatine kinase levels in a laryngectomized patient: hypothyroid myopathy. J Laryngol Otol 2004;118:651–2.
31. Chen YT, Chen HW, Wu CF, Chu LJ, Chiang WF, Wu CC, et al. Development of a multiplexed liquid chromatography multiple-reaction-monitoring mass spectrometry (LC-MRM/MS) method for evaluation of salivary proteins as oral cancer biomarkers. Mol Cell Proteomics 2017;16:799–811.
32. Tian WD, Li JZ, Hu SW, Peng XW, Li G, Liu X, et al. Proteomic identification of alpha-2-HS-glycoprotein as a plasma biomarker of hypopharyngeal squamous cell carcinoma. Int J Clin Exp Pathol 2015;8:9021–31.
33. Agan BK, Macalino GE, Nsouli-Maktabi H, Wang X, Gaydos JC, Ganesan A, et al. Human papillomavirus seroprevalence among men entering military service and seroincidence after ten years of service. MSMR 2013;20:21–4.
34. HPV and Cancer. How Many Cancers Are Linked with HPV Each Year?; [about 5 screens]. Available from: https://www.cdc.gov/cancer/hpv/statistics/cases.htm.
35. Reynolds GM, Shendruk A. Demographics of the U.S. Military. Council on Foreign Relations. 2018 Apr 24. Available from: https://www.cfr.org/article/demographics-us-military.
36. Gillison ML, D'Souza G, Westra W, Sugar E, Xiao W, Begum S, et al. Distinct risk factor profiles for human papillomavirus type 16-positive and human papillomavirus type 16-negative head and neck cancers. J Natl Cancer Inst 2008;100:407–20.