Abstract
Numerous molecular biomarkers have been suggested for early detection of cervical cancer, but their usefulness in routinely collected exfoliated cells remains uncertain. We used quantitative reverse transcription-PCR to evaluate expression of 40 candidate genes as markers for high-grade cervical intraepithelial neoplasia (CIN) in exfoliated cervical cells collected at the time of colposcopy. Samples from 93 women with CIN3 or cancer were compared with those from 186 women without disease matched (1:2) for age, race, and high-risk human papillomavirus status. Normalized threshold cycles (Ct) for each gene were analyzed by receiver operating characteristic (ROC) analysis to determine their diagnostic performance in a split-sample validation approach. Six markers were confirmed by an area under the curve (AUC) >0.6 in both sample sets: claudin 1 (0.75), minichromosome maintenance deficient 5 (0.71) and 7 (0.64), cell division cycle 6 homologue (0.71), antigen identified by monoclonal antibody Ki-67 (0.66), and SHC SH2-domain binding protein 1 (0.61). The sensitivity of individual markers was relatively low, and combining five genes into a panel yielded 60% sensitivity with 76% specificity, which did not substantially improve on single-marker performance. Although the results did not indicate superiority of RNA markers for cervical cancer screening, their performance in detecting disease in women referred for colposcopy suggests that the genes and pathways they highlight could be useful in alternative detection formats or in combination with other screening indicators. (Cancer Epidemiol Biomarkers Prev 2007;16(2):295–301)
Introduction
Invasive cervical carcinoma arises from precancerous lesions known as cervical intraepithelial neoplasia (CIN). Traditional screening methods are based on visual assessment of exfoliated cervical cells (i.e., the Papanicolaou test), and diagnosis relies on histopathology of tissue samples. Although the implementation of the Papanicolaou test in routine screening programs has greatly reduced cancer incidence and mortality, the test has limited accuracy that is unlikely to improve with enhanced collection and screening procedures (1). Specifically, the inability to distinguish high-grade CIN with the potential to progress to invasive cancer from pathologically insignificant or regressing dysplasia contributes to overtreatment (2), and false-negative results are not eliminated.
The etiologic role of high-risk human papillomavirus (HR-HPV) in cervical neoplasia is well established. HPV testing has therefore been approved as an adjunct to cytology for cervical cancer screening in women over 30 and for triage of women with equivocal cytology results (atypical squamous cells of undetermined significance; refs. 3, 4). However, the greatest strength of HPV testing is its negative predictive value; its specificity remains relatively low because of the high prevalence of transient, asymptomatic HPV infections in the absence of disease (5).
Molecular biomarkers hold great promise as objective and precise indicators of pathogenetic processes. Discovery efforts have intensified in recent years, but studies evaluating their usefulness in clinical applications remain rare and typically lack sufficient sample size (6-8). To date, the most promising candidates, such as 3q amplification, p16, and replication licensing proteins, require in situ hybridization or immunohistochemistry assays and interpretation within a morphologic context (9-11). Ideally, screening biomarkers should be detected in noninvasively collected samples, such as exfoliated cervical cells, using an assay format with the potential for high throughput and quantitation. We have recently shown that exfoliated cells have a gene transcription profile representative of tissue and might be suitable for biomarker screening (12). Using a quantitative reverse transcription-PCR (RT-PCR) assay, we evaluated 40 candidate RNA markers in exfoliated cervical cells for their potential to detect underlying CIN3 lesions.
Materials and Methods
Sample Source
Samples were selected from high-risk urban women enrolled at the time of colposcopy into an ongoing study of cervical neoplasia as described elsewhere (13). Cervical disease status was determined based on pathologists' review of colposcopy, cytology, and biopsy findings. Endocervical and ectocervical cells were collected in PreservCyt (Cytyc Corp., Marlborough, MA) at the time of colposcopy and extracted with the MasterPure Complete DNA and RNA Purification kit (Epicentre, Madison, WI) following the manufacturer's protocol with modifications (14). The total nucleic acid extract, including both DNA and RNA, was stored at −70°C until use. HPV status was determined using L1 consensus PCR and the Roche prototype line probe assay (reagents provided by Roche Molecular Systems, Alameda, CA; ref. 13).
All samples from women with biopsy-confirmed high-grade CIN (CIN3; n = 88) and invasive cancer (n = 5) not previously analyzed for gene expression with microarrays were included as cases (CIN3+). Two controls with no cervical disease identified at colposcopy with biopsy and/or cytology (CIN0) matched for age (20-50 years) and race/ethnicity (Black, White, and Hispanic) were selected for each case. To evaluate markers that could add to HR-HPV testing, controls were matched for HPV status as much as possible. In the study set, 91 of 93 cases and 95 of 186 controls were HR-HPV positive. The matched samples were split by a pseudo-random number generator into two sample sets: set 1 (exploratory), 47 CIN3+ and 94 CIN0; set 2 (test), 46 CIN3+ and 92 CIN0.
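The matched split can be sketched in Python. Because both halves keep the 1:2 case-control ratio (47/94 and 46/92), this sketch assumes that whole matched sets, each a case with its two controls, were assigned together; the function name, the seed, and the sample identifiers are hypothetical, and Python's `random.Random` stands in for whatever generator was actually used.

```python
import random


def split_matched_sets(matched_sets, seed=2007):
    """Randomly assign whole matched sets (1 case + 2 controls) to two halves.

    matched_sets: list of (case_id, [control_id, control_id]) tuples.
    The first half receives the extra set when the number of sets is odd.
    """
    rng = random.Random(seed)          # seeded pseudo-random generator (seed is arbitrary)
    order = list(matched_sets)
    rng.shuffle(order)
    cut = (len(order) + 1) // 2        # 93 sets -> 47 in set 1, 46 in set 2
    return order[:cut], order[cut:]


# Hypothetical identifiers standing in for the 93 matched case-control sets
sets = [(f"case{i}", [f"ctrl{i}a", f"ctrl{i}b"]) for i in range(93)]
set1, set2 = split_matched_sets(sets)
for name, part in (("set 1", set1), ("set 2", set2)):
    print(name, len(part), "cases,", sum(len(c) for _, c in part), "controls")
```

Splitting at the level of matched sets keeps each case together with its own controls, so the age, race, and HPV matching is preserved within each half.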
Candidate Genes
Using microarrays for whole genome expression analysis, we had previously identified six genes potentially up-regulated in exfoliated cervical cells from subjects with CIN3+ (12). These, along with an additional seven genes from the same study indicated by other statistical measures, were chosen for further evaluation (genes 1-13 in Table 1). Based on review of recent literature, we selected an additional 27 gene transcripts reported to have potential to identify cervical neoplasia. These genes were either well-known markers of cell transformation (i.e., proliferation and cell cycle progression, immortality, and altered cell adhesion) or were published with compelling experimental evidence for strong overexpression in cervical lesions. Table 1 lists all candidate genes selected for evaluation. With the exception of DAPK1 and IGSF4, all genes were expected to be overexpressed in CIN3+ samples.
Table 1. Primer sequences, PCR products, and efficiencies
No. | Symbol | GenBank accession no. | Forward primer | Reverse primer | Amplicon (bp) | Crossing splice junctions | PCR efficiency E (%) |
---|---|---|---|---|---|---|---|
1 | FLJ12994 | NM_022841 | CTGGAGATGGGTTGGAAGG | AGACGGCAAGCAGAAGAG | 81 | Y | 100 |
2 | UBE2V1 | NM_021988 | CAGAAAGGAGTAGGAGATGG | CATAAATTGTTCTTGGAGGC | 108 | Y | 97.5 |
3 | RHOT1 | NM_018307 | AGATTACTCAGAAGCAGAACAG | CTCCATACTACTATATTCCACCAG | 219 | Y | 90.9 |
4 | GRTP1 | NM_024719 | GAATACTACCAGATTACTACAGCC | GAGCCTTCGTTAAACAAACAG | 225 | Y | 100 |
5 | NUDT6 | NM_007083 | GAAGAAGATATTGGAGACACAG | GGTGAATGAATATGGCTTTAGG | 164 | Y | 100 |
6 | LOC51149 | NM_016175 | CTTATGGTGAAGGCTCTG | TTTCTTCTAGAGACCTGAGTG | 101 | Y | 91.8 |
7 | S100A8 | NM_002964 | CCTGAAGAAATTGCTAGAGAC | CTTTATCACCAGAATGAGGA | 132 | Y | 100 |
8 | COL1A1 | NM_000088 | GGAAGAGTGGAGAGTACTG | GGTTCTTGCTGATGTACCA | 146 | Y | 100 |
9 | ADORA2B | NM_000676 | TTCTGCACTGACTTCTACG | GCACTGTCTTTACTGTTCC | 238 | Y | 97.9 |
10 | SHCBP1 | NM_024745 | GAGGTATGTGTTTGGTTATCAG | TATAGCAGACAATGGATCAC | 198 | Y | 98.9 |
11 | CORO1A | NM_007074 | CCCAGACACGATCTACAG | GTCCTTCTCAGCTACGAC | 120 | Y | 100 |
12 | FLJ10241 | NM_018035 | CGATCTTACACCTGGCTG | CTTGTCTCGAAACTTGACTG | 95 | Y | 100 |
13 | ALDH7A1 | BC002515 | ATGATCTGTGGAAATGTCTG | CCTTGGCTATTATCTTTGTGAC | 84 | Y | 100 |
14 | TERT | NM_003219 | TTATGTCACGGAGACCAC | AAGTGCTGTCTGATTCCA | 95 | Y | 100 |
15 | IGFBP3 | NM_000598 | ACAGAATATGGTCCCTGC | CCTTCTTGTCACAGTTGG | 108 | Y | 100 |
16 | HSPA9B | NM_004134 | GTTACATTTGACATTGATGCC | GACTGGATTACAATCTGCTG | 88 | Y | 100 |
17 | UBE3A | NM_002690 | TGTTCTGATTAGGGAGTTCTG | AGCAAGTATGAGATGTAGGT | 172 | Y | 98.0 |
18 | FN1 | NM_212476 | GGAGTTGATTATACCATCACTG | TTTCTGTTTGATCTGGACCT | 258 | N | 99.0 |
19 | CLDN1 | NM_021101 | CCTATGACCCCAGTCAATGC | TCCCAGAAGGCAGAGAGAAG | 84 | Y | 99.7 |
20 | EGFR | NM_005228 | AGTGCCTGAATACATAAACC | CAGTGTTGAGATACTCGGG | 160 | Y | 100 |
21 | MCM5 | NM_006739 | TCACCAAGCAGAAATACCC | GTCCATGAGTCCAGTGAG | 140 | Y | 98.6 |
22 | MCM6 | NM_005915 | GAGGAGTTCTATAGAGTTTACCC | GAGTGAGCAAACCAATTCTG | 165 | N | 100 |
23 | MCM7 | NM_005916 | GGAAATATCCCTCGTAGTATCAC | CTGAGAGTAAACCCTGTACC | 144 | N | 100 |
24 | CDC6 | NM_001254 | AAGACCTCAAGAAGGAACTG | ATACCTCTTCCTGACAAATCTC | 116 | Y | 100 |
25 | CDK4 | NM_000075 | AAATCTTTGACCTGATTGGG | CCTTATGTAGATAAGAGTGCTG | 218 | N | 100 |
26 | CCNE1 | NM_001238 | TGGCACAAGATTTCTTTGAC | GATAGATTTCCTCAAGTTTGGC | 116 | N | 91.6 |
27 | CLDN7 | NM_001307 | AGACGACAAAGTGAAGAAGG | CTTGGAAGAGTTGGACTTAGG | 297 | Y | 99.1 |
28 | PCNA | NM_002592 | AGTGGAGAACTTGGAAATGG | CTCTATGGTAACAGCTTCCTC | 80 | Y | 98.6 |
29 | MKI67 | NM_002417 | GAATGTGACATCCGTATCCA | TGTAATATTGCCTCCTGCTC | 82 | Y | 100 |
30 | CDKN1A | NM_000389 | GCATGACAGATTTCTACCAC | GACTAAGGCAGAAGATGTAGAG | 132 | Y | 100 |
31 | POU4F1 | NM_006237 | ACTTTAACTTGCCCTTTCAG | CTCTCCTAATAACTTTCACCC | 106 | Y | 91.2 |
32 | PCNT2 | NM_006031 | GGCTATACAGAAAGAGTCGG | CACCTCTTTATCCTGTTCCA | 165 | Y | 100 |
33 | MYC | NM_002467 | AGACAGAGGAGTTTATACAGAG | CTTGACTTGACTAGGTGTGAG | 131 | N | 100 |
34 | AURKA | NM_198433 | AAATAATCCTGAGGAGGAACTG | AATATTAGGATGCCGAAGGTG | 258 | Y | 97.9 |
35 | NDRG2 | NM_016250 | ATCATTCAGAACTTTGTGCG | GTGGTTAAGAGCATAGCTCG | 206 | N | 100 |
36 | TOP2A | NM_001067 | AGTCATTCCACGAATAACCA | TTCACACCATCTTCTTGAG | 108 | N | 92.3 |
37 | NOTCH1 | NM_017617 | CATCTCCGACTTCATCTACC | GATCAGGATCTGGAAGACAC | 216 | Y | 100 |
38 | AMACR | NM_014324 | GGAGCACCTTTCTATACGAC | TGATTGGGAAGTTCATCAGAC | 127 | N | 100 |
39 | DAPK1 | NM_004938 | GTTAGCAAATGTATCCGCTG | GTGTATCTTTAGGCTTGATCCA | 166 | Y | 100 |
40 | IGSF4 | NM_014333 | CTAGAAGTACAGTATAAGCCTC | GTTATTGATGAACAGGTTGG | 177 | N | 95.8 |
Abbreviations: Y, yes; N, no.
Quantitative RT-PCR
Conditions for RT of the first sample set are reported below. For the second set, the total nucleic acid volume was reduced to 5 μL, and all subsequent volumes were reduced by one third as fewer transcripts were analyzed. For each set, the input total nucleic acid volume was constant regardless of RNA concentration.
A 7.5-μL total nucleic acid aliquot was treated with 13.5 units DNase I (GenHunter, Nashville, TN) in a 27.5-μL reaction with 1× RT buffer (Invitrogen Corp., Carlsbad, CA) for 30 min at 37°C and 2 min at 70°C. The reactions were carried out in 0.6-mL Hi-Yield Nucleic Acid Recovery tubes (Robbins Scientific Corp., Sunnyvale, CA) and incubated in a digital heatblock (VWR Scientific Products, West Chester, PA). One microliter was removed to be tested for residual DNA (no-RT control). To the remainder we added 13.5 μL master mix consisting of 1× RT buffer, 60 ng random primer pd(N)6, 1.5 ng oligo-T primer pd(T)12–18, 4 μL deoxyribonucleotide triphosphates, 10 mmol/L each (all from Invitrogen), and 0.1 pg exogenous plant RNA of the Arabidopsis thaliana Chlorophyll A-B binding protein CAB (Stratagene, La Jolla, CA). This spike served as an external control to monitor the reproducibility of RT efficiency. The tubes were heated at 70°C for 5 min and placed on ice for 2 min. Finally, 300 units of SuperScript III (Invitrogen) and 3 μL of 20 mmol/L dithiothreitol in 1× RT buffer were added, and the reaction was brought to a final volume of 60 μL with H₂O. After incubation for 5 min at 25°C, 50 min at 50°C, and 15 min at 70°C, samples were diluted with 180 μL H₂O and stored in 80-μL aliquots at −20°C until PCR amplification. Two microliters of this cDNA were used in each subsequent PCR, and the remainder of the aliquot was kept at 4°C.
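As a rough arithmetic check on the first-set volumes above, the following sketch tracks how much of the original 7.5-μL extract ends up in a single 2-μL PCR template. It assumes ideal pipetting and full recovery at every step; the variable names are illustrative only.

```python
# First-sample-set volumes taken from the protocol above (microliters)
tna_input = 7.5           # total nucleic acid (TNA) aliquot entering DNase I treatment
dnase_rxn = 27.5          # DNase I reaction volume
no_rt_removed = 1.0       # withdrawn as the no-RT control
master_mix = 13.5         # RT master mix added to the remainder
rt_final = 60.0           # after SuperScript III, DTT, and water
dilution_water = 180.0    # water added after RT
aliquot = 80.0            # storage aliquot size
pcr_template = 2.0        # cDNA used per PCR

volume_before_enzyme = dnase_rxn - no_rt_removed + master_mix    # 40.0
tna_into_rt = tna_input * (dnase_rxn - no_rt_removed) / dnase_rxn
diluted_cdna = rt_final + dilution_water                          # 240.0
n_aliquots = diluted_cdna / aliquot                               # 3.0
tna_per_pcr = tna_into_rt * pcr_template / diluted_cdna

print(f"{volume_before_enzyme:.1f} uL before enzyme, {n_aliquots:.0f} aliquots, "
      f"{tna_per_pcr:.3f} uL of original extract per PCR")
```

Under these assumptions each PCR well receives cDNA derived from roughly 0.06 μL of the original extract, which is one reason the volume of input extract was held constant within each set.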
Efficient DNase treatment was verified in all samples by running 2 μL of the 1:10-diluted no-RT control (see above) in a PCR with primers for the abundant PGK1 gene. No signal from the no-RT controls reached the threshold within 35 PCR cycles, unlike the positive control of cDNA from pooled cervical extracts.
Primers for candidate genes were designed with PerlPrimer v1.1.9 (15), strictly avoiding 3′-extendable primer dimers. Wherever possible, primers were selected to span an intron, with at least one primer placed over an exon/exon boundary, producing amplicons between 80 and 300 bp (median, 136 bp; Table 1). We verified the target specificity of each primer pair by gel electrophoresis of amplicons generated from cDNA from pooled cervical exfoliated cells, human genomic DNA (Roche Diagnostics, Indianapolis, IN), and a no-template control. Primers and conditions for the reference gene transcripts PGK1 and RPL4 as well as the external control CAB were established previously (16). After amplification, all PCR reactions were subjected to dissociation curve analysis to verify product specificity by a single peak of the second derivative of the melting curve. PCR amplification efficiency for each primer pair was determined from the slope of a standard curve over a five-step 10-fold dilution series (100, 10, and 1 ng and 100 and 10 pg) as $E = 10^{-1/\text{slope}}$. The cDNA template for this standard curve was made from universal human reference RNA (Stratagene). Amplification efficiency had to be above 90% for primers to be used for real-time RT-PCR (Table 1).
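The efficiency calculation can be sketched as a least-squares fit of Ct against log10 template input. The conversion to a percentage, (E − 1) × 100, is the common convention and is assumed here to correspond to the values in Table 1; the function name and the example Ct values are hypothetical.

```python
import math


def amplification_efficiency(log10_input, ct):
    """Fit Ct = a + slope * log10(input) by least squares; return (E, percent).

    E = 10 ** (-1 / slope) is the per-cycle amplification factor
    (2.0 = perfect doubling); percent = (E - 1) * 100.
    """
    n = len(ct)
    mx = sum(log10_input) / n
    my = sum(ct) / n
    sxx = sum((x - mx) ** 2 for x in log10_input)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_input, ct))
    slope = sxy / sxx
    e = 10 ** (-1.0 / slope)
    return e, (e - 1.0) * 100.0


# Five-step 10-fold series (100, 10, 1 ng, 100 pg, 10 pg) expressed as log10(ng)
log_input = [2.0, 1.0, 0.0, -1.0, -2.0]

# Hypothetical Ct values on an ideal curve: slope of about -3.32 means perfect doubling
ideal_slope = -1.0 / math.log10(2.0)
cts = [20.0 + ideal_slope * x for x in log_input]
e, pct = amplification_efficiency(log_input, cts)
print(round(e, 3), round(pct, 1), pct > 90.0)   # 2.0 100.0 True

# A shallower slope of about -3.58 lands near the 90% acceptance limit
e2, pct2 = amplification_efficiency(log_input, [20.0 - 3.58 * x for x in log_input])
print(round(pct2, 1))
```

Ct falls as input rises, so the slope is negative and E exceeds 1; a primer pair would pass the acceptance rule above when the returned percentage is greater than 90.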
Real-time PCR amplification of all samples was done in duplicate on an ABI 7900 HT sequence detection system (Applied Biosystems, Foster City, CA) using 96-well plates. The 25-μL reaction volume contained 1× SYBR Green PCR Master Mix (Applied Biosystems), 0.8 μmol/L of each primer, and 2 μL template. The thermal profile consisted of one cycle at 95°C for 10 min and 40 cycles of 95°C for 15 s, 60°C for 15 s, and 72°C for 45 s. A no-template control was included on each plate.
Analysis
Raw Ct values were retrieved at a threshold of ΔRn = 0.1 and imported as .txt files into MS Excel 2002. Reactions that were not reproducible, as shown by a coefficient of variation of Ct values >5% or an irregular melting peak profile (in either of the two replicate reactions), were excluded from the analysis. For samples that passed these quality standards, the replicate Ct values were averaged. To normalize for sample differences, we subtracted the geometric mean of the PGK1 and RPL4 Ct values from the averaged Ct value of each gene. This normalization factor was previously determined to be the most stable combination of transcripts in exfoliated cervical cells (16).
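The replicate filter and normalization described above might look like the sketch below. It assumes the coefficient of variation is the sample standard deviation of the two replicate Ct values divided by their mean, leaves out the melting-curve check, and, following the text literally, takes the geometric mean of the two reference-gene Ct values; all names and values are hypothetical.

```python
import math
import statistics


def replicate_ct(ct_a, ct_b, max_cv_pct=5.0):
    """Average duplicate Ct values; return None if either is missing or CV > max_cv_pct."""
    if ct_a is None or ct_b is None:
        return None                                   # undetermined in one well
    mean = (ct_a + ct_b) / 2.0
    cv_pct = statistics.stdev([ct_a, ct_b]) / mean * 100.0
    return mean if cv_pct <= max_cv_pct else None


def normalized_ct(target_ct, pgk1_ct, rpl4_ct):
    """Target Ct minus the geometric mean of the PGK1 and RPL4 Ct values."""
    if None in (target_ct, pgk1_ct, rpl4_ct):
        return None
    normalization_factor = math.sqrt(pgk1_ct * rpl4_ct)
    return target_ct - normalization_factor


target = replicate_ct(28.1, 28.5)      # CV about 1% -> kept, mean 28.3
pgk1 = replicate_ct(24.9, 25.1)        # mean 25.0
rpl4 = replicate_ct(23.0, 23.2)        # mean 23.1
print(round(normalized_ct(target, pgk1, rpl4), 2))   # 4.27
print(replicate_ct(30.0, 34.0))        # CV about 8.8% -> None
```

A lower normalized Ct therefore means more target transcript relative to the reference genes, which is the direction used in the ROC analysis that follows.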
We applied receiver operating characteristic (ROC) analysis using SPSS for Windows 11.5.0 (SPSS, Inc., Chicago, IL) to the normalized Ct values of the CIN0 and CIN3+ groups and calculated the area under the curve (AUC) for each gene in the first sample set. We set a low-stringency cutoff (AUC >0.6) to identify genes with diagnostic potential for evaluation in the second sample set (test set). For markers confirmed by an AUC >0.6 in the second sample set, an overall AUC was calculated from the combined Ct values of all samples in both sets (n = 93 CIN3+; n = 186 CIN0).
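A minimal sketch of the two-step ROC screen follows. Because a lower normalized Ct means higher expression, the AUC is computed as the probability that a case has a lower value than a control (the Mann-Whitney form of the ROC area). This is an illustrative reimplementation, not the SPSS procedure, and the gene names and values are hypothetical.

```python
def roc_auc_lower_is_positive(case_cts, control_cts):
    """ROC AUC when a LOWER normalized Ct (higher expression) indicates CIN3+.

    Computed as P(case < control), with ties counted as one half;
    samples that failed QC (None) are skipped.
    """
    cases = [v for v in case_cts if v is not None]
    controls = [v for v in control_cts if v is not None]
    score = 0.0
    for c in cases:
        for k in controls:
            if c < k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))


def two_step_screen(set1, set2, cutoff=0.6):
    """Keep genes with AUC > cutoff in the exploratory set, then confirm in the test set.

    set1, set2: dict mapping gene symbol -> (case_cts, control_cts).
    """
    explored = [g for g, (ca, co) in set1.items()
                if roc_auc_lower_is_positive(ca, co) > cutoff]
    return [g for g in explored
            if roc_auc_lower_is_positive(*set2[g]) > cutoff]


set1 = {"GENE_A": ([1.0, 2.0, 3.0], [2.0, 4.0, 5.0]),   # AUC 7.5/9, about 0.83
        "GENE_B": ([3.0, 4.0], [3.0, 4.0])}              # AUC 0.5
set2 = {"GENE_A": ([1.5, 2.5], [3.5, 4.5]),             # AUC 1.0
        "GENE_B": ([3.0], [3.0])}
print(two_step_screen(set1, set2))   # ['GENE_A']
```

Only genes passing the cutoff in the exploratory set are tested again, mirroring the two-step design described above.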
We evaluated whether markers could be combined to improve detection of CIN3+. We set the threshold for detection of each gene at the normalized Ct value that gave 90% specificity for CIN3+. Samples in which one or more genes were detected were scored as positive for the marker panel. We also used likelihood maximization to select a linear combination of markers yielding a combination score in the context of a logistic regression model of disease status (17). To account for overfitting, the model was derived from a randomly selected 50% of the sample data and tested on the remaining 50%.
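The 90%-specificity cutoffs and the any-marker panel rule can be sketched as follows, assuming that a gene counts as detected when its normalized Ct falls strictly below the cutoff and that a sample failing QC for a gene counts as not detected for that gene. The last lines reproduce the panel sensitivity and specificity from the case and control counts reported in the Results; all other values and names are hypothetical.

```python
def threshold_at_specificity(control_cts, spec_pct=90):
    """Normalized-Ct cutoff giving at least spec_pct % specificity.

    A sample is called positive when its normalized Ct is strictly below the
    returned value, so at most (100 - spec_pct) % of controls are positive.
    """
    vals = sorted(v for v in control_cts if v is not None)
    k = len(vals) * (100 - spec_pct) // 100    # false positives allowed (integer math)
    return vals[k]


def panel_calls(samples, thresholds):
    """A sample is panel-positive if ANY gene is detected below its cutoff.

    samples: list of dicts gene -> normalized Ct (None if QC failed).
    """
    return [any(s.get(g) is not None and s[g] < t for g, t in thresholds.items())
            for s in samples]


def sensitivity_specificity(case_calls, control_calls):
    sens = sum(case_calls) / len(case_calls)
    spec = 1 - sum(control_calls) / len(control_calls)
    return sens, spec


controls = [float(v) for v in range(10)]        # hypothetical normalized Cts
print(threshold_at_specificity(controls))       # 1.0 -> only the 0.0 control is positive

thresholds = {"GENE_A": 1.0, "GENE_B": 2.0}
samples = [{"GENE_A": 0.5, "GENE_B": 5.0},
           {"GENE_A": 3.0, "GENE_B": None},
           {"GENE_A": None, "GENE_B": 1.0}]
print(panel_calls(samples, thresholds))         # [True, False, True]

# Counts reported for the five-marker panel: 56 of 93 cases, 44 of 186 controls
sens, spec = sensitivity_specificity([True] * 56 + [False] * 37,
                                     [True] * 44 + [False] * 142)
print(round(sens, 2), round(spec, 2))           # 0.6 0.76
```

Because each gene is allowed up to 10% false positives and the "any marker" rule accumulates them, the panel's specificity falls well below 90%, as observed.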
Efficiency (E)-corrected average fold differences in expression between groups were calculated from the mean normalized Ct values as $E^{\bar{X}_{\mathrm{CIN0}} - \bar{X}_{\mathrm{CIN3+}}}$, where $\bar{X}$ is the mean normalized Ct of each group. The Mann-Whitney U test was used to assess the statistical significance of overall differences between the CIN3+ and control groups.
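The fold-difference formula and a Mann-Whitney U test can be sketched as follows. The sketch assumes the tabulated efficiency percentage converts to the amplification factor as E = 1 + percent/100, and it uses the large-sample normal approximation without tie correction, so P values may differ slightly from SPSS output; names and example values are hypothetical.

```python
import math


def fold_difference(mean_ct_cin0, mean_ct_cin3, efficiency_pct):
    """E ** (mean CIN0 Ct - mean CIN3+ Ct), with E = 1 + efficiency_pct / 100."""
    e = 1.0 + efficiency_pct / 100.0            # 100% efficiency -> E = 2.0
    return e ** (mean_ct_cin0 - mean_ct_cin3)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2.0))   # two-sided P from |z|


print(fold_difference(3.0, 2.0, 100.0))          # 2.0: one cycle earlier at E = 2
u, p = mann_whitney_u(list(range(20)), list(range(20, 40)))
print(u, p < 0.001)                               # 0.0 True
```

A positive exponent (cases reaching threshold earlier than controls) gives a fold difference above 1, matching the over-expression direction reported for the confirmed markers.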
Results
Quality Control of Quantitative RT-PCR
The expression of all 40 candidate genes was assessed by quantitative RT-PCR in the first (exploratory) sample set (47 CIN3+, 94 CIN0 randomly selected). The external control gene CAB was detected within a 2-fold range of concentration in 85% (120 of 141) of samples and within a 4-fold range in the remainder. We regarded this as acceptable RT efficiency.
The sample normalization factor (i.e., the geometric mean of the Ct values for reference genes PGK1 and RPL4) indicates the relative cDNA concentration and hence the total RNA in each sample. This normalization factor spanned 9.9 cycles across samples, indicating a 1,065-fold difference between the lowest and the highest cDNA concentrations. This reflects substantial sample-to-sample variation in total RNA concentration. Even so, target RNAs could generally be quantified in all samples. Ct values for the majority of transcripts (35 of 40) met quality control standards for quantitative evaluation in at least 75% of the samples in both the CIN3+ and CIN0 groups. The exceptions were AURKA, ADORA2B, SHCBP1, and NUDT6, which could not be determined in over 25% of the samples in one group, and TERT, which was frequently undetectable in both groups.
ROC Analysis
Table 2 shows the fold difference in expression between the means of the CIN3+ and CIN0 samples and the performance of the markers as evaluated with ROC curve analysis. The exploratory analysis (first sample set) identified 12 genes with an AUC >0.6. These were further analyzed in the test (second) set (46 CIN3+, 92 CIN0). Six markers [claudin 1 (CLDN1), minichromosome maintenance deficient 5 and 7 (MCM5 and MCM7), cell division cycle 6 homologue (CDC6), antigen identified by monoclonal antibody Ki-67 (MKI67), and SHC SH2-domain binding protein 1 (SHCBP1)] passed the second screen, with AUC >0.6 (Table 2).
The performance of these six markers in all samples (sets 1 and 2) is presented in Table 3 and as ROC curves in Fig. 1. CLDN1 and MCM5 had the highest diagnostic value in both split sample sets and consequently in the combined samples, with AUCs of 0.75 and 0.71, respectively. The difference in mRNA expression between cases and all controls was statistically significant for all six confirmed genes according to the Mann-Whitney U test (P < 0.05; Table 3).
Table 3. Combined analysis results (93 CIN3/cancer, 186 CIN0) of six candidate genes that were confirmed by an AUC >0.6 in both sample sets
Gene symbol | Samples evaluated, CIN0 (%)* | Samples evaluated, CIN3+ (%)* | CIN3+ fold difference | ROC AUC | Mann-Whitney P | Critical Ct at 90% specificity |
---|---|---|---|---|---|---|
CLDN1 | 97.3 | 98.9 | 2.3 | 0.75 | <0.001 | 0.94 |
MCM5 | 95.1 | 97.8 | 1.8 | 0.71 | <0.001 | 4.66 |
MCM7 | 94.6 | 97.8 | 1.6 | 0.64 | <0.002 | 3.26 |
CDC6 | 91.9 | 94.6 | 1.6 | 0.71 | <0.001 | 4.73 |
MKI67 | 82.3 | 94.6 | 1.8 | 0.66 | <0.001 | 5.92 |
SHCBP1 | 84.6 | 71.7 | 1.3 | 0.61 | <0.038 | 6.55 |
*Proportion of samples in which the quantitative RT-PCR assay for the gene passed quality control.
Fig. 1. ROC curve analysis of six genes with diagnostic value for CIN analyzed in combined sample sets (93 CIN3/cancer, 186 CIN0). The black circle marks the performance of a combined marker panel with 90% specificity cutoffs for each gene.
Marker Combinations
Markers that are not highly correlated have a higher potential for identifying different subsets of cases, suggesting that combining them could improve the sensitivity of detection. Pairwise Pearson correlation coefficients of the normalized Ct values for MCM5, MCM7, CDC6, and MKI67 were generally high, reaching 0.84 for CDC6/MCM7. CLDN1 showed only moderate correlation with these four markers, and SHCBP1 expression appeared to be independent of the other genes (Table 4).
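Pairwise correlations like those in Table 4 can be computed as below. How samples that failed QC for one gene were handled is not stated; this sketch assumes pairwise deletion, and the gene names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson r over samples where both genes passed QC (pairwise deletion)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(data):
    """data: dict gene -> normalized Ct values aligned by sample index."""
    return {(g1, g2): pearson(data[g1], data[g2])
            for g1, g2 in combinations(data, 2)}


# Hypothetical values for three genes in five samples (None = failed QC)
data = {"GENE_A": [1.0, 2.0, 3.0, 4.0, None],
        "GENE_B": [2.1, 3.9, 6.2, 7.8, 9.0],
        "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0]}
for pair, r in pairwise_correlations(data).items():
    print(pair, round(r, 2))
```

Genes with a low r, such as SHCBP1 against the others, are the ones that could in principle flag different cases when combined.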
Table 4. Correlation of the expression of the six markers in combined sample sets
Gene | CLDN1 | MCM5 | MCM7 | CDC6 | MKI67 |
---|---|---|---|---|---|
CLDN1 | 1 | | | | |
MCM5 | 0.62 | 1 | | | |
MCM7 | 0.62 | 0.73 | 1 | | |
CDC6 | 0.54 | 0.65 | 0.84 | 1 | |
MKI67 | 0.45 | 0.68 | 0.73 | 0.73 | 1 |
SHCBP1 | −0.04 | 0.05 | −0.02 | −0.03 | 0.08 |
Based on a threshold for detection set at 90% specificity for each gene, we combined results for five of the genes (MCM5, MCM7, CDC6, MKI67, and CLDN1) to evaluate their performance as a panel (Table 3). One or more markers were detected in 56 of 93 cases (60% sensitivity) and in 44 of the 186 controls (76% specificity). SHCBP1 was excluded from the panel because it identified only one additional CIN3 sample while substantially lowering specificity.
Interestingly, at 60% sensitivity, CLDN1, the best single marker, achieved about the same specificity as the five-marker panel, indicating similar performance (Fig. 1). However, the two approaches showed little concordance in sample classification. Of the 68 cases classified as positive by either method, 46 were classified as true positives by both; of the 67 controls classified as positive by either method, only 23 were classified as false positives by both. Discordant samples (either true positive or false positive) did not differ with respect to age, race, and HPV status (data not shown). Results for the samples classified as negative followed the same pattern. Diagnostic accuracy of the quantitative RT-PCR tests was not affected by the amount of RNA in the samples, as the mean normalization factors for samples classified as true positive, false positive, true negative, and false negative did not differ (25.34, 25.61, 24.88, and 24.88, respectively, for CLDN1 and 25.10, 25.08, 25.11, and 25.19, respectively, for the five-marker panel).
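The either/both counts used to describe concordance can be tallied as follows, given positive calls from two tests (for example, CLDN1 alone and the five-marker panel) and the true disease status. The function and the toy data are hypothetical.

```python
def concordance(calls_a, calls_b, is_case):
    """Count samples positive by either test and by both, separately for cases and controls.

    Returns {"true_positive": (either, both), "false_positive": (either, both)}.
    """
    result = {}
    for label, group in (("true_positive", True), ("false_positive", False)):
        idx = [i for i, c in enumerate(is_case) if c == group]
        either = sum(1 for i in idx if calls_a[i] or calls_b[i])
        both = sum(1 for i in idx if calls_a[i] and calls_b[i])
        result[label] = (either, both)
    return result


# Toy example: three cases followed by three controls (hypothetical calls)
is_case = [True, True, True, False, False, False]
single_marker = [True, True, False, True, False, False]
panel = [True, False, True, True, True, False]
print(concordance(single_marker, panel, is_case))
# {'true_positive': (3, 1), 'false_positive': (2, 1)}
```

A large gap between the "either" and "both" counts, as reported above, indicates that two tests with similar overall accuracy are flagging largely different samples.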
The alternative combination approach (modeling a score by likelihood maximization from any combination of the six candidate markers) did not yield a higher AUC than single markers when split sample validation was applied (data not shown).
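The modeled combination score can be illustrated with a logistic regression fitted by maximizing the log-likelihood with batch gradient ascent, trained on a random half of the samples and scored on the other half by AUC. This is a sketch under stated assumptions: the optimizer, learning rate, and epoch count are arbitrary choices, selection of the marker subset is not shown, and the data are simulated rather than taken from the study.

```python
import math
import random


def _sigmoid(z):
    z = max(-30.0, min(30.0, z))              # clip to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Maximum-likelihood logistic regression by batch gradient ascent.

    X: rows of normalized Ct values; y: 1 for CIN3+, 0 for CIN0.
    Returns weights [intercept, w1, ..., wp].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def score(w, x):
    """Linear combination score; higher means more likely CIN3+."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def auc_higher_is_positive(case_scores, control_scores):
    s = sum(1.0 if c > k else 0.5 if c == k else 0.0
            for c in case_scores for k in control_scores)
    return s / (len(case_scores) * len(control_scores))


def split_sample_auc(X, y, seed=0):
    """Fit on a random 50% of samples; return the AUC of the score on the other 50%."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    w = fit_logistic([X[i] for i in train], [y[i] for i in train])
    cases = [score(w, X[i]) for i in test if y[i] == 1]
    controls = [score(w, X[i]) for i in test if y[i] == 0]
    return auc_higher_is_positive(cases, controls)


# Simulated data: cases have lower normalized Ct (higher expression) for two markers
rng = random.Random(1)
X, y = [], []
for _ in range(60):
    X.append([rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)])
    y.append(1)
for _ in range(120):
    X.append([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
    y.append(0)
print(round(split_sample_auc(X, y), 2))
```

Evaluating on the held-out half guards against the optimistic AUC that the same fitted score would show on its own training data.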
Discussion
This is the largest study to date to evaluate biomarkers for high-grade cervical disease in samples that could be obtained at screening. Using quantitative RT-PCR and ROC curve analysis in a two-step approach, we identified six biomarkers from the original 40 candidates that showed repeated discrimination between exfoliated samples from women with CIN3+ and those without cervical disease (CIN0). The samples in this study were selected from women referred to colposcopy (i.e., women selected by current screening strategies). The identification of gene expression biomarkers in this challenging setting is somewhat encouraging, although the individual performance of each gene was modest (AUC = 0.61-0.75). As the controls were enriched for HR-HPV positivity, these six markers (CLDN1, MCM5, MCM7, CDC6, MKI67, and SHCBP1) provide information beyond HPV status. All six showed increased expression in CIN3+, although the increase was modest (around 2-fold) and unlikely to be identifiable in microarray experiments. Accordingly, only one marker (SHCBP1) was identified in our earlier microarray analysis, whereas the other five markers were candidates from the literature.
Recent immunohistochemistry studies of CLDN1, an integral membrane protein for tight junction formation, showed strong expression in CIN lesions and cervical cancer (18, 19). Perturbations in CLDN1 have been associated with several cancers and may be an indicator of altered polarity, abnormal cellular organization, and loss of differentiation (20). The minichromosome maintenance proteins MCM5 and MCM7 are normally up-regulated at the G0 to G1-S phase transition and initiate and promote DNA replication. Increased expression at both the protein and RNA levels has been shown in cervical tissue (21-23). MCM7 might also be a direct target for HPV (24, 25). Several immunohistochemistry studies have implicated CDC6 and MKI67 as proliferation markers often correlated with the grade of neoplasia (22, 26, 27). SHCBP1 was originally implicated by our own microarray study and has been described as a growth factor-stimulated link in the signaling pathways governing cell cycle progression (28). Notably, five of the six genes are involved in cellular proliferation.
CLDN1 is located at 3q28-q29, the region most frequently amplified during cervical carcinogenesis as determined by both genomic (29) and gene expression analysis (30). The other genes we identified as potential markers are located on chromosomal regions that have not been reported to be amplified (MCM5, 22q13.1; MCM7, 7q21.3-q22.1; CDC6, 17q21.3; MKI67, 10q21.3; and SHCBP1, 16q11.2).
As the correlation between some markers was limited, we evaluated whether they could be combined into a panel. Biomarker panels presume that cases are phenotypically diverse, so that more than one marker is required to achieve sensitive detection. Results from the two approaches that we took to combine markers (60% sensitivity and 76% specificity, or an AUC of 0.72) did not exceed the diagnostic performance of CLDN1 alone. Other methods for combining markers might be suggested, but it seems unlikely that sensitivity could be increased significantly without loss of specificity.
As the study is enriched for disease and recruited from a referral population, the observed sensitivity and specificity do not reflect how these markers would perform under routine screening conditions. Based on ROC curve analysis, we believe that these markers may play a role in disease detection but are unlikely to have sufficient sensitivity and specificity to significantly improve screening practices. To guide further work in biomarker discovery and validation, it is worth exploring why sensitivity and specificity were not higher.
Limitations inherent in sample quality may contribute to low sensitivity. The molecular preservation of exfoliated cervical cells is potentially compromised as a result of terminal differentiation and exfoliation. However, based on our demonstration that expression profiles from normal cervical tissue and exfoliated cells are comparable (12), we believe that molecular preservation is adequate. Furthermore, the amount of RNA, as reflected in the normalization factor, was similar in correctly and incorrectly classified samples.
Dilution of signal may also contribute to low sensitivity. The exfoliated cervical sample includes a mixture of cell types from normal and lesional tissue. As only a minority (1-10%) of cells are morphologically abnormal, their signal may be diluted in extracts of unselected cells. Detecting molecular changes indicative of disease in these extracts therefore requires a large fraction of abnormal cells from the lesion, a robust change in expression that can withstand dilution, or a change in expression that extends beyond the morphologically detected lesion (i.e., a field effect). Although dilution cannot be discounted, the level of discrimination in this study using extracts of exfoliated cervical cells was similar to that of the immunohistochemistry measures of CDC6 and MCM in lesional tissue samples reported by Murphy et al. (22).
Lower observed specificity could also result from misclassification of controls. We used a combination of cytology, colposcopy, and biopsy with centralized pathology review to determine disease status. However, although this is the current “gold standard” for follow-up of an abnormal Papanicolaou test, it is nonetheless imperfect. It is increasingly recognized that negative colposcopy and biopsy in women with HR-HPV does not entirely exclude the presence of disease, particularly in the endocervix, that may become detectable later (4, 31). Follow-up will be required to determine the true disease status in this population. The CIN3+ samples are unlikely to be misclassified, as the pathology was reviewed by a panel of experts. As expected, all but two of these samples were HR-HPV positive.
Poor discrimination of markers may be a reflection of the biology of preinvasive cervical disease. These lesions present a spectrum of morphologic changes that variably reflect the multiple molecular changes that occur during neoplastic progression. The true biological potential of histologically similar CIN3+ lesions is not clear. It is estimated that up to 33% of these lesions may regress (32). Of those that persist, many years and the acquisition of additional genomic changes are generally required before invasion. As noted above, immunohistochemistry markers in tissue samples have imperfect sensitivity and specificity as well, presumably for these reasons.
Finally, imperfect discrimination could be a result of the cellular functions represented by this set of biomarkers. As these markers mainly reflect cell proliferation, nonspecific increases in proliferative activity associated with inflammation, squamous metaplasia, or tissue repair unrelated to neoplasia could lead to detection in the cervical epithelium of healthy controls. Conversely, regressing CIN3 lesions may show reduced proliferation. Immunohistochemistry experiments have shown moderate to strong expression of several of the implicated genes in cells of the epithelial basal and parabasal layers (19, 21, 23). Additional markers not related to proliferation might increase accuracy, but it is striking that, apart from CLDN1, none of the genes with other functions among the 40 candidates we tested were validated.
This study supports the concept that exfoliated cervical cells reflect changes in gene transcription and can be used to detect CIN with sensitivity and specificity similar to those of markers in biopsy tissue. However, the performance of these six markers is not sufficient to significantly improve screening. Sampling problems cannot be excluded, and it will be important to determine the tissue profile of these markers to evaluate this limitation. Further biomarker discovery, targeting DNA and protein markers in combination with these and additional RNA markers, will be necessary. We recommend that further studies use screening samples from women with at least 2 years of follow-up data to reduce misclassification.
Grant support: National Cancer Institute's Early Detection Research Network, Interagency Agreement Y1-CN-0101-01 and Y1-CN-5005-01.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the funding agency.