Abstract
Purpose: The serum tumor marker CA 125 is elevated in most clinically advanced ovarian carcinomas, and currently, one of the most promising early detection strategies for ovarian cancer uses CA 125 level in conjunction with imaging. However, CA 125 is elevated in only 50% of early-stage ovarian cancer and is often elevated in women with benign ovarian tumors and other gynecologic diseases. Additional markers may improve on its individual performance if they increase sensitivity and specificity and are less sensitive to other gynecologic conditions. The human kallikrein 11 (hK11) marker has been reported to have favorable predictive value for ovarian cancer, although, by itself, it may be inferior to CA 125.
Experimental Design: We here validate the performance of hK11 on an independent data set and further characterize its behavior in multiple types of controls. We also investigate its behavior when combined with CA 125 to form a composite marker. hK11 had not previously been evaluated on these serum samples. CA 125, hK11, and the composite marker were evaluated for their performance in identifying ovarian cancer and for temporal stability.
Results: hK11 significantly distinguished ovarian cancer cases from healthy controls and was less sensitive to benign ovarian disease than was CA 125.
Conclusion: We conclude that hK11 is a valuable new biomarker for ovarian cancer and its temporal stability implies that it may do even better when used in a longitudinal screening program for early detection.
Women who are diagnosed with ovarian cancer when the tumor is confined to the ovary have good prognoses; however, most ovarian cancers are diagnosed after the disease has spread throughout the peritoneal cavity, when prognosis is poor. More than 80% of these women with late-stage disease will die within 5 years (1, 2). One strategy to improve survival is to detect cancer early, when the disease is localized and potentially treatable by radical surgery.
Currently, one of the most promising early detection strategies for ovarian cancer uses the serum biomarker CA 125 in conjunction with imaging as a trigger for surgical intervention. CA 125 can identify 85% of clinically advanced ovarian carcinomas (3–5), and several studies have shown that elevations in CA 125 may occur 18 months or more before clinical diagnosis (6, 7). Moreover, the ability of CA 125 to detect cancer early in a screening program is supported by the observation that individual women have temporally stable levels of the marker (8). This suggests that specially tailored screening algorithms could lead to disease detection based on very small serial elevations in CA 125 levels (9–12).
Although CA 125 may be among the best available single diagnostic ovarian cancer biomarkers, its sensitivity and specificity are imperfect. In particular, it is elevated above reference levels in only 50% of clinically detectable early-stage disease (3–5) and it is frequently elevated in patients with benign ovarian tumors and other gynecologic diseases (13–15). Adding one or several markers to CA 125 for use as a composite marker (CM) could improve diagnostic performance if sensitivity were improved with no loss in specificity.
Several research groups recently reported candidate biomarkers for ovarian cancer diagnosis and early detection. For example, carcinoembryonic antigen, placental alkaline phosphatase, various other carbohydrate antigens (e.g., CA15-3 and CA19-9), OVX1, matrix metalloproteinases, prostasin, HE4 protein, mesothelin, members of the interleukin family, and inhibin have all been proposed as candidate ovarian cancer biomarkers (reviewed in ref. 16). In addition, the kallikreins, a family of serine proteases encoded by 15 genes localized in tandem on human chromosome 19q13.4, have also been shown to be candidate markers for ovarian and other cancers (17–20), including breast, testicular, and prostate cancer (reviewed in refs. 17, 20). Specifically for ovarian cancer, we have previously reported that serum levels of human kallikreins 5, 6, 8, 10, 11, and 14 are elevated in many patients with ovarian cancer (21–26). Human kallikrein 11 (hK11), measured by a newly developed ELISA (19), was found to be elevated in the serum of ∼70% of ovarian cancer patients (at 95% specificity) and has favorable prognostic value in ovarian cancer (26, 27).
Here, we report the performance of hK11 on its own in a set of sera collected from cohorts not previously evaluated for hK11, which constitutes a validation set for this marker. We characterize hK11 on its ability to distinguish ovarian cancer from healthy controls, from women with benign ovarian disease, and from women undergoing surgery who have histologically normal ovaries. We include the latter surgical normal control group to evaluate the potential for biases that may arise when different sample collection methods are used for cases undergoing surgery and for healthy controls; differences between surgical controls and normal controls may indicate biases in the ascertainment of the case and control samples or perhaps biomarker sensitivity to nonspecific conditions. As a first step toward evaluating the performance of hK11 as an early detection marker, we also evaluated the temporal stability of the marker over a 1-year period among healthy women. This temporal stability assesses whether the sensitivity and specificity of the marker will improve if used in a longitudinal screening program (11, 12, 28).
In addition, we evaluated the marker CA 125 in each subject to compare hK11 and CA 125 and also to investigate whether the two markers could complement each other when combined in a CM.
Materials and Methods
Serum specimens
Serum samples from women with ovarian cancer (n = 34) and from women representing three different control groups were collected under human subjects approved protocols as part of National Cancer Institute–funded ovarian cancer research programs. Informed consent was obtained from all participants. Controls of several types were selected to help profile the performance of the biomarker. Controls were matched to age and menopausal status of the cases. Healthy controls (n = 36), appropriate for evaluating the relevance of the marker for ovarian cancer early detection, were from asymptomatic women participating in a National Cancer Institute–funded ovarian cancer screening trial. Benign controls (n = 21) were used to evaluate marker sensitivity to nonmalignant ovarian disease and were relevant for evaluating the use of the marker as a diagnostic test. Surgical normal controls (n = 24) were from women undergoing noncancer gynecologic surgical procedures for pelvic inflammatory disease but who had histologically normal ovaries. The surgical normal controls were included to evaluate the sensitivity of the marker to nonovarian gynecologic conditions and to assess the potential bias due to the surgical collection of our cases compared with nonsurgical collection of the healthy controls. If no difference between healthy and surgical normal specimens is found, we can be reasonably confident that a collection bias is not present and that the marker is not sensitive to nonovarian gynecologic disorders.
Specimens from women with ovarian cancer, women with benign ovarian disease, and surgical normal controls were collected in surgery, following anesthesia, but before surgical intervention (i.e., removal of the ovaries). A pathologist examined fixed, paraffin-embedded specimens to confirm the histology for these tumors. The serum from healthy women came from a National Cancer Institute–funded ovarian cancer screening research trial and so represented healthy asymptomatic women (29, 30). Specimen collection and processing protocols were identical for all women regardless of case or control status; participants donated up to 50 mL of blood, which was processed into sera, plasma, and WBC and epithelial cell pellets.
We characterized all research participants at the time of specimen collection with respect to race, age, menopausal status, and use of hormone replacement therapy (HRT). Women were considered postmenopausal if they reported no menstruation for 6 months, used HRT, or were >50 years old and did not report menstrual history. All women using HRT had been taking it for ≥1 year. Stage and histology were recorded for all cancer cases. Table 1 summarizes these variables for the study population.
Group | Total | Premenopausal | Postmenopausal | HRT ever: yes | HRT ever: no | HRT ever: NA | Race: Asian | Race: Black | Race: other* | Race: White | Race: NA | Stage I | Stage II | Stage III | Stage IV
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Ovarian cancer | 34 | 1 | 33 | 22 | 8 | 4 | 0 | 1 | 0 | 30 | 3 | 4 | 3 | 20 | 7
Serous | 21 | 1 | 20 | 14 | 5 | 2 | 0 | 0 | 0 | 20 | 1 | 1 | 1 | 15 | 4
Mucinous | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0
Endometrioid | 4 | 0 | 4 | 2 | 1 | 1 | 0 | 0 | 0 | 3 | 1 | 2 | 1 | 0 | 1
Undifferentiated | 3 | 0 | 3 | 3 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 3 | 0
Other | 5 | 0 | 5 | 3 | 1 | 1 | 0 | 0 | 0 | 4 | 1 | 0 | 1 | 2 | 2
Benigns | 21 | 2 | 19 | 13 | 7 | 1 | 3 | 0 | 1 | 16 | 1 | NA | NA | NA | NA
Surgical controls | 24 | 2 | 22 | 17 | 7 | 0 | 0 | 0 | 0 | 24 | 0 | NA | NA | NA | NA
Healthy controls | 36 | 4 | 32 | 0 | 0 | 36 | 0 | 0 | 0 | 34 | 1 | NA | NA | NA | NA
NOTE: The “other” ovarian cancer histology category includes 1 adenocarcinoma, 1 clear cell, and 3 unclassified epithelial tumors. Benign tumor conditions include 3 nonneoplastic, 2 mucinous, 1 Brenner, 10 serous, 1 NML, and 4 other categories.
*Other does not include Hispanic or Native American.
To reflect the behavior of the marker among the overall population of ovarian cancer cases, all cases were randomly selected from the specimen repository. We selected 34 cases, including 27 late-stage and 7 early-stage cancers. Of those 34 cases, 21 had serous histology and 2 were early-stage, serous cancers. These sample sizes for the specimens, used as part of a larger National Cancer Institute–funded ovarian cancer biomarker validation study, were determined to have a 70% chance of detecting a sensitivity of 30% or more at specificity of 98% when discriminating cases from combined healthy controls and benign controls using a Wilcoxon rank-sum test. The study also had a better than even chance of detecting a sensitivity of 20% or more at 98% specificity. Power was determined by a conservative simulation, where the 70% false-negative cases were assigned marker behavior equivalent to true-negative controls and the 2% false-positive controls were assigned marker behavior equivalent to the true-positive cases. Under more complex marker behavior, the study may have power to detect even smaller sensitivities. This power was more than sufficient to detect or independently validate hK11, for which previous studies reported a sensitivity of 70% at 98% specificity (22).
As a first step toward evaluating the relevance of our marker when used in a screening study, we also performed measurements on specimens collected serially (1 year apart) from a subsample (n = 20) of the healthy controls. Temporal stability was measured by the correlation between the two time points. The performance of markers with high temporal stability may improve when measured over time to monitor deviations from baseline.
ELISA assay for hK11
The ELISA assay for quantifying hK11 in serum has been previously published and validated (22). In short, this assay has a detection limit of 0.1 μg/L, and the dynamic range extends to 50 μg/L. The assay has no cross-reactivity from other tissue kallikreins and varies within the measurement range by <10%. All samples were analyzed undiluted in duplicate. More recently, we improved the detection limit to 0.02 μg/L without changing any other assay characteristics (unpublished data).
Statistical analysis
Quantifying the diagnostic ability of a marker. Receiver operating characteristic (ROC) curve methods were used to quantify marker performance for hK11 and CA 125. ROC curves associate the sensitivity of a diagnostic test to the entire range of the possible false-positive rate. The false-positive rate is equal to one minus test specificity. The area under the ROC curve (AUC) indicates the average sensitivity of a marker over the entire ROC curve. We also computed the sensitivity of each marker at 95% specificity, a value more relevant to diagnosis and early detection than the overall average sensitivity measured by the AUC. Establishing statistical significance of a single marker is done by the Wilcoxon rank-sum test, which evaluates the significance of the entire ROC curve.
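The quantities described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the marker values are hypothetical, a subject is called positive when the marker exceeds the threshold, and the AUC is computed in its Mann-Whitney form (the Wilcoxon rank-sum statistic is a rescaling of this same quantity).

```python
def auc(cases, controls):
    """Mann-Whitney estimate of the area under the empirical ROC curve."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0      # case ranked above control
            elif x == y:
                wins += 0.5      # ties count as half
    return wins / (len(cases) * len(controls))


def sensitivity_at_specificity(cases, controls, specificity=0.95):
    """Return (sensitivity, threshold). The threshold is the smallest observed
    control value whose empirical specificity is at least `specificity`;
    a subject is positive when the marker exceeds the threshold."""
    ordered = sorted(controls)
    n = len(ordered)
    threshold = ordered[-1]
    for candidate in ordered:
        at_or_below = sum(1 for y in ordered if y <= candidate)
        if at_or_below / n >= specificity:
            threshold = candidate
            break
    positives = sum(1 for x in cases if x > threshold)
    return positives / len(cases), threshold


# Hypothetical marker values, not study data.
controls = list(range(1, 21))
cases = [8, 15, 19, 21, 22, 25, 30, 35, 40, 50]
print(auc(cases, controls))
print(sensitivity_at_specificity(cases, controls))
```

With small control groups the achievable specificities are coarse, which may be why Table 3 reports sensitivity "at near 95% specificity".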
Standardizing markers. To aid interpretation of our data when comparing two markers, we first transformed all markers with the natural log so their behavior among healthy women more accurately reflected a normal distribution. We then standardized the markers so they had a mean of 0 and unit SD in the sample of healthy controls (28). Standardization of the markers, which leaves the ROC curves and temporal stability unchanged, facilitates the comparison of two different markers because their units of measurement are now similar (the number of SDs above the average normal subject) as illustrated below.
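A minimal sketch of this standardization follows, assuming the sample SD (n − 1 denominator) is used; the text does not say which SD estimator was applied, and the function name and concentrations are hypothetical.

```python
import math
import statistics


def standardize(values, healthy_values):
    """Log-transform, then center and scale by the healthy-control log mean and SD."""
    healthy_logs = [math.log(v) for v in healthy_values]
    mu = statistics.mean(healthy_logs)
    sd = statistics.stdev(healthy_logs)  # sample SD (n - 1); an assumption
    return [(math.log(v) - mu) / sd for v in values]


# Hypothetical raw concentrations, not study data.
healthy = [8.0, 10.0, 11.0, 13.0, 18.0, 25.0]
cases = [60.0, 220.0, 650.0]
z_healthy = standardize(healthy, healthy)   # mean 0, SD 1 by construction
z_cases = standardize(cases, healthy)       # SDs above the average healthy subject
print(z_cases)
```

Because the transformation is strictly increasing, it preserves the rank order of subjects, which is why the ROC curve is unchanged.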
Combining markers. We evaluated a CM of hK11 with CA 125 as a linear combination, or weighting, of the standardized CA 125 and hK11, where logistic regression is used to estimate the weights. Logistic regression has several theoretical properties that make it convenient for applied biomarker research (31), including its capacity to estimate the optimal marker combination.
We estimated our CM by predicting ovarian cancer cases from among all noncancer controls, including healthy, benign, and surgical controls. We limited our attention to a linear combination (i.e., a logistic regression linear link) to facilitate ease of interpretation, although other more complex rules are possible.
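To make the estimation step concrete, the sketch below fits a two-marker logistic regression by plain gradient ascent. It is not the authors' software: the fitting routine, its step size, and the standardized marker pairs are all hypothetical, and a statistics package would normally be used instead.

```python
import math


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Gradient ascent on the mean log-likelihood; returns [intercept, w1, w2]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of being a case
            err = yi - p
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical (standardized CA 125, standardized hK11) pairs, not study data.
controls = [(-1.0, -0.5), (-0.3, 0.4), (0.2, -0.8), (0.8, 0.1),
            (0.1, 1.0), (-0.6, -1.2), (1.2, 0.6), (2.5, 1.5)]
cases = [(5.0, 1.2), (3.1, 0.7), (6.5, 3.0), (4.2, 2.5),
         (0.0, 0.1), (7.4, 1.9), (2.0, 2.8), (5.5, 0.2)]
X = controls + cases
y = [0] * len(controls) + [1] * len(cases)

w = fit_logistic(X, y)
# The composite marker is the weighted sum of the two standardized markers.
cm = [w[1] * ca + w[2] * hk for ca, hk in X]
print(w)
```

The intercept only shifts the decision threshold, so the composite marker is defined by the two slope weights alone.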
Evaluating temporal stability. We measured the temporal stability in healthy subjects by computing the Pearson correlation from two time points in the 20 healthy women for whom yearly specimens were available. Markers with high Pearson correlation yield improved performance in a longitudinal algorithm (8, 11). A high correlation, in particular one exceeding 0.5, implies that monitoring markers for their deviation from historical levels using the parametric empirical Bayes screening rule will yield earlier detection than a simpler diagnostic rule that ignores screening history (11, 32), although we cannot conclude that one marker is better than another, or that a marker panel is better than a single marker, based solely on its temporal stability.
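The correlation used here is the ordinary Pearson coefficient between the first and second yearly measurements. A minimal sketch, with a hypothetical function name and hypothetical standardized values for five women:

```python
import math


def pearson(x, y):
    """Pearson correlation between paired measurements x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical standardized marker levels one year apart, not study data.
year_1 = [-0.9, -0.2, 0.1, 0.6, 1.3]
year_2 = [-0.7, -0.4, 0.3, 0.5, 1.1]
print(pearson(year_1, year_2))
```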
Results
Table 2 summarizes raw and standardized levels of hK11 and CA 125 for the controls and ovarian cancer cases. Within each subgroup, we provide the number of women, the mean level of the marker, and its SD. We also provide the quantiles of the marker within those groups. The quantiles help summarize the distribution of the marker and provide reference ranges. For example, 95% of all healthy women have hK11 levels below 0.60 μg/L on the raw scale (see Table 2, row 22, column 7) or, equivalently, below 1.193 SDs on the standardized scale (see Table 2, row 32, column 7). Moreover, the median (50th percentile) level of hK11 in all cases is 0.60 μg/L on the raw scale (see Table 2, row 26, column 5) or 1.193 SDs on the standardized scale (see Table 2, row 36, column 5). This means hK11 has 50% or greater sensitivity at 95% specificity.
Raw CA 125 | n | Mean (SD) | 25% | 50% | 75% | 95%
---|---|---|---|---|---|---
Healthy | 36 | 13.25 (6.75) | 8.42 | 11.20 | 17.72 | 24.54
Surgically normal | 24 | 30.63 (81.91) | 8.56 | 13.20 | 16.79 | 51.50
All nonbenign controls | 60 | 20.20 (52.11) | 8.42 | 12.58 | 17.46 | 24.96
Benign | 21 | 70.81 (196.36) | 9.21 | 20.76 | 32.62 | 138.71
Cases all | 34 | 414.45 (446.63) | 60.77 | 219.39 | 648.21 | 1,148.90
Early stage | 7 | 313.09 (236.49) | 134.79 | 256.28 | 488.20 | 657.87
Late stage | 27 | 440.72 (486.61) | 49.25 | 214.47 | 735.35 | 1,383.07
Serous | 21 | 397.55 (400.47) | 67.43 | 224.31 | 619.23 | 1,148.90
Nonserous | 13 | 441.74 (529.11) | 31.49 | 202.16 | 657.87 | 980.10
Standardized (log) CA 125 | | | | | |
Healthy | 36 | 0.000 (1.000) | −0.579 | −0.058 | 0.781 | 1.376
Surgically normal | 24 | 0.391 (1.654) | −0.501 | 0.298 | 0.741 | 2.801
All nonbenign controls | 60 | 0.134 (1.293) | −0.579 | 1.155 | 0.754 | 1.406
Benign | 21 | 1.302 (2.218) | −0.363 | 1.131 | 1.961 | 4.622
Cases all | 34 | 5.102 (2.876) | 3.102 | 5.464 | 7.455 | 8.507
Early stage | 7 | 5.372 (2.076) | 4.305 | 5.750 | 6.934 | 7.482
Late stage | 27 | 5.032 (3.079) | 2.685 | 5.423 | 7.687 | 8.848
Serous | 21 | 5.283 (2.538) | 3.296 | 5.505 | 7.371 | 8.507
Nonserous | 13 | 4.810 (3.445) | 1.896 | 5.314 | 7.482 | 8.215
Raw hK11 | | | | | |
Healthy | 36 | 0.445 (0.115) | 0.350 | 0.445 | 0.543 | 0.600
Surgically normal | 24 | 0.491 (0.142) | 0.378 | 0.460 | 0.568 | 0.700
All nonbenign controls | 60 | 0.463 (0.127) | 0.368 | 0.445 | 0.550 | 0.660
Benign | 21 | 0.543 (0.228) | 0.410 | 0.510 | 0.580 | 0.990
Cases all | 34 | 0.793 (0.446) | 0.525 | 0.600 | 0.978 | 1.500
Early stage | 7 | 0.911 (0.391) | 0.625 | 0.860 | 1.100 | 1.600
Late stage | 27 | 0.762 (0.461) | 0.480 | 0.590 | 0.880 | 1.500
Serous | 21 | 0.749 (0.317) | 0.540 | 0.600 | 0.910 | 1.300
Nonserous | 13 | 0.865 (0.609) | 0.440 | 0.660 | 1.100 | 1.600
Standardized (log) hK11 | | | | | |
Healthy | 36 | 0.000 (1.000) | −0.723 | 0.131 | 0.835 | 1.193
Surgically normal | 24 | 0.344 (0.993) | −0.454 | 0.241 | 0.994 | 1.741
All nonbenign controls | 60 | 0.138 (1.003) | −0.549 | 0.131 | 0.884 | 1.532
Benign | 21 | 0.616 (1.206) | −0.160 | 0.615 | 1.072 | 2.973
Cases all | 34 | 1.763 (1.677) | 0.718 | 1.193 | 2.925 | 4.449
Early stage | 7 | 2.401 (1.523) | 1.319 | 2.472 | 3.347 | 4.679
Late stage | 27 | 1.598 (1.702) | 0.388 | 1.133 | 2.552 | 4.449
Serous | 21 | 1.699 (0.317) | 0.818 | 1.193 | 2.673 | 3.941
Nonserous | 13 | 1.867 (2.079) | 0.091 | 1.532 | 3.347 | 4.679
Although raw values are helpful when examining a single marker in different samples, they are not useful for comparing different markers because their scales differ. Converting the raw scores to a standardized scale enables us to compare two or more markers across samples. For instance, the median raw scores in our case samples are 219.39 for CA 125 and 0.600 for hK11, but we cannot compare the relative elevation of the two markers; the greater number for CA 125 does not imply that it is hundreds of times more elevated than hK11 in cases. However, numbers on the standardized scale are more comparable, 5.464 for CA 125 and 1.193 for hK11 (the medians for all cases in Table 2), which implies that CA 125 elevates, on average, ∼4.580 times more than hK11 in a typical case (5.464/1.193 = 4.580).
Among healthy subjects alone, we found no statistically significant differences in CA 125 and hK11 marker concentration with respect to age, HRT use, or race. Nearly all cases and controls were postmenopausal, but the data for CA 125 were consistent with its being elevated among premenopausal women, in agreement with previous studies of these markers.
Results for CA 125. Table 3 shows that CA 125 is a significant predictor of ovarian cancer from among healthy women and that it performed as expected (P < 0.0005; AUC = 0.936; sensitivity of 88.2% at 95% specificity). Moreover, CA 125 significantly differentiates between ovarian cancer cases and surgical normal controls (P < 0.0005; AUC = 0.902; sensitivity of 76%) and to a similar degree as it does between cases and healthy controls. There is no statistically significant difference between surgical normal and healthy normal controls (P = 0.840, bottom row of Table 3), suggesting that the marker is not affected by surgical collection. However, as expected, CA 125 is significantly different between healthy normal controls and women with benign ovarian disease (P = 0.029), and data also suggest differences between benign ovarian disease and surgical normal controls (P = 0.060). This shows the well-known sensitivity of CA 125 to nonmalignant ovarian tumors (13–15).
Comparison | hK11 AUC | hK11 Wilcoxon P | hK11 sensitivity | hK11 SE | hK11 CI lower | hK11 CI upper | CA 125 AUC | CA 125 Wilcoxon P | CA 125 sensitivity | CA 125 SE | CA 125 CI lower | CA 125 CI upper
---|---|---|---|---|---|---|---|---|---|---|---|---
Cases (n = 34) vs healthy (n = 36) | 0.815 | <0.001 | 0.471 | 0.0856 | 0.303 | 0.639 | 0.936 | <0.001 | 0.882 | 0.0553 | 0.774 | 0.990
Cases vs surgical normal (n = 24) | 0.770 | 0.001 | 0.412 | 0.0844 | 0.247 | 0.577 | 0.902 | <0.001 | 0.765 | 0.0727 | 0.622 | 0.908
Cases vs all nonbenign controls (n = 60) | 0.797 | <0.001 | 0.441 | 0.0852 | 0.274 | 0.608 | 0.923 | <0.001 | 0.882 | 0.0553 | 0.774 | 0.990
Cases vs benigns (n = 21) | 0.716 | 0.009 | 0.265 | 0.0757 | 0.117 | 0.413 | 0.840 | <0.001 | 0.647 | 0.0820 | 0.486 | 0.808
Cases vs all noncancer controls (n = 81) | 0.776 | <0.001 | 0.441 | 0.0852 | 0.274 | 0.608 | 0.901 | <0.001 | 0.765 | 0.0727 | 0.622 | 0.908
Benigns vs healthy | 0.631 | 0.145 | 0.238 | 0.0929 | 0.0559 | 0.420 | 0.675 | 0.029 | 0.333 | 0.103 | 0.131 | 0.535
Benigns vs surgical normals | 0.604 | 0.569 | 0.095 | 0.0640 | −0.0304 | 0.220 | 0.671 | 0.060 | 0.143 | 0.0764 | −0.00673 | 0.293
Benigns vs all nonbenign controls | 0.604 | 0.221 | 0.143 | 0.0764 | −0.00673 | 0.293 | 0.671 | 0.021 | 0.333 | 0.103 | 0.131 | 0.535
Surgical normal vs healthy | 0.436 | 0.342 | 0.083 | 0.0563 | −0.0274 | 0.193 | 0.484 | 0.840 | 0.083 | 0.0563 | −0.0274 | 0.193
NOTE: AUC, Wilcoxon P value, and sensitivity at near 95% specificity are used to evaluate the capabilities of all the markers.
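The text does not state how the SE and confidence limits in Table 3 were computed. The tabulated values appear consistent with a simple binomial (Wald) interval, sensitivity ± 1.96 × SE with SE = sqrt(p(1 − p)/n); the sketch below checks that assumption against the hK11 row for cases versus healthy controls. The function name is hypothetical.

```python
import math


def wald_interval(sensitivity, n, z=1.96):
    """Binomial standard error and Wald interval for a proportion.
    Assumed, not confirmed, to be the method behind Table 3."""
    se = math.sqrt(sensitivity * (1.0 - sensitivity) / n)
    return se, sensitivity - z * se, sensitivity + z * se


# hK11, cases (n = 34) vs healthy: sensitivity 0.471 as tabulated.
se, lower, upper = wald_interval(0.471, 34)
print(round(se, 4), round(lower, 3), round(upper, 3))  # 0.0856 0.303 0.639
```

An untruncated Wald interval would also explain the negative lower limits in the bottom rows of the table, which are not meaningful for a proportion.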
Results for hK11. Table 3 shows that hK11 is also a significant classifier of ovarian cancer compared with healthy controls, although it did not do as well as CA 125 (P < 0.0005; AUC = 0.815; and sensitivity of 47.1% at 95% specificity). hK11 also distinguishes between benign controls and ovarian cancer cases (P = 0.009; AUC = 0.716; and sensitivity of 26.5%). However, there is no detected difference between hK11 in benign ovarian disease and healthy controls (P = 0.145), between benign disease and surgical normal controls (P = 0.569), or between surgical normal controls and healthy controls (P = 0.342). So, although hK11 cannot distinguish cases from healthy controls as well as CA 125, it may be less sensitive than CA 125 to benign ovarian disease. Moreover, because of the comparable performance between surgical normal and healthy controls, we can be reasonably confident that this conclusion is not related to ascertainment bias.
Marker combinations. We investigated the ability of hK11 and CA 125 to form a CM with a better ROC curve than either individual marker. The initial logistic regression controlled for menopausal status and indicator of serous or nonserous tumor. Because neither the interaction terms nor the main effects were significant, they were dropped from the regression. Our resulting CM weights standardized CA 125 with a coefficient of 1.126 (P < 0.001) and standardized hK11 with a coefficient of 0.761 (P = 0.094). Because the markers have been standardized, these weights imply that CA 125 conveys 60% [1.126 / (1.126 + 0.761) × 100] of the total information in the panel compared with 40% for hK11. The CM is thus defined by CM = 1.126 × (standard CA 125) + 0.761 × (standard hK11). In the marker panel, the coefficient of standard CA 125 is highly significant, but the coefficient for standard hK11 is only marginally significant, perhaps due to sample size limitations. Thus, we could not be certain that the two markers combine to form an effective diagnostic panel. Addressing whether they combine to diagnose cancer early will require the availability of preclinically collected specimens.
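As a worked illustration of the CM definition, the sketch below applies the reported coefficients to a pair of standardized values and reproduces the 60%/40% split. The function names are hypothetical; the example inputs are the Table 2 medians for all cases (5.464 for CA 125, 1.193 for hK11), used jointly purely for illustration.

```python
W_CA125 = 1.126  # reported coefficient for standardized CA 125
W_HK11 = 0.761   # reported coefficient for standardized hK11


def composite_marker(std_ca125, std_hk11):
    """CM = 1.126 x (standard CA 125) + 0.761 x (standard hK11)."""
    return W_CA125 * std_ca125 + W_HK11 * std_hk11


def relative_weights():
    """Share of the total weight carried by each standardized marker."""
    total = W_CA125 + W_HK11
    return W_CA125 / total, W_HK11 / total


cm_value = composite_marker(5.464, 1.193)
share_ca125, share_hk11 = relative_weights()
print(round(cm_value, 3))                                      # 7.06
print(round(share_ca125 * 100), round(share_hk11 * 100))       # 60 40
```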
The joint behavior of standardized hK11 and CA 125 among cases and all control groups is displayed in Fig. 1. The horizontal axis represents standardized hK11, and the vertical axis represents standardized CA 125. Note that the healthy controls, denoted with “x,” have mean 0 and variance 1 for each marker. The cases, denoted by “o” in the figure, tend to have higher standardized values for each marker than do the various controls, which means a classification rule that uses only one or the other marker has some ability to separate the cases from controls. The role of a CM is to separate cases from controls using two dimensions instead of one. The diagonal line shown in Fig. 1 represents the 95% specificity classification rule estimated from a linear combination of the markers; points above the line are classified as cases and those below are classified as controls. Lines with more (or less) specificity can be represented by lines parallel to that given but higher (or lower) than that shown.
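The geometry of the classification line can be sketched as follows. The CM cutoff that corresponds to 95% specificity is not reported in the text, so the value 2.0 used here is purely hypothetical, as are the function names.

```python
W_CA125 = 1.126  # reported coefficient for standardized CA 125
W_HK11 = 0.761   # reported coefficient for standardized hK11


def boundary_ca125(std_hk11, cutoff):
    """Standardized CA 125 on the decision line at a given standardized hK11.
    The line has slope -W_HK11 / W_CA125 in the (hK11, CA 125) plane."""
    return (cutoff - W_HK11 * std_hk11) / W_CA125


def is_classified_case(std_ca125, std_hk11, cutoff):
    """True when the point lies above the line, i.e. CM exceeds the cutoff."""
    return W_CA125 * std_ca125 + W_HK11 * std_hk11 > cutoff


CUTOFF = 2.0  # hypothetical cutoff; the study's value is not given
print(boundary_ca125(0.0, CUTOFF))           # intercept on the CA 125 axis
print(is_classified_case(3.0, 1.0, CUTOFF))  # point above the line
print(is_classified_case(0.0, 0.0, CUTOFF))  # average healthy subject
```

Raising or lowering the cutoff moves the line parallel to itself, which trades sensitivity against specificity without changing the relative weighting of the two markers.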
The results above suggest that the diagnostic ROC curve of the CM may not be better than that of CA 125 alone. Another potential explanation for this result is the lack of power our study has for evaluating marker combinations due to the small sample size of the cases. Careful consideration of the requirements of finding a marker complementary to CA 125 suggests that such a task could be difficult because CA 125 does so well on its own (its overall sensitivity for ovarian cancer is 80%). In our sample, only four individual cases had levels of CA 125 under the usual reference range, and so there are only four cases where hK11 can improve over CA 125 using the usual ROC criterion at high specificity. Definitive conclusions about the complementarity of hK11 and CA 125 for diagnostic testing can only be reached with larger studies of the type we have undertaken here.
Longitudinal marker behavior. The performance of markers with temporal stability can be improved by accounting for marker history in a longitudinal screening program (8, 33), and several longitudinal algorithms have been proposed (9, 11, 12). In particular, the parametric empirical Bayes algorithm, intended for application of novel markers, makes use of the simple Pearson correlation of a marker measured at two different time points to generate a screening rule (11). Because the CM summarizes both markers into a single numerical score, the parametric empirical Bayes rule can also be applied to the CM.
Figure 2 plots the values of standardized CA 125, hK11, and the CM for the participants who provided repeat measurements. Both CA 125 and hK11 show high temporal stability, with correlations equaling 0.72 and 0.85, respectively. The high temporal stability of CA 125 has been well established (8), but the temporal stability of hK11 is reported here for the first time. The heterogeneity of hK11 and CA 125 implies that the baseline levels of these markers can be highly informative and that, if used for early detection, a longitudinal algorithm that controls for baseline could achieve levels of performance that exceed those of the cross-sectional analyses presented here. For example, McIntosh et al. (11, 12) show that, when used in a longitudinal study, a correlation of 0.85 for hK11 could mean detecting elevations that are 38% smaller than seen in this cross-sectional study while maintaining the same specificity. One cannot conclude how much earlier cancer would be detected using longitudinal samples without preclinical samples from women who eventually went on to develop cancer; this calculation only establishes the possibility of further gain by controlling for baseline, whereas small correlations may rule out such gains.
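The temporal stability measure used here is the simple Pearson correlation between a marker measured at two time points in the same participants. A minimal sketch of that calculation is given below; the function name and input layout (two equal-length lists, one value per participant per visit) are assumptions for illustration and do not reproduce the study data.

```python
import math


def pearson_correlation(first_visit, second_visit):
    """Pearson correlation between paired marker values from two visits."""
    n = len(first_visit)
    if n != len(second_visit) or n < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    mean_x = sum(first_visit) / n
    mean_y = sum(second_visit) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(first_visit, second_visit))
    sxx = sum((x - mean_x) ** 2 for x in first_visit)
    syy = sum((y - mean_y) ** 2 for y in second_visit)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A value near 1 indicates that participants keep their rank and level from one visit to the next, which is the situation in which a personal baseline is informative.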
Discussion
Recent studies have shown that multiple kallikreins are among the most promising biomarkers for diagnosis and prognosis of ovarian cancer (19–27). For example, hK5, hK6, hK8, hK10, hK11, and hK14 (21–26) have shown promise as diagnostic serologic markers, and some kallikreins were also found to have prognostic value. We have previously speculated that kallikreins might represent an enzymatic cascade pathway that is activated in ovarian cancer (34, 35). We also have indications that this proteolytic system may cross-talk with other proteolytic systems, such as the metalloproteinases and the urokinase plasminogen activator system (17).
Here, we have independently validated hK11 as a serologic diagnostic biomarker for ovarian cancer. Our results are independent because we evaluated the concentrations of the marker in cases and controls whose hK11 levels had not previously been measured. In addition, we evaluated the temporal stability of hK11 and have shown that, because the marker is highly heterogeneous in the population, it could do even better if used in a longitudinal algorithm for early disease detection.
We also combined CA 125 and hK11 to evaluate the potential increase in diagnostic efficacy of the resulting CM. In the future, a multiparametric marker panel may be developed by combining other kallikreins that have shown diagnostic value for ovarian cancer (as mentioned above) as well as other newly discovered biomarkers. The value of combining CA 125 and hK11 needs to be confirmed in future studies specifically powered to evaluate such combinations (36).
The high temporal correlation of hK11, comparable with that of CA 125 (Fig. 2), implies that its AUC and sensitivity reported in Table 3 could be improved if longitudinal algorithms are used to monitor for abnormal marker levels. The potential improvement in the detectable limit of a biomarker when using the parametric empirical Bayes rule can be quantified by computing √(1 − correlation), which gives the relative size of a revised reference range for the marker when controlling for screening history. Estimating this for each of the three markers suggests that controlling for screening history would allow us to detect deviations of approximately half the magnitude with the same specificity and sensitivity. In particular, for CA 125, hK11, and the CM, we could detect deviations of 44%, 47%, and 59% the size of those detected without controlling for history, respectively.
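The relative reference range described above is a one-line calculation. The sketch below implements only the square-root expression stated in the text; the function name is hypothetical, and the correlation used in the usage note is an illustrative value rather than one of the study estimates.

```python
import math


def relative_reference_range(correlation):
    """Relative size of the revised reference range: sqrt(1 - correlation).

    A value of 1.0 means no gain from controlling for screening history;
    smaller values mean that smaller deviations become detectable.
    """
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation is expected to lie between 0 and 1")
    return math.sqrt(1.0 - correlation)
```

For an illustrative correlation of 0.75, `relative_reference_range(0.75)` returns 0.5, that is, deviations half the size of those detectable without screening history. A correlation of 0 returns 1.0, which is why small correlations rule out such gains.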
Grant support: National Cancer Institute grant P50 CA83636. Research on kallikreins in E.P. Diamandis' lab is supported by grants from the National Cancer Institute (grant R21CA093568), the Natural Sciences and Engineering Research Council of Canada, and IBEX Technologies, Inc., Montreal, Quebec, Canada.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.