Abstract
Recent advances in molecular technology are leading to the discovery of new tumor biomarkers that may be useful for cancer screening and early diagnosis. Translating a potential screening biomarker from the laboratory to its use in patient care may require an algorithm or screening rule for its application. An algorithm that can detect the smallest deviation from a defined norm is likely to achieve the highest sensitivity, but any practical screening algorithm must do so with strict controls on test specificity to avoid false-positive results, and unnecessary patient alarm and risk. Longitudinal algorithms that make use of previous tumor marker values and trends are likely to obtain improvements over single threshold rules. Thus far, a few longitudinal screening algorithms have been proposed (e.g., using serial prostate-specific antigen values for the detection of prostate cancer and serial CA125 values for the detection of ovarian cancer), but these algorithms are not appropriate for novel tumor marker discoveries, because they rely on unverifiable assumptions that may not translate to the behavior of the new marker. The algorithm presented here is motivated by: (a) the need to develop an algorithm for early detection using novel markers; (b) the practical demands on data and specimen availability; and (c) the need to be robust enough to accommodate a wide range of tumor growth behavior. We use Parametric Empirical Bayes statistical theory to model the trajectory of markers over time in a cohort of asymptomatic healthy subjects, and use the estimated trajectory to produce person-specific thresholds that depend on the screening history of each person. The thresholds are chosen to give the person (or population) a specified false-positive rate. The resulting algorithm is simple and can be represented in a simple graph or a chart. The statistical analysis needed to generate the algorithm can be found in nearly every basic statistical package. 
The algorithm is highly robust and can detect a wide range of tumor behaviors. The Parametric Empirical Bayes screening algorithm should take a central role when evaluating marker discoveries for use in screening. The algorithm is particularly useful when screening with a new marker of which the behavior in the preclinical period is not well known.
Introduction
The most common tumor marker screening algorithm is the single threshold (ST) rule, which gives a positive (likely to have disease) outcome for any subject whose marker concentration deviates beyond a common population-wide threshold. However, when tumor marker behavior is heterogeneous across individuals, a better approach is to use a longitudinal algorithm to tailor the threshold to the individual screening history. Complex longitudinal algorithms, such as those using PSA values to detect prostate cancer (1, 2, 3) or CA125 values to diagnose ovarian cancer (4), have been proposed, but those algorithms are not appropriate for general use, for new tumor marker discoveries, or for other diseases. These algorithms are computationally intensive and require substantial modeling of the tumor marker growth during the preclinical period. When modeled adequately, these algorithms yield high sensitivity, but the specimens needed to observe and define the preclinical trajectory are so rare that these approaches are impractical for general use. The marker trajectory can be observed only by having access to several serially collected specimens taken from several cases during their preclinical period. Such observations are extremely rare for many diseases, and even stored blood specimens collected as part of large longitudinal studies may not be sufficient. For example, the ovarian cancer screening study by Jacobs et al. (5) randomized half of nearly 22,000 postmenopausal women to three annual primary screens with CA125 and a secondary screen with ultrasound. Only 6 ovarian cancer cases were found during screening, with another 10 reported within a 7-year follow-up. However, most of the cases (the 10 found during follow-up) will not have had serial serum samples taken during their early preclinical period. 
Specimens from the cases found during screening were collected at annual intervals, not frequent enough to observe the early preclinical behavior of a disease such as ovarian cancer where the median preclinical duration is thought to be <18 months. Moreover, in diseases where stored specimens are sufficient, the storage conditions cannot anticipate the processing requirements of all of the future tumor markers.
An alternative to those data- and computation-intensive algorithms is presented here. This algorithm is particularly appropriate for screening with new markers or in new populations. Our approach generates screening rules by focusing its attention on the trajectory of the markers in healthy women, for whom data are plentiful. The algorithm then determines whether a general, rather than a specific (i.e., exponential or linear), change in marker behavior has occurred. The algorithm may be calibrated to work at any specificity, chosen either for the population or for each individual. The required specificity and the screening history of the subject are used together to compute a person-specific threshold that is used to assess the next screening event. In this way the behavior of the marker is characterized and used to reflect the disease process or tumor growth and progression. Computing the threshold is very simple and involves computing the sample average of the subject's screening history. Then, the screening threshold can be read off a graph or looked up in a simple table. Our approach is based on PEB statistical methods, and we refer to our method as the PEB screening algorithm or simply the PEB rule. A full technical description of the algorithm derivation and theoretical properties is given by McIntosh and Urban (6), and is outlined below in enough detail for the reader to understand its properties and implement it.
The PEB screening rule uses PEB statistical theory to infer the individual normal marker levels of each subject, accounting for the population heterogeneity. The estimated normal levels of the individual are then used to produce a threshold tailored to his/her normal behavior. This normal behavior is revealed over time by event-free screening. The PEB screening rule has the surprising property of giving the vast majority of women a threshold far lower than the single threshold rule while maintaining the same population-wide specificity. Thus, in theory, cancers can be detected at an earlier stage without increasing the burden on the healthy population. It can be shown technically that the PEB rule uniformly outperforms the single threshold rule, in that for any level of specificity the PEB rule achieves a higher sensitivity (6).
The following two sections describe the behavior the PEB algorithm requires of a tumor marker and the data needed to implement the algorithm. We also contrast the PEB requirements with the existing longitudinal algorithms for CA125 and PSA mentioned above. We do this to point out that the PEB assumptions are typically less restrictive but also to point out that those other approaches may be considered when their more stringent requirements are met. The third section below describes how to estimate the parameters needed for the PEB algorithm and presents the algebraic expression to determine the threshold. An informal discussion of the technical aspects of the PEB algorithm follows. Finally, we discuss aspects of implementing the PEB rule that should be considered before applying the model to an actual population.
Materials and Methods
Marker Behavior Required of the PEB Screening Algorithm.
A PEB screening algorithm can be generated for any marker meeting a modest set of requirements. Specifically: (a) marker concentrations in healthy subjects are continuous; (b) marker concentrations tend to either remain unchanged or change in a known direction (elevate or decrease) with the onset of cancer or other pathology or disorder; (c) covariates (fixed and time varying) that describe large differences in marker behavior in healthy subjects are available and accounted for (see “Discussion”); and (d) the serial variation of the marker is not extreme over time in healthy subjects. These assumptions appear widely applicable to many tumor markers (7, 8). Importantly, note that no specific assumptions are made about the growth of a marker at cancer onset other than that it elevates (or declines) with cellular transformation.
The first assumption suggests that the marker concentrations, with some transformation, will have a normal distribution in healthy subjects. The second assumption allows us to know in which direction to look for deviation in the marker when disease develops. The final two assumptions allow us to model the trajectory over time for healthy women without substantial mathematical modeling. Each of these assumptions can be relaxed for the general PEB rule but not for the simpler form presented in detail in this report. See McIntosh and Urban (6) for the most general presentation.
The complex longitudinal algorithms also require these assumptions, but in those applications they are even more specific and restrictive. Those algorithms require a specific (i.e., log) transformation to normality (a restrictive form of assumption a), a constant linear marker growth after cancer onset (a restrictive form of assumption b), and have the same restrictions on serial correlation. Moreover, the transformation to normality must be on the same scale that gives the marker a linear growth. These restrictions are violated, for example, by any marker that is normal on the raw scale and grows with an additive exponential model. They are also violated if the tumor growth model meets all of the assumptions but the growth is not constant over the entire preclinical period. There is no evidence that the same uniform growth model should hold for every tumor type (9), much less for every potential marker, whose growth may not follow the tumor volume exactly. However, if those requirements are met, and the data are available to estimate the growth model, those complex algorithms may achieve higher sensitivity than the PEB algorithm presented here.
Specimen Requirements to Implement the PEB Algorithm.
Novel tumor markers are typically evaluated by measuring their concentrations in stored specimens. Most commonly, the stored specimens are of two types: specimens from healthy control subjects and specimens from clinically presenting cases. Another easily obtained category of specimens is serially collected samples, measured in the same control individual at two time points, typically separated by several months. These serially measured samples from control subjects are necessary to implement the PEB algorithm. Sequential measurements on healthy controls, although not as abundant as cross-sectional specimens from healthy controls, can be obtained in substantial abundance over a short period of time. For technical and practical reasons, specimen collections should not be so closely spaced that the serial correlation assumption above is violated, nor spaced more widely than the proposed screening interval. The appropriate interval is likely to be both marker- and disease-dependent, but work with a large set of markers for ovarian cancer has found that specimens taken at 1-month intervals were sufficiently spaced to satisfy these requirements.
The PEB algorithm can be additionally improved if individual characteristics that can predict the marker behavior in healthy subjects are known. For example, the ovarian cancer tumor marker CA125 is known to have substantial differences based on the menopausal status of a woman or the presence of benign gynecologic conditions. These conditions may not contribute to the ovarian cancer risk substantially, but they may cause statistically significant changes in the CA125 values. Characteristics believed to influence the behavior of the markers in healthy subjects should be recorded and evaluated before implementing the PEB algorithm. The algorithm presented below will assume it is estimated separately for women having similar values of such covariates, but extensions can be made to include covariates in the PEB rule formally (see Ref. 6). Omitting fixed covariates when estimating the PEB rule will not invalidate the algorithm, but it will be less efficient. However, time varying covariates that predict marker behavior must be accommodated (e.g., age for the prostate cancer marker PSA). These points hold for all of the screening algorithms not only the PEB approach. The precision of the PEB algorithm will depend on the number of serial controls available to estimate the parameters, but only a modest number are needed to implement the algorithm.
The specimen needs of the PEB algorithm are far less demanding than those required by the more complex longitudinal screening approaches. Those algorithms have great demands on the rarest type of specimen: serial observations in cancer cases obtained at high frequency during the preclinical period. Such specimens are extremely difficult to come by and typically suggest that implementing the algorithms for new markers or diseases would rely on untested assumptions. Thus, to use those methods for all but highly prevalent and slow growing diseases requires, like the early work by Skates (10, 11), a substantial amount of time to accumulate the information in order to fit and implement the algorithm (4).
How to Estimate the Parameters and Implement the PEB Algorithm.
Presently, we assume that the marker concentration has been measured serially on a large number, N, of healthy control subjects. For clarity of exposition we assume: (a) the control subjects have contributed exactly two marker observations; (b) all of the subjects are relatively homogeneous in regard to covariates that predict the normal marker levels (see section above); and (c) the markers are measured on a scale giving them a normal distribution in controls. The first and second assumptions are made only to reduce the notational burden and are not needed in practice, although relaxing them will require more detailed modeling (see Ref. 12 for details about how to measure these parameters in a more general setting and Ref. 6 on how to extend those to include time dependent and fixed covariates more formally). The final assumption (c) is necessary for our method to function properly, but it does not limit the generality of our method for any marker having continuous measurements, because then the existence of a transformation to normality is assured. Transformation can be done by some mathematical transformation such as log, or by simply translating the quantiles of the empirical marker distribution to those of the standard normal distribution.
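The quantile-based transformation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `normal_scores`, the plotting position (rank − 0.5)/n, and the sample values are all assumptions made for the example, and ties are not treated specially because the marker is assumed continuous.

```python
from statistics import NormalDist

def normal_scores(values):
    """Map each value to the standard normal quantile of its rank.

    Assumption: plotting position (rank - 0.5) / n, which keeps every
    probability strictly inside (0, 1). Ties are ranked in input order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores

# Hypothetical raw marker concentrations from five control subjects.
raw = [8.0, 12.0, 16.0, 35.0, 10.0]
z = normal_scores(raw)
```

The transformed scores preserve the ordering of the raw values, so a marker that elevates with disease on the raw scale also elevates on the transformed scale.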
The PEB algorithm requires that we examine the marker concentration mean, denoted μ, and decompose its variance into its total, within, and between components, denoted by V, σ2, τ2, respectively. Note that by definition V = σ2 + τ2, and so in actuality we need to estimate only three parameters. If subject i donates two sequential marker concentrations we denote the first and second level with Yi1 and Yi2, respectively. Our parameters are defined formally as follows: μ = mean [Yij], the mean of all marker measurements, V = Var[Yij], the variance of the markers in the population, σ2 = ½Var[Yi2 − Yi1] the marker variance within a subject, and τ2 = V − σ2, the between subject heterogeneity of the markers. Decomposing the marker variance into its within- and between-components allows us to characterize the relative subject heterogeneity in the population by the quantity B1 = τ2/V, which is often called the intraclass correlation (the reason for the subscript will become apparent below). Describing the most efficient manner to estimate these quantities is not in the scope of this article, but the expressions above do suggest simple ways to do so (see Ref. 12 for more efficient methods that can accommodate different numbers of observations per subject).
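The definitions above translate directly into simple moment estimates. The sketch below assumes exactly two transformed measurements per control subject; the function name `estimate_peb_parameters` and the data are hypothetical, and these are the simple estimators suggested by the definitions rather than the more efficient methods of Ref. 12. With few subjects, the estimate of τ2 can come out negative, which would need separate handling.

```python
from statistics import mean, variance

def estimate_peb_parameters(pairs):
    """Moment estimates of mu, V, sigma2, tau2 and B1 from (y1, y2) pairs."""
    all_values = [y for pair in pairs for y in pair]
    mu = mean(all_values)                  # population mean
    V = variance(all_values)               # total variance (sample variance)
    diffs = [y2 - y1 for y1, y2 in pairs]
    sigma2 = 0.5 * variance(diffs)         # within-subject variance
    tau2 = V - sigma2                      # between-subject variance
    return {"mu": mu, "V": V, "sigma2": sigma2, "tau2": tau2, "B1": tau2 / V}

# Hypothetical paired log-scale measurements from four control subjects.
controls = [(2.1, 2.3), (2.8, 2.6), (1.9, 2.0), (2.5, 2.9)]
params = estimate_peb_parameters(controls)
```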
The PEB rule uses these quantities to determine the threshold for a subject from his/her screening history. Suppose a subject with n historical screens with an average concentration denoted by ȳn is about to undergo her next screen, and we wish the screen to operate at specificity α. The threshold for the next screen is given by the PEB algorithm to be equal to:
$$\text{Threshold} = \mu + (\bar{y}_n - \mu)\,B_n + z_\alpha \sqrt{V - \tau^2 B_n}$$
where
$$B_n = \frac{\tau^2}{\sigma^2/n + \tau^2}$$
and zα is the α quantile of a standard normal distribution (e.g., zα = 1.96 when α = 0.975). Note that in the expression above only the mean ȳn and shrinkage factor Bn need to be recomputed as screening history accumulates, because all of the other quantities remain the same throughout the screening study. An important special case is an initial screen of a subject, when n = 0. At the initial screen Bn = 0, and so the threshold becomes Threshold = μ + zα√V.
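A minimal sketch of the threshold computation follows. It assumes the threshold has the form μ + (ȳn − μ)Bn + zα√(V − τ2Bn), which is what the PEB estimator and its variance described in the Discussion imply; the function names `shrinkage` and `peb_threshold` are illustrative, not taken from the source.

```python
from math import sqrt
from statistics import NormalDist

def shrinkage(n, sigma2, tau2):
    """B_n = tau2 / (sigma2/n + tau2), with B_0 = 0 when there is no history."""
    if n == 0:
        return 0.0
    return tau2 / (sigma2 / n + tau2)

def peb_threshold(ybar, n, mu, sigma2, tau2, specificity):
    """Person-specific threshold on the transformed (normal) scale.

    Assumed form: mu + (ybar - mu) * B_n + z * sqrt(V - tau2 * B_n).
    ybar is ignored when n == 0 because B_0 = 0.
    """
    V = sigma2 + tau2
    b_n = shrinkage(n, sigma2, tau2)
    z = NormalDist().inv_cdf(specificity)
    return mu + (ybar - mu) * b_n + z * sqrt(V - tau2 * b_n)

# With no history the rule reduces to the single threshold mu + z * sqrt(V).
initial = peb_threshold(0.0, 0, mu=2.27, sigma2=0.09, tau2=0.21, specificity=0.98)
```

Only `ybar` and `n` change from screen to screen; the remaining arguments are fixed once the control data have been analyzed.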
Results
Example: Using CA 125 to Screen a Population of High-risk Women.
Here we demonstrate the PEB rule when using CA 125 to screen a population of women for ovarian cancer, with the algorithm operating at a specificity of 98%. Several publications have found that the natural logarithm of CA 125, log(CA 125), is very nearly normally distributed (4, 8, 10, 11, 12), and so here we generate the PEB rule using log(CA 125). The behavior of log(CA 125) may depend heavily on the population and the type of assay used (13). Here we use the population investigated by Crump et al. (8), which is a population of women at high risk for ovarian cancer and screened as part of an ongoing screening program at the Gilda Radner Ovarian Cancer Detection Program at Cedars-Sinai Medical Center in Los Angeles, California (14). The following behavior of log(CA 125) has been estimated in that population: μ = 2.27, V = 0.30, τ2 = 0.21, σ2 = 0.09, and B1 = τ2/V = 0.68 (the rounded values shown give 0.21/0.30 = 0.70). For 98% specificity, z0.98 = 2.05.
Fig. 1 represents the PEB screening rule graphically, and Table 1 shows the PEB rule in tabular form. For convenience, either can be used in place of the formal PEB expression above. Fig. 1 represents the screening rule using log(CA 125) (left axis) and raw CA 125 (right axis). Note that the PEB rule is linear only on the transformed scale, and so the screening history must be summarized by the mean of the log(CA 125) values. This is not the same as first computing the average of the raw CA 125 values and then taking the log. Doing the latter will give a threshold that is higher than it should be.
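The difference between the two summaries can be seen with a short calculation. The CA 125 values below are hypothetical and chosen only to illustrate the point; the log of the average is never smaller than the average of the logs, which is why using it inflates the threshold.

```python
from math import log
from statistics import mean

# Hypothetical raw CA 125 values from three previous screens.
history = [10.0, 20.0, 40.0]

mean_of_logs = mean(log(x) for x in history)  # summary the PEB rule requires
log_of_mean = log(mean(history))              # incorrect summary, biased upward

print(f"mean of logs: {mean_of_logs:.3f}, log of mean: {log_of_mean:.3f}")
```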
The lines in Fig. 1 show the serum concentration thresholds leading to a positive screen for a woman based on her screening history. The horizontal axis in Fig. 1 represents the numeric average of her historic log(CA 125) values, and each line in the figure is used to determine the threshold based on the value and the amount of screening history available for each woman. To determine the threshold for a woman undergoing screening, a clinician will: (a) compute the average of her previous log(CA 125) values; (b) locate the line in the graph or the column in the table corresponding to the number of historical screens used in step a; and (c) if using the table, find the row whose entry in column 1 matches the average computed in step a, or, if using the graph, find the point on that line directly above the value computed in step a and then read directly across to the vertical axis to find the threshold. For best accuracy, the PEB rule should be computed every time or represented in a table, like Table 1. The following example demonstrates how the PEB rule works on a hypothetical screening history of a woman.
Consider a woman whose first 3-year experience with screening found CA 125 concentrations equal to 8, 16, and 12, in her first, second, and third screen, respectively. The natural log of these concentrations is 2.1, 2.8, and 2.5, respectively. At the initial screen, when no screening history is available, the PEB rule is mathematically equivalent to the single threshold rule, and so a threshold of 2.27 + 2.05√0.30 ≈ 3.39 on the log scale (a CA 125 concentration of about 30) applies.
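The thresholds this woman would face at each successive screen can be sketched as below. This is an illustration under stated assumptions: the threshold form μ + (ȳn − μ)Bn + zα√(V − τ2Bn) is the one implied by the PEB estimator and variance given in the Discussion, and the rounded parameter values quoted in the text are used, so B1 evaluates to 0.70 rather than the reported 0.68 and the results may differ slightly from Fig. 1 and Table 1.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean

# Rounded log(CA 125) estimates quoted in the text.
MU, V, TAU2, SIGMA2 = 2.27, 0.30, 0.21, 0.09
Z = NormalDist().inv_cdf(0.98)  # about 2.05

def peb_threshold(log_history):
    """Threshold for the next screen given previous log(CA 125) values."""
    n = len(log_history)
    if n == 0:
        return MU + Z * sqrt(V)  # no history: single threshold rule
    b_n = TAU2 / (SIGMA2 / n + TAU2)
    ybar = mean(log_history)
    return MU + (ybar - MU) * b_n + Z * sqrt(V - TAU2 * b_n)

raw_history = [8, 16, 12]
logs = [log(x) for x in raw_history]

# Threshold applied at screens 1 to 4, each using only the earlier screens.
for k in range(len(logs) + 1):
    t = peb_threshold(logs[:k])
    print(f"screen {k + 1}: log threshold {t:.2f}, raw CA 125 threshold {exp(t):.1f}")
```

Under these assumptions every threshold after the first screen falls below the initial single threshold, because this woman's history lies close to the population mean.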
Estimating the Performance of the PEB Algorithm in a Population.
A comprehensive ovarian cancer microsimulation model has been used to evaluate the behavior of the PEB algorithm in a large cohort of women in the United States (15). Our microsimulation model evaluates the performance of the PEB rule when used in a cohort of 1 million United States women who begin screening at age 50 and stop when they turn 85 years of age. The microsimulation model is explained in detail in Urban et al. (16), and described briefly below. These simulations are quite comprehensive. The life histories (cancer incidence, cancer death, treatment, and other causes of death) are modeled to reflect the United States population.
Urban et al. (15) used the microsimulation model and the PEB algorithm to estimate the performance of screening expected when using a screening protocol recommended by Skates et al. (11), who recommend two levels of positivity for a screen: positive screen for extreme elevations and early recall for modest ones. We used a false-positive rate of 2% to define the extreme and 15% for modest (i.e., 13% of healthy women were recalled early). With annual screening, this configuration of the PEB algorithm detects 70% of cancers before their clinical diagnosis, whereas the ST rule finds only 46% of them. Importantly, the PEB rule finds >50% of all cancers at early stage compared to 30% by the ST rule and 20% without screening. The PEB rule found an expected 31% drop in cause-specific mortality compared with 18% using the ST rule. This simulation study used B1 = 0.6, which is lower than what is now reported in general risk populations with modern assays (12). Using a greater value would yield even greater improvement.
The predictions of Urban et al. (15) are very close to those of Skates and Pauler (4), who predict 60% of cancers detected at early stage using a complex longitudinal algorithm and more favorable assumptions. Skates and Pauler (4) assume a larger B1, a longer duration of early stage disease, and recall 15% of women early. Other aspects of their simulation are not described in enough detail to make our simulations comparable.
The simulations summarized above are specific to ovarian cancer and CA 125. Here we use a comprehensive ovarian cancer microsimulation model to evaluate the performance of the PEB rule for hypothetical novel markers having B1 between 0 and 0.9. This is somewhat representative of the range found for a host of ovarian cancer markers (8). The microsimulation model is explained in detail elsewhere (16) but is described briefly at the end of this section.
Table 2 summarizes the simulations when using a marker behaving like CA 125 in all respects but with widely different B1. Note that when B1 = 0 the PEB rule equals the ST rule. The simulation aged 1 million women from age 50 to death with cancer incidence matching SEER. A total of 15,660 of the 1 million women were found with cancer before their natural death (nearly 1 in 64), and 77% of them died from their disease. The rows of Table 2 summarize their expected experience if they had instead been screened between ages 50 and 80 using the PEB algorithm. Note that the PEB algorithm used here is operating at a specificity of 98% and does not perform early recall. Thus, these results should not be compared to those of Skates and Pauler (4).
Row 1 reports the fraction of all cases that would have been detected earlier, row 2 measures the average number of years earlier that all cases would have been detected, row 3 represents the fraction of cases that would have been detected at an earlier stage, row 4 represents the absolute reduction (from 77%) of cancer cases dying of ovarian cancer, and row 5 approximates the mean years extended for all cases. We emphasize that the performance in Table 2 averages the experience of all of the cases, not just those that were screen detected.
The performance of screening improves as B1 increases, and the PEB rule achieves better performance than the ST rule even when B1 is small and the possibility of contamination is high. The ST rule detects 46% of the cases early, but the PEB rule can detect up to 50% of cases when B1 is high (a relative improvement of about 8%). This improvement is not trivial if detection also occurs earlier with the PEB rule, and it does: the PEB rule detects cases on average up to 1.5 months earlier than the ST rule (0.93 years versus 0.80 years), which for many women is enough to achieve a stage shift. Indeed, only 35% have a stage shift with the ST rule, but up to 42% have one with the PEB rule (a 20% relative increase). Because mortality reductions depend primarily on a stage shift, this leads to a substantial improvement in cancer mortality reduction, from 19% to 26% (a relative increase of nearly 40%). We may also conclude that the PEB algorithm is more cost effective, because it achieves its performance without increasing the number of screening events.
Summary of the Ovarian Cancer Microsimulation Model.
For ovarian cancer and CA 125, screening is done in a multimodal manner where CA 125 is the first stage of a screening procedure; a positive CA 125 screen is used to refer women to additional diagnostic work up using transvaginal sonography. All of the women are assigned an age at clinical diagnosis with ovarian cancer, a stage of disease at clinical diagnosis, and an age at death because of ovarian cancer to match United States experience (SEER cancer data). We also assign each subject an age of death attributable to other causes to match the United States Census Bureau life tables. The microsimulation model then infers a preclinical duration backward in time from the age of clinical diagnosis to cancer onset. The preclinical duration is assigned a value that depends on the age and the cancer stage at the time of detection. The cancer marker is then assigned growth behavior over the preclinical period in a manner similar to Skates and Pauler (4) and Pauler et al. (12): log(CA 125) has linear growth with a random (person-specific) slope. The cancer is given four sequential stages over the preclinical period representing four progressive FIGO cancer stages, with time spent in each stage based on the opinions of experts. Thus, the observable quantities for women when not screened match the United States SEER registry and United States Census tables, and their unobserved quantities match currently understood behavior of the tumor biology.
An annual screening regimen is imposed on the life histories of these women. We calibrate the PEB algorithm to operate at false-positive rate = 0.01, and the transvaginal sonography component of the screen is assumed 100% sensitive for all tumors older than 6 months and 80% sensitive overall (17, 18). If detected earlier by screening, the simulation model uses the age and stage at the screen detection to re-establish the cause-specific life expectancy. The primary benefit of screening comes when a stage shift is achieved. Women diagnosed at an earlier stage are given extended cause-specific life expectancy, and women without a stage shift are not given any benefit from screening. If the new ovarian cancer age of death extends past her assigned other cause of death then she does not die of the disease, and cancer mortality is reduced. Thus, women can be unaffected by screening (if not detected or detected still at advanced stage), have life extended but still die of the disease (if new ovarian cancer death does not extend past other cause of death), or have life extended and die of another cause. Only in this latter case has ovarian cancer-specific mortality been reduced. The model compares the life histories under screening and without screening, and compares the rates of early detection, stage shifts, lead-time, and mortality effects.
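The three possible outcomes for a simulated case can be written as a small decision function. This is only a sketch of the logic described in the paragraph above; the function name `screening_outcome`, its arguments, and the outcome labels are hypothetical and are not taken from the microsimulation model itself.

```python
def screening_outcome(stage_shift, new_cancer_death_age, other_cause_death_age):
    """Classify one simulated case under screening (hypothetical helper).

    stage_shift: True if screening detected the cancer at an earlier stage.
    new_cancer_death_age: cancer death age re-established after a stage shift.
    other_cause_death_age: assigned age at death from other causes.
    """
    if not stage_shift:
        # Not detected, or detected without a stage shift: no benefit assigned.
        return "unaffected"
    if new_cancer_death_age > other_cause_death_age:
        # Only this outcome reduces ovarian cancer-specific mortality.
        return "life extended, dies of another cause"
    return "life extended, dies of ovarian cancer"

# Hypothetical case: stage shift moves cancer death from before to after age 82.
outcome = screening_outcome(True, new_cancer_death_age=86.0, other_cause_death_age=82.0)
```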
Discussion
The usual method for screening with a tumor marker, the ST rule, gives all women the same population-wide threshold for positivity. This threshold is determined by finding a population-wide reference range, or quantile, and defining a positive screen when the marker exceeds that threshold. Using the α quantile gives a rule with that specificity. A marker transformed to a normal distribution has μ + zα√V as this threshold.
The PEB rule accounts for heterogeneity and attempts to give each woman the same selected level of specificity. This is done by using PEB statistical methods to estimate the normal marker concentration of each person then determining the threshold defined by deviations from his/her normal levels, rather than population-wide deviations. Without showing the details, it can be shown (6) that the PEB rule: (a) always has higher sensitivity than the ST rule with the same specificity; and (b) has the same overall population-wide specificity as the ST rule.
PEB refers to a class of statistical procedures for estimating a group of individual means when the individuals are drawn from a common population (19). Casella (20) and Efron and Morris (21) give introductory descriptions of PEB. The PEB estimator of the normal marker mean of a subject is given by μ + (ȳ − μ)Bn, where ȳ is the average of her historical screens. Note that the PEB estimator is part of the threshold expression above and that the PEB estimate is also a function of the shrinkage factor, Bn = τ2/(σ2/n + τ2). The shrinkage factor ranges from zero (when n = 0) to one (when n is large), and the PEB estimator correspondingly ranges from the population mean μ to the sample average ȳ of the individual. The PEB estimate comes into closer agreement with the individual history when history is plentiful, but with small n the PEB estimator is closer to the population average. Thus, the PEB estimator gives the individual history a greater voice when that subject has a substantial history but anticipates regression to the mean for subjects with little screening history.
The PEB estimator has two properties that make it useful for screening: (a) it is unbiased for estimating an individual mean level; and (b) it is more precise (has a smaller variance) than the usual estimator, ȳ. In particular, the variance of the PEB estimator is given by (σ2/n)Bn, which is always <σ2/n, the variance of ȳ. The PEB estimator has added precision compared with the sample average, because it uses information shared among the group. This is often called “borrowing of strength” in meta-analysis. Combining these two properties with the variability of a new observation (σ2), the threshold expression above follows. We note here that generating screening rules using the sample average, ȳ, instead of the PEB estimator can lead to screening rules that do worse than the single threshold rule and can never do better than the PEB screening rule (see Ref. 6 for derivation and additional discussion).
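The step from these two properties to the threshold can be written out explicitly. The following is a sketch of the algebra under the definitions used in this report (V = σ2 + τ2 and Bn = τ2/(σ2/n + τ2)); the conditional-variance notation is introduced here only for illustration, and Ref. 6 remains the authoritative derivation.

```latex
\begin{align*}
\operatorname{Var}\left(Y_{n+1} \mid \bar{y}_n\right)
  &= \sigma^2 + \frac{\sigma^2}{n} B_n
     && \text{(new observation plus PEB estimator variance)} \\
  &= \sigma^2 + \tau^2 (1 - B_n)
     && \text{since } \frac{\sigma^2}{n} B_n = \tau^2 (1 - B_n) \\
  &= V - \tau^2 B_n , \\[4pt]
\text{Threshold}
  &= \mu + (\bar{y}_n - \mu) B_n + z_\alpha \sqrt{V - \tau^2 B_n} .
\end{align*}
```

Setting Bn = 0 recovers μ + zα√V, and setting Bn = 1 gives ȳ + zασ, which are the two limiting cases of the rule.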
Two special cases of the PEB rule are when there is no screening history, n = 0 (and so Bn = 0), and when screening history is plentiful, n = ∞ (and so Bn = 1). With the former, the PEB rule and the ST rule coincide, and with the latter the PEB threshold becomes ȳ + zασ.
In between the two extremes, when screening history n is small to modest, the PEB rule is a compromise between the population-wide rule and the limiting individual rule. However, Fig. 1 shows that approaching the performance of the limiting rule does not require a large screening history, and large benefits can follow after only a small number of screens. In Fig. 1 the second screening event brings a dramatic adjustment to a woman’s threshold, with each subsequent screen providing comparatively smaller gains. Indeed, the screening rule at the fourth screen is practically equivalent to the limiting rule for the value of B1 used in our example.
The reason why the PEB rule is able to increase sensitivity without changing the population specificity can be seen by examining Fig. 1 more closely. Fig. 1 shows that the PEB thresholds (the lines) diverge from the single threshold rule (horizontal line), depending on the mean screening history; for our example, women with a mean screening history ȳ < 2.6, approximately, are given lower thresholds than the ST rule, and other women have their thresholds raised above the ST rule. However, the average woman has ȳ = 2.06, far under this level, and so the majority of women will have their threshold lowered if the PEB rule is used. Indeed, at the second screen ∼84% of all of the women will have a lower threshold, and on subsequent screens that fraction increases to 88%, 89%, and 95% for n = 2, 3, and ∞, respectively. That is, up to 95% of all women can have their cancer detected at lower marker levels than the ST rule allows. The reduction in the threshold can be dramatic, with the average woman receiving a cutoff of ∼17 on the raw scale, nearly half the ST cutoff. The exact fraction that gets a lower threshold is marker dependent and is determined by the intraclass correlation B1; higher values imply more women with lower thresholds. However, for any value of B1 > 0, more than half of the subjects will be given lower thresholds.
Of course, Fig. 1 also shows that a small minority of women have higher thresholds with the PEB rule, and at first it may appear that the PEB rule does not benefit, and may even harm, these women. Additional consideration shows that these women benefit as well, but in a different way. The PEB rule works by controlling the specificity of the screen for each woman. Compare this with the ST rule, which controls only the population specificity. Overall, the PEB and ST rules will produce the same rate of false positives, but with the PEB rule each woman shares this burden equally. For ovarian cancer screening with the CA125 ST rule, only a small fraction of all women, those “unlucky” enough to have naturally high CA125 concentrations, experience the false positives. In our example it can be shown that 5% of all women are responsible for >80% of all false positives in the population. This burden is great enough that many women may forgo screening because of it. The PEB rule automatically adjusts the thresholds so that women share a controlled chance of experiencing a false-positive test, and all share the same ability to detect comparable elevations. This switch from a population to an individual perspective for screening permits a dramatic increase in sensitivity without any increase in the false-positive rate.
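The uneven burden of false positives under a ST rule can be illustrated numerically. The sketch below assumes the hierarchical-normal model in standardized units (total variance 1, so τ² = B1 and σ² = 1 − B1) and integrates each woman's false-positive probability over the distribution of her true mean level. The function name and parameter values are hypothetical, and the result is not expected to match the 5%/80% figure quoted above, which comes from the fitted CA125 model.

```python
from math import sqrt
from statistics import NormalDist


def st_false_positive_share(top_fraction: float, b1: float, fpr: float,
                            steps: int = 4000) -> float:
    """Share of all ST false positives borne by the `top_fraction` of women
    with the highest true mean marker level (requires 0 < b1 < 1)."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - fpr)                  # ST cutoff in standardized units
    tau, sigma = sqrt(b1), sqrt(1.0 - b1)
    t_cut = nd.inv_cdf(1.0 - top_fraction)     # cut on the standardized true mean
    lo, hi = -8.0, 8.0
    h = (hi - lo) / steps
    total = top = 0.0
    for i in range(steps):                     # midpoint rule over the true mean
        t = lo + (i + 0.5) * h
        p_fp = 1.0 - nd.cdf((z - tau * t) / sigma)   # P(false positive | mean)
        mass = nd.pdf(t) * p_fp * h
        total += mass
        if t > t_cut:
            top += mass
    return top / total


if __name__ == "__main__":
    print(round(st_false_positive_share(0.05, b1=0.7, fpr=0.02), 3))
```

Under the PEB rule each woman's false-positive probability equals the target rate by construction, so the corresponding share for any 5% of women would simply be 5%.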
McIntosh and Urban (6) investigate the conditions sufficient for the PEB algorithm to always meet or exceed the performance of the ST rule. Particular attention is paid to the performance in subjects after one or many false-negative screens, in whom our algorithm at first appears to pose a potential problem: could a sequence of false-negative screens, by biasing the screening history, mask the evidence of disease at the next screen and delay detection? McIntosh and Urban (6) show that the algorithm is very robust to false negatives and give conditions under which the PEB rule outperforms the ST rule even for subjects who experience one or more false negatives. The conditions are quite general and verifiable for each marker; as long as B1 is not small, or the screening history ȳn contains one or more unbiased screens, subjects with false negatives will overall fare better with the PEB rule, even for diseases that progress very slowly. See McIntosh and Urban (6) for specific details.
The PEB algorithm requires the markers to have a normal distribution. If no convenient mathematical transformation can achieve this, then a nonparametric transformation can be made. The simplest way to do this is to tabulate the quantiles of the observed marker values and then convert each to the corresponding quantile of a standard normal distribution. Thus, a marker at the 97.5% quantile in the test population would be given a value of 1.96, the median value would be converted to 0.0, and so forth. This gives the required behavior but may require larger amounts of data so that the conversion table is not too sparse. If formal inclusion of covariates is needed, this will require quantile regression methods and, thus, will add computational complexity.
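The nonparametric conversion described above can be sketched as follows. This is one simple way to implement a quantile-to-normal-score mapping, not necessarily the procedure used by the authors; the function name, the mid-rank convention for ties, and the clamping of values outside the reference range are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from statistics import NormalDist


def make_normal_score_transform(reference_values):
    """Return a function mapping a raw marker value to a standard normal score,
    based on its empirical quantile among `reference_values` (healthy subjects)."""
    ref = sorted(reference_values)
    m = len(ref)
    if m == 0:
        raise ValueError("reference_values must not be empty")
    nd = NormalDist()

    def transform(value: float) -> float:
        below = bisect_left(ref, value)        # reference values < value
        at_or_below = bisect_right(ref, value) # reference values <= value
        p = (below + at_or_below) / (2 * m)    # mid-rank empirical quantile
        # Keep p strictly inside (0, 1) so the normal quantile stays finite.
        p = min(max(p, 0.5 / m), 1.0 - 0.5 / m)
        return nd.inv_cdf(p)

    return transform


if __name__ == "__main__":
    to_normal = make_normal_score_transform(range(1, 100))  # toy reference data
    print(to_normal(50), to_normal(10), to_normal(1000))
```

The clamping step shows why sparse reference data is a concern: values beyond the observed range all map to the same score, so the resolution in the upper tail is limited by the number of healthy reference samples.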
The PEB rule presented here does not contain any covariates, but if covariates predict the normal marker levels of a population, then they must be accommodated by the screening rule. Fixed covariates, denoted X, can be accommodated simply by replacing μ in the PEB rule with a covariate-adjusted mean μ + Xβ; note that the between-person heterogeneity τ² then becomes a model-adjusted quantity. Pauler et al. (12) show how to estimate these quantities specifically for a biomarker. Time-dependent covariates (e.g., age at screen) may also be accommodated, but this requires models beyond the scope of this manuscript. See McIntosh and Urban (6) for details.
The reduction in threshold gained from an individual screen depends on the B1 of the marker: smaller values of B1 yield smaller reductions, so that only larger elevations are detectable and more screens are needed to approach the limiting rule. This strongly suggests that a marker's B1 is an important determinant of how the marker will perform when used in a cancer-screening program and, thus, should be reported as a routine part of marker assessment. McIntosh and Urban (6) give an example of a hypothetical marker that has lower sensitivity than CA 125 under a ST rule but whose larger B1 gives it greater sensitivity when used longitudinally.
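How B1 governs the number of screens needed to approach the limiting rule can be made concrete with a small sketch. It assumes the shrinkage weight Bn = τ²/(τ² + σ²/n), which can be written as nB1/(nB1 + 1 − B1); the function names and the 0.95 target are hypothetical choices for illustration.

```python
def shrinkage_from_b1(n: int, b1: float) -> float:
    """B_n expressed through the intraclass correlation B_1 (n >= 1)."""
    return n * b1 / (n * b1 + 1.0 - b1)


def screens_to_reach(b1: float, target: float = 0.95, max_n: int = 1000):
    """Smallest number of previous screens n with B_n >= target, or None."""
    for n in range(1, max_n + 1):
        if shrinkage_from_b1(n, b1) >= target:
            return n
    return None


if __name__ == "__main__":
    for b1 in (0.3, 0.5, 0.7, 0.9):
        print(b1, screens_to_reach(b1))
```

Under this assumption a marker with B1 = 0.9 reaches Bn ≥ 0.95 after 3 previous screens, whereas a marker with B1 = 0.3 needs 45.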
The data requirements of the PEB method are practical, and the method is robust to a wide variety of marker behaviors. The specifics of the model are outlined in McIntosh and Urban (6), which should be consulted before applying this method to novel tumor markers. As outlined in that report, a few considerations should be addressed before using the PEB rule in a population. Specifically, additional investigation is warranted if the intraclass correlation B1 is very small, the screening interval is very short compared with the preclinical period, and the marker grows at much less than a linear rate. If all of these conditions hold, the screening rule should be modified slightly, for example, by not using the PEB rule until at least two or three historical screens are available.
This report has omitted any consideration of the sample size needed to estimate the model parameters. This was done to improve clarity. Of course, in practice larger numbers of samples will improve the precision of the estimates, but we make no recommendation regarding the number needed to evaluate the PEB algorithm, other than to state that using it to investigate the potential performance of a marker needs less data than implementing it in an actual population. When beginning a screening program, we recommend initially estimating the model parameters on stored sera, then continually evaluating the behavior of the markers in healthy subjects until enough subjects have accumulated to be confident that screening can proceed without monitoring. As new data become available, the PEB algorithm (the transformation and the intraclass correlation estimates) may be updated as needed. Because the PEB rule needs only healthy subjects, limits on data availability will not take long to overcome, which can allow more rapid development and evaluation of new tumor marker discoveries.
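As an illustration of how the model parameters might be estimated from serial measurements in healthy subjects, the sketch below uses a generic one-way random-effects analysis of variance (method of moments) on balanced data. This is an assumption for illustration and is not the estimation procedure of McIntosh and Urban (6); the function name and the toy data are hypothetical.

```python
from statistics import mean


def estimate_peb_parameters(series):
    """Method-of-moments estimates from balanced serial data.

    series : list of per-subject lists of transformed marker values,
             each of the same length k >= 2, from at least 2 healthy subjects.
    Returns estimates of mu, sigma2 (within-person), tau2 (between-person), B1.
    """
    m = len(series)
    k = len(series[0]) if m else 0
    if m < 2 or k < 2 or any(len(s) != k for s in series):
        raise ValueError("need >= 2 subjects with the same number (>= 2) of screens")
    subject_means = [mean(s) for s in series]
    mu = mean(subject_means)
    # Within-subject mean square estimates sigma^2.
    msw = sum((y - sm) ** 2
              for s, sm in zip(series, subject_means) for y in s) / (m * (k - 1))
    # Between-subject mean square estimates sigma^2 + k * tau^2.
    msb = k * sum((sm - mu) ** 2 for sm in subject_means) / (m - 1)
    tau2 = max((msb - msw) / k, 0.0)   # truncate negative estimates at zero
    total = tau2 + msw
    b1 = tau2 / total if total > 0 else 0.0
    return {"mu": mu, "sigma2": msw, "tau2": tau2, "B1": b1}


if __name__ == "__main__":
    toy = [[1.0, 3.0], [5.0, 7.0]]     # two subjects, two screens each
    print(estimate_peb_parameters(toy))
```

Re-running such an estimate as healthy-subject data accumulate is one way to carry out the updating of the intraclass correlation described above; unbalanced data would call for a mixed-model routine from a statistical package instead.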
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Supported by National Cancer Institute Grant NCI 1P50 CA 83636 and (to B. K.) Research for Women’s Cancers, Cedars-Sinai Medical Center.
The abbreviations used are: ST, single threshold; PSA, prostate-specific antigen; PEB, Parametric Empirical Bayes; SEER, Surveillance, Epidemiology, and End Results.
ȳ: mean of historical log(CA 125) | 1st screen | 2nd screen | 3rd screen | 4th screen | 5th screen | 6th screen | 7th screen | 8th screen | 9th screen | 10th screen
---|---|---|---|---|---|---|---|---|---|---
0.1 | 3.39 | 1.62 | 1.27 | 1.12 | 1.03 | 0.98 | 0.94 | 0.91 | 0.89 | 0.88
0.2 | 3.39 | 1.69 | 1.35 | 1.20 | 1.12 | 1.07 | 1.03 | 1.01 | 0.99 | 0.97
0.3 | 3.39 | 1.76 | 1.43 | 1.29 | 1.21 | 1.16 | 1.13 | 1.10 | 1.08 | 1.07
0.4 | 3.39 | 1.82 | 1.51 | 1.38 | 1.30 | 1.25 | 1.22 | 1.20 | 1.18 | 1.16
0.5 | 3.39 | 1.89 | 1.59 | 1.46 | 1.39 | 1.34 | 1.31 | 1.29 | 1.27 | 1.26
0.6 | 3.39 | 1.96 | 1.67 | 1.55 | 1.48 | 1.44 | 1.41 | 1.38 | 1.37 | 1.35
0.7 | 3.39 | 2.03 | 1.75 | 1.64 | 1.57 | 1.53 | 1.50 | 1.48 | 1.46 | 1.45
0.8 | 3.39 | 2.10 | 1.83 | 1.72 | 1.66 | 1.62 | 1.59 | 1.57 | 1.55 | 1.54
0.9 | 3.39 | 2.16 | 1.92 | 1.81 | 1.75 | 1.71 | 1.68 | 1.66 | 1.65 | 1.64
1.0 | 3.39 | 2.23 | 2.00 | 1.89 | 1.84 | 1.80 | 1.78 | 1.76 | 1.74 | 1.73
1.1 | 3.39 | 2.30 | 2.08 | 1.98 | 1.93 | 1.89 | 1.87 | 1.85 | 1.84 | 1.83
1.2 | 3.39 | 2.37 | 2.16 | 2.07 | 2.02 | 1.98 | 1.96 | 1.94 | 1.93 | 1.92
1.3 | 3.39 | 2.44 | 2.24 | 2.15 | 2.11 | 2.08 | 2.05 | 2.04 | 2.03 | 2.02
1.4 | 3.39 | 2.50 | 2.32 | 2.24 | 2.20 | 2.17 | 2.15 | 2.13 | 2.12 | 2.11
1.5 | 3.39 | 2.57 | 2.40 | 2.33 | 2.28 | 2.26 | 2.24 | 2.23 | 2.22 | 2.21
1.6 | 3.39 | 2.64 | 2.48 | 2.41 | 2.37 | 2.35 | 2.33 | 2.32 | 2.31 | 2.30
1.7 | 3.39 | 2.71 | 2.56 | 2.50 | 2.46 | 2.44 | 2.43 | 2.41 | 2.40 | 2.40
1.8 | 3.39 | 2.78 | 2.64 | 2.59 | 2.55 | 2.53 | 2.52 | 2.51 | 2.50 | 2.49
1.9 | 3.39 | 2.84 | 2.72 | 2.67 | 2.64 | 2.62 | 2.61 | 2.60 | 2.59 | 2.59
2.0 | 3.39 | 2.91 | 2.81 | 2.76 | 2.73 | 2.72 | 2.70 | 2.69 | 2.69 | 2.68
2.1 | 3.39 | 2.98 | 2.89 | 2.85 | 2.82 | 2.81 | 2.80 | 2.79 | 2.78 | 2.78
2.2 | 3.39 | 3.05 | 2.97 | 2.93 | 2.91 | 2.90 | 2.89 | 2.88 | 2.88 | 2.87
2.3 | 3.39 | 3.12 | 3.05 | 3.02 | 3.00 | 2.99 | 2.98 | 2.98 | 2.97 | 2.97
2.4 | 3.39 | 3.18 | 3.13 | 3.10 | 3.09 | 3.08 | 3.07 | 3.07 | 3.07 | 3.06
2.5 | 3.39 | 3.25 | 3.21 | 3.19 | 3.18 | 3.17 | 3.17 | 3.16 | 3.16 | 3.16
2.6 | 3.39 | 3.32 | 3.29 | 3.28 | 3.27 | 3.26 | 3.26 | 3.26 | 3.25 | 3.25
2.7 | 3.39 | 3.39 | 3.37 | 3.36 | 3.36 | 3.36 | 3.35 | 3.35 | 3.35 | 3.35
2.8 | 3.39 | 3.46 | 3.45 | 3.45 | 3.45 | 3.45 | 3.45 | 3.44 | 3.44 | 3.44
2.9 | 3.39 | 3.52 | 3.53 | 3.54 | 3.54 | 3.54 | 3.54 | 3.54 | 3.54 | 3.54
3.0 | 3.39 | 3.59 | 3.62 | 3.62 | 3.63 | 3.63 | 3.63 | 3.63 | 3.63 | 3.63
3.1 | 3.39 | 3.66 | 3.70 | 3.71 | 3.72 | 3.72 | 3.72 | 3.73 | 3.73 | 3.73
3.2 | 3.39 | 3.73 | 3.78 | 3.80 | 3.81 | 3.81 | 3.82 | 3.82 | 3.82 | 3.82
3.3 | 3.39 | 3.80 | 3.86 | 3.88 | 3.90 | 3.90 | 3.91 | 3.91 | 3.92 | 3.92
3.4 | 3.39 | 3.86 | 3.94 | 3.97 | 3.98 | 3.99 | 4.00 | 4.01 | 4.01 | 4.01
3.5 | 3.39 | 3.93 | 4.02 | 4.06 | 4.07 | 4.09 | 4.09 | 4.10 | 4.10 | 4.11
Columns are indexed by the intraclass correlation \(B_{1} = \frac{\tau^{2}}{\sigma^{2} + \tau^{2}}\).

Performance measure | B1 = 0 (a) | B1 = 0.1 | B1 = 0.3 | B1 = 0.5 | B1 = 0.7 | B1 = 0.9
---|---|---|---|---|---|---
Proportion of clinical cases found by screening | 0.46 | 0.46 | 0.47 | 0.47 | 0.48 | 0.50
Mean lead time effect among all cases (years) | 0.80 | 0.80 | 0.82 | 0.85 | 0.89 | 0.93
Fraction of cases with a stage shift | 0.35 | 0.35 | 0.36 | 0.37 | 0.40 | 0.42
Mortality reduction (absolute reduction from 77%) | 19% | 19% | 20% | 21% | 24% | 26%
Mean years of life saved per case | 2.53 | 2.55 | 2.67 | 2.87 | 3.13 | 3.44
(a) Equals the ST rule.