Background:

Independent validation of risk prediction models in prospective cohorts is required for risk-stratified cancer prevention. Such studies often have a two-phase design, where information on expensive biomarkers is ascertained in a nested substudy of the original cohort.

Methods:

We propose a simple approach for evaluating model discrimination that accounts for incomplete follow-up and gains efficiency by using data from all individuals in the cohort irrespective of whether they were sampled in the substudy. For evaluating the AUC, we estimated probabilities of risk-scores for cases being larger than those in controls conditional on partial risk-scores, computed using partial covariate information. The proposed method was compared with an inverse probability weighted (IPW) approach that used information only from the subjects in the substudy. We evaluated age-stratified AUC of a model including questionnaire-based risk factors and inflammation biomarkers to predict 10-year risk of lung cancer using data from the Prostate, Lung, Colorectal, and Ovarian Cancer (1993–2009) trial (30,297 ever-smokers, 1,253 patients with lung cancer).

Results:

For estimating age-stratified AUC of the combined lung cancer risk model, the proposed method was 3.8 to 5.3 times more efficient compared with the IPW approach across the different age groups. Extensive simulation studies also demonstrated substantial efficiency gain compared with the IPW approach.

Conclusions:

Incorporating information from all individuals in a two-phase cohort study can substantially improve precision of discrimination measures of lung cancer risk models.

Impact:

Novel, simple, and practically useful methods are proposed for evaluating risk models, a critical step toward risk-stratified cancer prevention.

Risk-stratified cancer prevention requires evaluation of risk prediction models in independent prospective studies that do not contribute to model development. Modern cohort studies often collect data on risk factors using a complex two-phase design. Although all subjects contribute information on some questionnaire-based risk factors, information on expensive biomarkers is collected only in a nested substudy of the original cohort, typically stratified with respect to case/control status and certain other covariates observed for all subjects. The methods for the analysis of data from such two-phase studies for fitting various regression models (e.g., logistic regression or Cox model) have been well studied (1–9). There has been, however, limited investigation of the efficient utilization of data from two-phase cohort studies for model validation. We focus on evaluating the discriminatory ability of the commonly used models for cancer risk prediction that quantify risks of individuals through an underlying risk-score or “linear predictor,” that is, the sum of log-relative risks multiplied by the risk factors.

The AUC for evaluating model discrimination can be defined as the probability that the risk-score for a randomly selected subject with the disease is higher than that for a randomly selected subject without the disease. In two-phase studies, one can estimate AUC based only on the second-phase subjects with complete risk factor information using the inverse probability weighted (IPW) approach that accounts for the bias due to complex nonrandom sampling (10–17). Alternative imputation-based approaches could be more efficient (18, 19), but they typically require parametric assumptions for imputation distribution. This is less desirable in model validation studies which are supposed to be empirical in nature.
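As a minimal illustration of this definition, with complete data the AUC can be computed as the proportion of case–control pairs in which the case has the higher risk-score. The function name and the toy scores below are hypothetical, and ties are counted as concordant to match the definition Pr(S1 ≥ S0) used later in the text.

```python
def empirical_auc(case_scores, control_scores):
    """Proportion of case-control pairs in which the case's score is >= the control's."""
    concordant = sum(
        1 for s1 in case_scores for s0 in control_scores if s1 >= s0
    )
    return concordant / (len(case_scores) * len(control_scores))


# Toy risk-scores (illustrative values only, not study data).
case_scores = [0.9, 0.4, 0.7]
control_scores = [0.1, 0.5, 0.3, 0.6]
auc = empirical_auc(case_scores, control_scores)
print(round(auc, 4))
```

Here 10 of the 12 possible pairs are concordant, so the estimate is 10/12.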

We propose an alternative method that utilizes information from both the phases of two-phase studies to precisely estimate AUC for evaluating a model predicting risk over a prespecified time interval of τ-years accounting for incomplete follow-up. The subjects not included in the second phase contribute information through a scalar partial risk-score, derived from the observable risk factors. Using this method, we evaluate the AUC of a model incorporating information from questionnaire-based risk factors and four inflammation biomarkers [C-reactive protein (CRP), serum amyloid A (SAA), soluble tumor necrosis factor receptor 2 (sTNFRII), and monokine induced by gamma interferon (CXCL9/MIG)] to predict 10-year risk of lung cancer. We use data from a two-phase cohort study (30,297 ever-smokers, 1,253 patients with lung cancer) from the Prostate, Lung, Colorectal, and Ovarian Cancer (PLCO: 1993–2009) trial.

Lung cancer data

We evaluated the discrimination of a lung cancer risk prediction model that has potential clinical application for identifying subjects for CT-screening. The CT scan procedure uses low-dose radiation from X-ray machines to scan the body in a helical path and produce detailed images of regions inside the body. It has been demonstrated as an effective method for reducing lung cancer mortality compared with chest radiography (20, 21). The US Preventive Services Task Force recommended annual CT-screening for lung cancer in certain risk factor–based subgroups of individuals (22–24). However, screening could be more beneficial if the high-risk subjects can be identified on the basis of individual risk predictions (21, 25–27), using comprehensive models with the major risk factors.

The original model included some questionnaire-based risk factors: age, gender, race, education, BMI, smoking pack-year categories, number of years since stopped smoking cigarettes, number of years smoked, indicator of >1 pack/day, presence/absence of emphysema, and lung cancer family history (28). It was developed using data from the nonscreening arm of the PLCO study, which included approximately 39,000 ever smokers aged 55 to 74 years with questionnaire-based risk factor information (Fig. 1).

Figure 1.

Schematic representation of the design of the two-phase study in the real data application. The questionnaire-based risk factors included in the partial risk-score were gender, race, education, BMI, smoking pack-year categories, number of years since stopped smoking cigarettes, number of years smoked, indicator of >1 pack/day, presence/absence of emphysema, and lung cancer family history. Four inflammation biomarkers were considered: C-reactive protein (CRP), serum amyloid A (SAA), soluble tumor necrosis factor receptor 2 (sTNFRII), and monokine induced by gamma interferon (CXCL9/MIG). The nonscreening arm of the PLCO trial was used to estimate relative risks for the questionnaire-based risk factors (28). Two case–control studies, the discovery study (29) and the replication study (30), both nested within the screening arm of the PLCO trial, measured the additional inflammation biomarkers. The discovery study and the replication study did not have any participant overlap. The relative risks for inflammation biomarkers, adjusted for questionnaire-based risk factors, were estimated using the discovery study. All participants in the screening arm of the PLCO trial provided information on the questionnaire-based risk factors. The participants selected in the replication case–control dataset provided information on both the questionnaire-based risk factors and inflammation markers. The screening arm of the PLCO trial and the replication case–control dataset provided the two-phase study for evaluation of discrimination of a combined model with questionnaire-based risk factors and inflammation biomarkers.

We wanted to evaluate the potential added value of four inflammation biomarkers (CRP, SAA, sTNFRII, and CXCL9/MIG), using data collected in two nested case–control samples within the screening arm of the PLCO study: the discovery sample (960 ever-smokers ages 55–74 years, 500 cases and 460 controls) and the replication sample (929 ever-smokers ages 55–74 years, 468 cases and 461 controls); information on sample selection is available in earlier papers (29, 30). CRP has been shown to be consistently associated with lung cancer (31–33), while the other three were identified as most promising among a larger set of biomarkers investigated in those studies. Relative risk estimates of these biomarkers were obtained, after multivariate adjustment for other questionnaire-based factors, using the discovery sample (Supplementary Table S6). To compute the partial risk-score, we used published relative risk estimates (Supplementary Table S6) of questionnaire-based risk factors based on the nonscreening arm participants of the PLCO study (28). These estimates were not adjusted for inflammation biomarkers because the biomarker information was not available for those participants. In our investigations using the discovery sample, which had both questionnaire-based risk factors and inflammation biomarkers, the relative risks for the questionnaire-based risk factors changed minimally after adjusting for these biomarkers. We decided to use the published estimates to ensure that the questionnaire-based risk factor information from the screening arm participants was not used for estimating relative risks for those factors.

The current analysis independently evaluates discriminatory accuracy of a model for τ = 10-year risk of lung cancer based on the questionnaire-based risk factors and four inflammation biomarkers within categories based on age at enrollment (≤59, 60–64, 65–69, ≥70 years), using the screening arm of the PLCO study and the replication case–control study and accounting for incomplete follow-up. Because neither the questionnaire-based risk factors from the screening arm nor the biomarker information from the replication case–control study was utilized for model building, the resulting two-phase study can be assumed to be independent of model development (Fig. 1).

Partial risk-score approach

Suppose a study includes |$N_1$| subjects who developed the disease within a prespecified time interval (i.e., cases) and |$N_0$| subjects who did not (i.e., controls), with |$N = N_1 + N_0$|. Let |${\bf X}$| denote the design vector associated with the risk factors included in a risk prediction model for a binary disease outcome (e.g., lung cancer), and let |${\bf \beta}$| denote the corresponding vector of log-relative-risk parameters. The “risk-score,” |$S = {\bf X\beta}$|, quantifies the association of the risk factors with the disease risk. Here, |${\bf X}$| can include individual risk factors and possibly their interactions as specified by the model. Both the model and the log-relative-risk parameters |${\bf \beta}$| are prespecified on the basis of analysis of prior studies, and the current study aims to assess the discriminatory accuracy of the given model. Some expensive biomarkers may be measured only on a subsample, selected on the basis of case–control status and certain other covariates. The risk factors observed on all the subjects in the cohort are called phase I risk factors and those observed only on the subsample are called phase II risk factors. Accordingly, we can partition |${\bf X}$| as |${\bf X} = ({\bf Z}, {\bf W})$| and |${\bf \beta}$| as |${\bf \beta} = ({\bf \beta}_{\bf Z}, {\bf \beta}_{\bf W})$|, where |${\bf Z}$| denotes the subvector of |${\bf X}$| observed on all subjects in the study, |${\bf W}$| denotes the subvector of |${\bf X}$| observed only on the subjects in the subsample, and |${\bf \beta}_{\bf Z}$| and |${\bf \beta}_{\bf W}$| are the log-relative-risks associated with |${\bf Z}$| and |${\bf W}$|, respectively. If the model includes interaction terms among phase I and phase II risk factors, then |${\bf Z}$|, by definition, will not include elements for the corresponding interaction terms because they are not “observable” at phase I.

The AUC is defined as |$\delta_0 = \Pr(S_1 \ge S_0)$|, where |$S_1$| and |$S_0$| are the full risk-scores for a random case and a random control, respectively. Using the observed and missing design vector profiles, we decompose the full risk-scores into a partial risk-score and a missing risk-score: |$S_d = S_{d,{\rm obs}} + S_{d,{\rm mis}}$|, for |$d = 0, 1$|. For subjects included in the second phase, the full risk-scores are observed, but for subjects not included, only the partial risk-scores are observed. Let |$R_1$| (|$R_0$|) be the indicator of inclusion of a case (control) in the second-phase sample. The selection probabilities for second-phase subjects, denoted as |$\pi_1({\bf Z}_1) = \Pr(R_1 = 1 \mid {\bf Z}_1)$| for cases and |$\pi_0({\bf Z}_0) = \Pr(R_0 = 1 \mid {\bf Z}_0)$| for controls, are assumed known and may depend on some selection variables in the observed design vector profile.

In two-phase studies, the IPW estimator of AUC (15) can be obtained by the following formula:

|$\hat{\delta}_{\rm IPW} = \left( \sum_{i=1}^{N_1} \sum_{j=1}^{N_0} \frac{R_{1i}}{\pi_1({\bf Z}_{1i})} \frac{R_{0j}}{\pi_0({\bf Z}_{0j})} I(S_{1i} \ge S_{0j}) \right) \Bigg/ \left( \sum_{i=1}^{N_1} \sum_{j=1}^{N_0} \frac{R_{1i}}{\pi_1({\bf Z}_{1i})} \frac{R_{0j}}{\pi_0({\bf Z}_{0j})} \right)$|
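The IPW formula can be sketched in a few lines of code. This is an illustrative sketch rather than the authors' implementation: the function name, the tuple layout `(score, selected, selection_prob)`, and the toy values are assumptions made for the example.

```python
def ipw_auc(cases, controls):
    """Inverse probability weighted AUC from phase II subjects only.

    Each subject is a tuple (score, selected, selection_prob); the score of an
    unselected subject is unknown and is never used.
    """
    num = 0.0
    den = 0.0
    for s1, r1, p1 in cases:
        if not r1:
            continue
        for s0, r0, p0 in controls:
            if not r0:
                continue
            weight = 1.0 / (p1 * p0)        # product of inverse selection probabilities
            den += weight
            num += weight * (s1 >= s0)      # indicator I(S_1i >= S_0j)
    return num / den


# Toy data (illustrative values only).
cases = [(0.9, True, 1.0), (0.4, True, 0.5), (None, False, 0.5)]
controls = [(0.1, True, 0.25), (0.5, True, 0.5), (None, False, 0.25)]
ipw_estimate = ipw_auc(cases, controls)
print(round(ipw_estimate, 4))
```

In this toy example the weighted concordant pairs sum to 14 and all weighted pairs sum to 18, so the estimate is 14/18; unselected subjects contribute nothing, which is the source of the inefficiency discussed next.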

The IPW approach may be inefficient as it discards the partial risk factor information from the subjects not included in the second phase subsample. To improve efficiency, we propose solving an alternative estimating equation: |$\mathop \sum \limits_{i\ = \ 1}^{{N_1}} \;\mathop \sum \limits_{j\ = \ 1}^{{N_0}} ({\delta _{ij}} - \ {\delta _0}) = 0$|⁠, where

formula

The above decomposition of |$\delta_{ij}$| into four components corresponds to the inclusion status of the different case–control pairs in the phase II subsample. Each term of Equation 1 includes a conditional probability that the risk-score for a case is greater than that of the control, given the partial risk-scores for the case–control pair. We consider fine categories (e.g., deciles) of the partial risk-score and estimate these conditional probabilities empirically based on the second-phase subjects using sampling weights (details in supplement). The proposed two-phase estimator is given by:

|$\hat{\delta}_{\rm TPS} = \frac{1}{N_1 N_0} \sum_{i=1}^{N_1} \sum_{j=1}^{N_0} \hat{\delta}_{ij}$|

where |$\hat{\delta}_{ij}$| is obtained by plugging in the estimators of the conditional probabilities in Equation 1. Two assumptions ensure consistency of the proposed estimator: (i) the selection of subjects to the second phase is conditionally independent of the phase II risk factors given the phase I risk factors, and (ii) the disease status and the phase II risk factors are conditionally independent of any sampling selection variables given the partial risk-score. We have derived an influence function-based variance formula for the proposed estimator (details in supplement). The R code implementing these estimators, with examples and illustrations on using the code, is available on GitHub (https://github.com/parichoy/two_phase_AUC).
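The idea of borrowing information from phase I subjects through categories of the partial risk-score can be illustrated with a simplified sketch. It is not the authors' estimator or their R code: as a simplifying assumption, it applies the weighted, category-level concordance probability to every case–control pair, including pairs fully observed in phase II, instead of using the four-component decomposition. The function name and tuple layout are hypothetical.

```python
from collections import Counter, defaultdict


def two_phase_auc_sketch(cases, controls):
    """Simplified stratified two-phase AUC.

    Each subject is a tuple (stratum, selected, prob, score):
      stratum  - category of the partial risk-score (known for everyone)
      selected - True if the subject is in the phase II subsample
      prob     - phase II selection probability
      score    - full risk-score (None unless selected)
    """
    num = defaultdict(float)
    den = defaultdict(float)
    # Weighted concordance within each (case stratum, control stratum) cell,
    # estimated from phase II pairs only.
    for a, r1, p1, s1 in cases:
        if not r1:
            continue
        for b, r0, p0, s0 in controls:
            if not r0:
                continue
            weight = 1.0 / (p1 * p0)
            den[(a, b)] += weight
            num[(a, b)] += weight * (s1 >= s0)
    # Average the cell-level probabilities over all phase I case-control pairs.
    n1 = Counter(a for a, _, _, _ in cases)
    n0 = Counter(b for b, _, _, _ in controls)
    total = 0.0
    for a, n_a in n1.items():
        for b, n_b in n0.items():
            if den[(a, b)] == 0.0:
                raise ValueError(f"no phase II pairs in cell {(a, b)}")
            total += n_a * n_b * num[(a, b)] / den[(a, b)]
    return total / (len(cases) * len(controls))


# Toy data (illustrative values only): two strata, half of the subjects sampled.
cases = [(1, True, 0.5, 0.9), (0, True, 0.5, 0.4),
         (1, False, 0.5, None), (0, False, 0.5, None)]
controls = [(0, True, 0.5, 0.1), (1, True, 0.5, 0.5), (0, False, 0.5, None),
            (1, False, 0.5, None), (0, False, 0.5, None)]
tps_estimate = two_phase_auc_sketch(cases, controls)
print(round(tps_estimate, 4))
```

When every subject is selected with probability 1, this sketch reduces to the ordinary empirical AUC, which is a useful sanity check. A cell with no phase II pairs raises an error here; coarser categories would be one way to avoid that.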

Handling loss to follow-up

In cohort studies, incomplete follow-up can cause misclassification of binary disease status. Here we propose a modification of the method to evaluate the AUC of a model predicting risk over a fixed time period, τ-years, accounting for incomplete follow-up. The subjects who developed disease within τ-years were considered as cases and those who did not were designated as controls. Incomplete follow-up is accounted for by computing the adjusted two-phase and IPW estimators, denoted by |$\hat{\delta}_{\rm TPS,adj}$| and |$\hat{\delta}_{\rm IPW,adj}$|, based on all the cases and the subset of controls with at least τ-years of follow-up, adjusted for this selection using the inverse of the probability that a subject is followed up for at least τ-years (details in supplement). This probability, |$\pi_F({\bf Z}) = \Pr[F \ge \tau \mid {\bf Z}]$|, where |$F$| denotes the observed follow-up, can be computed using a logistic regression model with the phase I risk factors as predictors. SEs were appropriately adjusted to account for the additional uncertainty introduced by these random weights (details in supplement).
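The follow-up adjustment can be sketched for the simplest situation, in which the risk-score is available for everyone (as for a model with phase I risk factors only). This is an assumption-laden illustration, not the supplement's formula: the follow-up probabilities are taken as given inputs rather than fitted by logistic regression, and the function name, tuple layout, and toy values are hypothetical.

```python
def followup_adjusted_auc(subjects, tau):
    """AUC using all cases within tau years and controls followed for >= tau years.

    Each subject is a tuple (score, event_time, follow_up, pi_f):
      event_time - time of diagnosis, or None if no event was observed
      follow_up  - observed follow-up time
      pi_f       - assumed probability of being followed for at least tau years
    Controls are weighted by 1 / pi_f; cases are unweighted.
    """
    cases = [s for s, t, f, p in subjects if t is not None and t <= tau]
    controls = [
        (s, 1.0 / p)
        for s, t, f, p in subjects
        if (t is None or t > tau) and f >= tau
    ]
    num = sum(w for s1 in cases for s0, w in controls if s1 >= s0)
    den = len(cases) * sum(w for _, w in controls)
    return num / den


# Toy data (illustrative values only), tau = 10 years.
subjects = [
    (0.9, 4.0, 4.0, 0.8),     # case: diagnosed within 10 years
    (0.3, 8.0, 8.0, 0.8),     # case
    (0.2, None, 12.0, 0.8),   # control: followed beyond 10 years
    (0.5, None, 11.0, 0.5),   # control with a larger weight
    (0.6, None, 6.0, 0.5),    # excluded: status at 10 years is unknown
    (0.1, 11.5, 11.5, 0.8),   # control: diagnosed after 10 years
]
adjusted_auc = followup_adjusted_auc(subjects, tau=10.0)
print(round(adjusted_auc, 4))
```

The subject with only 6 years of event-free follow-up is excluded rather than counted as a control, and the controls that remain are up-weighted to represent subjects like that one.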

Simulation methods

Data for a cohort study with 50,000 individuals were simulated on the basis of a model including eight independent risk factors: |$X_1, X_2, X_3, X_4$| are continuously distributed as standard normal variates, and |$X_5, X_6, X_7, X_8$| are binary, distributed as Bernoulli random variables each with probability |$P = 0.5$|. Let |${\bf X} = (X_1, \ldots, X_8)$| and |${\bf \beta}$| be the associated log-relative-risk vector, set to |${\bf \beta} = (\log(1.1), \log(1.1), \log(1.1), \log(1.1), \log(1.2), \log(1.2), -\log(1.2), -\log(1.2))^{T}$|. The age of disease onset |$T$| was generated from the Cox model |$\lambda(t \mid {\bf X}) = \lambda_0(t) \exp({\bf X\beta})$|, with |$\lambda_0(t) = \lambda \gamma t^{\gamma - 1}$| being the hazard function of a |${\rm Weibull}(\lambda, \gamma)$| distribution. The parameters |$\lambda$| and |$\gamma$| were chosen such that the probabilities of developing the disease by ages 50 and 70 years were approximately 5% and 12%, respectively. We generated age of entry and observed follow-up from separate discrete uniform distributions between 40 and 60 years and between 19 and 21 years, respectively. We assume that each subject is disease-free at study entry.
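One way to generate onset ages from this Weibull Cox model is inverse-transform sampling, since the survival function is exp(−λ t^γ exp(Xβ)). The sketch below is illustrative only: it calibrates λ and γ for a reference subject with Xβ = 0, which is an assumption (the text does not say how the approximate 5% and 12% targets were matched), and all function names are hypothetical.

```python
import math
import random


def calibrate_weibull(age_a, p_a, age_b, p_b):
    """Solve 1 - exp(-lam * age**gam) = p at two ages (reference subject, X*beta = 0)."""
    h_a = -math.log(1.0 - p_a)   # cumulative baseline hazard at age_a
    h_b = -math.log(1.0 - p_b)   # cumulative baseline hazard at age_b
    gam = math.log(h_b / h_a) / math.log(age_b / age_a)
    lam = h_a / age_a ** gam
    return lam, gam


def draw_onset_age(lam, gam, linear_predictor, rng):
    """Inverse-transform draw from S(t | X) = exp(-lam * t**gam * exp(X*beta))."""
    u = 1.0 - rng.random()       # uniform on (0, 1], avoids log(0)
    return (-math.log(u) / (lam * math.exp(linear_predictor))) ** (1.0 / gam)


lam, gam = calibrate_weibull(50.0, 0.05, 70.0, 0.12)

rng = random.Random(1)
beta = [math.log(1.1)] * 4 + [math.log(1.2)] * 2 + [-math.log(1.2)] * 2
x = [rng.gauss(0.0, 1.0) for _ in range(4)] + [
    float(rng.random() < 0.5) for _ in range(4)
]
onset_age = draw_onset_age(lam, gam, sum(b * v for b, v in zip(beta, x)), rng)
```

Comparing each simulated onset age with the subject's age at entry and follow-up then determines whether the subject is observed as a case.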

We use alternative designs to select a nested case–control sample from the full cohort and assume that data on two risk factors, |$X_7$| and |$X_8$|, are available only on the subsample. The sampling designs considered were (i) simple case–control sampling, that is, selecting random samples of cases and controls from the cohort, and (ii) stratified case–control sampling, that is, selecting cases and controls by frequency matching at the deciles of the partial risk-score. In each design, we vary the fraction of cases sampled, |$\eta$|, over the values 1, 0.75, 0.5, and 0.25, keeping the case–control ratio approximately 1:1. The selection probabilities of the controls are empirically estimated from each simulation (details in supplement). We define |$f$| as the ratio of the variance of the partial risk-score |$S_{\rm obs}$| to the variance of the full risk-score |$S$| and use it as an index to quantify the relative importance of the phase I risk factors compared with the entire set of risk factors. For our chosen |${\bf \beta}$|, we have |$f \approx 0.75$|. We vary |$f$| over the values 0.1, 0.2, 0.4, 0.5, and 0.75 by modifying the values of the Cox model parameters.
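The value of f for the chosen β can be checked directly from the simulation design. Because the eight risk factors are independent, the variance of a risk-score is the sum of β² times the variance of each factor (1 for the standard normal factors, 0.25 for the Bernoulli(0.5) factors). The variable names below are illustrative.

```python
import math

# Log-relative-risks from the simulation design (X1..X8).
beta = [math.log(1.1)] * 4 + [math.log(1.2)] * 2 + [-math.log(1.2)] * 2
# Variances: standard normal for X1..X4, Bernoulli(0.5) for X5..X8.
var_x = [1.0] * 4 + [0.25] * 4

contributions = [b * b * v for b, v in zip(beta, var_x)]
var_partial = sum(contributions[:6])   # X7 and X8 are phase II only
var_full = sum(contributions)
f = var_partial / var_full
print(round(f, 3))
```

This gives a value of about 0.76, in line with the stated f ≈ 0.75.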

For each simulation setting, we generate 1,000 cohort datasets with complete risk factor information and compute the AUC from each dataset as the empirical proportion of all possible case–control pairings for which the risk-score for the case is greater than that of the control. The average of the 1,000 AUC values was considered as the true AUC of the underlying model. Finite sample performance of the proposed estimator was assessed, and its efficiency was evaluated relative to the IPW estimator.

Impact of incomplete follow-up

We performed additional simulations to assess the impact of incomplete follow-up on the estimation of AUC for a model predicting risk over 10 years. We slightly modified our simulation setting to generate a cohort of 50,000 subjects, all of whom were assumed to be followed up for at least 10 years, and a subject was classified as a case or a control depending on whether the subject developed the disease by 10 years. Because all subjects in this setting were followed beyond the specified time interval, the target AUC could be unambiguously defined as the probability of the risk-score for the cases being greater than that of the controls. We considered scenarios with moderate association of risk factors with disease (i.e., setting the log-relative risk at |${\bf \beta}$| as specified in the simulation setting) and strong association of risk factors with disease (i.e., setting the log-relative risk at |$3 \times {\bf \beta}$|).

In the same simulation setting, we then introduced random variation in the follow-up time and first considered scenarios where it did not depend on risk factors. We generated observed follow-up from a normal distribution with mean 12 years and standard deviation 4 years, truncated between 0 and 20 years. We also considered scenarios where observed follow-up depended on risk factors, by adding a linear predictor term to the mean of the truncated normal distribution. The percentage of variability of observed follow-up explained by this linear predictor was set at 1%, 5%, or 20%. We used simple case–control sampling with a fraction of cases of 0.5 and a case–control ratio of 1:1 to generate the nested case–control study. In this setting, we evaluated the bias and standard error (over 1,000 simulations) of the unadjusted two-phase estimator, where case–control status was defined on the basis of whether a subject was observed to develop the disease within 10 years, ignoring incomplete follow-up. We also evaluated an adjusted two-phase estimator that uses information on all the cases diagnosed within 10 years and the subset of the controls with at least 10 years of follow-up, and accounts for this selection using the inverse of the probability of follow-up beyond 10 years.
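The follow-up mechanism and the resulting misclassification can be sketched as follows. This is an illustration under stated assumptions: rejection sampling is one possible way to draw from the truncated normal distribution (the text does not say which method was used), times are measured from study entry, and the function names are hypothetical.

```python
import random


def draw_follow_up(rng, mean=12.0, sd=4.0, low=0.0, high=20.0):
    """Rejection sampler for a normal distribution truncated to [low, high]."""
    while True:
        f = rng.gauss(mean, sd)
        if low <= f <= high:
            return f


def observed_status(onset_time, follow_up, tau=10.0):
    """Unadjusted label: a case only if onset is observed within tau years."""
    return "case" if onset_time <= min(tau, follow_up) else "control"


rng = random.Random(2020)
follow_ups = [draw_follow_up(rng) for _ in range(10000)]
short_fraction = sum(f < 10.0 for f in follow_ups) / len(follow_ups)

# A subject with onset at year 8 but only 6 years of follow-up is a true
# 10-year case that the unadjusted analysis labels as a control.
misclassified = observed_status(onset_time=8.0, follow_up=6.0)
```

Under these parameters roughly three in ten subjects have less than 10 years of follow-up, so the unadjusted labelling can misclassify a non-trivial share of true 10-year cases as controls.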

Lung cancer example

The median follow-up time of the smokers in the PLCO screening arm was nearly 12 years with an interquartile range of 2.6 years. They were categorized on the basis of age at enrollment: ≤59 years, 60–64 years, 65–69 years, ≥70 years. The risk-score explained less than 1% of the variability of the observed follow-up in each subcohort. Table 1 shows the AUC estimates and the SEs in these subcohorts based on three combinations of risk factors: (i) questionnaire-based risk factors only, (ii) questionnaire-based risk factors and CRP, and (iii) questionnaire-based risk factors and all four inflammation biomarkers. For computing the proposed AUC estimator, we stratified the partial risk-score into six categories based on sextiles. For combination (i), we evaluated the standard AUC estimator accounting for incomplete follow-up based on all smokers who developed lung cancer within 10 years or were followed up for at least 10 years in the PLCO screening arm (phase I sample). The AUC estimates for each type of model were lower in the oldest age group compared with the other groups. For combinations (ii) and (iii), comparison of the proposed and IPW estimators showed that although point estimates were similar within the limits of uncertainty, the precision of the proposed estimator was much higher across age categories. Moreover, the inclusion of inflammation biomarkers led to some improvement of AUC in the older age groups.

Table 1.

Age category–specific AUC estimates and SEs for predicting 10-year risk in three alternative risk models: (i) questionnaire-based risk factors only, (ii) questionnaire-based risk factors and CRP, (iii) questionnaire-based and four inflammation markers using subjects from PLCO (1993–2009) who were diagnosed with lung cancer within 10 years or who were followed up at least 10 years and accounting for selection using the inverse of the probability that a subject is followed up for at least 10 years.

Entries in the last five columns are AUC estimates (SE); subject counts are number of subjects (number of cases).

| Age (in years) | Phase I^a subjects (cases) | Phase II^b subjects (cases) | Questionnaire-based risk factors only^c | Questionnaire-based risk factors + CRP, TPS | Questionnaire-based risk factors + CRP, IPW | Questionnaire-based risk factors + four inflammation biomarkers, TPS | Questionnaire-based risk factors + four inflammation biomarkers, IPW |
|---|---|---|---|---|---|---|---|
| ≤59 | 10,616 (247) | 189 (101) | 0.792 (0.013) | 0.81 (0.017) | 0.791 (0.043) | 0.791 (0.02) | 0.778 (0.046) |
| 60–64 | 9,793 (363) | 255 (135) | 0.762 (0.012) | 0.76 (0.016) | 0.758 (0.032) | 0.759 (0.017) | 0.752 (0.033) |
| 65–69 | 6,693 (402) | 251 (137) | 0.789 (0.011) | 0.808 (0.014) | 0.796 (0.038) | 0.811 (0.017) | 0.799 (0.038) |
| ≥70 | 3,195 (241) | 148 (83) | 0.713 (0.016) | 0.725 (0.019) | 0.723 (0.05) | 0.738 (0.021) | 0.736 (0.048) |
| Total | 30,297 (1,253) | 843 (456) | | | | | |

Abbreviations: CRP, C-reactive protein; IPW, inverse probability weighted estimator; TPS, two-phase estimator.

^a Phase I subjects are the ever-smokers ages 55–74 years in the screening arm of the PLCO trial who were diagnosed with lung cancer within 10 years or who were followed up for at least 10 years.

^b Phase II subjects are those phase I subjects included in the replication study (nested case–control study within the PLCO screening arm).

^c For the questionnaire-based risk factor only model, the standard AUC estimator based on the phase I sample is reported after accounting for incomplete follow-up using the inverse of the probability that a subject is followed for at least 10 years.
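The efficiency comparison reported for the combined model can be reproduced from the SEs in Table 1, taking relative efficiency as the ratio of variances, that is, the squared ratio of the IPW SE to the two-phase SE. That definition is the usual one and is assumed here; the variable names are illustrative.

```python
# (TPS SE, IPW SE) for the model with questionnaire-based risk factors and
# four inflammation biomarkers, copied from Table 1.
se_by_age = {
    "<=59": (0.020, 0.046),
    "60-64": (0.017, 0.033),
    "65-69": (0.017, 0.038),
    ">=70": (0.021, 0.048),
}

# Relative efficiency of TPS versus IPW = Var(IPW) / Var(TPS).
relative_efficiency = {
    age: (se_ipw / se_tps) ** 2 for age, (se_tps, se_ipw) in se_by_age.items()
}
for age, value in relative_efficiency.items():
    print(age, round(value, 1))
```

The resulting values range from about 3.8 (ages 60–64) to about 5.3 (ages ≤59), matching the range quoted in the Results summary.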

Simulation results

Table 2 shows the simulation results evaluating the performance of the proposed and the IPW estimators under alternative sampling schemes with sampling fractions for cases η = 1, 0.5, and 0.25 and |$f = 0.75$|. In all the scenarios, both estimators had very small bias, and the confidence intervals, constructed using influence function-based variance estimates, achieved the nominal 95% level. The percent bias in the SE estimate was also very small. Moreover, the proposed estimator was much more efficient than the IPW estimator. When the second-phase sample included all the cases and a random sample of the controls, the proposed approach led to a nearly 50% efficiency gain. Compared with simple case–control sampling, stratified case–control sampling of the second-phase subjects led to a modest efficiency loss. Both estimators performed similarly in settings where binary risk factors had probabilities less than 0.5 (Supplementary Tables S1 and S2) and when a continuous risk factor X4 and a binary risk factor X5 were available only in the phase II sample (Supplementary Table S5).

Table 2.

Simulation studies evaluating performance of the proposed and the IPW estimators of AUC under two-phase studies for f = 0.75, that is, 75% of the variability of the full risk-score explained by the partial risk-score.^a

| Sampling scheme | Sampling fraction of cases | Bias, TPS | Bias, IPW | Empirical SE (×10³)^b, TPS | Empirical SE (×10³)^b, IPW | Average of SE estimate (×10³)^c, TPS | Average of SE estimate (×10³)^c, IPW | Coverage, TPS | Coverage, IPW |
|---|---|---|---|---|---|---|---|---|---|
| Simple case–control sampling (case–control ratio 1:1) | 1 | 9.56 × 10⁻⁵ | 0.0002 | 5.1 | 6.3 | 5.3 | 6.5 | 0.96 | 0.94 |
| Simple case–control sampling (case–control ratio 1:1) | 0.5 | 0.0002 | −4.5 × 10⁻⁵ | 6.2 | 9.1 | 6.3 | 9.2 | 0.95 | 0.96 |
| Simple case–control sampling (case–control ratio 1:1) | 0.25 | 0.0002 | 0.0004 | 7.5 | 12.9 | 7.9 | 13 | 0.96 | 0.95 |
| Stratified case–control sampling (case–control ratio 1:1) | 1 | 7.08 × 10⁻⁵ | −6.39 × 10⁻⁵ | 5.6 | 6.8 | 5.3 | 6.6 | 0.94 | 0.94 |
| Stratified case–control sampling (case–control ratio 1:1) | 0.5 | 5.74 × 10⁻⁵ | 0.0001 | 6.2 | | 6.3 | 9.3 | 0.96 | 0.96 |
| Stratified case–control sampling (case–control ratio 1:1) | 0.25 | −0.0002 | 0.0001 | 7.6 | 13.1 | 7.9 | 13.1 | 0.96 | 0.95 |

Abbreviations: IPW, inverse probability weighted estimator; TPS, two-phase estimator.

^a Approximately 92% of study subjects are censored. The true value of AUC based on all the risk factors in the underlying model is 0.577 and the empirical SE based on 1,000 simulated cohort datasets is 0.0048. Results are shown under alternative sampling schemes with varying sampling probabilities and a fixed value of f = 0.75. The true AUC based on the partial risk-score is 0.567.

^b Empirical SE (×10³): 10³ times the empirical SE of the estimated AUC over 1,000 simulations.

^c Average of SE estimate (×10³): 10³ times the mean of the estimated standard errors across 1,000 simulations. SEs are estimated using the influence function-based variance estimator.

Figure 2 shows the relative efficiency of the proposed estimator compared with the IPW estimator as a function of |$\eta$| under simple case–control sampling of the second-phase subjects. For fixed f, the relative efficiency increased as η decreased because the IPW estimator failed to incorporate information from the increasing number of unselected subjects. Moreover, for fixed η, the relative efficiency increased with f, as the partial risk-score explained a larger proportion of the variability of the full risk-score and the proposed estimator gained efficiency by using the partial risk-score from phase I subjects (Tables 2 and 3).

Figure 2.

Relative efficiency of our proposed estimator compared with the IPW estimator as a function of the fraction of cases sampled under simple case–control sampling of the second-phase subjects. Different lines correspond to different values of f characterizing different proportions of the variability of the full risk-score explained by the partial risk-score.

Table 3.

Simulation studies evaluating performance of the proposed and the IPW estimators of AUC under two-phase studies for f = 0.5, that is, 50% of the variability of the full risk-score explained by the partial risk-score.ᵃ

Sampling scheme | Bias (TPS, IPW) | Empirical SE (×10³)ᵇ (TPS, IPW) | Average of SE estimate (×10³)ᶜ (TPS, IPW) | Coverage (TPS, IPW)
Simple case–control sampling (case–control ratio 1:1)
 Sampling fraction of cases = 1 | 5.85 × 10⁻⁵ 0.0002 | 6.7 5.8 6.5 | 0.94 0.94
 Sampling fraction of cases = 0.5 | 0.0004 0.0005 | 7.4 9.2 | 7.6 9.2 | 0.96 0.95
 Sampling fraction of cases = 0.25 | 0.0002 0.0005 | 10.8 13.2 | 10.3 13.1 | 0.95 0.95
Stratified case–control sampling (case–control ratio 1:1)
 Sampling fraction of cases = 1 | 0.0001 0.0002 | 5.7 6.3 | 5.8 6.6 | 0.95 0.95
 Sampling fraction of cases = 0.5 | 0.0001 2.36 × 10⁻⁵ | 7.6 9.2 | 7.6 9.3 | 0.95 0.96
 Sampling fraction of cases = 0.25 | 0.0003 5.95 × 10⁻⁵ | 10.3 13 | 10.3 13.1 | 0.96 0.96

Abbreviations: IPW, inverse probability weighted estimator; TPS, two-phase estimator.

ᵃApproximately 92% of study subjects are censored. The true value of AUC of the underlying model is 0.596 and the empirical SE based on 1,000 simulated cohort datasets is 0.0047. Results are shown under alternative sampling schemes with varying sampling probabilities and a fixed value of f = 0.5; the true AUC based on the partial risk score is 0.568.

ᵇEmpirical SE (×10³): 10³ times the empirical SE of estimated AUC over 1,000 simulations.

ᶜAverage of SE estimate (×10³): 10³ times the mean of the estimated SEs across 1,000 simulations. SEs are estimated using the influence function-based variance estimator.
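As a worked illustration of how a relative efficiency can be read off the empirical SEs in Table 3, the sketch below assumes that relative efficiency is defined as the ratio of the variances of the two estimators (IPW over two-phase). That definition and the helper name `relative_efficiency` are assumptions for illustration; the SE values are the simple case–control rows of Table 3.

```python
def relative_efficiency(se_reference, se_proposed):
    """Variance of the reference estimator divided by that of the proposed one."""
    return (se_reference / se_proposed) ** 2

# Empirical SEs (x10^3) from Table 3, simple case-control sampling, f = 0.5,
# keyed by the sampling fraction of cases.
table3_simple = {
    0.5: {"TPS": 7.4, "IPW": 9.2},
    0.25: {"TPS": 10.8, "IPW": 13.2},
}

efficiency = {
    fraction: relative_efficiency(se["IPW"], se["TPS"])
    for fraction, se in table3_simple.items()
}
for fraction, value in efficiency.items():
    print(f"sampling fraction {fraction}: relative efficiency {value:.2f}")
```

A value above 1 means the two-phase estimator has the smaller variance. Ratios built from empirical SEs over 1,000 replicates carry Monte Carlo error of their own, so small differences between rows should not be over-interpreted.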

Table 4 shows the impact of incomplete follow-up on the estimation of AUC for a model predicting 10-year risk using the proposed method. Incomplete follow-up created noticeable bias in the unadjusted two-phase estimator only when the underlying disease risk-score was fairly strongly associated with the follow-up time (i.e., dependent censoring). When the risk-score was strongly predictive of the disease and the AUC was high (e.g., 0.714), the percentage biases were small. But when the risk-score was only weakly predictive of the disease and the AUC was moderate (e.g., 0.575), the percentage biases were more notable. In such scenarios, the adjusted two-phase estimator that accounts for incomplete follow-up had lower bias but slightly larger variance than the unadjusted two-phase estimator that ignored incomplete follow-up. The SEs for both estimators were slightly higher in settings with moderate discrimination than in settings with high discrimination.

Table 4.

Simulation studies showing the impact on the AUC estimateᵃ for a model predicting 10-year risk due to incomplete follow-upᵇ of the subjects in the cohort.

Percentage of variability of observed follow-up explained by risk-scoreᵇ | Moderate association of risk factors on disease (targetᵃ 10-year AUC = 0.575), unadjusted for censoring | Moderate association, adjusted for censoring | Strong association of risk factors on disease (targetᵃ 10-year AUC = 0.714), unadjusted for censoring | Strong association, adjusted for censoring
 | 0.575 (0.0111) | 0.575 (0.0107) | 0.712 (0.0101) | 0.712 (0.0101)
 | 0.578 (0.0111) | 0.575 (0.011) | 0.715 (0.01) | 0.712 (0.0101)
 | 0.582 (0.011) | 0.574 (0.011) | 0.718 (0.0099) | 0.712 (0.0099)
20 | 0.589 (0.011) | 0.573 (0.0115) | 0.723 (0.0099) | 0.71 (0.0105)

ᵃThe target AUCs over a 10-year time interval were based on a simulated cohort where all subjects were followed for at least 10 years.

ᵇWe generated follow-up times such that subjects can be censored before the end of the specified time interval. We allowed the follow-up time to potentially depend on the disease risk-score. We show the average and SE (over 1,000 simulated cohort datasets) of the unadjusted two-phase estimator that ignores the incomplete nature of follow-up. In the same setting, we also report the average and SE (over 1,000 simulated cohort datasets) of the adjusted two-phase estimator that uses information from all the cases and a subset of the controls with at least 10 years of follow-up and accounts for selection using the inverse of the probability of a control subject being followed up for at least 10 years.
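The weighting idea behind the adjustment for incomplete follow-up can be sketched as a weighted version of the usual pairwise (Mann–Whitney) AUC. This is a simplified illustration rather than the full two-phase estimator: it assumes every subject has a complete risk-score, and the function name `weighted_auc`, the toy scores, and the follow-up probabilities are hypothetical.

```python
import numpy as np

def weighted_auc(case_scores, control_scores, control_weights=None):
    """Pairwise AUC in which each control carries a weight.

    With weights equal to the inverse probability that a control is followed
    for the full prediction interval, controls that are less likely to remain
    under follow-up count for more.
    """
    case_scores = np.asarray(case_scores, dtype=float)
    control_scores = np.asarray(control_scores, dtype=float)
    if control_weights is None:
        control_weights = np.ones_like(control_scores)
    control_weights = np.asarray(control_weights, dtype=float)
    # 1 if the case score exceeds the control score, 0.5 for ties, 0 otherwise.
    diff = case_scores[:, None] - control_scores[None, :]
    concordance = (diff > 0) + 0.5 * (diff == 0)
    # Average over cases, then take the weighted average over controls.
    per_control = concordance.mean(axis=0)
    return float(np.sum(control_weights * per_control) / np.sum(control_weights))

# Hypothetical toy data: all cases, plus controls followed for at least 10 years.
cases = [0.9, 0.7, 0.4]
controls = [0.8, 0.5, 0.3, 0.2]
# Hypothetical probabilities that each control is followed for at least 10 years.
p_followed = np.array([0.5, 0.8, 0.8, 0.9])

auc_unadjusted = weighted_auc(cases, controls)
auc_adjusted = weighted_auc(cases, controls, 1.0 / p_followed)
print(round(auc_unadjusted, 4), round(auc_adjusted, 4))
```

In this toy example the high-score control is the least likely to complete follow-up, so up-weighting it lowers the AUC. This mirrors the direction of the bias in Table 4, where ignoring dependent censoring inflates the unadjusted estimate.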

Modern cohort studies of cancer often collect information on expensive biomarkers in a subsample of subjects, leading to a two-phase data structure. Two-phase data structures can also arise in complex observational data settings, for example, with electronic health records in integrated healthcare systems (34). In such settings, we propose a simple estimator of AUC that gains efficiency by combining complete risk factor information from the second-phase sample with partial risk factor information from subjects not included in the second phase, and that accounts for incomplete follow-up of subjects. This study demonstrates the enhanced precision of the proposed estimator compared with an IPW approach that uses information only from the subsample. This implies that an AUC estimate from a single validation study using the proposed method is more likely to be close to the truth.

The AUC measures how well a model discriminates between cases and controls. It may not alone be adequate to assess suitability of models for clinical applications in risk-stratified cancer prevention. A model also needs to be assessed for calibration, that is, whether it can produce unbiased estimates of risk (35–37). Moreover, there are other measures of discriminatory performance that can more directly measure clinical utility of models (38–40). Although we focus only on the popular metric AUC, the framework can be generalized for evaluating other metrics assessing model calibration and discrimination in two-phase studies.

Our work is focused on evaluating a given risk prediction model with the regression parameters supplied externally, a problem intrinsically different from efficient estimation of regression parameters. Intuitively, the former involves efficient evaluation of univariate scalar risk-scores derived from potentially numerous risk factors. However, such dimension reduction may not be very efficient in the latter problem. Because of the intrinsic difference in the target quantity of interest, most of the existing methods (e.g., ref. 41) are not directly applicable in our setting. Some existing methods consider study settings with “verification” bias where the disease diagnosis itself may be available only on a subset of subjects in the main study (42–51). Although it is possible that some of the underlying concepts are transportable, these methods are not directly applicable in our setting with missing data on risk factors.

We considered simple binary classification of disease status over a fixed prediction interval, for example, whether a patient is diagnosed with lung cancer in 10 years. Future research should explore possible extensions of this method to estimate time-dependent measures of risk model discrimination. For example, the method proposed by Zheng and colleagues (17) to estimate time-dependent AUC essentially employs IPW analysis of the phase II data, with the sampling probability being estimated nonparametrically using phase I data to achieve the efficiency gain. However, in settings with numerous phase I covariates, as is common in studies of cancer epidemiology, nonparametric estimation of sampling probabilities conditional on all the phase I covariates will be infeasible. Our proposed method, which uses the partial risk-score for dimension reduction, can in principle also be used in such time-to-event settings, but this requires more detailed investigation.

The estimation of conditional probabilities accounts for the nonrandom sampling of second-phase subjects, assuming that the sampling probabilities are known by design. Simulation studies show that estimating the sampling probabilities, as opposed to using known values, does not alter the precision of AUC estimation in the proposed method (Supplementary Table S3). In studies of cancer epidemiology where design information is unknown, these probabilities may need to be estimated under parametric assumptions. Doubly robust estimation (e.g., ref. 52) could be more relevant in such settings as it guarantees unbiasedness if either the selection mechanism or the distribution of missing risk factors is correctly specified.
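When the design probabilities are not used directly, one simple way to estimate them is with the observed sampling fraction within each cell defined by case status and a sampling stratum. The sketch below illustrates that idea only. The function name `estimated_sampling_probabilities`, the stratum labels, and the toy cohort are hypothetical, and this is a nonparametric cell-based estimate rather than the parametric modeling mentioned above.

```python
from collections import Counter

def estimated_sampling_probabilities(case_status, stratum, selected):
    """Observed phase II sampling fraction in each (case status, stratum) cell."""
    totals = Counter()
    sampled = Counter()
    for d, z, r in zip(case_status, stratum, selected):
        totals[(d, z)] += 1
        sampled[(d, z)] += int(r)
    return {cell: sampled[cell] / totals[cell] for cell in totals}

# Hypothetical phase I cohort: 1 = case, 0 = control; selected = in phase II.
case_status = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
stratum = ["young", "young", "old", "old",
           "young", "young", "young", "young", "old", "old", "old", "old"]
selected = [1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0]

probabilities = estimated_sampling_probabilities(case_status, stratum, selected)

# Inverse probability weights for the subjects who were selected into phase II.
weights = [
    1.0 / probabilities[(d, z)]
    for d, z, r in zip(case_status, stratum, selected)
    if r
]
print(probabilities)
print(weights)
```

With cell-based estimates like these, the weights of the selected subjects in each cell sum to the phase I size of that cell, so the weighted phase II sample reproduces the phase I cell counts.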

Violation of assumptions underlying our method can lead to bias. In the lung cancer example, there was evidence of violation of the assumption that the disease status and the phase II risk factors are conditionally independent of any sampling selection variables given the partial risk-score. Simulation studies, however, showed only moderate bias even under explicit violation of this assumption, introduced by incorporating “matching factors” that were strongly related to disease and certain risk factors but were not themselves part of the risk-score (Supplementary Table S4). In such cases, one can consider stratified estimation of model evaluation statistics within categories of matching factors.

The empirical estimates of conditional probabilities based on binning into categories involve subjectivity in the choice of the number of categories (e.g., deciles) and are not fully nonparametric. Because our approach, based on scalar partial risk-scores, already involves dimension reduction, one can instead construct such estimates using kernel smoothing. Under suitable regularity conditions (53–55), the two estimators will be asymptotically equivalent when the window/bin lengths are allowed to decrease at suitable rates, but in small samples the estimator based on kernel smoothing could be more efficient.
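The contrast between binning and kernel smoothing can be illustrated on a generic conditional mean given a scalar score. The sketch below is not the paper's estimator of the conditional probabilities. It compares a decile-bin average with a Nadaraya–Watson estimate on simulated data, and the function names, the Gaussian kernel, the bandwidth of 0.3, and the logistic data-generating model are all assumptions chosen for illustration.

```python
import numpy as np

def binned_conditional_mean(s, y, s_new, n_bins=10):
    """Estimate E[y | s] at s_new by averaging y within quantile bins of s."""
    s, y, s_new = (np.asarray(a, dtype=float) for a in (s, y, s_new))
    # Interior quantile edges; values beyond them fall into the two end bins.
    interior_edges = np.quantile(s, np.linspace(0, 1, n_bins + 1))[1:-1]
    idx = np.searchsorted(interior_edges, s, side="right")
    idx_new = np.searchsorted(interior_edges, s_new, side="right")
    bin_means = np.array([y[idx == b].mean() for b in range(n_bins)])
    return bin_means[idx_new]

def kernel_conditional_mean(s, y, s_new, bandwidth):
    """Nadaraya-Watson estimate of E[y | s] at s_new with a Gaussian kernel."""
    s, y, s_new = (np.asarray(a, dtype=float) for a in (s, y, s_new))
    u = (s_new[:, None] - s[None, :]) / bandwidth
    weights = np.exp(-0.5 * u ** 2)
    return (weights @ y) / weights.sum(axis=1)

# Simulated data: a scalar score and a binary outcome from a logistic model.
rng = np.random.default_rng(1)
partial_score = rng.normal(size=5000)
true_prob = 1.0 / (1.0 + np.exp(-partial_score))
outcome = rng.binomial(1, true_prob)

grid = np.array([-1.0, 0.0, 1.0])
binned = binned_conditional_mean(partial_score, outcome, grid)
smoothed = kernel_conditional_mean(partial_score, outcome, grid, bandwidth=0.3)
print(np.round(binned, 3), np.round(smoothed, 3))
```

The binned estimate is constant within each decile and jumps at the bin edges, whereas the kernel estimate varies smoothly with the score. Both depend on a tuning choice: the number of bins or the bandwidth.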

Our simulation results indicate that when the second-phase subjects are selected by frequency-matching of cases and controls within deciles of partial risk-score as opposed to simple case–control sampling, there is no efficiency gain of the two-phase estimator compared with the IPW estimator. Frequency-matching of cases and controls, within categories of a matching factor, may lead to increased precision in estimating associations of certain risk factors (e.g., exposure of interest). However, loss of information on the association between the matching factor and disease at the design stage may result in inefficiency in the estimation of overall measures of model validation (e.g., AUC) that requires robust estimation of parameters associated with all the risk factors, including potential matching factors.

In summary, we have proposed a novel, simple-to-implement, and practically useful method for enhancing the precision of discrimination measures of risk prediction models. This is particularly useful in studies of risk model validation, a critical step toward risk-stratified cancer prevention.

No potential conflicts of interest were disclosed.

Conception and design: P. Pal Choudhury, N. Chatterjee

Development of methodology: P. Pal Choudhury, N. Chatterjee

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A.K. Chaturvedi

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): P. Pal Choudhury, A.K. Chaturvedi

Writing, review, and/or revision of the manuscript: P. Pal Choudhury, A.K. Chaturvedi, N. Chatterjee

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): P. Pal Choudhury

Study supervision: P. Pal Choudhury, N. Chatterjee

The first author would like to thank Dr. Mustapha Abubakar (DCEG, NCI) for his helpful comments in improving the presentation of the manuscript. The work of P. Pal Choudhury and N. Chatterjee was supported by the Patient-Centered Outcomes Research Institute (PCORI) Award (ME-1602-34530). The work of A.K. Chaturvedi was supported by the Intramural Research Program, Division of Cancer Epidemiology and Genetics, NCI, NIH, Department of Health and Human Services. The statements and opinions in this article are solely the responsibility of the authors and do not necessarily represent the views of the PCORI, its Board of Governors, or Methodology Committee or the NCI, NIH, or the Department of Health and Human Services.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 1986;73:1–11.
2. Breslow NE, Cain KC. Logistic regression for two-stage case-control data. Biometrika 1988;75:11–20.
3. Langholz B, Borgan OR. Counter-matching: a stratified nested case-control sampling method. Biometrika 1995;82:69–79.
4. Reilly M, Pepe MS. A mean score method for missing and auxiliary covariate data in regression models. Biometrika 1995;82:299–314.
5. Scott AJ, Wild CJ. Fitting regression models to case-control data by maximum likelihood. Biometrika 1997;84:57–71.
6. Langholz B, Borgan OR. Estimation of absolute risk from nested case-control data. Biometrics 1997;53:767–74.
7. Breslow NE, Holubkov R. Maximum likelihood estimation of logistic regression parameters under two-phase, outcome-dependent sampling. J R Stat Soc Ser B 1997;59:447–61.
8. Chatterjee N, Chen Y, Breslow NE. A pseudoscore estimator for regression problems with two-phase sampling. J Am Stat Assoc 2003;98:158–68.
9. Weinberg CR, Wacholder S. The design and analysis of case-control studies with biased sampling. Biometrics 1990;46:963–75.
10. Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 1952;47:663–85.
11. Cai T, Zheng Y. Evaluating prognostic accuracy of biomarkers in nested case–control studies. Biostatistics 2012;13:89–100.
12. Zheng Y, Cai T, Pepe MS. Adopting nested case-control quota sampling designs for the evaluation of risk markers. Lifetime Data Anal 2013;19:568–88.
13. Yao W, Li Z, Graubard BI. Estimation of ROC curve with complex survey data. Stat Med 2015;34:1293–303.
14. Zhou QM, Zheng Y, Chibnik LB, Karlson EW, Cai T. Assessing incremental value of biomarkers with multi-phase nested case-control studies. Biometrics 2015;71:1139–49.
15. Huang Y. Evaluating and comparing biomarkers with respect to the area under the receiver operating characteristics curve in two-phase case–control studies. Biostatistics 2016;17:499–522.
16. Payne R, Yang M, Zheng Y, Jensen MK, Cai T. Robust risk prediction with biomarkers under two-phase stratified cohort design. Biometrics 2016;72:1037–45.
17. Zheng Y, Brown M, Lok A, Cai T. Improving efficiency in biomarker incremental value evaluation under two-phase designs. Ann Appl Stat 2017;11:638–54.
18. Long Q, Zhang X, Hsu CH. Nonparametric multiple imputation for receiver operating characteristics analysis when some biomarker values are missing at random. Stat Med 2011;30:3149–61.
19. Liu X, Zhao Y. Semi-empirical likelihood inference for the ROC curve with missing data. J Stat Plan Inference 2012;142:3123–33.
20. National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395–409.
21. Gould MK. Lung-cancer screening with low-dose computed tomography. N Engl J Med 2014;371:1813–20.
22. Moyer VA. Screening for lung cancer: US Preventive Services Task Force recommendation statement. Ann Intern Med 2014;160:330–8.
23. de Koning HJ, Meza R, Plevritis SK, Haaf Kt, Munshi VN, Jeon J, et al. Benefits and harms of computed tomography lung cancer screening strategies: a comparative modeling study for the US Preventive Services Task Force. Ann Intern Med 2014;160:311–20.
24. McMahon PM, Meza R, Plevritis SK, Black WC, Tammemagi CM, Erdogan A, et al. Comparing benefits from many possible computed tomography lung cancer screening programs: extrapolating from the national lung screening trial using comparative modeling. PLoS One 2014;9:e99978.
25. Bach PB, Gould MK. When the average applies to no one: personalized decision making about potential benefits of lung cancer screening. Ann Intern Med 2012;157:571–3.
26. Bach PB. Raising the bar for the US Preventive Services Task Force. Ann Intern Med 2014;160:365–66.
27. Tammemägi MC. Application of risk prediction models to lung cancer screening: a review. J Thorac Imaging 2015;30:88–100.
28. Katki HA, Kovalchik SA, Berg CD, Cheung LC, Chaturvedi AK. Development and validation of risk models to select ever-smokers for CT lung cancer screening. J Am Med Assoc 2016;315:2300–11.
29. Shiels MS, Pfeiffer RM, Hildesheim A, Engels EA, Kemp TJ, Park JH, et al. Circulating inflammation markers and prospective risk for lung cancer. J Natl Cancer Inst 2013;105:1871–80.
30. Shiels MS, Katki HA, Hildesheim A, Pfeiffer RM, Engels EA, Williams M, et al. Circulating inflammation markers, risk of lung cancer, and utility for risk stratification. J Natl Cancer Inst 2015;107. pii: djv199.
31. Chaturvedi AK, Caporaso NE, Katki HA, Wong H-L, Chatterjee N, Pine SR, et al. C-reactive protein and risk of lung cancer. J Clin Oncol 2010;28:2719–26.
32. Pine SR, Mechanic LE, Enewold L, Chaturvedi AK, Katki HA, Zheng Y-L, et al. Increased levels of circulating interleukin 6, interleukin 8, C-reactive protein, and risk of lung cancer. J Natl Cancer Inst 2011;103:1112–22.
33. Leuzzi G, Galeone C, Gisabella M, Duranti L, Taverna F, Suatoni P, et al. Baseline C-reactive protein level predicts survival of early-stage lung cancer: evidence from a systematic review and meta-analysis. Tumori 2016;102:441–9.
34. Chaturvedi AK, Udaltsova N, Engels EA, Katzel JA, Yanik EL, Katki HA, et al. Oral leukoplakia and risk of progression to oral cancer: a population-based cohort study. J Natl Cancer Inst 2019 Dec 20 [Epub ahead of print].
35. Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risk. Biostatistics 2005;6:227–39.
36. Chatterjee N, Shi J, Garcia-Closas M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat Rev Genet 2016;17:392–406.
37. Pfeiffer RM, Gail MH. Assessment of risk model performance. In: Absolute risk: methods and applications in clinical management and public health. Boca Raton (FL): CRC Press; 2017. p. 75–100.
38. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007;115:928–35.
39. Baker SG, Cook NR, Vickers A, Kramer BS. Using relative utility curves to evaluate risk prediction. J R Stat Soc Ser A Stat Soc 2009;172:729–48.
40. Cook NR. Quantifying the added value of new biomarkers: how and how not. Diagnostic Progn Res 2018;2:14.
41. Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 1994;89:846–66.
42. Zhou X-H. Maximum likelihood estimators of sensitivity and specificity corrected for verification bias. Commun Stat Theor Methods 1993;22:3177–98.
43. Zhou X-H. Effect of verification bias on positive and negative predictive values. Stat Med 1994;13:1737–45.
44. Zhou X-H. Correcting for verification bias in studies of a diagnostic test's accuracy. Stat Methods Med Res 1998;7:337–53.
45. Kosinski AS, Barnhart HX. Accounting for nonignorable verification bias in assessment of diagnostic tests. Biometrics 2003;59:163–71.
46. Rotnitzky A, Faraggi D, Schisterman E. Doubly robust estimation of the area under the receiver-operating characteristic curve in the presence of verification bias. J Am Stat Assoc 2006;101:1276–88.
47. Liu D, Zhou XH. A model for adjusting for nonignorable verification bias in estimation of the ROC curve and its area with likelihood-based approach. Biometrics 2010;66:1119–28.
48. Liu D, Zhou XH. Semiparametric estimation of the covariate-specific ROC curve in presence of ignorable verification bias. Biometrics 2011;67:906–16.
49. Gu J, Ghosal S, Kleiner DE. Bayesian ROC curve estimation under verification bias. Stat Med 2014;33:5081–96.
50. Adimari G, Chiogna M. Nearest-neighbor estimation for ROC analysis under verification bias. Int J Biostat 2015;11:109–24.
51. Adimari G, Chiogna M. Nonparametric verification bias-corrected inference for the area under the ROC curve of a continuous-scale diagnostic test. Stat Interface 2017;10:629–41.
52. Long Q, Zhang X, Johnson BA. Robust estimation of area under ROC curve using auxiliary variables in the presence of missing biomarker values. Biometrics 2011;67:559–67.
53. Carroll RJ, Wand MP. Semiparametric estimation in logistic measurement error models. J R Stat Soc Ser B 1991;53:573–85.
54. Carroll RJ, Knickerbocker RK, Wang CY. Dimension reduction in a semiparametric regression model with errors in covariates. Ann Stat 1995;23:161–81.
55. Chatterjee N, Chen Y. A semiparametric pseudo-score method for analysis of two-phase studies with continuous phase-I covariates. Lifetime Data Anal 2007;13:607–22.

Supplementary data