Abstract
Clinico-genomic databases favor inclusion of long-term survivors, leading to potentially biased overall survival (OS) analyses. Risk set adjustments relying on the independent delayed entry assumption may mitigate this bias. We aimed to determine whether this assumption is satisfied in a dataset of patients with advanced non–small cell lung cancer (aNSCLC), and to give guidance for clinico-genomic OS analyses when the assumption is not satisfied.
We analyzed the association of timing of next-generation sequencing (NGS) testing with real-world OS (rwOS) in patient data from a United States–based nationwide longitudinal deidentified electronic health records–derived database. Estimates of rwOS using risk set adjustment were compared with estimates computed with respect to all patients, regardless of NGS testing.
The independent delayed entry assumption was not satisfied in this database: later sequencing was associated with a higher hazard of death after sequencing. In a model adjusted for relevant characteristics, each month of delay in sequencing was associated with a 2% increase in the hazard of death. However, until the median survival time, estimates of OS using risk set adjustment were similar to estimates computed for all patients, regardless of NGS testing.
rwOS analyses in clinico-genomic databases should assess the independent delayed entry assumption. Comparisons with a broader population may be useful to evaluate the rwOS differences between calculations using risk set adjustment and patient cohorts where the bias relates to overrepresentation of long survivors.
This study illustrates practices that can increase the interpretability of findings from OS analyses in clinico-genomic databases.
Introduction
In recent years, databases consisting of clinical and genomic data on oncology patients have been used to investigate the predictive and prognostic effects of biomarkers (1–4) and as external control arms (5, 6). These databases provide longitudinal information on a patient's clinical treatment and outcomes derived from electronic health records (EHR), as well as genomic information obtained from next-generation sequencing (NGS) test results.
An important endpoint used in follow-up studies in clinico-genomic databases is real-world overall survival (rwOS) measured from an index date, for example, start of first-line therapy. Analyses of rwOS in clinico-genomic databases are subject to potential bias, as patients who survive a long time have more opportunities to undergo NGS testing than patients who do not, and so will be overrepresented (7). This bias results in rwOS overestimation compared with what would be obtained in a prospective study, in which long-surviving patients would not be preferentially sampled.
Risk set adjustment, the usual way to avoid this bias, treats patients as at risk only after they have satisfied the requisite inclusion criteria, even if that is after the index date (8). For example, the inclusion criteria of a database may require patients to have undergone NGS testing and visited a physician (9). Risk set adjustment treats patients as at risk only after both requirements are satisfied, and allows for estimation of the marginal survival function (ref. 10; See Supplementary Fig. S1). The time elapsed between the index date and when a patient satisfies inclusion criteria is called the entry time. In a prospective study, such as a clinical trial, all patients have an entry time of zero, and the number of patients at risk will be highest at the index date and decrease as patients die or are censored. In contrast, in a retrospective study, patients may have positive entry times, and the number at risk may increase as patients satisfy inclusion criteria.
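As a minimal sketch of the idea, and not the code used in this study, the following Python computes a Kaplan–Meier estimate in which a patient counts toward the risk set at time t only if the entry time is before t and follow-up extends to at least t. The function name `kaplan_meier` and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def kaplan_meier(
    times: List[float],
    events: List[bool],
    entries: Optional[List[float]] = None,
) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct death time.

    times:   time from the index date to death or censoring
    events:  True if death was observed, False if censored
    entries: entry times; None treats everyone as entering at 0 (naive)
    """
    if entries is None:
        entries = [0.0] * len(times)
    surv = 1.0
    curve = []
    death_times = sorted({x for x, d in zip(times, events) if d})
    for t in death_times:
        # Risk set adjustment: at risk at t only if already entered (e < t)
        # and still under follow-up (t <= x).
        at_risk = sum(1 for e, x in zip(entries, times) if e < t <= x)
        deaths = sum(
            1 for e, x, d in zip(entries, times, events) if d and x == t and e < t
        )
        if at_risk > 0:
            surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical toy data (months from the index date); not taken from the study.
times = [2.0, 4.0, 6.0, 8.0, 10.0]
events = [True, True, True, False, True]
entries = [0.0, 0.0, 3.0, 5.0, 7.0]

naive = kaplan_meier(times, events)              # ignores delayed entry
adjusted = kaplan_meier(times, events, entries)  # risk set adjusted
```

In this toy example the naive curve lies on or above the risk set–adjusted curve at every death time, which is the direction of bias described above; a toy example of five patients does not, of course, establish the size of the bias in real data.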
Risk set adjustment relies on an assumption, "independent delayed entry," that a patient's hazard of death after the entry time does not depend on the entry time (1, 5, 8, 11). For example, the hazard of death at one year after the index date should be the same for a patient NGS-tested three months post-index date as for one tested 6 months post-index date. In the absence of independent delayed entry, the marginal survival function, that is, the survival function that would be estimated if the whole population were prospectively NGS-tested, cannot be unbiasedly estimated with risk set adjustment. In a clinico-genomic dataset, the independent delayed entry assumption may not be satisfied, as physicians may order NGS testing based on, for example, the exhaustion of available treatment options. Therefore, the timing of NGS testing may be associated with hazard of death after testing. This may be particularly true in settings where NGS testing is not part of the standard of care and not ordered routinely. Although several methods have been proposed to estimate the marginal survival function in the absence of independent delayed entry, each relies on untestable assumptions (12–14).
Several tests have been proposed to check for independent delayed entry (15, 16). Jones and Crowley (1992) proposed a Cox proportional hazards model with the entry time as a covariate, so that the hazard function can depend on a patient's entry time. The analyst tests the null hypothesis that the coefficient for this covariate equals 0. Rejecting this null hypothesis is evidence against independent delayed entry.
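The test can be written compactly as follows. The notation is an assumption introduced here for illustration: e_i is the entry time of patient i, x_i the observed follow-up time, z_i any adjustment covariates, and R(t) the risk set under risk set adjustment.

```latex
\begin{align}
  h(t \mid e_i, z_i) &= h_0(t)\,\exp\!\left(\beta e_i + \gamma^{\top} z_i\right), \qquad t > e_i, \\
  \mathcal{R}(t) &= \{\, j : e_j < t \le x_j \,\}, \\
  H_0 &: \beta = 0 .
\end{align}
```

Under independent delayed entry the hazard after entry does not depend on e_i, so β = 0; an estimate of exp(β) above 1 means later entry goes with a higher hazard of death after entry.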
In the absence of independent delayed entry, landmark analysis, which includes only patients satisfying inclusion criteria on or prior to some date, can be used (17). The marginal survival function so estimated will likely differ from the marginal survival function of the entire population, as patients NGS-tested prior to the index date may have different characteristics than patients tested after the index date. Reduced sample sizes may also hamper landmark analyses, as patients with positive entry times are excluded.
To our knowledge, no previous work has addressed the problem of rwOS estimation in clinico-genomic databases. Here we do so using an EHR-derived database of patients diagnosed with advanced non–small cell lung cancer (aNSCLC). Our first objective is to describe NGS testing patterns in this dataset, and the baseline characteristics of tested and untested patients. Our second objective is to test whether the assumption of independent delayed entry is satisfied. Our third objective is to evaluate rwOS estimates from the risk set adjustment method.
The dataset we use provides a unique opportunity to answer these questions as it includes patients regardless of whether they received NGS testing, allowing us to compare survival functions estimated for NGS-tested patients to those estimated for all.
We discuss our data and analysis methods in the Materials and Methods section and our findings in the Results section. The Discussion section includes our perspective on selecting appropriate methods for OS analyses in clinico-genomic databases.
Materials and Methods
Our cohort includes patients selected from the nationwide Flatiron Health deidentified EHR-derived database, which as of May 2019 included data originating from approximately 280 U.S. cancer clinics (∼800 unique sites of care). The Flatiron Health database is a longitudinal database, comprising deidentified patient-level structured and unstructured data, curated via technology-enabled abstraction (18, 19). The majority of patients in the database originate from community oncology settings; relative community/academic proportions may vary depending on study cohort. To be included in the study cohort, patients must have (i) at least two visits after 2010 in the Flatiron Health network, (ii) an ICD-9 or -10 code for lung cancer, and (iii) an abstractor-confirmed diagnosis of aNSCLC between 2011 and 2018. Patients with no visits in the 90 days after advanced diagnosis are excluded. NGS testing dates were abstracted along with initial and advanced NSCLC diagnosis dates, histology, biomarker results from NGS and other tests, smoking status, and clinical real-world progression dates (20). Death dates were determined via a previously described mortality variable (21). Patients are considered ALK positive (ALK+) or EGFR positive (EGFR+) if they have a positive mutation result any time before the advanced NSCLC diagnosis date plus 60 days.
Our first objective was to describe NGS testing patterns and baseline characteristics of tested and untested patients. We tabulated by year of advanced diagnosis the proportion of patients NGS-tested within 60 days and one year of the advanced diagnosis date, with various baseline characteristics, along with 95% binomial confidence intervals. The baseline characteristics were (i) age above 70 years at advanced diagnosis, (ii) ever smoked, (iii) EGFR+, (iv) squamous cell cancer, and (v) sex. Patients with missing data are excluded where applicable.
Our second objective was to test independent delayed entry. We define rwOS as the time from advanced diagnosis to the 15th of the month of death or, if month of death is unavailable, as censored at the later of the last visit date and the last NGS test date.
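The rwOS definition above can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's code: the function name `rwos_days`, its argument names, and the example dates are hypothetical, and time is measured in days for simplicity.

```python
from datetime import date
from typing import Optional, Tuple


def rwos_days(
    advanced_dx: date,
    death_year_month: Optional[Tuple[int, int]],
    last_visit: Optional[date],
    last_ngs_test: Optional[date],
) -> Tuple[int, bool]:
    """Return (days from advanced diagnosis, death_observed)."""
    if death_year_month is not None:
        # Only the month of death is known, so death is placed on the 15th.
        year, month = death_year_month
        return (date(year, month, 15) - advanced_dx).days, True
    # No month of death: censor at the later of last visit and last NGS test.
    candidates = [d for d in (last_visit, last_ngs_test) if d is not None]
    return (max(candidates) - advanced_dx).days, False


# Hypothetical patients, for illustration only.
died = rwos_days(date(2016, 3, 1), (2017, 1), date(2016, 12, 20), None)
censored = rwos_days(date(2016, 3, 1), None, date(2016, 9, 10), date(2016, 10, 1))
```

The sketch assumes that a censored patient has at least one recorded visit or NGS test date; a production implementation would need to handle the case where both are missing.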
We used a Cox proportional hazards model with risk set adjustment to test independent delayed entry. We model rwOS as a function of entry time, adjusting for sex, age at and time of advanced diagnosis (i.e., year of entry in the cohort), and histology, and stratifying by smoking, whether cancer is recurrent (22) and EGFR status. Entry time was measured from advanced diagnosis until the later of first NGS test date and second visit on or after January 2011, and is set to 0 if negative. For 75% of patients, the NGS test date was after the second visit. Stratification was used to resolve non-proportional hazards. We set out to conclude that there was not independent delayed entry if the coefficient for entry time is significantly different from 0.
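The entry time rule described above (the later of the first NGS test and the second visit, floored at 0) can be sketched as below. The function and argument names are hypothetical, the second visit passed in is assumed to already satisfy the "on or after January 2011" condition, and days are used as the time unit for simplicity.

```python
from datetime import date


def entry_time_days(advanced_dx: date, first_ngs_test: date, second_visit: date) -> int:
    """Days from advanced diagnosis to cohort entry, set to 0 if negative."""
    # A patient enters once BOTH inclusion criteria are met, i.e. at the later date.
    entry_date = max(first_ngs_test, second_visit)
    # Entry before advanced diagnosis counts as entry at time 0.
    return max(0, (entry_date - advanced_dx).days)


# Hypothetical patients, for illustration only.
late_entry = entry_time_days(date(2016, 3, 1), date(2016, 6, 1), date(2016, 3, 20))
early_entry = entry_time_days(date(2016, 3, 1), date(2016, 1, 10), date(2016, 2, 1))
```

In the first example the NGS test determines entry; in the second both criteria are met before advanced diagnosis, so the entry time is 0 and the patient is at risk from the index date.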
To graphically illustrate departures from independent delayed entry, we plotted Kaplan–Meier curves, showing rwOS from 12 months after advanced diagnosis for patients with entry times prior to that time, stratified by EGFR status and also by entry time.
We tabulated the proportion of patients with progression in the first, second, and third months before NGS testing, stratified by time of test. If NGS tests were often ordered when patients had progressed, we would expect testing to occur close in time to progression.
Our third objective was to explore the performance of risk set adjustment. We considered survival from start of first-line therapy for four different groups of patients who received standard of care treatment, defined on the basis of histology, biomarker status and timing of treatment. Supplementary Figure S1 shows the selection of these cohorts. For each group, we compared the survival function for all patients, NGS-tested or not, to three survival functions for NGS-tested patients: (i) a survival function not accounting for delayed entry, (ii) a risk set–adjusted survival function, and (iii) a landmark analysis for NGS-tested patients entering the study before start of first-line therapy (namely, only individuals who underwent sequencing before start of first-line therapy, pre-L1). When computing the all-patients survival function, we used the second visit date as the entry date.
We used simulation to evaluate how prevalence of delayed entry affects estimation of risk set-adjusted survival functions. We simulated samples in which 25%, 50%, or 75% of NGS-tested patients entered the sample after the index date, by resampling 100 times with replacement 1,000 patients from a cohort of NGS-tested patients, using different sampling rates for patients entering before or after the index date. We estimated risk set–adjusted survival using each sample, and plotted the median survival function across samples for each sampling percentage.
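One way to build such samples is sketched below. It is an assumption-laden illustration, not the study's procedure: the text describes different sampling rates for the two groups, whereas this sketch fixes the number of late entrants in each resample directly. The function name, the toy entry times, and the random seed are hypothetical.

```python
import random
from typing import List


def resample_with_delayed_fraction(
    entries: List[float], n: int, delayed_fraction: float, rng: random.Random
) -> List[int]:
    """Return n patient indices, drawn with replacement, such that
    round(n * delayed_fraction) of them entered after the index date."""
    early = [i for i, e in enumerate(entries) if e <= 0]
    late = [i for i, e in enumerate(entries) if e > 0]
    n_late = round(n * delayed_fraction)
    # Sample each group with replacement, then combine.
    return rng.choices(late, k=n_late) + rng.choices(early, k=n - n_late)


# Hypothetical cohort: 40 patients entering at the index date, 60 entering later.
toy_entries = [0.0] * 40 + [5.0] * 60
rng = random.Random(0)
samples = {
    frac: [resample_with_delayed_fraction(toy_entries, 1000, frac, rng) for _ in range(100)]
    for frac in (0.25, 0.50, 0.75)
}
```

Each resample would then be passed to a risk set–adjusted survival estimator, and the pointwise median of the resulting curves taken across the 100 resamples for each percentage.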
The following terms are used in this report:
First visit (date of): earliest date documented in the EHR (earliest EHR activity) for a visit to an originating care site/clinic.
Initiation of treatment (date of): earliest date documented in the EHR for a prescription for an oral therapy, or an administration for an infusional therapy. When the treatment of interest is the first one after diagnosis, it is considered the first line of therapy (L1); if the treatment has been preceded by others, it is considered a subsequent therapy line (L2, L3, etc.).
NGS testing (date of): date documented in the EHR for the collection of the sample for an NGS test.
Second visit (date of): EHR-documented visit to an originating care site/clinic that has been preceded by another visit.
Standard of care: treatment for a given tumor type and clinical setting within either the range of drugs/interventions approved by regulatory authorities for that setting, or the recommendations by established professional societies.
Survival (time): time elapsed from an index date of interest to death. Date of death in the databases used in this study is determined based on a composite mortality variable that aggregates multiple mortality surveillance sources (21).
Results
Our first objective was to describe NGS testing patterns and baseline characteristics of tested and untested patients with aNSCLC. Table 1 shows that the percentage of patients who underwent testing within 60 days of advanced diagnosis increased from close to 0% in 2011 to above 35% in 2018. In earlier years, a substantial proportion of tested patients were tested between 60 days and one year after the date of advanced diagnosis (also see histogram of time to testing in Supplementary Fig. S2; see comparison between tested and untested patients, and general SEER cohort, in Supplementary Table S1).
Year of advanced diagnosis | n | % patients tested within 60 days of advanced diagnosis^a | % patients tested within 1 year of advanced diagnosis^a |
---|---|---|---|
2011 | 3,545 | 0.08 | 0.14 |
2012 | 4,867 | 0.43 | 1.50 |
2013 | 5,911 | 2.76 | 4.75 |
2014 | 6,676 | 5.77 | 8.42 |
2015 | 6,874 | 12.92 | 17.08 |
2016 | 7,065 | 18.13 | 23.03 |
2017 | 7,103 | 26.71 | 31.24 |
2018 | 6,523 | 35.80 | NA |
^a Advanced NSCLC diagnosis refers to the diagnosis of stage IIIB or IV disease, or the diagnosis of a recurrence (progression) of initial early-stage disease; this differs from the initial diagnosis, which refers to the earliest diagnosis for NSCLC, regardless of stage at that moment.
Figure 1 shows that baseline characteristics in an overall cohort of patients with aNSCLC were stable over time. In contrast, characteristics of NGS-tested patients changed substantially, becoming more similar to those of the general aNSCLC population, with testing rates increasing for smokers, older patients, and patients with squamous histology.
Our second objective was to test independent delayed entry. Table 2 shows the Cox model used to test independence of entry time and survival. The independent delayed entry assumption was not satisfied in this dataset: entry time was negatively associated with survival after entry (adjusted hazard ratio per month of entry time, 1.018; 95% CI, 1.013–1.022). A patient entering the study cohort 6 months after advanced diagnosis is estimated to have a hazard of death 11% higher than a patient entering as of the advanced diagnosis date. See Supplementary Fig. S3 for a graphical illustration of the lack of independent delayed entry.
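The 11% figure follows from compounding the adjusted per-month estimate in Table 2 (1.018) over 6 months. In the worked step below, β denotes the log hazard ratio per month of entry time and e the entry time in months; this notation is introduced here for illustration.

```latex
\[
  \frac{h(t \mid e = 6)}{h(t \mid e = 0)}
  = \exp(6\beta)
  = 1.018^{6}
  \approx 1.113 ,
\]
```

that is, a hazard of death roughly 11% higher for entry at 6 months than for entry at the advanced diagnosis date.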
Factors | Estimate | 95% CI |
---|---|---|
Months from advanced diagnosis^a to entry | 1.018 | 1.013–1.022 |
Male | 1.200 | 1.139–1.264 |
Age at advanced diagnosis | 1.012 | 1.009–1.014 |
Histology: squamous | 1.201 | 1.120–1.287 |
Histology: unknown | 1.392 | 1.240–1.563 |
Months from Jan-2011 to advanced diagnosis^a | 0.998 | 0.997–1.000 |
Note: In an unadjusted model, the coefficient of "Months from advanced diagnosis to entry" is 1.020 (1.016–1.025). The model is stratified by smoking, whether cancer is recurrent, and baseline EGFR status.
^a Advanced NSCLC diagnosis refers to the diagnosis of stage IIIB or IV disease, or the diagnosis of a recurrence (progression) of initial early-stage disease; this differs from the initial diagnosis, which refers to the earliest diagnosis for NSCLC, regardless of stage at that moment.
Poorer relative survival for patients NGS-tested late could result if physicians use NGS testing to identify treatment options after failure of standard of care treatment (23–25). Table 3 indicates that patients undergoing testing after advanced diagnosis often had a progression event just before testing.
Timing of NGS test after advanced diagnosis^a | % patients with progression 0–1 mo before test | % patients with progression 1–2 mo before test | % patients with progression 2–3 mo before test |
---|---|---|---|
6–9 months | 46.46 | 16.16 | 9.43 |
9–12 months | 48.63 | 21.96 | 13.33 |
12–15 months | 44.80 | 23.98 | 14.48 |
^a Advanced NSCLC diagnosis refers to the diagnosis of stage IIIB or IV disease, or the diagnosis of a recurrence (progression) of initial early-stage disease; this differs from the initial diagnosis, which refers to the earliest diagnosis for NSCLC, regardless of stage at that moment.
Our third objective was to explore the performance of risk set adjustment. To do so, we selected cohorts based on changing standards of care in aNSCLC and compared survival computed using different methods for NGS-tested patients to survival for all patients, regardless of testing. Figure 2A shows survival functions for patients with non-squamous histology without EGFR or ALK alterations treated with first-line chemotherapy pre-2017. The naïve method, which fails to account for delayed entry, resulted in survival much higher than survival for all patients and survival for NGS-tested patients computed using risk set adjustment or landmark analysis. Risk set–adjusted survival was very similar to survival for all patients until about 24 months after advanced diagnosis. After that point, the risk set–adjusted survival function was below that for all patients. In contrast, the landmark analysis survival function lies above the survival function for all patients from the median survival time.
Figure 2B shows survival functions for patients with the same clinical characteristics treated with first-line immunotherapy post-2016. Whereas only 48% of the NGS-tested patients represented in Fig. 2A entered the study cohort on or prior to the start of first-line therapy, 74% of the NGS-tested patients represented in Fig. 2B did. Delayed entry is therefore less common in this population, and the survival functions for NGS-tested patients using the three different methods are very similar. Supplementary Figures S4 and S5 present analyses for two more populations, patients with EGFR mutations and patients with squamous histology.
To evaluate risk set–adjusted survival functions in a range of delayed entry scenarios, we simulated samples in which 25%, 50%, or 75% of patients, sampled from the cohort of Fig. 2A, entered after the index date. As shown in Fig. 3, the 25th and 50th percentiles of survival estimates are relatively insensitive to the percentage of patients entering after the index date, although estimates of the 75th percentile vary considerably.
Discussion
We have described NGS testing patterns in a real-world dataset, shown that the independent delayed entry assumption does not hold, and presented examples of risk set adjustment and landmark analysis. We now describe what can be learned about how to carry out rwOS analyses in clinico-genomic databases.
First, any rwOS analysis in a clinico-genomic database requires a consideration of delayed entry. Although in aNSCLC early NGS testing has in recent years become the norm, in many diseases testing late in the disease course is still common. Moreover, even in a disease like aNSCLC, analysis of data from earlier years will be greatly affected by delayed entry. Analyses like those in our first objective can reveal to what extent NGS testing is routine in a given disease, and which demographic and clinical characteristics differ between patients getting NGS testing and other patients.
Second, whenever delayed entry is relevant to an rwOS analysis, the analyst should always assess whether the independent delayed entry assumption is satisfied. In our aNSCLC dataset, the independent delayed entry assumption is not satisfied. Physicians may often order NGS tests soon after a patient has progressed, when their prognosis is likely to be poor, and test time is associated with outcomes after testing.
Third, risk set adjustment may reasonably estimate survival even when the independent delayed entry assumption is not satisfied.
Here we consider risk set adjustment to reasonably estimate survival when it yields results similar to the survival function for all patients, regardless of NGS testing. We do so because, when there is independent delayed entry, risk set adjustment allows for estimation of the marginal survival function, that is, the survival function for the population of patients from which a particular sample was drawn. In our case, the NGS-tested patients in the sample we use for estimation are drawn from the broader population of patients.
We note that this approach is empirical, that is, we cannot prove that the risk set–adjusted estimates are close to the true marginal survival function. For example, if the NGS test itself affects outcomes, by enabling physicians to select more effective therapies, survival for NGS-tested and non-NGS–tested patients would differ. However, given that most targeted therapies for aNSCLC are based on biomarker results available from other testing modalities, we expect any difference to be small (25). In addition, some patients likely would never get NGS tested, no matter their survival after the advanced diagnosis date. Even with independent delayed entry, the survival of these patients would not be reflected in the marginal survival function estimated for NGS-tested patients.
What do our analyses say about whether risk set adjustment can reasonably estimate survival, even in the absence of independent delayed entry? Applied to NGS-tested patients in our dataset, risk set adjustment yields survival functions that are very similar to the survival function for all patients, until at least the median survival time (e.g., Fig. 2A and B). Our simulation study likewise suggests that survival function estimation through the median survival time is relatively robust, regardless of how many patients enter the study cohort after the index date. These results suggest that the risk set adjustment method may yield survival estimates that have relatively minor bias for shorter times, even without independent delayed entry. This is reasonable because patients who enter the cohort late do not affect risk set–adjusted estimation until after they enter, and because the hazard for patients entering the cohort soon after the index date is not much different from the hazard for patients entering before the index date. In contrast, for later survival times, many clinico-genomic datasets may be enriched in patients with poor prognosis, for example, patients whose disease has just progressed and for whom the hazard of death is substantially higher, and for these times the analyst should be more wary of substantial bias associated with risk set adjustment.
Landmark analyses, in which attention is restricted to patients sequenced before the index date, may also be an attractive strategy. However, sample size will be limited with this strategy, although to a lesser extent as NGS testing moves earlier in the disease course. Also, the analyst should note that, as here, patients NGS-tested before the index date may have better outcomes than the general population. Early testing may be a proxy for other factors associated with a favorable prognosis, including health-seeking behavior, and might also affect treatment by giving physicians more of an opportunity to select more effective therapies.
We conclude that risk set adjustment may reasonably be used for analyses of rwOS in clinico-genomic databases even in the absence of independent delayed entry, unless attention focuses on survival long after advanced diagnosis. To establish the reasonableness of using the risk set adjustment method, we suggest carrying out analyses like those presented here, including comparisons of risk set adjustment and landmark analyses, as well as comparisons to survival in broader populations including both NGS-tested and non-NGS–tested patients. This is important both to guide the choice of a strategy for dealing with delayed entry and to understand the generalizability of conclusions drawn from an analysis restricted to an NGS-tested patient population. We expect that disease-specific patterns of clinical practice, or patient characteristics influencing the representativeness of NGS-tested populations, such as socioeconomic status or healthcare insurance coverage, will affect the extent to which a lack of independent delayed entry influences survival function estimation. In this report, we present a simple comparison of tested and untested cohorts, benchmarked to the SEER population; it is apparent that practice patterns (such as guideline-recommended testing for patients with advanced disease, rather than all patients) affect the distribution of clinical stage across cohorts, but other demographic characteristics are similar. We hope that the analyses presented here may serve as a guide for how to approach a survival analysis in a clinico-genomic database in a new disease setting, but we also acknowledge that it will be valuable for future studies to expand their scope to better understand multifactorial effects on survival analyses.
Authors' Disclosures
D. Backenroth reports personal fees from Flatiron Health and personal fees and other support from Janssen outside the submitted work. E. Castellanos reports personal fees from Flatiron Health and Roche outside the submitted work. M. McCusker was an employee of Flatiron Health at the time the analysis was performed and the manuscript was submitted and owned shares in Roche during this time. Currently, M. McCusker is an employee of Grail, LLC and owns shares in Illumina. S. Sarkar reports personal fees from Flatiron Health, Inc. and Roche during the conduct of the study. No disclosures were reported by the other authors.
Authors' Contributions
D. Backenroth: Conceptualization, formal analysis, validation, investigation, methodology, writing–original draft, writing–review and editing. J. Snider: Formal analysis, validation, writing–review and editing. R. Shen: Conceptualization, methodology, writing–review and editing. V. Seshan: Conceptualization, methodology, writing–review and editing. E. Castellanos: Validation, investigation, writing–review and editing. M. McCusker: Supervision, validation, writing–review and editing. D. Feuchtbaum: Supervision, project administration, writing–review and editing. M. Gönen: Conceptualization, methodology, writing–review and editing. S. Sarkar: Conceptualization, supervision, writing–review and editing.
Acknowledgments
We thank Julia Saiz-Shimosato of Flatiron Health for her editorial assistance with this manuscript. This study was sponsored by Flatiron Health, which is an independent subsidiary of the Roche Group.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).