## Abstract

In an era of ongoing improvement in cancer patient survival, available long-term survival figures from cancer registries are often outdated and too pessimistic for two reasons: first, delay in availability of cancer registry data, typically on the order of a few years, and, second, application of cohort-based methods of survival analysis, which provide survival estimates for patients diagnosed many years ago. We developed a model-based period analysis approach that aims to overcome both problems. We provide extensive empirical evaluation of our approach by comparing its performance with that of previously available methods for monitoring of 5- and 10-year relative survival, with the use of data from the nationwide Finnish Cancer Registry on 490,279 patients ages ≥15 years diagnosed with one of 20 common forms of cancer between 1953 and 1997. We show that, in most cases, the model-based approach predicts 5- and 10-year relative survival expectations of newly diagnosed patients quite closely and much better than any of the previously available methods, including standard period analysis. We conclude that the model-based approach may enable deriving up-to-date cancer survival rates even with the common latency in availability of cancer registry data. (Cancer Epidemiol Biomarkers Prev 2006;15(9):1727–32)

## Introduction

Monitoring cancer patient survival is an important task of both clinical and population-based cancer registries. To be of maximum use for both clinical and public health purposes, estimates of cancer patient survival should be as up-to-date as possible. Period analysis was introduced 10 years ago (1) to provide more up-to-date estimates of long-term survival than traditional cohort-based methods of survival analysis. The principle of period analysis consists of restricting the analysis to some recent time period, which is achieved by left truncation of observations at the beginning of that time period (in addition to right censoring of observations at its end). It has been shown that period analysis provides quite accurate and up-to-date estimates of long-term survival for patients diagnosed in the period of interest (2-5). In practice, however, there is often a latency of several years in availability of cancer registry data, and there may be further delay in the process of publication. Therefore, even if period analysis is applied to provide up-to-date estimates of long-term cancer patient survival for the most recent period for which cancer registry data are available, these estimates may still be somewhat outdated with respect to currently diagnosed cancer patients. For example, period estimates of long-term cancer survival published in 2002 for the United States and in 2005 for Germany pertained to patients diagnosed with cancer up to the years 1998 and 2002, respectively (6, 7).

In this article, we develop and empirically evaluate a model-based period analysis approach that aims to provide up-to-date estimates of long-term survival even with the common latency in availability of cancer registry data.

## Materials and Methods

### Database

Our analysis is based on data from the nationwide Finnish Cancer Registry, which is well known for its high levels of completeness and data quality (8). At the time of this analysis, the database encompassed patients diagnosed within half a century from 1953 to 2002, with a follow-up with respect to vital status until the end of 2003. In this analysis, we included patients ages ≥15 years with a first diagnosis with 1 of 20 common forms of cancer between 1953 and 1997.

### Statistical Analysis

Throughout this article, we present relative rather than absolute survival rates, as the former are most commonly reported by cancer registries. Relative survival rates reflect the probability of surviving the cancer of interest rather than the total survival probability (9, 10), taking expected deaths in the absence of cancer into account. For this analysis, the expected numbers of deaths were derived from age-, gender-, and calendar period–specific mortality figures of the general population of Finland according to the so-called Ederer II method (11).
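The relation between observed, expected, and relative survival can be illustrated with a minimal Python sketch. It assumes that interval-specific expected survival is taken as (at risk − expected deaths) / at risk, in the spirit of the Ederer II approach; the function names and all counts are hypothetical and serve illustration only.

```python
def interval_relative_survival(at_risk, deaths, expected_deaths):
    """Conditional relative survival for one follow-up interval.

    at_risk: effective number of patients at risk in the interval
    deaths: observed number of deaths in the interval
    expected_deaths: deaths expected from general-population mortality
    """
    observed = 1.0 - deaths / at_risk           # observed interval survival
    expected = 1.0 - expected_deaths / at_risk  # survival expected without cancer
    return observed / expected


def cumulative_relative_survival(intervals):
    """Multiply the interval-specific ratios over successive follow-up years."""
    result = 1.0
    for at_risk, deaths, expected_deaths in intervals:
        result *= interval_relative_survival(at_risk, deaths, expected_deaths)
    return result


# Hypothetical counts for the first two follow-up years (not registry data)
intervals = [(1000.0, 200.0, 20.0), (780.0, 100.0, 16.0)]
two_year = cumulative_relative_survival(intervals)
print(f"2-year relative survival: {100 * two_year:.1f}%")
```

A ratio close to 1 in an interval indicates that patients die at about the rate expected in the general population.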

We first assessed overall trends in survival during the past decades by looking at the development of 5-year relative survival for successive cohorts of patients diagnosed between 1953 to 1957 and 1993 to 1997.

Next, 5-year relative survival actually observed for patients diagnosed in 1993 to 1997 and followed through 2002 (Fig. 1, *bottom solid rectangular frame*) was compared with the most up-to-date estimates of 5-year survival that would potentially have been available in the middle of the 1993 to 1997 period, the years of diagnosis of this cohort. Assuming that, with the common latency of cancer registration of several years, available data would have included patients diagnosed up to the year 1992, the following survival estimates were derived. First, a cohort-based estimate was derived for the cohort of patients diagnosed in 1983 to 1987 and followed for 5 years thereafter (Fig. 1, *top solid rectangular frame*). Second, a so-called complete estimate additionally included patients diagnosed in 1988 to 1992, although the latter could not have been under observation for 5 years by the end of 1992 and were censored at that date unless they died or were lost to follow-up earlier (Fig. 1, *triangular frame*). Third, a standard period estimate for the 1988 to 1992 period was derived by left truncation of observations at the beginning of 1988 in addition to right censoring of observations at the end of 1992, as previously described (Fig. 1, *dashed frame*; ref. 1). Fourth, to overcome the latency in cancer registration, a model-based period estimate for the 1993 to 1997 period was derived.
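The effect of left truncation and right censoring in a period window can be sketched in Python as follows. This is a simplified illustration that works with person-time on a decimal calendar scale rather than with the effective numbers at risk used in the actual analysis; the function name, its parameters, and the example patients are hypothetical.

```python
def period_contributions(diagnosis, exit_time, period_start, period_end, max_years=5):
    """Person-time contributed to each follow-up year inside a period window.

    diagnosis, exit_time: decimal calendar times (e.g., 1986.5 = mid-1986);
    exit_time is the time of death, loss to follow-up, or end of observation.
    period_start, period_end: boundaries of the period window.
    """
    contributions = {}
    for year in range(1, max_years + 1):
        # Left truncation: time before the window does not count
        start = max(diagnosis + year - 1, period_start)
        # Right censoring: time after the window (or after exit) does not count
        end = min(diagnosis + year, exit_time, period_end)
        if end > start:
            contributions[year] = end - start
    return contributions


# 1988 to 1992 period, i.e., from the start of 1988 to the end of 1992
early = period_contributions(1986.5, 1995.0, 1988.0, 1993.0)
late = period_contributions(1991.0, 1992.5, 1988.0, 1993.0)
print(early)  # contributes to follow-up years 2 to 5 only
print(late)   # contributes to follow-up years 1 and 2 only
```

Patients diagnosed before the window thus inform the later follow-up years, whereas recently diagnosed patients inform the first follow-up years.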

For derivation of the model-based period estimates for the 1993 to 1997 period, we first calculated numbers of patients at risk and of deaths by year of follow-up for each of the three preceding 5-year periods, i.e., 1978 to 1982, 1983 to 1987, and 1988 to 1992, just like one would do in standard period analysis for each of these periods. Next, we used a Poisson regression model (12) for the total 1978 to 1992 period. A formal description of the model is given in Appendix 1. Briefly, the numbers of deaths for each combination of 5-year calendar period and year of follow-up were modeled as a function of the calendar period (included as a numerical predictor variable) and year of follow-up (included as a categorical predictor variable). Based on this model, we calculated the projected numbers of deaths and conditional relative survival probabilities for each year of follow-up in 1993 to 1997, assuming that the linear trend from the 1978 to 1982 period to the 1988 to 1992 period would prevail, and that the pattern of follow-up year-specific survival would remain unchanged otherwise. The model-based period estimate of 5-year relative survival for 1993 to 1997 was then obtained as the product of these conditional survival probabilities.
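The projection step can be illustrated with a simplified Python sketch. Note that it does not reproduce the Poisson regression described above: as a stand-in, it fits the same linear predictor (follow-up year as a categorical term, calendar period as a numerical term) by unweighted least squares to the log excess hazards, which requires all conditional relative survival values to lie strictly between 0 and 1. The function names and the input values are hypothetical.

```python
import math


def fit_log_linear_trend(cond_surv):
    """Fit ln(excess hazard) = alpha_i + beta * j by unweighted least squares.

    cond_surv[j][i]: conditional relative survival in calendar period j
    (coded 0, 1, 2, ...) and follow-up year i; the design must be balanced.
    """
    n_periods = len(cond_surv)
    n_years = len(cond_surv[0])
    # Log excess hazard per cell, assuming 1-year follow-up intervals
    y = [[math.log(-math.log(r)) for r in row] for row in cond_surv]
    j_mean = (n_periods - 1) / 2.0
    year_means = [sum(y[j][i] for j in range(n_periods)) / n_periods
                  for i in range(n_years)]
    # Common slope from within-follow-up-year deviations
    num = sum((j - j_mean) * (y[j][i] - year_means[i])
              for j in range(n_periods) for i in range(n_years))
    den = n_years * sum((j - j_mean) ** 2 for j in range(n_periods))
    beta = num / den
    alpha = [year_means[i] - beta * j_mean for i in range(n_years)]
    return alpha, beta


def project_survival(alpha, beta, period):
    """Projected conditional and cumulative relative survival for a period code."""
    cond = [math.exp(-math.exp(a + beta * period)) for a in alpha]
    return math.prod(cond), cond


# Hypothetical conditional relative survival by follow-up year (not registry data)
cond_surv = [
    [0.80, 0.88, 0.92, 0.94, 0.95],  # 1978 to 1982, j = 0
    [0.82, 0.89, 0.93, 0.95, 0.96],  # 1983 to 1987, j = 1
    [0.84, 0.91, 0.94, 0.95, 0.96],  # 1988 to 1992, j = 2
]
alpha, beta = fit_log_linear_trend(cond_surv)
projected, _ = project_survival(alpha, beta, period=3)  # 1993 to 1997, j = 3
print(f"Projected 5-year relative survival: {100 * projected:.1f}%")
```

A negative slope corresponds to a declining excess hazard, that is, to improving survival that is extrapolated one period ahead.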

To address the performance of the various methods in a much broader range of settings, we repeated the analyses of 5-year relative survival outlined for the cohort of patients diagnosed in 1993 to 1997 in the preceding paragraphs for cohorts of patients diagnosed in 1988 to 1992, 1983 to 1987, 1978 to 1982, and 1973 to 1977 as well (which is the widest possible range of cohorts for which analogous calculations could be carried out with the available database of the Finnish Cancer Registry). We calculated the following summary indicators of the performance of the various methods: The mean difference and the mean squared difference between 5-year relative survival later observed for patients diagnosed in the respective calendar periods and the various estimates potentially available during these periods. The mean differences reflect the average underestimation or overestimation of the 5-year relative survival rates. The mean squared differences, in addition, reflect average absolute levels of deviations of single estimates (with particular “punishment” of strong deviations).
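The two summary indicators can be computed as in the following Python sketch. Differences are taken here as estimate minus later observed value, so that negative mean differences indicate underestimation; the function name and the example values are hypothetical.

```python
def performance_summary(estimates, observed):
    """Mean difference and mean squared difference (in % units).

    estimates: survival estimates available during each calendar period
    observed: survival later observed for patients diagnosed in those periods
    """
    diffs = [est - obs for est, obs in zip(estimates, observed)]
    mean_diff = sum(diffs) / len(diffs)
    mean_sq_diff = sum(d * d for d in diffs) / len(diffs)
    return mean_diff, mean_sq_diff


# Hypothetical 5-year relative survival (%) for three calendar periods
estimates = [50.0, 54.0, 57.0]
observed = [55.0, 58.0, 60.0]
mean_diff, mean_sq_diff = performance_summary(estimates, observed)
print(mean_diff, mean_sq_diff)
```

Because deviations are squared before averaging, a single large deviation raises the mean squared difference more than several small ones of the same total size.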

To address the performance of the modeling approach for longer-term survival, we carried out analogous analyses comparing 10-year relative survival actually observed for patients diagnosed in 1988 to 1992, 1983 to 1987, and 1978 to 1982 with the most up-to-date estimates of 10-year relative survival that might have been available in those periods.

The analyses were carried out with the SAS statistical software package (Cary, NC), using the macro *period*, to derive the effective numbers of patients at risk and of deaths (13), and the procedure GENMOD to carry out Poisson regression.

## Results

Overall, 577,924 patients ages ≥15 years were reported to the Finnish Cancer Registry with a first diagnosis of cancer between 1953 and 1997. Of these, we excluded 2.6% notified by death certificate only, another 2.6% notified by autopsy only, and 0.1% due to missing information on month of diagnosis. The 20 forms of cancer specifically addressed in this article include ∼89.6% of the remaining cancer cases (*n* = 490,279).

The numbers of notifications, as well as 5-year relative survival rates of patients with these 20 forms of cancer, are shown for calendar years 1953 to 1957 and 1993 to 1997 (Table 1). Five-year relative survival varied strongly by cancer site. Among patients diagnosed in 1953 to 1957, it ranged from 66.3% for patients with cancer in the oral cavity to 3.2% for patients with esophagus cancer. Five-year relative survival rates increased between 1953 to 1957 and 1993 to 1997, albeit to a strongly varying degree, for all but one (pancreas cancer) of the assessed forms of cancer. The most pronounced increases, by 49.4%, 45.4%, and 40.3% units (i.e., an average annual increase by >1% unit), were seen for patients with prostate, bladder, and thyroid cancer, respectively. At 89.4%, patients with thyroid cancer had the highest 5-year relative survival rate in 1993 to 1997 among the cancer patients included in this analysis. On the other hand, 5-year relative survival still remained just above 10% for patients with cancers of the esophagus, liver, and lung, and <4% for patients with cancer of the pancreas.
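The average annual increase quoted above can be checked with a few lines of Python. The sketch assumes that the time span is measured between the midpoints of the two diagnosis periods (1955 and 1995); the variable names are illustrative.

```python
# Increases in 5-year relative survival (% units) quoted in the text
increases = {"prostate": 49.4, "bladder": 45.4, "thyroid": 40.3}

# Assumption: the interval is measured between the period midpoints 1955 and 1995
years_between = 1995 - 1955

annual_increase = {site: units / years_between for site, units in increases.items()}
for site, value in annual_increase.items():
    print(f"{site}: {value:.2f} % units per year")
```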

As Fig. 2 shows, increases in 5-year relative survival mostly occurred steadily throughout the second half of the 20th century, and for some forms of cancer, such as colon, rectum, or stomach cancer, increases were very close to linear over several successive decades. On the other hand, quite varying trends were seen in different periods for a few other forms of cancer, such as lymphomas and particularly cervical cancer.

Table 2 shows the estimates of 5-year relative survival potentially available in 1993 to 1997 with cancer registry data up to the year 1992 by the four analytic approaches compared with the 5-year relative survival rates later observed for patients diagnosed in 1993 to 1997. For urological cancers, results are also illustrated graphically (see Fig. 3). For those forms of cancer whose prognosis improved over time, the available cohort estimates, which pertained to patients diagnosed in 1983 to 1987, but also (albeit to a slightly lesser degree) the available complete estimates, which pertained to patients diagnosed in 1983 to 1992, were much lower than the 5-year relative survival rates later observed in 1993 to 1997. The “standard” period estimates for the 1988 to 1992 period were less outdated in most cases, but for 14 of the 20 forms of cancer, the model-based period estimate for the 1993 to 1997 period came closest to the 5-year relative survival rates later observed in 1993 to 1997 (bold figures in Table 2). The same was true for three, zero, and four forms of cancer for cohort analysis (which performed worst for 15 of 20 forms of cancer), complete analysis, and standard period analysis, respectively. However, for prostate and cervical cancer, all estimates were much too low. For prostate cancer, underestimation was worst for cohort analysis, and for cervical cancer, it was worst for the modeled period analysis.

With few exceptions, cohort, complete, and standard period analysis provided, on average, somewhat too pessimistic estimates of 5-year relative survival rates later observed for patients diagnosed in each 5-year calendar period between 1973 to 1977 and 1993 to 1997, which can be seen from the negative values of most of the mean differences shown in Table 3. This underestimation was generally largest for the conventional cohort analysis (with mean differences ranging up to −11.5% units for melanoma), intermediate for complete analysis, and somewhat lower for standard period analysis. With modeled period analysis, mean differences were generally much closer to 0. They were below 0 for 12 forms of cancer, and above 0 for eight forms of cancer, with a range from −3.5% units to +1.9% units. For 17 and 15 of 20 cancers, absolute values of mean differences and mean-squared differences were lowest with the use of modeled period analysis, respectively. Standard period analysis performed best for three and two forms of cancer, and cohort analysis performed worst for 18 and 18 forms of cancer, respectively, according to these two criteria. The only major exception from the good performance of modeled period analysis was cervical cancer, for which the mean squared difference was substantially higher with modeled period analysis than with the other methods of analysis.

The advantages of modeled period analysis are even more striking for 10-year relative survival rates, where underestimation of survival by traditional methods of survival analysis may be even more severe, whereas the performance of the modeled period analysis is, on average, about as good as for 5-year relative survival (see Table 4).

## Discussion

With ongoing progress in prognosis, conventional cohort and complete estimates of long-term survival are overly pessimistic for many forms of cancer. Although this problem may be partly overcome by period analysis, even the “standard” period estimates are often outdated to some extent by the time they can be derived because of the usual delay in availability of cancer registry data. In this article, we introduced a model-based extension of period analysis of cancer patient survival that allows up-to-date estimates of long-term survival to be provided even in that situation. The model-based period analysis was found to be superior to any of the other methods, including standard period analysis, for the clear majority of cancers in extensive empirical evaluation.

Our findings regarding the advantage of “standard” period analysis over cohort and complete analysis in terms of up-to-dateness of survival estimates are in agreement with previous evaluations that had not taken the usual delay in availability of cancer registry data into account (3-5). Previous analyses had also shown that period analysis may advance detection of trends in 5-, 10-, 15-, and 20-year survival by almost 5, 10, 15, and 20 years compared with cohort analysis (2). Model-based period analysis as applied in this study would be expected to result in further reduction of delay in disclosure of progress in survival by another 5 years. In fact, the mean difference in model-based period estimates for the current period from 5-year and 10-year survival later observed for patients diagnosed in that period was close to zero for most forms of cancer in our extensive empirical evaluation, which indicates that the delay in disclosure of progress in prognosis can be overcome almost entirely.

In our modeling approach, we assumed persistent increasing or decreasing trends in relative survival rates over time. Obviously, the model-based period approach performs best when this assumption holds entirely. In our time trend analyses of cancer survival over half a century, we found entirely or close to monotonic upward trends in survival for most forms of cancer. However, for cervical cancer, divergent trends in various time intervals were found. The transient drop in survival for this form of cancer probably reflects selection processes resulting from the very successful screening program for this form of cancer (14). As a result of this pattern, the performance of model-based period estimates was worse for this form of cancer than the performance of the other methods of survival analysis. For cancers with little change in prognosis over time, neither standard period analysis nor model-based period analysis provides advantages over conventional cohort or complete analysis, and all methods essentially perform equally well. Such a pattern was seen, for example, for pancreas cancer in our analysis.

In the application of the modeling approach, there are a number of design options that may be considered. These include, for example, the number and width of periods included in the modeling. In our analysis, we chose to use three 5-year periods, covering a 15-year time frame altogether. Although a longer time frame (e.g., use of four or five 5-year periods) would provide a broader basis for estimation, the assumption of a persistent trend within such a long time frame may often be more problematic. Also, such an approach would only be feasible in cancer registries with a long history of cancer registration at high levels of quality and completeness. Although this prerequisite would be given in the Finnish Cancer Registry, it would hinder application of modeled period analysis in a large and growing number of younger high-quality cancer registries. On the other hand, relying on just two 5-year periods might provide too weak a basis for reliable estimation of trends and may give too much weight to possible random deviations from long-term trends. The width of time periods used for modeling and prediction is likewise somewhat arbitrary. We feel, however, that the 5-year periods used in our analysis may be a reasonable choice, as this width limits the influence of short-term fluctuations in prognosis (whether they are due to chance or due to other reasons) and avoids the limited flexibility that would result from too broad intervals.

A potential drawback of the modeling approach, apart from its reliance on the existence of a long-term history of reliable cancer registration, is the increased complexity of calculation and data interpretation. However, the increased complexity of calculations also brings increased flexibility, e.g., to take account of possible deviations from linearity in trends or to include other variables that might affect prognosis in the models. In fact, the modeling approach may be considered as a broader general methodologic framework, which includes standard period analysis as one special case (i.e., the case of a saturated model in which follow-up year–specific survival rates are estimated for one single calendar period).

In summary, our empirical evaluation supports expectations from theory that the modeling approach further enhances possibilities of deriving up-to-date cancer survival rates. This approach may be particularly helpful to overcome outdatedness of survival data resulting from the common latency in availability of cancer registry data.

## Appendix 1

Let $l_{ij}$ = the effective numbers of patients at risk (accounting for late entries and withdrawals as half persons), $d_{ij}$ = the observed numbers of deaths, and $e_{ij}$ = the expected numbers of deaths (from population life tables) for each combination of follow-up year $i$ and calendar period $j$.

The calendar periods are coded in such a way that $j = 0$ for the first calendar period, and $j = 1$ and $2$ for the subsequent calendar periods included in the modeling.

Then, a generalized linear model $d_{ij} = f(i, j)$ is fitted with outcome $d_{ij}$, Poisson error structure, predictor variables $i$ (categorical) and $j$ (numerical), link $\ln(\mu_{ij} - d^{*}_{ij})$, and offset $\ln(l_{ij} - d_{ij}/2)$, where $\mu_{ij}$ = the model-based numbers of deaths and

$$d^{*}_{ij} = -\left(l_{ij} - \frac{d_{ij}}{2}\right) \times \ln\left(\frac{l_{ij} - e_{ij}}{l_{ij}}\right).$$

Let $\alpha_{i}$ and $\beta$ be the estimated regression coefficients for follow-up years $i$ and calendar periods $j$. Then, estimates of conditional relative survival for each combination of follow-up year $i$ and calendar period $j$ are given as

$$r_{ij} = \exp\left(-\exp(\alpha_{i} + \beta j)\right),$$

and an estimate of cumulative $k$-year relative survival for each calendar period $j$ is given as

$$R_{kj} = \prod_{i=1}^{k} r_{ij}.$$

The modeled *k*-year cumulative relative survival for the current calendar period is obtained by setting *j* = 3.
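As a reading aid, the model specification in this appendix can be unpacked as follows. This is a sketch under stated assumptions: follow-up intervals are taken to be 1 year long, $d^{*}_{ij}$ is read as the number of deaths expected from background mortality, and the symbol $\lambda_{ij}$ (the excess death rate per person-year) is introduced here for illustration only.

$$
\begin{aligned}
\ln(\mu_{ij} - d^{*}_{ij}) &= \ln\left(l_{ij} - \frac{d_{ij}}{2}\right) + \alpha_{i} + \beta j, \\
\lambda_{ij} = \frac{\mu_{ij} - d^{*}_{ij}}{l_{ij} - d_{ij}/2} &= \exp(\alpha_{i} + \beta j), \\
\prod_{i=1}^{k} \exp(-\lambda_{i3}) &= \exp\left(-e^{3\beta} \sum_{i=1}^{k} e^{\alpha_{i}}\right).
\end{aligned}
$$

The first line combines the link, the offset, and the two predictor variables. The second line shows that the model is log-linear in the excess death rate, so that $\exp(\beta)$ is the factor by which the excess death rate changes from one 5-year calendar period to the next. The third line gives the resulting cumulative $k$-year relative survival for the projected period ($j = 3$) if conditional relative survival over a 1-year interval is taken as $\exp(-\lambda_{ij})$.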

**Grant support:** German Cancer Foundation, Deutsche Krebshilfe, project no. 70-3166-Br 5 (H. Brenner), and Academy of Finland and the Cancer Society of Finland (T. Hakulinen).
