## Abstract

In recent years, biologically motivated continuous tumor growth models have been introduced for breast cancer screening data. These provide a novel framework from which mammography screening effectiveness can be studied.

We use a newly developed natural history model, which is unique in that it includes a carcinogenesis model for tumor onset, to analyze data from a large Swedish mammography cohort consisting of 65,536 participants, followed for periods of up to 6.5 years. Using patient data on age at diagnosis, tumor size, and mode of detection, as well as screening histories, we estimate distributions of patients' age at onset, (inverse) tumor growth rates, symptomatic detection rates, and screening sensitivities. We also allow the growth rate distribution to depend on the age at onset.

We estimate that by the age of 75, 13.4% of women have experienced onset. On the basis of a model that accounts for the role of mammographic density in screening sensitivity, we estimated median tumor doubling times of 167 days for tumors with onset occurring at age 40, and 207 days for tumors with onset occurring at age 60.

With breast cancer natural history models and population screening data, we can estimate latent processes of tumor onset, tumor growth, and mammography screening sensitivity. We can also study the relationship between the age at onset and tumor growth rates.

Quantifying the underlying processes of breast cancer progression is important in the era of individualized screening.

## Introduction

Researchers have attempted to quantify the natural history of breast cancer. Often, these attempts have been based on multi-state Markov models (1–3). Recent years, however, have seen the development of more biologically motivated continuous tumor growth models (4–7). These models make it possible to study a number of unobservable processes involved in tumor progression, such as tumor onset and growth, symptomatic detection, and spread. Abrahamsson and colleagues (8) have, for example, studied screening sensitivity as a function of mammographic density, growth rate as a function of body mass index, and symptomatic detection as a function of breast size. In this article, we focus on the model for tumor onset. We present the first application, to a detailed mammography screening cohort, of a continuous growth model that jointly includes the processes of tumor onset (carcinogenesis), tumor growth, and detection through mammography screening or symptoms.

It has been observed that younger women have, on average, faster growing tumors than older women, based on observed age at diagnosis (1, 4, 9, 10). This does not necessarily translate to an equivalent relationship between age at onset (carcinogenesis) and tumor growth rate, as has been demonstrated in ref. 11. Here, we study directly the relationship between these latent processes.

We use a large, ongoing Swedish breast cancer mammography cohort to jointly estimate key parameter values in our comprehensive natural history model for invasive breast cancer. As well as using our model for the first time on empirical data, we also present an extension to the model described in ref. 11, which allows the tumor growth rate to depend on the age at onset. We study the dependency between the two unobserved processes of onset and growth, and we show that, under modeling assumptions, it can be quantified on the basis of observational data. We model screening sensitivity as a function of tumor size and mammographic density. Mammographic density is the amount of radio-dense tissue in the breast (predominantly the epithelial tissue and the stroma), and high density has been shown to reduce mammography screening sensitivity (7, 12). Younger women are known to have, on average, higher mammographic density than older women (13, 14).

## Materials and Methods

### Data

We use data from the KARolinska MAmmography Project for Risk Prediction of Breast Cancer (KARMA; ref. 15). KARMA is an ongoing Swedish prospective screening cohort which includes 70,877 women who were recruited between January 2011 and March 2013, when they attended mammography screening at one of four hospitals in Sweden. In Sweden, all women are invited to screening every 18–24 months between the ages 40 and 75 years.

Participants gave blood samples, and filled in a web-based questionnaire. Both raw and processed digital mammograms were stored. Breast cancer cases were identified through the Swedish national quality register INCA (Swedish Information Network for CAncer treatment). Follow-up for our study ended on August 12, 2017, and the median follow-up time was 5.4 years from entry. The tumor sizes were reported as the largest diameter measured during histopathology (in millimeters).

We included only women without a breast cancer diagnosis prior to study entry (left truncation, in statistical terms). The type of model we use in this study requires information on each woman's individual screening history—both before and after study entry. We therefore also excluded women without any records of previous mammograms or screening dates. Any woman without a recorded measurement of mammographic density (measured as the percentage of radio-dense tissue in the breast using *STRATUS*; ref. 16) was also excluded. From the 1,303 breast cancer cases recorded during the study period, we included those with invasive breast cancer, and with a recorded date of diagnosis, mode of detection (through mammography screening or through symptoms), and with measurements of primary tumor size at diagnosis. Out of the 70,877 KARMA participants, after applying these criteria, we included a total of 65,536 women in our study, of which 1,032 were diagnosed with invasive breast cancer during follow-up. See Fig. 1 for a flowchart of the data selection.

### Statistical analysis

We use a continuous growth natural history model which jointly models the likelihood of tumor size, age at detection, and mode of detection in invasive breast cancer. The model consists of four different submodels, each representing a latent process in the natural history of breast cancer. These processes are (i) the onset of invasive breast cancer (carcinogenesis); (ii) the continuous growth of the tumor; (iii) the manifestation of symptoms which cause the tumor to be spontaneously detected; and (iv) the mammography screening test sensitivity for early detection of asymptomatic tumors.

These four submodels are combined to produce a timeline for an invasive breast cancer tumor, from the patient's birth to tumor detection. The three possible endpoints of these timelines in our study are: (i) symptomatically detected cancer; (ii) screen-detected cancer; (iii) censored, or no detected cancer. We calculate the model likelihood, by using available data from each individual on age at detection, tumor size at detection, mode of detection, and individual screening histories for the cases; or age at the end of follow-up and the individual screening history for the censored. The likelihood is then maximized with respect to the total of 10 parameters from the submodels, to jointly estimate them, and to produce fitted distributions/functions of these submodels.

### Age at onset

The first process of the model covers the time from birth to breast cancer carcinogenesis. We refer to this as the onset of tumor, measured in terms of a woman's age in years. This should not be confused with the detection or diagnosis of the tumor. The other three submodels (below), in combination, represent the time from onset to detection.

We use the established Moolgavkar–Venzon–Knudson (MVK) two-stage model of carcinogenesis (17) to model the age at tumor onset *T*. This model supposes that a malignant cancer cell is formed from a normal cell that has undergone two separate mutations. The model combines four Poisson processes corresponding to cell division (rate $\tilde{\alpha}$), the first mutation (initiation, rate $\tilde{\nu}$), cell death (rate $\tilde{\beta}$), and the second mutation for initiated cells (malignant transformation, rate $\tilde{\mu}$). The age at onset (carcinogenesis) is usually defined as the time of the first malignant cell. However, for this study, we assume that the starting tumor diameter at onset is 0.5 mm. From that point on, we assume that the tumor growth is largely deterministic. It is also from this size that we assume that tumors are detectable (with nonzero probability).

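Heidenreich and colleagues (18) showed, however, that the four parameters in the MVK model are not jointly identifiable using only time-to-event data. Instead, we can use the following three identifiable parameters:

$$A = \tfrac{1}{2}\left[ -(\tilde{\alpha} - \tilde{\beta} - \tilde{\mu}) - \sqrt{(\tilde{\alpha} - \tilde{\beta} - \tilde{\mu})^2 + 4\tilde{\alpha}\tilde{\mu}} \right], \tag{A}$$

$$B = \tfrac{1}{2}\left[ -(\tilde{\alpha} - \tilde{\beta} - \tilde{\mu}) + \sqrt{(\tilde{\alpha} - \tilde{\beta} - \tilde{\mu})^2 + 4\tilde{\alpha}\tilde{\mu}} \right], \tag{B}$$

$$\delta = \tilde{\nu} / \tilde{\alpha}. \tag{C}$$

Using this parameterization, the hazard function for age at onset $h_T(t)$ is

$$h_T(t) = \frac{\delta A B \left( 1 - e^{(B - A)t} \right)}{B e^{(B - A)t} - A}. \tag{D}$$

From this, the survival function can be derived as

$$S_T(t) = \left[ \frac{(B - A)\, e^{Bt}}{B e^{(B - A)t} - A} \right]^{\delta}, \tag{E}$$

and the probability density function (PDF) is

$$f_T(t) = h_T(t)\, S_T(t). \tag{F}$$

The new parameters *A* and *B* control the shape of the function but are difficult to interpret. The parameter δ is proportional to the rate of the first mutation, $\tilde{\nu}$.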
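As a numerical check on the onset submodel, the following Python sketch evaluates the hazard, survival function, and density of the age at onset at the point estimates reported in Table 2. It assumes the standard closed-form solution of the two-stage model in terms of *A*, *B*, and δ (written out in the comments); the function names are illustrative and are not taken from the authors' R/C++ implementation.

```python
import math

# Point estimates from Table 2 (A x 100 = -7.22, B x 1,000 = 1.18, delta x 100 = 9.52).
A = -7.22e-2
B = 1.18e-3
DELTA = 9.52e-2


def onset_survival(t, a=A, b=B, delta=DELTA):
    """P(T > t): probability that onset has not occurred by age t (years).

    Assumed form: S(t) = [ (B - A) e^{Bt} / (B e^{(B - A) t} - A) ]^delta.
    """
    growth = math.exp((b - a) * t)
    return ((b - a) * math.exp(b * t) / (b * growth - a)) ** delta


def onset_hazard(t, a=A, b=B, delta=DELTA):
    """Hazard of onset at age t: delta*A*B*(1 - e^{(B-A)t}) / (B e^{(B-A)t} - A)."""
    growth = math.exp((b - a) * t)
    return delta * a * b * (1.0 - growth) / (b * growth - a)


def onset_density(t, a=A, b=B, delta=DELTA):
    """PDF of the age at onset: hazard times survival."""
    return onset_hazard(t, a, b, delta) * onset_survival(t, a, b, delta)


cumulative_risk_75 = 1.0 - onset_survival(75.0)
print(f"Cumulative risk of onset by age 75: {cumulative_risk_75:.3f}")  # about 0.134
```

With these inputs the cumulative risk by age 75 comes out at roughly 13.4%, in line with the estimate reported in the Results.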

### Tumor growth

Once the onset of breast cancer has occurred, we assume that the tumor grows exponentially. If the onset occurred at age $T = t$, the tumor volume (mm³) at age *x* is

$$V(x) = v_0 \exp\left( \frac{x - t}{r} \right), \quad x \ge t, \tag{G}$$

where $v_0$ is the tumor volume at onset (the volume of a sphere with a diameter of 0.5 mm) and *r* is the *inverse growth rate*, which relates to the tumor volume doubling time through $\ln 2 \cdot r$. The tumor size is usually measured by its diameter (in mm). In our model, we assume that the tumor is spherical. This way, we can easily convert from the measured diameter to a corresponding volume. We note that Talkington and Durrett (19) have shown the exponential growth model to give a good fit, in comparison to other growth functions, to breast cancer *in vivo* data.
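The growth submodel can be illustrated with a short Python sketch. It assumes the exponential volume curve $V(x) = v_0 \exp((x - t)/r)$ with a spherical tumor of 0.5 mm diameter at onset, as described in the text; the helper names and the use of 365.25 days per year are choices made for this example only.

```python
import math

ONSET_DIAMETER_MM = 0.5  # assumed tumor diameter at onset, as stated in the text


def diameter_to_volume(diameter_mm):
    """Volume (mm^3) of a sphere with the given diameter (mm)."""
    return math.pi / 6.0 * diameter_mm ** 3


def volume_to_diameter(volume_mm3):
    """Diameter (mm) of a sphere with the given volume (mm^3)."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def tumor_volume(age, onset_age, inv_growth_rate):
    """V(x) = v0 * exp((x - t) / r), for age x at or after onset age t (years)."""
    v0 = diameter_to_volume(ONSET_DIAMETER_MM)
    return v0 * math.exp((age - onset_age) / inv_growth_rate)


def years_to_reach(diameter_mm, doubling_time_years):
    """Time (years) to grow from the onset diameter to the given diameter."""
    r = doubling_time_years / math.log(2.0)  # doubling time = ln(2) * r
    ratio = diameter_to_volume(diameter_mm) / diameter_to_volume(ONSET_DIAMETER_MM)
    return r * math.log(ratio)


# A tumor with a 167-day volume doubling time growing from 0.5 mm to 15 mm.
print(round(years_to_reach(15.0, 167 / 365.25), 1))  # 6.7
```

Growing from 0.5 mm to 15 mm is a 27,000-fold increase in volume, or about 14.7 doublings, which is why a doubling time of under half a year still translates to several years of preclinical growth.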

To account for the vast variability in tumor growth between individual tumors, previous continuous growth models have used random effects to model the tumor growth rates as a random variable *R* (6, 7, 20), where an individual tumor's (inverse) growth rate is a random draw $R = r$ from this variable. The goal of the model is then to estimate the parameters of this random variable, representing the tumor growth rates on a population level.

As an extension, we propose to allow the inverse growth rate distribution to depend on the age at onset *T*, as defined above. We assume, given the age at tumor onset $T = t$, that the inverse growth rate *R* is drawn from a gamma distribution with PDF

$$f_{R \mid T}(r \mid t) = \frac{r^{1/\phi - 1}}{\Gamma(1/\phi)\, (\phi \mu_t)^{1/\phi}} \exp\left( -\frac{r}{\phi \mu_t} \right), \quad r > 0, \tag{H}$$

where $\mathrm{E}[R \mid T = t] = \mu_t$, $\mathrm{Var}(R \mid T = t) = \phi \mu_t^2$, and $\ln \mu_t = \theta_0 + \theta_1 t$.

The extension for the inverse growth rate to depend on an observed covariate has been done before (8), but the strategy to allow growth rate to depend on a latent variable (from another submodel) is novel. Note that in the special case $\theta_1 = 0$, the inverse growth rate is independent of the age at onset and follows the same assumptions as in (6, 7, 11), with mean $\exp(\theta_0)$.
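To make the onset-dependent growth distribution concrete, the sketch below draws inverse growth rates from a gamma distribution with mean $\mu_t = \exp(\theta_0 + \theta_1 t)$ and variance $\phi \mu_t^2$, which corresponds to shape $1/\phi$ and scale $\phi \mu_t$. It plugs in the rounded point estimates from Table 2 and approximates the median doubling time by simulation; the function names, sample size, and seed are assumptions of this example, and the standard-library call `random.gammavariate(alpha, beta)` takes the shape and the scale in that order.

```python
import math
import random
import statistics

# Rounded point estimates from Table 2.
PHI = 0.56
EXP_THETA0 = 0.52
EXP_THETA1 = 1.011


def mean_inverse_growth_rate(onset_age):
    """mu_t = exp(theta_0 + theta_1 * t), in years."""
    return EXP_THETA0 * EXP_THETA1 ** onset_age


def sample_inverse_growth_rates(onset_age, n, rng):
    """Draw n inverse growth rates R | T = t from the gamma model."""
    mu = mean_inverse_growth_rate(onset_age)
    shape = 1.0 / PHI   # gives Var = PHI * mu^2
    scale = PHI * mu    # gives mean = shape * scale = mu
    return [rng.gammavariate(shape, scale) for _ in range(n)]


def median_doubling_time_days(onset_age, n=200_000, seed=1):
    """Monte Carlo estimate of the median volume doubling time, ln(2) * R."""
    rng = random.Random(seed)
    draws = sample_inverse_growth_rates(onset_age, n, rng)
    return math.log(2.0) * statistics.median(draws) * 365.25


for age in (40, 60):
    print(age, round(median_doubling_time_days(age)))
```

Because the table values are rounded, the simulated medians only approximately reproduce the reported 167 and 207 days.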

### Symptomatic detection

Given enough time after breast cancer onset, symptoms will emerge, and the breast cancer will be detected due to these symptoms. We call the time from onset to symptomatic detection *U*′. We assume that *U*′ follows a continuous hazard function which is, at time *z* after onset, proportional to the latent tumor volume $V(z)$, i.e.,

$$h_{U'}(z) = \eta V(z), \tag{I}$$

for a parameter $\eta > 0$. This hazard represents the increased risk of displaying symptoms as the tumor progresses, either through a palpable lump, or other breast cancer–related symptoms. The parameter η determines the relationship between latent volume (mm³) and the instantaneous hazard rate, where each mm³ increase in tumor volume increases the hazard rate by η. This simple formulation has been used previously (5–7), and gives tractable mathematical results (6).
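One reason the volume-proportional hazard is tractable is that, combined with exponential growth, its integral has a closed form. The sketch below assumes a hazard of η·V(z) with V(z) = v₀·exp(z/r), so the cumulative hazard is η·v₀·r·(exp(z/r) − 1). The time unit (years) and the function names are assumptions made for this example; η is taken from the ln(η) estimate in Table 2.

```python
import math

ETA = math.exp(-8.82)                  # from ln(eta) = -8.82 in Table 2
ONSET_VOLUME = math.pi / 6.0 * 0.5 ** 3  # mm^3, sphere of diameter 0.5 mm


def symptomatic_hazard(z, inv_growth_rate, eta=ETA):
    """Hazard of symptomatic detection at time z after onset: eta * V(z)."""
    return eta * ONSET_VOLUME * math.exp(z / inv_growth_rate)


def symptomatic_cumulative_hazard(z, inv_growth_rate, eta=ETA):
    """Integral of the hazard from 0 to z: eta * v0 * r * (exp(z / r) - 1)."""
    return eta * ONSET_VOLUME * inv_growth_rate * math.expm1(z / inv_growth_rate)


def symptomatic_survival(z, inv_growth_rate, eta=ETA):
    """Probability of no symptomatic detection within z years of onset."""
    return math.exp(-symptomatic_cumulative_hazard(z, inv_growth_rate, eta))


# Probability that a tumor with r = 0.66 years is still symptom-free at 5, 7 and 9 years.
for z in (5.0, 7.0, 9.0):
    print(z, round(symptomatic_survival(z, 0.66), 3))
```

The survival probability stays close to 1 while the tumor is small and then drops quickly, reflecting the exponential growth of the volume that drives the hazard.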

### Screen detection

Before the tumor is symptomatically detected, there are opportunities to detect the tumor early through mammography screening. We assume that, if there is a breast cancer tumor at the time a woman attends a screening, there is a probability of detecting it at the screening. We call this probability the screening test sensitivity (STS), and represent this as a logistic function of the current tumor diameter *d* and the mammographic percent density *m*:

$$\mathrm{STS}(d, m) = \frac{\exp(\beta_0 + \beta_1 d + \beta_2 m)}{1 + \exp(\beta_0 + \beta_1 d + \beta_2 m)}. \tag{J}$$

This represents the fact that larger tumors are easier to detect, and that high mammographic density can mask tumors. Here, we choose to not use the tumor volume as in the other submodels, because the mammogram is a two-dimensional projection of the breast. The diameter has been previously used in similar approaches (4, 7). We note that we use the statistical definition of sensitivity (above) as opposed to that which defines sensitivity as one minus the proportion of cancers that are interval detected (21).
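The logistic sensitivity function is easy to evaluate directly. The sketch below assumes the linear predictor β₀ + β₁·d + β₂·m, with the diameter *d* in millimeters and the density *m* expressed as a proportion between 0 and 1, and uses the point estimates from Table 2; the function name is illustrative.

```python
import math

# Point estimates from Table 2.
BETA0 = -4.99
BETA1 = 0.49
BETA2 = -2.09


def screening_sensitivity(diameter_mm, density):
    """Logistic screening test sensitivity for a tumor of the given diameter (mm)
    in a breast with the given percent density (as a proportion, 0 to 1)."""
    linear_predictor = BETA0 + BETA1 * diameter_mm + BETA2 * density
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Sensitivity for a 13-mm tumor at the 10th, 50th and 90th percentiles of density.
for density in (0.02, 0.18, 0.50):
    print(density, round(screening_sensitivity(13.0, density), 2))  # about 0.79, 0.73, 0.58
```

These values agree, up to rounding of the coefficients, with the sensitivities of 79.3%, 73.3%, and 58.3% reported in Table 2.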

At the end of follow-up, each woman will have an individual screening history, including the ages they attended, and the results (positive or negative). Only the cases that were observed as screen-detected will have a positive screening result (their last screening, marking the end of their follow-up). The rest will have a series of negative screening results. In the likelihood, these screening results are combined into the probability of the observed screening history. If a particular woman attended screening at ages $\tau_1, \tau_2, \ldots, \tau_k$, this probability is

where $\mathrm{STS}^*(\tau_j)$ is the STS at age $\tau_j$, at which the tumor diameter is unknown. Because the tumor diameter is only known at diagnosis, the rest needs to be inferred by using the other submodels. This is done by back-calculation, as part of the final likelihood function.

In our model, the processes of symptomatic detection and screen-detection act as competing risks. After all, the goal of mammography screening is to detect the tumor before it would be detected by symptoms. The observed mode of detection is determined by whichever comes first.
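The following Python sketch illustrates one ingredient of the screening-history calculation: the probability that every attended screening is negative, conditional on a given onset age and inverse growth rate. It assumes that a screening before onset is negative with probability 1, that results of successive screenings are independent given the tumor size, and that the tumor diameter follows the spherical exponential growth curve; the function names are hypothetical, and the full likelihood additionally integrates over the unobserved onset age and growth rate.

```python
import math

BETA0, BETA1, BETA2 = -4.99, 0.49, -2.09  # Table 2 point estimates
ONSET_DIAMETER_MM = 0.5


def screening_sensitivity(diameter_mm, density):
    """Logistic sensitivity in tumor diameter (mm) and density (proportion)."""
    linear_predictor = BETA0 + BETA1 * diameter_mm + BETA2 * density
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def diameter_at_age(age, onset_age, inv_growth_rate):
    """Volume grows as exp((x - t) / r), so the diameter grows as exp((x - t) / (3 r))."""
    return ONSET_DIAMETER_MM * math.exp((age - onset_age) / (3.0 * inv_growth_rate))


def prob_all_screens_negative(screen_ages, onset_age, inv_growth_rate, density):
    """Product of (1 - STS) over screenings attended after onset."""
    prob = 1.0
    for tau in screen_ages:
        if tau < onset_age:
            continue  # no tumor present yet, so the screening is negative for certain
        diameter = diameter_at_age(tau, onset_age, inv_growth_rate)
        prob *= 1.0 - screening_sensitivity(diameter, density)
    return prob


# Onset at 48 with r = 0.66 years; screenings at ages 50, 52 and 54.
print(round(prob_all_screens_negative([50, 52, 54], 48.0, 0.66, 0.18), 3))
```

In this example nearly all of the probability of detection comes from the last screening, when the tumor has reached roughly 10 mm; at the two earlier screenings it is still too small to be detected with appreciable probability.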

### Likelihood

The above submodels are assembled into a joint likelihood function of age at detection, mode of detection, and tumor size for each individual. The form of the likelihood contribution depends on each woman's observed endpoint. For the two types of breast cancer cases, the individual likelihood contribution is given by

where

We have no information on the size of any possibly latent tumor for the censored women. If we want to determine their likelihood of not having a detected breast cancer by the censoring time, we must therefore compound over an additional variable, for example, the inverse growth rate. The log-likelihood contribution is

See Eq. (K) for a derivation of these formulas, and for how they are adapted to account for left truncation. Heuristically, we can recognize all four submodels in Eqs. (L) and (N). The exponential expression in the numerator represents surviving symptomatic detection up to age *x*, which is derived from Eqs. (I) and (J). The product in the denominator represents surviving screen-detection (i.e., having all screenings be negative) and comes from Eq. (J). For the cases, the age and tumor volume are used to substitute $r = (x - t)/\ln(v/v_0)$, and the factor *C* is added with an additional expression relating to the observed mode of detection.

Each individual contribution is then combined into a log-likelihood

which is maximized with respect to the 10 model parameters $A, B, \delta, \theta_0, \theta_1, \phi, \eta, \beta_0, \beta_1, \beta_2$.

### Implementation

The model likelihood and estimation procedure were implemented in R version 3.6.1 and C++ functions integrated using the package RcppArmadillo. Confidence intervals (CI) were calculated by inverting the Hessian of the log-likelihood, which was approximated using finite differences as part of the numerical optimization of the log-likelihood. Monte Carlo sampling (10,000 times) from the estimated joint parameter distribution was used to repeatedly calculate the quantities presented below, and the 2.5th and 97.5th percentiles of each calculated quantity were used for the 95% CIs.
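The percentile-based Monte Carlo intervals can be sketched in a few lines of Python. This is a deliberately simplified, one-parameter illustration and not the authors' R/C++ code: it samples only ln(η) from a normal distribution whose standard error is back-calculated from the reported 95% CI, whereas the actual procedure draws all 10 parameters jointly using the inverse Hessian, which is not available here. The helper names and the seed are assumptions.

```python
import math
import random

LOG_ETA_HAT = -8.82
# Standard error back-calculated from the reported 95% CI (-8.98 to -8.66).
LOG_ETA_SE = (-8.66 - (-8.98)) / (2 * 1.96)


def percentile(sorted_values, q):
    """Nearest-rank style percentile of an already sorted list (0 <= q <= 1)."""
    index = int(round(q * (len(sorted_values) - 1)))
    return sorted_values[index]


def monte_carlo_ci(quantity, n_draws=10_000, seed=2017):
    """95% CI for quantity(ln eta) from the 2.5th and 97.5th percentiles of the draws."""
    rng = random.Random(seed)
    values = sorted(quantity(rng.gauss(LOG_ETA_HAT, LOG_ETA_SE)) for _ in range(n_draws))
    return percentile(values, 0.025), percentile(values, 0.975)


# Interval for eta itself, a nonlinear function of the sampled parameter.
lower, upper = monte_carlo_ci(math.exp)
print(f"eta: {math.exp(LOG_ETA_HAT):.2e} (95% CI, {lower:.2e} to {upper:.2e})")
```

The appeal of this approach is that any derived quantity, such as a doubling time or a sensitivity, can be passed in place of `math.exp` without deriving its standard error analytically.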

### Data availability

The data that support the findings of this study are available from www.karmastudy.org but restrictions apply to the availability of these data, which were used under license for this study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of www.karmastudy.org.

## Results

A summary of key characteristics of the KARMA data included in this study is presented in Table 1. At the end of follow-up, there were a total of 1,032 cases of invasive breast cancer, which corresponds to 1.6% of the women. Out of these, 703 (68%) were screen-detected, and 329 were symptomatically detected. The percent mammographic density significantly differs between screen-detected and symptomatic cases. The higher percent density (PD) among symptomatic cases is consistent with the tumors being masked at screening among women with high percent density. Tumor sizes are on average smaller among screen-detected cases. This is due to the earlier detection that screening gives compared with the eventual symptomatic detection, that is, the lead-time (22) conferred by screening. Tumor sizes were also, on average, larger in women with high mammographic density (medians of 12 mm and 16 mm in the lowest and highest quartiles of density, respectively).

| | Screen-detected cases | Symptomatic cases | Censored |
|---|---|---|---|
| Number of women | 703 | 329 | 64,504 |
| Age at entry, median (quartiles) | 62 (53–67) | 56 (48–66) | 54 (54–63) |
| Mammographic density as a proportion, median (quartiles) | 0.14 (0.05–0.28) | 0.26 (0.12–0.42) | 0.18 (0.06–0.35) |
| Number of negative screenings, median (quartiles) | 2 (1–2) | 2 (1–3) | 3 (3–4) |
| Tumor size in mm, median (quartiles) | 13 (9–20) | 17 (12–24) | n/a |

In Fig. 2A, we display the occurrence of all mammography screenings after study entry (i.e., we exclude the entry screen and the screenings before entry). We see that, after the entry screen, the women are typically screened around either 18 or 24 months later. We also see that, over time, the screenings get increasingly spread out. This particularly highlights the importance of considering individual screening histories and behaviors, as is incorporated into our modeling strategy (see section 4 of ref. 11).

A histogram representing the incidence of the 1,032 invasive breast cancers since study entry is shown in Fig. 2B. In this figure, patients have all been synchronized according to their respective date of entry. We see that most of the cancers are screen-detected and occur at approximately two-year intervals after entry. Very few symptomatically detected cancers occur at these time points. Their number gradually increases afterwards, leading up to the next round. This pattern arises because soon-to-be symptomatic cancers are instead detected at screening.

The estimates of the 10 parameters in our continuous growth model are presented in Table 2. Because the model parameters are not straightforward to interpret, we also present some more intuitive statistics, which are functions of the parameter estimates. We estimated (with 95% CIs in parentheses) the probability of onset occurring before age 75; the median tumor volume doubling time (for tumors with onset at ages 40 and 60); and the screening sensitivities for detecting a 13-mm tumor (the median screen-detected size), for women with the median percent density of 18%, and for women with the 90th percentile of density 50%, and for women with the 10th percentile of density of 2%. Sensitivities for other values can be read from Fig. 3 (see below).

| Statistic | Estimate | 95% CI |
|---|---|---|
| Probability of onset before age 75 | 13.4% | (10.3–17.1) |
| Median tumor doubling times (days) | | |
| Onset at age 40 | 167 | (118–236) |
| Onset at age 60 | 207 | (163–262) |
| Screening sensitivity for a 13-mm tumor | | |
| with 2% PD | 79.3% | (68.1–87.3) |
| with 18% PD | 73.3% | (61.1–82.7) |
| with 50% PD | 58.3% | (43.6–71.9) |
| Parameter estimates | | |
| $A \times 100$ | −7.22 | (−14.00 to −3.73) |
| $B \times 1{,}000$ | 1.18 | (0.65–2.16) |
| $\delta \times 100$ | 9.52 | (2.10–43.19) |
| $\ln(\eta)$ | −8.82 | (−8.98 to −8.66) |
| $\beta_0$ | −4.99 | (−5.39 to −4.60) |
| $\beta_1$ | 0.49 | (0.43–0.55) |
| $\beta_2$ | −2.09 | (−2.93 to −1.26) |
| $\phi$ | 0.56 | (0.46–0.68) |
| $\exp(\theta_0)$ | 0.52 | (0.22–1.23) |
| $\exp(\theta_1)$ | 1.011 | (0.997–1.025) |

Note: The CIs are derived from the estimated Hessian of the log-likelihood function.

Abbreviation: PD, percent density.

The parameter estimates that relate to the association between age at onset and tumor growth rate (θ₁), and the association between percent density and screening sensitivity (β₂) are especially relevant. We see in Table 2 that increased percent density is associated with a reduction in screening sensitivity. The estimated effect of age at onset on the inverse tumor growth rate was $\exp(\hat{\theta}_1) = 1.011$ (95% CI, 0.997–1.025), which is interpreted as a 1.1% increase in tumor volume doubling time per one-year increase in age at onset; that is, earlier-onset tumors are, on average, faster growing than later-onset tumors.
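As a rough consistency check on this interpretation, note that if the shape of the inverse growth rate distribution is governed by φ alone and only the mean $\mu_t = \exp(\theta_0 + \theta_1 t)$ changes with onset age, then every quantile of the doubling time scales with $\mu_t$. Under that assumption, and using the rounded point estimates, the ratio of median doubling times for onset at ages 60 and 40 is

$$\frac{\operatorname{median}\, DT(60)}{\operatorname{median}\, DT(40)} = \exp\left( 20\, \hat{\theta}_1 \right) = 1.011^{20} \approx 1.24, \qquad 167 \text{ days} \times 1.24 \approx 208 \text{ days},$$

which is close to the 207 days reported in Table 2 for onset at age 60; the small difference is attributable to rounding.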

Using our estimated parameters, we constructed the estimated distributions of age at onset and tumor volume doubling time, along with functions of screening sensitivity and tumor growth. These are presented in Fig. 3.

Figure 3A shows the distribution of age at onset, with a 95% confidence area. This function corresponds to the PDF in Eq. (F). In Fig. 3B, we show the estimated cumulative risk of onset, corresponding to one minus the survival function in Eq. (E). The estimated cumulative risk of onset by age 75 is 13.4% (95% CI, 10.3–17.1). For reference, the life-time risk of breast cancer diagnosis in American women is 13% (SEER cancer statistics review 1975–2016). We note also that the estimated incidence of tumor onset resembles a time-shifted version of incidence rates of breast cancer diagnosis that is observed in Sweden (23).

In Fig. 3C, we see the estimated screening sensitivity (with 95% confidence areas) for two different percent densities, 2% and 50%, which are the 10th and 90th percentiles of percent density found in our data. These functions correspond to Eq. (J).

Fig. 3D and E show, respectively, the distributions of the tumor volume doubling times [ln(2) times the inverse growth rate] of tumors when onset occurs at age 40 and 60 (with 95% confidence areas). These are estimates of the PDF in Eq. (H), with inverse growth rates transformed to tumor doubling times. We estimated the median to be 0.46 years (95% CI, 0.32–0.65) or 167 days (95% CI, 118–236) when onset occurs at age 40; and 0.56 years (95% CI, 0.44–0.72) or 207 days (95% CI, 163–262) when onset occurs at age 60. We estimate that the time it takes for a tumor with these estimated median tumor volume doubling times to grow from 0.5 mm to 15 mm (the median observed tumor size) is 6.7 years for onset at age 40, and 8.4 years for onset at age 60.

Because the processes we estimate are not directly observed, it is difficult to assess the goodness-of-fit of our model. To do this, we took our fitted model and performed a simulation study, where the goal was to replicate the outcome in the study data. We generated a large cohort of women, and matched them to the women in the observed data. To approximate the screening attendance, we randomly assigned each woman a screening program as follows: with 25% probability, screening every 24 months between ages 40 and 74; with 75% probability, screening every 18 months between ages 40 and 52, and every 24 months between ages 52 and 74. We then randomly assigned one of the screenings to be the entry into the study. We then matched our simulated women to the observed data with respect to the age at study entry, and PD. For each woman in the data, we matched 100 simulated women. A random maximum follow-up time was then sampled from the observed follow-ups, and the outcome was simulated.
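The random assignment of screening programs described above can be sketched as follows. The two schedules and their probabilities come from the text; whether the last screening falls exactly at age 74, the uniform choice of the entry screening, and the function names are assumptions made for this illustration.

```python
import random


def screening_program(rng):
    """Return the list of screening ages (years) for one simulated woman."""
    if rng.random() < 0.25:
        # Every 24 months between ages 40 and 74.
        return [40.0 + 2.0 * i for i in range(18)]
    # Every 18 months between ages 40 and 52, then every 24 months up to 74.
    early = [40.0 + 1.5 * i for i in range(8)]   # 40.0, 41.5, ..., 50.5
    late = [52.0 + 2.0 * i for i in range(12)]   # 52.0, 54.0, ..., 74.0
    return early + late


def simulate_woman(rng):
    """Assign a screening program and pick one screening as the study entry."""
    ages = screening_program(rng)
    entry_age = rng.choice(ages)  # assumed: each screening equally likely to be the entry
    return ages, entry_age


rng = random.Random(2011)
ages, entry_age = simulate_woman(rng)
print(len(ages), entry_age)
```

Simulated women generated this way would then be matched to observed participants on age at entry and percent density before the remaining submodels are used to simulate onset, growth, and detection.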

The goodness-of-fit results can be found in Fig. 4. In Fig. 4A, we present a histogram of the observed incidence by age, with the simulated incidence as a smoothed curve. The simulated incidence was slightly lower (1.35% cases) than the observed (1.57% cases), but the two resemble each other closely across ages. In Fig. 4B, we present the tumor size distributions for all ages, separated by mode of detection. Again, histograms of the observed sizes are compared with a smoothed curve of the simulated sizes. The simulated size distributions look similar to the observed for both screen-detected and symptomatic cases, but the median tumor size of the simulated cases was overall slightly higher (15.6 mm) than the observed (15.0 mm). In Fig. 4C, we present the tumor sizes only for women aged 40–49 at diagnosis, and in Fig. 4D only for women aged 50 and over.

## Discussion

In this study, we used a large breast cancer mammography cohort to quantify the distributions of age at onset (carcinogenesis), the heterogeneity of tumor growth, and mammography screening sensitivity. In particular, we studied the effect age at onset has on tumor growth rates, and the effect of mammographic density on screening sensitivity. Younger age at onset was associated with faster tumor growth. We note that, although the 95% CI for $\exp(\hat{\theta}_1)$ (just) included 1, a log-likelihood ratio test comparing the presented model to one without onset-dependent growth (i.e., $\theta_1 = 0$) showed a (just) statistically significant improvement ($P = 0.045$). Despite its borderline statistical significance, the association is in the same direction as reports of age at diagnosis and tumor growth (1, 4, 9, 10).

Our estimates of the median tumor doubling time—both for onset at 40 years and 60 years—were noticeably longer than estimated in previous applications of continuous growth models. Weedon-Fekjær and colleagues (4) and Abrahamsson and Humphreys (7) estimated 142 days and 124 days, respectively. However, our estimate of 207 days for tumors with onset at age 60 is in line with estimates obtained by Fournier and colleagues (24). They estimated a median tumor doubling time of 212 days by measuring tumor size change *in vivo* using sequential mammograms. Similarly, Zhang and colleagues (25) used serial ultrasound images, and estimated 167 days for women aged 26–51 (at diagnosis), and 227 days for women aged 52–71 (at diagnosis).

We used simulations to assess the goodness-of-fit of our model to the observed data, since the model consists of four unobservable processes. The general impression is that incidence and screen-detected tumor sizes are well fitted, but that the symptomatic tumor sizes are overestimated, particularly in the younger women. The inferior fit for the young symptomatic women could be attributed to the scarcity of data for that subgroup, which contributed only 80 of the 1,032 cases. It could, however, indicate an area where the model could be improved, where additional attention is needed for women aged 40–49.

The choice to do a simulation for the model assessment is not ideal, due to the individual screening attendances, screening intervals, study entries and ends of follow-up. These all had to be approximated to some extent. However, these complications are precisely the reason why simulation was the best approach. While it is conceivable to use formulas to represent the expected outcome, these remain to be derived. These formulas would be related to making age-, size-, and mode-specific risk predictions, which will be the focus of a future study of ours on risk prediction, once more follow-up data from the KARMA study is available. This requires deriving a number of theoretical results, such as conditional growth rate distributions (conditioning on tumor size, detection mode, and screening history), and conditional distributions of future onset times.

It is worth noting that the women in our study are all above the age of 40, since that is when the Swedish screening program starts. In the context of breast cancer, young women are often defined as those being under 40 years (26). This is a limitation that is shared with similar previous studies, however, where the youngest women were either 40 years (1) or 50 years (4). Those studies did not take mammographic density into account.

Another potential limitation of our study is the relatively short follow-up time (median 5.4 years). However, this is enough follow-up to include at least two full rounds of screening, as seen in Fig. 2. If we combine this with the fact that the participants were recruited at all screening ages (40–74 years in Sweden), we believe that our results are valid for women attending screening in Sweden.

More aggressive subtypes of breast cancer (e.g., estrogen receptor–negative, HER2-enriched, or triple negative), and tumors of a higher grade, are associated with younger age (26, 27). The type of model presented here can conceivably be extended to model different subtypes of breast cancer, possibly by using mixture distributions. More methodologic development is needed. In the current model, the different subtypes and tumor grades can be considered to occupy different parts of the random effects submodel for the inverse growth rate.

By modeling the natural history of breast cancer through four different submodels, we open up the possibility of studying the effect known breast cancer risk factors have on individual submodels. In this study, we only included PD in the submodel for screening sensitivity. Our group is currently working on adding more factors to other submodels. To obtain reliable estimates, it will be important to include relevant risk factors, and to carry out studies with larger numbers of patients with cancer. We showed here how including mammographic density in the screening sensitivity model impacts estimates of tumor growth rates.

Continuous growth models are proving to be a promising approach for studying breast cancer natural history. By better understanding the processes of onset, growth, and screening sensitivity, we can study the effects of factors such as mammographic density, hormone replacement therapy use, parity, or family history on more than just breast cancer incidence. Among other things, this opens up the possibility to assess individualized screening, by identifying the individuals for which screening will be most necessary and effective.

## Authors' Disclosures

R. Strandberg reports grants from Vetenskapsrådet (VR), Swedish Cancer Society, and Karolinska Institutet during the conduct of the study. No disclosures were reported by the other authors.

## Authors' Contributions

**R. Strandberg:** Conceptualization, formal analysis, visualization, methodology, writing–original draft. **K. Czene:** Supervision, writing–review and editing. **M. Eriksson:** Data curation, writing–review and editing. **P. Hall:** Supervision, writing–review and editing. **K. Humphreys:** Conceptualization, supervision, funding acquisition, writing–review and editing.

## Acknowledgments

This work was supported by grants awarded to K. Humphreys from the Swedish Research Council (2020-01302), the Swedish Cancer Society (CAN 2020-0714), and Karolinska Institutet (KID block grant). These sources had no involvement in any aspect of this article.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

## References
