Cancers arise from the accumulation of multiple mutations. The mechanism driving this accumulation, particularly in epithelial cancers (carcinomas), has not been clearly determined. Epidemiological studies indicate that environmental and lifestyle factors can account for most carcinomas, but they do not point to a specific mechanism. Some models of human carcinogenesis suggest that the accumulation of four to six somatic mutations drives neoplastic transformation. This estimate of four to six mutations is based almost exclusively on the analysis of age-specific incidence data, and several anomalies exist in that data. For instance, breast carcinoma incidence data does not follow this model. (This has been explained by suggesting that breast tissue ages at a rate that differs from calendar time.) Prostate carcinoma incidence increases more rapidly with age than that of most other carcinomas, implying that 15 to 20 mutations would be required. Hence, one approach to better understanding epithelial carcinogenesis is to find a mathematical model that can explain the age-incidence curves for all carcinomas.

To pursue this goal, I compared a model (a normal distribution) with age-specific incidence data on human carcinomas compiled by the Surveillance, Epidemiology and End Results (SEER) program. Specifically, I used annual SEER-9 (Single Ages) data from 1973-2003 for human colon, lung, prostate, and breast carcinomas. I fit a normal distribution to the data from each of the 31 annual periods by maximum likelihood estimation, then tested whether the normal distribution was consistent with the data using the chi-squared goodness-of-fit test.
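The fitting and testing steps described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis. It assumes the input for one year is a table of case counts by consecutive single year of age, that each case sits at the midpoint of its age bin, and that sparse ages are pooled until the expected count reaches 5. The function names (`fit_normal_mle`, `chi2_sf`, `chi2_gof_normal`) and the input format are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def fit_normal_mle(counts):
    """Fit a normal distribution to binned cases (dict: integer age -> case count)."""
    n = sum(counts.values())
    # Assumption: every case sits at the midpoint of its one-year age bin.
    mean = sum((age + 0.5) * c for age, c in counts.items()) / n
    var = sum(c * (age + 0.5 - mean) ** 2 for age, c in counts.items()) / n  # MLE divides by n
    return NormalDist(mean, math.sqrt(var))


def chi2_sf(x, df):
    """Upper-tail chi-squared probability via the regularized incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, x = df / 2.0, x / 2.0
    scale = math.exp(-x + a * math.log(x) - math.lgamma(a))
    if x < a + 1.0:
        # Series expansion for the lower tail P(a, x); return 1 - P.
        term = total = 1.0 / a
        denom = a
        for _ in range(10000):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * 1e-14:
                break
        return 1.0 - total * scale
    # Continued fraction (modified Lentz method) for the upper tail Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return scale * h


def chi2_gof_normal(counts, min_expected=5.0):
    """Chi-squared goodness-of-fit test of binned cases against the fitted normal."""
    dist = fit_normal_mle(counts)
    n = sum(counts.values())
    ages = sorted(counts)  # assumed to be consecutive integer ages
    bins = []  # pooled [observed, expected] pairs
    obs = exp = 0.0
    for i, age in enumerate(ages):
        # The first and last ages are treated as open-ended tail bins.
        lo = 0.0 if i == 0 else dist.cdf(age)
        hi = 1.0 if i == len(ages) - 1 else dist.cdf(age + 1)
        obs += counts[age]
        exp += n * (hi - lo)
        if exp >= min_expected:  # pool sparse ages until the expected count is adequate
            bins.append([obs, exp])
            obs = exp = 0.0
    if exp > 0 and bins:  # fold any leftover tail into the last pooled bin
        bins[-1][0] += obs
        bins[-1][1] += exp
    stat = sum((o - e) ** 2 / e for o, e in bins)
    df = len(bins) - 1 - 2  # two parameters (mean, sd) were estimated from the data
    return stat, df, chi2_sf(stat, df)
```

Calling `chi2_gof_normal(counts)` on one year's table returns the test statistic, the degrees of freedom, and the P-value. Repeating the call for each of the 31 annual tables would give one P-value per year. The sketch assumes enough pooled bins remain for the degrees of freedom to be positive.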

Results indicate that the age-specific incidence data from most human carcinomas are consistent with being generated by a normal distribution. For instance, each of the 31 individual data sets, representing incidence in the 31 years from 1973 to 2003, gave P-values between 0.01 and 1.0 for both lung and colon carcinoma, suggesting no basis for rejecting the hypothesis that these incidence numbers follow a normal distribution. Breast carcinoma age-specific incidence data is consistent only with a model in which the data is generated by two distinct normal distributions. Prostate carcinoma age-specific incidence data was consistent with a single normal distribution from 1973 to 1990, but since then has been consistent only with a model in which two distinct normal distributions generate the data. This distinct change after 1990 coincides with the widespread implementation of PSA screening.
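The abstract does not state how the two-distribution model was fitted. One common approach, shown here purely as an assumption, is expectation-maximization (EM) for a two-component normal mixture. The sketch below uses the same hypothetical input format as above (case counts by single year of age, with each case placed at the midpoint of its age bin); the function name `fit_two_normals_em` and the starting values are illustrative choices.

```python
import math
from statistics import NormalDist


def fit_two_normals_em(counts, iters=300):
    """EM fit of a two-component normal mixture to binned cases (dict: age -> count).

    Returns two [weight, mean, sd] triples sorted by mean.
    """
    ages = sorted(counts)
    xs = [a + 0.5 for a in ages]   # assumption: cases sit at bin midpoints
    ws = [counts[a] for a in ages]
    n = sum(ws)
    mean = sum(x * w for x, w in zip(xs, ws)) / n
    sd = math.sqrt(sum(w * (x - mean) ** 2 for x, w in zip(xs, ws)) / n)
    # Crude starting point: one component on each side of the overall mean.
    comps = [[0.5, mean - sd, sd], [0.5, mean + sd, sd]]
    for _ in range(iters):
        # E-step: responsibility of each component for each age bin.
        resp = []
        for x in xs:
            dens = [p * NormalDist(mu, sg).pdf(x) for p, mu, sg in comps]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: count-weighted updates of weight, mean, and sd.
        for k in range(2):
            nk = max(sum(w * r[k] for w, r in zip(ws, resp)), 1e-12)
            mu = sum(w * r[k] * x for w, r, x in zip(ws, resp, xs)) / nk
            var = sum(w * r[k] * (x - mu) ** 2 for w, r, x in zip(ws, resp, xs)) / nk
            comps[k] = [nk / n, mu, max(math.sqrt(var), 1e-3)]
    return sorted(comps, key=lambda c: c[1])
```

A goodness-of-fit test for the mixture would follow the same pattern as for the single normal, with expected counts taken from the weighted sum of the two component distributions and five estimated parameters subtracted from the degrees of freedom instead of two.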

I conclude that the underlying carcinogenesis process, along with cohort effects, can be mathematically modelled with a single normal distribution for colon and lung carcinomas, and with two normal distributions for breast and modern prostate carcinomas. The reason these distributions fit is unclear, but the significance of the result is not: understanding the mechanism behind it will lead to a better understanding of the carcinogenesis process.

98th AACR Annual Meeting, Apr 14-18, 2007; Los Angeles, CA