Although many diseases are associated with cancer, the full spectrum of temporal disease correlations across cancer types has not yet been characterized. A population-wide study of longitudinal disease trajectories is needed to interrogate the general medical histories of patients with cancer. Here we performed a retrospective study covering a 20-year period, using 6.9 million patients from the Danish National Patient Registry linked to 0.7 million patients with cancer from the Danish Cancer Registry. Statistical analysis identified all significant disease associations occurring prior to cancer diagnoses. These associations were used to build frequently occurring, longitudinal disease trajectories. Across 17 cancer types, a total of 648 significant diagnoses correlated directly with a cancer, while 168 diagnosis trajectories of time-ordered steps were identified for seven cancer types. The most common diseases across cancer types involved cardiovascular, obesity, and genitourinary diseases. A comprehensive, publicly available web tool of interactive illustrations for all cancer disease associations is provided. By exploring the precancer landscape using this large dataset, we identify disease associations that can be used to derive mechanistic hypotheses for future cancer research.

Significance:

This study offers an innovative approach to examine prediagnostic disease and cancer development in a large national population-based setting and provides a publicly available tool to foster additional cancer surveillance research.

Patients with cancer often have diverse histories of hospital encounters before they acquire their first cancer diagnosis. Longitudinal health care information can be used to reveal causes of cancer at the level of individuals such as repeated or prolonged infections with oncogenic pathogens due to impaired immune surveillance (1). A given cancer type may be caused by different processes, which might be reflected in different prehistories and multimorbidities associated with the ultimate development of one cancer.

It is clinically apparent that there is substantial variation in precancer medical histories among patients ultimately diagnosed with the same cancer type. One example is head and neck cancer, where evidence shows that it can arise from two distinct etiologies (one human papillomavirus–related and another alcohol and smoking-related), which should be treated as two separate entities (2). Another example is chronic infectious diseases associated with increased risk of lymphoma, which could be due to sources such as chronic inflammation or immunodeficiency caused by, for example, immunosuppressive therapy.

Healthcare data are increasingly being applied in biomedical research to investigate patient stratification, comorbidities, and clinical outcomes (3). A patient can traverse multiple diseases during a lifetime driven by risk factors such as lifestyle, genetics, or treatment-provoked disease associations (4). Some studies have already performed disease trajectory analysis on healthcare data, for example, studies that either use population-wide registry data (6–7 million patients) from a single country (5) or small sets of electronic patient records (6). Diverse cancer routes converging toward sepsis have also been analyzed previously using such a longitudinal approach (7). However, no study has systematically mapped cancer disease associations and trajectory trends using large cohorts without selection bias (8) followed over several decades. Here we use a population-wide approach to delineate all precancer diagnosis pairs and diagnosis trajectories using the Danish Cancer Registry (9) and the Danish National Patient Registry (NPR; ref. 10) in the period 1994–2015. As the registry covers a nation-wide healthcare system, selection biases such as selective inclusion of specific patient groups/hospitals, income levels, age groups, and genders are eliminated (8).

When using diagnosis trajectories to infer causal relationships between a cancer and its disease cooccurrences, a key challenge is to distinguish diagnoses that drive cancer formation from those that do not. Diagnoses that arise just before the cancer diagnosis may derive from the cancer (e.g., pneumonia preceding lung cancer), whereas those emerging after the cancer diagnosis may be the result of complications of treatment. Hence, we designed this study to acquire the full temporal map of all diagnoses occurring in the years before the first cancer diagnosis. Using temporality as a tool, we assume that diagnoses occurring long before the cancer is diagnosed are more likely to be causally related to the development of the cancer than those appearing closer to the diagnostic timepoint, even though temporality in itself does not constitute proof.

The complex map of specific diagnoses upstream of cancers has been made available as an interactive web catalog.

Study design

The objective of this retrospective cohort study was to determine and characterize disease associations and trajectories prior to cancer diagnoses using population-wide electronic health registry data. The trajectories were derived by joining temporal diagnosis correlations using a previously published statistical approach (5).

The data used in the study is from the Danish NPR and the Danish Cancer Registry. Registry data comprise administrative data and primary and secondary diagnoses covering all hospital encounters in Denmark. All diagnoses are classified using the International Classification of Diseases (ICD-10) classification system, which has a hierarchical structure. ICD-10 codes can thus be rounded to a higher-level diagnosis code, block, or chapter. Here, we used level 3 codes (e.g., C50 for breast cancer).
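
The rounding of ICD-10 codes to level 3 can be illustrated with a minimal sketch. The function name `to_level3` and the accepted input spellings (with or without a dot, any letter case) are assumptions made for illustration; they do not describe how the registries store codes.

```python
def to_level3(icd10_code: str) -> str:
    """Round an ICD-10 code to its three-character category, e.g. 'C50.9' -> 'C50'.

    Hypothetical helper: assumes the code starts with the category letter
    followed by two digits, optionally followed by a dot and subcategory digits.
    """
    cleaned = icd10_code.strip().upper().replace(".", "")
    if len(cleaned) < 3:
        raise ValueError(f"not a valid ICD-10 code: {icd10_code!r}")
    return cleaned[:3]


if __name__ == "__main__":
    for raw in ["C50.9", "c349", "J44"]:
        print(raw, "->", to_level3(raw))
```

Working at level 3 merges subcategories of the same disease, which increases the number of patients per code at the cost of clinical detail.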

Diagnoses occurring after the first cancer diagnosis were left out of the analysis to reduce confounding from posttreatment-related effects, and only the first cancer diagnosis was used for each patient. Correspondingly, we only included the first occurrence of each noncancer diagnosis prior to the cancer diagnosis. In this way, we remove many of the posttreatment effects and achieve balance between cancer and noncancer diagnoses.
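
The filtering rule for a single patient could look roughly like the sketch below. The record layout (a list of `(code, date)` tuples), the function name, and the choice to drop noncancer diagnoses recorded on the same day as the cancer are all illustrative assumptions, not details taken from the study.

```python
from datetime import date


def precancer_first_diagnoses(records, cancer_codes):
    """Keep the first cancer diagnosis and the first occurrence of each
    noncancer diagnosis that precedes it.

    records: iterable of (icd10_level3_code, date) for one patient (assumed layout).
    cancer_codes: set of codes treated as cancer diagnoses.
    Returns (first_cancer or None, {noncancer_code: first_date}).
    """
    ordered = sorted(records, key=lambda r: r[1])
    first_cancer = next(((c, d) for c, d in ordered if c in cancer_codes), None)
    first_seen = {}
    for code, when in ordered:
        if code in cancer_codes:
            continue  # later cancer diagnoses are ignored
        if first_cancer is not None and when >= first_cancer[1]:
            continue  # assumption: same-day and later diagnoses are dropped
        first_seen.setdefault(code, when)  # keeps the earliest date only
    return first_cancer, first_seen


if __name__ == "__main__":
    history = [
        ("J44", date(2001, 3, 1)),
        ("C34", date(2005, 6, 1)),
        ("J44", date(2003, 1, 1)),
        ("I20", date(2006, 1, 1)),
    ]
    print(precancer_first_diagnoses(history, {"C34"}))
```

For patients without any cancer diagnosis, the sketch simply returns the first occurrence of every diagnosis.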

Patient cohorts

NPR covers all encounters with Danish hospitals since 1977, including public and private hospital visits. In 1995, data were added for patients from emergency rooms and psychiatric wards (outpatients). The Cancer Registry contains information on cancer incidences for the Danish population since 1943. Reporting has been mandatory since 1987 and notifications have been electronic since 2004. In addition, the registry includes information on tumor characteristics such as tumor staging, topography and morphology, and treatment strategies. Our extracts from NPR and the Cancer Registry cover the period 1994–2015 and include 6.9 million patients and 0.8 million patients (0.7 million malignant cases), respectively. We did not include data from before 1994 from the two registries, because these were coded in the older version ICD-8. Patient-specific information could be linked between the Cancer Registry and NPR by using the nationwide Danish identification numbers. All patient identification numbers were deidentified during analysis. The median age and sex frequency for each cancer type are listed in Table 1.

Table 1.

Characteristics and statistics for cancer types directly associated with a diagnosis

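| Cancer type | ICD-10 | Number of incidences in cohort | Median age at diagnosis | % Male | % Female | N patients (directional D→C pairs) | N pairs (directional D→C pairs) |
|---|---|---|---|---|---|---|---|
| Skin | C44 | 158,970 | 68 | 48 | 52 | 52,252 | 89 |
| Lung | C34 | 70,875 | 69 | 55 | 45 | 27,931 | 288 |
| Prostate | C61 | 57,840 | 72 | 100 | | 23,177 | 73 |
| Breast | C50 | 88,609 | 63 | | 99 | 8,189 | 31 |
| Stomach | C16 | 10,389 | 70 | 64 | 36 | 2,994 | 69 |
| Corpus uteri | C54 | 12,281 | 67 | | 100 | 1,869 | |
| Ovary | C56 | 10,868 | 65 | | 100 | 1,325 | 20 |
| Diffuse NHL | C83 | 3,372 | 67 | 58 | 42 | 998 | 32 |
| Other NHL | C85 | 3,140 | 67 | 54 | 46 | 949 | 21 |
| Thyroid | C73 | 3,780 | 51 | 27 | 73 | 638 | |
| Cervix uteri | C53 | 8,482 | 49 | | 100 | 273 | |
| Follicular NHL | C82 | 1,344 | 62 | 49 | 51 | 100 | |
| Tonsil | C09 | 2,503 | 59 | 71 | 27 | 90 | |
| Oropharynx | C10 | 1,176 | 61 | 71 | 29 | 89 | |
| Anus and anal canal | C21 | 1,876 | 63 | 30 | 70 | 44 | |
| Vulva | C51 | 1,829 | 73 | | 100 | 23 | |
| Tongue | C01 | 593 | 60 | 76 | 24 | 16 | |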

Abbreviations: C, cancer diagnosis; D, noncancer diagnosis.

To include the most reliable cancer hospital encounters, we excluded cancer diagnoses in the NPR not present in the curated Cancer Registry, because these can be false-positive cases. Furthermore, we included cancer cases from the Cancer Registry not registered in NPR for patients already in NPR.

Diagnosis pair correlations and construction of disease trajectories

We applied a previously published method (5) to calculate significant directional diagnosis pairs and join these into disease trajectories. The strength of correlation between a pair of diagnoses was estimated using RR scores (Eq. A). For a given pair of diagnoses, D1 followed by D2, an exposed group of patients was created by identifying all discharges with D1 assigned. Each exposed group was compared with N = 10,000 randomly sampled comparison groups matched by age, sex, type of hospital encounter (inpatient, outpatient, and emergency room), and calendar time (discharge week). Matching by discharge week was done to eliminate seasonal fluctuations in, for example, diagnosis patterns that could have confounding effects on the findings. Subsequently, the occurrence of D2 within a timeframe of 15 years from the first D1 discharge was counted for both the exposed and comparison groups. This count was denoted $C_{\text{exposed}}$ for the exposed group and $C_i$ for the $i$th comparison group. As the population size is $n_{\text{exposed}}$ for both the exposed and all comparison groups, the RR is given by:

$$\mathrm{RR} = \frac{C_{\text{exposed}}}{\frac{1}{N}\sum_{i=1}^{N} C_i} \qquad \text{(A)}$$

P values for the RR were obtained using a binomial distribution, where $C_{\text{exposed}}$ is compared with the average probability of sampling a control patient with the second disease within the timeframe (5). When testing hundreds of thousands of correlations in a batch manner, it is very difficult to ensure that the assumptions for advanced hierarchical models are met and that they will converge. The strength of the method here is that it does not rely on assumptions based on Poisson distributions, nontime-dependent proportional hazards, or similar concepts.
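
As a rough illustration of the calculation described above, the sketch below computes the RR as the exposed count divided by the mean count over the comparison groups (all groups having the same size) and a binomial P value against the average control probability. The function names are hypothetical, and the use of a one-sided upper tail is an assumption; the published method (5) should be consulted for the exact test.

```python
import math


def relative_risk(c_exposed, comparison_counts):
    """RR = exposed count / mean comparison count (equal group sizes assumed)."""
    mean_c = sum(comparison_counts) / len(comparison_counts)
    if mean_c == 0:
        raise ValueError("RR is undefined when no comparison patient has D2")
    return c_exposed / mean_c


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for x in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                   + x * math.log(p) + (n - x) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def rr_p_value(c_exposed, n_exposed, comparison_counts):
    """One-sided P value (assumption): the chance of seeing at least c_exposed
    cases among n_exposed patients at the average control probability."""
    p_control = sum(comparison_counts) / (len(comparison_counts) * n_exposed)
    return binomial_upper_tail(c_exposed, n_exposed, p_control)


if __name__ == "__main__":
    counts = [10, 20, 15, 15]  # toy data; the study samples N = 10,000 groups
    print(relative_risk(30, counts))
    print(rr_p_value(30, 1000, counts))
```

The toy example uses four comparison groups only to keep the arithmetic easy to follow.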

Next, the method tests for significant directionality (D1 → D2) between each significantly correlated diagnosis pair. This is done with a binomial test comparing the number of times the first diagnosis precedes the second to a probability of 50%. Finally, longitudinal disease trajectories were generated by joining significant directional diagnosis pairs into trajectories of various lengths. Only trajectories followed by at least 100 patients were included.
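
The two steps in this paragraph can be sketched as follows. This is an illustrative simplification, not the published implementation: the function names, the one-sided form of the test, and the representation of each patient as a mapping from diagnosis code to the time of first occurrence are all assumptions.

```python
import math


def directionality_p_value(n_forward, n_backward):
    """One-sided binomial test against 50%: P(X >= n_forward) for
    X ~ Binomial(n_forward + n_backward, 0.5)."""
    n = n_forward + n_backward
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, x) for x in range(n_forward, n + 1))
    return tail / 2 ** n


def follows(timeline, trajectory):
    """True if the patient has every diagnosis in the trajectory,
    in strictly increasing time order."""
    try:
        times = [timeline[code] for code in trajectory]
    except KeyError:
        return False
    return all(a < b for a, b in zip(times, times[1:]))


def extend_trajectories(trajectories, directional_pairs, timelines, min_patients=100):
    """Extend each trajectory by one step using the significant directional
    pairs, keeping extensions followed by at least min_patients patients."""
    extended = []
    for traj in trajectories:
        for first, second in directional_pairs:
            if first != traj[-1] or second in traj:
                continue
            candidate = traj + (second,)
            n = sum(follows(t, candidate) for t in timelines.values())
            if n >= min_patients:
                extended.append((candidate, n))
    return extended


if __name__ == "__main__":
    # Toy timelines: patient id -> {code: year of first occurrence}
    timelines = {i: {"I20": 1, "I25": 2, "C34": 3} for i in range(3)}
    pairs = [("I20", "I25"), ("I25", "C34")]
    print(directionality_p_value(10, 0))
    print(extend_trajectories([("I20", "I25")], pairs, timelines, min_patients=3))
```

Calling `extend_trajectories` repeatedly on its own output would produce trajectories of length three, four, and so on, until no extension reaches the patient threshold.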

To address the issue of multiple testing, Bonferroni correction was applied to both the P values associated with RR and those associated with directionality (D1 → D2). The P values for directionality were corrected only across those diagnosis pairs that were significant with an RR above the cut-off value of 1.
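
A minimal sketch of this two-stage correction is given below. It assumes that "passing" the first stage means a Bonferroni-corrected RR P value at or below 0.05 together with RR > 1; the function names and the dictionary-based inputs are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Multiply each P value by the number of tests (capped at 1) and
    return the adjusted values plus the keys that remain significant."""
    m = len(p_values)
    adjusted = {key: min(1.0, p * m) for key, p in p_values.items()}
    significant = {key for key, p in adjusted.items() if p <= alpha}
    return adjusted, significant


def two_stage_selection(rr_p, rr, direction_p, alpha=0.05):
    """Stage 1: correct RR P values across all tested pairs and keep pairs
    with RR > 1. Stage 2: correct directionality P values across the
    surviving pairs only."""
    _, rr_significant = bonferroni(rr_p, alpha)
    passed = {pair for pair in rr_significant if rr[pair] > 1.0}
    _, directional = bonferroni({pair: direction_p[pair] for pair in passed}, alpha)
    return directional


if __name__ == "__main__":
    rr_p = {("A", "B"): 1e-6, ("A", "C"): 0.2, ("B", "C"): 1e-4}
    rr = {("A", "B"): 2.0, ("A", "C"): 3.0, ("B", "C"): 0.5}
    direction_p = {("A", "B"): 0.01, ("A", "C"): 0.001, ("B", "C"): 0.001}
    print(two_stage_selection(rr_p, rr, direction_p))
```

Restricting the second correction to pairs that survive the first stage keeps the number of directionality tests, and therefore the penalty, smaller.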

Temporal analysis of precancer diagnoses

We selected all significant directional diagnosis pairs with an RR > 1 that consist of a noncancer diagnosis leading to a cancer diagnosis (D1 → C1). For each diagnosis pair, the differential time (in years) between D1 and C1 was calculated for each patient. The average differential time across patients with the specific diagnosis pair was then used as an estimate of how long before C1 the diagnosis D1 occurred. In this analysis, we have excluded inverse relationships (RR < 1), although they are of significant interest in relation to preventive approaches or improved understanding of cancer etiology (11). There is now an interesting literature on inverse relationships, for example, psychiatric disease in relation to lower incidence levels of certain cancers (12, 13). However, inverse relationships are more sensitive to confounding factors, for example, different smoking or dietary patterns in psychiatric patients that can alter cancer incidences. In general, we do not have access to exposome data at the population-wide scale and therefore decided not to include inverse relationships, to avoid presenting spurious associations of this kind. Underdiagnosis is another problem that may lead to the spurious identification of inverse comorbidities. One example is cancer and Alzheimer disease, where a comparison with vascular dementia may be needed to establish that the lower risk of cancer in patients with Alzheimer disease is likely correct and not caused by general underdiagnosis of dementia in a particular cohort (14). It has also been reported that underdiagnosis, when analyzing chronic obstructive pulmonary disease (COPD) and its comorbidities, including lung cancer, may contribute to explaining the varied incidence and prevalence of common comorbidities between studies (15).
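
The differential-time estimate described at the start of this subsection amounts to a simple average, sketched below. The input layout (one `(D1 date, C1 date)` tuple per patient) and the conversion of days to years with 365.25 days per year are assumptions made for illustration.

```python
from datetime import date
from statistics import mean


def mean_years_before_cancer(patient_dates):
    """Average time in years between D1 and C1 across patients.

    patient_dates: iterable of (d1_date, c1_date), one tuple per patient
    with the diagnosis pair (assumed layout). Patients where D1 does not
    precede C1 are skipped. Returns None if no patient qualifies.
    """
    gaps = [(c1 - d1).days / 365.25 for d1, c1 in patient_dates if d1 < c1]
    return mean(gaps) if gaps else None


if __name__ == "__main__":
    pairs = [
        (date(2000, 1, 1), date(2002, 1, 1)),
        (date(2000, 1, 1), date(2004, 1, 1)),
    ]
    print(round(mean_years_before_cancer(pairs), 2))
```

Because the estimate is a mean, a few patients with very long intervals can pull it away from the typical interval.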

Construction of disease trajectories with aggregated diagnoses

Aggregated diagnoses were also created for both cancer and noncancer diagnoses and analyzed by chapter in the ICD-10 classification system. Cancer types that have previously been linked in the literature to one of three risk factors (infectious agents, hormonal factors, or smoking; Supplementary Table S1) were grouped accordingly. Noncancer diseases related to immune dysfunction were grouped, as were diseases that could relate to aging-related causes. The same approach was applied to find disease correlations and trajectories as for the nongrouped analysis.

Interactive figures

All results are freely available as a web catalog of interactive figures at http://cprlinxweb05.sund.ku.dk/.

Data and materials approval

The NPR and Cancer Registry are protected by the Danish Act on Processing of Personal Data and can only be accessed through application. We obtained approval through the Danish Data Protection Agency, Copenhagen (ref. no. 2015–54–0939) and the National Board of Health, Copenhagen (ref. no. FSEID-00001627). Informed consent and assessment of the proposal in scientific ethical committees are not required for registry-based research in Denmark.

Generating precancer diagnosis maps

Across all ICD-10 chapters, we carried out a population-wide statistical analysis to obtain directional disease cooccurrence pairs of significance for patients with cancer. We obtained 10,675 significant cancer-associated diagnosis pairs (Bonferroni-corrected P ≤ 0.05) with a RR higher than 1.0 (where the second diagnosis occurred within a 15-year time frame of the first disease; Supplementary Fig. S1). Of these, 3,526 pairs had a significant direction (Bonferroni-corrected P ≤ 0.05; Supplementary Table S2). To analyze single diseases directly associated with cancer, we selected the 648 pairs that comprised one cancer diagnosis. Overall, these pairs span 17 cancer types and the number of diagnosis pairs varies from one to 288 pairs between cancer types (Table 1). The cancer types with the most diagnosis pairs were lung (288 pairs), skin (89 pairs), and stomach (69 pairs).

Figure 1 presents a comprehensive precancer map of diagnoses for the eight most prevalent cancer types based on data spanning 15 years prior to a cancer diagnosis (for remaining cancer types and specific diagnoses, see Supplementary Table S3 and Supplementary Fig. S2, respectively). The results reflect current knowledge for recognized disease associations including benign mammary dysplasia prior to breast cancer (5.3 years; RR, 2.1), hyperplasia of prostate prior to prostate cancer (2.0 years; RR, 2.0), and COPD prior to lung cancer (1.5 years; RR, 2.2; for all RR scores see Supplementary Table S2 and Supplementary Fig. S3). These cases serve as positive controls to demonstrate that the method can reliably detect known disease associations. Figure 1 furthermore shows that the precancer disease maps vary both in the accumulation of diagnoses over time and in the frequency of ICD-10 chapters. For example, breast cancer–related disease pairs occurred further from the cancer diagnosis than those developing prior to lung cancer. Also, diseases related to lung and stomach cancer span substantially more ICD-10 chapters than those of hormone-related cancers (breast and ovarian cancer).

Figure 1.

A stacked density plot of precancer disease occurrences across 10 years. The density of noncancer diagnosis occurrences before the cancer types with at least 20 significant directional diagnosis pairs (breast, prostate, ovary, lung, skin, stomach, diffuse NHL [diff. NHL], and other NHL) is shown at the ICD-10 chapter level, represented by the colors. The timeline spans 15 years before first cancer diagnosis, but we only detected diagnoses within 10 years. The time at each diagnosis is averaged across all patients with cancer with that specific disease pair. Only disease pairs with a significant directionality are included. Significant disease pairs without directionality can be found in Supplementary Fig. S1.

Recurrent diagnoses across cancer types

The most frequently shared diagnosis is excessive, frequent, and irregular menstruation from chapter XIV (Diseases of the Genitourinary System), which was observed across 11 cancer types (Fig. 2 and web catalog) with RR scores ranging from 4 to 8. In the same chapter, both male infertility (prostate and skin) and female infertility (breast, lung, and skin) were observed with extremely high RR scores, ranging from 12 to 32. Other common diagnoses from this chapter are hyperplasia of prostate (six cancer types), female genital prolapse (five cancer types), dysplasia of cervix uteri (five cancer types), and hydrocele and spermatocele (five cancer types).

Figure 2.

Overlap of cancer-associated diagnoses across 17 cancer types. The occurrence of overlapping cancer-associated diagnoses across cancer types is shown, where each needle represents a significant cancer-associated diagnosis. Most of the overlapping diagnoses are in chapter IX (Diseases of the Circulatory System) and chapter XIII (Diseases of Musculoskeletal System and Connective Tissue). Top overlapping diagnoses (n ≥ 5) are labeled with their corresponding diagnosis name. The colors represent the ICD-10 chapters (see Fig. 1 for color legend).

Interestingly, one of the most pervasive disease groups is chapter IX (Diseases of the Circulatory System) involving diagnoses such as angina pectoris (eight cancer types), chronic ischemic heart disease (five cancer types), and paroxysmal tachycardia (four cancer types). Lung cancer has up to 30 different disorders in this chapter with most accumulating at 1–2 years before the cancer diagnosis, followed by stomach cancer with 11 different disorders.

Diagnoses from chapter XI (Diseases of the Digestive System), such as hernia, cholelithiasis, and irritable bowel syndrome, were also prevalent among cancer types. A more cancer type–specific chapter was chapter IV (Endocrine, Nutritional and Metabolic Diseases), observed in diffuse non–Hodgkin lymphoma (NHL), lung, and stomach cancer, which includes the diagnoses hyperthyroidism, type 2 diabetes mellitus, and disorders of lipoprotein metabolism and other lipidemias.

Analyzing diseases by their temporal proximities to cancer

We searched specifically for diagnoses for which the pathophysiologic cause could be made clearer in light of their temporal proximity to cancer. Diagnoses in chapter I (Certain Infectious and Parasitic Diseases) reside both far and close in time to the cancer diagnosis in several cancer types. Examples of infectious “close-up” diagnoses (1–2 years before) are infectious diarrhea and gastroenteritis and viral infections of central nervous system. Infectious diagnoses more distant (3–8 years before) are spirochetal infections or sexually transmitted diseases. The years prior to the cancer diagnoses lack statistically significant diagnoses (see Supplementary Fig. S4 for nondirectional pairs). Other chapters such as chapter IV (Endocrine, Nutritional and Metabolic Diseases) have the diagnoses obesity, noninsulin-dependent diabetes, and disorders of lipoprotein metabolism in closer proximity (1–2 years) to the cancer types lung, stomach, and diffuse NHL.

Interestingly, chapter XIV (Diseases of the Genitourinary System), one of the chapters with highest prevalence among cancer types, had several diagnoses residing distant from the cancer, such as irregular menstruation (9 years), infertility (6–7 years), and endometriosis (6–7 years). On the other hand, diagnoses from the same chapter, but related to tumor symptoms such as benign mammary dysplasia or hyperplasia of prostate, were found to occur much closer to the time of cancer diagnosis.

Angina pectoris and chronic ischemic heart disease appear closer to the cancer diagnosis than varicose veins of lower extremities and hemorrhoids. From clinical evidence, these groups of heart disorders are generally treated very differently due to differing causes (16, 17), with the latter two being linked to obesity (18, 19). Additional obesity-linked diagnoses were identified for chapter XI (Diseases of the Digestive System), with hernia (six cancer types; ref. 20) and cholelithiasis (five cancer types; ref. 21) being the most frequently shared among cancer types.

Longitudinal disease trajectories

We generated longitudinal cancer trajectories of variable lengths with noncancer diagnoses occurring sequentially before the cancer diagnosis (Fig. 3). The trajectories were built by joining the directional diagnosis pairs and can therefore model the longitudinal order of multiple diagnoses. Each trajectory illustrates the flow of a patient group that traverses the entire path of diagnoses, where the width represents the relative size of the patient group. We obtained 162 significant trajectories of length three (three directionally joined diagnoses) for seven cancer types (lung, prostate, skin, stomach, breast, diffuse NHL, and other NHL) and six trajectories of length four for one cancer type (lung cancer). We plotted significant length three trajectories for six of the cancer types: breast, prostate, skin, stomach, diffuse NHL, and other NHL. For illustrative purposes, we plotted length four trajectories for lung cancer (length three lung cancer trajectories can be found in Supplementary Table S4).

Figure 3.

Precancer disease trajectories. Significant disease trajectories for cancer types: lung, breast, prostate, skin, stomach, and NHL (top left to bottom right). The trajectory sequence (from left to right) represents the temporal order in which the diseases were diagnosed, and the width of the trajectories represents the relative number of patients that follow the entire trajectory. We included only trajectories with at least 100 patients. Patients can participate in multiple trajectories and the number represents the sum of all patient cases for all trajectories within a cancer type. The colors of the nodes in the trajectories represent the ICD-10 chapters (see Fig. 1 for color legend).

Figure 3 shows that the trajectories fall into different pathophysiologic classes, with four classes of trajectories dominating overall. We found that several cardiovascular disease (CVD) diagnoses from chapter IX form one class of trajectories. This could, for instance, involve angina pectoris leading to chronic ischemic heart disease or tachycardia (lung, skin, prostate, and stomach). We furthermore observed acute myocardial infarction leading to chronic ischemic heart disease (lung, prostate, stomach, and diffuse and other NHL). A second class of trajectories is related to dysfunctions in the digestive system involving diagnoses such as gastritis and duodenitis, gastric ulcer, cholelithiasis, irritable bowel syndrome, and dyspepsia (lung, skin, and prostate). A third class, from chapter XIV, has irregularities in menstruation followed by endometriosis, or lump in breast followed by benign mammary dysplasia (skin and breast cancer). This class of trajectory relates to altered hormonal balances. Finally, the fourth class seems to involve diagnoses that are associated with symptoms of aging, even though age-related associations have been corrected for by our approach (Materials and Methods). For instance, this class could comprise internal derangement of knee followed by gonarthrosis, or senile cataract followed by other disorders of lens.

To investigate chronic immunocompromised diseases that might have had subclinical manifestations years before the onset of the first cancer diagnosis, we generated diagnosis trajectories using an a priori hypothesis approach. We aggregated diagnoses into grouped immune-related diagnoses (diseases related to immune system defects and infections), grouped hormone cancers, grouped infectious cancers, and grouped smoking-related cancers (Supplementary Table S1). Even though few immune-defect trajectories were observed, we did identify trajectories that did not show up in the single-diagnosis analysis. For example, the grouped smoking-related cancers comprise two types of trajectories, one involving immune-defect diagnoses and the other CVDs. The patients following the CVD-related trajectory showed better survival rates than patients following the immune-defect trajectory (Supplementary Fig. S5).

We conducted a systematic, retrospective population-wide analysis to map out all precancer disease histories, including disease pairs and trajectories. The method is based on traditional case–control analysis but compared with traditional epidemiologic studies, it can examine the global disease spectrum in a longitudinal manner beyond a limited set of hypotheses or diseases. The explorative method thus allows for the discovery of unexpected disease correlations. Temporality was used as a means to form hypotheses on cancer etiologies and suggest whether a diagnosis is more likely to be cancer-causing or indicative of reverse causation. The latter relies on the tenet that if a diagnosis resides temporally close to the cancer, it is more likely to reflect complications to an already established, but not-yet-diagnosed cancer.

From the Danish registry data spanning more than 20 years, we could uncover 3,526 significant directional precancer disease associations, of which 648 single diseases were directly associated with one (or more) of 17 cancer types. By joining the directional disease pairs, we furthermore obtained 168 disease trajectories (162 of length three and six of length four) covering seven cancer types. Overall, the data suggest that frequently observed diagnoses and trajectories across cancer types are related to obesity, CVDs, or genitourinary disorders. Evidence in the literature shows that inflammation has a key tumor-activating role in many types of disorders: dysbiosis (22), obesity (23), atherosclerosis (24, 25), or other inflammation-driven disorders (26, 27).

We identified a high prevalence of CVDs (1–5 years) prior to most cancer diagnoses. A recent study supports this finding by showing a higher prevalence of CVDs across different cancer types but for a selected cohort (patients with cancer often requiring cardiotoxic treatment) and without considering the longitudinal aspect (28). CVDs and cancer are known to share many risk factors (such as diet, obesity, and tobacco; ref. 29) and common pathways (30). We also observed obesity-related diagnoses, including hernia, diverticular disease, cholelithiasis, hemorrhoids, and varicose veins, in several cancer types that can be validated from previous studies (18, 20, 21, 31, 32). The actual obesity diagnosis is shared among two cancer types and type 2 diabetes mellitus is shared among three cancer types. These findings provide additional evidence in support of the contention that low-grade, systemic inflammation, linked to metabolic syndrome and obesity (33), could elevate the risk of certain cancer types. In the United States, approximately 20% of cancers are estimated to be related to weight gain and obesity (34).

We observe that genitourinary disorders associate with multiple types of cancers (not only hormone-regulated), manifest distant from the cancer diagnoses, and have high RR scores. These observations suggest that an inflammatory state could elevate cancer risk. Endometriosis has been linked to abnormally high estrogen levels. Sparse evidence has associated it with higher risk of hormone-regulated cancer types (35, 36) without identifying an overall increase of cancer risk (37). In our study, we found that endometriosis is associated with breast, ovarian, skin, and lung cancer, with the latter two associations not previously reported. Recent findings show that endometriosis not only plays a role in hormone pathways but also in immune-mediated pathways (38–41) and that anti-inflammatory treatment can inhibit endometriosis in mice (42, 43). We furthermore observed irregular menstruation and female/male infertility with extremely high RR scores. Irregular menstruation has previously been associated with an altered hormonal environment and polycystic ovarian syndrome (44), but only limited evidence shows that it can elevate cancer risk (45). Our findings rank irregular menstruation as the top shared diagnosis, occurring in 11 cancer types.

A better understanding of the underlying pathophysiology linking diseases in various organs is important because it provides a biological basis for predicting disease onset. Patients identified as being at excessive risk could be candidates for more intense screening, modification of behavioral patterns (e.g., smoking cessation), and medical interventions that reduce the impact of the pathophysiologic trajectory they follow. One example is IL1β inhibition for patients with lung cancer, which can benefit from more targeted use because these interventions are expensive and carry a risk of fatal adverse consequences. Estimating the likelihood that a patient is on a pathophysiologic trajectory can increase the number of patients benefitting from treatment. Classifying the risk phenotypes found here, such as CVDs, obesity, or genitourinary disorders, allows for the investigation of potential disease biomarkers that can be used to support early decision-making in the clinic. Moreover, exploring causes and mediators of inflammation can help identify novel cancer drugs through drug-repurposing strategies. Epidemiologic studies have linked the long-term use of nonsteroidal anti-inflammatory drugs, such as aspirin, to a lower incidence of certain cancer types due to their anti-inflammatory effect (46). A recent randomized study showed that by inhibiting IL1β, a key mediator of inflammation, the incidence of lung cancer could be reduced (24).

A limitation of the study is that the analyses are based on subjectively classified diagnosis codes and rely on the ICD-10 classification system. The strength of association thus depends on the accuracy of the given diagnoses, which can be influenced by the prevalence of the diagnoses and the accuracy of the ICD-10 classification system. However, the accuracy of the diagnoses has been estimated quantitatively and found to be quite high, as the registry describes a single-payer health care system with limited billing bias (47). Any misclassification in our approach should reduce the power to detect significant associations (false-negatives), thus biasing the results toward the null. Also, the registry does not contain diagnoses from general practitioners, and correlations with these are therefore not included (although many of them are repeated in the hospital setting). We furthermore emphasize that the study relies on a discovery-based methodology to explore global disease patterns and may pinpoint overlooked disease associations, which need to be supported by follow-up studies to confirm causality.
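
The effect of misclassification on the strength of association can be illustrated with a minimal sketch. It assumes that the prior diagnosis is recorded with imperfect sensitivity and specificity, independently of whether the patient later develops cancer (non-differential misclassification). The function name and all numbers (prevalence, risks, sensitivity, specificity) are hypothetical and are not taken from the registry or from our analysis.

```python
def observed_relative_risk(p_exposed, risk_exposed, risk_unexposed,
                           sensitivity, specificity):
    """Expected RR when the prior diagnosis is recorded with error.

    Assumes the recording error does not depend on the later cancer
    outcome (non-differential misclassification). Illustrative only.
    """
    # Population fractions by true status and recorded status
    true_pos = sensitivity * p_exposed                # diagnosed, recorded
    false_pos = (1 - specificity) * (1 - p_exposed)   # not diagnosed, recorded
    false_neg = (1 - sensitivity) * p_exposed         # diagnosed, not recorded
    true_neg = specificity * (1 - p_exposed)          # not diagnosed, not recorded

    # Cancer risk in each recorded group is a mixture of the true risks
    risk_recorded_exposed = (
        true_pos * risk_exposed + false_pos * risk_unexposed
    ) / (true_pos + false_pos)
    risk_recorded_unexposed = (
        false_neg * risk_exposed + true_neg * risk_unexposed
    ) / (false_neg + true_neg)
    return risk_recorded_exposed / risk_recorded_unexposed


# Hypothetical inputs: 5% diagnosis prevalence, true RR of 2
true_rr = observed_relative_risk(0.05, 0.04, 0.02, 1.0, 1.0)
noisy_rr = observed_relative_risk(0.05, 0.04, 0.02, 0.8, 0.98)
print(f"RR with perfect recording: {true_rr:.2f}")
print(f"RR with imperfect recording: {noisy_rr:.2f}")
```

Under these assumptions, the RR computed from imperfect records lies between 1 and the true RR, so a real association appears weaker than it is.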

This is the first study to systematically map all precancer medical histories in a population-wide, structured cancer cohort. We suggest that a broad span of diagnoses related to obesity, CVDs, or genitourinary disorders (endometriosis and irregular menstruation) could converge on a common cancer risk factor, namely low-grade inflammation. Our results are made freely available and can be explored for future research or clinical purposes.

S. Brunak has ownership interest (including stock, patents, etc.) in Intomics A/S, Hoba Therapeutics Aps, Novo Nordisk A/S, and Lundbeck A/S and is a consultant/advisory board member for Proscion A/S. No potential conflicts of interest were disclosed by the other authors.

Conception and design: J.X. Hu, M. Helleberg, A.B. Jensen, S. Brunak, J. Lundgren

Development of methodology: J.X. Hu, A.B. Jensen, S. Brunak

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J.X. Hu, M. Helleberg, A.B. Jensen, S. Brunak

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J.X. Hu, M. Helleberg, S. Brunak, J. Lundgren

Writing, review, and/or revision of the manuscript: J.X. Hu, M. Helleberg, A.B. Jensen, S. Brunak, J. Lundgren

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M. Helleberg, S. Brunak

Study supervision: M. Helleberg, S. Brunak, J. Lundgren

We would like to acknowledge the Novo Nordisk Foundation (grant agreement NNF14CC0001), as well as the Innovation Fund Denmark (grant agreement 5153-00002B) and Danish National Research Foundation (grant no. 126) for funding the research.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Grulich AE, van Leeuwen MT, Falster MO, Vajdic CM. Incidence of cancers in people with HIV/AIDS compared with immunosuppressed transplant recipients: a meta-analysis. Lancet 2007;370:59–6.
2. Field N, Lechner M. Exploring the implications of HPV infection for head and neck cancer. Sex Transm Infect 2015;91:229–30.
3. Jensen PB, Jensen LJ, Brunak S. Mining electronic health records: towards better research applications and clinical care. Nat Rev Genet 2012;13:395–405.
4. Hu JX, Thomas CE, Brunak S. Network biology concepts in complex disease comorbidities. Nat Rev Genet 2016;17:615–29.
5. Jensen AB, Moseley PL, Oprea TI, Ellesøe SG, Eriksson R, Schmock H, et al. Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients. Nat Commun 2014;5:4022.
6. Jensen K, Soguero-Ruiz C, Mikalsen KO, Lindsetmo R-O, Kouskoumvekaki I, Girolami M, et al. Analysis of free text in electronic health records for identification of cancer patient trajectories. Sci Rep 2017;7:46226.
7. Beck MK, Jensen AB, Nielsen AB, Perner A, Moseley PL, Brunak S. Diagnosis trajectories of prior multi-morbidity predict sepsis mortality. Sci Rep 2016;6:36624.
8. Schmidt M, Schmidt SAJ, Sandegaard JL, Ehrenstein V, Pedersen L, Sørensen HT. The Danish National patient registry: a review of content, data quality, and research potential. Clin Epidemiol 2015;7:449–90.
9. Gjerstorff ML. The Danish Cancer Registry. Scand J Public Health 2011;39:42–5.
10. Lynge E, Sandegaard JL, Rebolj M. The Danish National Patient Register. Scand J Public Health 2011;39:30–3.
11. Tabarés-Seisdedos R, Baudot A. Editorial: direct and inverse comorbidities between complex disorders. Front Physiol 2016;7:117.
12. Sánchez-Valle J, Tejero H, Ibáñez K, Portero JL, Krallinger M, Al-Shahrour F, et al. A molecular hypothesis to explain direct and inverse co-morbidities between Alzheimer's disease, glioblastoma and lung cancer. Sci Rep 2017;7:4474.
13. Catalá-López F, Hutton B, Driver JA, Page MJ, Ridao M, Valderas JM, et al. Cancer and central nervous system disorders: protocol for an umbrella review of systematic reviews and updated meta-analyses of observational studies. Syst Rev 2017;6:69.
14. Driver JA. Inverse association between cancer and neurodegenerative disease: review of the epidemiologic and biological evidence. Biogerontology 2014;15:547–57.
15. Smith MC, Wrobel JP. Epidemiology and clinical impact of major comorbidities in patients with COPD. Int J Chron Obstruct Pulmon Dis 2014;9:871–888.
16. Mackay TFC. Epistasis and quantitative traits: using model organisms to study gene–gene interactions. Nat Rev Genet 2013;15:22–33.
17. Ohman EM. Chronic stable angina. N Engl J Med 2016;374:1167–76.
18. Davies HO, Popplewell M, Singhal R, Smith N, Bradbury AW. Obesity and lower limb venous disease – the epidemic of phlebesity. Phlebology 2017;32:227–33.
19. Negri E, Pagano R, Decarli A, La Vecchia C. Body weight and the prevalence of chronic diseases. J Epidemiol Community Health 1988;42:24–9.
20. Fischer JP, Basta MN, Mirzabeigi MN, Bauder AR, Fox JP, Drebin JA, et al. A risk model and cost analysis of incisional hernia after elective, abdominal surgery based upon 12,373 cases: the case for targeted prophylactic intervention. Ann Surg 2016;263:1010–17.
21. Shabanzadeh DM, Skaaby T, Sørensen LT, Eugen-Olsen J, Jørgensen T. Metabolic biomarkers and gallstone disease – a population-based study. Scand J Gastroenterol 2017;52:1270–77.
22. Elinav E, Nowarski R, Thaiss CA, Hu B, Jin C, Flavell RA. Inflammation-induced cancer: crosstalk between tumours, immune cells and microorganisms. Nat Rev Cancer 2013;13:759–71.
23. Deng T, Lyon CJ, Bergin S, Caligiuri MA, Hsueh WA. Obesity, inflammation, and cancer. Annu Rev Pathol Mech Dis 2016;11:421–49.
24. Ridker PM, MacFadyen JG, Thuren T, Everett BM, Libby P, Glynn RJ, et al. Effect of interleukin-1β inhibition with canakinumab on incident lung cancer in patients with atherosclerosis: exploratory results from a randomised, double-blind, placebo-controlled trial. Lancet 2017;6736:1–10.
25. Libby P. Inflammation and cardiovascular disease mechanisms. Am J Clin Nutr 2006;83:456S–460S.
26. Grivennikov SI, Greten FR, Karin M. Immunity, inflammation, and cancer. Cell 2010;140:883–99.
27. Coussens LM, Werb Z. Inflammation and cancer. Nature 2002;420:860–67.
28. Al-Kindi SG, Oliveira GH. Prevalence of preexisting cardiovascular disease in patients with different types of cancer: the unmet need for onco-cardiology. Mayo Clin Proc 2016;91:81–3.
29. Koene RJ, Prizment AE, Blaes A, Konety SH. Shared risk factors in cardiovascular disease and cancer. Circulation 2016;133:1104–114.
30. Masoudkabir F, Sarrafzadegan N, Gotay C, Ignaszewski AP, Krahn AD, Davis MK, et al. Cardiovascular disease and cancer: evidence for shared disease pathways and pharmacologic prevention. Atherosclerosis 2017;263:343–51.
31. Ruhl CE, Everhart JE. Risk factors for inguinal hernia among adults in the US population. Am J Epidemiol 2007;165:1154–161.
32. Portincasa P, Moschetta A, Palasciano G. Cholesterol gallstone disease. Lancet 2006;368:230–39.
33. Reilly SM, Saltiel AR. Adapting to obesity with adipose tissue inflammation. Nat Rev Endocrinol 2017;13:633–43.
34. Calle EE, Rodriguez C, Walker-Thurmond K, Thun MJ. Overweight, obesity, and mortality from cancer in a prospectively studied cohort of U.S. Adults. N Engl J Med 2003;348:1625–638.
35. Giudice LC, Kao LC. Endometriosis. Lancet 2004;364:1789–99.
36. Brinton LA, Gridley G, Persson I, Baron J, Bergqvist A. Cancer risk after a hospital discharge diagnosis of endometriosis. Am J Obstet Gynecol 1997;176:572–579.
37. Melin A, Sparén P, Bergqvist A. The risk of cancer and the role of parity among women with endometriosis. Hum Reprod 2007;22:3021–6.
38. Burns KA, Thomas SY, Hamilton KJ, Young SL, Cook DN, Kenneth S, et al. Early endometriosis in females is directed by immune-mediated estrogen receptor alpha and IL6 cross-talk. Endocrinology 2018;159:103–18.
39. Mu F, Harris R, Rich-Edwards JW, Hankinson SE, Rimm EB, Spiegelman D, et al. A prospective study of inflammatory markers and risk of endometriosis. Am J Epidemiol 2018;187:515–22.
40. Zhao Y, Gong P, Chen Y, Nwachukwu JC, Srinivasan S, Ko CJ, et al. Dual suppression of estrogenic and inflammatory activities for targeting of endometriosis. Sci Transl Med 2015;7:271ra9.
41. Tanaka Y, Mori T, Ito F, Koshiba A, Takaoka O, Kataoka H, et al. Exacerbation of endometriosis due to regulatory T cell dysfunction. J Clin Endocrinol Metab 2017;102:3206–217.
42. Schwager K, Bootz FO, Imesch P, Kaspar M, Trachsel E, Neri D. The antibody-mediated targeted delivery of interleukin-10 inhibits endometriosis in a syngeneic mouse model. Hum Reprod 2011;26:2344–352.
43. Quattrone F, Sánchez AM, Pannese M, Hemmerle T, Viganó P, Candiani M, et al. The targeted delivery of interleukin 4 inhibits development of endometriotic lesions in a mouse model. Reprod Sci 2015;22:1143–152.
44. Chittenden BG, Fullerton G, Maheshwari A, Bhattacharya S. Polycystic ovary syndrome and the risk of gynaecological cancer: a systematic review. Reprod Biomed Online 2009;19:398–405.
45. Cirillo PM, Wang ET, Cedars MI, Chen LM, Cohn BA. Irregular menses predicts ovarian cancer: prospective evidence from the Child Health and Development Studies. Int J Cancer 2016;139:1009–017.
46. Cuzick J, Otto F, Baron JA, Brown PH, Burn J, Greenwald P, et al. Aspirin and non-steroidal anti-inflammatory drugs for cancer prevention: an international consensus statement. Lancet Oncol 2009;10:501–07.
47. Thygesen SK, Christiansen CF, Christensen S, Lash TL, Sørensen HT. The predictive value of ICD-10 diagnostic coding used to assess Charlson comorbidity index conditions in the population-based Danish National Registry of Patients. BMC Med Res Methodol 2011;11:83.