Background: Certain medical conditions affect risk of non-Hodgkin lymphoma (NHL), but the full range of associations is unknown. We implemented a novel method (“medical condition-wide association study,” MedWAS) to comprehensively evaluate medical risk factors for NHL documented in administrative health claims.

Methods: Using Surveillance, Epidemiology, and End Results (SEER)-Medicare data, we conducted a case–control study comparing NHL cases [N = 52,691, age 66+ years, with five subtypes: chronic lymphocytic leukemia/small lymphocytic lymphoma, diffuse large B-cell lymphoma (DLBCL), follicular lymphoma, marginal zone lymphoma (MZL), T-cell lymphoma (TCL)] to controls (N = 200,000). We systematically screened for associations with 5,926 medical conditions documented in Medicare claims more than 1 year before selection.

Results: Fifty-five conditions were variously associated with NHL. Examples include well-established associations of human immunodeficiency virus, solid organ transplantation, and hepatitis C virus with increased DLBCL risk (ORs 3.83, 4.27, and 1.74, respectively), and autoimmune conditions with DLBCL and MZL (e.g., ORs of 2.10 and 4.74, respectively, for Sjögren syndrome). Risks for all NHL subtypes were increased after diagnoses of nonmelanoma skin cancer (ORs 1.19–1.55), actinic keratosis (1.12–1.25), or hemolytic anemia (1.64–4.07). Nine additional skin conditions increased only TCL risk (ORs 2.20–4.12). Diabetes mellitus was associated with increased DLBCL risk (OR 1.09). Associations varied significantly across NHL subtypes for 49 conditions (89%).

Conclusion: Using an exploratory method, we found numerous medical conditions associated with NHL risk, and many associations varied across NHL subtypes.

Impact: These results point to etiologic heterogeneity among NHL subtypes. MedWAS is a new method for assessing the etiology of cancer and other diseases. Cancer Epidemiol Biomarkers Prev; 25(7); 1105–13. ©2016 AACR.

Non-Hodgkin lymphoma (NHL) is a common malignancy, with 465,000 incident cases worldwide in 2013 (1). Incidence rises with age, and 57% of U.S. cases occur after the age of 65 years (2). Although considered a single entity for descriptive purposes, NHL comprises a group of heterogeneous subtypes with distinct clinical presentations and, as is increasingly recognized, differing causal pathways (i.e., etiologic heterogeneity; ref. 3). Common NHL subtypes include tumors derived from B cells such as diffuse large B-cell lymphoma (DLBCL), chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL), follicular lymphoma, and marginal zone lymphoma (MZL). T-cell lymphomas (TCL) are less common.

Among the strongest risk factors for NHL are medical conditions, including those associated with immune dysfunction and chronic infections. For example, immunosuppression due to human immunodeficiency virus (HIV) infection or solid organ transplantation greatly increases NHL risk (4, 5). Immunosuppression facilitates activation of Epstein–Barr virus infection, which contributes especially to DLBCL. Autoimmune diseases such as rheumatoid arthritis, systemic lupus erythematosus, and Sjögren syndrome increase risk for DLBCL and MZL (6). These medical conditions are thought to promote the development of NHL by causing long-term immune activation. Treatment of these conditions by immunosuppressive medications also likely contributes. In contrast, medical risk factors for CLL/SLL, follicular lymphoma, and TCL are not clearly established.

Large administrative databases provide a valuable resource for examining associations between medical risk factors and cancer. We have previously used the Surveillance, Epidemiology, and End Results (SEER)-Medicare database (described below) to conduct case–control studies among the U.S. elderly population, assessing associations between various medical conditions and cancers such as NHL, leukemias, and skin cancers (7–10). Strengths of SEER-Medicare include availability of data from cancer registries (which provide reliable case ascertainment and detailed information on cancer subtypes), its large size (i.e., 1.6 million cancer cases diagnosed in 1991–2009), and information on medical risk factors detailed in Medicare claims beginning at the age of 65 years (11). Until now, SEER-Medicare has been used to evaluate a limited number of medical risk factors for cancer (7–10, 12–15), selected on the basis of previously published findings or plausible biologic mechanisms.

A comprehensive assessment of medical conditions in association with cancer would be an attractive new approach to characterizing a wide spectrum of risk factors. Using claims data, for example, this could be done by separately evaluating associations with every medical condition specified by billing codes. We term this novel approach “MedWAS,” for “medical condition-wide association study,” given its use of a broadly agnostic assessment that is a feature of genome-wide association studies (GWAS). One might anticipate that a MedWAS assessment of all medical conditions could uncover previously unsuspected associations with cancer, which would then prompt an investigation into possible biologic mechanisms. A comprehensive assessment might also uncover a diversity of conditions associated with different cancer subtypes, providing evidence for etiologic heterogeneity.

In the current study, we implement this new MedWAS approach using SEER-Medicare data to assess potential risk factors for five subtypes of NHL. We demonstrate the utility of this method for characterizing the spectrum of medical risk factors for NHL, assessing etiologic heterogeneity among NHL subtypes, and identifying new medical conditions associated with NHL. Given the large multiplicity of testing, we emphasize the exploratory nature of this approach. We review possible etiologic and artifactual explanations for associations that we uncover.

Subject selection and ascertainment of medical conditions

SEER-Medicare (http://healthcaredelivery.cancer.gov/seermedicare/) links data from SEER cancer registries (covering 28% of the U.S. population in 2010) and Medicare (which provides medical insurance for the U.S. elderly; ref. 11). For this study, we selected cases and controls from the SEER-Medicare dataset as described previously (16). Specifically, we identified NHL cases in SEER that were indicated by SEER to be the person's first invasive cancer (except for a possible diagnosis of basal and squamous cell skin cancers, which are common nonmelanoma skin cancers not captured by SEER). We included the five most common NHL subtypes, defined according to the World Health Organization classification (17): CLL/SLL, DLBCL, follicular lymphoma, MZL, and TCLs considered as a group.

All Medicare beneficiaries are enrolled in part A, which covers hospital care, and most also subscribe to part B which covers physician and outpatient services. Health maintenance organizations (HMO) do not routinely bill Medicare for individual encounters. To ensure availability of Medicare claims prior to NHL diagnosis, we required cases that: (i) were ages 66–99 years at diagnosis; (ii) were diagnosed in 1992–2009; (iii) had a minimum of 13 months of part A and part B Medicare coverage before diagnosis, during which they were not enrolled in an HMO; and (iv) had at least one Medicare claim for a hospitalization (documented in the MEDPAR file), provider visit (NCH file), or outpatient services (OUTPATIENT file) at least 13 months before diagnosis. Medicare coverage and claims were considered back to the later of age 65 or a calendar year cutoff that varied according to calendar year of NHL diagnosis based on the availability of Medicare claims data (1991 for cases diagnosed in 1992–2002, 1998 for 2003–2005, 2000 for 2006–2007, 2002 for 2008–2009). Cases diagnosed only on autopsy or death certificate were excluded. Selection yielded N = 52,691 cases. For comparison, omission of the requirements for non-HMO coverage and at least one Medicare claim would have yielded N = 68,044 cases.
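The claims look-back rule described above can be illustrated with a minimal Python sketch. The calendar-year cutoffs are taken from the text; the function names, the use of January 1 as the start of each cutoff year, and the handling of February 29 birthdays are assumptions made only for illustration.

```python
from datetime import date

# Earliest calendar year of usable Medicare claims, keyed by year of NHL
# diagnosis (values from the text above).
CLAIMS_CUTOFF_BY_DX_YEAR = [
    (range(1992, 2003), 1991),  # diagnosed 1992-2002
    (range(2003, 2006), 1998),  # diagnosed 2003-2005
    (range(2006, 2008), 2000),  # diagnosed 2006-2007
    (range(2008, 2010), 2002),  # diagnosed 2008-2009
]


def claims_cutoff_year(dx_year):
    """Return the calendar-year cutoff for a given diagnosis year."""
    for years, cutoff in CLAIMS_CUTOFF_BY_DX_YEAR:
        if dx_year in years:
            return cutoff
    raise ValueError(f"diagnosis year {dx_year} is outside 1992-2009")


def add_years(d, years):
    """Shift a date by whole years (Feb 29 falls back to Feb 28: an assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def lookback_start(birth_date, dx_date):
    """Later of the 65th birthday or January 1 of the cutoff year (assumed)."""
    sixty_fifth_birthday = add_years(birth_date, 65)
    cutoff = date(claims_cutoff_year(dx_date.year), 1, 1)
    return max(sixty_fifth_birthday, cutoff)
```

For example, a case born in May 1930 and diagnosed in 2004 turned 65 in 1995, but claims would be considered only from the 1998 cutoff onward.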

Controls were selected from the random 5% sample of Medicare beneficiaries living in SEER areas included in the SEER-Medicare dataset (16). Controls were selected separately for each calendar year 1992–2009. As of July 1 (the selection date) of each year, controls were required to: (i) be alive and cancer-free; (ii) have at least 13 months of prior Medicare coverage; and (iii) have at least one Medicare claim at least 13 months earlier; age and calendar year cutoffs, and requirements for Medicare coverage and claims, were as described above for cases. From eligible controls, we randomly selected 200,000 controls, frequency matched to cases according to calendar year, sex, age, and race. Controls could be selected more than once for multiple years or included later as a case.
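Frequency matching of this kind can be sketched in Python as follows. This is an illustrative proportional-allocation scheme, not the selection program used in the study; the function name, the stratum keys, the fixed random seed, and the rounding of per-stratum quotas are all assumptions.

```python
import random
from collections import Counter


def frequency_match(case_strata, eligible_controls, n_controls, seed=0):
    """Sample controls so that their stratum distribution mirrors the cases.

    case_strata: one stratum key per case, e.g. (year, sex, age_group, race).
    eligible_controls: dict mapping stratum key -> list of control identifiers.
    n_controls: target total number of controls.
    Quotas are rounded per stratum, so the final total can differ slightly
    from n_controls (a simplification in this sketch).
    """
    rng = random.Random(seed)
    case_counts = Counter(case_strata)
    n_cases = len(case_strata)
    selected = []
    for stratum, count in case_counts.items():
        quota = round(n_controls * count / n_cases)
        pool = eligible_controls.get(stratum, [])
        # Sample without replacement within a stratum; the same person may
        # still appear in several strata (e.g., different calendar years).
        selected.extend(rng.sample(pool, min(quota, len(pool))))
    return selected
```

Because calendar year is one of the matching factors, sampling without replacement within a stratum remains compatible with a person being selected as a control in more than one year.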

We searched Medicare claims to identify medical conditions diagnosed more than 12 months before case/control selection. The 1-year period immediately before case/control selection was excluded to minimize bias due to reverse causality or differential medical work-up of cases. We initially considered medical conditions defined by the first four digits of International Classification of Diseases (version 9, ICD-9) codes. However, we also considered three-digit codes when providers only indicated this level of detail. To indicate that the condition was present, we required one inpatient claim with the diagnosis (MEDPAR file) or at least two physician or outpatient claims at least 30 days apart (NCH and OUTPATIENT files). Medical conditions could be described at any position in the claim, that is, as primary or secondary diagnoses.
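The ascertainment rule above can be expressed as a short Python sketch. The file labels (MEDPAR, NCH, OUTPATIENT) come from the text; the function name, the tuple layout of a claim, and the approximation of 12 months as 365 days are assumptions for illustration.

```python
from datetime import date, timedelta


def condition_present(claims, selection_date, lag_days=365, min_gap_days=30):
    """Apply the claims-based definition of a medical condition.

    claims: iterable of (claim_date, source) pairs for ONE diagnosis code,
            where source is "MEDPAR", "NCH", or "OUTPATIENT".
    selection_date: date of NHL diagnosis or control selection.
    lag_days: excluded window before selection (365 days approximates the
              12-month exclusion; this approximation is an assumption).
    """
    cutoff = selection_date - timedelta(days=lag_days)
    eligible = [(d, s) for d, s in claims if d < cutoff]

    # One inpatient claim is sufficient.
    if any(source == "MEDPAR" for _, source in eligible):
        return True

    # Otherwise require two physician/outpatient claims at least 30 days
    # apart; comparing the earliest and latest claim is enough to check this.
    dates = sorted(d for d, s in eligible if s in ("NCH", "OUTPATIENT"))
    return len(dates) >= 2 and (dates[-1] - dates[0]).days >= min_gap_days
```

Setting `lag_days` to roughly three years (for example, `3 * 365`) gives a stricter exclusion window under the same sketch.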

Statistical analysis

ICD-9 is a hierarchical coding system designed to provide an international standard for morbidity and mortality statistics, and especially as implemented in Medicare claims, for use in reimbursement of care providers. One challenge is that no level of the scheme uniformly captures all medical conditions at the same degree of detail, and some conditions are indicated by multiple codes in separate parts of the classification. We therefore used a stepwise approach to identify medical conditions associated with NHL subtypes (Fig. 1).

Specifically, in the first step, the prevalence of every ICD-9–specified condition was compared separately between each NHL subtype and all controls. This group of unselected conditions (SELECT0) was defined by categorizing Medicare claims based on all provided four-digit ICD-9 codes (or occasionally, as noted above, by three-digit codes). In comparing the prevalence of the conditions in cases and controls, we selected conditions for further evaluation if: (i) the lowest achievable significance level computed from marginal totals in the 2 × 2 table (minalpha statistic) was less than 0.001 (18); and (ii) the P value from the Cochran–Mantel–Haenszel test (conditioning on the matching factors) was less than the Bonferroni cutoff (defined as 0.05 divided by the number of conditions remaining after applying the minalpha criterion). We excluded ICD-9 codes for invalid conditions, specifically codes obviously corresponding to a possible NHL diagnosis (e.g., NHL itself, lymphadenopathy, or splenomegaly), codes for nonspecific symptoms (e.g., headache, fatigue), and spurious codes that could not be matched to diagnoses. Because cases and controls were selected to have no prior SEER-documented cancer, we considered claims for previous cancer diagnoses (other than nonmelanoma skin cancer) as having uncertain reliability; therefore, we also excluded these claims-based diagnoses from analysis.
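The two-part screening rule in the first step can be sketched in Python. The thresholds (0.001 and 0.05) come from the text. The minalpha computation shown here is an assumption: it takes the smallest P value that a Fisher exact test could attain given the table margins, which is one common reading of a "lowest achievable significance level" and may differ in detail from the definition in ref. 18. The function names and the input layout are hypothetical, and the Cochran–Mantel–Haenszel P values are taken as given rather than computed.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def min_achievable_p(n_cases, n_controls, n_with_condition):
    """Smallest hypergeometric point probability given the 2 x 2 margins.

    This is the probability of the most extreme table (all affected subjects
    concentrated in one group, as far as the margins allow).
    """
    total = n_cases + n_controls
    k = n_with_condition
    log_denominator = log_comb(total, k)

    def log_pmf(x):  # x = number of affected cases
        return log_comb(n_cases, x) + log_comb(n_controls, k - x) - log_denominator

    x_high = min(k, n_cases)       # as many affected cases as possible
    x_low = max(0, k - n_controls) # as few affected cases as possible
    return exp(min(log_pmf(x_high), log_pmf(x_low)))


def screen_conditions(conditions, n_cases, n_controls,
                      minalpha_cutoff=0.001, family_alpha=0.05):
    """conditions: dict name -> (n_with_condition, cmh_p_value)."""
    testable = {
        name: p for name, (k, p) in conditions.items()
        if min_achievable_p(n_cases, n_controls, k) < minalpha_cutoff
    }
    if not testable:
        return []
    bonferroni_cutoff = family_alpha / len(testable)
    return sorted(name for name, p in testable.items() if p < bonferroni_cutoff)
```

The design point is that very rare conditions, which could never reach significance whatever their distribution, are removed before the Bonferroni denominator is counted, so the correction is less conservative than dividing by all screened codes.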

This procedure yielded a subset of conditions for each NHL subtype, which we refer to as SELECT1. Next, we used binary logistic regression models to derive ORs measuring the association of each SELECT1 condition with the NHL subtype, adjusted for demographic characteristics (sex, age, race, and calendar year of case/control selection) and, as a measure of health care utilization, the number of provider claims per year (see Table 1 footnote for details). Each NHL subtype was compared with all controls, and the variance of the ORs accounted for the multiple sampling of some controls (16).

Two physicians reviewed these results to group similar SELECT1 conditions together (e.g., multiple ICD-9 codes for skin cancer at different body sites), add related ICD-9 codes not in SELECT1 as part of the groups, and in rare instances, to break SELECT1 conditions into finer categories (e.g., distinguishing different hepatitis virus infections based on five-digit codes). SELECT1 conditions were removed if they were rare (<5 affected cases when the OR > 1, or <100 affected controls when the OR < 1) and could not be grouped with other conditions. This process led to SELECT2 conditions for each NHL subtype.
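The rarity criterion used in this step can be written as a small Python sketch. The thresholds (5 affected cases, 100 affected controls) are from the text; the function name, the `can_be_grouped` flag standing in for the physicians' judgment, and the treatment of an OR exactly equal to 1 (kept) are assumptions.

```python
def drop_from_select1(odds_ratio, affected_cases, affected_controls,
                      can_be_grouped):
    """Return True if a SELECT1 condition would be removed as too rare.

    can_be_grouped: whether reviewers could merge the condition with related
    codes (a manual judgment in the study; a plain boolean here).
    """
    if odds_ratio > 1:
        rare = affected_cases < 5
    elif odds_ratio < 1:
        rare = affected_controls < 100
    else:
        rare = False  # OR exactly 1 is not addressed in the text (assumption)
    return rare and not can_be_grouped
```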

We again used binary logistic regression to assess associations of SELECT2 conditions, adjusted for demographic characteristics and yearly physician claims. We excluded SELECT2 conditions for which the adjusted OR was both nonsignificant (P ≥ 0.05) and close to the null value, or for which the models did not converge due to small numbers of affected cases or controls. This process yielded the final group of SELECT3 medical conditions associated with each NHL subtype.

We compiled the list of SELECT3 medical conditions across the five NHL subtypes and used polytomous logistic regression to assess the association of each SELECT3 condition with each subtype, adjusting for demographic characteristics and yearly provider claims. Although we present ORs for each condition with all five subtypes, we focus on the subset of conditions that were associated with each subtype in its separate SELECT3 analysis. Each polytomous logistic regression model provided a test of heterogeneity of the ORs across NHL subtypes.
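As a sketch of the model structure, a polytomous logistic regression of this kind can be written in a standard baseline-category form. The notation is illustrative and not taken from the source: Y = 0 denotes controls, Y = j denotes NHL subtype j, c is the indicator for the medical condition, and z is the vector of adjustment covariates. The source does not state which test statistic was used for heterogeneity, so only the null hypothesis is shown.

```latex
% Baseline-category (polytomous) logit model; notation is assumed.
\log \frac{\Pr(Y = j \mid c, \mathbf{z})}{\Pr(Y = 0 \mid c, \mathbf{z})}
  = \alpha_j + \beta_j\, c + \boldsymbol{\gamma}_j^{\top} \mathbf{z},
  \qquad j = 1, \dots, 5,
  \qquad \mathrm{OR}_j = e^{\beta_j}

% Null hypothesis for the test of heterogeneity across subtypes.
H_0 :\ \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5
```

Under this formulation, rejecting the null hypothesis indicates that the OR for the condition differs for at least one subtype.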

We also conducted a sensitivity analysis for SELECT3 conditions to further minimize the possibility of reverse causality. Specifically, we excluded from evaluation the 3-year period immediately preceding NHL diagnosis/control selection. This approach left N = 39,995 cases and N = 158,706 controls (age 68 or older) with evaluable time covered by Medicare. For these subjects, we reascertained the previously identified SELECT3 medical conditions, this time excluding the 3-year window before case/control selection, and reran the polytomous logistic regression models.

Replication of selected findings

We sought to replicate selected associations with SELECT3 medical conditions in two independent datasets: the National Institutes of Health-AARP Diet and Health Study (NIH-AARP) and the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO), as described in the notes to Supplementary Table S1. These replications required data on both the medical condition and NHL subtypes, so replications could not be undertaken for all findings. We utilized a two-sided α of 0.05 to assess significance in these analyses. In some instances, the magnitude of the association that we tried to replicate and the number of cases with each NHL subtype were both small, greatly limiting the statistical power for the replication (Supplementary Table S1).

Characteristics of study subjects and stepwise selection of medical conditions

We included 52,691 cases with NHL (N = 19,078 for DLBCL, N = 18,236 for CLL/SLL, N = 8,881 for follicular lymphoma, N = 4,289 for MZL, N = 2,207 for TCL) from SEER-Medicare. Overall, cases were well matched to the 200,000 controls according to demographic characteristics, although there were some minor differences among NHL subtypes (Table 1). Cases tended to have slightly shorter duration of prior Medicare coverage (median 52 vs. 54 months, excluding the 12 months immediately before case/control selection) but had more physician visits per year (Table 1).

Figure 1 and Table 2 document the process by which medical conditions associated with each NHL subtype were identified. For each NHL subtype, we screened 5,605 to 5,785 conditions indicated by unique four-digit (and occasional three-digit) ICD-9 codes, or a total of 5,926 conditions across all five subtypes (SELECT0 conditions). More than 97% of these conditions were excluded because they were not significantly associated with the subtype or were for invalid diagnoses. This procedure left 30 to 52 remaining conditions for each subtype, identified by (mostly) four-digit ICD-9 codes (SELECT1 conditions; see Supplementary Table S2 for a complete list). Two physicians reviewed these conditions and grouped related codes to create 13 to 28 SELECT2 conditions for each NHL subtype.

Most SELECT2 conditions remained associated with their respective NHL subtypes in multivariate logistic regression models, yielding the final SELECT3 group of medical conditions (N = 17 conditions for CLL/SLL, N = 27 for DLBCL, N = 13 for follicular lymphoma, N = 18 for MZL, and N = 15 for TCL, for a total of N = 55 unique SELECT3 conditions; Table 2). ICD-9 codes for SELECT3 conditions, as well as the number of subjects with each condition, are presented in Supplementary Table S3.

Findings of MedWAS analyses

Table 3 presents associations for the 55 SELECT3 medical conditions with each NHL subtype. Of note, most conditions (N = 49, 89%) varied significantly in their associations across NHL subtypes (i.e., Pheterogeneity < 0.05; Table 3).

We highlight associations for each subtype that were identified in the SELECT3 group for that subtype (shaded in gray in Table 3). For most of these, the 95% confidence interval (CI) for the OR excludes 1.00 (underlined in Table 3). Only 3 of the 55 medical conditions were associated with increased risk for all five NHL subtypes: nonmelanoma skin cancer (ORs 1.19–1.55), actinic keratosis (1.12–1.25), and hemolytic anemia (1.64–4.07; Table 3).

Among the positive associations, several immunodeficiency or infectious conditions were associated with increased risk for 1–2 NHL subtypes (Table 3). Specifically, associations were observed for HIV infection and solid organ transplantation with DLBCL (ORs 3.83 and 4.27, respectively), and deficiency of humoral immunity with CLL/SLL (3.85) and TCL (5.70). CLL/SLL risk was increased in association with herpes zoster (OR 1.44), acute sinusitis (1.12), and acute bronchitis (1.06), and DLBCL risk was increased in association with hepatitis C virus (HCV) infection (1.74).

Autoimmune diseases were also associated with increased risk for some subtypes, including for DLBCL with rheumatoid arthritis (OR 1.43), sarcoidosis (2.11), and uveitis (3.17). Systemic lupus erythematosus was associated with risk of DLBCL (OR 1.74) and MZL (2.57); Sjögren syndrome with DLBCL (2.10) and MZL (4.74); and celiac disease with TCL (8.09). An elevated erythrocyte sedimentation rate was associated with increased risk of MZL (OR 2.00).

Among hematologic conditions (in addition to hemolytic anemia which was associated with all NHL subtypes), thrombocytopenia was associated with increased risk for all subtypes other than follicular lymphoma (ORs 1.46–2.21). Aplastic anemia, anemia not otherwise specified (NOS), and neutropenia were positively associated with MZL (ORs 3.26, 1.12, and 2.47, respectively). Monoclonal paraproteinemia was associated with CLL/SLL, DLBCL, and MZL (ORs 1.80–3.43), and cryoglobulinemia with DLBCL (6.36) and follicular lymphoma (10.14).

Among skin conditions other than nonmelanoma skin cancer and actinic keratosis, nine were associated with increased risk only for TCL: atopic dermatitis (OR 4.12), contact dermatitis (2.61), dermatitis due to substances taken internally (4.05), bullous skin diseases (3.43), discoid lupus (4.00), psoriasis (3.72), folliculitis (2.65), asteatosis (2.20), and urticaria (2.42). Seborrheic keratosis was associated with increased risk of follicular lymphoma (OR 1.18) and MZL (1.24).

There were a few additional positive associations. Diabetes mellitus was associated with DLBCL (OR 1.09). Testicular hypofunction, benign prostatic hyperplasia, and gastric ulcer were associated with MZL (ORs 1.58, 1.30, and 1.55, respectively), and spinal cord anomalies were associated with CLL/SLL (3.20).

No medical condition was inversely associated with all five NHL subtypes. However, inverse associations were observed with several neurologic/psychiatric conditions (Table 3). Three such conditions were inversely associated with all NHL subtypes except TCL: senile dementia (ORs 0.40–0.61), stroke (ORs 0.74–0.86), and psychosis NOS (0.37–0.64). Inverse associations were also observed for arteriosclerotic dementia with CLL/SLL, DLBCL, and follicular lymphoma (ORs 0.32–0.54); Parkinson disease with DLBCL (0.67); depression NOS with DLBCL (0.74) and follicular lymphoma (0.70); nonpsychotic mental disorder NOS with DLBCL (0.23); and alcoholism with DLBCL (0.59).

Among infections, inverse associations were found for chronic bronchitis and urinary tract infection with DLBCL (ORs 0.78 and 0.83, respectively). CLL/SLL risk was reduced in association with hypertension (OR 0.83), abdominal aortic aneurysm (0.67), and hyperlipidemia (0.86), and follicular lymphoma risk was reduced in association with systolic heart failure (0.80). Among miscellaneous conditions, inverse associations were observed between decubitus ulcer and DLBCL (OR 0.57); asphyxia and follicular lymphoma (0.52); and hip fracture and DLBCL (0.62) and follicular lymphoma (0.50).

Sensitivity and replication analyses

We performed a sensitivity analysis for the 55 SELECT3 medical conditions, excluding claims during the 3 years immediately before case/control selection. ORs were very similar to those in the primary analysis (Supplementary Table S4).

Table 4 presents results of replication analyses in additional populations for some SELECT3 conditions. For diabetes mellitus and DLBCL, positive associations in the NIH-AARP and PLCO cohorts appeared consistent with the MedWAS observation, although the replications did not reach statistical significance. There was also an inverse (though statistically nonsignificant) association in PLCO between chronic bronchitis and DLBCL. Associations with hypertension and stroke were not significant in replication analyses.

We surveyed a large number of medical conditions as risk factors for NHL using a new approach termed “MedWAS.” This method characterized the full spectrum of medical conditions related to NHL among the U.S. elderly. Some associations that we document are well established, whereas others are new or less supported.

Importantly, for the 55 medical conditions retained in our final (SELECT3) analyses, we found that most associations varied significantly across the five NHL subtypes. In fact, most medical conditions were associated with only one or a few NHL subtypes. Because some associations are likely etiologic (as we review below), these differences point to etiologic heterogeneity, that is, distinct causal factors that contribute to each NHL subtype. Other differences in associations with environmental risk factors and genetic polymorphisms (3, 19–22) likewise support that different NHL subtypes arise through separate (although perhaps overlapping) mechanisms.

Some associations that we demonstrated reflect well-established contributions of chronic immune disturbances to development of NHL. These include associations of HIV infection, HCV infection, and solid organ transplantation with DLBCL and, to a lesser extent, MZL (4, 5, 23). As described previously (6), we found an increased risk of DLBCL and/or MZL associated with a range of autoimmune conditions including rheumatoid arthritis, systemic lupus erythematosus, and Sjögren syndrome, and strongly elevated risk of TCL with celiac disease. In striking contrast, none of these conditions was associated with CLL/SLL or follicular lymphoma. Increased risk of CLL/SLL following herpes zoster, acute sinusitis, and acute bronchitis may plausibly be a manifestation of chronic immune deficits preceding this malignancy (24, 25).

Interestingly, risk for each NHL subtype was elevated following nonmelanoma skin cancer. A few prior studies have described increased risk for NHL or CLL/SLL following diagnoses of basal or squamous cell skin cancers (26–28), but the current investigation is the first to document associations for specific NHL subtypes other than CLL/SLL. This study is also the first to show an increased risk for NHL following a diagnosis of actinic keratosis, the precursor of squamous cell skin cancer. Although skin damage from ultraviolet radiation strongly increases risk for skin cancer, ultraviolet radiation actually appears inversely associated with NHL (29). More likely, the association of nonmelanoma skin cancer and NHL is related to immunosuppression, as suggested by associations of HIV and solid organ transplantation with skin cancer (30, 31).

Hematologic conditions associated with increased NHL risk in our study have been described as complications of lymphoproliferative conditions (e.g., hemolytic anemia and cryoglobulinemia) or nonspecific manifestations of chronic illness (32–35). It is unlikely that these conditions were caused by undiagnosed NHL (i.e., reverse causality), because we did not consider Medicare claims within 1 year before NHL diagnosis, and remarkably, almost all of the associations persisted in sensitivity analyses excluding claims within 3 years of NHL. Instead, these associations may again reflect the presence of chronic immune disturbances that contribute to the development of NHL over a prolonged period.

We found strong associations between a large number of dermatologic conditions and risk of TCL, some of which have been observed previously (6, 36). TCLs can have an indolent presentation (37), and 48% of TCLs in our study were cutaneous lymphomas (mycosis fungoides and less common variants). Diagnostic confusion between skin conditions and TCL could be an explanation, but the associations persisted when we excluded diagnoses within 3 years of NHL. Alternatively, these associations may reflect a shared predisposition to skin diseases and TCL, immune effects of chronic skin diseases, or effects of treatment of the skin diseases (38).

The association of gastric ulcer with MZL may be explained by diagnostic confusion between gastric ulcers and gastric MZL, or by the etiologic contribution of Helicobacter pylori to both conditions (39, 40). There was also a positive association of diabetes mellitus with DLBCL. This finding showed some evidence, although inconclusive, for replication in our analyses in NIH-AARP and PLCO. Diabetes mellitus has previously been associated with increased risk of NHL overall (41, 42).

Some inverse associations with NHL risk could have biologic explanations. Decreased risk of CLL/SLL following a diagnosis of hyperlipidemia is intriguing, and prior studies have noted a protective effect of statins (a widely used class of lipid-lowering medications) for NHL overall and leukemia (43). Decreased DLBCL risk associated with alcoholism may reflect protective effects of ethanol consumption (3).

Given our lack of data on lifestyle factors, it is possible that some associations could be due to confounding, for example, by smoking, drinking, or occupation (3). Other artifacts could underlie inverse associations. It is likely that clinicians limited the medical work-up of some frail and debilitated elderly adults among the Medicare population, which would have led to underascertainment of NHL. This bias could explain the inverse associations with a broad range of neurologic and psychiatric conditions and conditions associated with advanced illness or nursing home care (e.g., decubitus ulcer, hip fracture). Indeed, we did not observe decreased NHL risk when, in our replication analyses, we assessed people with a history of a stroke in the NIH-AARP and PLCO studies. Participants in these cohort studies were younger and would have been healthier than unselected Medicare beneficiaries, and so NHLs arising in these individuals would have been less vulnerable to underdiagnosis.

The current study is the first implementation of a new method, “MedWAS,” which we used to comprehensively evaluate a very large number of medical conditions as NHL risk factors. Demonstration of multiple known associations with NHL supports the validity of this approach. MedWAS incorporates the same agnostic, wide-based approach used in GWAS to survey thousands of DNA variations (44). MedWAS could also be applied to other sources of administrative data and electronic health records. Recent “big data” analyses of large administrative databases by others have focused on characterizing the network properties of related medical conditions (45) or, with respect to cancer, selecting optimum patient treatments and predicting outcomes (46).

Several limitations of our study should be noted. First, it was restricted to the U.S. elderly, and our results may not generalize to other populations. Although Medicare covers essentially all U.S. adults over the age of 65 years, our requirements that subjects were not in an HMO and had at least one documented claim led to some exclusions. Furthermore, we assessed Medicare data beginning at the age of 65 years, so we were unable to evaluate medical conditions that did not generate claims at older ages. Second, we are unaware of a systematic method for using ICD-9 codes to classify unique medical conditions at a consistent and informative level of detail. We therefore found it necessary to review individual codes to identify biologically relevant conditions and to eliminate invalid or irrelevant codes, which likely introduced some subjectivity. Medicare claims can be inaccurate, but we sought to increase the positive predictive value by requiring one inpatient or two physician/outpatient claims at least 30 days apart (16). Also, it is not possible to assess duration or severity of medical conditions using Medicare claims, and as our study was exploratory, we did not attempt to examine associations with treatments for these conditions. Third, because we made thousands of comparisons, some associations could have been due to chance. We sought to minimize this issue by requiring strict statistical significance in the initial screening of the ICD-9 codes and by attempting to independently replicate some findings. Unfortunately, it was challenging to find appropriate data sources for replication, and due to the small number of outcomes and modest size of the associations, the replication analyses were inconclusive.

In conclusion, our study comprehensively assessed a very large number of medical conditions and thereby identified a subset associated with increased or decreased NHL risk. Specific associations varied according to NHL subtype. Many risk factors were related to immune disturbances and chronic infections, as expected, but some (such as the associations with nonmelanoma skin cancer, skin conditions, and diabetes) point to new avenues for research. It will be important to replicate some findings in additional populations, and to uncover biologic mechanisms underpinning the best supported and strongest associations. We believe that this MedWAS approach can be useful in epidemiologic research aimed at understanding the etiology of cancer and other complex diseases.

No potential conflicts of interest were disclosed.

The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by the NCI.

Conception and design: E.A. Engels, L.M. Morton, E.L. Yanik

Development of methodology: E.A. Engels, R. Parsons, L. Enewold, E.L. Yanik, R.M. Pfeiffer

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): E.A. Engels, R. Parsons, C. Besson, L.M. Morton, L. Enewold, E.L. Yanik, H. Arem, A.A. Austin, R.M. Pfeiffer

Writing, review, and/or revision of the manuscript: E.A. Engels, R. Parsons, C. Besson, L.M. Morton, L. Enewold, E.L. Yanik, H. Arem, A.A. Austin, R.M. Pfeiffer

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): R. Parsons, W. Ricker

Study supervision: E.A. Engels

This study was supported by the Intramural Research Program of the NCI. The authors acknowledge the efforts of the Applied Research Program, NCI; the Office of Research, Development, and Information, Centers for Medicare and Medicaid Services; Information Management Services, Inc.; and the Surveillance, Epidemiology, and End Results (SEER) program tumor registries in the creation of the SEER-Medicare database.

The authors thank the NCI for access to NCI's data collected by the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial.

For the NIH-AARP Diet and Health Study, cancer incidence data from the Atlanta metropolitan area were collected by the Georgia Center for Cancer Statistics, Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia. Cancer incidence data from California were collected by the California Cancer Registry, California Department of Public Health's Cancer Surveillance and Research Branch, Sacramento, California. Cancer incidence data from the Detroit metropolitan area were collected by the Michigan Cancer Surveillance Program, Community Health Administration, Lansing, Michigan. The Florida cancer incidence data used in this report were collected by the Florida Cancer Data System (Miami, Florida) under contract with the Florida Department of Health, Tallahassee, Florida. The views expressed herein do not necessarily reflect those of the FCDS or FDOH. Cancer incidence data from Louisiana were collected by the Louisiana Tumor Registry, Louisiana State University Health Sciences Center School of Public Health, New Orleans, Louisiana. Cancer incidence data from New Jersey were collected by the New Jersey State Cancer Registry, The Rutgers Cancer Institute of New Jersey, New Brunswick, New Jersey. Cancer incidence data from North Carolina were collected by the North Carolina Central Cancer Registry, Raleigh, North Carolina. Cancer incidence data from Pennsylvania were supplied by the Division of Health Statistics and Research, Pennsylvania Department of Health, Harrisburg, Pennsylvania. The Pennsylvania Department of Health specifically disclaims responsibility for any analyses, interpretations, or conclusions. Cancer incidence data from Arizona were collected by the Arizona Cancer Registry, Division of Public Health Services, Arizona Department of Health Services, Phoenix, Arizona. Cancer incidence data from Texas were collected by the Texas Cancer Registry, Cancer Epidemiology and Surveillance Branch, Texas Department of State Health Services, Austin, Texas. Cancer incidence data from Nevada were collected by the Nevada Central Cancer Registry, Division of Public and Behavioral Health, State of Nevada Department of Health and Human Services, Carson City, Nevada. We are indebted to the participants in the NIH-AARP Diet and Health Study for their outstanding cooperation. The authors also thank Sigurd Hermansen and Kerry Grace Morrissey from Westat for study outcomes ascertainment and management, and Leslie Carroll at Information Management Services for data support and analysis.

Finally, the authors thank an anonymous reviewer for proposing the term “MedWAS” to describe our study approach.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Fitzmaurice C, Dicker D, Pain A, Hamavid H, Moradi-Lakeh M, MacIntyre MF, et al. The global burden of cancer 2013. JAMA Oncol 2015;1:505–27.
2. Surveillance, Epidemiology, and End Results (SEER) Program (http://www.seer.cancer.gov) SEER*Stat Database: Incidence - SEER 18 Registries Research Data Nov 2013 Submission (2000–2011).
3. Morton LM, Slager SL, Cerhan JR, Wang SS, Vajdic CM, Skibola CF, et al. Etiologic heterogeneity among non-Hodgkin lymphoma subtypes: the InterLymph Non-Hodgkin Lymphoma Subtypes Project. J Natl Cancer Inst Monogr 2014;2014:130–44.
4. Gibson TM, Morton LM, Shiels MS, Clarke CA, Engels EA. Risk of non-Hodgkin lymphoma subtypes in HIV-infected people during the HAART era: a population-based study. AIDS 2014;28:2313–8.
5. Clarke CA, Morton LM, Lynch C, Pfeiffer RM, Hall EC, Gibson TM, et al. Risk of lymphoma subtypes after solid organ transplantation in the United States. Br J Cancer 2013;109:280–8.
6. Ekstrom Smedby K, Vajdic CM, Falster M, Engels EA, Martinez-Maza O, Turner J, et al. Autoimmune disorders and risk of non-Hodgkin lymphoma subtypes: a pooled analysis within the InterLymph Consortium. Blood 2008;111:4029–38.
7. Anderson LA, Gadalla S, Morton LM, Landgren O, Pfeiffer R, Warren JL, et al. Population-based study of autoimmune conditions and the risk of specific lymphoid malignancies. Int J Cancer 2009;125:398–405.
8. Anderson LA, Pfeiffer RM, Landgren O, Gadalla S, Berndt SI, Engels EA. Risks of myeloid malignancies in patients with autoimmune conditions. Br J Cancer 2009;100:822–8.
9. Lanoy E, Costagliola D, Engels EA. Skin cancers associated with HIV infection and solid-organ transplantation among elderly adults. Int J Cancer 2010;126:1724–31.
10. Lanoy E, Engels EA. Skin cancers associated with autoimmune conditions among elderly adults. Br J Cancer 2010;103:112–4.
11. Warren JL, Klabunde CN, Schrag D, Bach PB, Riley GF. Overview of the SEER-Medicare data: content, research applications, and generalizability to the United States elderly population. Med Care 2002;40:IV-3-18.
12. Welzel TM, Graubard BI, El-Serag HB, Shaib YH, Hsing AW, Davila JA, et al. Risk factors for intrahepatic and extrahepatic cholangiocarcinoma in the United States: a population-based case-control study. Clin Gastroenterol Hepatol 2007;5:1221–8.
13. Welzel TM, Graubard BI, Zeuzem S, El-Serag HB, Davila JA, McGlynn KA. Metabolic syndrome increases the risk of primary liver cancer in the United States: a study in the SEER-Medicare database. Hepatology 2011;54:463–71.
14. Trabert B, Wentzensen N, Felix AS, Yang HP, Sherman ME, Brinton LA. Metabolic syndrome and risk of endometrial cancer in the United States: a study in the SEER-Medicare linked database. Cancer Epidemiol Biomarkers Prev 2015;24:261–7.
15. Nogueira L, Freedman ND, Engels EA, Warren JL, Castro F, Koshiol J. Gallstones, cholecystectomy, and risk of digestive system cancers. Am J Epidemiol 2014;179:731–9.
16. Engels EA, Pfeiffer RM, Ricker W, Wheeler W, Parsons R, Warren JL. Use of Surveillance, Epidemiology, and End Results-Medicare data to conduct case-control studies of cancer among the US elderly. Am J Epidemiol 2011;174:860–70.
17. Turner JJ, Morton LM, Linet MS, Clarke CA, Kadin ME, Vajdic CM, et al. InterLymph hierarchical classification of lymphoid neoplasms for epidemiologic research based on the WHO classification (2008): update and future directions. Blood 2010;116:e90–e98.
18. Tarone RE. A modified Bonferroni method for discrete data. Biometrics 1990;46:515–22.
19. Morton LM, Wang SS, Cozen W, Linet MS, Chatterjee N, Davis S, et al. Etiologic heterogeneity among non-Hodgkin lymphoma subtypes. Blood 2008;112:5150–60.
20. Berndt SI, Skibola CF, Joseph V, Camp NJ, Nieters A, Wang Z, et al. Genome-wide association study identifies multiple risk loci for chronic lymphocytic leukemia. Nat Genet 2013;45:868–76.
21. Cerhan JR, Berndt SI, Vijai J, Ghesquieres H, McKay J, Wang SS, et al. Genome-wide association study identifies multiple susceptibility loci for diffuse large B cell lymphoma. Nat Genet 2014;46:1233–8.
22. Skibola CF, Berndt SI, Vijai J, Conde L, Wang Z, Yeager M, et al. Genome-wide association study identifies five susceptibility loci for follicular lymphoma outside the HLA region. Am J Hum Genet 2014;95:462–71.
23. de Sanjose S, Benavente Y, Vajdic CM, Engels EA, Morton LM, Bracci PM, et al. Hepatitis C and non-Hodgkin lymphoma among 4784 cases and 6269 controls from the International Lymphoma Epidemiology Consortium. Clin Gastroenterol Hepatol 2008;6:451–8.
24. Anderson LA, Landgren O, Engels EA. Common community acquired infections and subsequent risk of chronic lymphocytic leukaemia. Br J Haematol 2009;147:444–9.
25. Landgren O, Gridley G, Check D, Caporaso NE, Morris BL. Acquired immune-related and inflammatory conditions and subsequent chronic lymphocytic leukaemia. Br J Haematol 2007;139:791–8.
26. Levi F, Randimbison L, Te VC, La Vecchia C. Non-Hodgkin's lymphomas, chronic lymphocytic leukaemias and skin cancers. Br J Cancer 1996;74:1847–50.
27. Wheless L, Black J, Alberg AJ. Nonmelanoma skin cancer and the risk of second primary cancers: a systematic review. Cancer Epidemiol Biomarkers Prev 2010;19:1686–95.
28. Adami J, Frisch M, Yuen J, Glimelius B, Melbye M. Evidence of an association between non-Hodgkin's lymphoma and skin cancer. BMJ 1995;310:1491–5.
29. Kricker A, Armstrong BK, Hughes AM, Goumas C, Smedby KE, Zheng T, et al. Personal sun exposure and risk of non Hodgkin lymphoma: a pooled analysis from the Interlymph Consortium. Int J Cancer 2008;122:144–54.
30. Euvrard S, Kanitakis J, Claudy A. Skin cancers after organ transplantation. N Engl J Med 2003;348:1681–91.
31. Silverberg MJ, Leyden W, Warton EM, Quesenberry CP Jr, Engels EA, Asgari MM. HIV infection status, immunodeficiency, and the incidence of non-melanoma skin cancer. J Natl Cancer Inst 2013;105:350–60.
32. Ding W, Zent CS. Diagnosis and management of autoimmune complications of chronic lymphocytic leukemia/small lymphocytic lymphoma. Clin Adv Hematol Oncol 2007;5:257–61.
33. Zent CS, Kay NE. Autoimmune complications in chronic lymphocytic leukaemia (CLL). Best Pract Res Clin Haematol 2010;23:47–59.
34. Molina-Garrido MJ, Guillen-Ponce C. A revision on cryoglobulinaemia associated to neoplastic diseases. Clin Transl Oncol 2007;9:229–36.
35. Murakami H, Irisawa H, Saitoh T, Matsushima T, Tamura J, Sawamura M, et al. Immunological abnormalities in splenic marginal zone cell lymphoma. Am J Hematol 1997;56:173–8.
36. Legendre L, Barnetche T, Mazereeuw-Hautier J, Meyer N, Murrell D, Paul C. Risk of lymphoma in patients with atopic dermatitis and the role of topical treatment: a systematic review and meta-analysis. J Am Acad Dermatol 2015;72:992–1002.
37. Willemze R, Jaffe ES, Burg G, Cerroni L, Berti E, Swerdlow SH, et al. WHO-EORTC classification for cutaneous lymphomas. Blood 2005;105:3768–85.
38. Aschebrook-Kilfoy B, Cocco P, La Vecchia C, Chang ET, Vajdic CM, Kadin ME, et al. Medical history, lifestyle, family history, and occupational risk factors for mycosis fungoides and Sezary syndrome: the InterLymph Non-Hodgkin Lymphoma Subtypes Project. J Natl Cancer Inst Monogr 2014;2014:98–105.
39. Malfertheiner P, Link A, Selgrad M. Helicobacter pylori: perspectives and time trends. Nat Rev Gastroenterol Hepatol 2014;11:628–38.
40. Dover F, Ipek S. Malignancy risk of gastric ulcers: could it be higher than the expected values? Hepatogastroenterology 2003;50 Suppl 2:cccxii–cccxiv.
41. Mitri J, Castillo J, Pittas AG. Diabetes and risk of Non-Hodgkin's lymphoma: a meta-analysis of observational studies. Diabetes Care 2008;31:2391–7.
42. Chao C, Page JH. Type 2 diabetes mellitus and risk of non-Hodgkin lymphoma: a systematic review and meta-analysis. Am J Epidemiol 2008;168:471–80.
43. Pradelli D, Soranna D, Zambon A, Catapano A, Mancia G, La Vecchia C, et al. Statins use and the risk of all and subtype hematological malignancies: a meta-analysis of observational studies. Cancer Med 2015;4:770–80.
44. Chung CC, Chanock SJ. Current status of genome-wide association studies in cancer. Hum Genet 2011;130:59–78.
45. Hidalgo CA, Blumm N, Barabasi AL, Christakis NA. A dynamic network approach for the study of human phenotypes. PLoS Comput Biol 2009;5:e1000353.
46. Kantarjian H, Yu PP. Artificial intelligence, big data, and cancer. JAMA Oncol 2015;1:573–4.