Background:

Relatives of patients with bladder cancer have been shown to be at increased risk for kidney, lung, thyroid, and cervical cancer after correcting for smoking-related behaviors that may concentrate in some families. We demonstrate a novel approach to simultaneously assess risks for multiple cancers to identify distinct multicancer configurations (multiple different cancer types that cluster in relatives) surrounding patients with familial bladder cancer.

Methods:

This study takes advantage of a unique population-level data resource, the Utah Population Database (UPDB), containing vast genealogy and statewide cancer data. Familial risk is measured using standardized incidence risk (SIR) ratios that account for sex, age, birth cohort, and person-years of the pedigree members.

Results:

We identify 1,023 families with a significantly higher bladder cancer rate than population controls (familial bladder cancer). Familial SIRs are then calculated across 25 cancer types, and a weighted Gower distance with K-medoids clustering is used to identify familial multicancer configurations (FMC). We found five FMCs, each exhibiting a different pattern of cancer aggregation. Of the 25 cancer types studied, kidney and prostate cancers were most commonly enriched in the familial bladder cancer clusters. Laryngeal, lung, stomach, acute lymphocytic leukemia, Hodgkin disease, soft-tissue carcinoma, esophageal, breast, uterine, thyroid, and melanoma cancers were the other cancer types with increased incidence in familial bladder cancer families.

Conclusions:

This study identified five familial bladder cancer FMCs showing unique risk patterns for cancers of other organs, suggesting phenotypic heterogeneity in familial bladder cancer.

Impact:

FMC configurations could permit better definitions of cancer phenotypes (subtypes or multicancer) for gene discovery and environmental risk factor studies.

Bladder cancer is the fifth most common cancer in the United States, with nearly 80,000 new cases per year (1). Many genetic and environmental risk factors have been proposed, including smoking, which is estimated to account for approximately 50% of cases, and other chemical exposures such as aniline dyes. However, genetic causes remain largely unexplored. There is increasing evidence of familial clustering of bladder cancer, with first- and second-degree relatives of individuals with bladder cancer having a 73% and 35% increase in risk of bladder cancer, respectively (2). This suggests an underlying germline genetic risk factor predisposing individuals to bladder cancer. Studies have tried to parse out the heritable and shared environmental components of risk using various methods and have estimated that somewhere between 7% and 12% of bladder cancers are due to heritable genetic risk factors and 12% are due to shared environment (3). Utilizing a longitudinal population-based database, we show that combining big data analytics with pedigree and cancer registry data has the potential to identify and characterize “genetically driven bladder cancer subtypes,” “environmentally driven bladder cancer subtypes,” or “gene–environment bladder cancer subtypes.”

Family history of cancer is an important risk factor for many cancers, which may extend across cancer types due to genetic pleiotropy or shared health behaviors (4–6). For example, relatives of patients with bladder cancer are at increased risk for not only bladder cancer, but also kidney, lung, thyroid, and cervical cancers (2, 7–9). In terms of familial multicancer configurations, these different etiologic factors, genetic and environmental, may manifest as different multicancer configurations across a spectrum of organs. Breast cancer is an exemplar for genetically driven pleiotropic multicancer configurations in families. Approximately 30% of hereditary breast cancer is explained by intermediate- and high-risk inherited variants, like BRCA1 and BRCA2. Key factors contributing to the discovery of BRCA1 were dense familial clustering and coaggregation with ovarian cancer (10–12). Unique multicancer configurations for carriers of BRCA1 mutations (breast, ovarian, Fanconi anemia, prostate, pancreatic, fallopian tube, and peritoneal cancers) and BRCA2 mutations (breast, male breast, prostate, pancreatic cancers, and Fanconi anemia) are now widely accepted (Fig. 1). These multicancer configurations were identified after discovery of BRCA1/2 mutations in breast cancer. However, data-driven methods make it possible to uncover multicancer configurations before gene discovery. These configurations could immediately permit better definitions of cancer phenotypes (subtypes or multicancer) for focused gene discovery and environmental risk factor studies.

Figure 1.

Example of a familial multiphenogram for known genetic mutations. BRCA1 (A) and BRCA2 (B). Familial multiphenograms illustrate patterns of FMCs that are unique to the underlying etiologic subtype of breast cancer (source: https://ghr.nlm.nih.gov). Combining machine learning tools with large, familial databases can enable the identification of these unique patterns and allow for subclassification of tumors into more homogenous subtypes.

In contrast to BRCA1/2, smoking-related cancers are an archetype for shared health behaviors leading to familial multicancer configurations. Just as germline genetic mutations may lead to distinct familial multicancer configurations, smoking-related cancers may have a unique pattern of familial risk. Parsing out whether multicancer configurations are related to gene, environment, or a combination of the two remains challenging for many cancers that have a strong environmental component.

Bladder cancer represents a clear example of this difficulty. While studies suggest a possible genetic link in bladder cancer, teasing out this relationship remains a significant challenge because there is a strong environmental etiologic factor (i.e., smoking, chemical exposure). Previous studies suggest that familial cancer risk in smokers may be the combination of gene and environmental exposures and familial clustering of smoking-related cancers has been demonstrated by multiple studies (2, 13). Identifying families with multicancer configurations that appear to be related to shared environment, shared genetics, or gene–environment interactions would allow us to categorize families into more homogenous subtypes of cancer. Decreasing heterogeneity-related noise in statistical analyses will increase our ability to find meaningful genetic and environmental determinants of bladder cancer and related cancers.

Classical methods for assessing familial coaggregation determine the relative risk of cancer in first-degree (FDR), second-degree (SDR), and third-degree (TDR) relatives of individuals with the cancer of interest (pivot individual). However, these methods use an iterative pairwise approach (pivot cancer and relative cancer) and are unable to simultaneously exploit data from multiple cancer types. As a result, power to identify novel patterns of multicancer configurations is limited. Other domains, such as marketing, have developed innovative computational techniques to discover patterns across expansive amounts of data in large databases and identify homogenous subpopulations of people. Application of such “big data” techniques to linked genealogical and cancer databases can potentially identify new multicancer configurations and improve understanding of familial cancer risk, tumor spectrum, and phenotypic heterogeneity.

This study takes advantage of a unique population-level data resource, the Utah Population Database (UPDB), containing vast genealogy and statewide cancer data. This study proposes an innovative method for identifying novel familial multicancer configurations for bladder cancer through application of a network-inspired approach to complex family and cancer data.

Study design and data

We utilized the genealogical, demographic, and health data from the UPDB. The vast majority of individuals residing in Utah are represented in UPDB (14–17). This immense genealogical dataset is record-linked to many statewide datasets (including the Utah Cancer Registry), with annual updates. The full dataset contains nearly 5 million people with 28 million records and the infrastructure that links distinct records for a specific person allows the UPDB to create a depiction of the life history of an individual based on medical and administrative data. The UPDB supports hundreds of biodemographic, epidemiologic, and genetic studies primarily due to its comprehensive population coverage, pedigree complexity, and linkages across data sources (18, 19).

Cancer-specific data for individuals with urothelial bladder cancer and their relatives were obtained from the Utah Cancer Registry (UCR), an original member of the Surveillance Epidemiology and End Results (SEER) program. We identified 6,752 individuals born 1900–1990 with urothelial bladder cancer (2) and family history information (defined as at least 10 known relatives) as pivots. Families were only represented once in the analysis. When two or more siblings had bladder cancer (104 sibsets), one sibling was randomly selected as the pivot. Our sample included 6,416 three-generation families that included all first-, second-, and third-degree relatives that were available in the UPDB.

Statistical analysis

Familial bladder cancer

Our primary interest is familial bladder cancer. For each family, familial risk for bladder cancer (Supplementary Table S1) was measured using standardized incidence risk (SIR) ratios accounting for the sex, age, birth cohort, and person-years of the family members (for a detailed description of SIR calculations, see Supplementary Methods). Person-years were calculated using the minimum of the first year residing in Utah or 1966 to the year of first cancer diagnosis, last year of residence in Utah (due to death or migration), or 2017. Further analyses were restricted to families with a statistically significant bladder cancer SIR and more than one case of bladder cancer in the family. This resulted in a final sample size of 1,023 familial bladder cancer families for the multicancer configuration analysis.
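As a rough illustration of the SIR described above, the following sketch divides observed cases by the number expected from stratum-specific person-years and reference rates, and attaches an exact one-sided Poisson P value. The function names, strata, rates, and counts are hypothetical and chosen only for illustration; the calculation actually used in the study is described in its Supplementary Methods.

```python
import math

def expected_cases(person_years, reference_rates):
    """Sum person-years x population rate over strata (e.g., sex/age/birth cohort)."""
    return sum(py * reference_rates[stratum] for stratum, py in person_years.items())

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected): one-sided test for excess risk."""
    below = sum(math.exp(-expected) * expected**k / math.factorial(k)
                for k in range(observed))
    return 1.0 - below

def sir(observed, person_years, reference_rates):
    """Return (SIR, one-sided P value) for one family and one cancer type."""
    exp_cases = expected_cases(person_years, reference_rates)
    return observed / exp_cases, poisson_upper_tail(observed, exp_cases)

# Hypothetical family: strata keyed by (sex, age band); rates are cases per person-year.
rates = {("M", "60-69"): 0.0010, ("M", "70-79"): 0.0020, ("F", "70-79"): 0.0005}
pyears = {("M", "60-69"): 200.0, ("M", "70-79"): 100.0, ("F", "70-79"): 200.0}
ratio, p_value = sir(3, pyears, rates)  # 3 observed cases vs. 0.5 expected
```

In this toy family, three observed cases against 0.5 expected give an SIR of about 6, which is the kind of family that would pass a significance filter like the one described above.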

Familial multicancer enrichment

As for bladder cancer above, familial risk for 25 additional cancers was measured using the SIR, accounting for the sex, age, birth cohort, and person-years of the family members (Supplementary Table S1). Two risk metrics were used to capture the family's multicancer signature. The first, wSIR, is the SIR weighted by the P value, which incorporates both the magnitude and the significance of the familial risk. It was calculated using the following equation, which allows us to include, but down-weight, SIR values that were not significantly different from those of the overall population.

formula

where p is the P value, i is the family, and j is the cancer type.

To avoid bias due to large SIRs (especially for rare cancers), and increase the robustness of our results, we imposed a maximum value such that any wSIR values larger than the 90th percentile were set to the 90th percentile value across all families for the cancer type.

$$wSIR_{ij} = \min\left(wSIR_{ij},\ wSIR_{j}^{90}\right)$$

where 90 indicates the 90th percentile for cancer j.
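The two steps above (weighting the SIR by its P value, then capping extreme values) can be sketched as follows. The weighting function used here, SIR × (1 − p), is an assumption made only so the example runs; the study's own weighting formula is not reproduced in this text. The nearest-rank percentile, the function names, and the ten (SIR, P) pairs are likewise hypothetical.

```python
def weighted_sir(sir, p_value):
    # ASSUMPTION: the source formula is not available here; (1 - p) is one simple
    # weight that keeps non-significant SIRs but shrinks them toward zero.
    return sir * (1.0 - p_value)

def percentile_90(values):
    """Nearest-rank 90th percentile (integer arithmetic avoids rounding surprises)."""
    ordered = sorted(values)
    rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n)
    return ordered[rank - 1]

def cap_at_90th(wsir_by_family):
    """Set any wSIR above the cancer type's 90th percentile to that percentile."""
    cap = percentile_90(wsir_by_family)
    return [min(w, cap) for w in wsir_by_family]

# Hypothetical (SIR, P) pairs for ten families and a single cancer type.
pairs = [(1.0, 0.50), (2.0, 0.10), (0.5, 0.90), (3.0, 0.04), (1.2, 0.40),
         (0.8, 0.70), (1.5, 0.20), (2.5, 0.05), (40.0, 0.01), (1.1, 0.45)]
wsir = [weighted_sir(s, p) for s, p in pairs]
capped = cap_at_90th(wsir)
```

Here the family with an SIR of 40 (plausible for a rare cancer with very few expected cases) is pulled down to the same capped value as the next-highest family, so a single extreme ratio cannot dominate the distance calculation.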

Second, we considered the significance of a SIR as a dichotomous indicator of risk (ISIR). Families were considered to have “high risk” status (ISIR = 1) for a cancer type if the SIR was statistically greater (P < 0.05) than the age- and sex-adjusted rates for the Utah population and “population risk” (ISIR = 0) otherwise. As all families were at significantly increased risk of bladder cancer by design, we did not include an ISIR matrix for bladder cancer (i.e., we had 26 measures of wSIR and 25 measures of ISIR; total measures = 51).

Familial multicancer configurations

We constructed a 1,023 × 51 matrix [families × (wSIR, ISIR)] and calculated the Gower general coefficient (daisy function in the cluster package in R; ref. 20) to measure similarities between multicancer configurations of families (21, 22). The Gower distance was selected because our data had both continuous and categorical (indicator) risk metrics and it allows for mixed variable types to be used simultaneously (detailed information can be found in the Supplementary Data). We used partitioning around medoids (PAM, or K-medoids clustering, in R; ref. 21) to group families with similar multicancer risk signatures. K was selected by running a series of iterative models for k = 2 to k = 20 and using Silhouette plots to identify the point of diminishing improvement in average Silhouette width. The Silhouette plot for the final model, K = 5, is displayed in Supplementary Fig. S1.
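To make the clustering pipeline more concrete, here is a minimal pure-Python sketch of a Gower distance over mixed continuous and 0/1 variables, a K-medoids search, and the average silhouette width used to compare values of k. It is not the R code used in the study: the K-medoids routine is a simplified greedy swap search that starts from the first k rows rather than PAM's BUILD step, and the six toy "families" (two wSIR columns and one ISIR column) are invented.

```python
def gower_matrix(rows, is_binary, weights=None):
    """Pairwise Gower distances for rows mixing continuous and 0/1 variables."""
    n_vars = len(is_binary)
    weights = weights or [1.0] * n_vars
    # Column ranges scale continuous differences into [0, 1].
    ranges = [max(r[v] for r in rows) - min(r[v] for r in rows) for v in range(n_vars)]

    def distance(a, b):
        total = 0.0
        for v in range(n_vars):
            if is_binary[v]:
                d = 0.0 if a[v] == b[v] else 1.0
            else:
                d = abs(a[v] - b[v]) / ranges[v] if ranges[v] > 0 else 0.0
            total += weights[v] * d
        return total / sum(weights)

    return [[distance(a, b) for b in rows] for a in rows]

def cost(dist, medoids):
    """Total distance from every row to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)

def k_medoids(dist, k):
    """Greedy swap search (simplified; not PAM's full BUILD + SWAP)."""
    n = len(dist)
    medoids = list(range(k))
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:slot] + [cand] + medoids[slot + 1:]
                if cost(dist, trial) < cost(dist, medoids):
                    medoids, improved = trial, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

def mean_silhouette(dist, labels):
    """Average silhouette width; singleton clusters score 0."""
    n = len(dist)
    scores = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

# Hypothetical families: [wSIR cancer A, wSIR cancer B, ISIR cancer A].
rows = [[1.0, 0.2, 0], [1.2, 0.1, 0], [0.9, 0.3, 0],
        [4.0, 2.0, 1], [4.2, 2.2, 1], [3.8, 1.9, 1]]
dist = gower_matrix(rows, is_binary=[False, False, True])
scores = {k: mean_silhouette(dist, k_medoids(dist, k)[1]) for k in (2, 3)}
best_k = max(scores, key=scores.get)
medoids, labels = k_medoids(dist, best_k)
```

On this toy input the silhouette comparison favors two clusters, separating the three low-risk rows from the three high-risk rows. With real data, a range of k would be scanned in the same way and the choice made from where the average silhouette width stops improving.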

Bootstrapping was used to evaluate the reproducibility of the clustering using the clustboot function in R. The PAM algorithm was run on 200 random draws, and the results from each draw were stored in a results matrix, transformed into a consensus matrix using the Ward linkage algorithm and the consensusmatrix function in R, and then plotted in a heatmap for visualization. The results for k = 5 were fairly stable, with some switching between clusters 3 and 5 (Supplementary Fig. S2).
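One common way to summarize bootstrap clustering results is a consensus matrix: for each pair of families, the fraction of draws containing both in which they landed in the same cluster. The sketch below assumes that definition; it is not the R function named above, and the three draws and their cluster labels are invented for illustration.

```python
def consensus_matrix(draws, n_items):
    """Fraction of draws in which each pair of items shared a cluster.

    draws: one dict per bootstrap sample mapping item index -> cluster label.
    Pairs that were never sampled together get None.
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in draws:
        for i in labels:
            for j in labels:
                sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else None
         for j in range(n_items)]
        for i in range(n_items)
    ]

# Hypothetical output of three bootstrap draws over four families (indices 0-3).
# Cluster labels are only meaningful within a draw, so they differ between draws.
draws = [
    {0: "a", 1: "a", 2: "b"},
    {0: "x", 1: "x", 3: "y"},
    {1: "p", 2: "p", 3: "q"},
]
consensus = consensus_matrix(draws, n_items=4)
```

Values near 1 or 0 indicate pairs that are consistently grouped or consistently separated; intermediate values, such as the 0.5 for families 1 and 2 here, correspond to the kind of switching between clusters noted above.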

Assessment of families by multicancer configuration

Fixed-effect meta-analysis using the inverse variance method for pooling was used to estimate the cluster-specific differences in SIRs and their 95% CIs using the metagen function in the R package meta. Known profiles of risk were used to test the familial multicancer configurations (FMC) for risk in smoking-related cancers (lung, mouth, lips, nose and sinuses, larynx, pharynx, esophagus, stomach, pancreas, kidney, uterus, cervix, colon/rectum, ovary, and acute myeloid leukemia), Lynch syndrome cancers (small intestine, colon, pancreas, uterus, and kidney/renal pelvis), and arsenic-related cancers (lung, prostate, and kidney; non-melanoma skin was not included because it is not reported to cancer registries).
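The inverse variance pooling mentioned above can be illustrated with a short sketch. It pools family-level SIRs on the log scale, weighting each by the inverse of its variance. The standard error of log(SIR) is approximated here as 1/sqrt(observed cases), which is an assumption of this example rather than a detail taken from the study, and the three SIRs and case counts are invented.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def pooled_sir(sirs, observed_counts, z=1.96):
    """Pool SIRs on the log scale; return (estimate, lower, upper) of a 95% CI."""
    log_sirs = [math.log(s) for s in sirs]
    # ASSUMPTION: SE of log(SIR) approximated by 1 / sqrt(observed cases).
    std_errors = [1.0 / math.sqrt(o) for o in observed_counts]
    pooled, se = pool_fixed_effect(log_sirs, std_errors)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)

# Hypothetical SIRs and observed case counts for three families in one cluster.
estimate, lower, upper = pooled_sir([2.0, 1.5, 1.0], [4, 9, 16])
```

Because families with more observed cases have smaller variances, they dominate the pooled estimate. In this toy example the family with 16 cases and an SIR of 1.0 pulls the cluster-level SIR to roughly 1.25, with a confidence interval that spans 1.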

Familial bladder cancer families were defined as those with a significantly increased risk of bladder cancer relative to the general population and more than one bladder cancer case in the family. We identified 1,023 familial bladder cancer families, each centered around a bladder cancer pivot individual. These 1,023 familial bladder cancer pivots had 59,177 relatives and 829 spouses, with the number of family members ranging from 16 to 628 (Table 1). Median age at diagnosis for all patients with familial bladder cancer was 72.3 years (range, 15–98 years). Median age at diagnosis of familial bladder cancer pivots was slightly earlier (71.19 years; P < 0.001) and ranged from 27 to 96 years. In the overall sample, 19.2% of the bladder cancer cases were female. When stratified by familial bladder cancer status, a higher proportion of familial bladder cancer pivots were female relative to non-familial bladder cancer cases (20.6% vs. 18.2%; P = 0.03).

Table 1.

Number of family members and cancer diagnoses by FMC.

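| | FMC 1 (n = 234) Min | FMC 1 Max | FMC 1 Avg | FMC 2 (n = 160) Min | FMC 2 Max | FMC 2 Avg | FMC 3 (n = 289) Min | FMC 3 Max | FMC 3 Avg | FMC 4 (n = 227) Min | FMC 4 Max | FMC 4 Avg | FMC 5 (n = 37) Min | FMC 5 Max | FMC 5 Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of family members | 16 | 108 | 38.50 | 17 | 285 | 68.70 | 19 | 628 | 92.00 | 16 | 310 | 57.60 | 20 | 193 | 83.50 |
| Acute lymphocytic leukemia | | | 0.05 | | | 0.08 | | | 0.18 | | | 0.15 | | | 0.08 |
| Acute myeloid leukemia | | | 0.07 | | | 0.12 | | | 0.12 | | | 0.07 | | | 0.08 |
| Bladder | | | 2.31 | | | 2.24 | | | 2.20 | | | 2.29 | | | 2.47 |
| Brain | | | 0.05 | | | 0.19 | | | 0.23 | | | 0.10 | | | 0.11 |
| Breast | | | 0.58 | | | 1.16 | | | 1.45 | | | 0.87 | | | 1.63 |
| Central nervous system | | | 0.10 | | | 0.11 | | | 0.17 | | | 0.07 | | | 0.08 |
| Cervical | | | 0.17 | | | 0.32 | | | 0.35 | | | 0.17 | | | 0.45 |
| Colon | | | 0.40 | | | 0.69 | | | 1.05 | | | 0.67 | | | 0.97 |
| Esophagus | | | 0.04 | | | 0.05 | | | 0.03 | | | 0.05 | | | 0.13 |
| Hodgkin—nodal | | | 0.07 | | | 0.03 | | | 0.07 | | | 0.05 | | | 0.18 |
| Kidney | | | 0.13 | | | 0.19 | | | 0.27 | | | 0.19 | | | 0.39 |
| Larynx | | | 0.00 | | | 1.25 | | | 0.08 | | | 0.03 | | | 0.16 |
| Liver | | | 0.03 | | | 0.03 | | | 0.03 | | | 0.05 | | | 0.03 |
| Lung and bronchus | | | 0.26 | | | 0.65 | | | 0.51 | | | 0.44 | | | 0.55 |
| Melanoma | | | 0.45 | | | 0.78 | | 13 | 1.04 | | | 0.64 | | | 1.37 |
| Myeloma | | | 0.05 | | | 0.14 | | | 0.09 | | | 0.08 | | | 0.13 |
| Non-Hodgkin lymphoma | | | 0.12 | | | 0.27 | | | 0.50 | | | 0.28 | | | 0.34 |
| Ovary | | | 0.10 | | | 0.12 | | | 0.21 | | | 0.08 | | | 0.29 |
| Pancreas | | | 0.08 | | | 0.18 | | | 0.23 | | | 0.14 | | | 0.34 |
| Prostate | | | 0.77 | | | 1.39 | | 10 | 2.01 | | 10 | 1.28 | | | 1.71 |
| Small intestine | | | 0.00 | | | 0.01 | | | 0.00 | | | 0.00 | | | 1.08 |
| Soft tissue | | | 0.02 | | | 0.08 | | | 0.06 | | | 0.04 | | | 0.21 |
| Stomach | | | 0.04 | | | 0.13 | | | 0.15 | | | 0.07 | | | 0.21 |
| Testis | | | 0.03 | | | 0.08 | | | 0.08 | | | 0.06 | | | 0.08 |
| Thyroid | | | 0.12 | | | 0.31 | | | 0.36 | | | 0.18 | | | 0.53 |
| Uterine | | | 0.15 | | | 0.30 | | | 0.36 | | | 0.16 | | | 0.42 |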

FMCs

Using the 1,023 familial bladder cancer families, we found five distinct FMCs (FMC1–5) using k-medoid clustering, each exhibiting a different pattern of cancer aggregation, or multiphenogram (Fig. 2). The proportions of bladder cancer families captured by the five familial bladder cancer FMCs were 25.3%, 16.7%, 30%, 24.2%, and 3.8%, respectively. Of the 25 cancer types studied, kidney and prostate were most commonly enriched in the familial bladder cancer multicancer configurations. Laryngeal, lung, stomach, acute lymphocytic leukemia, Hodgkin disease, soft-tissue carcinoma, small intestine, breast, uterine, thyroid, and melanoma were the other cancer types found to have increased incidence in familial bladder cancer families. The clustering algorithm accounted for both the magnitude of the SIR and whether there was a statistically significant increased risk for the cancer type (wSIR and ISIR). Similarly, both factors were considered when characterizing each FMC or FMC signature (Fig. 2). An FMC was defined as “strongly enriched” for a cancer type when risk was significantly increased in at least 10% of families within the FMC (Fig. 3) and the SIR for the cluster (combined across all families in the FMC) was statistically significant (Fig. 4; Supplementary Table S2).

Figure 2.

Familial multiphenograms for familial bladder cancer families. Five distinct familial multicancer clusters (FMC1–FMC5) were discovered and ranged in size from 37 families (FMC5) to 289 families (FMC3) that had similar patterns of familial cancer clustering. Multiphenograms are labeled with their cancer type and the percent of the families in the FMC with statistically significant increased risk for the respective cancer type.

Figure 3.

Proportion of families with significantly increased risk of cancer relative to the general population by FMC. Five distinct FMCs were discovered (FMC1–FMC5) and ranged in size from 37 families (FMC5) to 289 families (FMC3). Families in each FMC had a similar pattern of familial cancer clustering. While not all families were at significant increased risk for each cancer displayed, the magnitude of risk was similar within each FMC.

Figure 4.

SIR profile for each FMC. Displayed SIRs are based on fixed-effect meta-analysis using the inverse variance method for pooling that was used to estimate the cluster-specific differences in SIRs. SIRs significant at P < 0.05 are displayed in panels FMC1–FMC5.

All families had a minimum of two bladder cancer diagnoses by definition and a maximum of seven, but FMCs varied in the magnitude of their bladder cancer risk. The mean bladder cancer SIR across all 1,023 families was 12.12 (median of 9.52). There was significant heterogeneity in the average bladder cancer SIR across FMCs (Fig. 5; Supplementary Table S2), with average bladder cancer SIR ranging from 5.31 (FMC2) to 19.98 (FMC1).

Figure 5.

Bladder cancer SIR profile by familial multicancer cluster (FMC1–FMC5). All families are at statistically significantly increased risk of bladder cancer (BCa) by definition; however, there is variation in the magnitude of the effect. Displayed SIRs are based on fixed-effect meta-analysis using the inverse variance method for pooling that was used to estimate the cluster-specific differences in SIRs.

In addition to extremely high risk for bladder cancer, FMC1 was strongly enriched for both kidney and prostate cancer. The average SIRs for these cancers were 1.21 [95% confidence interval (CI), 1.10–1.34; 10% of families] and 1.73 (95% CI, 1.57–1.90; 10% of families), respectively (Supplementary Table S2). FMC2 was strongly enriched for larynx (SIR = 5.97; 95% CI, 5.22–6.83; 37% of families), stomach (SIR = 1.19; 95% CI, 1.06–1.34; 11% of families), and lung cancer (SIR = 1.66; 95% CI, 1.47–1.86; 11% of families). FMC3 did not have an increased risk for any cancer type other than bladder cancer. FMC4 was strongly enriched for prostate cancer and acute lymphocytic leukemia (ALL), with average SIRs of 1.72 (95% CI, 1.57–1.88; 11% of families), 1.28 (95% CI, 1.16–1.41; 11% of families), and 1.25 (95% CI, 1.13–1.37; 11% of families). FMC5 was strongly enriched for uterine, prostate, thyroid, kidney, small intestinal, and soft-tissue cancers, melanoma, and Hodgkin lymphoma. Small intestinal cancer had the highest risk, with nearly all families having a significantly increased risk of this rare cancer (SIR = 29.67; 95% CI, 21.84–40.29; 97% of families). The significance for small intestine cancers in almost all families is striking. SIRs for the other cancer sites in FMC5 ranged from 1.37 (Hodgkin disease; 16% of families) to 1.88 (melanoma; 11% of families).

Smoking and Lynch syndrome multicancer profiles

We calculated combined SIRs for groups of cancers related to smoking, Lynch syndrome, and arsenic exposure using fixed-effects meta-analysis and the inverse variance method. While all FMCs were at increased risk for arsenic-related cancers relative to the population controls, there was not a significant difference in risk across the FMCs. However, SIR estimates for smoking-related cancers and Lynch syndrome–related cancers varied by FMC (Fig. 6A and B; Supplementary Table S3). FMC2 had the highest smoking-related SIR (SIRsmoking = 1.44; 95% CI, 1.39–1.49), followed by FMC5 (SIRsmoking = 1.36; 95% CI, 1.26–1.47). Both of these are significantly higher than the lowest, FMC1 (SIRsmoking = 1.17; 95% CI, 1.14–1.21) and FMC4 (SIRsmoking = 1.19; 95% CI, 1.15–1.22); FMC3 was only slightly higher (SIRsmoking = 1.23; 95% CI, 1.20–1.27). Even after excluding small intestinal cancer, the combined SIRs for cancers related to Lynch syndrome also varied by FMC. The FMC5 cancer risk profile most resembled Lynch syndrome, and showed a nearly 2-fold increase in Lynch syndrome–related cancers (SIR = 1.96; 95% CI, 1.78–2.16). The risk in FMC5 remained the highest after accounting for small intestine cancers.

Figure 6.

SIR profile by familial multicancer cluster (FMC1 – FMC5). A, Estimated SIR for smoking-related cancers by FMC. B, Estimated SIR for Lynch syndrome cancers by FMC. C, Estimated SIR for arsenic-related cancers by FMC. See Supplementary Material for a list of cancers included in the smoking and Lynch syndrome designations.

Decreased risk for some cancer types

In addition to a significant increased cancer risk, some FMCs were characterized by having zero enrichment for certain cancer types. Specifically, no families in FMC1, FMC3, and FMC4 had increased risk for larynx or small intestinal cancers, which may suggest these FMCs are not related to smoking.

We have described a method for identifying FMCs and illustrated five distinct patterns of multicancer risk surrounding patients with familial bladder cancer. The five FMCs identified for familial bladder cancer illustrate the potential of our network-inspired approach to simultaneously assess multiple cancer risks, shifting focus away from a unidimensional definition of family history to a more comprehensive view of family history and risk identification. The pattern of multicancer clustering in FMC2 suggests that cancer risk in those families is driven by smoking and other related exposures, while the pattern in FMC5 is similar to patterns of cancer clustering in Lynch families. FMC1, FMC3, and FMC4 have patterns suggesting that shared genetics or environments other than smoking may play a strong role in cancer risk in these families. We did not find a single FMC pattern consistent with arsenic exposure.

There are limitations to our method of familial multicancer clustering discovery. First, this method did not factor in age at diagnosis or histopathologic information. Future versions of this method would be strengthened by considering those factors. Second, this method did not utilize all information in pedigree structure, such as weighting by kinship coefficient, and future studies should test methods that take advantage of that information. Despite these limitations, application of this approach could provide important insight into numerous cancers and other age-related chronic diseases. Future work will also need to investigate whether there are identifiable gene, environment, or gene–environment factors that explain the observed differences in FMCs, in the same way that BRCA1/2 mutations result in distinct multicancer clusters.

The concept of phenomes, extending the phenotype to include multiple disease types to characterize genotype–phenotype relationships, is not new. However, the phenome is usually referred to as the set of all phenotypes within an individual. Recent studies demonstrated the feasibility of “phenome-wide association scans” (PheWAS), using genetic data linked to medical records to identify multiple phenotypes associated with a single genotype (23). Pan-cancer analyses are another familiar concept. The Cancer Genome Atlas (TCGA) launched the Pan-cancer analysis project in 2012 with the goal of combining molecular data across large numbers of tumor types to compare -omics data across tumors, potentially allowing for the identification of thematic pathways (24). However, these are independent individuals. Here, we utilized family-based data to investigate familial patterns of pan-cancer clustering, a novel familial extension to pan-cancer studies.

Molecular diagnostics and cancer-specific subtypes are becoming an essential component of clinical decision making; however, current approaches are typically organ-specific or do not include context for understanding the interplay between genes and environment. The most recent bladder cancer–specific analysis of TCGA data found five distinct subtypes: (i) luminal-papillary, (ii) luminal-infiltrated, (iii) luminal, (iv) basal-squamous, and (v) neuronal (25). Other studies have shown these subtypes closely align to cancers in other organs. The PAM50 algorithm (originally developed to classify breast cancers) has also been used to classify prostate, bladder, and lung cancer (26–29), evidence that molecular subtypes may share common etiologies. Subtypes of bladder cancer have clinically meaningful differences (30); however, what predisposes an individual to a particular subtype remains unknown. Understanding the etiology of the disease has clinical potential because it allows for increased screening in at-risk populations and may guide treatment decisions at early stages of diagnosis. Future work to combine familial multicancer phenotypes, as we developed here, with tumor molecular data has potential for identification and characterization of bladder cancer subtypes that may share common etiologic and/or tumorigenic pathways.

Many genetic and environmental risk factors have been proposed for bladder cancer, which could manifest as different multicancer configurations across a spectrum of organs. FDRs likely share similar environments throughout the life course, and therefore familial aggregation of bladder cancer may not be entirely genetic. For example, the familial multicancer configuration pattern in FMC2 appears to be strongly related to smoking-related exposure. Genetic predispositions may also make individuals sensitive to environmental exposures. For example, arsenic metabolism may vary between individuals and the rate of metabolism affects risk for adverse health outcomes (31). Moreover, individuals with N-acetyltransferase 2 (NAT2) slow acetylator and glutathione S-transferase μ1 (GSTM1)-null genotypes may have increased risk for bladder cancer when exposed to carcinogens through smoking or occupational risk (32–34).

This study identified five familial bladder cancer FMCs. These vary by bladder cancer risk and by the age at diagnosis and sex of the pivot bladder cancer case. In addition, each FMC shows unique risk patterns for cancers of other organs. Leveraging genealogic data linked to cancer records is a powerful way to perform cross-phenotype, multicancer analyses.

S. Gupta reports receiving commercial research grants from Bristol-Myers Squibb, Rexahn, Pfizer, AstraZeneca, MedImmune, Clovis, Incyte, Novartis, LSK, Five Prime, Mirati, QED, Debiopharm, and Merck and has ownership interest (including patents) in Salarius. No potential conflicts of interest were disclosed by the other authors.

Conception and design: H.A. Hanson, C.L. Leiser, B. O'Neil, C. Martin, S. Gupta, C. Dechet, N.J. Camp

Development of methodology: H.A. Hanson, C.L. Leiser, N.J. Camp

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): H.A. Hanson, S. Gupta, K.R. Smith

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): H.A. Hanson, C.L. Leiser, B. O'Neil, C. Martin, S. Gupta, W.T. Lowrance, M.J. Madsen, N.J. Camp

Writing, review, and/or revision of the manuscript: H.A. Hanson, C.L. Leiser, B. O'Neil, C. Martin, S. Gupta, K.R. Smith, C. Dechet, W.T. Lowrance, N.J. Camp

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): H.A. Hanson, B. O'Neil, S. Gupta, K.R. Smith

Study supervision: H.A. Hanson, B. O'Neil, W.T. Lowrance

Research reported in this publication was supported by the NIH K12 Award, 1K12HD085852-01; NIH K07 Award, 1K07CA230150-01; and HCI Cancer Center Support Grant (grant number P30CA042014). The Utah Cancer Registry is funded by the National Cancer Institute's SEER Program, Contract No. HHSN261201800016I, and the U.S. Centers for Disease Control and Prevention's National Program of Cancer Registries, Cooperative Agreement No. NU58DP0063200, with additional support from the University of Utah and Huntsman Cancer Foundation.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Siegel R, Naishadham D, Jemal A. Cancer statistics, 2012. CA Cancer J Clin 2012;62:11.
2. Martin C, Leiser CL, O'Neil B, Gupta S, Lowrance WT, Kohlmann W, et al. Familial cancer clustering in urothelial cancer: a population-based case-control study. J Natl Cancer Inst 2018;110:527–33.
3. Sampson JN, Wheeler WA, Yeager M, Panagiotou O, Wang Z, Berndt SI, et al. Analysis of heritability and shared heritability based on genome-wide association studies for thirteen cancer types. J Natl Cancer Inst 2015;107:djv279.
4. Frank C, Fallah M, Sundquist A, Hemminki K. Population landscape of familial cancer. Sci Rep 2015;5:12891.
5. Teerlink CC, Albright FS, Lins L, Cannon-Albright LA. A comprehensive survey of cancer risks in extended families. Genet Med 2012;14:107–14.
6. Wu YH, Graff RE, Passarelli MN, Hoffman JD, Ziv E, Hoffmann TJ, et al. Identification of pleiotropic cancer susceptibility variants from genome-wide association studies reveals functional characteristics. Cancer Epidemiol Biomarkers Prev 2018;27:75–85.
7. Yu H, Hemminki O, Försti A, Sundquist K, Hemminki K. Familial urinary bladder cancer with other cancers. Eur Urol Oncol 2018;1:461.
8. Bermejo JL, Sundquist J, Hemminki K. Sex-specific familial risks of urinary bladder cancer and associated neoplasms in Sweden. Int J Cancer 2009;124:2166–71.
9. Hemminki K, Sundquist J, Brandt A. Do discordant cancers share familial susceptibility? Eur J Cancer 2012;48:1200–7.
10. Mérette C, King MC, Ott J. Heterogeneity analysis of breast cancer families by using age at onset as a covariate. Am J Hum Genet 1992;50:515–9.
11. Hall JM, Lee MK, Newman B, Morrow JE, Anderson LA, Huey B, et al. Linkage of early-onset familial breast cancer to chromosome 17q21. Science 1990;250:1684–9.
12. Easton DF, Bishop DT, Ford D, Crockford GP. Genetic linkage analysis in familial breast and ovarian cancer: results from 214 families. The Breast Cancer Linkage Consortium. Am J Hum Genet 1993;52:678–701.
13. Lorenzo Bermejo J, Hemminki K. Familial lung cancer and aggregation of smoking habits: a simulation of the effect of shared environmental factors on the familial risk of cancer. Cancer Epidemiol Biomarkers Prev 2005;14:1738–40.
14. Bean LL, May DL, Skolnick M. The Mormon historical demography project. Hist Methods 1978;11:45–53.
15. Bishop DT, Skolnick MH. Genetic epidemiology of cancer in Utah genealogies: a prelude to the molecular genetics of common cancers. J Cell Physiol Suppl 1984;3:63–77.
16. Skolnick M BL, Dintelman S, Mineau GP. A computerized family history database system. Sociol Social Res 1979;63:506–23.
17. O'Brien E, Rogers AR, Beesley J, Jorde LB. Genetic structure of the Utah Mormons: comparison of results based on RFLPs, blood groups, migration matrices, isonymy, and pedigrees. Hum Biol 1994;66:743–59.
18. DuVall SL, Fraser AM, Rowe K, Thomas A, Mineau GP. Evaluation of record linkage between a large healthcare provider and the Utah Population Database. J Am Med Inform Assoc 2012;19:e54–9.
19. Edelman LS, Guo JW, Fraser A, Beck SL. Linking clinical research data to population databases. Nurs Res 2013;62:438–44.
20. Gower JC. A general coefficient of similarity and some of its properties. Biometrics 1971;27:857–71.
21. Kaufman L, Rousseeuw PJ. Partitioning around medoids (Program PAM). In: Finding groups in data: an introduction to cluster analysis. Hoboken (NJ): John Wiley & Sons; 1990. p. 68–125.
22. Hummel M, Edelmann D, Kopp-Schneider A. Clustering of samples and variables with mixed-type data. PLoS One 2017;12:e0188274.
23. Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations. Bioinformatics 2010;26:1205–10.
24. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013;45:1113–20.
25. Robertson AG, Kim J, Al-Ahmadie H, Bellmunt J, Guo G, Cherniack AD, et al. Comprehensive molecular characterization of muscle-invasive bladder cancer. Cell 2017;171:540–56.
26. Zhao SG, Chang SL, Erho N, Yu M, Lehrer J, Alshalalfa M, et al. Associations of luminal and basal subtyping of prostate cancer with prognosis and response to androgen deprivation therapy. JAMA Oncol 2017;3:1663–72.
27. Damrauer JS, Hoadley KA, Chism DD, Fan C, Tiganelli CJ, Wobker SE, et al. Intrinsic subtypes of high-grade bladder cancer reflect the hallmarks of breast cancer biology. Proc Natl Acad Sci U S A 2014;111:3110–5.
28. Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 2009;27:1160–7.
29. Siegfried JM, Lin Y, Diergaarde B, Lin HM, Dacic S, Pennathur A, et al. Expression of PAM50 genes in lung cancer: evidence that interactions between hormone receptors and HER2/HER3 contribute to poor outcome. Neoplasia 2015;17:817–25.
30. Seiler R, et al. Muscle-invasive bladder cancer: molecular subtypes and response to neoadjuvant chemotherapy. J Clin Oncol 2017;35:281.
31. Smith AH, Steinmaus CM. Health effects of arsenic and chromium in drinking water: recent human findings. Annu Rev Public Health 2009;30:107–22.
32. Moore LE, Baris DR, Figueroa JD, Garcia-Closas M, Karagas MR, Schwenn MR, et al. GSTM1 null and NAT2 slow acetylation genotypes, smoking intensity and bladder cancer risk: results from the New England Bladder Cancer Study and NAT2 meta-analysis. Carcinogenesis 2011;32:182–9.
33. Krech E, Selinski S, Blaszkewicz M, Bürger H, Kadhum T, Hengstler JG, et al. Urinary bladder cancer risk factors in an area of former coal, iron, and steel industries in Germany. J Toxicol Environ Health A 2017;80:430–8.
34. Ma C, Gu L, Yang M, Zhang Z, Zeng S, Song R, et al. rs1495741 as a tag single nucleotide polymorphism of N-acetyltransferase 2 acetylator phenotype associates bladder cancer risk and interacts with smoking: A systematic review and meta-analysis. Medicine 2016;95:e4417.