Abstract
Germline DNA copy number variation (CNV) is a ubiquitous source of genetic variation and remains largely unexplored in association with epithelial ovarian cancer (EOC) risk.
CNV was quantified in the DNA of approximately 3,500 cases and controls genotyped with the Illumina 610k and HumanOmni2.5M arrays. We performed a genome-wide association study of common (>1%) CNV regions (CNVRs) with EOC and high-grade serous (HGSOC) risk and, using The Cancer Genome Atlas (TCGA), performed in silico analyses of tumor-gene expression.
Three CNVRs were associated (P < 0.01) with EOC risk: two large (∼100 kb) regions within the 610k set and one small (<5 kb) region with the higher resolution 2.5M data. Large CNVRs included a duplication at LILRA6 (OR = 2.57; P = 0.001) and a deletion at CYP2A7 (OR = 1.90; P = 0.007), the latter of which was strongly associated with HGSOC risk (OR = 3.02; P = 8.98 × 10−5). Somatic CYP2A7 alterations correlated with EGLN2 expression in tumors (P = 2.94 × 10−47). An intronic ERBB4/HER4 deletion was associated with reduced EOC risk (OR = 0.33; P = 9.5 × 10−3), and somatic deletions correlated with ERBB4 downregulation (P = 7.05 × 10−5). Five CNVRs were associated with HGSOC, including two reduced-risk deletions: one at 1p36.33 (OR = 0.28; P = 0.001) that correlated with lower CDK11A expression in TCGA tumors (P = 2.7 × 10−7), and another at 8p21.2 (OR = 0.52; P = 0.002) that was present somatically where it correlated with lower GNRH1 expression (P = 5.9 × 10−5).
Although CNV does not appear to contribute substantially to EOC susceptibility, a number of low-to-common frequency variants may influence the risk of EOC and tumor-gene expression.
Further research on CNV and EOC susceptibility is warranted, particularly with CNVs estimated from high-density arrays.
Introduction
Epithelial ovarian cancer (EOC) is the fifth most common cause of cancer-related death among women in North America (1). Because most women are diagnosed at advanced stages, better early detection and intervention are needed (1). Genome-wide association studies (GWAS) have identified thirty-nine common allelic variations associated with EOC susceptibility (2), but these variants explain only a modest fraction of heritability (3), so more such loci likely exist. Exploration of other sources of DNA variation is warranted.
High-throughput genome technologies have revealed that the human genome contains substantial structural variation. An estimated 10%–13% of DNA content can be spanned by copy number variation (CNV), segments of DNA one kilobase or larger in length, that differ from a reference genome (4, 5). Germline CNV can be inherited or occur de novo (6) and predispose to an array of complex diseases including familial and sporadic types of cancer (7). The contribution of CNV to EOC risk remains largely unexplored.
Our group has previously evaluated whether inherited CNVs were associated with overall survival among 1,056 women with EOC; no associations achieved statistical significance after adjustment for multiple comparisons (8). Almost all studies evaluating CNV and EOC risk have been conducted among BRCA1 carriers or women with hereditary breast-ovarian cancer syndrome (9–11). The largest study to-date included 357 EOC cases and 1,962 nonovarian cancer-affected BRCA1 carriers from The Consortium of Investigators of Modifiers of BRCA1/BRCA2 (CIMBA), where a validated deletion in CYP2A7 was associated with decreased EOC risk (10). An analysis of The Cancer Genome Atlas (TCGA) compared the germline-somatic landscape in exomes of 429 high-grade serous (HGSOC) cases to 557 controls (12). However, copy number analysis was limited to BRCA1, BRCA2, and TP53 (12). This study represents the first large-scale genome-wide analysis of germline CNV evaluating associations with EOC risk among unselected women from the general population.
Methods
Study population
Our GWAS of EOC utilized two genotyping platforms. Four case-control studies from Mayo Clinic (Rochester, MN), Duke University (Durham, NC), University of Toronto (Toronto, Canada), and Moffitt Cancer Center (Tampa, FL) used the Illumina 610-quad Beadchip Array (“610k”). An independent sample of cases and controls from Mayo Clinic was genotyped on the Illumina HumanOmni2.5M-8 Beadchip (“2.5M”). Both GWAS sets included patients with incident, pathologically confirmed primary EOC, either borderline or invasive, aged 20 or above. DNA samples from women having less than 80% European ancestry were excluded (13). Full study details have been previously published (14, 15).
CNV calling and quality control
CNV segmentation was performed with PennCNV software (16) on probe-level B allele frequency (BAF) and log2 R ratio (LRR) for autosomal SNPs mapped to GRCh37 (hg19), with adjustment for local GC content responsible for signal fluctuations (17). Segments spanning at least three probes with confidence scores >10 were retained. To reduce possible batch effects or poor quality intensity data, we excluded samples that were outliers [>median + 1.5 interquartile range (IQR)] for LRR SD, BAF drift, GC wave factor, or number of CNV calls. In total, 856 (23%) of the 610k array samples and 219 (22%) of the 2.5M array samples were excluded. The dataset used for this analysis will be made available through dbGaP under study accession phs001133.v1.p1.
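The sample-level exclusion rule above can be sketched in a few lines of Python. This is only an illustration of the "median + 1.5 IQR" fence: the function names, the metric keys, the quartile method, and all values are assumptions for the example, not the authors' pipeline.

```python
from statistics import median, quantiles

def upper_fence(values):
    """Return median + 1.5 * IQR for a list of per-sample QC values."""
    # Quartile method is an assumption; the paper does not specify one.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values) + 1.5 * (q3 - q1)

def flag_outlier_samples(qc_table):
    """Return IDs of samples exceeding the fence on any QC metric.

    qc_table maps a sample ID to a dict of metric name -> value.
    """
    metrics = list(next(iter(qc_table.values())).keys())
    fences = {m: upper_fence([row[m] for row in qc_table.values()]) for m in metrics}
    return {
        sample_id
        for sample_id, row in qc_table.items()
        if any(row[m] > fences[m] for m in metrics)
    }

# Hypothetical QC values, for illustration only.
qc = {
    "S1": {"lrr_sd": 0.18, "n_cnv": 20},
    "S2": {"lrr_sd": 0.20, "n_cnv": 25},
    "S3": {"lrr_sd": 0.19, "n_cnv": 22},
    "S4": {"lrr_sd": 0.21, "n_cnv": 24},
    "S5": {"lrr_sd": 0.45, "n_cnv": 23},   # noisy intensity data
    "S6": {"lrr_sd": 0.20, "n_cnv": 140},  # excessive number of calls
}
excluded = flag_outlier_samples(qc)
```

In this toy table, samples S5 and S6 exceed a fence on at least one metric and would be dropped before CNVR construction.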
Common CNV regions and association testing
CNV regions (CNVRs) were defined using the CNVruler tool (18), which constructs CNVRs by merging CNV segments that overlap by at least 1 bp and trims any rare, long CNV. For CNVRs that occurred with >1% frequency among all samples in a set, logistic regression was used to compare CNVR status (deletion/no deletion; duplication/no duplication) between cases and controls. To adjust for population stratification, eigenvectors were calculated from a matrix of CNVR status and the first principal component was included as a covariate in the regression (Supplementary Fig. S1). Site, age, and experimental batch are known sources of bias (19), but none affected CNVR estimates and these were not retained in the risk model. As a sensitivity analysis for the CNVR merge method, we also employed ParseCNV (20), which performs SNP-level association testing and merges significant SNPs into risk-associated CNVRs. CNV mapping studies suggest that deletions are poorly tolerated and under negative selection, whereas duplications are less likely to be pathogenic and are often under positive selection, which drives evolution of many gene families (21). Thus, for both analytic approaches, deletions and duplications were analyzed separately. Risk associations are reported for CNVRs that reached the P < 0.01 significance threshold. We excluded T-cell receptor (TCR) and immunoglobulin heavy (IGH) chain genomic regions from analyses because these undergo V-(D)-J recombination in lymphocytes and can result in detection of somatic CNVs rather than inherited, germline CNV, which is the focus of this study (22, 23). These regions included TCR alpha and delta on chromosome 14 (chr14:22090057-23021075 and chr14:22891537-22935569, respectively), beta and gamma on chromosome 7 (chr7:141998851-142510972 and chr7:38279625-38407656, respectively), and IGH regions on chromosomes 14 and 16 (chr14:106032614-107288051 and chr16:33740716-33741266).
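The merge step described above (segments joined when they overlap by at least 1 bp, then filtered by carrier frequency) can be illustrated with a simple interval merge. This is a simplified sketch, not CNVruler itself: it omits the trimming of rare, long CNV, the function names are hypothetical, and the segment breakpoints and sample IDs are made up.

```python
def merge_into_cnvrs(segments):
    """Merge (chrom, start, end, sample_id) segments that share at least 1 bp.

    Coordinates are treated as 1-based and inclusive. Segments of one CNV
    type (deletions or duplications) are expected, since the two types
    were analyzed separately.
    """
    cnvrs = []
    for chrom, start, end, sample in sorted(segments):
        last = cnvrs[-1] if cnvrs else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            # Overlaps the current region: extend it and record the carrier.
            last["end"] = max(last["end"], end)
            last["carriers"].add(sample)
        else:
            cnvrs.append({"chrom": chrom, "start": start, "end": end, "carriers": {sample}})
    return cnvrs

def common_cnvrs(cnvrs, n_samples, min_freq=0.01):
    """Keep regions carried by at least min_freq of samples (1% by default)."""
    return [r for r in cnvrs if len(r["carriers"]) / n_samples >= min_freq]

# Hypothetical deletion calls; breakpoints are illustrative only.
deletions = [
    ("chr19", 41341589, 41380000, "A"),
    ("chr19", 41375000, 41433931, "B"),
    ("chr19", 54731679, 54845802, "C"),
    ("chr2", 213187034, 213191389, "A"),
]
regions = merge_into_cnvrs(deletions)
frequent = common_cnvrs(regions, n_samples=3, min_freq=0.5)
```

Here the two overlapping chr19 segments collapse into one region with two carriers, while the other two calls remain separate single-carrier regions.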
Integration of CNV and tumor transcriptome using TCGA
To explore the correlation of copy number and gene expression, we obtained copy number segments, gene-level Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values from RNA sequencing, and CpG island methylation data for 571 HGSOC cases from TCGA (24) that had germline CNV quantified from either blood or normal tissue samples. CNV segments were estimated from SNP 6.0 array using circular binary segmentation and were limited to those with a minimum of three probes and <10 Mb in size. Samples with a high number of calls (>median + 1.5 IQRs) were excluded. We defined deletions as segments with a segment mean (log2 copy number ratio) ≤ −0.3 and amplifications as those > 0.3. We employed multivariate linear regression to model both CNV and somatic copy number alterations (SCNA) in the cancer genome (diploid, deletion, duplication) on mRNA expression level (log2 transformed) in tumor tissue. P values were calculated with the likelihood ratio statistic. For each gene, the effect of CpG methylation was regressed out, as described previously (25). Statistical analyses were performed in R (www.r-project.org).
Results
Table 1 summarizes the clinical characteristics and copy number distribution of 2,818 subjects (1,368 cases) genotyped with the 610k array and 792 subjects (449 cases) genotyped with the 2.5M array after applying quality control (QC) exclusions. Cases were slightly older than controls on average and the majority had serous tumors. Most (75%) of the CNV calls in the 610k array data were deletions, whereas the majority (59%) in the 2.5M set were duplications. The average number and length of deletions and duplications were similar between cases and controls (Table 1). The distribution of CNV in the 610k set was largely similar across the five main histotypes of EOC and by stage at diagnosis, although advanced stage endometrioid cases (n = 50) averaged a significantly higher number of deletions (P = 0.0003; Supplementary Table S1). Although 2.5M sample sizes were too small to investigate stage of disease, low-grade serous cases (n = 43) did average a significantly lower number of duplications (P = 0.002; Supplementary Table S2).
Characteristic | 610k array (N = 2,818): Case | 610k array: Control | 610k array: Pa | 2.5M array (N = 792): Case | 2.5M array: Control | 2.5M array: Pa |
---|---|---|---|---|---|---|
N | 1,368 | 1,450 | | 449 | 343 | |
Age | | | | | | |
Mean (range) | 60.1 (26–91) | 57.7 (20–90) | <0.001 | 62.2 (20–88) | 58.2 (22–93) | <0.001 |
Study site | | | | | | |
Minnesota region (Mayo) | 325 | 467 | | 449 | 343 | |
North Carolina (Duke) | 406 | 553 | | 0 | 0 | |
Tampa Bay (Moffitt) | 152 | 139 | | 0 | 0 | |
Toronto (U of Toronto) | 485 | 291 | | 0 | 0 | |
Histotype (%) | | | | | | |
Serous | 825 (60%) | | | 346 (77%) | | |
Serous, high grade | 410 (30%) | | | 303 (67%) | | |
Serous, low grade/LMP | 121 (9%) | | | 43 (10%) | | |
Serous, grade unknown | 294 (21%) | | | 0 | | |
Endometrioid | 241 (18%) | | | 30 (7%) | | |
Mucinous | 63 (5%) | | | 28 (6%) | | |
Clear cell | 112 (8%) | | | 15 (3%) | | |
Mixed cell | 38 (3%) | | | 25 (6%) | | |
Other | 89 (7%) | | | 5 (1%) | | |
CNV segments (No.) | | | | | | |
All | 30,877 | 33,626 | | 28,578 | 21,929 | |
Deletions | 23,532 (76%) | 25,028 (74%) | | 11,947 (42%) | 8,821 (40%) | |
Duplications | 7,345 (24%) | 8,598 (26%) | | 16,631 (58%) | 13,108 (60%) | |
CNV segments (Mean) | | | | | | |
All | 22.6 | 23.2 | 0.93 | 63.7 | 63.9 | 0.63 |
Deletions | 17.2 | 17.3 | 0.56 | 26.6 | 25.7 | 0.06 |
Duplications | 5.4 | 5.9 | 1.00 | 37.0 | 38.2 | 0.92 |
Average CNV length (kb) | | | | | | |
All | 77 | 80 | 0.99 | 48 | 49 | 0.93 |
Deletions | 55 | 54 | 0.34 | 25 | 26 | 0.73 |
Duplications | 144 | 146 | 0.64 | 65 | 66 | 0.77 |
aMean age was compared between cases and controls with the Student t test. Empirical P values are reported for difference in CNV average number of segments and difference in average CNV length based on 10,000 permutations.
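The empirical P values in the footnote above come from a permutation test. A minimal sketch of that idea follows, assuming a two-sided test on the difference in group means with case/control labels shuffled across subjects; the sidedness, the function name, the seed, and the toy counts are all assumptions for illustration, not details given in the source.

```python
import random

def permutation_p_value(cases, controls, n_perm=10_000, seed=0):
    """Empirical two-sided P value for a difference in group means.

    Case/control labels are shuffled n_perm times; the P value is the
    share of permutations with an absolute mean difference at least as
    large as the observed one (with a +1 correction).
    """
    rng = random.Random(seed)
    n_cases = len(cases)
    n_controls = len(controls)
    observed = abs(sum(cases) / n_cases - sum(controls) / n_controls)
    pooled = list(cases) + list(controls)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_cases]) / n_cases - sum(pooled[n_cases:]) / n_controls)
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical per-subject CNV counts, for illustration only.
p_separated = permutation_p_value(
    [30, 31, 29, 32, 33, 30, 31, 34],
    [10, 11, 9, 12, 10, 11, 13, 10],
)
p_identical = permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4], n_perm=200)
```

With clearly separated groups the empirical P value is small, whereas two identical groups give a P value of 1 because every permutation is at least as extreme as an observed difference of zero.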
CNVRs were constructed by merging overlapping CNV calls across individuals in a study set, separately for deletions and duplications. In the 610k set, there were 7,384 CNVRs that included 348 regions occurring in ≥1% of subjects (denoted as common CNVRs henceforth), 3,105 rare regions (<1%), and 3,931 regions detected in a single individual (singletons). In the 2.5M set, CNV calls merged into 3,732 CNVRs: 624 common regions, 972 rare regions, and 2,136 singletons. Notably, the majority (>80%) of CNV was rare (<1%) or singletons. Rare CNV calls trended higher for cases than controls in both array sets but the results were not statistically significant (Supplementary Table S3).
CNVR size distributions differed between array sets, likely reflecting differences in probe density (Fig. 1). CNVRs in the 2.5M set spanned shorter genomic regions (median = 23 kb) compared with the 610k set (median = 246 kb) where regions >200 kb in size comprised the majority (70%) of CNVRs. Most of the common CNVRs in the 610k set (N = 271, 78%) were detected at least once within the higher resolution 2.5M array set and, conversely, 383 (61%) in the 2.5M set were detected at least once in the 610k set. We limited CNVRs to those detected within both array sets and excluded genomic regions that are somatically deleted in lymphocytes and are not likely to be inherited, germline CNV (see Methods). In total, 189 deletion and 74 duplication CNVRs in the 610k set and 252 deletion and 125 duplication CNVRs in the 2.5M set were analyzed for association with EOC risk.
Common CNV regions and EOC risk
CNVR associations with EOC risk overall and with HGSOC are shown in Fig. 2. Differences in copy number between all EOC cases and controls were detected (P < 0.01) at three common CNVRs (Table 2). Two CNVRs spanning approximately 100 kb each were associated with EOC risk within the 610k population, and a third, substantially smaller CNVR (<5 kb in length) was associated with risk in the 2.5M analysis. HGSOC-specific analysis showed that all three EOC risk–associated CNVRs were also associated with HGSOC (Table 3). An additional four CNVRs in the 610k analysis and one CNVR in the 2.5M analysis were associated with HGSOC risk (P < 0.01) but were not identified in the overall EOC analysis. All CNVRs were detected in both array sets, although frequencies varied between sets and comparable regions were substantially smaller in the 2.5M set (Supplementary Table S4). In addition, risk-associated CNVRs were compared with the Database of Genomic Variants and shown to overlap gold standard copy number variants (Supplementary Table S5).
Array (N cases) | Locus | CNV type | Gene | Merged CNVRa: CNVR (kb) | Merged CNVRa: Case/control, n (%) | Merged CNVRa: P | Merged CNVRa: OR (95% CI) | SNP-level CNVRb: Tag SNP | SNP-level CNVRb: Total SNPs | SNP-level CNVRb: P | SNP-level CNVRb: OR (95% CI) |
---|---|---|---|---|---|---|---|---|---|---|---|
610k (n = 1,368) | 19q13.2 | Del | CYP2A7 | chr19:41341589-41433931 (92) | 49 (4)/29 (2) | 0.007 | 1.90 (1.19–3.03) | rs2545754 | 9 | 0.006 | 2.03 (1.22–3.36) |
610k (n = 1,368) | 19q13.42 | Dup | LILRA6 | chr19:54731679-54845802 (114) | 39 (3)/17 (1) | 0.001 | 2.57 (1.44–4.57) | rs11672654 | 11 | 0.01 | 5.87 (1.53–22.57) |
2.5M (n = 449) | 2q34 | Del | ERBB4 | chr2:213187034-213191389 (4) | 8 (2)/18 (5) | 0.0095 | 0.33 (0.14–0.76) | kgp5655115 | 12 | 0.008 | 0.33 (0.15–0.75) |
Abbreviation: CI, confidence interval.
aCNVR was defined by overlapping CNV segments across subjects. Coordinates are mapped to human genome build 37 (hg19). P values are from logistic regression adjusted for the first principal component.
bCNVR was defined as region with significant SNP-level statistics, and P value reported is from Fisher exact test comparing deletion/no deletion or duplication/no duplication. All epithelial OC versus controls reported.
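As a rough check on the carrier counts in Table 2, a crude (unadjusted) odds ratio with a Wald 95% confidence interval can be computed directly from a 2 × 2 table. The sketch below is not the authors' analysis: the reported merged-CNVR estimates come from logistic regression adjusted for the first principal component, so the crude value need not match them exactly. The function name is hypothetical.

```python
import math

def crude_odds_ratio(case_carriers, n_cases, control_carriers, n_controls, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from carrier counts."""
    a, b = case_carriers, n_cases - case_carriers           # cases: carriers, noncarriers
    c, d = control_carriers, n_controls - control_carriers  # controls: carriers, noncarriers
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

# CYP2A7 deletion in the 610k set: 49 of 1,368 cases and 29 of 1,450 controls.
or_cyp2a7, ci_low, ci_high = crude_odds_ratio(49, 1368, 29, 1450)
```

For the CYP2A7 deletion the crude estimate is roughly 1.8, in the same direction as and close to the adjusted OR of 1.90 reported in the table.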
Array (N cases) | Locus | CNV type | Gene(s) | Merged CNVR: CNVR (kb)a | Merged CNVR: Case/control, n (%) | Merged CNVR: P | Merged CNVR: OR (95% CI) | SNP-level CNVR: Tag SNP | SNP-level CNVR: Total SNPs | SNP-level CNVR: P | SNP-level CNVR: OR (95% CI) |
---|---|---|---|---|---|---|---|---|---|---|---|
610k (n = 410) | 1p36.33 | Del | ACAP3, CPSF3L, DVL1, GLTPD1, MXRA8, PUSL1, TAS1R3b | chr1:943468-1706160 (763) | 7 (2)/84 (6) | 0.001 | 0.28 (0.13–0.60) | rs3737717 | 13 | 0.001 | 0.31 (0.15–0.62) |
610k (n = 410) | 1p13.3 | Del | Intergenic | chr1:111370372-111391381 (21) | 23 (6)/40 (3) | 0.007 | 2.07 (1.21–3.51) | rs6677356 | 6 | 0.008 | 2.09 (1.21–3.60) |
610k (n = 410) | 8p21.2 | Del | DOCK5 | chr8:24931313-25101936 (171) | 27 (7)/168 (12) | 0.002 | 0.52 (0.34–0.79) | cnvi0001694 | 7 | 0.003 | 0.54 (0.36–0.81) |
610k (n = 410) | 12p11.21 | Dup | RP11-428G5.5 | chr12:31975730-32068877 (93) | 19 (5)/32 (2) | 0.0098 | 2.14 (1.20–3.82) | rs1259725 | 27 | 0.015 | 2.15 (1.16–3.98) |
610k (n = 410) | 19q13.2 | Del | CYP2A7 | chr19:41341589-41433931 (92) | 24 (6)/29 (2) | 8.98E-05 | 3.02 (1.74–5.25) | rs2545754 | 9 | 2.79E-05 | 3.54 (1.96–6.39) |
610k (n = 410) | 19q13.42 | Dup | LILRA6 | chr19:54731679-54845802 (114) | 15 (4)/17 (1) | 0.001 | 3.16 (1.56–6.39) | rs11672654 | 8 | 0.004 | 2.98 (1.42–6.27) |
2.5M (n = 303) | 2q34 | Del | ERBB4 | chr2:213187034-213191389 (4) | 3 (1)/18 (5) | 0.005 | 0.18 (0.05–0.60) | kgp5655115 | 12 | 0.003 | 0.18 (0.06–0.56) |
2.5M (n = 303) | 5p15.2 | Del | Intergenic | chr5:12812336-12888815 (76) | 55 (18)/36 (10) | 0.005 | 1.91 (1.22–3.01) | kgp22267001 | 3 | 0.009 | 1.85 (1.17–2.94) |
Abbreviation: CI, confidence interval.
aCNVR was defined by overlapping CNV segments across subjects. P values are from logistic regression adjusted for the first principal component.
bCNV segments centered on the reported genes, which fell within the SNP-level significance region.
Within the analysis of all invasive EOC, a duplication region at 19q13.42 within the human leukocyte receptor cluster was associated with increased risk of EOC in the 610k set (P = 0.001; OR = 2.57). Duplications within this CNVR spanned leukocyte immunoglobulin like receptor A6 (LILRA6) and occurred with low frequency (2%). The CNVR was more common in 2.5M subjects (28%) but frequency did not differ by disease status (P = 0.89; OR = 0.98). A second CNVR identified in the 610k set was a deletion at 19q13.2 spanning cytochrome P450 family 2 subfamily A member 7 (CYP2A7) and the downstream, intergenic region (P = 0.007; OR = 1.90). The deletion region was similarly common in the 2.5M population (6%) but not associated with EOC risk (P = 0.27; OR = 0.71). Finally, a small, 4 kb deletion CNVR within intron 1 of erb-b2 receptor tyrosine kinase 4 (ERBB4/HER4) at 2q34 was associated with reduced EOC risk (P = 0.0095; OR = 0.33) in the 2.5M data. CNV within the boundaries of this region were rare (<1%) in the 610k analysis and not associated with risk (P = 0.47; OR = 1.60).
The HGSOC risk analysis was limited to a subset of 410 cases in the 610k array set and 303 cases in the 2.5M array set. The increase in EOC risk for CYP2A7 deletion carriers in the 610k set was stronger for HGSOC-specific risk (P = 8.98 × 10−5; OR = 3.02), although it was not associated with HGSOC risk in the 2.5M analysis (P = 0.18). The CNVRs at LILRA6 and ERBB4 showed slightly stronger risk associations with HGSOC (OR = 3.16 and 0.18, respectively).
Two relatively large deletion regions were associated with lower risk of HGSOC: one spanning much of 1p36.33 (P = 0.001; OR = 0.28) and the other a 171 kb deletion at 8p21.2 (P = 0.002; OR = 0.52). The CNVR at 1p36.33 spanned an approximately 750 kb region, although most deletions were located within a smaller approximately 60 kb region where multiple genes reside including disheveled segment polarity protein 1 (DVL1). While large, the 8p21.2 deletion region solely contained the transcription start site (TSS) and sequence for dedicator of cytokinesis 5 (DOCK5) and no other coding features. Both deletions were also detected with the 2.5M array set with the 1p36.33 CNVR smaller and centered on ATPase family, AAA domain containing 3B (ATAD3B) rather than DVL1 and the 8p21.2 CNVR concurring with the TSS region of DOCK5. Both 2.5M CNVR associations were not statistically significant (1p36.33: P = 0.71, OR = 0.80; 8p21.2: P = 0.61, OR = 1.13).
Three other CNVRs were associated with increased risk of HGSOC. First, duplications of a 100 kb region at 12p11.21 that contained lincRNA RP11-428G5.5 occurred in twice as many HGSOC cases as controls (P = 9.8 × 10−3; OR = 2.14). This CNVR was also detected in 3% of 2.5M subjects but was not associated with risk (P = 0.36; OR = 0.62). A smaller 21 kb intergenic region at 1p13.3 revealed deletions among 6% of HGSOC cases and 3% of controls (P = 0.007; OR = 2.07). These deletions were 22 kb upstream to ubiquitously expressed cell-surface protein CD53 molecule. The same deletion region was detected with the 2.5M array but only present in one control and no HGSOC cases. Finally, a more common intergenic deletion CNVR at 5p15.2 was associated with increased HGSOC risk in the 2.5M set (18% in cases, 10% in controls, P = 0.005; OR = 1.91). Deletions within this region were detected in ten HGSOC cases and five controls in the 610k set (P = 0.30; OR = 1.76).
Our analysis of CNVRs was based on the assignment of a consensus boundary defined by merging individual segments. To determine whether this affected downstream association analyses, we conducted a sensitivity analysis that identified regions where SNP-level copy number was significantly associated with EOC risk rather than solely testing in predefined regions. All CNVRs associated with EOC risk in our primary analysis were also detected using the SNP-based CNVR approach, with notably higher risk estimates at the CNVR containing LILRA6 (Table 2), and more similar estimates for HGSOC-specific risk associations (Table 3). In addition, CNVR boundaries in our primary analysis were established by merging deletions and duplications separately. As an alternative approach, we merged segments irrespective of CNV type, which defined gain only, loss only, and mixed type regions. Consequently, CNVR boundaries were altered from our primary analysis; however, risk associations remained significant for all regions and were strengthened for the 8p21.2 (DOCK5) association with EOC risk (P = 0.004; OR = 0.69; Supplementary Tables S6 and S7). Analysis of mixed type CNV (deletion or duplication vs. diploid) did not identify any additional common CNVR associated with EOC risk.
Association of risk CNVRs with transcription levels in tumor tissue
With the multilevel data available from TCGA, we sought to determine whether germline CNV within the risk-associated CNVRs correlated with primary tumor mRNA expression levels. This required careful consideration of SCNA as they are the most prevalent alteration in the cancer genome and are known to influence oncogene activation and tumor suppressor gene inactivation in tumor tissues (26). Thus, within seven CNVRs, we quantified (i.e., deletion/diploid/duplication) CNV of the germline DNA and focal SCNA in the tumor and estimated their independent effects on cis-mRNA gene expression in 382 HGSOC cases from TCGA having both CNV and RNA-sequencing data. We excluded 5p15 as no mRNA sequences were within 500 kb of the CNVR.
CNVs were detected with common frequencies (1%–41%) within the risk-associated regions for the TCGA set of HGSOC cases, which was derived from a separate platform and segmentation algorithm, increasing our confidence in their validity (Fig. 3A). These regions also contained a high frequency of somatic alterations in the tumor genome (16%–26%), excluding the 1p36 region that was diploid in all HGSOC tumors. CNV at 1p36 was the only region significantly associated with expression of mRNA after adjustment for SCNA (Fig. 3B). Eleven percent (N = 42) of TCGA cases had germline deletions at the 1p36 CNVR and tumor expression in these subjects was significantly downregulated for cyclin dependent kinase 11A (CDK11A) compared with noncarriers [fold change (FC) = −1.8, P = 2.68 × 10−7].
SCNA at the risk CNVRs generally spanned large segments of chromosomal bands but only a subset (44/110) of amplified/deleted genes correlated (P < 0.05 and FC > 1.5) with altered gene expression (Supplementary Table S8; Fig. 3C). Four regions (1p13, 2q34, 19q13.2, and 19q13.42) exhibited both deletions and duplications in cancer genomes while 8p21.2 had only deletions and 12p11.21 had only duplications. Across all characterized genes, the most statistically significant association was between tumor copy number at 19q13.2 and egl-9 family hypoxia inducible factor 2 (EGLN2) expression (FCDel = 1.7, P = 2.94 × 10−47); 20 other genes were also correlated with copy number at this region. Notably, deletion of CYP2A7, the location of the risk-associated CNVR, was not associated with CYP2A7 or CYP2A6 expression (P = 0.09 and 0.08, respectively). On the basis of a public catalog of enhancers, the CYP2A7 CNVR overlaps an enhancer region in normal ovarian tissue predicted to affect the expression of EGLN2 (27). Somatic deletion of the enhancer region cooccurred with deletion of EGLN2 in all SCNA carriers except one. SCNA at the 19q13.42 region was correlated with 12 genes and most significantly with pre-mRNA processing factor 31 (PRPF31), a component of spliceosome complex, and TCF3 fusion partner (TFPT), which were significantly overexpressed when duplicated (FCDup = 2.0, P = 1.21 × 10−30 and FCDup = 2.5, P = 7.1 × 10−27, respectively). Expression of the immunoglobulin superfamily of genes clustered at this region, which include leukocyte immunoglobulin-like receptors and killer cell inhibitory receptors, was not associated with copy number (P > 0.05). The intergenic 21 kb CNVR at 1p13.3 was somatically altered in 16% of tumors and associated with expression of four genes. 
Tumors with somatic deletions averaged approximately two-fold lower expression for choline/ethanolamine phosphotransferase 1 (CEPT1; FCDel = −2.02; P = 9.58 × 10−22), DNA damage regulated autophagy modulator 2 (DRAM2; FCDel = −1.9; P = 1.18 × 10−13), and DENN domain containing 2D (DENND2D; FCDel = −2.1; P = 2.75 × 10−11). Risk-associated deletions at 2q34 were located within the first intron of ERBB4, whose entire sequence spans >1 Mb in length. No other mRNAs are located within 500 kb of the CNVR. ERBB4 somatic deletions were associated with a four-fold decrease in ERBB4 expression (FCDel = −4.2; P = 4.67 × 10−5).
SCNA at 8p21.2 and 12p11.21 were also common but showed specificity for one type of alteration. Amplifications at the 12p11 CNVR occurred in 17% of tumors and were associated with higher expression levels for four genes, including two guanine nucleotide exchange factors (GEF), FYVE, RhoGEF, and PH domain containing 4 (FGD4; FCDup = 1.5; P = 3.0 × 10−8) and DENN domain containing 5B (DENND5B; FCDup = 1.7; P = 1.0 × 10−3), that display highest expression in ovarian tissue (28). Only two tumors contained deletions within the 12p11.21 CNVR. The 8p21.2 region was deleted more frequently (23%) than all other risk regions, but duplications rarely occurred (N = 4). Deletions at 8p21.2 were associated with downregulation of gonadotropin-releasing hormone 1 (GNRH1; FCDel = −1.7; P = 5.86 × 10−5), which is located 175 kb from the germline CNVR. The 8p21.2 germline deletion region spans both the TSS of DOCK5 and upstream histone modifications consistent with an enhancer element in ovarian tissue (27).
Discussion
CNV is a major source of human genetic variation that contributes as much to interindividual differences as the more frequently studied SNP (4). Here, we describe a large genome-wide association study of CNV with EOC risk that used a comprehensive dual array design and supplemented with in silico functional follow-up. Two SNP array datasets provided complementary strengths; the 610k array set contributed discovery power with its large sample size while the 2.5M set provided considerably higher resolution. Accordingly, we identified six relatively large CNV regions associated with EOC or HGSOC risk (P < 0.01) within the 610k array set and two smaller regions within the 2.5M set. In addition to limited power, the fewer detected differences and lack of replication with the 2.5M set may be due to the low frequency of variants, chance and sampling variation in the populations, and differential platform/probe CNV calling performance; it is probably a combination of these factors. By requiring CNVRs to be called by both platforms, our findings more likely reflect true variation rather than technical artifact, although type I error remains possible. Thus, we further detected and functionally characterized risk-associated CNVRs through analysis of TCGA data. The integration of both germline and somatic copy number with tumor transcription revealed associations that provided insight into the potential biological consequence of genomic copy number.
A large deletion at 1p36.33 was the only CNV independently associated with tumor transcription. Carriers were estimated to have an approximately 70% lower risk of HGSOC (OR = 0.28; P = 0.001), and corresponding analysis of tumor tissue showed lower expression of the cyclin-dependent kinase (CDK) CDK11A in carriers. CDK11 has three isoforms involved in cell-cycle control (p58), transcriptional regulation (p110), and apoptotic signaling (p46; ref. 29). CDK11-p58 is a centrosome-associated kinase expressed during the G2 to M transition, and its inhibition induces cell-cycle arrest and apoptosis (30), while CDK11-p110 positively regulates Hedgehog signaling and the Wnt/β-catenin signaling cascade (29). Accordingly, CDK overexpression is a common feature of many cancer types, and in vitro and in vivo CDK11A/B knockdown induces apoptosis in EOC cells (31). It is therefore plausible that the reduced risk observed for 1p36.33 deletion carriers is conferred through reduced CDK11-associated oncogenic signaling. Potentially complicating this theory, this CNVR was notably the only region that remained diploid in all HGSOC tumors. CDK11-p58 promotes degradation of several steroid receptors such as androgen (32), vitamin D (33), and estrogen receptors (34), which inhibits migration and invasion of ERα-positive breast cancer cells (35). Thus, a similar suppressive role in progression of EOC may explain the lack of somatic amplification at 1p36.33.
The increased risk of HGSOC associated with a deletion at 19q13.2 containing CYP2A7 (OR = 3.02; P = 8.98 × 10−5) was the only finding that remained significant after adjustment for multiple hypothesis testing (Bonferroni corrected P = 0.02; FDR = 0.02). This same deletion region was recently identified in association with lower ovarian cancer risk among 2,500 BRCA1 mutation carriers (CIMBA RR = 0.50; P = 0.007; ref. 10). CYP2A7 is a pseudogene largely expressed in the liver, where it promotes expression of CYP2A6 (36), which is involved in the metabolism of nicotine and the tobacco-related procarcinogen nitrosamine (37). Genetic variation of CYP2A6, including a deletion, has been linked to a poor metabolizer phenotype and reduced risk of lung cancer in smokers (38). While altered enzymatic activity may similarly explain the reduced risk observed in BRCA1 carriers, this study suggests a more complex relationship. Our in silico analyses identified EGLN2, an enzyme (aka PHD1) involved in cellular response to hypoxia (39), as the mRNA most significantly associated with CYP2A7 SCNA (P = 2.94 × 10−47), whereas CYP2A7 expression itself was not associated (P = 0.09). These data support regulation of EGLN2 by an enhancer element at CYP2A7 (27). Numerous other genes were also associated with SCNA, such as melanoma inhibitory activity (MIA), which was upregulated in polyps of a germline CYP2A7 deletion carrier in a recent study of familial adenomatous polyposis (40). MIA belongs to a novel class of secreted proteins that interact with the extracellular matrix to promote development, invasiveness, and metastasis in melanoma as well as in pancreatic and gastric carcinomas (41, 42). Thus, germline CYP2A7 deletions could have a role in promoting tumorigenesis and progression through epigenetic regulation of cis-genes such as EGLN2 and MIA, and this role may act secondarily to a separate, distinct role in metabolism that may be beneficial for BRCA1 carriers.
Altogether, the consistent detection of a CYP2A7 locus deletion and its association with EOC risk warrants further investigation.
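The Bonferroni and FDR adjustments cited for the CYP2A7 deletion follow standard procedures, sketched here in Python for readers less familiar with them. This is an illustration and not the study's analysis code: the function names are ours, the five P values are hypothetical, and the Benjamini-Hochberg procedure is assumed as the FDR method because the text does not name one.

```python
def bonferroni(pvalues):
    """Multiply each P value by the number of tests, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values for five CNVRs (not taken from the study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
bonf = bonferroni(example_p)
bh = benjamini_hochberg(example_p)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two, whereas the Benjamini-Hochberg procedure controls the expected proportion of false discoveries among the findings declared significant.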
We identified five other CNVRs at nominal statistical significance, but functional characterization of SCNA identified biological pathways pertinent to EOC risk. Of particular interest was the reduced HGSOC risk (OR = 0.52; P = 0.002) associated with deletions at 8p21.2, where somatic deletions corresponded with lower expression of GNRH1. Gonadotropin releasing hormone (GnRH) induces pituitary synthesis and secretion of follicle-stimulating hormone (FSH) and luteinizing hormone (LH), both of which are hypothesized to have an etiologic role in EOC (43). Although we did not observe an effect of germline CNV on tumor expression of GNRH1, it is tempting to hypothesize that deletions in this region may reduce systemic GnRH and thus mediate EOC risk associated with FSH/LH “excessive stimulation.” We also observed frequent (23%) deletion of 8p21.2 in TCGA tumors yet rare occurrence of amplifications (<1%), which is consistent with previous studies reporting common 8p21.2-p21.3 deletion and loss of heterozygosity in ovarian tumors, particularly for serous histology and high-grade and chemoresistant disease (44–47). Published analyses of TCGA data identified 8p21.2 as one of the 40 most common focal deletions in ovarian cancer genomes, and the deletions correlated with GNRH1 expression (48). This indication of 8p21 as a tumor suppressor gene locus coincides with strong evidence that the extrapituitary, autocrine function of GnRH, involved in follicular development in the ovary (49), counteracts growth factor receptor signaling and exerts antiproliferative and antimotility effects in ovarian and other tumors (50). Our group previously observed SNPs within GNRH1 that exhibited gene-level associations with increased HGSOC risk (51); we now report the first indication of an association between HGSOC risk and germline CNV at this region.
Other CNVs included deletion of ERBB4 (HER4), a receptor tyrosine kinase in the EGFR family (e.g., EGFR, HER2) that is commonly mutated and highly expressed in many solid tumors including ovarian (52–54), where it portends chemotherapy resistance and poor survival (55–57). Consistent with these findings, ERBB4 deletions were associated with decreased EOC risk (OR = 0.33). Although germline deletions were intronic and their consequence for gene function is unknown, intronic SNPs in ERBB4 affect its expression (58), and intronic CNV may have the same capability. SCNA at several risk CNVRs (1p13.3, 12p11.21, and 19q13.42) had multiple genic associations in pathways relevant to ovarian carcinogenesis, including choline metabolism (CEPT1-1p13; ref. 59), regulation of autophagy and apoptosis (DRAM2-1p13, TFPT-19q13.42; refs. 60–62), and activation of cellular motility/migration in tumorigenesis (FGD4-12p11.21; ref. 63). Interestingly, somatic duplications at 19q13.42 were associated with expression of PRPF3, which has been previously associated with early HGSOC relapse (64). Although these transcriptome correlations are suggestive of tumor progression mechanisms, their implication in EOC risk is uncertain.
Although our study has one of the largest sample sizes to explore disease-associated CNV (7), analyses of GWAS for CNV suffer from reduced statistical power due to the rarity of CNV compared with SNPs and the statistically challenging detection of CNV from SNP arrays (65). Low frequency CNVRs (<5%) represented the majority (∼82%) of variation identified in this study, and this distribution has also been observed in a study of over 190,000 European adults where 92.4% of the CNVs were present in <1 in 1,000 samples and 99.4% of them occurred with <1% frequency (65). While a meta-analysis of the two array sets may improve statistical power, we opted for a stratified analysis with comparative evaluations given the large discrepancy in probe coverage and CNV detection between arrays. High rates of false-negative and false-positive CNV calls can also limit statistical power. Although multiple detection algorithms are often used to increase sensitivity, we opted to use PennCNV alone, which called approximately 90% of all variants detected using four algorithms in a previous study (10). We controlled for false positives at multiple stages in our analytic pipeline, including stringent QC of logRatio, BAF, and sample outliers, which excluded approximately 25% of samples. Future studies should include technical validation of CNV, such as by qPCR, which may also allow more permissive QC criteria to be used to increase sample size. Considering these limitations, we reported EOC risk associations that reached a P < 0.01 threshold and did not adjust for multiple testing. As a discovery study, it was preferable to reduce type II error, and avoid missing possibly important findings, at the expense of increased type I error.
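The loss of power for rare CNVs can be illustrated with a rough normal-approximation calculation. The Python sketch below is an assumption-laden illustration, not the study's power analysis: the function name, the odds ratio of 1.9, and the sample sizes of 1,750 cases and 1,750 controls are hypothetical choices, and the test is taken to be a two-sided Wald test on the log odds ratio at the P < 0.01 threshold used for reporting.

```python
import math
from statistics import NormalDist


def approx_power(control_freq, odds_ratio, n_cases, n_controls, alpha=0.01):
    """Approximate power of a two-sided Wald test on the log odds ratio."""
    control_odds = control_freq / (1.0 - control_freq)
    case_odds = odds_ratio * control_odds
    case_freq = case_odds / (1.0 + case_odds)
    # Standard error of ln(OR) from expected 2x2 cell counts (Woolf's formula).
    se = math.sqrt(
        1.0 / (n_cases * case_freq)
        + 1.0 / (n_cases * (1.0 - case_freq))
        + 1.0 / (n_controls * control_freq)
        + 1.0 / (n_controls * (1.0 - control_freq))
    )
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(abs(math.log(odds_ratio)) / se - z_crit)


# Hypothetical design: equal numbers of cases and controls, OR = 1.9.
power_by_freq = {f: approx_power(f, 1.9, 1750, 1750) for f in (0.01, 0.05, 0.20)}
```

Under these assumptions, power falls steeply as carrier frequency drops from 5% to 1%, which is consistent with the difficulty of detecting associations for the low-frequency CNVRs that dominated this dataset.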
In summary, this large genome-wide study identified common CNV events in genomic regions that frequently undergo somatic alterations in ovarian tumors to promote progression. The risk associations, together with in silico functional analyses, highlight several novel genomic regions with biologically plausible mechanisms for EOC predisposition and pathogenesis. Replication of the findings in a larger study population profiled on the same platform is warranted. Since the initiation of this study, SNP array data from the Oncoarray (3) have become available and present an opportunity for future CNV studies.
Disclosure of Potential Conflicts of Interest
H. Jim is a consultant/advisory board member for RedHill Biopharma and Janssen Scientific Affairs. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: B.M. Reid, J.B. Permuth, E.L. Goode, T.A. Sellers
Development of methodology: B.M. Reid
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J.M. Cunningham, S. Narod, H. Risch, J.M. Schildkraut, T.A. Sellers
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): B.M. Reid, Y.A. Chen, B.L. Fridley, Z. Chen, J.S. Barnholtz-Sloan, H. Risch, A.N. Monteiro
Writing, review, and/or revision of the manuscript: B.M. Reid, J.B. Permuth, Y.A. Chen, B.L. Fridley, E.S. Iversen, H. Jim, R.A. Vierkant, J.M. Cunningham, J.S. Barnholtz-Sloan, H. Risch, E.L. Goode, A.N. Monteiro, T.A. Sellers
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): E.S. Iversen, Z. Chen, H. Risch
Study supervision: E.S. Iversen, H. Risch
Acknowledgments
We thank all of the women who participated, along with all of the researchers, clinicians, and staff who have contributed to the participating studies. This work was supported by NIH R01 CA114343 and U19 CA148112 (to T.A. Sellers), R01 CA122443 (to E.L. Goode), R01 CA76016 (to J.M. Schildkraut), R01 CA106414 (to R. Sutphen), P30 CA15083 for the Mayo Clinic Genotyping Shared Resource, and Mayo Clinic Ovarian Cancer SPORE grant P50 CA136393.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.