It has been estimated that >1,000 genetic loci have yet to be identified for breast cancer risk. Here we report the first study utilizing targeted next-generation sequencing to identify single-nucleotide polymorphisms (SNP) associated with breast cancer risk. Targeted sequencing of 283 genes was performed in 240 women with early-onset breast cancer (≤40 years) or a family history of breast and/or ovarian cancer. Common coding variants with minor allele frequencies (MAF) >1% that were identified were presumed initially to be SNPs, but further database inspections revealed variants had MAF of ≤1% in the general population. Through prioritization and stringent selection criteria, we selected 24 SNPs for further genotyping in 1,516 breast cancer cases and 1,189 noncancer controls. Overall, we identified the JAK2 SNP rs56118985 to be significantly associated with overall breast cancer risk. Subtype analysis performed for patient subgroups defined by ER, PR, and HER2 status suggested additional associations of the NOTCH3 SNP rs200504060 and the HIF1A SNP rs142179458 with breast cancer risk. In silico analysis indicated that coding amino acids encoded at these three SNP sites were conserved evolutionarily and associated with decreased protein stability, suggesting a likely impact on protein function. Our results offer proof of concept for identifying novel cancer risk loci from next-generation sequencing data, with iterative data analysis from targeted, whole-exome, or whole-genome sequencing a wellspring to identify new SNPs associated with cancer risk. Cancer Res; 77(19); 5428–37. ©2017 AACR.

Large-scale genome-wide association studies utilizing high-density genotyping microarrays have identified approximately 100 common variants associated with breast cancer risk (1–4), with >1000 additional loci yet to be identified (4). These common genetic variants have high minor allele frequencies (MAF) >1% and are associated with elevated breast cancer risk with ORs that are typically below 1.5 as compared with the general population (5). Most of these variants are located in the intronic or intergenic regions, with a small proportion (∼4%) within the coding regions (6, 7).

Whole-exome sequencing and targeted gene sequencing have generated an enormous amount of sequence data for coding regions of genes. Typically, variants detected with MAFs of >1% among cases are filtered out at an early stage of data analysis because they are assumed to be common polymorphisms. We hypothesized that these common coding variants in breast cancer patients could be associated with breast cancer risk.

In this proof-of-concept study, targeted next-generation sequencing of 283 cancer-associated genes (Supplementary Table S1) was performed for 240 women with a family history of breast and/or ovarian cancer or early-onset breast cancer. We identified coding variants with MAF > 1% among these women but with MAF ≤ 1% within the general population (ascertained from the 1000 Genomes Project and ExAC databases), and 24 coding variants were selected for high-throughput genotyping in an additional cohort of 1,516 cases and 1,189 controls.

Study population

The discovery phase of the study utilized DNA samples obtained from 240 women with a family history of breast cancer and/or ovarian cancer or early-onset breast cancer (collectively designated as FH), who were referred to the National Cancer Centre Singapore (NCCS) for genetic risk assessment. They were invited to participate in the study if they had a family history of breast and/or ovarian cancer in first- and/or second-degree relatives; had both breast and ovarian cancer or bilateral breast cancer; or if they had early-onset breast or ovarian cancer at ≤40 years of age (Table 1; ref. 8). Of the 240 subjects, 12 did not have a personal cancer history of breast and/or ovarian cancer. Peripheral blood samples were taken and DNA was extracted using an optimized in-house method (9).

Table 1.

Demographic and clinical characteristics of study participants

Characteristic | Discovery cohort: subjects (n = 240) | Validation cohort: cases (n = 1,516) | Validation cohort: controls (n = 1,189)
Personal cancer history | | |
  Breast | 205 | 1,490 | —
  Ovarian | 12 | — | —
  Breast and ovarian | 11 | 26 | —
  None | 12 | — | —
Age^a (years) | | |
  Mean | 39.2 | 51.5 | 42.8
  Median | 38 | 51 | 43
  Range | 19–67 | 24–91 | 21–79
Family history | 112 | 81 | —
Early onset breast cancer (≤40 years old) | 151 | 192 | —
ER status | | |
  Positive | 132 | 1,019 | —
  Negative | 67 | 433 | —
  Unknown | 17 | 64 | —
PR status | | |
  Positive | 111 | 870 | —
  Negative | 88 | 577 | —
  Unknown | 17 | 69 | —
HER2 status | | |
  Positive | 52 | 341 | —
  Negative | 123 | 772 | —
  Equivocal | 12 | 104 | —
  Unknown | 29 | 299 | —
Triple-negative breast cancer | 32 | 131 | —
Breast cancer histologic type | | |
  Invasive ductal carcinoma (IDC) | 162 | 1,221 | —
  Invasive lobular carcinoma (ILC) | | 48 | —
  Invasive micropapillary carcinoma | | 12 | —
  Invasive mucinous carcinoma | | 33 | —
  Ductal carcinoma in situ (DCIS) | | 31 | —
  Others | | 16 | —
  Mixed histologic types | 13 | 66 | —
  Unknown | 14 | 89 | —
Histologic grade | | |
  Grade 1 | | 177 | —
  Grade 2 | 42 | 500 | —
  Grade 3 | 69 | 592 | —
  Unknown | 87 | 247 | —
Tumor size | | |
  ≤20 mm | 73 | 521 | —
  >20 mm to ≤50 mm | 46 | 618 | —
  >50 mm | 16 | 117 | —
  Unknown | 81 | 260 | —
Lymph node status | | |
  Negative | 80 | 636 | —
  1–3 positive | 38 | 401 | —
  4–9 positive | 13 | 168 | —
  ≥10 positive | 14 | 104 | —
  Unknown | 71 | 207 | —

^a Age refers to the age at cancer diagnosis for cases and the age at recruitment for controls.

The validation phase was performed using 1,516 DNA samples from women of Chinese ancestry with breast cancer (Table 1). These samples were obtained from patients recruited at outpatient clinics at NCCS and Singapore General Hospital or were archival frozen peripheral blood samples from the SingHealth Tissue Repository (STR). DNA was extracted using the same in-house method used in the discovery phase. A control group of 1,189 healthy women of Chinese ancestry was also included. The control samples were archival DNA samples obtained from the DNA Diagnostic and Research Lab, KK Women's and Children's Hospital, Singapore.

The study was approved by the SingHealth Centralized Institutional Review Board (CIRB Ref: 2008/478/B), and written informed consent was taken from each participant. Patient studies were conducted in accordance with the ethical guidelines of the Declaration of Helsinki.

Targeted sequencing and data analysis

The 240 DNA samples for the discovery phase were sequenced using a multi-gene target panel consisting of 283 genes (Supplementary Table S1). Target exome enrichment was performed using the Agilent SureSelect kit, and the enriched libraries were subsequently sequenced on the Illumina HiSeq 2000 or HiSeq 4000 platform. Raw sequence reads were mapped against the human reference genome (hg19) using the Burrows–Wheeler alignment tool (BWA v0.7.5; ref. 10). The aligned reads were sorted, reordered, and processed for PCR duplicates using Picard tools v1.74 (http://broadinstitute.github.io/picard/). Indel realignment was performed on the targets using the GATK IndelRealigner module (11). Variants (SNPs and indels) in the target regions were detected using the GATK HaplotypeCaller algorithm v3.4-46 (11). Functional annotation of the variants was performed using the ANNOVAR pipeline (12), and SIFT (13), PolyPhen-2 (14), MutationTaster (15), Mutation Assessor (16), and CADD (17) were used to predict the impact of each amino acid change on the protein.

Variant filtering and SNP selection

Variants in the exonic regions and in the ±50 bp intronic regions flanking the exons were included in the analysis. In the next filtering step, variants were selected on the basis of a minor allele frequency of ≤1% in the 1000 Genomes Project (18) and ExAC (19) databases across all ethnicities (Fig. 1). Subjects with mutations in 25 known breast cancer predisposition genes (ATM, BARD1, BMPR1A, BRCA1, BRCA2, BRIP1, CDH1, CDKN2A, CHEK2, FANCC, MLH1, MSH2, MSH6, NBN, NF1, PALB2, PMS2, PTEN, RAD51C, RAD51D, SMAD4, STK11, TP53, VHL, and XRCC2) associated with increased breast cancer risk were excluded, leaving 170 subjects in the study cohort. The genes carrying the filtered variants were then compared against the list of highly mutated genes carrying somatic mutations in the publicly available TCGA breast and ovarian cancer datasets (http://cancergenome.nih.gov/); genes common to both lists were selected, and their variants were carried forward for further analysis. Variants with a PhyloP (20) conservation score of zero and above were retained, as were variants with a CADD score greater than or equal to 10. Finally, a set of 24 SNPs was chosen for SNP genotyping, prioritized by frequency of occurrence in the samples (Fig. 1).
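The filtering cascade described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, the helper names, and the gene set passed as `tcga_genes` are hypothetical. The sketch also assumes that the PhyloP and CADD thresholds are applied together and that a missing database value or score does not exclude a variant. The two real SNP rows use values from Table 2.

```python
from dataclasses import dataclass
from typing import Optional

MAF_MAX = 0.01      # population MAF ceiling (<= 1%)
PHYLOP_MIN = 0.0    # PhyloP conservation score of zero and above
CADD_MIN = 10.0     # CADD score of 10 or more


@dataclass
class Variant:
    rsid: str
    gene: str
    maf_1000g: Optional[float]   # None when the database reports no value
    maf_exac: Optional[float]
    phylop: Optional[float]      # None when no score is available
    cadd: Optional[float]
    carrier_freq: float          # fraction of discovery subjects carrying it


def at_most(value, limit):
    # Assumption: a missing value does not exclude the variant.
    return value is None or value <= limit


def at_least(value, limit):
    # Assumption: a missing score does not exclude the variant.
    return value is None or value >= limit


def passes_filters(v, tcga_genes):
    return (
        at_most(v.maf_1000g, MAF_MAX)
        and at_most(v.maf_exac, MAF_MAX)
        and v.gene in tcga_genes
        and at_least(v.phylop, PHYLOP_MIN)
        and at_least(v.cadd, CADD_MIN)
    )


def select_snps(variants, tcga_genes, n=24):
    # Keep variants passing every filter, then rank by frequency in the cohort.
    kept = [v for v in variants if passes_filters(v, tcga_genes)]
    kept.sort(key=lambda v: v.carrier_freq, reverse=True)
    return kept[:n]


example = [
    Variant("rs56118985", "JAK2", 0.004, 0.002, 2.114, 16.86, 0.0412),
    Variant("rs142179458", "HIF1A", 0.005, 0.002, 5.462, 16.97, 0.0471),
    # Hypothetical common polymorphism that the MAF filter should drop.
    Variant("hypothetical_common", "JAK2", 0.25, 0.22, 1.0, 12.0, 0.40),
]
# Hypothetical stand-in for the TCGA-derived gene list.
selected = select_snps(example, {"JAK2", "HIF1A"})
print([v.rsid for v in selected])
```

Ranking by frequency after filtering mirrors the final prioritization step; changing `n` changes only how many top-ranked variants are carried forward.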

Figure 1.

Flow chart for the selection of 24 SNPs.


SNP genotyping

SNP genotyping was carried out on 192.24 Dynamic Array integrated fluidic circuits (IFC) using TaqMan SNP Genotyping Assays (Applied Biosystems; ref. 21). The IFC Controller RX (Fluidigm) was used to load samples and assays onto the IFC, and the BioMark HD (Fluidigm) was used for thermal cycling and detection of fluorescence. Data were analyzed using the Fluidigm SNP Genotyping Analysis software, which automatically calls genotypes based on k-means clustering.

SNP association analysis

Statistical analysis of the SNP genotype data was performed using the R package SNPassoc (22). Per-allele ORs with 95% confidence intervals (CI) were estimated, and their significance was evaluated using a logistic regression model. In addition, SNP associations were tested with respect to ER status, PR status, HER2 status, sporadic breast cancer, or FH. A P value of ≤0.05 was considered statistically significant.
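To make the per-allele (additive) model concrete, the following Python sketch fits a logistic regression of case status on minor-allele dosage (0, 1, or 2) by Newton-Raphson and reports the OR, a Wald 95% CI, and a two-sided Wald P value. It is a minimal illustration, not the SNPassoc implementation; the function names and the genotype counts in the usage line are hypothetical and are not data from this study.

```python
import math


def _score_and_information(b0, b1, cases, controls):
    # Gradient and Fisher information of the log-likelihood for
    # logit P(case) = b0 + b1 * dosage, using grouped genotype counts.
    g0 = g1 = h00 = h01 = h11 = 0.0
    for dosage in (0, 1, 2):
        n_case = cases[dosage]
        n_total = n_case + controls[dosage]
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * dosage)))
        resid = n_case - n_total * p
        w = n_total * p * (1.0 - p)
        g0 += resid
        g1 += resid * dosage
        h00 += w
        h01 += w * dosage
        h11 += w * dosage * dosage
    return g0, g1, h00, h01, h11


def per_allele_or(cases, controls, max_iter=50, tol=1e-10):
    """cases/controls: counts of subjects with 0, 1, and 2 minor alleles."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, cases, controls)
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + information^-1 * gradient
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Standard error of b1 from the inverse information at the estimate.
    _, _, h00, h01, h11 = _score_and_information(b0, b1, cases, controls)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald test
    return (math.exp(b1), math.exp(b1 - 1.96 * se),
            math.exp(b1 + 1.96 * se), p_value)


# Hypothetical genotype counts (0, 1, 2 minor alleles); not study data.
odds_ratio, ci_low, ci_high, p = per_allele_or((900, 95, 5), (950, 48, 2))
print(f"OR = {odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f}), P = {p:.4g}")
```

When no subject carries two minor alleles, the model is saturated and the estimate reduces to the familiar 2 x 2 odds ratio, which gives a convenient sanity check.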

Detection of amino acid conservation using ConSurf analysis and multiple sequence alignment

ConSurf analysis (http://consurftest.tau.ac.il/) was carried out to identify the conservation of amino acid residue position in a protein utilizing the phylogenetic relationship between homologous sequences. Using the R Bioconductor package msa (23), multiple sequence alignment across 12 different species was carried out based on the ClustalOmega method (24). For rs200504060 (NOTCH3), only 10 species were available.
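As a rough illustration of what "conserved across species" means at a single alignment position, the Python sketch below computes the most common residue in one column of a multiple sequence alignment and the fraction of species that share it. It is not the ConSurf algorithm and does not call the msa package; the function name and the toy aligned sequences are hypothetical.

```python
from collections import Counter


def column_conservation(aligned, column):
    """Return (most common residue, fraction of non-gap sequences sharing it)."""
    residues = [seq[column] for seq in aligned.values() if seq[column] != "-"]
    if not residues:
        return None, 0.0
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


# Hypothetical aligned fragments of equal length ("-" marks a gap).
toy_alignment = {
    "human": "MKGLD",
    "mouse": "MKGLD",
    "chicken": "MRGLE",
    "zebrafish": "M-GLD",
}

print(column_conservation(toy_alignment, 2))  # fully conserved column
print(column_conservation(toy_alignment, 1))  # partly conserved, one gap ignored
```

A fraction of 1.0 corresponds to a site that is identical in every species with sequence available, which is the pattern reported for the three sites in Fig. 2.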

Protein structure stability prediction

To determine the effect of each mutation on protein structure and function, we used three protein stability prediction programs, namely I-Mutant2.0 (25), Impact of Non-synonymous variations on Protein Stability-Multi-Dimension (INPS-MD; ref. 26), and the HOPE server (27). The protein sequences of JAK2, NOTCH3, and HIF1A were retrieved from the NCBI protein database and used for the analysis of protein stability; the accession numbers are NP_004963.1, NP_000426.2, and NP_001521.1, respectively.

The discovery cohort (n = 240) comprised subjects who were Chinese (82.9%), Malay (7.1%), Indian (2.9%), and of other ethnicities (7.1%). The clinicopathologic characteristics of this cohort are shown in Table 1. To investigate the association of the 24 selected SNPs with breast cancer risk, genotyping was performed on 1,516 breast cancer cases and 1,189 controls, all of whom were Chinese. The clinicopathologic characteristics of these subjects are described in Table 1.

Targeted sequencing detected 1,154 exonic and splicing variants within 217 genes, excluding synonymous variants. After filtering and annotation of variants as shown in Fig. 1, 204 variants in 106 genes remained. Each of these variants was present in our cohort of 170 patient samples at a frequency ranging from 1.17% to 97.64%. The observation that these variants were common in our discovery cohort contrasted with their minor allele frequencies (≤1% in all populations and ≤4.2% in the East Asian population) reported in publicly available databases such as the 1000 Genomes Project and ExAC (Table 2). Because common variants have been found to be associated with breast cancer risk, we selected the 24 variants with the highest occurrence in our cohort to determine their association with breast cancer risk. In addition, the functional effect of the 24 variants was predicted to be deleterious by one or more in silico prediction tools (Table 2). Furthermore, all 24 variants had a positive nucleotide conservation score (range, 0.879–9.230) measured using the PhyloP program (Table 2).

Table 2.

MAF of the 24 SNPs and their functional effect prediction using in silico tools

SNP | Locus | Gene | Alleles^a | Frequency, % (n = 170)^b | MAF, 1000 Genomes (ALL) | MAF, 1000 Genomes (EAS) | MAF, ExAC (ALL) | MAF, ExAC (EAS) | SIFT^c | PolyPhen | Mutation Taster | Mutation Assessor^d | CADD C scaled score^e | PhyloP conservation score^f
rs112515611 | 7q36.1 | KMT2C | G/A | 97.64 | NR | NR | NR | NR | 0.16 (T) | Possibly damaging | Disease causing | 1.245 (L) | 15.34 | 6.332
rs60244562 | 7q36.1 | KMT2C | T/C | 90.00 | NR | NR | NR | NR | 0.09 (T) | Damaging | Disease causing | 1.15 (L) | 13.48 | 1.399
rs199839047 | 7q36.1 | KMT2C | A/G | 84.70 | NR | NR | NR | NR | 0 (D) | Damaging | Disease causing | 1.355 (L) | 16.84 | 7.062
rs112790792 | 7q36.1 | KMT2C | C/T | 11.18 | NR | NR | NR | NR | NA | NA | Disease causing | NA | 23.8 | 7.814
rs201760077 | 8p11.21 | KAT6A | TCT/- | 10.59 | 0.007 | 0.03 | 0.006 | 0.04 | NA | NA | NA | NA | NA | NA
rs78128744 | 12q12 | ARID2 | A/G | 7.65 | 0.005 | 0.024 | 0.003 | 0.03 | 0.34 (T) | Benign | Disease causing | 0 (N) | 11.12 | 5.517
rs35118262 | 10q11.21 | RET | C/A | 6.47 | 0.004 | 0.019 | 0.002 | 0.027 | 0.14 (T) | Benign | Disease causing | 1.095 (L) | 17.54 | 3.758
rs146242251 | 22q13.2 | EP300 | A/G | 6.47 | 0.003 | 0.011 | 0.002 | 0.017 | 0.02 (D) | Benign | Polymorphism automatic | 0.975 (L) | 11.57 | 1.289
rs34172843 | 13q12.2 | FLT3 | T/A | 6.47 | 0.005 | 0.025 | 0.002 | 0.023 | 0.13 (T) | Benign | Polymorphism | 0 (N) | 15.46 | 2.891
rs138399473 | Xq11.2 | AMER1 | C/T | 5.88 | 0.007 | 0.028 | 0.003 | 0.034 | 0.29 (T) | Benign | Disease causing | 0.695 (N) | 16.03 | 2.524
rs150804738 | 11q23.3 | KMT2A | G/A | 5.88 | 0.005 | 0.024 | 0.002 | 0.034 | 0.23 (T) | Damaging | Disease causing | 1.245 (L) | 15.49 | 6.778
rs200567881 | 6p21.32 | DAXX | CCT/- | 5.88 | 0.004 | 0.02 | 0.002 | 0.018 | NA | NA | NA | NA | NA | NA
rs75191113 | 7q36.1 | KMT2C | G/T | 5.88 | 0.006 | 0.03 | 0.002 | 0.024 | 1 (T) | Benign | Disease causing | 1.74 (L) | 11.91 | 6.086
rs142179458 | 14q23.2 | HIF1A | G/A | 4.71 | 0.005 | 0.022 | 0.002 | 0.026 | 0.01 (D) | Possibly damaging | Disease causing | 2.07 (M) | 16.97 | 5.462
rs3832931 | 14q32.31 | HSP90AA1 | TTT/- | 4.71 | 0.008 | 0.038 | 0.003 | 0.042 | NA | NA | NA | NA | NA | NA
rs4024370 | 7q36.1 | KMT2C | G/A | 4.71 | NR | NR | NR | NR | 1 (T) | NA | Disease causing | NA | 41 | 4.399
rs200504060 | 19p13.12 | NOTCH3 | G/A | 4.71 | 0.002 | 0.008 | 0.001 | 0.013 | 0 (D) | Possibly damaging | Disease causing | 2.485 (M) | 12.93 | 3.883
rs56118985 | 9p24.1 | JAK2 | G/A | 4.12 | 0.004 | 0.015 | 0.002 | 0.018 | 0.09 (T) | Damaging | Polymorphism | 1.355 (L) | 16.86 | 2.114
rs78004519 | 7q36.1 | KMT2C | A/G | 4.12 | 0.003 | 0.016 | 0.002 | 0.015 | 0.05 (D) | Benign | Disease causing | 2.215 (M) | 16.16 | 6.388
rs75758215 | 2q22.1 | LRP1B | G/A | 4.12 | 0.004 | 0.016 | 0.002 | 0.015 | 0.3 (T) | Possibly damaging | Disease causing | 1.39 (L) | 12.76 | 0.879
rs75321043 | 1p34.1 | MUTYH | C/T | 4.12 | 0.002 | 0.008 | 0.001 | 0.013 | 0.54 (T) | Damaging | Disease causing | 1.735 (L) | 23.3 | 9.23
rs79777494 | 1p34.1 | MUTYH | G/A | 4.12 | 0.002 | 0.008 | 0.001 | 0.013 | 0.09 (T) | Possibly damaging | Polymorphism | 1.445 (L) | 18.07 | 1.433
rs150513105 | 17p12 | NCOR1 | C/T | 4.12 | 0.002 | 0.012 | 0.001 | 0.011 | 1 (T) | Benign | Disease causing | 1.04 (L) | 12.64 | 1.318
rs3782356 | 12q13.12 | KMT2D | C/T | 3.53 | 0.002 | 0.009 | 0.001 | 0.014 | 0.05 (D) | Damaging | Disease causing | 1.965 (M) | 32 | 6.006

NOTE: ALL includes the East Asian (EAS), South Asian (SAS), European (EUR), African (AFR), and Admixed American (AMR) populations.

Abbreviations: NR, not reported; NA, prediction is not available.

^a Major/minor allele of the SNP.

^b Frequency of alternate homozygous/heterozygous genotypes in the discovery cohort.

^c D, deleterious (SIFT ≤ 0.05); T, tolerated (SIFT > 0.05).

^d Variants classified as neutral (N)/low (L) are predicted to be nonfunctional; variants classified as medium (M)/high (H) are predicted to be functional.

^e Variants with CADD scores of more than or equal to 10 are classified as deleterious.

^f Positive scores indicate evolutionary conservation.

All 24 SNP assays had a call rate of more than 95.0%, with an average call rate of 98.34%. None of the 24 SNPs is included on commercially available Illumina and Affymetrix genotyping arrays. Five SNPs (rs112515611, rs60244562, rs199839047, rs112790792, and rs4024370) were found to be monomorphic in our cases and controls and were excluded from analysis. After exclusion of these monomorphic SNPs and application of Bonferroni correction to the remaining 19 SNPs, one SNP, rs56118985, located at 9p24.1/JAK2, was found to be significantly associated with breast cancer risk via an additive model (per-allele OR = 1.81; 95% CI = 1.24–2.64; P = 0.00331; Table 3).

Table 3.

Association of 19 SNPs with breast cancer risk in the validation cohort

SNP | Chr: Position | Gene | Alleles^a | Risk allele | Frequency, % (n = 1,516)^b | OR (95% CI) | P
rs20176077 | 8: 41794797 | KAT6A | TCT/- | — | 7.9 | 1.01 (0.78–1.31) | 0.95237
rs78128744 | 12: 46243406 | ARID2 | A/G | | 98.4 | 1.28 (0.94–1.74) | 0.12139
rs35118262 | 10: 43600607 | RET | C/A | | 6.27 | 1.12 (0.82–1.52) | 0.48617
rs146242251 | 22: 41527628 | EP300 | A/G | | 3.1 | 1.12 (0.73–1.73) | 0.59991
rs34172843 | 13: 28622544 | FLT3 | T/A | | 98.1 | 1.12 (0.77–1.63) | 0.52293
rs138399473 | X: 63413082 | AMER1 | C/T | | 97.6 | 1.19 (0.88–1.62) | 0.26378
rs150804738 | 11: 118375998 | KMT2A | G/A | | 98.2 | 1.22 (0.93–1.61) | 0.15249
rs200567881 | 6: 33287881 | DAXX | CCT/- | CCT | 96.8 | 0.85 (0.6–1.21) | 0.3656
rs75191113 | 7: 151859288 | KMT2C | G/T | | 97.7 | 0.97 (0.67–1.39) | 0.8569
rs142179458 | 14: 62203623 | HIF1A | G/A | | 98.1 | 1.37 (1.01–1.85) | 0.04248
rs3832931 | 14: 102551276 | HSP90AA1 | TTT/- | — | 10.0 | 1.36 (1.05–1.76) | 0.02052
rs200504060 | 19: 15290031 | NOTCH3 | G/A | | 2.8 | 2.25 (1.29–3.94) | 0.00796
rs56118985 | 9: 5044432 | JAK2 | G/A | | 5.6 | 1.81 (1.24–2.64) | 0.00331
rs78004519 | 7: 151860023 | KMT2C | A/G | | 98.5 | 0.83 (0.54–1.29) | 0.4111
rs75758215 | 2: 140995843 | LRP1B | G/A | | 98.3 | 0.94 (0.51–1.72) | 0.8429
rs75321043 | 1: 45800146 | MUTYH | C/T | | 97.0 | 0.97 (0.61–1.54) | 0.8923
rs79777494 | 1: 45800167 | MUTYH | G/A | | 98.3 | 0.91 (0.57–1.43) | 0.6766
rs150513105 | 17: 15983784 | NCOR1 | C/T | | 2.0 | 1.3 (0.73–2.32) | 0.3728
rs3782356 | 12: 49420078 | KMT2D | C/T | | 2.6 | 1.48 (0.88–2.49) | 0.23739

NOTE: SNPs with P ≤ 0.0042 (0.05/12) are considered significant. Only 12 SNPs satisfy the additive model in this study.

Abbreviation: CI, confidence interval.

^a Major/minor allele of the SNP.

^b Frequency of alternate homozygous/heterozygous genotypes in the validation cohort.
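The Bonferroni threshold quoted in the Table 3 note is simply the nominal significance level divided by the number of tests. The short Python sketch below applies it to four of the P values listed in Table 3; the variable names are illustrative, and the choice of these four SNPs is only for demonstration.

```python
ALPHA = 0.05    # nominal significance level
N_TESTS = 12    # number of SNPs tested under the additive model (Table 3 note)

# P values copied from Table 3 for four of the SNPs.
p_values = {
    "rs56118985 (JAK2)": 0.00331,
    "rs200504060 (NOTCH3)": 0.00796,
    "rs3832931 (HSP90AA1)": 0.02052,
    "rs142179458 (HIF1A)": 0.04248,
}

threshold = ALPHA / N_TESTS
significant = [snp for snp, p in p_values.items() if p <= threshold]

print(f"Bonferroni threshold: {threshold:.4f}")
print("Significant after correction:", significant)
```

Under this threshold only the JAK2 SNP remains significant in the overall analysis, although the other three SNPs have nominal P values below 0.05.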

The association of the 19 SNPs with clinicopathologic parameters (ER, PR, and HER2 status; sporadic breast cancer; and FH) was also investigated, and the number of cases in each subgroup is listed in Table 4. For rs56118985, only sporadic (per-allele OR = 1.81; 95% CI = 1.22–2.69; P = 0.004495) and PR-negative breast cancer cases (per-allele OR = 2.02; 95% CI = 1.28–3.18; P = 0.00381) showed significant associations with breast cancer risk (Table 4; Supplementary Table S2).

Table 4.

Association of SNPs with different subgroups of breast cancer cases, and their ORs and P value

Groups | Cases (n) | rs56118985 (JAK2)^a: OR (95% CI), P | rs200504060 (NOTCH3)^b: OR (95% CI), P | rs142179458 (HIF1A)^c: OR (95% CI), P
All cases | 1,516 | 1.81 (1.24–2.64), 0.00331 | 2.25 (1.29–3.94), 0.008 | 1.37 (1.01–1.85), 0.0425
Sporadic | 1,227 | 1.81 (1.22–2.69), 0.0045 | 2.45 (1.39–4.33), 0.00233 | 1.36 (0.98–1.88), 0.0604
FH | 318 | 1.85 (1.07–3.21), 0.0276 | 1.47 (0.57–3.79), 0.4376 | 1.40 (0.83–2.38), 0.4146
ER+ | 1,019 | 1.71 (1.13–2.59), 0.01614 | 2.49 (1.38–4.49), 0.00323 | 1.60 (1.12–2.29), 0.00804
ER− | 433 | 1.88 (1.14–3.10), 0.0166 | 2.08 (1.00–4.34), 0.07335 | 1.15 (0.74–1.77), 0.7500
PR+ | 870 | 1.60 (1.04–2.47), 0.04976 | 2.23 (1.21–4.10), 0.01498 | 1.64 (1.12–2.40), 0.00842
PR− | 577 | 2.02 (1.28–3.18), 0.00381 | 2.61 (1.35–5.01), 0.00753 | 1.20 (0.81–1.78), 0.6505
HER2+ | 341 | 1.46 (0.82–2.61), 0.1362 | 3.31 (1.64–6.69), 0.001175 | 1.10 (0.68–1.75), 0.9256
HER2− | 772 | 1.90 (1.24–2.93), 0.00536 | 1.82 (0.96–3.47), 0.08064 | 1.91 (1.26–2.90), 0.00147
Triple negative | | | |
  ER− and PR− and HER2− | 131 | 2.25 (1.06–4.74), 0.04973 | 2.46 (0.81–7.49), 0.1460 | 1.62 (0.70–3.77), 0.4236
ER/PR+ HER2− | | | |
  (ER+ or PR+) and HER2− | 639 | 1.84 (1.17–2.90), 0.00965 | 1.73 (0.88–3.38), 0.1041 | 1.98 (1.25–3.12), 0.00185
  ER+ and PR+ and HER2− | 493 | 2.14 (1.34–3.41), 0.00198 | 2.19 (1.11–4.32), 0.0299 | 1.83 (1.12–2.99), 0.0103

NOTE: SNPs with statistically significant association after Bonferroni correction are highlighted in bold.

^a For rs56118985, the major/minor allele is G/A, and the risk allele is A.

^b For rs200504060, the major/minor allele is G/A, and the risk allele is A.

^c For rs142179458, the major/minor allele is G/A, and the risk allele is G.

For SNP rs200504060, which maps to 19p13.12/NOTCH3, significant associations with breast cancer risk were detected in sporadic (per-allele OR = 2.45; 95% CI = 1.39–4.33; P = 0.002331), ER-positive (per-allele OR = 2.49; 95% CI = 1.38–4.49; P = 0.00323), and HER2-positive breast cancer (per-allele OR = 3.31; 95% CI = 1.64–6.69; P = 0.001175; Table 4; Supplementary Table S3).

Another SNP, rs142179458, located at 14q23.2 within the HIF1A gene, was associated with breast cancer risk only in HER2-negative cases (per-allele OR = 1.92; 95% CI = 1.26–2.90; P = 0.00147; Table 4; Supplementary Table S4). Wider CIs were observed in some subgroups, which could be attributed to the smaller number of cases within each subgroup and suggests a lack of sufficient statistical power. Further studies with larger sample sizes for the different hormone receptor subgroups should be carried out to confirm our findings.

The neural network algorithm of ConSurf was used to predict a conservation score for each amino acid with the conservation scale ranging from 1 to 9 (a score of 1 being “variable” to 9 being “highly conserved”). The amino acid at position 127 of the JAK2 protein (SNP: rs56118985) had a score of between 6 and 7 (moderately conserved), and is predicted to be an exposed residue based on the algorithm. However, the amino acid residues at position 1175 of NOTCH3 (SNP: rs200504060) and at position 349 of HIF1A (SNP: rs142179458) both had scores of 3 (likely less conserved). In addition, multiple sequence alignment based on the ClustalOmega method showed the residues of interest to be conserved across different species for all three sites (Fig. 2A–C).

Figure 2.

Multiple sequence alignment of rs56118985 (A), rs200504060 (B), and rs142179458 (C) showing conservation of the amino acid at the mutation sites (arrowed) across species.


Analysis of protein stability for the JAK2 SNP rs56118985 predicted a decrease in stability for the G127D mutation by I-Mutant2.0 and INPS-MD, with free energy change values (DDG) of –0.07 and –0.1836 kcal/mol, respectively. The HOPE server analysis of the JAK2 SNP reported the mutant residue to be larger than the wild-type residue. In addition, the mutation alters the charge from neutral to negative with increased hydrophobicity. Likewise, the SNP rs200504060, which changes amino acid residue “R” at position 1175 to “W” in the NOTCH3 protein, alters the amino acid charge from positive to neutral, with the mutant residue being larger, as reported by the HOPE server. SNP rs200504060 was predicted to reduce NOTCH3 protein stability by I-Mutant2.0 and INPS-MD, with DDG values of –0.35 and –0.6867 kcal/mol, respectively. For the SNP rs142179458 within the HIF1A gene, decreased protein stability due to the D349N mutation was predicted by I-Mutant2.0 and INPS-MD, with DDG values of –0.28 and –0.5563 kcal/mol, respectively. In addition, analysis by the HOPE server reported that the amino acid charge changed from negative to neutral, thereby likely disturbing protein function due to differences in amino acid properties (Table 5).
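The sign convention used above, in which a negative DDG is read as decreased stability, can be summarized with a small Python sketch. The DDG values are those quoted in the text; the dictionary layout and the `consensus` helper are hypothetical and only illustrate how agreement between the two predictors might be checked.

```python
# Predicted DDG values (kcal/mol) as quoted in the text.
ddg_predictions = {
    "JAK2 G127D (rs56118985)": {"I-Mutant2.0": -0.07, "INPS-MD": -0.1836},
    "NOTCH3 R1175W (rs200504060)": {"I-Mutant2.0": -0.35, "INPS-MD": -0.6867},
    "HIF1A D349N (rs142179458)": {"I-Mutant2.0": -0.28, "INPS-MD": -0.5563},
}


def consensus(predictions):
    # Negative DDG is read as decreased stability, positive as increased.
    values = list(predictions.values())
    if all(v < 0 for v in values):
        return "decreased stability"
    if all(v > 0 for v in values):
        return "increased stability"
    return "discordant"


for mutation, predictions in ddg_predictions.items():
    print(mutation, "->", consensus(predictions))
```

Both predictors agree in sign for all three substitutions, although the magnitudes are small, particularly for the JAK2 variant.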

Table 5.

Prediction of protein stability for SNPs rs56118985, rs200504060, and rs142179458


Common variants associated with breast cancer risk have been identified from several GWAS. We report here the first study utilizing targeted next-generation sequencing to identify breast cancer risk loci. Through high-throughput SNP genotyping of 24 selected SNPs, three novel coding variants were found to be significantly associated with breast cancer risk in Chinese women.

We detected a novel SNP, rs56118985, associated with breast cancer risk via an additive model in a Singaporean Chinese population. Rs56118985 has been reported to be associated with acute leukemia and acute myeloid leukemia in a single study conducted in a Chinese population (28). No other association or functional studies have been done on rs56118985. Rs56118985 is located in the coding region of the JAK2 gene on 9p24.1. The JAK–STAT signaling pathway plays a role in proliferation, differentiation, and apoptosis, and has been implicated in tumorigenesis and cancer development (29). Germline JAK2 mutations have been most commonly associated with myeloproliferative neoplasms (30).

The rs56118985 variant allele (MAF = 0.0161) was present only in East Asians in a study that characterized germline variation in 158 cancer susceptibility genes by analyzing whole genome sequences of 681 healthy individuals of diverse ethnicities (31). MAFs of rs56118985 from the 1000 Genomes Project and ExAC databases were also higher in East Asian populations than in the general population (0.015 vs. 0.004 and 0.018 vs. 0.002, respectively). In a database comprising 765 healthy Singaporean individuals (http://beacon.prism-genomics.org/), we found the MAF of this variant to be 0.016. Taken together, these data suggest that this SNP has an increased MAF in East Asian populations. It is well established that genetic variants can have allele frequencies that differ among ethnicities and can confer varying degrees of disease susceptibility (32). Findings from association studies carried out in a single population may not always be applicable to other populations, which emphasizes the importance of ethnicity in these studies. GWAS have been highly successful in identifying risk loci that contribute to breast cancer susceptibility (1–4). However, the majority of these studies have been performed in European populations, and the risk loci identified do not always apply to Asian populations. The approach described in this study demonstrates the feasibility of identifying novel breast cancer risk loci using next-generation sequencing, and it could be extended to other populations and cancers.

We also identified another SNP, rs200504060, which was specifically associated with HER2-positive breast cancer. Rs200504060 resides in the coding region of NOTCH3 on 19p13.2. The Notch family of proteins is highly conserved and crucial in development. Notch proteins are involved in signaling pathways that control cell fate by influencing proliferation, differentiation, and apoptosis (33). It has been suggested that Notch signaling drives the proliferation of epithelial cells during mammary gland development and prevents their terminal differentiation (34). Thus, upregulation of Notch signaling may lead to breast tumorigenesis (34). The role of NOTCH3 in the development of breast cancer has also been established (35, 36). A study investigating the relationship between HER2 and NOTCH3 in DCIS (37) found that their expression levels are directly correlated and that upregulation and activation of the two pathways may contribute to progression to invasive breast carcinoma. This may explain why the association of rs200504060 with breast cancer risk was found only in HER2-positive breast cancer cases. The combined role of HER2 and NOTCH3 in the development of breast cancer should be further elucidated.

A third SNP, rs142179458, was found to be significantly associated with HER2-negative breast cancer. Rs142179458 is located in the coding region of HIF1A on chromosome 14q23; HIF1A encodes the alpha subunit of hypoxia-inducible factor 1 (HIF1). A number of studies have established associations between HIF1A polymorphisms and disease phenotypes, including cancer (38). However, none has identified rs142179458. Overexpression of HIF1A is brought about by decreased levels of cellular oxygen (39), and functional HIF1 activates the transcription of genes that allow cells to adapt to these hypoxic conditions. HIF1A has been found to be highly expressed in many solid tumors, and its role in promoting angiogenesis and metastasis has also been established (40). In breast cancer, levels of HIF1A have been found to correlate directly with the stage of cancer progression (41), and could potentially be used as a marker of poor prognosis (42). High expression levels of HIF1A have been associated with HER2-positive breast cancers, in which studies have demonstrated its role in further promoting cancer progression and resistance to therapy (43–45).

Numerous genetic variants are identified through sequencing studies, leaving investigators with the challenge of determining the clinical significance of these variants for the disease. Functional experiments are one way to determine whether these variants are deleterious, but because they are time-consuming and laborious they are not often carried out. Although databases such as ClinVar, HGMD, dbSNP, and OMIM have cataloged the functional effects of previously reported variants, information is often not available for novel variants detected through NGS. To overcome this, prediction tools based on theoretical knowledge and on features such as nucleotide or amino acid conservation and the biochemical properties of amino acids have been built to classify these variants as benign, likely benign, pathogenic, likely pathogenic, or of unknown significance. However, these prediction programs are based on different methods and datasets, and their interpretations vary (46). To address this limitation, tools like CADD (17) integrate the predictions of different individual methods and produce a score that classifies the variant as either benign or deleterious. In our study, we employed commonly used tools such as SIFT (13), PolyPhen (14), MutationTaster (15), and MutationAssessor (16), as well as CADD (17), to determine the pathogenicity of the variants. Utilizing multiple tools could provide a better estimate of a variant's effect on the protein. Although in silico prediction tools can suggest associations of variants with disease, conclusive evidence on the pathogenicity of a variant is best drawn from functional studies (47).
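One simple way to combine the outputs of several prediction tools is a majority vote over their categorical calls. The Python sketch below illustrates that idea only; it is not the procedure used in this study, the function name is an assumption, and the example calls are hypothetical placeholders rather than results for any of the three SNPs.

```python
from collections import Counter


def consensus_call(calls):
    """Majority vote over per-tool categorical calls.

    `calls` maps a tool name to either "deleterious" or "benign".
    Returns the majority label, or "uncertain" when the vote is tied.
    """
    votes = Counter(calls.values())
    deleterious = votes["deleterious"]  # Counter gives 0 for absent labels
    benign = votes["benign"]
    if deleterious > benign:
        return "deleterious"
    if benign > deleterious:
        return "benign"
    return "uncertain"


# Hypothetical calls for an unnamed variant; NOT results from this study.
example_calls = {
    "SIFT": "deleterious",
    "PolyPhen": "deleterious",
    "MutationTaster": "benign",
    "MutationAssessor": "deleterious",
    "CADD": "deleterious",
}

result = consensus_call(example_calls)
print(result)
```

A vote of this kind treats every tool equally; because ensemble scores such as CADD already integrate several annotations, weighting the tools differently may be more appropriate in practice.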

The three SNPs reported in this study, rs56118985, rs200504060, and rs142179458, are low-frequency (1% < MAF ≤ 5%) coding variants. There are currently limited data on the role of low-frequency variants and their contribution to the missing heritability in cancer. However, recent studies suggest that low-frequency and rare variants may have larger effect sizes than common variants (48). Low-frequency missense variants associated with lung cancer risk or epithelial ovarian cancer risk have been identified using genotyping arrays (49, 50), providing evidence that such variants are relevant in cancer susceptibility.
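The frequency classes referred to here can be written as a small helper. In the Python sketch below, the low-frequency band (1% < MAF ≤ 5%) is taken from the text, whereas labeling MAF ≤ 1% as "rare" and MAF > 5% as "common" is an assumption made for illustration; the function name is likewise hypothetical.

```python
def classify_maf(maf, low=0.01, high=0.05):
    """Assign a frequency class to a minor allele frequency.

    The low-frequency band (low < MAF <= high) follows the text;
    the "rare" and "common" labels are illustrative assumptions.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf <= low:
        return "rare"
    if maf <= high:
        return "low-frequency"
    return "common"


# MAF of rs56118985 in the Singaporean database cited in the text.
rs56118985_class = classify_maf(0.016)
print(rs56118985_class)
```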

We interrogated several publicly available databases [TCGA (www.cbioportal.org), COSMIC (http://cancer.sanger.ac.uk/cosmic), ICGC (https://dcc.icgc.org/), LOVD (http://www.lovd.nl/3.0/), IntOGen (https://www.intogen.org/), and DoCM (http://docm.genome.wustl.edu/)] for the variants rs56118985, rs200504060, and rs142179458; only rs200504060 was reported in any of these databases. It was detected in just one lung cancer sample out of 14 exome-sequenced tumor–normal matched samples from patients with lung carcinoma (COSMIC). Possible reasons for the low frequency or lack of detection of these variants are that (i) the variants may have been filtered out during variant filtering and prioritization, or (ii) the variants are uncommon in Caucasian populations. Further studies in additional diverse populations are warranted to determine the frequency of the three variants identified in this study.

One limitation of this study is that the cases and controls in the Validation cohort are not age matched. Because the controls are about a decade younger than the cases, some of the controls may develop cancer in the future and would then be categorized as “cases.” Hence, the ORs reported here could be lower than those that would be obtained if the cases and controls were age matched. In a preliminary analysis of our Validation cohort with 866 cases and 886 controls that were age matched, we observed that the 3 SNPs of interest [rs56118985 (JAK2), rs200504060 (NOTCH3), and rs142179458 (HIF1A)] were significant at P < 0.05 (data not shown). With more samples added (a total of 1,516 cases and 1,189 controls), these 3 SNPs remained significant even at the more stringent cutoff imposed by Bonferroni correction. It is likely that adding more age-matched samples would yield similar results.
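For readers unfamiliar with the Bonferroni cutoff mentioned above, the Python sketch below shows how such a threshold is derived. It assumes, for illustration only, that the correction is applied across the 24 SNPs selected for genotyping; the number of tests actually used is not stated in this section, and the P values in the example are hypothetical rather than study results.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def passes_bonferroni(p_value, alpha=0.05, n_tests=1):
    """True if p_value is below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Assumption: 24 tests, one per genotyped SNP.
threshold = bonferroni_threshold(0.05, 24)
print(f"Corrected threshold: {threshold:.5f}")

# Hypothetical P values, NOT taken from the study.
print(passes_bonferroni(0.001, n_tests=24))  # below the corrected cutoff
print(passes_bonferroni(0.01, n_tests=24))   # nominally significant only
```

With 24 tests the corrected cutoff is about 0.002, so a SNP that is only nominally significant at P < 0.05 would not necessarily survive the correction.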

In summary, we identified variants in JAK2, NOTCH3, and HIF1A that are associated with breast cancer in a Chinese population through a novel strategy utilizing data derived from targeted sequencing. Additional studies in other populations are warranted to determine whether these variants are associated with breast cancer risk in other ethnicities. Our findings suggest that through the filtering pipeline described here, additional risk loci associated with cancer could be discovered from next-generation sequencing data.

M.-H. Tan is a CEO and Medical Director at Lucence Diagnostics Pte. Ltd. No potential conflicts of interest were disclosed by the other authors.

Conception and design: A.S.G. Lee

Development of methodology: P. Munusamy, A.S.G. Lee

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C.H.T. Chan, S.Y. Loke, G.L. Koh, E.S.Y. Wong, H.Y. Law, M.-H. Tan, Y.S. Yap, P. Ang

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C.H.T. Chan, P. Munusamy, Y.S. Yap, A.S.G. Lee

Writing, review, and/or revision of the manuscript: C.H.T. Chan, P. Munusamy, S.Y. Loke, G.L. Koh, E.S.Y. Wong, C.S. Yoon, Y.S. Yap, P. Ang, A.S.G. Lee

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): H.Y. Law, C.S. Yoon

Study supervision: A.S.G. Lee

The authors are grateful to the volunteers who have participated in the study. The authors also thank Dr. C.Y. Wong, Dr. W.S. Yong, Dr. N.S. Wong, Dr. R. Ng, Dr. K.W. Ong, Dr. P. Madhukumar, Dr. C.L. Oey, and Dr. G.H. Ho, for referring patients for the study.

This study was supported by a grant from the National Medical Research Council (NMRC) of Singapore (NMRC/CBRG/0034/2013) awarded to A.S.G. Lee and by Centre Grant NMRC support to the National Cancer Centre of Singapore.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Couch FJ, Wang X, McGuffog L, Lee A, Olswold C, Kuchenbaecker KB, et al. Genome-wide association study in BRCA1 mutation carriers identifies novel loci associated with breast and ovarian cancer risk. PLoS Genet 2013;9:e1003212.

2. Lindstrom S, Thompson DJ, Paterson AD, Li J, Gierach GL, Scott C, et al. Genome-wide association study identifies multiple loci associated with both mammographic density and breast cancer risk. Nat Commun 2014;5:5303.

3. Michailidou K, Beesley J, Lindstrom S, Canisius S, Dennis J, Lush MJ, et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nat Genet 2015;47:373–80.

4. Michailidou K, Hall P, Gonzalez-Neira A, Ghoussaini M, Dennis J, Milne RL, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet 2013;45:353–61.

5. Manolio TA. Genomewide association studies and assessment of the risk of disease. N Engl J Med 2010;363:166–76.

6. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29:308–11.

7. Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001;409:928–33.

8. Wong ES, Shekar S, Met-Domestici M, Chan C, Sze M, Yap YS, et al. Inherited breast cancer predisposition in Asians: multigene panel testing outcomes from Singapore. NPJ Genomic Medicine 2016;1:15003.

9. Chan M, Chan MW, Loh TW, Law HY, Yoon CS, Than SS, et al. Evaluation of nanofluidics technology for high-throughput SNP genotyping in a clinical setting. J Mol Diagn 2011;13:305–12.

10. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 2010;26:589–95.

11. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010;20:1297–303.

12. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;38:e164.

13. Sim N-L, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 2012;40:W452–7.

14. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet 2013;Chapter 7:Unit 20.

15. Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods 2014;11:361–2.

16. Frousios K, Iliopoulos CS, Schlitt T, Simpson MA. Predicting the functional consequences of non-synonymous DNA sequence variants: evaluation of bioinformatics tools and development of a consensus strategy. Genomics 2013;102:223–8.

17. Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46:310–5.

18. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 2015;526:68–74.

19. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016;536:285–91.

20. Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 2010;20:110–21.

21. Chan M, Ji SM, Liaw CS, Yap YS, Law HY, Yoon CS, et al. Association of common genetic variants with breast cancer risk and clinicopathological characteristics in a Chinese population. Breast Cancer Res Treat 2012;136:209–20.

22. Gonzalez JR, Armengol L, Sole X, Guino E, Mercader JM, Estivill X, et al. SNPassoc: an R package to perform whole genome association studies. Bioinformatics 2007;23:644–5.

23. Bodenhofer U, Bonatesta E, Horejs-Kainrath C, Hochreiter S. msa: an R package for multiple sequence alignment. Bioinformatics 2015;31:3997–9.

24. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 2011;7:539.

25. Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res 2005;33:W306–10.

26. Savojardo C, Fariselli P, Martelli PL, Casadio R. INPS-MD: a web server to predict stability of protein variants from sequence and structure. Bioinformatics 2016;32:2542–44.

27. Venselaar H, Te Beek TA, Kuipers RK, Hekkelman ML, Vriend G. Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces. BMC Bioinformatics 2010;11:548.

28. Zhong Y, Wu J, Ma R, Cao H, Wang Z, Ding J, et al. Association of Janus kinase 2 (JAK2) polymorphisms with acute leukemia susceptibility. Int J Lab Hematol 2012;34:248–53.

29. Thomas SJ, Snowden JA, Zeidler MP, Danson SJ. The role of JAK/STAT signalling in the pathogenesis, prognosis and treatment of solid tumours. Br J Cancer 2015;113:365–71.

30. Hinds DA, Barnholt KE, Mesa RA, Kiefer AK, Do CB, Eriksson N, et al. Germ line variants predispose to both JAK2 V617F clonal hematopoiesis and myeloproliferative neoplasms. Blood 2016;128:1121–8.

31. Bodian DL, McCutcheon JN, Kothiyal P, Huddleston KC, Iyer RK, Vockley JG, et al. Germline variation in cancer-susceptibility genes in a healthy, ancestrally diverse cohort: implications for individual genome sequencing. PLoS One 2014;9:e94554.

32. Henderson BE, Lee NH, Seewaldt V, Shen H. The influence of race and ethnicity on the biology of cancer. Nat Rev Cancer 2012;12:648–53.

33. Artavanis-Tsakonas S, Rand MD, Lake RJ. Notch signaling: cell fate control and signal integration in development. Science 1999;284:770–6.

34. Farnie G, Clarke RB. Mammary stem cells and breast cancer–role of Notch signalling. Stem Cell Rev 2007;3:169–75.

35. Choy L, Hagenbeek T, Solon M, French DM, Finkle D, Shelton A, et al. Constitutive NOTCH3 signaling promotes the growth of basal breast cancers. Cancer Res 2017;77:1439–52.

36. Zhang Z, Wang H, Ikeda S, Fahey F, Bielenberg D, Smits P, et al. Notch3 in human breast cancer cell lines regulates osteoblast-cancer cell interactions and osteolytic bone metastasis. Am J Pathol 2010;177:1459–69.

37. Pradeep C-R, Köstler WJ, Lauriola M, Granit R, Zhang F, Jacob-Hirsch J, et al. Modeling ductal carcinoma in situ: a HER2-Notch3 collaboration enables luminal filling. Oncogene 2012;31:907–17.

38. Gladek I, Ferdin J, Horvat S, Calin GA, Kunej T. HIF1A gene polymorphisms and human diseases: graphical review of 97 association studies. Genes Chromosomes Cancer 2017;56:439–52.

39. Semenza GL. Regulation of mammalian O2 homeostasis by hypoxia-inducible factor 1. Annu Rev Cell Dev Biol 1999;15:551–78.

40. Zhong H, De Marzo AM, Laughner E, Lim M, Hilton DA, Zagzag D, et al. Overexpression of hypoxia-inducible factor 1alpha in common human cancers and their metastases. Cancer Res 1999;59:5830–5.

41. Bos R, Zhong H, Hanrahan CF, Mommers ECM, Semenza GL, Pinedo HM, et al. Levels of hypoxia-inducible factor-1α during breast carcinogenesis. J Nat Cancer Inst 2001;93:309–14.

42. Generali D, Berruti A, Brizzi MP, Campo L, Bonardi S, Wigfield S, et al. Hypoxia-inducible factor-1alpha expression predicts a poor response to primary chemoendocrine therapy and disease-free survival in primary human breast cancer. Clin Cancer Res 2006;12:4562–8.

43. Whelan KA, Schwab LP, Karakashev SV, Franchetti L, Johannes GJ, Seagroves TN, et al. The oncogene HER2/neu (ERBB2) requires the hypoxia-inducible factor HIF-1 for mammary tumor growth and anoikis resistance. J Biol Chem 2013;288:15865–77.

44. Karakashev SV, Reginato MJ. Hypoxia/HIF1alpha induces lapatinib resistance in ERBB2-positive breast cancer cells via regulation of DUSP2. Oncotarget 2015;6:1967–80.

45. Laughner E, Taghavi P, Chiles K, Mahon PC, Semenza GL. HER2 (neu) signaling increases the rate of hypoxia-inducible factor 1alpha (HIF-1alpha) synthesis: novel mechanism for HIF-1-mediated vascular endothelial growth factor expression. Mol Cell Biol 2001;21:3995–4004.

46. Walters-Sen LC, Hashimoto S, Thrush DL, Reshmi S, Gastier-Foster JM, Astbury C, et al. Variability in pathogenicity prediction programs: impact on clinical diagnostics. Mol Genet Genomic Med 2015;3:99–110.

47. Miosge LA, Field MA, Sontani Y, Cho V, Johnson S, Palkova A, et al. Comparison of predicted and actual consequences of missense mutations. Proc Natl Acad Sci 2015;112:E5189–98.

48. Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet 2012;13:135–45.

49. Permuth JB, Pirie A, Ann Chen Y, Lin HY, Reid BM, Chen Z, et al. Exome genotyping arrays to identify rare and low frequency variants associated with epithelial ovarian cancer risk. Hum Mol Genet 2016;25:3600–12.

50. Jin G, Zhu M, Yin R, Shen W, Liu J, Sun J, et al. Low-frequency coding variants at 6p21.33 and 20q11.21 are associated with lung cancer risk in Chinese populations. Am J Hum Genet 2015;96:832–40.