Abstract
The application of next-generation sequencing (NGS) technologies in cancer research has accelerated the discovery of somatic mutations; however, progress in the identification of germline variation associated with cancer risk is less clear. We conducted a systematic literature review of cancer genetic susceptibility studies that used NGS technologies at an exome/genome-wide scale to obtain a fuller understanding of the research landscape to date and to inform future studies. Methodologies and reporting varied considerably across studies. Most studies sequenced few high-risk (mainly European) families, used a candidate analysis approach, and identified potential cancer-related germline variants or genes in a small fraction of the sequenced cancer cases. This review highlights the importance of establishing consensus on standards for the application and reporting of variant-filtering strategies. It also describes the progress in the identification of cancer-related germline variation to date. These findings point to the untapped potential in conducting studies with appropriately sized and racially diverse families and populations, combining results across studies, and expanding beyond a candidate analysis approach to advance the discovery of genetic variation that accounts for the unexplained cancer heritability.
Introduction
Since 2005, the volume of publications enabled by sequencing approaches has grown at an astonishing rate. Several reviews have described the sequencing technology platforms (1) and advancements made in next-generation sequencing (NGS) over the past decade (2). The widespread availability of NGS technologies, including whole genome sequencing (WGS) and whole exome sequencing (WES), has led not only to their application in cancer research but also to their use in the clinical setting (3, 4).
Use of NGS has accelerated the discovery of somatic mutations (5) and germline mutations in Mendelian diseases (6). Approximately 60% of Mendelian disease projects have successfully identified disease gene mutations (7) using sequencing technologies, improving upon classical approaches for gene discovery (e.g., linkage analysis). In addition, the application of NGS technologies is revealing complex somatic mutational signatures associated with different types of cancer, a disease that is, by definition, a result of somatic mutations (8, 9).
Given these successes, there is hope that sequencing studies may also aid the identification of genes accounting for the expected cancer heritability. The estimated heritability is 33% for cancer overall (10), and its unexplained component remains high: in a large gene-panel testing study, Susswein and colleagues (11) reported that 91% of cancer cases tested negative for known mutations. In addition to the heritability hidden in current array-based studies and likely detectable with larger sample sizes, it has been hypothesized that the missing familial heritability may reside in rare variants of high or moderate/low penetrance that are potentially tractable by NGS technologies (12). As NGS technologies continue to evolve, they are likely to play an increasing role in cancer research in the foreseeable future.
We conducted a systematic literature review and evaluated the degree of success and limitations in identifying germline cancer susceptibility variants using NGS technologies at a genome-wide scale, that is, through WES and WGS, with the goal of learning from past efforts and obtaining a fuller understanding of the NGS-related research landscape to date. Given the transition of genomic discovery research from candidate genes (historically of limited success) to WES/WGS studies, the high cost of WES/WGS methods, and the specific challenge of sifting through millions of variants, this review focuses on the effectiveness of WES/WGS studies, rather than other gene-targeted NGS approaches, in identifying novel variants and genes involved in cancer risk. This review provides selected study-related characteristics, technologies, and methodologic details for 186 WES/WGS-related publications with the goal of informing the design of future studies. We also discuss the research needs and opportunities that could further advance the discovery of cancer susceptibility genes or variants. Note that the reviewed articles were not selected on the basis of their focus on rare versus common variants or on low- versus moderate/high-penetrance variants. However, more cost-efficient approaches based on genome-wide genotyping assays exist for studying common variants, whereas NGS technologies are necessary for studying rare variants.
Methodology
We followed the methodology for systematic literature review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (13).
Search strategy
For this report, PubMed and Embase were searched for the period from January 1, 2000 through December 31, 2018, using various search terms for exome/genome sequencing, germline susceptibility, and cancer (see Supplementary Fig. S1). Both searches were restricted to articles written in English. The references of the reviewed articles were also checked for additional in-scope articles that may have been missed by the keyword searches.
Exclusion/inclusion criteria
Articles were included in this literature review if they had used genome-wide (i.e., whole exome or whole genome) sequencing to generate germline DNA data in at least two cancer cases (even if within the same family) with the purpose of identifying cancer susceptibility genes or variants (see exclusion criteria in Supplementary Fig. S1). We excluded articles that sequenced only one cancer case because they were often case reports without a variant/gene identification research focus. Notably, we did not exclude publications that restricted the analysis to a priori or candidate genetic regions (referred to in this manuscript as the “candidate analysis approach”), due to the large methodologic diversity across such studies; instead, we opted to include all studies that used WES/WGS regardless of their use of a candidate analysis approach, and captured the prior knowledge used to select variants/genes to allow for sensitivity analysis by candidate analysis approach. The eligibility of each abstract/full-text article was assessed independently in a standardized manner by three reviewers. A fourth reviewer confirmed all inclusions and performed quality control on one third of the exclusions. The exclusion criteria were applied in no particular order, and the first reason noted by the coder was recorded, even if an article could be excluded for multiple reasons.
Data abstraction
We looked for four broad components or phases in each article: (i) a required discovery phase and optional phases of (ii) technical validation, (iii) independent replication, or (iv) functional evaluation.
(i) The discovery phase refers to the component of the article where germline DNA of more than one cancer case (and possibly controls) were sequenced by WES or WGS with the goal of identifying cancer susceptibility variants and genes.
(ii) The technical validation phase attempts to confirm some of the variants observed in the discovery phase using an alternative sequencing technology.
(iii) The independent replication phase attempts to replicate some of the variants or genes observed in the discovery phase in independent cases.
(iv) The functional evaluation phase characterizes through in silico and/or laboratory functional experiments some of the variants or genes observed in the discovery phase.
Data abstracted (see list and definitions of coding fields in Table 1; Supplementary Table S1) included publication and general study information; numbers and characteristics of cases, controls, and families used in each phase; sequencing technique; data filtering and analysis methods; in silico and experimental functional assessment; and key conclusions. We extracted information on family history of cancer, early age at diagnosis, and/or multiple primaries, which are the National Comprehensive Cancer Network (NCCN) guidelines criteria for genetic cancer risk assessment. Of note, we defined “familial studies” as those studies that sequenced samples from cancer cases (here referred to as “familial cases”) belonging to a family in which multiple cancer cases of the studied cancer type had been diagnosed (also referred to in the literature as “high-risk families”). We defined “unselected studies” as those studies that sequenced samples from cancer cases (here referred to as “unselected cases”) who were unselected for family history of cancer (also referred to in the literature as “sporadic cases”). For each variant and gene selected by the authors, we recorded the nomenclature and minor allele frequency (MAF), as reported by the authors; the number of families in which cancer status did or did not fully segregate with the variant; the number of unselected or familial cases, unaffected relatives, or unrelated controls that carried the variant; and the number of families, familial cases, unselected cases, or controls carrying any prioritized variant in the same gene. For quality control purposes, 50% of the articles were also reviewed by a second abstractor, and discrepancies between coders were resolved by consensus. Unclear information in the articles was coded according to the reviewers' best interpretation.
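To make the per-article and per-variant fields described above more concrete, the following minimal Python sketch shows one way such an abstraction record could be structured. The class and field names (`ArticleRecord`, `VariantRecord`, `families_segregating`, and so on) are hypothetical and chosen for illustration only; they are not the coding fields of Supplementary Table S1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariantRecord:
    """One author-selected variant as abstracted from an article (hypothetical schema)."""
    gene: str
    nomenclature: str                        # variant name exactly as reported by the authors
    maf: Optional[float] = None              # MAF as reported; None if not stated
    families_segregating: int = 0            # families where the variant fully segregated
    families_not_segregating: int = 0
    familial_case_carriers: int = 0
    unselected_case_carriers: int = 0
    control_carriers: Optional[int] = None   # None if controls were not tested or not reported


@dataclass
class ArticleRecord:
    """Study-level fields plus the selected variants (hypothetical schema)."""
    pmid: str
    cancer_type: str
    case_selection: str                      # "familial" or "unselected", as defined above
    n_cases_discovery: int
    variants: List[VariantRecord] = field(default_factory=list)

    def is_eligible(self) -> bool:
        # The review required germline WES/WGS data on at least two cancer cases.
        return self.n_cases_discovery >= 2
```

Using `None` rather than zero for unreported values keeps "not stated" distinguishable from "no carriers observed", a distinction that matters when summarizing incomplete reporting.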
Topic | Coding fields |
---|---|
Publication | PubMed ID; Journal; Year; Author; Title; Abstract |
Study general | Goal; Study design; Source of individuals; Cancer type; Ethnicity; Sequencing center; Data repository |
Number of individuals sequenced in discovery | Number of cases, controls, families, and cases per family sequenced in discovery phase |
Sequencing technique | Sample type; Exome and/or genome; Capture kit; Sequencer; Coverage/depth |
Processing of raw data | Aligner; Reference genome; Variant caller and calling quality control; Annotation software and sources |
Technical validation | Yes/no; Validation technology; Number of cases, controls, families, and cases per family sequenced in validation phase; Variants/genes validated |
Independent replication | Yes/no; Replication technology and analysis; Number of cases, controls, families, and cases per family sequenced in replication phase; Variants/genes replicated |
Functional validation | In silico functional analyses; Experimental functional study |
Variants and genes data analysis | Candidate analysis approach; Filtering strategy overall; Analytical methods |
Variants and genes identification | Yes/no; Identified variants and genes; Number of cases, controls, families carrying the identified variants and genes |
Authors' comments and conclusions | Challenges; Suggested next steps; Conclusions |
Derived filtering criteria/categories shown in Fig. 1 | f_1: Variant passing quality control metrics |
 | f_2: Heterozygous variant |
 | f_3: Homozygous variant |
 | f_4: Variant located in a coding region |
 | f_5: Nonsynonymous or splice variant |
 | f_6: Variant damaging according to in silico algorithms |
 | f_7: Truncating variant |
 | f_8: Variant altering protein properties according to molecular modeling |
 | f_9: Not hypervariable gene |
 | f_10: Variant absent from minor allele frequency (MAF) databases |
 | f_11: Variant rare in MAF databases |
 | f_12: Variant segregating with disease status in the family |
 | f_13: Variant present in multiple families or independent cases |
 | f_14: Variant enriched in cases compared to controls |
 | f_15: Gene mutated in multiple families or independent cases |
 | f_16: Gene enriched in cases compared to controls |
 | f_17: Variant present in disease-related databases |
 | f_18: Gene known to be linked to disease |
 | f_19: Genetic region known to be linked to disease |
 | f_20: Biological or molecular pathway known to be linked to disease |
 | f_21: Pathway analysis indicating a gene–disease link |
 | f_22: Variant confirmed through technical validation |
 | f_23: Variant or gene replicating in independent cases |
 | f_24: Variant loss of heterozygosity (LOH) observed in tumor |
 | f_25: Relevant somatic mutations observed in tumor |
 | f_26: Gene–disease link supported by functional experiment |
 | f_27: Variant splicing supported by experiment |
 | f_28: Variant–disease link supported by functional experiment |
Results
Article selection
The search yielded a total of 6,339 unique articles (see PRISMA flowchart in Supplementary Fig. S1) that were evaluated for inclusion and exclusion criteria (Supplementary Table S2). After full-text review, 186 articles met the inclusion criteria and are listed with the derived coding variables in Supplementary Tables S1 and S3. The distribution of the 186 reviewed articles by publication year shows an increase from 2 to 40 articles per year between 2011 and 2015, followed by a plateau in 2016 to 2018 (Supplementary Fig. S2).
Study design and population characteristics
In the discovery phase, 86% of articles used familial cases (11% of which were in combination with early age of onset and/or unselected cases), 12% of studies were conducted in unselected cases, and 2% of articles used early age of onset cases. Fifty-five percent of the studies included controls in the discovery phase, of which 67% were unaffected relatives of the cancer cases. Fifty-seven percent of the reviewed articles also attempted some type of replication in an independent group of cancer cases. However, only 17% of the replication phases used the same study design as the discovery phase (Supplementary Fig. S3). For example, 16% of the family-based articles that included a replication phase did not include familial cancer cases in that phase. Moreover, controls were included more often in the replication phase (80%) than in the discovery phase (55%). Likewise, unselected cases were used more frequently in the replication phase than in the discovery phase (30% compared with 12%).
One third of the studies sequenced only 2 or 3 cancer cases in the discovery phase, and only 28 (15%) of the articles sequenced more than 100 cancer cases. Technical validation of some of the variants was performed in 85% of the articles, although most of these validations were conducted in small samples (i.e., fewer than 50 cases). Replication phases generally had larger sample sizes, most including more than 100 cancer cases. Almost half (n = 73) of the 160 family-based studies sequenced a single family, and the majority (n = 115) sequenced one or two familial cancer cases per family (Table 2).
Number (%) of articles | Discovery | Independent replication | Technical validation |
---|---|---|---|
Number of cancer cases | | | |
2–3 | 62 (33%) | 0 (0%) | 41 (22%) |
4–10 | 38 (20%) | 3 (2%) | 48 (26%) |
11–50 | 39 (21%) | 17 (9%) | 35 (19%) |
51–100 | 19 (10%) | 10 (9%) | 4 (2%) |
101–1,000 | 22 (12%) | 43 (23%) | 3 (2%) |
>1,000 | 6 (3%) | 34 (18%) | 1 (1%) |
Not stated | 0 (0%) | 0 (0%) | 26 (14%) |
Total^a | 186 (100%) | 107 (57%) | 158 (85%) |
Number of high-risk families | | | |
1 | 73 (39%) | 3 (2%) | |
2–10 | 27 (15%) | 6 (3%) | |
11–50 | 40 (22%) | 23 (12%) | |
51–100 | 10 (5%) | 8 (4%) | |
101–1,000 | 6 (3%) | 23 (12%) | |
>1,000 | 2 (1%) | 5 (3%) | |
Not stated | 2 (1%) | 4 (2%) | |
Total^a | 160 (86%) | 72 (39%) | |
Average number of sequenced cases per family | | | |
1 | 49 (26%) | 56 (30%) | |
2 | 66 (35%) | 10 (5%) | |
3 | 27 (15%) | 1 (1%) | |
4 | 10 (5%) | 1 (1%) | |
5 | 4 (2%) | 0 (0%) | |
6–7 | 3 (2%) | 0 (0%) | |
Not stated | 1 (1%) | 5 (2%) | |
Total^a | 160 (86%) | 72 (39%) | |
^a Some totals do not add to 100% because 15% of articles did not perform technical validation; 43% of articles did not perform independent replication; 14% of articles did not include familial cases in the discovery phase; and 61% of articles did not perform independent replication in familial cases.
The most commonly studied cancer types were breast cancer (15%), followed by hematologic malignancies (15%, which included pediatric cases), colorectal cancer (10%), melanoma (7%), lung cancer (7%), and prostate cancer (5%; Supplementary Fig. S4). Information on race, ethnicity, or country of origin was reported in 85% of the articles reviewed and mostly referred to the region or country of origin (Supplementary Fig. S5), with only a few studies reporting sequencing-derived ancestry. Over half of the studies were conducted in Caucasians or individuals from Europe (59%), followed by individuals from Asia (13%), the Middle East (7%), of African descent (3%), from Latin America (2%), and from Australia (2%).
Sequencing technologies, read alignment, variant calling, and annotation
Sequencing was performed on DNA extracted from blood (65%), formalin-fixed paraffin-embedded, generally nontumor, tissues (8%), and/or saliva (5%). However, a notable proportion of articles (22%) did not state the DNA source. Moreover, the amount of DNA used for sequencing (∼1–3 μg per sample) was reported in only 33% of the articles (Supplementary Table S1).
Ten articles (5%) analyzed WGS only, seven studies (4%) conducted both WGS and WES, and 169 studies (91%) analyzed WES only. The reviewed studies used 28 different capture methods and 14 different sequencing platforms (Supplementary Fig. S6), whereas 12% and 8% did not report the capture method or sequencer used, respectively. Sequencing coverage information was reported for only 71% of the articles, as average depth (52%) and/or percentage of the target genome covered at 10× or higher thresholds (42%; Supplementary Fig. S7). For most articles, it was unclear whether the reported coverage statistics referred to the targeted or actual coverage and to pre-quality control or post-quality control coverage. In addition, there was no correlation between the number of samples sequenced and the reported coverage depth.
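The two coverage summaries mentioned above, average depth and the percentage of the target covered at 10× or higher, can be computed from per-base depths as in the following sketch. The function name and the toy depth values are illustrative assumptions, not data from any reviewed study.

```python
from typing import Dict, Sequence


def coverage_summary(depths: Sequence[int], threshold: int = 10) -> Dict[str, float]:
    """Summarize per-base read depths over the targeted positions.

    Returns the mean depth and the percentage of positions covered by
    `threshold` reads or more.
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    mean_depth = sum(depths) / len(depths)
    pct_at_threshold = 100.0 * sum(d >= threshold for d in depths) / len(depths)
    return {"mean_depth": mean_depth, "pct_at_threshold": pct_at_threshold}


# Toy per-base depths for eight targeted positions (made-up values).
toy_depths = [0, 5, 10, 12, 30, 40, 50, 53]
summary = coverage_summary(toy_depths)
```

In this toy example a mean depth of 25× coexists with only 75% of positions at 10× or more, which suggests why reporting both figures, and stating whether they were computed before or after quality control, is more informative than reporting either alone.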
Sequencing reads were aligned to human reference genomes: 79% of the studies reported using hg19 (also known as NCBI build 37 or GRCh37) and 3% used hg18 (also known as NCBI build 36). The most widely used aligner (52%) was the Burrows-Wheeler Aligner (BWA; ref. 14). The reference genome and alignment algorithm used were not specified in 18% and 12% of articles, respectively. Over half of the articles used Genome Analysis Toolkit (GATK; ref. 15) variant calling algorithms. One fourth of the studies used more than one algorithm to call variants, which can improve call quality, and 8% of articles did not report their variant calling method. Various quality metrics were applied to screen the sequencing reads (e.g., removal of PCR duplicates and of unmapped, nonuniquely mapped, or off-target reads) and the called variants (e.g., removal of variants with quality or coverage below a preset threshold; Supplementary Table S1). In addition, 26% of the articles used additional control sequencing datasets generated in-house via the same technology and pipeline as the study dataset to control for technical artifacts.
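One simple way to combine two variant callers is to keep the calls on which both agree and to flag the rest for review. The sketch below assumes that each call set has already been normalized to (chromosome, position, reference allele, alternate allele) tuples; the function name and the example calls are hypothetical, and the reviewed studies may have combined callers differently.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, 1-based position, ref allele, alt allele)


def concordance(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> Dict[str, Set[Variant]]:
    """Partition two call sets into shared and caller-specific variants."""
    set_a, set_b = set(calls_a), set(calls_b)
    return {
        "both": set_a & set_b,     # calls supported by both callers
        "only_a": set_a - set_b,   # candidates for manual review or technical validation
        "only_b": set_b - set_a,
    }


# Made-up calls for illustration only.
caller_a = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")]
caller_b = [("chr1", 100, "A", "G"), ("chr3", 300, "G", "C")]
result = concordance(caller_a, caller_b)
```

Note that the two calls at chr3:300 differ in the alternate allele and are therefore treated as discordant; exact-match comparison of this kind depends on both callers representing variants in the same normalized form.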
The software used to annotate the called variants was not specified in 30% of the articles; ANNOVAR (16) was used in 36% of the articles. Eighty-nine percent of the articles used allele frequency information from publicly available databases, mainly from the 1000 Genomes Project (ref. 17; 64%), the NHLBI Exome Sequencing Project (ESP; ref. 18; 47%), dbSNP (ref. 19; 46%), and the Exome Aggregation Consortium (ExAC; ref. 20; 32%). Several other annotation tools and source databases were used (Supplementary Table S1).
Criteria used to filter variants and genes
Figure 1 shows which criteria or filters (see the f_1, …, f_28 variables described in Table 1) were used in each reviewed article. Filtering criteria were generally used to prioritize/select a variant/gene over others and/or to seek evidence in support of a selected variant/gene. Data in Fig. 1 show that the criteria used to identify variants and genes with a role in cancer susceptibility are disparate across articles, and consequently, results cannot be directly compared across studies. Below, we examine these filtering/selection criteria and summarize their use and outcomes by grouping them into seven broader themes: (i) variant quality; (ii) variant effect; (iii) variant rarity; (iv) mode of inheritance and genetic disease association; (v) candidate analysis approaches; (vi) independent replication; and (vii) functional validation. Two general observations hold true: (i) lack of sensitivity analyses to assess variability in results by changes in variant/gene selection strategy, and (ii) no or minimal reporting of a justification for the choice of criteria and thresholds used.
(i) Variant quality (f_1, f_9, f_22). Approximately half of the reviewed articles explicitly described the use of variant call metrics to exclude low-quality variants from the analyses, for example, manual inspection using the Integrative Genomics Viewer (21) or removal of variants in paralogous or repeat regions and/or with Phred-scaled quality scores or coverage below a given threshold (Supplementary Table S1). Approximately 85% of the articles reported technically validating variants, most often by Sanger sequencing. The technical validation success rate was about 80% for studies that tested over 50 variants versus above 90% for studies that tested fewer variants (Supplementary Fig. S8).
(ii) Variant effect (f_4, f_5, f_6, f_7, f_8). Most articles (n = 168 or 90%) required the variants to be in coding regions, and more specifically, to be nonsynonymous or in splice sites or frameshift (n = 163 or 88%). A subset of these articles also required the selected variants to be functionally impactful (e.g., “deleterious”, “damaging”, or “pathogenic”) according to various in silico algorithms (n = 91 or 49%) or to be truncating (n = 38 or 20%). Supplementary Table S1 lists for each article the adopted in silico pathogenicity predictors (e.g., refs. 22–29) that were often used in combination.
(iii) Variant rarity (f_10, f_11). Most articles required the selected variants to be absent/not described (25%) or rare (62%) based on a preset MAF threshold (0–0.1 range, Supplementary Fig. S9) in internal or publicly available control datasets, such as 1000 Genomes Project, dbSNP, ESP, or others (Supplementary Table S1).
(iv) Mode of inheritance (f_2, f_3, f_12) and genetic disease association (f_13, f_14, f_15, f_16, f_21). Only 10 (5%) and 17 (9%) articles restricted their search to homozygous and heterozygous variants according to a recessive or a dominant mode of inheritance, respectively. Because the remaining articles describe only heterozygous variants in their findings, a dominant inheritance hypothesis can be assumed for all but 10 articles. The majority of articles (n = 108 or 58%) required the variant to fully or partially segregate with disease status in at least one family whose members were sequenced in the discovery phase. Only two of the reviewed articles looked for de novo variants. In several studies, the selected genes were required to be mutated in more than one family (n = 19 or 10%) or in multiple independent cases (n = 5 or 3%), or to be enriched in cases compared with controls according to burden tests (n = 18 or 10%). Fewer articles required the same selected variant to be present in more than one family (n = 10 or 5%), in multiple independent cases (n = 6 or 3%), or to be statistically enriched in cases compared with controls (n = 8 or 4%). Finally, 12 articles used pathway analysis techniques to identify biological or molecular functions that were enriched with mutated genes.
(v) Candidate analysis approaches (f_17, f_18, f_19, f_20). Two thirds of the reviewed articles (n = 121, 65%) used existing information from the literature and curated databases to restrict the discovery analysis to: variants present in disease-related databases such as ClinVar (ref. 30; n = 14 or 7%); genes known to be linked to disease, such as those listed in OMIM (31) or reported in the literature (n = 59 or 32%); genetic regions known to be linked to disease through genome-wide association studies (GWAS) or linkage studies (n = 14 or 7%); and/or biological or molecular pathways known to be linked to disease, such as DNA repair pathways (n = 51 or 27%).
(vi) Independent replication (f_23). Only 107 (57%) articles attempted replication of variants/genes in an independent set of cancer cases. Overall, 79 (42%) reported various degrees of confirmatory evidence. In some cases, the authors reported the presence of other pathogenic variants in the same gene, whereas in others, a statistically significant burden test in cases compared with controls for that gene was reported. In a few studies, the exact variant(s) initially found in the discovery phase were found in additional cancer cases in the replication phase.
(vii) Functional validation (f_24, f_25, f_26, f_27, f_28). Among the studies that evaluated the function of the identified variants/genes (70% of 186), 60 (32%) tested for loss of heterozygosity in tumor samples (58% of which tested positive); 36 (19%) looked for somatic mutations in the same gene (69% of which were found); 22 (12%) looked for gene expression or methylation changes supporting a link with disease (86% with positive results); 35 (19%) checked for variant splicing (86% of which were verified); and 45 (24%) carried out in vitro experiments or other functional assays on the identified variants, 80% of which showed results consistent with the hypothesized function for these variants.
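The filtering themes above can be read as a cascade of predicates applied to each called variant. The following sketch chains quality (f_1), effect (f_5), rarity (f_11), and segregation (f_12) filters and then reruns the cascade over several MAF thresholds, which is one simple form of the sensitivity analysis that the reviewed articles generally lacked. All field names, thresholds, consequence labels, and variants are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Consequence labels treated as protein-altering in this sketch (an assumption).
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


@dataclass
class CalledVariant:
    name: str
    quality: float          # Phred-scaled call quality
    depth: int              # read depth at the site
    consequence: str        # predicted effect on the transcript
    maf: Optional[float]    # population MAF; None if absent from the database
    carriers_affected: int  # sequenced affected relatives carrying the variant
    n_affected: int         # sequenced affected relatives in the family


def passes(v: CalledVariant, max_maf: float,
           min_qual: float = 30.0, min_depth: int = 10) -> bool:
    if v.quality < min_qual or v.depth < min_depth:   # f_1: quality control
        return False
    if v.consequence not in PROTEIN_ALTERING:         # f_5: nonsynonymous or splice
        return False
    if v.maf is not None and v.maf > max_maf:         # f_11: rare in MAF databases
        return False
    return v.carriers_affected == v.n_affected        # f_12: full segregation


def sensitivity(variants: List[CalledVariant], thresholds: List[float]) -> Dict[float, int]:
    """Number of variants retained at each MAF threshold."""
    return {t: sum(passes(v, max_maf=t) for v in variants) for t in thresholds}


# Made-up variants for illustration only.
toy = [
    CalledVariant("v1", 60, 40, "frameshift", None, 3, 3),
    CalledVariant("v2", 55, 35, "missense", 0.004, 3, 3),
    CalledVariant("v3", 50, 30, "missense", 0.03, 3, 3),
    CalledVariant("v4", 20, 8, "nonsense", None, 3, 3),        # fails quality control
    CalledVariant("v5", 70, 50, "synonymous", 0.0001, 3, 3),   # fails effect filter
    CalledVariant("v6", 65, 45, "missense", 0.0005, 2, 3),     # does not fully segregate
]
counts = sensitivity(toy, [0.001, 0.01, 0.05])
```

With these toy inputs, the number of retained variants grows from 1 to 3 as the MAF threshold is relaxed from 0.001 to 0.05, illustrating how strongly a final candidate list can depend on a threshold that is often reported without justification.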
Variants and genes identified
About 95% (n = 176) of reviewed articles indicated that they identified variants or genes (listed in Table 3 with PMIDs by cancer type) with various degrees of certainty. Only eight (4%) articles clearly stated that they were not able to identify variants or genes in the studied cases, and the remaining two (1%) articles pointed to molecular or functional pathways of possible relevance to the studied cancer type. The 176 articles indicated as primary findings (Supplementary Table S3) a total of approximately 2,095 variants (average 11, median 3, range 1–222 per article) and approximately 1,215 (∼954 unique) genes (average 6, median 1, range 1–<222 per article). An exact count of variants and genes identified was not feasible due to incomplete counts and/or variant nomenclature in some of the articles. For the 99 articles that studied more than one high-risk family and reported the information, the identified variants/genes accounted on average for 25% of the families evaluated in discovery and replication phases combined (we excluded from this analysis 43 articles that studied only a single family and did not attempt replication; Supplementary Fig. S10). Regarding the prevalence of the identified variants among controls, 27 (16% of 176) articles did not sequence these variants in controls; 50 (28%) sequenced them only in some unaffected relatives of the cases; 37 (21%) did not report how many of the sequenced controls carried the investigated variants; and in the remaining articles, the variants' frequency in unrelated controls averaged 0.015, with a median of 0 (Supplementary Table S3).
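The "families accounted for" summary above can be illustrated with a short calculation. The sketch below assumes that the average is taken over per-article proportions, which the text does not state explicitly; the function name and the counts are made up for illustration and are not the reviewed data.

```python
from statistics import mean
from typing import List, Tuple


def mean_fraction_explained(articles: List[Tuple[int, int]]) -> float:
    """Average, across articles, of families carrying an identified variant
    divided by families evaluated.

    Each tuple is (families_with_variant, families_evaluated). Articles that
    evaluated a single family are skipped, loosely mirroring the exclusion
    described above.
    """
    fractions = [carriers / total for carriers, total in articles if total > 1]
    if not fractions:
        raise ValueError("no article evaluated more than one family")
    return mean(fractions)


# Made-up (carriers, evaluated) counts for four hypothetical articles.
toy_articles = [(1, 1), (2, 4), (5, 10), (0, 20)]
avg = mean_fraction_explained(toy_articles)
```

Averaging per-article proportions gives every article equal weight regardless of size; pooling all families before dividing would instead weight large studies more heavily and can give a different answer.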
Overall, 106 genes were identified in two or more articles (see bolded gene symbols in Table 3). The five genes reported by more than 10 articles are well-established cancer susceptibility genes (i.e., ATM, BRCA2, BRCA1, TP53, and PALB2 were observed in 12%, 12%, 9%, 9%, and 8% of the articles, respectively). When the analysis was restricted to the articles that used a fully agnostic, not candidate, analytic approach, these genes were observed less frequently (6%, 4%, 7%, 0%, 2%, respectively) and other less established genes were more frequently observed (>4%; i.e., PMS2, IGSF22, ABCA10, ACAN, and PABPC3). We also observed 43 variants in 22 genes that were independently identified in two or three articles (Table 4). While some of the observed pleiotropic effects are well established (e.g., PALB2 and BRCA1/2 for breast, ovarian, prostate, and pancreatic cancer), others are potentially novel, such as BRCA2 for melanoma and head and neck cancers, MUTYH for prostate and small intestine cancers, and KDR for prostate cancer and Hodgkin lymphoma.
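Counting how many articles report each gene, overall and within the subset that did not use a candidate approach, can be sketched as follows. The input structure, function name, and example articles are hypothetical and do not reproduce Table 3; `GENE_X` is a placeholder, not a real gene symbol.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set


class Article(NamedTuple):
    pmid: str
    genes: Set[str]
    candidate_approach: bool


def gene_counts(articles: Iterable[Article]) -> Counter:
    """Number of articles reporting each gene (each gene counted once per article)."""
    return Counter(gene for article in articles for gene in article.genes)


# Made-up articles for illustration only.
toy = [
    Article("A", {"BRCA2", "ATM"}, True),
    Article("B", {"BRCA2"}, True),
    Article("C", {"ATM", "GENE_X"}, False),
    Article("D", {"GENE_X"}, False),
]
overall = gene_counts(toy)
agnostic = gene_counts(a for a in toy if not a.candidate_approach)
recurrent = sorted(g for g, n in overall.items() if n >= 2)
```

In this toy input, BRCA2 is recurrent overall but disappears from the agnostic subset, the same kind of shift that the restricted analysis above is designed to reveal.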
Gene | Variant | Allele frequency | Articles' PMID | Cancer type |
---|---|---|---|---|
PALB2 | chr16, c.172_175delTTGT, p.Q60fs, rs1214293842 | 0.000042 | 25330149, 30128536 | Breast, Ovarian and breast |
PALB2 | chr16, c.509_510delGA, p.R170fs, rs863224790 | 0.000037 | 25330149, 30128536 | Breast, Ovarian and breast |
PALB2 | chr16, c.1240C>T, p.R414*, rs180177100 | 0.000009 | 27449771, 30128536 | Pancreatic, Ovarian and breast |
PALB2 | chr16, c.3256C>T, p.R1086*, rs587776527 | 0.000009 | 23561644, 30128536 | Pancreatic, Ovarian and breast |
PALB2 | chr16, c.3004_3007delGAAA, p.E1002Tfs | 0 | 26485759, 30128536 | Prostate, Ovarian and breast |
PALB2 | chr16, c.3549C>A, p.Y1183*, rs118203998 | 0.000009 | 26689913, 30128536 | Multiple_12, Ovarian and breast |
PALB2 | chr16, c.424A>T, p.K142* | 0 | 26689913, 30128536 | Multiple_12, Ovarian and breast |
BRCA2 | chr13, c.658_659delGT, p.V220fs, rs876660049 | 0.000028 | 26689913, 28202063 | Multiple_12, Breast |
BRCA2 | chr13, c.6275_6276delTT, p.Leu2092fs, rs11571658 | 0.000065 | 25330149, 29915322 | Breast, Prostate |
BRCA2 | chr13, c.9246_9247insA, p.T3085fs, rs80359752 | 0 | 25330149, 29915322 | Breast, Prostate |
BRCA2 | chr13, c.865A>C, p.N289H, rs766173 | 0.052597 | 29317335, 29747023 | Melanoma, Head and neck |
BRCA2 | chr13, c.9294C>G, p.Y3098*, rs80359200 | 0.000009 | 26580448, 29625052 | Pediatric, Multiple_33 |
ATM | chr11, c.5071A>C, p.S1691R, rs1800059 | 0.001937 | 28202063, 28652578 | Breast, Blood (CLL) |
ATM | chr11, c.170G>GA, p.W57* | 0 | 22585167, 30128536 | Pancreatic, Ovarian and breast |
ATM | chr11, c.6095G>GA, p.R2032K, rs139770721 | 0.000027 | 22585167, 29625052 | Pancreatic, Multiple_33 |
ATM | chr11, c.6100C>T, p.R2034*, rs532480170 | 0.000009 | 27913932, 30128536 | Breast, Ovarian and breast |
ATM | chr11, g.108155008_delG, p.E1267fs | 0 | 28652578, 29625052 | Blood (CLL), Multiple_33 |
BRCA1 | chr17, c.1067A>G, p.Q356R, rs1799950 | 0.045162 | 25923920, 26485759 | Breast, Prostate |
BRCA1 | chr17, c.4065_4068delTCAA, p.N1355fs, rs886040195 | 0.000018 | 26689913, 29625052 | Multiple_12, Multiple_33 |
BRCA1 | chr17, c.1054G>T, p.E352*, rs80357472 | 0.000009 | 26689913, 29625052 | Multiple_12, Multiple_33 |
BRCA1 | chr17, c.68_69delAG, p.E23fs, rs80357410 | 0.000175 | 26689913, 29625052 | Multiple_12, Multiple_33 |
TP53 | chr17, c.733C>T, p.G245S, rs28934575 | 0 | 26580448, 29351919, 29602769 | Pediatric, Pediatric, Brain |
TP53 | chr17, c.524G>A, p.R175H, rs28934578 | 0 | 26580448, 30128536 | Pediatric, Ovarian and breast |
TP53 | chr17, c.743G>A, p.R248Q, rs11540652 | 0.000009 | 26580448, 30128536 | Pediatric, Ovarian and breast |
FANCM | chr14, c.5101C>T, p.Q1701*, rs147021911 | 0.001530 | 25288723, 28881617 | Breast, Ovarian |
FANCM | chr14, c.5791C>T, p.R1931*, rs144567652 | 0.001181 | 28591191, 28881617 | Ovarian, Ovarian |
KAT6B | chr10, c.4546G>T, p.D1516Y | 0 | 23800003, 24969172 | Breast, Breast |
KAT6B | chr10, c.4729C>T, p.R1577C | 0 | 23800003, 24969172 | Breast, Breast |
POT1 | chr7, c.1851_1852delTA, p.D617fs, rs758673417 | 0.000009 | 25482530, 27329137 | Glioma, Colorectal |
MSH6 | chr2, c.3261delC, p.F1088fs | 0 | 26689913, 30128536 | Multiple_12, Ovarian and breast |
CHEK2 | chr22, c.1100delC, p.T367fs | 0 | 22527104, 29351919, 30128536 | Breast, Pediatric, Ovarian and breast |
RAD51D | chr17, g.33433425G>A, p.R206*, rs387906843 | 0.000017 | 26689913, 28591191 | Multiple_12, Ovarian |
FANCC | chr9, c.553C>T, p.R185*, rs121917783 | 0.000064 | 23028338, 28125078 | Breast, Sarcoma (Ewing) |
MUTYH | chr1, c.1187G>A, p.G396D | 0 | 27084275, 28634180 | Prostate, Intestine (small) |
BLM | chr15, c.1933C>T, p.Q645*, rs373525781 | 0.000018 | 23028338, 28125078 | Breast, Sarcoma (Ewing) |
TYK2 | chr19, c.2279C>T, p.P760L | 0 | 27733777, 29351919 | Blood (ALL), Pediatric |
MAX | chr14, c.223C>T, p.R75* | 0 | 21685915, 29625052 | Pheochromocytoma, Multiple_33 |
NOTCH2 | chr1, c.3625T>G, p.F1209V, rs147223770 | 0.003217 | 26485759, 29868112 | Prostate, Breast |
XRCC2 | chr7, c.96delT, p.F32fs, rs774296079 | 0.000075 | 25330149, 26689913 | Breast, Multiple_12 |
RET | chr10, c.2370G>C, p.L790F, rs75030001 | 0.000009 | 26580448, 28125078 | Pediatric, Sarcoma (Ewing) |
GPRC5A | chr12, c.183delG, p.R61fs, rs527915306 | 0.002113 | 22527104, 24470238 | Breast, Breast |
KDR | chr4, c.3193G>A, p.A1065T, rs56302315 | 0.000324 | 26485759, 27365461 | Prostate, Blood (HL) |
MITF | chr3, c.952G>A, p.E318K, rs149617956 | 0.001330 | 28125078, 29317335 | Sarcoma (Ewing), Melanoma |
Abbreviations: ALL, acute lymphoblastic leukemia; CLL, chronic lymphocytic leukemia; HL, Hodgkin lymphoma.
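The recurrence tally summarized in Table 4 can be illustrated with a minimal sketch. It assumes each report is reduced to a (PMID, gene, variant, cancer type) record; the record layout, the function name `recurrent_variants`, and the `GENE_X` row are illustrative assumptions and not the abstraction procedure used in this review. Cancer types are compared as plain labels, so "Breast" and "Ovarian and breast" count as different types here.

```python
from collections import defaultdict

# A few (PMID, gene, variant, cancer type) records transcribed from Table 4.
REPORTS = [
    ("25330149", "PALB2", "c.172_175delTTGT", "Breast"),
    ("30128536", "PALB2", "c.172_175delTTGT", "Ovarian and breast"),
    ("28591191", "FANCM", "c.5791C>T", "Ovarian"),
    ("28881617", "FANCM", "c.5791C>T", "Ovarian"),
    ("26485759", "KDR", "c.3193G>A", "Prostate"),
    ("27365461", "KDR", "c.3193G>A", "Blood (HL)"),
    # Hypothetical singleton, included only to show that it is filtered out.
    ("00000000", "GENE_X", "c.1A>G", "Breast"),
]


def recurrent_variants(reports, min_articles=2):
    """Group reports by (gene, variant) and keep variants that were
    reported by at least `min_articles` distinct articles."""
    grouped = defaultdict(lambda: {"pmids": set(), "cancers": set()})
    for pmid, gene, variant, cancer in reports:
        grouped[(gene, variant)]["pmids"].add(pmid)
        grouped[(gene, variant)]["cancers"].add(cancer)
    return {
        key: info
        for key, info in grouped.items()
        if len(info["pmids"]) >= min_articles
    }


recurrent = recurrent_variants(REPORTS)
for (gene, variant), info in sorted(recurrent.items()):
    # Label comparison is purely textual (see the lead-in caveat).
    scope = "multiple cancer types" if len(info["cancers"]) > 1 else "single cancer type"
    print(gene, variant, sorted(info["pmids"]), scope)
```

In practice, variants would first need to be normalized to a common nomenclature and reference transcript before grouping, because the same variant can be written differently across articles.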
Discussion
Methodologic variability across reviewed articles
One major observation from this review was that the criteria used to identify variants and genes presented by the authors as having a role in cancer susceptibility varied dramatically across studies (Fig. 1). In addition, most reviewed articles lacked sensitivity analyses to assess how results varied with changes in the variant/gene selection strategy, as well as a justification for the choice of criteria and thresholds adopted. For example, although restricting the analysis to rare variants is justified in principle by the fact that high-risk/high-penetrance variants are very rare in the general population, no clear justification (e.g., based on disease penetrance estimates) was usually given for the exact choice of MAF thresholds, which can affect both false positives and false negatives. In addition, requiring that the selected variants be completely absent from internal or publicly available control datasets (25% of articles) may also lead to false negatives, given that known disease-related variants are observed in these datasets. Moreover, differences in methodologies, such as study design, sequencing technologies, depth of coverage, human genome reference used, annotation software, variant calling methods, and in silico prediction tools, have been reported to lead to differences in findings (218–219). Similarly, although the choice of transcript set and annotation software has been shown to have a substantial effect on variant annotation and on the analysis of genome sequencing studies (220), none of the reviewed articles examined or discussed these potential effects. Although this literature review spanned a decade wherein underlying technologies, costs, and bioinformatic pipelines evolved significantly, we note that 74% of the reviewed articles were published in the years 2015 to 2018. When restricting the analyses to this group of recent articles, we observed similar results.
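The kind of sensitivity analysis discussed above can be sketched in a few lines. The example below takes a handful of allele frequencies as listed in Table 4 and counts how many variants survive different frequency cutoffs; the cutoff values, the function names, and the reading of a listed frequency of 0 as "absent from the reference dataset" are assumptions made for illustration only.

```python
# (gene, variant, allele frequency) rows transcribed from Table 4.
VARIANTS = [
    ("BRCA2", "c.865A>C", 0.052597),
    ("BRCA1", "c.1067A>G", 0.045162),
    ("NOTCH2", "c.3625T>G", 0.003217),
    ("ATM", "c.5071A>C", 0.001937),
    ("FANCM", "c.5101C>T", 0.001530),
    ("BRCA1", "c.68_69delAG", 0.000175),
    ("PALB2", "c.1240C>T", 0.000009),
    ("TP53", "c.524G>A", 0.0),
]

# Illustrative cutoffs (assumptions); 0.0 mimics an "absent from controls" rule.
THRESHOLDS = [0.0, 0.0001, 0.001, 0.01, 0.05]


def retained(variants, max_af):
    """Return the variants whose allele frequency is at most `max_af`."""
    return [v for v in variants if v[2] <= max_af]


def sensitivity(variants, thresholds):
    """Map each cutoff to the number of variants that pass it."""
    return {t: len(retained(variants, t)) for t in thresholds}


counts = sensitivity(VARIANTS, THRESHOLDS)
for cutoff, n in counts.items():
    print(f"AF <= {cutoff}: {n} of {len(VARIANTS)} variants retained")
```

With these eight rows, requiring complete absence retains a single variant, whereas a cutoff of 0.05 retains seven, which illustrates how strongly the reported variant set can depend on an otherwise unexplained threshold.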
Importance of developing consensus on standards
The observed wide variation and inconsistencies in approaches and strategies underscore the importance of establishing a consensus on standards for filtering strategies and rationale for variant identification (e.g., justification for the criteria and thresholds used). The disadvantage of using different methodologies to identify germline susceptibility genes is that it limits the ability to compare results across studies. While initiatives by the American College of Medical Genetics and Genomics (221) and the NIH (222) have developed standards to assign a pathogenicity status to a given variant based on the available literature and annotations, to our knowledge, there has not been an attempt to set standards or a framework for agnostic searches of susceptibility variants or genes. On the basis of the present systematic review, we would recommend that articles in this field: (i) report information for all the relevant components described in Table 1 and Supplementary Table S1; (ii) include a complete list of identified variants/genes and a count of the individuals carrying them in a format similar to Supplementary Table S3; (iii) report and explain the choice of variant/gene filtering criteria and thresholds, including sensitivity analysis when warranted.
Variants and genes identified in the reviewed articles
Approximately 95% of the reviewed studies reported identifying susceptibility variants or genes in the studied cancer cases. However, this observation may reflect general publication bias. Overall, about 2,000 variants and 1,000 unique genes were reported as primary findings by the authors. Breast cancer studies reported the highest number of genes, possibly reflecting the large proportion of published studies on breast cancer rather than the underlying genetic architecture. Notably, more than one hundred genes were found in more than one article (see bolded gene symbols in Table 3), indicating that results recur within cancer types and suggesting pleiotropic effects across cancer types. Some of these observations may also be due to chance and/or to the wide adoption of variant/gene selection approaches based on known candidate variants, genes, or pathways. Indeed, when restricting the analysis to the articles that used a fully agnostic analysis approach, we observed a decrease in the relative frequency of reporting of these genes and an increase in relative frequency for less established genes. This observation suggests that more novel genes could be discovered by using a more expansive analysis approach. In addition, we found that 43 variants (Table 4) were each identified in two or more articles across cancer types. The identified variants/genes accounted on average for 25% of the families evaluated in discovery and replication combined, suggesting that the fraction of families explained by the genes identified through exome/genome-wide sequencing may have increased relative to previous linkage analysis and candidate gene sequencing results (10%–25% of families depending on the cancer type; ref. 223). The results collectively show that important progress has been made in the identification of cancer susceptibility genes and that pleiotropy is a common phenomenon in genetic cancer susceptibility. Nevertheless, the progress made to date is not without caveats.
Challenges limiting progress in variant/gene identification
This review reveals scientific gaps and challenges in the body of literature. Of note, many (especially rare) cancer types remain understudied (or underpublished) and over 75% of cancer-prone families remain unexplained. While the lack of identification of mutations for cancer in heavily loaded families could reflect a polygenic or omnigenic architecture in these families, several additional challenges may have limited further progress in identifying germline variants associated with cancer, as indicated by the suspected publication bias (95% of articles reported positive findings) and by the limited number of articles identified (only 186 articles across 10 years and all cancer types). First, through careful review of the literature, we observed variation in study design and case selection, even within studies. For example, many of the studies included in this review used familial cases in the discovery phase before switching to unselected cases in the replication phase (perhaps due to a lack of additional familial samples or funds), which may introduce etiologic heterogeneity (e.g., familial cases may carry different and/or more penetrant variants/genes than unselected cases) and may, in part, explain the lack of replication for some of the reviewed studies. In addition, including suitable control populations is important to ascertain the magnitude of risk, yet the frequency of the identified variants in such controls was reported in only one third of the articles. Second, focusing exclusively on the exome (only 10 of the articles were WGS) may be a limitation in complex trait genetics, for which noncoding genetic variation is believed to play a larger role than in Mendelian genetics (224, 225), a hypothesis that still needs to be verified for rare variants specifically.
A third aspect that may have limited progress is the widespread use of candidate analysis approaches that focus the discovery analysis on known variants or genes or pathways by leveraging relevant existing information to select the resulting variants/genes. Although articles that used a candidate sequencing approach were excluded, the use of candidate analysis approaches was reported in 65% of the reviewed articles. These challenges (lack of genome-wide and agnostic studies) may be due to the fact that researchers do not yet have the tools to examine agnostically the whole exome or genome effectively. Alternatively, the researchers may have had specific reasons to focus on candidate regions of interest. Whatever the origin, an important consequence should be acknowledged: much of the human exome (and genome) remains unexplored or untested for cancer. Finally, authors frequently stated a need for additional research to replicate their findings in larger and more homogeneous (e.g., by race/ethnicity or cancer histology) study populations. Indeed, although the majority of the reviewed articles used in the discovery phase a familial study design (that does not require cancer case numbers as large as unselected case–control study designs), 39% of articles exome/genome sequenced only a single family and 26% only a single member per family. Increasing the number of sequenced families and of cancer cases within each family may be an important avenue for future studies.
Technical considerations
From a technical point of view, the differences in utilization of the various technologies depend on the timing of their development and subsequent replacement by the next capture kit or sequencer version. We showed that the reviewed articles reported a technical validation rate of about 80% for studies that tested over fifty variants versus over 90% for studies that tested fewer variants; the difference may be due to prevalidation manual inspection steps (e.g., through IGV) being more feasibly applied to a limited number of variants. This observation suggests the importance for researchers using current sequencing technologies at a genome-wide scale to technically validate any observed variants. Another technical limitation stems from within-study aggregation of samples across multiple sequencing experiments, as this approach can generate biases in variant detection and false positives/negatives in variant–cancer associations, particularly for WES datasets that also vary in capture efficiency. Several strategies to control for biological and technical heterogeneity and to minimize calling discordance and erroneous findings were described in the reviewed articles, including checking for comparable depths and rare variant detection and performing alignment and variant calling of all samples simultaneously. One notable point, reinforced by the observed lack of data sharing for over 80% of the reviewed articles, is the importance of saving and storing BAM or CRAM files of the studied datasets and making them accessible to the wider research community, for both publicly and privately sponsored datasets. Advancements in long-range sequencing and other new technologies may also help address some of the described technical shortcomings in the future and influence near-term approaches to genomic analyses (226).
Importance of functional validation
Finally, most reviewed articles acknowledged the importance of functional validation (e.g., through in vitro and in vivo models) to determine whether the function of the mutated gene product is consistent with the cancer of interest and to inform the interpretation of the reported findings. Even though 65% of the articles did attempt some type of experimental validation for the final selection of variants/genes, the reported functional results were usually not considered definitive by the authors. In fact, most articles described the need for additional functional studies to determine whether the identified genes or variants play a causal role in carcinogenesis and to describe the mechanisms for these variants to impact disease.
Limitations and strengths of this literature review
Limitations of our literature review include: the lack of access to primary data, and consequent inability to systematically evaluate how different filtering choices would lead to different results in these studies; likely publication bias toward non-null results; use of the authors' definition of “identified gene or variant,” which varied greatly across the reviewed articles; and the exclusion of articles in which only a single cancer case was sequenced, which, although usually case reports, can also in principle lead to the identification of novel cancer susceptibility genes [e.g., PALB2 (227), or NPAT (228)]. Strengths of our review are the systematic inclusion/exclusion approach, the comprehensive key term search, and thorough data abstraction.
Conclusions
In conclusion, the findings from this review indicate a growth in usage of NGS technologies at the exome/genome scale to identify genes associated with cancer risk. Nevertheless, progress has been limited by a range of challenges inherent in the field. The review highlights several important next steps, including establishing consensus on standards for use and reporting of filtering strategies, describing the rationale for variant identification, developing analytic methods that truly mine the whole exome/genome, improving the accuracy and cross-study interoperability of current sequencing technologies, sharing the primary data with the research community, and performing extensive variant functional validation. It also points to the untapped potential in conducting studies with more/larger families and in more diverse populations and cancer types, harmonizing results across studies, and expanding searches beyond a candidate analysis approach.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
This research was supported by the NCI, NIH at the Division of Cancer Control and Population Sciences, the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, and the Scientific Consulting Group (to N.I. Simonds; contract number HHSN261201400011I).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.