Abstract
Investigating the biological mechanisms underlying genetic alterations in cancer can improve the understanding of etiology and identify potential prognostic biomarkers.
We performed an integrative genomic analysis for a total of 731 nasopharyngeal carcinoma cases from five independent nasopharyngeal carcinoma cohorts to identify the genetic events associated with clinical outcomes.
In addition to the known mutational signatures associated with aging, APOBEC and mismatch repair (MMR), a new signature for homologous recombination deficiency (BRCAness) was discovered in 64 of 216 (29.6%) cases in the discovery set including three cohorts. This signature appeared more frequently in the recurrent and metastatic tumors and significantly correlated with shorter overall survival (OS) in the primary tumors. Independent prognostic value of MMR and BRCAness signatures was revealed by multivariable Cox analysis after adjustment for clinical parameters and stratification by studies. The cases with both signatures had much worse clinical outcome than those without these signatures [hazard ratio (HR), 12.4; P = 0.002]. This correlation was confirmed in the validation set (HR, 8.9; P = 0.003). The BRCAness signature is highly associated with BRCA2 pathogenic germline or somatic alterations (7.8% vs. 0%; P = 0.002). Targeted sequencing results from a prospective nasopharyngeal carcinoma cohort (N = 402) showed that the cases carrying BRCA2 germline rare variants are more likely to have poor OS and progression-free survival.
Our study highlights the importance of defects in the DNA repair machinery in nasopharyngeal carcinoma pathogenesis and their prognostic value in the clinical setting. These signatures will be useful for patient stratification when evaluating conventional and new treatments for precision medicine in nasopharyngeal carcinoma.
In nasopharyngeal carcinoma, the general consensus for treatment is to use radiotherapy alone for stage I disease, radiotherapy with or without concurrent chemotherapy for stage II, and chemoradiotherapy for advanced-stage disease. However, 15%–58% of the cases do not respond well to conventional treatment and thus have poor clinical outcomes. There is a need for biomarkers to assist our understanding of the molecular basis of disease pathogenesis and progression and to aid clinical management in nasopharyngeal carcinoma. We have systematically examined the mutational signatures and evaluated their prognostic value in association with clinical outcome. A mutational signature relevant to homologous recombination deficiency (BRCAness), previously unappreciated in nasopharyngeal carcinoma, was discovered. Importantly, the independent prognostic value of the BRCAness signature and the mismatch repair signature is now revealed. These data show the clinical importance of DNA repair pathways in nasopharyngeal carcinoma and their potential as prognostic and predictive biomarkers for future clinical studies.
Introduction
Nasopharyngeal carcinoma is a complex disease involving Epstein–Barr virus (EBV) infection and genetic and environmental factors (1). To understand the molecular basis of nasopharyngeal carcinoma pathogenesis, we performed genomic studies characterizing the important genetic alterations using whole-exome sequencing (WES) in nasopharyngeal carcinoma tissues obtained from cases and xenografts (2, 3). Together with the results from other nasopharyngeal carcinoma genomic studies, multiple critical genetic alterations, including inactivating mutations of negative regulators in the NF-κB pathway, mutations in the PI3K/MAPK pathway, and epigenetic regulators, were discovered (2, 4, 5).
Mutational signatures are the footprints of mutational processes relevant to endogenous and exogenous factors contributing to tumorigenesis (6). Analysis of mutational signatures helps to decipher the etiology of nasopharyngeal carcinoma. Three mutational signatures relevant to aging caused by an endogenous mutational process initiated by spontaneous deamination of 5′-methylcytosine, defective DNA mismatch repair (MMR), and APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like) family of cytidine deaminases were reported in nasopharyngeal carcinoma (2, 5). However, our understanding of the molecular basis of nasopharyngeal carcinoma is far from complete, as these individual studies typically only profiled 50–110 cases.
We interrogated genomic data and clinical information from four published nasopharyngeal carcinoma studies and one prospectively collected nasopharyngeal carcinoma cohort. By increasing the statistical power to distinguish infrequent driver events from passengers, we aimed to identify additional genetic events driving tumorigenesis, gain insight into the mechanisms underlying genetic alterations, and systematically evaluate their prognostic value.
Materials and Methods
Patient cohorts and clinical information
In the discovery stage, we included Asian patients with nasopharyngeal carcinoma from three cohorts. The first nasopharyngeal carcinoma cohort was obtained from the Sequence Read Archive (SRA) database (accession No. SRP035573) from Singapore (SG cohort) and included 56 Singapore nasopharyngeal carcinoma cases (4). Our earlier WES Hong Kong study included 59 nasopharyngeal carcinoma cases (accession No. SRA288429; ref. 2), which, together with 12 additional cases (accession No. SRP265671), made up the HK-1 cohort. The third nasopharyngeal carcinoma cohort (HK-2 cohort) included 97 independent Hong Kong cases from dbGaP-NHGRI (accession No. phs001244.v1.p1; ref. 5). None of the patients with primary nasopharyngeal carcinoma had received radiotherapy or chemotherapy. In total, 178 primary tumors and 38 recurrent or metastatic nasopharyngeal carcinomas were included. The male-to-female ratio was 3.7. The independent validation cohort, comprising 113 patients with nasopharyngeal carcinoma, was obtained from another genomic study by Sun Yat-Sen University Cancer Center (Guangzhou, Guangdong, P.R. China; ref. 7) and was therefore named the GZ cohort. To evaluate the prognostic value of BRCA2 germline pathogenic variants in association with clinical outcome, we examined a total of 420 patients with nasopharyngeal carcinoma (HK-3) prospectively collected by the Area of Excellence of Hong Kong Nasopharyngeal Carcinoma Tissue Bank. After quality evaluation, 402 patients were included in the analysis for the HK-3 cohort. Following the REMARK recommendations (8), we include full details of the clinical parameters in Table 1 and Supplementary Table S1 and their relationship to patient outcome in Tables 2 and 3. The study workflow is illustrated in Supplementary Fig. S1. We include the Research Resource Identifiers (RRID) for the relevant tools and databases used in the genomic analysis.
. | Discovery set (n = 216) . | Validation set (n = 113) . | |||||||
---|---|---|---|---|---|---|---|---|---|
. | Total . | SG . | HK-1 . | HK-2 . | I2 . | Pa . | GZ . | Pb . | |
Site | Primary | 178 | 50 | 60 | 68 | 90.60% | <0.001 | 113 | |
Recurrent/metastasis | 38 | 0 | 11 | 27 | 0 | ||||
Total | 216 | 50 | 71 | 95 | 113 | ||||
Primary NPC | |||||||||
Age | Mean ± SD | 53 ± 9 | 53 ± 11 | 51 ± 13 | 0.575 | — | |||
Gender | Female | 38 | 10 | 11 | 17 | 0% | 0.632 | ||
Male | 140 | 40 | 49 | 51 | — | ||||
Unknown | 0 | 0 | 0 | 0 | — | ||||
Stage | I | 13 | 1 | 6 | 6 | 4.54% | 0.392 | 5 | |
II | 31 | 9 | 9 | 13 | 2 | ||||
III | 66 | 15 | 27 | 24 | 41 | ||||
IV | 64 | 22 | 18 | 24 | 25 | ||||
Unknown | 4 | 3 | 0 | 1 | 40 | 0.005 | |||
Metastasis (M) | Yes | 12 | 3 | 5 | 4 | 0% | 0.86 | — | |
No | 161 | 44 | 55 | 63 | |||||
Unknown | 5 | 3 | 0 | 1 | |||||
Lymph node metastasis (N) | Yes | 88 | 38 | 50 | — | 0% | 0.787 | — | |
No | 18 | 9 | 9 | — | |||||
Unknown | 4 | 3 | 1 | — | |||||
Survival status | Alive | 132 | 36 | 48 | 48 | 0% | 0.449 | — | |
Dead | 40 | 9 | 12 | 19 | |||||
Unknown | 6 | 5 | 0 | 1 | |||||
OS (month) | Mean (95% CI) | 75 (70–81) | 50 (44–57) | 77 (70–84) | 73 (65–82) | 0.291 | — | ||
Disease progression | Yes | 17 | 17 | 16 | |||||
No | 43 | — | 43 | — | — | — | 72 | ||
Unknown | 0 | 0 | 25 | 0.209 | |||||
PFS (month) | Mean (95% CI) | 70 (61–78) | — | 70 (61–78) | — | — | — | 46 (42–50) | 0.702 |
Abbreviations: NPC, nasopharyngeal carcinoma; PFS, progression-free survival.
aP value was estimated for difference among three cohorts within the discovery set (two-sided).
bP value was estimated for difference between discovery and validation cohorts (two-sided). Cases with missing or unknown information were excluded from both analyses. Bold terms indicate P < 0.05.
. | Univariate survival analysis . | ||
---|---|---|---|
Discovery set . | HR (95% CI) . | P . | |
Age | 1 (1.0–1.1) | 0.002 | |
Gender | Female | 1 (reference) | |
Male | 2.9 (0.9–9.6) | 0.081 | |
Stage | I & II | 1 (reference) | |
III | 1.0 (0.3–3.6) | 0.951 | |
IV | 3.1 (1.0–9.1) | 0.042 | |
Study | SG | 1 (reference) | |
HK-1 | 0.8 (0.3–2.6) | 0.763 | |
HK-2 | 0.9 (0.3–2.4) | 0.78 | |
Signature 3 | Negative | 1 (reference) | |
Positive | 2.8 (1.4–5.8) | 0.005 | |
MMR signature | Negative | 1 (reference) | |
Positive | 1.9 (0.7–5.0) | 0.181 | |
Multivariable survival analysisa | |||
Discovery set | HR (95% CI) | P | |
Age | 1.1 (1.0–1.1) | 0.001 | |
Gender | Female | 1 (reference) | |
Male | 2.1 (0.6–7.3) | 0.264 | |
Stage | I and II | 1 (reference) | |
III | 0.9 (0.3–3.2) | 0.891 | |
IV | 3.6 (1.2–6.2) | 0.026 | |
Signature 3 | Negative | 1 (reference) | |
Positive | 2.9 (1.4–6.2) | 0.006 | |
MMR signature | Negative | 1 (reference) | |
Positive | 2.8 (1.0–7.5) | 0.046 | |
Multivariable survival analysisa | |||
Discovery set | HR (95% CI) | P | |
Age | 1.1 (1.0–1.1) | 3.8 × 10−4 | |
Gender | 2.1 (0.6–7.3) | 0.261 | |
Stage | I and II | 1 (reference) | |
III | 0.8 (0.2–2.8) | 0.704 | |
IV | 3.5 (1.1–11.2) | 0.033 | |
Signature group | MMR−/Sig3− | 1 (reference) | |
MMR+/Sig3− | 5.2 (1.1–24.1) | 0.035 | |
MMR−/Sig3+ | 9.1 (1.3–61.8) | 0.023 | |
MMR+/Sig3+ | 12.4 (2.6–60.6) | 0.002 | |
Univariate survival analysis | |||
Validation set (GZ cohort) | HR (95% CI) | P | |
Stage | I–III | 1 (reference) | |
IV | 2.6 (0.6–10.3) | 0.186 | |
Signature group | Others | 1 (reference) | |
MMR+/Sig3+ | 9.5 (2.2–40.8) | 0.002 | |
Multivariable analysis | |||
HR (95% CI) | P | ||
Stage | I–III | 1 (reference) | |
IV | 2.6 (0.6–10.6) | 0.193 | |
Signature group | Others | 1 (reference) | |
MMR+/Sig3+ | 8.9 (2.1–38) | 0.003 |
Note: HR was estimated from the Cox proportional hazard regression model; CI is the confidence interval of the estimated HR; P value was estimated from the score test (two-sided). Bold terms indicate P < 0.05.
Abbreviation: Sig3, signature 3.
aThe analysis was stratified by three different genomic studies.
. | PFS . | OS . | |||
---|---|---|---|---|---|
. | Univariate analysis . | ||||
HK-3 cohort . | HR (95% CI) . | P . | HR (95% CI) . | P . | |
Age | 1 (0.9–1.0) | 0.164 | 1.0 (1.0–1.1) | 0.031 |
Gender | Female | 1 (reference) | 1 (reference) | ||
Male | 1.8 (0.9–3.5) | 0.081 | 1.9 (0.9–38) | 0.094 | |
Stage | I & II | 1 (reference) | 1 (reference) | ||
III | 2.5 (2.7–29) | 0.138 | 1.9 (0.6–6.2) | 0.32 | |
IV | 8.8 (2.7–29) | 3.3 × 10−4 | 7.9 (2.4–26) | 0.001 | |
BRCA2 germline variants | Negative | 1 (reference) | 1 (reference) | ||
Positive | 2.0 (1.1–3.6) | 0.027 | 2.0 (1.1–3.9) | 0.034 |
Multivariable analysis | |||||
HK-3 cohort | HR (95% CI) | P | HR (95% CI) | P | |
Age | 1 (1.0–1.1) | 0.085 | 1.0 (1.0–1.1) | 0.014 | |
Gender | Female | 1 (reference) | 1 (reference) | |
Male | 1.4 (0.7–2.8) | 0.284 | 1.4 (0.7–3.0) | 0.349 |
Stage | I and II | 1 (reference) | 1 (reference) | ||
III | 2.3 (0.7–7.6) | 0.163 | 1.8 (0.5–6.1) | 0.347 | |
IV | 8.1 (2.4–27) | 0.001 | 7.5 (1.0–3.7) | 0.001 | |
BRCA2 germline variants | Negative | 1 (reference) | 1 (reference) | ||
Positive | 1.9 (1.0–3.4) | 0.042 | 1.9 (1.0–3.7) | 0.046 |
Note: HR was estimated from the Cox proportional hazard regression model; CI is the confidence interval of the estimated HR; P value was estimated from the score test (two-sided). Bold terms indicate P < 0.05.
Genomic data processing
For the discovery cohort, raw sequencing reads were aligned to the human genome reference (hg19) using Burrows–Wheeler aligner and were processed according to GATK Best Practices recommendations (version 3.8, RRID:SCR_001876; ref. 9). To combine the samples from different platforms, only overlapping regions (∼31 Mb) captured from the WES kits for three cohorts were considered. A total of 216 nasopharyngeal carcinoma cases were analyzed after removing eight cases that did not pass quality evaluation. Median coverage for tumor tissues for HK-1, HK-2, and SG nasopharyngeal carcinoma cohorts was 80-fold, 100-fold, and 79-fold, respectively. For the GZ cohort, the mutations were directly obtained from the previous study (7) as independent validation.
Identification of germline and somatic single-nucleotide variants and insertions and deletions
For the germline variants, two levels of quality control, variant based and genotype based, were applied to the called variants to ensure data quality. At the variant level, single-nucleotide variants (SNVs) and insertions and deletions (indels) were recalibrated separately with reference to the HapMap (RRID:SCR_002846) and 1000 Genomes known variants and grouped into multiple variant quality score recalibration (VQSR) sensitivity tranches using GATK (RRID:SCR_001876). A VQSR sensitivity tranche of 99.9% was chosen for SNVs and a tranche of 97.5% was selected for indels. After variant-based quality control, the transition-to-transversion ratio of the resulting exonic known variants was 2.75. We further applied genotype-based quality control, in which genotypes with low genotype quality (GQ < 20) were set to missing. Multi-allelic variants and variants with >10% missing genotypes were excluded. The relatedness of the cases was evaluated using the identity-by-descent (IBD) analysis in Plink (v1.90, RRID:SCR_001757; ref. 10). Related samples with an IBD score > 0.25 were removed from the analysis. Somatic SNVs and indels were detected by MuTect (RRID:SCR_000559; ref. 11). SNVs and indels with at least five supporting reads and 5% allele frequency in the overlapping regions were included in the analysis. Somatic variants with minor allele frequency (MAF) >1% in the 1000 Genomes project (RRID:SCR_008801), NHLBI Grand Opportunity Exome Sequencing Project (ESP6500), ExAC database (RRID:SCR_004068), and our in-house database for 895 Southern Chinese (12) were removed.
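The genotype-level and somatic thresholds described in this section can be summarized as simple predicates. The following is a minimal sketch for illustration only, not the pipeline used in the study: the function names, the `SomaticCall` record, and its fields are hypothetical, and treating a variant as common when its MAF exceeds 1% in any one reference database is an assumed reading of the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MIN_GQ = 20                  # genotypes with GQ < 20 are set to missing
MAX_MISSING_RATE = 0.10      # variants with >10% missing genotypes are excluded
MIN_ALT_READS = 5            # at least five supporting reads
MIN_ALLELE_FRACTION = 0.05   # at least 5% allele frequency in the tumor
MAX_POPULATION_MAF = 0.01    # variants with population MAF > 1% are removed


def mask_low_quality(genotypes: List[Optional[str]], gqs: List[int]) -> List[Optional[str]]:
    """Set genotypes with low genotype quality to missing (None)."""
    return [gt if gq >= MIN_GQ else None for gt, gq in zip(genotypes, gqs)]


def keep_germline_variant(genotypes: List[Optional[str]]) -> bool:
    """Keep a variant only if at most 10% of its genotypes are missing."""
    if not genotypes:
        return False
    missing = sum(gt is None for gt in genotypes)
    return missing / len(genotypes) <= MAX_MISSING_RATE


@dataclass
class SomaticCall:
    """Hypothetical record for one somatic SNV or indel call."""
    alt_reads: int
    depth: int
    # Database name -> minor allele frequency (assumed layout).
    population_maf: Dict[str, float] = field(default_factory=dict)


def keep_somatic_call(call: SomaticCall) -> bool:
    """Apply the read-support, allele-fraction, and population-MAF filters."""
    if call.depth <= 0 or call.alt_reads < MIN_ALT_READS:
        return False
    if call.alt_reads / call.depth < MIN_ALLELE_FRACTION:
        return False
    # Assumption: exceeding the MAF threshold in any one database removes the call.
    return all(maf <= MAX_POPULATION_MAF for maf in call.population_maf.values())
```

In this sketch, the genotype mask is applied first and the missing-rate filter is then evaluated on the masked genotypes, mirroring the order in which the two steps are described.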
Deciphering mutational signatures
The somatic mutations were converted into a matrix of the 96 possible mutated trinucleotide contexts, and de novo extraction of mutational signatures from this matrix was performed using the non-negative matrix factorization algorithm in the R package NMF (13). The cophenetic correlation coefficient was used as an indicator of the stability and reproducibility of the factorization. The R package MutationalPatterns (14) was applied to deconvolute the mutational data against the 30 Catalogue of Somatic Mutations in Cancer (COSMIC, RRID:SCR_002260) signatures and to estimate the contribution of each COSMIC signature.
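To make the 96-context representation concrete, the sketch below builds the channel labels and counts mutations for one sample. It follows the common convention of reporting each substitution on the pyrimidine strand (six substitution classes, each with four 5′ and four 3′ flanking bases). It is an illustrative Python sketch, not the R workflow used in the study; the function names and the `(trinucleotide, alt)` input format are assumptions.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# 6 substitution classes x 4 bases (5' side) x 4 bases (3' side) = 96 channels,
# written in the form "A[C>T]G".
CHANNELS = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def channel(trinucleotide: str, alt: str) -> str:
    """Map a reference trinucleotide and alternate base to its channel label."""
    if trinucleotide[1] in "AG":
        # Purine reference base: report the event on the opposite strand.
        trinucleotide = reverse_complement(trinucleotide)
        alt = alt.translate(COMPLEMENT)
    five, ref, three = trinucleotide
    return f"{five}[{ref}>{alt}]{three}"


def spectrum(mutations):
    """Count mutations per channel; `mutations` is an iterable of (trinucleotide, alt)."""
    counts = Counter(channel(tri, alt) for tri, alt in mutations)
    return [counts.get(label, 0) for label in CHANNELS]
```

Stacking one such count vector per sample gives the samples-by-96 matrix that a factorization or a fit against reference signatures would take as input.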
Targeted sequencing to identify germline rare variants at BRCA2 in the HK-3 cohort
The genomic DNA was extracted from the peripheral blood mononuclear cell fraction from 420 newly diagnosed nasopharyngeal carcinoma cases. The blood samples from the cases were prospectively collected by the Area of Excellence of Hong Kong Nasopharyngeal Carcinoma Tissue Bank. Only the cases with follow-up times longer than 24 months were included in the analysis. The study was approved by the Institutional Review Board of the University of Hong Kong (Pokfulam, Hong Kong, UW 12-239) and conducted in accordance with the Declaration of Helsinki. Written informed consent was obtained from each subject.
The library preparation and sequencing data processing were performed as described previously (2). The germline rare variants were identified as the loss-of-function mutations, including truncations and frameshift INDELs, as well as missense variants with MAF < 0.01, in the public database and our in-house database for 895 Southern Chinese (12). A total of 402 cases with good quality of data (average coverage >30-fold) were included for the survival analysis.
Sample size and statistical power estimation for survival analysis
We performed the power estimation on the basis of the method described previously (ref. 15; Supplementary Table S2). The discovery set included genomic data for a total of 178 primary tumors, of which 105 samples had adequate mutations for estimating the mutation signatures. The approximate statistical power for this set was estimated prior to analysis. We expected to detect a hazard ratio (HR) of 2.9 for a genetic event occurring in 30% of the cases (N = 105) with 80% power.
The GZ validation set consisted of 113 primary tumors, of which 71 samples had adequate mutations for estimating the mutation signatures. We expected to have a statistical power of more than 90% for detecting an HR of 4.5 for a genetic event occurring in 30% of the cases (N = 71). This high HR was considered achievable on the basis of the result in the discovery cohort. For validation of the association between BRCA2 germline variants and clinical outcomes, we assumed that the BRCA2 germline rare variants occurred in 10% of the cases, so that 417 cases were required to detect an HR of 2.6 with 90% power.
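For readers who want a feel for how such estimates behave, the sketch below uses Schoenfeld's approximation for the Cox model. This is a stand-in chosen for illustration: the method of ref. 15 is not reproduced here, so the numbers it returns need not match Supplementary Table S2. The approximation works in terms of the number of events (deaths), not the number of cases, and the function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def events_required(hazard_ratio: float, prevalence: float,
                    power: float = 0.8, alpha: float = 0.05) -> float:
    """Events needed to detect `hazard_ratio` for a binary marker present in a
    fraction `prevalence` of cases (two-sided test, Schoenfeld approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    variance_term = prevalence * (1 - prevalence) * log(hazard_ratio) ** 2
    return (z_alpha + z_beta) ** 2 / variance_term


def power_for_events(events: float, hazard_ratio: float, prevalence: float,
                     alpha: float = 0.05) -> float:
    """Approximate power achieved with a given number of observed events."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    effect = sqrt(events * prevalence * (1 - prevalence)) * abs(log(hazard_ratio))
    return _STD_NORMAL.cdf(effect - z_alpha)


if __name__ == "__main__":
    # Marker in 30% of cases, HR of 2.9, 80% power (values quoted in the text).
    print(events_required(2.9, 0.30, power=0.80))
```

Converting a required number of events into a required number of cases additionally depends on the expected event rate during follow-up, which this sketch does not model.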
Survival analysis
The survival analysis was performed using IBM SPSS (v25, RRID:SCR_002865). The association of overall survival (OS) with clinical parameters and selected mutation signatures was examined by the univariate Cox model. The primary endpoint was OS and the secondary endpoint was disease progression-free survival (PFS), if available. PFS was defined as the time from diagnosis to progressive disease or early death due to nasopharyngeal carcinoma or other causes in the HK-1 and HK-3 cohorts. Only patients with adequate mutations in the protein-coding regions (>30 mutations and one mutation/Mb) for deconvolution of the mutational signature were considered in the analysis. The assumption of proportional hazards in the Cox model was examined using the R package survival. The mutation signatures and all the clinical parameters, including overall stage and sex, were used as categorical variables, except age at diagnosis, which was used as a continuous variable. In the multivariable Cox analysis for the discovery set, the study cohort (three genomic studies) was used as a stratification factor in addition to the clinical parameters. The significance level was set at P < 0.05.
Detection of copy-number alterations and consensus clustering
The somatic copy-number alterations (SCNAs) were detected by Aberration Detection in Tumor Exome (ADTEx) using matched normal–tumor pairs, as we described previously (16). This method was tailored for WES data to infer the SCNAs (17). Of the overlapping regions captured by the three studies, we randomly selected 2,000 regions and performed unsupervised hierarchical clustering; this random sampling was repeated 1,000 times. Samples that stably clustered together in 60% of the random samplings were considered one group; otherwise, the samples were considered unclassified. Three distinct clusters were identified as SCNA-H-gain, SCNA-M-gain, and SCNA-L-gain. To evaluate the stability of the clusters, we permuted both the sample and region labels and tested the probability of the samples clustering together by chance through the same unsupervised hierarchical clustering procedure. From 100 permutations, the estimated probability of the three groups clustering randomly was P < 0.01, P < 0.01, and P = 0.05 for the SCNA-H-gain, SCNA-M-gain, and SCNA-L-gain clusters, respectively. The median copy numbers for the overlapping regions were calculated for the three clusters and illustrated on the chromosome ideograms by Phenogram (18).
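The consensus step can be illustrated with a small sketch that takes the cluster labels from repeated runs, computes how often each pair of samples lands in the same cluster, and groups samples whose co-clustering frequency reaches the 60% threshold. This is one possible reading of the procedure, not the authors' code: the clustering runs themselves are assumed to have been done elsewhere, and linking samples through connected pairs (union-find) is an assumption about how stable groups are formed.

```python
from itertools import combinations


def co_clustering_frequency(runs):
    """Fraction of runs in which each pair of samples shares a cluster label.

    `runs` is a list of label lists; runs[k][i] is the cluster label of
    sample i in resampling run k (labels need not match across runs).
    """
    n_samples = len(runs[0])
    pairs = list(combinations(range(n_samples), 2))
    together = dict.fromkeys(pairs, 0)
    for labels in runs:
        for i, j in pairs:
            if labels[i] == labels[j]:
                together[(i, j)] += 1
    return {pair: count / len(runs) for pair, count in together.items()}


def stable_groups(frequency, n_samples, threshold=0.6):
    """Group samples linked by co-clustering frequency >= threshold.

    Returns (groups, unclassified); samples with no stable partner are
    reported as unclassified.
    """
    parent = list(range(n_samples))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), freq in frequency.items():
        if freq >= threshold:
            parent[find(i)] = find(j)

    members = {}
    for sample in range(n_samples):
        members.setdefault(find(sample), []).append(sample)
    groups = [g for g in members.values() if len(g) > 1]
    unclassified = [g[0] for g in members.values() if len(g) == 1]
    return groups, unclassified
```

The same two functions could be rerun on label-permuted data to estimate how often groups of comparable size form by chance, in the spirit of the permutation test described above.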
Methylation data analysis
To evaluate the difference of host and EBV methylation between the signature-positive and -negative groups, we compared the methylome data between two groups using Illumina HumanMethylation450 BeadChip for host genome by LIMMA analysis (19) and bisulfite sequencing for EBV genome by Mann–Whitney U test. The host methylome data were obtained from our previous study in nasopharyngeal carcinoma [Gene Expression Omnibus (GEO), RRID:SCR_005012, accession No. GSE62336; ref. 20]. Of the 25 cases available from the previous study, 24 had matched WES data in the HK-1 cohort.
To examine the promoter methylation at BRCA1, BRCA2, and other selected DNA damage and repair relevant genes, we combined our methylome data for 25 nasopharyngeal carcinoma cases (GEO accession No. GSE62336; ref. 20) with another publicly available methylome dataset for 24 nasopharyngeal carcinoma cases (GEO accession No. GSE52068; ref. 21). All the data were generated using Illumina HumanMethylation450 BeadChip. The average methylation level for the multiple CpG sites at the promoter CpG islands was calculated using the normalized β value: β = M/(M + U + 100), where M and U are the signals of the methylated and unmethylated probes, respectively. β values ranged from 0 (unmethylated) to 1 (fully methylated). The average promoter methylation level of the selected genes was estimated for comparison between groups.
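As a worked illustration of the β-value formula, the sketch below computes β for individual probes and averages it over the probes of one promoter. The function names and the input layout (pairs of methylated and unmethylated signal intensities) are assumptions made for the example.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Normalized methylation level: beta = M / (M + U + 100)."""
    return methylated / (methylated + unmethylated + offset)


def mean_promoter_beta(probes) -> float:
    """Average beta over the CpG probes of one promoter.

    `probes` is a non-empty sequence of (methylated, unmethylated) signal pairs.
    """
    betas = [beta_value(m, u) for m, u in probes]
    return sum(betas) / len(betas)
```

The constant offset of 100 stabilizes β when both signals are low; as a consequence, β approaches but never exactly reaches 1 for finite signal intensities.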
Results
Mutation signature relevant to homologous recombination deficiency (BRCAness) identified in nasopharyngeal carcinoma
Low heterogeneity in clinical characteristics was observed among the three primary nasopharyngeal carcinoma cohorts in the discovery set, with Higgins and Thompson I2 < 5% (Table 1). To identify the mutational signatures, we applied NMF (13). A sharp decrease in the cophenetic correlation coefficient was observed at r = 5, indicating that substantially less stability was achieved using more than four clusters; the mutation data can therefore be grouped into four robust signatures (Supplementary Fig. S2). Besides the known age-related, MMR, and APOBEC signatures (i.e., signatures 1, 6, and 13) reported previously in nasopharyngeal carcinoma (2, 5), an additional signature was discovered (Fig. 1A and B). It corresponds to COSMIC signature 3, which is associated with failure of DNA double-strand break repair by homologous recombination (6). To rule out the possibility that this signature arose by chance, we deconvoluted the mutational data against the COSMIC signatures; the results showed that signature 3 was indeed present in a subset of nasopharyngeal carcinoma cases (Supplementary Fig. S3). Signature 3 was detectable in 29.6% of the total cases (Fig. 1C). No obvious differences in age, gender, or overall stage were found between the signature-positive and -negative groups in primary nasopharyngeal carcinoma (Supplementary Table S3). The MMR signature was found in 75.4% of the cases. Both signatures were detectable in 22.7% of the total cases (N = 216).
Signature 3 and MMR signature are independent prognostic factors in nasopharyngeal carcinoma
Signature 3 was detectable in 31.6% of recurrent or metastatic tumors from the two Hong Kong (HK) patient cohorts. This frequency was slightly higher than that in the primary tumors from the same cohorts, with marginal significance (19.5% vs. 31.6%; χ2 test P = 0.058).
In the discovery cohort, we explored the association between the mutation signatures in the primary tumors and OS. The cases with detectable signature 3 had shorter OS than those without this signature [HR, 2.8; 95% confidence interval (CI), 1.4–5.8; P = 0.005; Table 2; Fig. 2A]. The same trend was observed in the three different study cohorts (Fig. 2B–D). Among the clinical parameters, age at diagnosis and overall stage were significantly associated with survival. Although the association of the MMR signature with survival was not significant in the univariate analysis, the independent prognostic value of signature 3 and the MMR signature was revealed after adjustment for age, stage, and gender and stratification by studies in the multivariable Cox analysis (Table 2). Therefore, we categorized the cases on the basis of signature status. Group 1 comprised the cases with neither signature. Group 2 comprised the MMR signature–positive and signature 3–negative cases. Group 3 included the MMR signature–negative and signature 3–positive cases, while Group 4 included the cases carrying both signatures. Clinical outcomes of the four groups differed significantly, and the cases with both signatures had a strikingly higher risk of death compared with those with neither signature (adjusted HR, 12.4; 95% CI, 2.6–60.6; P = 0.002; Fig. 2E).
We further examined the association between these mutational signatures and PFS in our study cohort (HK-1). The result suggests that the cases with both signature 3 and the MMR signature had shorter PFS (log-rank test, P = 0.011; Supplementary Fig. S4). The association between the two signatures and clinical outcome was further validated in the GZ cohort, and its prognostic value was independent of overall stage (adjusted HR, 8.9; 95% CI, 2.1–38; P = 0.003; Table 2; Fig. 2F).
Signature 3 is highly associated with BRCA2 pathogenic alterations
Signature 3 is often attributed to pathogenic germline and somatic mutations of BRCA1 and BRCA2, as reported in breast, ovarian, and pancreatic cancers (22–24). We examined the germline and somatic alterations of BRCA1 and BRCA2 in this combined nasopharyngeal carcinoma cohort. No germline pathogenic mutation was detected at BRCA1. Interestingly, the cases carrying signature 3 were more likely to have pathogenic germline or somatic alterations at BRCA2, while no BRCA2 pathogenic alteration was detected in the cases without this signature (7.8% vs. 0%; Fisher exact test, P = 0.002; Supplementary Table S4). Although we and others have reported the nasopharyngeal carcinoma hypermethylation phenotype (20, 25), examination of promoter methylation at BRCA1 and BRCA2 in 49 nasopharyngeal carcinoma cases assayed by Illumina HumanMethylation450 BeadChip showed that these two genes are unlikely to be inactivated by promoter methylation in nasopharyngeal carcinoma (Supplementary Fig. S5), and no methylation difference was found at these two genes between the signature-positive and -negative cases (Supplementary Fig. S6).
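The Fisher exact test used for this comparison can be sketched in a few lines. The counts in the example are an assumption: they are inferred from the reported percentages and the 64 signature-positive cases among 216 (5 of 64 vs. 0 of 152) and are not taken from Supplementary Table S4. The function name is hypothetical, and the two-sided P value is computed by summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def probability(x: int) -> float:
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    low = max(0, col1 - row2)
    high = min(row1, col1)
    observed = probability(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (probability(x) for x in range(low, high + 1))
        if p <= observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Assumed counts: rows are signature 3 positive / negative,
    # columns are BRCA2 pathogenic alteration present / absent.
    print(fisher_exact_two_sided(5, 59, 0, 152))
```

Under these assumed counts the result is of the same order as the reported P value, which suggests the inferred table is at least plausible.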
Polak and colleagues reported that promoter methylation at RAD51C is relevant to this signature (22). To explore genetic lesions in other candidate genes, we examined the WES and methylome data for germline and somatic alterations leading to inactivation of other important genes in the DNA damage pathway, including RAD51C, RAD51, RAD50, PALB2, BARD1, BRIP1, NBN, ATM, ATR, CHEK2, FANCA, and FANCM (22). Overall, there was no evidence that promoter methylation or mutations of any of these genes were relevant to this signature in nasopharyngeal carcinoma (Supplementary Table S5; Supplementary Figs. S5 and S6).
BRCA2 germline rare variants are associated with OS and PFS in the HK-3 cohort
Our integrative analysis emphasizes the importance of BRCA2 genetic alterations in association with the BRCAness signature. We further examined BRCA2 germline variants by a targeted sequencing approach. We identified 27 BRCA2 germline rare variants in 57 cases, accounting for 14.1% of the total cases in the HK-3 nasopharyngeal carcinoma cohort (Supplementary Table S6). The germline rare variants had no association with age or stage, while the female cases were more likely to carry these germline rare variants than the male cases (20.4% vs. 11.9%; P = 0.043; Supplementary Table S7). The univariate analysis showed that the BRCA2 germline rare variants were significantly correlated with both OS (HR, 2.0; 95% CI, 1.1–3.9; P = 0.034) and PFS (HR, 2.0; 95% CI, 1.1–3.6; P = 0.027). This prognostic value remained after adjustment for age, gender, and overall stage (P < 0.05; Table 3).
Signature 3 and MMR signature associate with genomic instability
SCNAs are frequently reported in nasopharyngeal carcinoma (26, 27). Given that the homologous recombination pathway plays an important role in maintaining genomic integrity, we hypothesized that the cases with the BRCAness signature have elevated levels of SCNAs. In the combined cohort, three clusters with distinct SCNA patterns were uncovered (Fig. 3A). Cluster 1 (SCNA-L-gain), accounting for 67.8% of cases, was the genomically stable group and had consistent copy-number loss in chromosomes 3, 14, and 16, which are typical SCNAs in nasopharyngeal carcinoma. Both cluster 2 (SCNA-M-gain, 9.2%) and cluster 3 (SCNA-H-gain, 23%) had relatively higher genomic instability (Fig. 3B). Cluster 2 was characterized by a moderate level of copy-number gain in chromosomes 17, 19, and 22 and was associated with CCND1 amplifications (Fig. 3C). Strikingly, cluster 3 (SCNA-H-gain) had extensive copy-number gain across the genome with regional loss near telomeres in several chromosomes. For example, copy-number gain across large regions of both the short and long arms of chromosome 12 was found in this cluster (Fig. 3B). Significant elevation of signature 3 activity was found in cluster 3 (SCNA-H-gain; Fig. 3C), while higher MMR signature activity was found in cluster 2 (SCNA-M-gain). This result supports our hypothesis that high chromosomal instability is associated with the BRCAness signature. Previously, we and others reported that inactivation of multiple negative regulators in the NF-κB pathway is a driver event in nasopharyngeal carcinoma pathogenesis (2, 5). The cases in the SCNA-H-gain cluster were mutually exclusive with the cases carrying somatic inactivating alterations in the NF-κB pathway (P = 0.018; Fig. 3C), suggesting that high genomic instability is an independent mechanism underlying etiology in a subset of nasopharyngeal carcinoma cases.
Distinct host and EBV methylation in nasopharyngeal carcinoma with signature 3
The host methylation profiles of the signature-positive and -negative nasopharyngeal carcinoma cases were compared using the Illumina HumanMethylation450 BeadChip. Differential methylation analysis showed that the signature-positive cases had a distinct methylation pattern compared with the signature-negative cases (Padj < 0.05; Supplementary Table S8; Supplementary Fig. S7A).
The EBV methylation profiles of the two groups were compared by bisulfite sequencing. In general, the EBV genome was highly methylated, with several unmethylated regions corresponding to OriP, Qp, and the promoters of RPMS1 and LMP-2A, which are important for regulating expression of EBV latent genes (28). Surprisingly, the signature-positive cases had characteristic hypomethylated regions at the LMP-1 promoters and the LMP-2A gene body (Supplementary Fig. S7B–S7C). RNA sequencing (RNA-seq) data showed no statistically significant difference in LMP-1 expression between the two groups. LMP-1 staining had previously been performed in the HK-2 cohort; among 24 cases with positive LMP-1 protein expression, only six were signature 3 cases, consistent with the lack of association between LMP-1 and this signature. However, a significantly reduced ratio of LMP-1-to-LMP-2A expression was detected in the signature-positive cases (P = 0.017; Supplementary Fig. S7D).
Discussion
In the retrospective analysis for the discovery stage, WES data revealed an etiologically distinct subset of nasopharyngeal carcinoma with a new mutational signature relevant to homologous recombination deficiency. We previously evaluated the accuracy of calling the somatic mutations and germline variants systematically (2, 12). Although the BRCAness signature is frequently reported in breast, ovarian, and pancreatic cancers (6, 29, 30), it has been underappreciated in nasopharyngeal carcinoma. Because it is present in only a subset of the nasopharyngeal carcinoma cases, previous genomic analyses may have missed it owing to limited sample size. To confirm that this new signature is present in nasopharyngeal carcinoma, we used two validation strategies: (i) deconvolution of the mutations against the known signatures in the COSMIC mutation database to evaluate the correlation between the de novo extracted and known signatures and (ii) analysis of additional genomic features, including germline variants, copy-number alterations, and methylation, to determine the presence of this signature. Our results show that this signature is highly associated with BRCA2 germline and somatic alterations and that increased signature activity is associated with high genomic instability and distinct methylation patterns. These findings further support that this signature is indeed present in a subset of nasopharyngeal carcinoma cases and has an important functional impact on the molecular profiles during nasopharyngeal carcinoma pathogenesis and disease progression.
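The first validation strategy, comparing a de novo extracted signature with known reference signatures, is commonly done with cosine similarity between mutation-type spectra. The sketch below assumes that measure, since the text refers only to "correlation", and uses small hypothetical six-channel vectors in place of the real 96-channel trinucleotide spectra. The names `ref_flat` and `ref_peaked` are illustrative and do not correspond to any COSMIC signature.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero spectra."""
    if len(u) != len(v):
        raise ValueError("spectra must have the same number of channels")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("spectra must not be all zeros")
    return dot / (norm_u * norm_v)


def best_match(extracted, references):
    """Return the name of the reference spectrum most similar to `extracted`."""
    return max(references, key=lambda name: cosine_similarity(extracted, references[name]))


# Hypothetical toy spectra (real signatures have 96 channels)
references = {
    "ref_flat": [1 / 6] * 6,
    "ref_peaked": [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],
}
extracted = [0.18, 0.16, 0.17, 0.15, 0.17, 0.17]

for name, spectrum in references.items():
    print(name, round(cosine_similarity(extracted, spectrum), 3))
print("best match:", best_match(extracted, references))
```

In practice a similarity threshold is usually applied before an extracted signature is assigned to a known one; the appropriate cutoff depends on the extraction tool and is not specified here.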
DNA damage repair plays an essential role in maintaining genomic stability and is a key factor determining cancer risk, disease progression, and therapeutic response (31). Previously, we reported significant nasopharyngeal carcinoma risk associated with multiple SNPs involved in homologous recombination repair and nonhomologous end joining repair (32). Here, we further demonstrate that the somatic and germline genetic alterations relevant to homologous recombination deficiency have a significant clinical impact on patients' outcomes in nasopharyngeal carcinoma. In breast and ovarian cancers, patients with homologous recombination deficiency are generally sensitive to ionizing radiation, as well as to cross-linking agents such as cisplatin and carboplatin, which can induce double-strand breaks (33, 34), and thus have a good clinical outcome. In contrast, in nasopharyngeal carcinoma, the BRCAness signature is associated with worse clinical outcome, and this trend was consistently found in four independent nasopharyngeal carcinoma study cohorts. Furthermore, in a prospective cohort, the patients carrying the germline BRCA2 rare variants also had poor OS and PFS. Recently, a similar correlation between homologous recombination deficiency and worse clinical outcome was reported in esophageal and prostate cancers by a genomic and molecular landscape study (35). One possible explanation for this observation is that accumulated genetic and epigenetic alterations, caused by dysfunctional DNA repair pathways, may contribute to the aggressive phenotype and resistance to conventional treatment in nasopharyngeal carcinoma.
In Hong Kong, almost all patients with nasopharyngeal carcinoma are EBV positive (36). Recently, we reported that EBV infection modifies bivalent histone marks in the genes involved in multiple DNA repair pathways, including base excision repair, homologous recombination, and MMR, leading to their transcriptional suppression and higher DNA damage in EBV-positive cells (37). In this study, the risks of death and disease progression for the cases carrying both BRCAness and MMR signatures were dramatically increased compared with those without these signatures. We speculate that the cumulative suppressive effect of EBV infection on multiple genes involved in homologous recombination repair and MMR partially contributes to the relevant mutational signatures and their association with poor survival in nasopharyngeal carcinoma. Interestingly, hypomethylation of selected EBV regions and imbalanced expression of LMP-1 and LMP-2A were observed in the cases carrying the BRCAness signature. These results suggest a potential link between specific EBV latent gene expression and a dysfunctional homologous recombination repair pathway in a subset of patients with nasopharyngeal carcinoma. In addition, Zhu and colleagues reported that the EBV miRNAs, miR-15a and miR-16, target BRCA1, another key player in the homologous recombination repair pathway, and suppress its expression in nasopharyngeal carcinoma (38). That study suggests EBV miRNAs may also play a role in downregulating this pathway. Together, these results raise the possibility that dysregulation of multiple epigenetic mechanisms by EBV may act collectively to suppress the relevant genes and impair the function of this pathway, contributing to the BRCAness signature.
Systematic investigation, in a larger cohort, of the epigenetic changes that regulate expression of the genes involved in homologous recombination, including not only methylation but also histone modifications, miRNAs, and EBV–host interaction, will help to elucidate the detailed mechanisms underlying this phenomenon.
In conclusion, we used genomic data together with methylome and RNA-seq data to characterize the molecular profiles of nasopharyngeal carcinoma. Our study highlights the importance of the DNA repair pathways involved in homologous recombination repair and MMR in the molecular pathogenesis of nasopharyngeal carcinoma, as well as their prognostic value and clinical implications. Identification of the mutational signatures relevant to DNA repair pathways will be helpful for patient stratification and will provide evidence for evaluating conventional treatment and new therapies targeting these pathways in selected patients with nasopharyngeal carcinoma.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
W. Dai: Conceptualization, formal analysis, funding acquisition, validation, investigation, methodology, writing-original draft, writing-review and editing. D.L.-S. Chung: Investigation, writing-original draft. L.K.-Y. Chow: Formal analysis, investigation, writing-original draft. V.Z. Yu: Validation, investigation, methodology. L.C. Lei: Validation, investigation, writing-original draft. M.M.-L. Leong: Investigation. C.K.-C. Chan: Resources, data curation. J.M.-Y. Ko: Investigation, methodology. M.L. Lung: Conceptualization, resources, supervision, funding acquisition, methodology, writing-original draft, project administration, writing-review and editing.
Acknowledgments
We acknowledge the authors who contributed to the WES data from previous genomic studies. The public data were obtained from the SRA database (accession Nos. SRP035573 and SRA288429) and the dbGaP-NHGRI database (accession No. phs001244.v1.p1). This study was funded by a Hong Kong Research Grants Council grant (AoE/M-06/08 to M.L. Lung), the General Research Fund (17103218) from the Hong Kong Research Grants Council, and the seed fund for basic research (201611159158) from the University of Hong Kong (to W. Dai).