Abstract
Lung cancer is the leading cause of cancer mortality in the United States (U.S.). Squamous cell carcinoma (SQCC) represents 22.6% of all lung cancers nationally, and 26.4% in Appalachian Kentucky (AppKY), where death from lung cancer is exceptionally high. The Cancer Genome Atlas (TCGA) characterized genetic alterations in lung SQCC, but this cohort did not focus on AppKY residents.
Whole-exome sequencing was performed on tumor and normal DNA samples from 51 lung SQCC subjects from AppKY. Somatic genomic alterations were compared between the AppKY and TCGA SQCC cohorts.
From this AppKY cohort, we identified an average of 237 nonsilent mutations per patient and, in comparison with TCGA, we found that PCMTD1 (18%) and IDH1 (12%) were more commonly altered in AppKY versus TCGA. Using IDH1 as a starting point, we identified a mutually exclusive mutational pattern (IDH1, KDM6A, KDM4E, JMJD1C) involving functionally related genes. We also found actionable mutations (10%) and/or intermediate or high-tumor mutation burden (65%), indicating potential therapeutic targets in 65% of subjects.
This study has identified an increased percentage of IDH1 and PCMTD1 mutations in SQCC arising in the AppKY residents versus TCGA, with population-specific implications for the personalized treatment of this disease.
Our study is the first report to characterize genomic alterations in lung SQCC from AppKY. These findings suggest population differences in the genetics of lung SQCC between AppKY and U.S. populations, highlighting the importance of the relevant population when developing personalized treatment approaches for this disease.
Introduction
Lung cancer is the leading cause of cancer death worldwide, with 160,000 deaths in the United States annually (1). The state of Kentucky ranks highest in lung cancer incidence and mortality, and the Appalachian region of Kentucky (AppKY) is the major driver of this extraordinary healthcare burden (ref. 2; lung cancer characteristics in Kentucky, Supplementary Fig. S1). We undertook this study of lung squamous cell carcinomas (SQCC) from residents of AppKY to provide the first genomic characterization of lung cancers from this region and to test the hypothesis that genetic mutations in AppKY SQCC are distinct from the general population and may help explain the region's extremely high cancer incidence. Essential to this effort was the full sharing of the comprehensive genomic profile of lung SQCC in The Cancer Genome Atlas (TCGA; ref. 3), which provided the comparison of the initial 178 subjects from a U.S. genomic profile that does not focus on Central Appalachians (the distribution of TCGA tissue source sites is provided in Supplementary Table S1).
Previous studies reported distinct genetic abnormalities in lung SQCCs: TP53, NFE2L2, CDKN2A, PTEN, and ALK1 (3–7), and copy-number alterations in SOX2, PDGFRA, FGFR1, WHSC1L1, and CDKN2A (3, 8–10), which are targets of therapeutic interventions. We present the results from whole-exome sequencing (WES) and analysis of 51 patients with SQCC from AppKY, which includes an overview of somatic alterations and copy-number variations, explores unique mutational patterns and provides a clinically actionable assessment of mutations in this population.
Materials and Methods
Patient cohort
This study was approved by the Institutional Review Board (IRB) of the University of Kentucky (Lexington, KY, protocol # 14-0071-P3H). Cancer, adjacent normal tissue, and associated clinical data were obtained from 51 AppKY patients (2002–2013) with previously untreated stage I–III lung SQCC, excluding mixed histology. All specimens were collected from single surgical procedures prior to any cancer therapy and fresh frozen. All surgical samples were reviewed by a board-certified pathologist and confirmed to be SQCC by the Markey Cancer Center's (MCC) Biospecimen Procurement and Translational Pathology (BPTP) Shared Resource Facility (SRF). The study was carried out in accordance with the Declaration of Helsinki and all subjects provided written informed consent in accordance with local IRB requirements.
Only residents of AppKY, as defined by the Appalachian Regional Commission's authorizing legislation, were included in the study. All subjects were confirmed to be residents of AppKY by the Kentucky Cancer Registry (KCR) using multiple confirmatory identifiers, including address, zip code, county, and phone number, and were followed long term by the SEER-KCR. MCC's Cancer Research Informatics SRF provided all demographic and survival data for this project in a deidentified manner and served as the honest broker. The cohort included 36 males and 15 females, with a median age of 65 years (range, 43–82 years), and 30 stage I, 12 stage II, and 5 stage III patients. The median follow-up was 40 months and 35% of patients were alive at the time of submission. Reflective of AppKY, 98% of patients were of Caucasian race, and 76% were current smokers. Supplementary Table S2 lists summary demographic and clinical characteristics, compared with the TCGA SQCC cohort.
Sample preparation and WES
High molecular weight genomic DNA was extracted from matched tumor and adjacent normal tissues in all 51 cases using the DNeasy Blood & Tissue Kit or the MagAttract HMW DNA Kit (both from Qiagen). Tumor contents of samples are provided in Supplementary Table S3. DNA was quantified using the Qubit dsDNA HS Assay Kit (Invitrogen) and quality was assessed using E-gel, 0.8% (Invitrogen). Exome capture and sequencing were performed using Illumina Nextera Rapid Capture Exome v1.2 targeting 212,158 exonic regions. Paired-end sequencing (2 × 110 bp) was carried out on the HiSeq 2500 sequencing platform at the University of Illinois at Urbana-Champaign (Champaign, IL).
Somatic mutation analysis
Sequencing reads were trimmed and filtered using Cutadapt (v1.4.1) (11), then aligned to human reference genome b37/hg19 using BWA (v0.7.9a) (12). PCR duplicates were removed using Picard (http://broadinstitute.github.io/picard/, v1.115). The Genome Analysis Toolkit (GATK v3.1-1) (13) was used for local indel realignment and base quality recalibration. Somatic point mutations and indels were detected using MuTect (v1.1.4) (14) and SomaticIndelDetector (GATK v2.3-9), respectively, with default settings. Mutations were annotated using Oncotator (v1.4.1.0) (15). Significantly mutated genes were identified using MutSigCV (v1.4) (16). Gene alteration rates in the AppKY and TCGA cohorts were compared using Fisher exact tests, with the Benjamini–Hochberg procedure used to calculate the FDR. For genes that showed a significant difference in alteration rates between the two cohorts, exact logistic regression was used to further evaluate the difference after adjusting for clinical and demographic variables including age, gender, stage, and smoking. The analyses were performed using R (v3.3.1) and SAS (v9.3).
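The cohort comparison described above can be illustrated with a minimal Python sketch of a two-sided Fisher exact test and the Benjamini–Hochberg adjustment. This is not the authors' R/SAS code. The function names are hypothetical, and the IDH1 counts (6 of 51 AppKY, 2 of 178 TCGA) are back-calculated here from the percentages reported in Table 1 and the cohort sizes, so they should be treated as an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of x altered samples in the first cohort.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Assumed counts: altered / unaltered in AppKY, then altered / unaltered in TCGA.
    print(fisher_exact_two_sided(6, 45, 2, 176))
```

In practice the adjustment would be applied to the full vector of per-gene P values rather than to a single gene.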
Somatic copy-number alterations (SCNA) analysis was conducted using ExomeCNV (17), an R statistical package. Exonic CNAs were inferred on the basis of the depth-of-coverage ratio between matched tumor and normal samples. Then, CNA calls were combined into larger segments using circular binary segmentation in DNAcopy (18). Gistic2.0 (19) with a confidence level of 0.95 was used on the copy ratio profiles to identify significantly amplified/deleted regions.
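The depth-of-coverage ratio idea can be sketched in a few lines of Python. This is only a simplified illustration under stated assumptions: it normalizes each exon's depth by the sample's total depth and takes a log2 tumor/normal ratio. It does not reproduce the statistical model in ExomeCNV, the segmentation in DNAcopy, or the significance testing in Gistic2.0. The function name and the `pseudocount` parameter are hypothetical.

```python
from math import log2


def log2_depth_ratios(tumor_depth, normal_depth, pseudocount=0.5):
    """Per-exon log2 ratio of library-size-normalized tumor vs. normal depth."""
    if len(tumor_depth) != len(normal_depth):
        raise ValueError("tumor and normal depth lists must have the same length")
    tumor_total = sum(tumor_depth)
    normal_total = sum(normal_depth)
    ratios = []
    for t, n in zip(tumor_depth, normal_depth):
        # The pseudocount (an assumption of this sketch) avoids division by zero.
        t_norm = (t + pseudocount) / tumor_total
        n_norm = (n + pseudocount) / normal_total
        ratios.append(log2(t_norm / n_norm))
    return ratios


if __name__ == "__main__":
    # Hypothetical depths for four exons; the third is gained in the tumor.
    print(log2_depth_ratios([100, 100, 300, 100], [100, 100, 100, 100]))
```

Positive values suggest relative gain and negative values relative loss, before any segmentation step.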
To evaluate the clinical relevance of the somatic genomic alterations identified in our cohort, we downloaded the OncoKB database (20; accessed in December 2017) to identify FDA-approved drugs for the FDA-recognized and standard care biomarkers.
The mutual exclusivity of IDH1-related gene sets was examined by MEGSA (21), which implements a likelihood ratio test. The P value was calculated on the basis of a mixture distribution with 0.5 probability at point mass zero and 0.5 probability as a χ2 distribution of 1 degree of freedom, as described in the MEGSA paper (21).
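The mixture-distribution P value described above can be written out as a short Python sketch. It assumes the likelihood ratio statistic has already been computed and covers only the final step, not the MEGSA likelihood itself. The function name is hypothetical. The tail probability of a χ2 distribution with 1 degree of freedom is obtained from the identity P(X > x) = erfc(√(x/2)).

```python
from math import erfc, sqrt


def mixture_chi2_pvalue(lrt_statistic):
    """P value under a 50:50 mixture of a point mass at zero and chi-square(1)."""
    if lrt_statistic <= 0:
        # Every outcome is at least as extreme as a statistic of zero.
        return 1.0
    # Only the chi-square component can exceed a positive statistic.
    chi2_tail = erfc(sqrt(lrt_statistic / 2.0))
    return 0.5 * chi2_tail


if __name__ == "__main__":
    print(mixture_chi2_pvalue(3.841458820694124))
```

Compared with a plain χ2 reference distribution, the mixture halves the P value for any positive statistic.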
Identification of IDH1 homologs
Human IDH1 protein sequence (NP_005887.2) was queried using BLASTP against the RefSeq database. The top 500 hits, including representatives from all major eukaryotic supergroups (except Rhizaria), were aligned using ClustalX (22) with default parameters. The neighbor joining tree was built from this alignment. From the tree, a clade containing IDH1 and IDH2 homologs was identified and sequences from representative genomes of distantly related organisms were retrieved and realigned. The maximum likelihood phylogenetic tree was built from this alignment using the MEGA6 package (23) with default parameters. IDH1 and IDH2 orthologs were identified in individual clades that had the same topology as the eukaryotic ribosomal tree (24) and checked for consistency with RefSeq annotations.
Mutagenesis
pCSC-SP-PW-GFP (also known as pBOB-GFP) was a gift from Inder Verma (Addgene plasmid #12337). pCSC-Sp-pw-IDH1-GFP and pCSC-Sp-pw-IDH1 R132H-GFP, which encode wild-type human IDH1 (GenBank: CAG38738.1) and mutant IDH1 R132H (human IDH1 protein with amino acid 132 changed from Arg to His), respectively, were a gift from Hai Yan. Both plasmids were digested with EcoRI and NotI and ligated into pcDNA3.1 to make pcDNA3.1-IDH1-GFP. Then, pcDNA3.1-IDH1-GFP was used for mutagenesis to make pcDNA3.1-IDH1 R132H-GFP, pcDNA3.1-IDH1 V178A-GFP, pcDNA3.1-IDH1 A307S-GFP, and pcDNA3.1-IDH1 L352P-GFP (shortened as WT, R132H, V178A, A307S, and L352P hereafter). Primers used for mutagenesis are listed in Supplementary Table S2. Mutagenesis was performed with Fushion E in a 50 μL reaction volume. PCR consisted of 19 cycles as follows: 95°C for 50 seconds, 60°C for 50 seconds, and 68°C for 3 minutes. The PCR product was digested with DpnI at 37°C for 1 hour and transformed into DH5α-competent cells. Colonies were selected for sequencing to verify the mutations.
Isocitrate dehydrogenase activity assays
HEK293T (ATCC) cells (see Supplementary Table S3 for the authentication) were transfected with the above plasmids using calcium phosphate transfection. Each well of cells in the 6-well plate was transfected with 1 μg of plasmid. Cells were then suspended in 0.02% Triton-X100 PBS, homogenized 48 hours posttransfection, and sonicated with three 20-second pulses. Ten micrograms of cell lysate (in 5 μL of 0.02% Triton-X100 PBS) was added to the reaction mix (including 33 mmol/L Tris-Cl pH 7.5, 2 mmol/L MnCl2, 107 μmol/L NADP+) in a total volume of 180 μL at room temperature. Twenty microliters of 800 μmol/L isocitrate was added to start the reaction. Absorbance at 340 nm was monitored for 1 hour at 10-minute intervals. NADPH production was calculated as follows: μmol of NADPH produced/mL sample/min = ΔA340 × 3 × 1000/6220. The enzyme activity assay was based on previous reports (25).
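The NADPH calculation can be expressed as a small Python sketch. It applies the formula exactly as written in the text, where 6220 is the molar extinction coefficient of NADPH at 340 nm (M^-1 cm^-1). The meaning of the factor 3 is not stated in the text and is simply carried over. The sketch assumes that the absorbance change is expressed per minute, and the helper that averages the change over readings taken at 10-minute intervals is a hypothetical convenience, as are all function names and the example readings.

```python
# Molar extinction coefficient of NADPH at 340 nm (M^-1 cm^-1); the 6220 in the formula.
NADPH_EXTINCTION_340 = 6220.0


def delta_a340_per_min(readings, interval_min=10.0):
    """Average change in A340 per minute across evenly spaced readings."""
    if len(readings) < 2:
        raise ValueError("need at least two absorbance readings")
    elapsed_min = interval_min * (len(readings) - 1)
    return (readings[-1] - readings[0]) / elapsed_min


def nadph_production(delta_a340, factor=3.0):
    """Apply the stated formula: delta_A340 x 3 x 1000 / 6220."""
    # The factor of 3 is taken from the text as given; its meaning is not specified there.
    return delta_a340 * factor * 1000.0 / NADPH_EXTINCTION_340


if __name__ == "__main__":
    # Hypothetical A340 readings at 0, 10, ..., 60 minutes.
    readings = [0.10, 0.16, 0.22, 0.28, 0.34, 0.40, 0.46]
    print(nadph_production(delta_a340_per_min(readings)))
```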
IHC staining to confirm IDH1 R132H mutation
Tissue samples were stained with anti-IDH1 R132H antibody (Dianova DIA-H09, 1:20) using Target Retrieval Solution (Dako) according to the manufacturer's instructions. Known IDH1 R132H–positive controls and negative mouse IgG controls were run simultaneously. A detailed protocol is provided in Supplementary Table S4.
Pathway construction via systems biology analysis
Gene families that are grouped on the basis of sequence or function were downloaded from the HGNC database (26) and gene families with similar or complementary functions to significantly mutated genes in our AppKY cohort were extracted. Likewise, protein–protein interactions involving the gene products of significantly mutated genes and their related gene families were extracted from protein interaction databases (27, 28) to find interaction partners. Both sets of genes were used to identify and expand mutually exclusive mutational patterns in the AppKY mutational dataset. The resulting mutually exclusive patterns were then used to construct pathways around IDH1 based on known interactions, guided by the principle that similar functions may be modified for cancer-related processes.
Results
Somatic alterations identified from the AppKY cohort
Overview of somatic alterations.
The mean coverage of WES across the targeted regions was 104 × with 92% of targeted bases being covered at ≥30 × (Supplementary Table S4). Raw sequencing data are available at dbGaP (Accession: phs001651.v1.p1). We identified 16,005 somatic single-nucleotide variants and 217 somatic insertions or deletions (indels) across 51 matched tumor and normal pairs in the protein coding regions (Supplementary Table S5). Of those mutations, 12,117 were predicted to be nonsilent mutations resulting in an amino acid change. The mean mutation rate in our cohort was 237 nonsilent mutations per patient, corresponding to 8.5 mutations per megabase (Mb). Among nonsilent mutations, transitions and transversions at CpG sites were the most commonly observed mutation types, with rates of 11.5 per Mb and 15.5 per Mb, respectively. For non-CpG sites, transitions were more frequently observed at C:G sites (3.2 per Mb) than at A:T sites (1.8 per Mb). Similarly, transversions were more frequently observed at C:G sites (8.0 per Mb) than at A:T sites (2.0 per Mb). The mutation rate for each sample is provided in Supplementary Table S6.
Significantly mutated genes.
We identified 3 genes that were significantly mutated (i.e., nonsilent mutation rates higher than background mutation rates) in the AppKY cohort with an FDR <0.2 using MutSigCV (16): TP53, PCMTD1, and IDH1. To increase the statistical power of our analysis, we followed the approach of the TCGA SQCC report (3) and performed a secondary MutSigCV (16) analysis to only consider genes causally implicated in cancer according to the COSMIC database (29). This approach enabled us to identify 11 additional genes that were significantly mutated with an FDR <0.2: PIK3CA, RNF43, MLLT10, STK11, NFE2L2, DEK, POT1, ATP2B3, HRAS, HOXA11, and HOXA13 (Fig. 1). The description of each gene symbol used in this study is provided in Supplementary Table S7.
Figure 1. Significantly mutated genes in lung SQCC. Significantly mutated genes (FDR <0.2) from WES of 51 samples from AppKY patients. The number and percentage of samples with mutations in each gene are shown on the left. Samples are displayed as columns, with the overall number of mutations, smoking status, and tumor stage plotted at the top.
Copy number variation analysis.
SCNAs were analyzed using WES data. We identified regions with significant SCNAs using Gistic2.0 (19). There were 18 peaks of significant amplification and 34 peaks of significant deletions (FDR<0.25). Significantly amplified regions were 3q27 (MCF2L2), 8p11 (FGFR1, TACC1, WHSC1L1, LETM2, RNF5P1), 11q13 (CCND1-oncogene), 7q21.2 (CDK6), 19q13, 13q34, 5p15, 8q24 (MYC-oncogene), and deleted regions were 9p21 (CDKN2A-tumor suppressor, CDKN2B), 8p23, 10q23 (PTEN, CFL1P1, KLLN), 17p13, 4q28.2 (VEGFC), 22q13.2 (CHEK2). Consistent amplification patterns were seen in certain related sets of genes such as stem cell renewal genes. Detailed SCNA results are provided in Supplementary Tables S8–S11 and Supplementary Figs. S2 and S3.
Comparative mutational analysis with other cohorts.
We first compared somatic mutations and SCNAs of AppKY lung SQCC to TCGA cohort (3, 30–32). We focused our comparison on significantly mutated genes in at least one cohort by the MutSigCV (16) analysis. Our comparative analysis presented here (Table 1) included somatic mutations (point mutations and indels) only in the calculation of gene alteration rate. The comparison including both somatic mutations and SCNAs is provided in Supplementary Table S12 with similar conclusions. Both cohorts showed similar rates of alterations for TP53 (68.6% AppKY, 80.9% TCGA, FDR q-value = 1.000), PIK3CA (11.8% AppKY, 15.7% TCGA, FDR q-value = 1.000), NOTCH1 (11.8% AppKY, 8.4% TCGA, FDR q-value = 1.000), and PTEN (5.9% AppKY, 7.9% TCGA, FDR q-value = 1.000).
Table 1. Somatic alteration rate comparison between the AppKY and TCGA lung SQCC cohorts
Hugo symbol^a | AppKY (%) | TCGA (%) | P^b | q-value^c |
---|---|---|---|---|
IDH1![]() | 11.80% | 1.10% | 0.002 | 0.039 |
PCMTD1![]() | 17.60% | 3.90% | 0.002 | 0.045 |
DEK | 5.90% | 0.00% | 0.011 | 0.200 |
NFE2L2![]() | 3.90% | 15.20% | 0.032 | 0.584 |
CDKN2A![]() | 3.90% | 14.60% | 0.050 | 0.830 |
HOXA11 | 3.90% | 0.00% | 0.049 | 0.830 |
TP53![]() | 68.60% | 80.90% | 0.082 | 1.000 |
PTEN![]() | 5.90% | 7.90% | 0.770 | 1.000 |
PIK3CA![]() | 11.80% | 15.70% | 0.655 | 1.000 |
KEAP1![]() | 9.80% | 12.40% | 0.806 | 1.000 |
KMT2D![]() | 9.80% | 19.70% | 0.142 | 1.000 |
HLA-A![]() | 7.80% | 3.40% | 0.236 | 1.000 |
NOTCH1![]() | 11.80% | 8.40% | 0.424 | 1.000 |
RB1![]() | 2.00% | 6.70% | 0.307 | 1.000 |
RNF43 | 5.90% | 1.70% | 0.126 | 1.000 |
MLLT10 | 7.80% | 3.90% | 0.269 | 1.000 |
STK11 | 3.90% | 1.70% | 0.309 | 1.000 |
POT1 | 5.90% | 2.20% | 0.186 | 1.000 |
ATP2B3 | 3.90% | 2.20% | 0.617 | 1.000 |
HRAS | 5.90% | 2.80% | 0.381 | 1.000 |
HOXA13 | 3.90% | 0.60% | 0.125 | 1.000 |
NOTE: The comparison focuses on genes that were identified as significantly mutated based on the MutSigCV analysis in at least one of the two cohorts.
^a Symbols beside the gene names indicate whether a gene was significantly mutated in AppKY only, in TCGA only, or in both cohorts.
^b The P value was based on the Fisher exact test to compare percentages of samples that had somatic alterations (somatic mutations or SCNAs) in the two cohorts.
^c The q-value was based on the Benjamini–Hochberg procedure. Genes with significant differences (FDR<0.2) in the alteration rate are shown in bold.
Significant differences in mutation rates between the AppKY and TCGA cohorts were observed. The IDH1 mutations were observed in 11.8% of patients in the AppKY cohort. In contrast, only 1.1% of patients in the TCGA cohort had IDH1 mutations (FDR q-value = 0.039). Similarly, the AppKY cohort also showed a higher rate of mutations in PCMTD1 (17.6% AppKY vs. 3.9% TCGA, FDR q-value = 0.045). Even after adjusting for age, gender, stage, and smoking via exact logistic regression, mutation frequencies are still significantly different between the AppKY and TCGA cohorts for IDH1 (P = 0.0024) and PCMTD1 (P = 0.019).
We also compared the somatic mutations and SCNAs of AppKY to three other lung SQCC cohorts (refs. 31–33; Supplementary Table S12). Mutation rates of IDH1 and PCMTD1 were significantly higher in AppKY than all other cohorts.
Because IDH1 and PCMTD1 showed significantly higher alteration rates in the AppKY cohort, we performed an in-depth analysis of alterations seen in these 2 genes and in other genes based on previously observed mutually exclusive mutational patterns and/or either known or reasonably hypothesized interactions.
Clinically actionable mutations assessment.
We investigated the somatic mutations/SCNAs observed in our cohort in association with FDA-approved agents or published or ongoing clinical trials for non–small cell lung carcinoma or other tumor types. Five subjects (10%) had actionable mutations, defined as mutations targeted by FDA-approved drugs (either for this indication or another cancer type), with a total of 8 somatic mutation/SCNA events found in these 5 individuals. In addition, we found that 33 of 51 subjects (65%) had high (>20 mut/Mb) or intermediate (6–20 mut/Mb) tumor mutation burden (TMB), indicating an additional group of therapeutic choices for this population using checkpoint inhibitors. Overall, 65% of subjects had actionable mutations with FDA-approved drugs and/or TMB that was high or intermediate. Many others had mutations that are under clinical investigation (Supplementary Table S13).
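The TMB thresholds used above can be captured in a minimal Python sketch. The cutoffs (high, >20 mut/Mb; intermediate, 6–20 mut/Mb) come from the text. The label "low" for values below 6 mut/Mb and the function name are assumptions made for illustration.

```python
def classify_tmb(mutations_per_mb):
    """Assign a TMB category using the cutoffs stated in the text."""
    if mutations_per_mb < 0:
        raise ValueError("mutation burden cannot be negative")
    if mutations_per_mb > 20:
        return "high"
    if mutations_per_mb >= 6:
        return "intermediate"
    # "low" is an assumed label; the text names only the two upper categories.
    return "low"


if __name__ == "__main__":
    # The cohort mean of 8.5 mutations per Mb falls in the intermediate range.
    print(classify_tmb(8.5))
```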
Analysis of alterations in IDH1 and associated pathways
Prediction of the effect of IDH1 mutations.
Mutations in IDH1 and its homolog IDH2, coding for cytosolic and mitochondrial isocitrate dehydrogenases, respectively, are common in gliomas (33) and myeloid neoplasms (34), but rare in lung cancer. We observed multiple IDH1 variants: R132H, V178A, A307S, and L352P, and the R132H variant was confirmed by IHC (Supplementary Fig. S4). The IDH1-variant R132H (Supplementary Fig. S5A) is reported in a variety of cancers and the role of various R132 missense substitutions has been studied extensively. These mutations are generally heterozygous, suggesting a gain of function by the enzyme, a view supported by mechanistic studies demonstrating that the R132H-variant protein has an aberrant enzymatic activity, converting α-ketoglutarate (2OG) to (R)-2-hydroxyglutarate (2HG) (35). This enantiomer of 2HG acts as an oncometabolite and interferes with cell differentiation (36).
To understand potential consequences of the other detected IDH1 variants (V178A, A307S, and L352P; Supplementary Fig. S5A), we applied a recently developed evolutionary approach (37), based on the principle that most deleterious, and hence potentially disease-promoting, mutations result in reduced evolutionary fitness and thus are selected against during evolution. Homologous genes derive from a common ancestor gene, whereas orthologous genes diverge after a speciation event in two different species; paralogous genes occur within a single species and diverge after a duplication event. Unlike orthologous genes, a paralogous gene evolves new function(s), making the distinction between the roles of orthologous and paralogous genes in disease critical for estimating disease risk using molecular conservation (37). We have identified both IDH1 and IDH2 orthologs in representative genomes from all major eukaryotic supergroups and built a maximum-likelihood phylogenetic tree (Supplementary Fig. S6) from their multiple sequence alignment (Supplementary Fig. S7). Satisfactorily, we found that the position corresponding to R132 in the human IDH1 protein is absolutely invariant, not only in orthologous sequences, but in all IDH homologs (Supplementary Fig. S6), which is consistent with deleterious effects of its substitution. Similar to R132, both A307 and L352 are also invariant residues in all IDH1 and IDH2 orthologs and all other IDH1 homologs with uncertain evolutionary history from all major eukaryotic supergroups (Fig. 2A; Supplementary Fig. S7). Because no substitutions in these positions occurred since the last eukaryotic common ancestor, any changes in these positions were predicted to be disease-promoting. Although position V178 is not invariant among all homologs, the only allowable substitutions are V178I (occasionally found in both IDH1 and IDH2) and V178C (occasionally found only in IDH2; Fig. 2A; Supplementary Fig. S7).
No V178A substitution was detected in any IDH homolog, including the most distant ones, suggesting that this substitution might be cancer-promoting. We therefore tested the activity of these variants using an enzymatic activity assay.
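The conservation reasoning above amounts to asking which residues occur at a given human position across an alignment. The following Python sketch shows one way to do this under stated assumptions: the alignment is a dictionary of equal-length gapped strings, and positions are numbered on the ungapped reference sequence. The function names and the toy sequences are hypothetical and are not taken from the IDH alignment in Supplementary Fig. S7.

```python
def alignment_column(reference_aligned, position):
    """Map a 1-based ungapped reference position to a 0-based alignment column."""
    seen = 0
    for column, residue in enumerate(reference_aligned):
        if residue != "-":
            seen += 1
            if seen == position:
                return column
    raise ValueError("position exceeds the reference sequence length")


def residues_at(alignment, reference_id, position):
    """Return the set of non-gap residues aligned to a reference position."""
    column = alignment_column(alignment[reference_id], position)
    return {seq[column] for seq in alignment.values() if seq[column] != "-"}


def is_invariant(alignment, reference_id, position):
    """True when every sequence carries the same residue at the position."""
    return len(residues_at(alignment, reference_id, position)) == 1


if __name__ == "__main__":
    # Toy alignment for illustration only; these are not real IDH sequences.
    toy = {"ref": "MK-RLV", "homolog_a": "MKTRIV", "homolog_b": "M-TRCV"}
    print(residues_at(toy, "ref", 3), residues_at(toy, "ref", 4))
```

A variant would then be flagged when its substituted residue is absent from the set observed at that position.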
Figure 2. Functional analysis of IDH1 variants. A, Segments of multiple sequence alignment for representative IDH1 (top set) and IDH2 (bottom set) orthologs, showing conservation of Arg132, Val178, Ala307, and Leu352. Numbers are provided for the human IDH1 protein. A complete alignment and sequence accession numbers are shown in Supplementary Fig. S7. Positions 132, 178, 307, and 352 are marked and highlighted in yellow, whereas substitutions in these positions are highlighted in blue. For all other positions, residues that are identical to those in the human IDH1 are highlighted in gray. Human, Homo sapiens; Frog, Xenopus tropicalis; Fish, Takifugu rubripes; Nematode, Caenorhabditis elegans; Worm, Saccoglossus kowalevskii; Lancelet, Branchiostoma floridae. B, Effect of IDH1 variants on enzyme activity. Left, effect of R132H and A307S mutants; Right, effect of V178A and L352P mutants. The two-sample t test was performed to compare each IDH1 mutant versus the wild-type and the Bonferroni correction was used for multiple comparison adjustment. Statistically significant reductions of NADPH production were observed for IDH1 R132H versus wild-type and for IDH1 L352P versus wild-type.
Effect of IDH1 mutations on enzyme activity.
To test the function of IDH1 and the effect of different variants on IDH1 functions, we constructed plasmids with wild-type (WT) IDH1 and mutant IDH1 genes (pcDNA3.1-IDH1-A307S; pcDNA3.1-IDH1-R132H; pcDNA3.1-IDH1-V178A; and pcDNA3.1-IDH1-L352P). We tested the enzymatic activity of the WT and each IDH1 variant using an isocitrate dehydrogenase activity assay that directly measures NADPH production. We found that R132H and L352P mutations significantly attenuated net NADPH production of IDH1 (Fig. 2B), whereas A307S and V178A mutations had no significant effect. In the context of other R132 IDH1 studies, attenuation of net NADPH production by the R132H-variant enzyme implies that production of 2HG in the oncogenic reaction consumes NADPH. These results suggest that R132H is a point mutation that disables or attenuates some enzymatic activity of IDH1.
Placing IDH1 within a functional pathway context.
As previously mentioned, certain variants of IDH1 are known to produce the oncometabolite 2HG (38, 39), which has inhibitory effects on 2OG-dependent enzymes, with the histone demethylases (KDM) most sensitive to inhibition (40). There are two classes of KDMs: 2OG-dependent and FAD-dependent. The biochemical function of both classes of KDMs is to demethylate specific lysine residues in histones, leading to regulation of gene expression (41). KDMs may also regulate gene expression via demethylation of other residues on histones (42). On the basis of this information and our discovery of mutually exclusive mutational patterns between certain histone demethylases and methyl transferases, we proceeded to ask whether mutations in IDH1 share a mutually exclusive pattern with 2OG-dependent enzymes in this lung SQCC population. We found that mutations in 2OG-dependent KDMs are mutually exclusive with IDH1 (Fig. 3), suggesting that mutations in either IDH1 or the 2OG-dependent KDMs lead to a common inhibition of histone demethylation. The mutually exclusive mutational pattern involving IDH1 is statistically significant [P = 0.018 based on the MEGSA (21) method]. This mutual exclusion has not previously been reported in lung SQCC. More than 35% of AppKY patients have mutations in 2OG-dependent protein demethylases, the vast majority of them in KDMs (Supplementary Fig. S8). Furthermore, when all lysine demethylases are included in the analyses, only one FAD-dependent KDM, KDM1A, is found to be mutated, in one case. These data suggest that IDH1 mutations may regulate gene expression via inhibition of 2OG-dependent KDMs. We further evaluated the mutations in the KDMs to see whether they had functional consequences and found mutations possibly affecting a variety of specific regions in each of the different KDMs.
The mutations in the KDMs are not localized to a specific region, are widely dispersed across each gene, and functionally affect protein–protein interactions, posttranslational modification sites, and metal binding (Supplementary Table S14), suggesting a general loss-of-function. This loss-of-function interpretation is further strengthened by the fact that IDH1 mutations responsible for the production of 2HG, which is inhibitory to KDMs (40), are mutually exclusive with mutations in the abovementioned KDMs (Fig. 3). The mutational patterns observed between IDH1 and KDMs suggest that restoring the 2HG-inhibited KDM function in cases with certain IDH1 mutations may prevent cancer signaling through IDH1 (43).
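A mutually exclusive pattern of the kind described above can be checked descriptively with a short Python sketch that counts samples carrying mutations in more than one gene of a set and reports how much of the cohort the set covers. This is only a descriptive check, not the MEGSA likelihood ratio test used for the reported P value. The function name, gene labels, and sample identifiers are hypothetical.

```python
from collections import Counter


def exclusivity_summary(mutations, n_samples):
    """Summarize overlap and coverage for a gene set.

    mutations maps each gene to the collection of sample IDs mutated in it.
    Returns (sorted samples mutated in more than one gene, fraction of samples covered).
    """
    genes_per_sample = Counter(
        sample for samples in mutations.values() for sample in set(samples)
    )
    overlapping = sorted(s for s, count in genes_per_sample.items() if count > 1)
    coverage = len(genes_per_sample) / n_samples
    return overlapping, coverage


if __name__ == "__main__":
    # Hypothetical mutation calls for a 10-sample cohort.
    calls = {"GENE_A": {"s1", "s2"}, "GENE_B": {"s3"}, "GENE_C": {"s4", "s5"}}
    print(exclusivity_summary(calls, 10))
```

An empty overlap list with substantial coverage is the descriptive signature of mutual exclusivity; statistical significance still requires a formal test.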
Figure 3. IDH1 mutations and IDH1-associated pathway analysis. Variant IDH1 may produce the oncometabolite 2HG that inhibits 2OG-dependent dioxygenases; the 2OG-dependent dioxygenases are highly sensitive to inhibition by 2HG. Mutations in IDH1 and 2OG-dependent enzymes are mutually exclusive. The number and percentage of samples with mutations in each gene are shown on the left. Samples are displayed as columns.
Analysis of alterations in PCMTD1 and associated pathways
Localization of PCMTD1 mutations.
PCMTD1 has an N-terminal canonical iso-aspartate methyl transferase (PCMT) domain, which in another protein has been shown to methylate iso-aspartate and aspartate residues on proteins including histone H4, suggesting a role in protein repair or turnover (44, 45). PCMTD1's C-terminal domain is not well characterized, and the cellular function(s) of the gene product are not known. In the AppKY dataset, mutations in PCMTD1 were always observed in the C-terminus coding region of the protein and never in the N-terminus region. These results are similar to those of other cancer studies, including pancreatic cancer, melanoma, aggressive rhabdomyosarcoma, and others (Supplementary Fig. S5B; Table 2; Supplementary Table S15). Therefore, the C-terminus coding region of PCMTD1 appears to be a mutation hotspot.
Table 2. PCMTD1 mutations
Study PMID | Cancer | PCMT (1–239) | SOCS Box (240–356): BC (∼16) | SOCS Box (240–356): Spacer (∼82) | SOCS Box (240–356): Cul5 Box (∼15) | % of cases |
---|---|---|---|---|---|---|
22960745 | Lung SQCC | Yes | No | Yes | Yes | 4% |
24793135 | Aggressive rhabdomyosarcoma | No | No | Yes | No | 65% |
22622578 | Melanoma | No | Yes | Yes | Yes | 28% |
22610119 | Prostate | No | No | No | Yes | 1% |
24816255 | Gastric carcinoma | No | No | Yes | No | 7% |
25855536 | Pancreatic cancer | No | Yes | Yes | No | 7% |
24120142 | Glioblastoma | Yes | No | Yes | No | 1% |
AppKY | Lung SQCC | No | No | Yes | Yes | 18% |
NOTE: The PCMTD1 mutations reported in the literature are in the C-terminal SOCS Box. PCMTD1 mutations in cancers are rarely found in the PCMT domain. The vast majority of mutations (except 1 case in TCGA Lung SQCC and 1 case in Glioblastoma) occur in the SOCS Box.
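The localization summarized in the table can be sketched as a simple lookup from amino acid position to protein domain. The sketch below uses only the two domain boundaries given in the table header (PCMT, residues 1–239; SOCS Box, residues 240–356). The sub-box boundaries are reported only as approximate lengths, so they are not modeled. The function names and the example positions are hypothetical and are not the study's variant calls.

```python
# Domain boundaries (inclusive) taken from the table header
DOMAINS = [
    ("PCMT", 1, 239),
    ("SOCS Box", 240, 356),
]


def domain_of(position):
    """Return the name of the PCMTD1 domain containing an amino acid position."""
    for name, start, end in DOMAINS:
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside residues 1-356")


def count_by_domain(positions):
    """Tally mutated positions per domain."""
    counts = {name: 0 for name, _, _ in DOMAINS}
    for position in positions:
        counts[domain_of(position)] += 1
    return counts


# Hypothetical mutated positions, for illustration only
example_positions = [250, 300, 341, 356]
print(count_by_domain(example_positions))
```

A tally in which every position falls in the SOCS Box corresponds to the pattern described for the AppKY cohort, where no mutations were observed in the N-terminal PCMT domain.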
Smoking signature in TP53 gene and possible relationship to PCMTD1 mutations.
A recent report indicates that lysine methyltransferases (KMT), KMT2A and KMT2D, are upregulated by gain-of-function TP53 mutations (mutations in the DNA-binding domain; ref. 46). PCMTD1 is also a methyltransferase (MT). As mentioned earlier, isoaspartate residues of TP53 have been shown to be methylated, and this in turn has been shown to regulate levels of TP53 as well as its function during DNA damage (47). CUL5, a PCMTD1-interacting protein, is recruited to target the TP53 protein for proteasomal degradation (48). We explored the connections between PCMTD1 and TP53, the most frequently mutated gene in the AppKY dataset (69%). TP53 mutations in this cohort showed a strong smoking-associated mutational signature, with frequent mutations in the protein regions 157–159 and 192–193 (49). We also found that mutations within the smoking signature, specifically those in the 157–159 region, frequently cooccur with mutations in PCMTD1 (Supplementary Table S16).
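The cooccurrence described above can be tallied with a short sketch that flags samples carrying TP53 mutations in the smoking-associated regions (157–159 and 192–193, as stated in the text) and splits them by PCMTD1 mutation status. The data structures, sample identifiers, and function names are hypothetical; the actual sample-level data are in Supplementary Table S16.

```python
# Smoking-associated TP53 protein regions named in the text (inclusive)
SMOKING_REGIONS = [(157, 159), (192, 193)]


def in_smoking_region(codon):
    """True if a TP53 codon lies in one of the smoking-associated regions."""
    return any(start <= codon <= end for start, end in SMOKING_REGIONS)


def split_by_pcmtd1(tp53_codons, pcmtd1_samples):
    """Partition samples with smoking-region TP53 mutations by PCMTD1 status.

    tp53_codons: dict mapping sample ID -> list of mutated TP53 codons
    pcmtd1_samples: set of sample IDs with a PCMTD1 mutation
    Returns (with_pcmtd1, without_pcmtd1) as two sets of sample IDs.
    """
    with_pcmtd1, without_pcmtd1 = set(), set()
    for sample, codons in tp53_codons.items():
        if any(in_smoking_region(c) for c in codons):
            if sample in pcmtd1_samples:
                with_pcmtd1.add(sample)
            else:
                without_pcmtd1.add(sample)
    return with_pcmtd1, without_pcmtd1


# Hypothetical example calls, for illustration only
tp53_example = {"S1": [158], "S2": [248], "S3": [157, 273], "S4": [193]}
pcmtd1_example = {"S1", "S3", "S5"}

cooccurring, tp53_only = split_by_pcmtd1(tp53_example, pcmtd1_example)
print(sorted(cooccurring), sorted(tp53_only))
```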
Discussion
From our analyses and other studies, there is growing evidence that numerous pathways converge on protein modification enzymes, including MTs and protein demethylases, which function via direct protein modification and in the regulation of gene expression via chromatin modification. Therefore, regulation of protein MTs and demethylases affects the methylation status of histones and other substrates such as signaling proteins (50). For example, mutations in PI3K/AKT signaling regulate H3K4 methylation through KDM5A (50), and PIK3CA and AKT phosphorylate KDMs and KMTs, which alter their functions and render them oncogenic (50, 51). Thus, these methyltransferases and demethylases may be promising targets in cancer therapy.
The observation of a smoking-associated mutational signature in TP53 is not surprising (52) given the high rate of smoking in AppKY, and this signature appears to frequently cooccur with mutations in PCMTD1. We hypothesize that PCMTD1 could function as a regulator of TP53, although further study will be needed to examine this hypothesis. In the AppKY population, concentrations of arsenic, chromium, and nickel are higher than the U.S. national levels (53). The toxicity of carcinogenic metals has been shown to be mediated by altering histone methylation via 2OG-dependent enzymes (54, 55). In addition to the known link to tobacco exposure, we hypothesize that environmental exposures relevant to AppKY may be contributing to the development of this (R)-2-hydroxyglutarate-specific cancer mechanism in our cohort. This could help explain the mutually exclusive pattern of mutations in IDH1 and 2OG-dependent KDMs seen only in the AppKY cohort.
This study is the first characterization of the genomic alterations in lung SQCC from AppKY residents. Our data share several findings with the TCGA, namely high rates of alterations in TP53, NOTCH1, PTEN, and PIK3CA, the complexity of genomic patterns, and well-recognized pathways upregulated in SQCC lung cancer. However, the AppKY SQCC has a specific genetic signature characterized by an increased number of IDH1 and PCMTD1 mutations, as compared with the TCGA. The findings in this study have important mechanistic implications for how SQCC lung cancers develop in AppKY residents and provide insights into treatment. The 10% potentially actionable mutations/SCNAs observed in our AppKY cohort (based on FDA-approved drugs) coupled with 65% of subjects with high or intermediate mutation burden indicate that a majority of these patients have potential molecular targets for treatment (Supplementary Table S13), including ERBB2 amplification with FDA-approved mAbs and tyrosine kinase inhibitors; PDGFRA and TSC2, where targeted agents are approved in other tumor types; as well as other mutations with targeted therapies under active investigation (HRAS, KRAS, PTEN, NOTCH1, NF1, BRAF). This study adds to the body of literature that supports drug development based on mutations in lung SQCC and highlights genomic population differences that are relevant. By utilizing therapies specific to actionable mutations that are common in our AppKY population, we can provide a more personalized approach through directed drug discovery targeting highly mutated genes, such as IDH1 and PCMTD1.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: J. Liu, C. Liu, H.L. Weiss, N.L. Vanderford, D.W. Fardo, C. Wang, S.M. Arnold
Development of methodology: J. Liu, T. Murali, T. Yu, C. Liu, H.N.B. Moseley, C.M. Horbinski, K. Hodges, C. Wang, S.M. Arnold
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): T. Yu, C. Liu, E.B. Durbin, S.R. Ellingson, B. Huang, B.J. Hallahan, C.M. Horbinski, K. Hodges, D.L. Napier, T. Bocklage, J. Mueller, S.M. Arnold
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J. Liu, T. Murali, T. Yu, C. Liu, T.A. Sivakumaran, H.N.B. Moseley, I.B. Zhulin, J. Liu, B. Huang, C.M. Horbinski, N.L. Vanderford, D.W. Fardo, C. Wang, S.M. Arnold
Writing, review, and/or revision of the manuscript: J. Liu, T. Murali, T. Yu, C. Liu, H.N.B. Moseley, H.L. Weiss, E.B. Durbin, B. Huang, B.J. Hallahan, C.M. Horbinski, K. Hodges, N.L. Vanderford, D.W. Fardo, C. Wang, S.M. Arnold
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J. Liu, E.B. Durbin, S.R. Ellingson, J. Mueller, N.L. Vanderford, C. Wang
Study supervision: C. Liu, C. Wang, S.M. Arnold
Other (contributed to high performance computing and WES data management): S.R. Ellingson
Other (reviewed all pathology for the study): T. Bocklage
Acknowledgments
The authors wish to thank Drs. Jill M. Kolesar and Andrew N. Lane for helpful comments, the MCC Research Communications Office for its assistance in manuscript editing and Dr. Youngwook Kim for providing the somatic mutation and copy number variation data from their paper (30). T. Murali, H.N.B. Moseley and C. Wang were supported by NCI (grant no. R21 CA205778). I.B. Zhulin was supported by National Institute of General Medical Sciences (grant no. R01 GM072285). H.L. Weiss, J. Liu, C. Wang and S.M. Arnold were supported by NIH (grant no. UL1 TR001998). This work was also supported by NCI grant P30 CA177558 which supports the Biostatistics and Bioinformatics SRF, the Biospecimen Procurement and Translation Pathology SRF, and the Cancer Research Informatics SRF of the University of Kentucky MCC.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.