Abstract
Protein kinases are frequently mutated in human cancer and inhibitors of mutant protein kinases have proven to be effective anticancer drugs. We screened the coding sequences of 518 protein kinases (∼1.3 Mb of DNA per sample) for somatic mutations in 26 primary lung neoplasms and seven lung cancer cell lines. One hundred eighty-eight somatic mutations were detected in 141 genes. Of these, 35 were synonymous (silent) changes. This result indicates that most of the 188 mutations were “passenger” mutations that are not causally implicated in oncogenesis. However, an excess of ∼40 nonsynonymous substitutions compared with that expected by chance (P = 0.07) suggests that some nonsynonymous mutations have been selected and are contributing to oncogenesis. There was considerable variation between individual lung cancers in the number of mutations observed and no mutations were found in lung carcinoids. The mutational spectra of most lung cancers were characterized by a high proportion of C:G > A:T transversions, compatible with the mutagenic effects of tobacco carcinogens. However, one neuroendocrine cancer cell line had a distinctive mutational spectrum reminiscent of UV-induced DNA damage. The results suggest that several mutated protein kinases may be contributing to lung cancer development, but that mutations in each one are infrequent.
Introduction
Lung cancer is the commonest cause of cancer-associated mortality worldwide (1). Protein kinases are frequently mutated in cancer (http://www.sanger.ac.uk/genetics/CGP/Census/) and inhibitors of this family have proven to be effective new anticancer drugs (2, 3). In lung cancer, activating mutations of the protein kinases EGFR, ERBB2, and BRAF and inactivating mutations of STK11 have been reported (4–10). Inhibitors of EGFR induce regression of lung cancers with EGFR mutations. To identify further potential drug targets and to investigate patterns of somatic mutation in lung cancer, we analyzed the full coding sequences of the protein kinase gene family for mutations in a series of lung cancers.
Materials and Methods
For details of the methods used in these experiments, see http://www.sanger.ac.uk/genetics/CGP/. Briefly, patient samples were collected and used with informed consent and Ethics Committee approval. Primary tumors were reviewed by a pathologist and assessed as being >80% tumor by microscopic inspection. The initial DNA sequences and sequence accession numbers for the protein kinase genes in the human genome were taken from Manning et al. (11). Protein kinase coding and flanking intron sequences were PCR amplified from genomic DNA and the products were sequenced in both directions. Exons that failed were subjected to a second and, if necessary, a third round of PCR primer design, after which alternative PCR conditions were tested. If these also failed, no sequence information was obtained for those exons. Sequence traces were analyzed automatically and putative variants were then assessed manually. Each variant was tested for somatic origin by sequencing the relevant amplicon from normal DNA from the same individual. All putative somatic mutations were confirmed by re-PCR and bidirectional sequencing of normal and tumor DNA. To assess whether a proportion of the mutations detected had been selected, the deviation of the ratio of nonsynonymous to synonymous mutations from that expected by chance was examined. To assess the significance of this ratio, an exact Monte Carlo test was developed. To assess heterogeneity of mutational spectrum, an exact test based on the χ² statistic was developed. To evaluate the sequence context of mutations, a χ² test was done. For details of these statistical analyses, see Supplementary Material and http://www.sanger.ac.uk/genetics/CGP/.
Results and Discussion
Twenty-six primary lung neoplasms (seven adenocarcinomas, seven squamous carcinomas, six large cell carcinomas, and six carcinoids) and seven lung cancer cell lines (one neuroendocrine cancer and six adenocarcinomas) were each screened for somatic mutations across the ∼1.3 Mb of DNA comprising the coding exons and splice junctions of 518 protein kinases. One hundred eighty-eight somatic mutations were detected. One hundred twenty-seven were missense (123 single nucleotide and four double nucleotide substitutions), 13 nonsense, six frameshift, one in-frame insertion, six splice site, and 35 synonymous (silent) (http://www.sanger.ac.uk/cosmic/ and Supplementary Information). Mutations were detected in 141 of the 518 genes screened (120 of which carried nonsynonymous mutations). Mutations were found in protein kinases known to be implicated in lung oncogenesis including BRAF (two), ERBB2 (HER2/neu; one), and STK11 (four). No mutations in EGFR were observed in this series. Mutations (including truncating mutations) were also found in the recessive cancer genes MAP2K4 (two) and ATM (four). One truncating mutation in MAP2K4 in lung cancer has previously been reported (12). However, inactivating mutations of ATM have not previously been implicated in the development of lung cancer. Among the remaining genes, there were six nonsynonymous mutations in TTN; three each in FGFR2, EPHA3, ATR, and TAF1L; two each in 12 genes (Table 1); and one each in 98 genes. TTN (Titin) has a coding sequence of ∼80 kb and is believed to be the largest protein encoded by the human genome. Therefore, taking into account coding sequence length, this distribution of somatic mutations does not differ from that expected by chance. Mutation clustering in the kinase domain or among particular subclasses of kinases was not detected.
Gene | Mutation |
---|---|
ANKK1 | G2207T R736L, G2290A E764K |
ATM | G1672T G558*, G2542C E848Q, G6154T E2052*, A7996G T2666A |
ATR | G4462C A1488P, C6005G A2002G, G6698T S2233I |
BRAF | G1406C G469A, C1789G L597V |
DDR1 | G1486T A496S, CC2469/2470TT R824W |
EPHA3 | C686A S229Y, C1346T S449F, G2297A G766E |
EPHA5 | G1250A R417Q, G1507A E503K |
FGFR2 | G847A D283N, G870C W290C, G1487C R496T |
GPRK5 | C489A D163E, 874insT 292CGRDPLRLRRPPP* |
GUCY2F | C1064A S355*, A3155G K1052R |
MAP2K4 | A425T Q142L, 882delG 295SHCMSWPQADFLIQSGIVYLIN* |
MGC42105 | C997T P333S, C1231A P411T |
NEK10 | G2633T R878M, C3344T P1115L |
NTRK3 | C2029T H677Y, CG2161/2162TT R721F |
PRKDC | G4340T R1447M, G8822C G2941A |
PRKWNK2 | G4856A G1619E, G5933T S1978I |
SK681 | A906T R302S, C1826T S609L |
STK11/LKB1 | C109T Q37*, C109T Q37*, 167delG 57KALTAR*, 842delC 281RSLTC* |
TAF1L | G2250T L750F, C2284A L762I, A2382C E794D |
TRPM6 | G224T G75V, G3021C W1007C |
TTN | G4937A G1646D, C7156T L2386F, C21910G L7304V, T34907A L11636Q, G35000A R11667, C74240A T24747N |
NOTE: Frameshift mutations are annotated at the amino acid level by giving the first amino acid that is altered with respect to the wild-type sequence, followed by the novel predicted translation. Termination codons are indicated by an asterisk (*). Missense mutations within canonical kinase domains are in bold.
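The length argument for TTN can be checked with a rough calculation. The sketch below assumes that nonsynonymous mutations fall uniformly across the ∼1.3 Mb screened and uses a Poisson approximation. Both are simplifications introduced here for illustration and are not the analysis used in the study; the function name is hypothetical. The inputs (153 nonsynonymous mutations, that is 188 minus 35 synonymous; ∼80 kb of TTN coding sequence; six TTN mutations) are taken from the text.

```python
import math

# Figures quoted in the text.
TOTAL_NONSYN = 188 - 35   # all somatic mutations minus synonymous ones
SCREENED_BP = 1.3e6       # ~1.3 Mb screened per sample
TTN_CODING_BP = 80e3      # ~80 kb TTN coding sequence
TTN_OBSERVED = 6          # nonsynonymous TTN mutations observed


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below


# ASSUMPTION: uniform mutation density, so TTN's expected share of the
# mutations equals its share of the screened sequence.
expected_ttn = TOTAL_NONSYN * TTN_CODING_BP / SCREENED_BP
p_at_least_observed = poisson_upper_tail(TTN_OBSERVED, expected_ttn)

print(f"expected TTN mutations: {expected_ttn:.1f}")
print(f"P(X >= {TTN_OBSERVED}): {p_at_least_observed:.2f}")
```

Under these assumptions the expected number of TTN mutations is somewhat above the six observed, which is in line with the statement that the count in TTN reflects gene length rather than selection.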
Somatic mutations found in cancer cells may be “driver” mutations which confer growth advantage, are selected, and are implicated in the development of the cancer. Alternatively, they may be “passenger” mutations (also known as “bystander” mutations) that are not subject to selection and are not causally involved in oncogenesis. Thirty-five synonymous (silent) mutations were detected in this screen. Although it is possible that a small proportion of these may have biological effects through alteration of RNA splicing or other regulatory processes, most are likely to have no effect on the biology of the cells in which they occur and hence are passenger mutations. On this assumption and taking into account the observed mutational spectrum and sequence of the 1.3 Mb DNA that constitute the coding sequences of the protein kinases (see Materials and Methods), the presence of 35 synonymous changes indicates that approximately three quarters of the 188 somatic mutations detected are likely to be “passenger” mutations that are not involved in cancer causation. Nevertheless, there is an excess of ∼40 nonsynonymous substitutions compared with that expected by chance (P = 0.07) which may be attributable to biological selection and hence may be implicated in oncogenesis. To further evaluate this possibility, we conducted a supplementary screen of 30 genes, in which two or more mutations were found or with mutations in conserved portions of the kinase domain, in an additional series of 56 primary lung neoplasms (see Supplementary Material). This yielded 18 somatic mutations of which three were synonymous thus showing a similar trend towards an excess of nonsynonymous mutations as observed in the primary screen.
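The arithmetic behind the passenger estimate can be made explicit, and the logic of a Monte Carlo test on the nonsynonymous:synonymous ratio can be illustrated with a simplified sketch. Several things here are assumptions and not the authors' method: the tally of 140 nonsynonymous substitutions (127 missense plus 13 nonsense), the plain binomial null model, and the null proportion itself, which is back-calculated from the reported excess of ∼40. The published test additionally accounted for the observed mutational spectrum and the sequence composition of the genes, so this sketch shows the mechanics only and is not an independent re-derivation of the reported P value.

```python
import random

# Counts quoted in the text.
N_TOTAL = 188
N_SYN = 35
N_NONSYN_SUB = 127 + 13   # ASSUMPTION: missense + nonsense substitutions
EXCESS = 40               # reported excess of nonsynonymous substitutions

# Nonsynonymous substitutions expected by chance, and the implied passengers.
expected_nonsyn = N_NONSYN_SUB - EXCESS      # about 100
passengers = expected_nonsyn + N_SYN         # about 135
passenger_fraction = passengers / N_TOTAL    # about three quarters

# HYPOTHETICAL null probability that a chance substitution is nonsynonymous,
# back-calculated from the figures above (not taken from the paper).
P_NULL = expected_nonsyn / (expected_nonsyn + N_SYN)


def monte_carlo_p(n_nonsyn, n_syn, p_null, n_sim=5000, seed=1):
    """One-sided Monte Carlo p-value for an excess of nonsynonymous changes."""
    rng = random.Random(seed)
    n = n_nonsyn + n_syn
    hits = 0
    for _ in range(n_sim):
        # Draw n substitutions, each nonsynonymous with probability p_null.
        simulated = sum(rng.random() < p_null for _ in range(n))
        if simulated >= n_nonsyn:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)


p_value = monte_carlo_p(N_NONSYN_SUB, N_SYN, P_NULL)
print(f"passenger fraction: {passenger_fraction:.2f}")
print(f"one-sided Monte Carlo p-value (illustrative): {p_value:.3f}")
```

With these inputs the implied passenger fraction is about 0.72, matching the "approximately three quarters" stated above.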
We did not find a commonly mutated and activated protein kinase. It should be noted that we were unable to amplify some DNA sequences, that a small number of mutations may have been undetected, and that the number of tumors of each subtype in the screen was limited. Therefore, we cannot exclude the possibility of a frequently mutated protein kinase in lung cancer and further screens of this type are justified.
To account for the observed excess of nonsynonymous somatic mutations, however, several infrequently mutated protein kinases may be contributing to oncogenesis. This notion is supported by specific features of certain somatic mutations. For example, activating germ line mutations of the fibroblast growth factor receptors, FGFR1, FGFR2, and FGFR3, have been reported in a broad spectrum of skeletal dysplasias (13). Somatic mutations of FGFR3 found in papillary transitional cell bladder cancer occur at identical positions to germ line mutations predisposing to a subtype of skeletal dysplasia known as thanatophoric dwarfism (14). Somatic point mutations in FGFR1 have not previously been implicated in oncogenesis, although activating rearrangements of the gene are present in myeloproliferative disorders and non-Hodgkin lymphoma (http://www.sanger.ac.uk/genetics/CGP/Census/). We observed a single somatic mutation in FGFR1 (c.C754A p.P252T) in a bronchoalveolar cancer. Interestingly, codon 252 is the only position in FGFR1 at which activating germ line mutations are known to predispose to skeletal dysplasia, specifically Pfeiffer's syndrome (Human Gene Mutation Database, http://archive.uwcm.ac.uk/uwcm/mg/hgmd0.html). The analogy with the role of FGFR3 mutations in thanatophoric dwarfism and bladder cancer suggests that this mutation is contributing to cancer development. Similarly, we found a somatic mutation of FGFR2 (c.G870C p.W290C) in a squamous cell lung cancer at an identical position to germ line mutations in this gene that also cause Pfeiffer syndrome (Human Gene Mutation Database, http://archive.uwcm.ac.uk/uwcm/mg/hgmd0.html). Two FGFR2 somatic mutations have previously been reported (in gastric cancer) and both are also found as germ line mutations predisposing to skeletal dysplasia (15). 
The results therefore suggest that a subset of germ line–activating mutations in FGFR1 and FGFR2 that cause skeletal dysplasias occur as infrequent but probably causative somatic mutations in lung cancer.
Other mutations were in conserved, key functional regions and amino acid residues that are known to harbor activating somatic mutations in other kinases in cancer. For example, AURKC (STK13), SNK, EPHA3, NTRK3, CDK8, and SGK2 each had a single mutation in the P-loop or activation segment of their kinase domains. In NTRK3, we detected further mutations in the kinase domain in another lung cancer and in a breast cancer (16), both in the highly conserved HRD motif of the catalytic loop. Mutations in the kinase domain of NTRK3 have also previously been reported in colorectal cancer (17). Additional somatic mutations were also identified in the supplementary screen in AURKC, a member of the aurora kinase family of chromosomal passenger proteins implicated in centrosome/spindle regulation during mitosis (18). Finally, the position of an acidic substitution in the transmembrane domain of EPHA5 (c.G1745A p.G582E) is highly reminiscent of the recurrent activating mutation (c.T1991A p.V664E) of Erbb2/neu in ethylnitrosourea-induced tumors of rats (19). Therefore, it is plausible that this mutation activates the kinase activity of EPHA5 as it does Erbb2/neu. Biological studies are indicated to evaluate the transforming potential of these somatic mutations.
We have sequenced 3% of the coding sequence of the human genome in each of the 33 lung neoplasms examined. We can therefore begin to investigate the numbers and patterns of somatic mutations in lung cancer genomes, which in turn may provide insights into the mutagenic processes underlying oncogenesis. At least one mutation was found in seven of seven lung cancer cell lines, five of seven adenocarcinomas, four of seven squamous carcinomas, and five of six large cell carcinomas (Table 2). However, none of six carcinoids showed somatic mutations. Overall, we detected more somatic mutations in lung cancer cell lines than in primary tumors, but within each group, there was substantial variation in mutation number. For example, among primary large cell carcinomas, 24 somatic mutations were observed in PD1362a and none in PD1365a. On the assumption that most are passenger mutations and therefore that the prevalence of mutations is similar elsewhere in the genome, we estimate that PD1362a has ∼50,000 somatic point mutations in its genome generating ∼500 amino acid changes, one in every 50 genes. By comparison, PD1365a may have fewer than 2,000 somatic mutations and 20 random amino acid changes.
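The genome-wide figures follow from scaling the counts in the screened sequence to the whole genome. A minimal sketch of that scaling is given below; the genome size of 3 × 10⁹ bp and the function name are assumptions introduced here, so the output should be read as an order-of-magnitude check on the estimates in the text, not a reproduction of them.

```python
SCREENED_BP = 1.3e6   # ~1.3 Mb screened per sample (from the text)
GENOME_BP = 3.0e9     # ASSUMPTION: approximate size of the human genome


def extrapolate_to_genome(n_observed, screened_bp=SCREENED_BP, genome_bp=GENOME_BP):
    """Scale a mutation count in the screened region to the whole genome.

    Assumes, as the text does, that mutation prevalence is similar
    elsewhere in the genome.
    """
    return n_observed * genome_bp / screened_bp


pd1362a_estimate = extrapolate_to_genome(24)   # 24 mutations in PD1362a
per_mutation = extrapolate_to_genome(1)        # genome-wide weight of one observed mutation

# Fraction of point mutations that change an amino acid, as implied by the
# figures quoted in the text (~500 of ~50,000).
amino_acid_fraction = 500 / 50_000

print(f"PD1362a genome-wide estimate: {pd1362a_estimate:,.0f}")
print(f"one observed mutation corresponds to about {per_mutation:,.0f} genome-wide")
print(f"implied amino acid changes in PD1362a: {pd1362a_estimate * amino_acid_fraction:,.0f}")
```

Under this assumption each mutation observed in the screen stands for a few thousand mutations genome-wide, which is why a sample with no observed mutations can only be bounded from above.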
Sample | C:G > T:A | C:G > G:C | C:G > A:T | T:A > C:G | T:A > G:C | T:A > A:T | Insertion/deletion | Other | Total |
---|---|---|---|---|---|---|---|---|---|
NCI-H1395 | 1 | 1 | 1 | 1 | 1 | 5 | ||||||||||||
NCI-H1437 | 3 | 1 | 1 | 5 | ||||||||||||||
NCI-H1770 | 34 | 1 | 2 | 1 | 1 | 2X CC:GG > TT:AA | 41 | |||||||||||
NCI-H2009 | 6 | 5 | 4 | 3 | 1 | 19 | ||||||||||||
NCI-H2087 | 4 | 5 | 3 | 12 | ||||||||||||||
NCI-H2122 | 2 | 2 | 1 | 5 | ||||||||||||||
NCI-H2126 | 5 | 3 | 6 | 1 | 1 | 1 | 17 | |||||||||||
PD0248a | 3 | 3 | 1 | 7 | ||||||||||||||
PD0251a | 1 | 2 | 1 | 4 | ||||||||||||||
PD0252a | 3 | 3 | 6 | |||||||||||||||
PD0263a | 1 | 1 | 1 | 3 | ||||||||||||||
PD0269a | 1 | 2 | 1 | 1 | 5 | |||||||||||||
PD0276a | 3 | 2 | CG:GC > TT:AA | 6 | ||||||||||||||
PD1351a | 3 | 1 | 4 | |||||||||||||||
PD1352a | 3 | 3 | ||||||||||||||||
PD1353a | 1 | 1 | 2 | |||||||||||||||
PD1362a | 5 | 5 | 10 | 2 | 1 | CC:GG > AA:TT | 24 | |||||||||||
PD1364a | 1 | 2 | 1 | 4 | ||||||||||||||
PD1367a | 2 | 1 | 2 | 5 | ||||||||||||||
PD1379a | 4 | 1 | 3 | 1 | 1 | 1 | 11 | |||||||||||
Total without NCI-H1770 | 44 | 26 | 48 | 9 | 4 | 7 | 7 | 2 | 147 | |||||||||
Total | 78 | 27 | 50 | 10 | 5 | 7 | 7 | 4 | 188 |
NOTE: Samples without mutations are not shown.
The numbers of somatic mutations observed in individual lung cancers are higher overall than in our previous analyses of the same 1.3 Mb of DNA sequence in breast and testicular cancers (16; G. Bignell et al., Sequence analysis of the protein kinase gene family in human testicular germ-cell tumours of adolescents and adults, 2005, submitted for publication).
In addition to the numbers of somatic mutations, their patterns may also provide insights into the processes of mutagenesis operative during the development of each cancer. The mutational spectra of most lung cancers were characterized by a high proportion of C:G > A:T transversions (Table 2). This is similar to the spectrum of somatic mutations observed in TP53 in lung cancer and is compatible with the mutagenic effects of tobacco carcinogens (The IARC TP53 Database, http://www-p53.iarc.fr/index.html).
There was, however, evidence for heterogeneity of mutational spectrum among the lung cancers examined (P = 0.005). This was mainly attributable to the cell line NCI-H1770, which has the largest number of somatic mutations among the 33 lung neoplasms screened and is reported to be derived from a lung neuroendocrine tumor in a never smoker. The mutational spectrum of NCI-H1770 is characterized by a high proportion of C:G > T:A transition mutations (34 of 41) and by two CC:GG > TT:AA double nucleotide substitutions. These occurred in a specific sequence context (P < 0.00001) at pyrimidine dinucleotides. There was also a strand preference for mutation occurrence, with mutations of C twice as common as mutations at G on the untranscribed strand. All these features of the mutational signature of NCI-H1770 are similar to those induced by UVB exposure in squamous cell carcinoma of the skin (20). However, immunocytochemistry for neuroendocrine (CD56 and MNF116) and melanocytic (S-100 and HMB45) markers indicated that NCI-H1770 is a neuroendocrine neoplasm and is unlikely to be a malignant melanoma or skin epithelial tumor. If it is of primary lung origin, it seems unlikely that its characteristic mutational signature is due to UVB mutagenesis. It may, however, conceivably be due to a DNA repair defect or exposure that mimics exposure to UV radiation.
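How strongly NCI-H1770 departs from the other samples can be illustrated with a permutation test on the χ² statistic, using the two total rows of Table 2. This is a simplified two-group comparison (NCI-H1770 versus all other samples pooled) written for illustration; it is not the exact test across individual samples that produced the P value reported above, and all function names are hypothetical. The NCI-H1770 counts are obtained by subtracting the "Total without NCI-H1770" row from the "Total" row.

```python
import random

# Mutation classes in the column order of Table 2.
CLASSES = ["C:G>T:A", "C:G>G:C", "C:G>A:T", "T:A>C:G",
           "T:A>G:C", "T:A>A:T", "indel", "other"]
TOTAL_ALL = [78, 27, 50, 10, 5, 7, 7, 4]    # "Total" row
TOTAL_REST = [44, 26, 48, 9, 4, 7, 7, 2]    # "Total without NCI-H1770" row
H1770 = [a - r for a, r in zip(TOTAL_ALL, TOTAL_REST)]


def chi2_two_rows(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k table of counts."""
    n_a, n_b = sum(row_a), sum(row_b)
    n = n_a + n_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # an empty column contributes nothing
        for observed, n_row in ((a, n_a), (b, n_b)):
            expected = n_row * col / n
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p(row_a, row_b, n_perm=2000, seed=0):
    """Permutation p-value: reassign mutations to the two groups at random."""
    rng = random.Random(seed)
    # One class label per mutation, pooled over both groups.
    labels = [k for k, (a, b) in enumerate(zip(row_a, row_b)) for _ in range(a + b)]
    n_a = sum(row_a)
    observed = chi2_two_rows(row_a, row_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        perm_a = [0] * len(row_a)
        for k in labels[:n_a]:
            perm_a[k] += 1
        perm_b = [a + b - pa for a, b, pa in zip(row_a, row_b, perm_a)]
        if chi2_two_rows(perm_a, perm_b) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


chi2_observed, p_perm = permutation_p(H1770, TOTAL_REST)
ca_fraction_rest = TOTAL_REST[2] / sum(TOTAL_REST)   # C:G>A:T share, other samples
ct_fraction_h1770 = H1770[0] / sum(H1770)            # C:G>T:A share, NCI-H1770

print(dict(zip(CLASSES, H1770)))
print(f"chi-square = {chi2_observed:.1f}, permutation p = {p_perm:.4f}")
print(f"C:G>A:T in other samples: {ca_fraction_rest:.2f}; "
      f"C:G>T:A in NCI-H1770: {ct_fraction_h1770:.2f}")
```

The contrast is driven mainly by the C:G > T:A column: transitions of this type make up 34 of 41 mutations in NCI-H1770, whereas C:G > A:T transversions are the largest class among the remaining samples.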
In summary, we have screened the full protein kinase gene family in a series of lung cancers. The results suggest that a substantial fraction of mutations detected are “passenger” mutations but that there is also an excess of nonsynonymous mutations that are likely to be “driver” mutations. If correct, this would indicate that a high proportion of non–small cell lung cancers have activating mutations in the protein kinase gene family that contribute to oncogenesis. However, these mutations seem distributed over a large number of protein kinases, and mutations in each individual protein kinase are relatively infrequent, as is the case for ERBB2 and BRAF. If confirmed, these results have implications for the development of protein kinase inhibitors as novel therapeutics in lung cancer. Although it may be possible to derive many protein kinase inhibitors (each targeted at a different protein kinase implicated in lung cancer development), the relatively small subsets of patients that might potentially benefit from each inhibitor present significant challenges to the exploitation of these targets in cancer treatment.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
Acknowledgments
Grant support: Wellcome Trust.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.