Genome-wide copy number changes were analyzed in 70 primary human lung carcinoma specimens and 31 cell lines derived from human lung carcinomas, with high-density arrays representing ∼115,000 single nucleotide polymorphism loci. In addition to previously characterized loci, two regions of homozygous deletion were found, one near the PTPRD locus on chromosome segment 9p23 in four samples representing both small cell lung carcinoma (SCLC) and non–small cell lung carcinoma (NSCLC) and the second on chromosome segment 3q25 in one sample each of NSCLC and SCLC. High-level amplifications were identified within chromosome segment 8q12-13 in two SCLC specimens, 12p11 in two NSCLC specimens and 22q11 in four NSCLC specimens. Systematic copy number analysis of tyrosine kinase genes identified high-level amplification of EGFR in three NSCLC specimens, FGFR1 in two specimens and ERBB2 and MET in one specimen each. EGFR amplification was shown to be independent of kinase domain mutational status.
Mapping copy number alterations in the cancer genome has contributed to the subsequent identification of tumor suppressor genes and oncogenes. The delineation of cancer-specific homozygous deletions enabled the discovery of several different tumor suppressor genes, including RB1 (1), CDKN2A (2, 3), PTEN (4, 5), and SMAD4/DPC4 (6). Similarly, genes that are targets for high-level amplification in cancers are often subject to oncogenic mutations. Examples where identification of cancer-specific amplifications preceded the identification of cancer somatic mutations include PDGFRA (7, 8), EGFR (9–14), ERBB2 (15, 16), and PIK3CA (17–19).
High-resolution genomic approaches now make it possible to screen for chromosomal copy number alterations in a systematic manner. Array comparative genomic hybridization can provide high-resolution detection of copy number changes (20, 21). Single nucleotide polymorphism (SNP) arrays can measure cancer-specific loss of heterozygosity (LOH) with high accuracy in a genome-wide fashion (22–26). Furthermore, SNP arrays that cover ∼10,000 SNP loci can be used to detect DNA copy number changes at the genome level, including high-level amplifications and homozygous deletions (27, 28).
Lung cancer is the leading cause of cancer death in the world and is estimated to result in ∼160,000 deaths annually in the United States alone (29). Genomic studies have begun to impact the diagnosis and treatment of human lung carcinoma. For example, mutations in the EGFR kinase domain in non–small cell lung carcinoma (NSCLC) specimens were found to correlate with patient responses to gefitinib and erlotinib (11–13).
To discover novel genomic changes in human lung carcinomas, in the hopes of identifying additional pathways active in these diseases, we have used SNP arrays covering ∼115,000 SNP loci to investigate copy number changes in a panel of DNA from 77 NSCLC and 24 small cell lung carcinoma (SCLC) lung cell lines and primary tumors. Given the high resolution of the SNP arrays, we have been able to identify several small homozygous deletions and amplifications that have not been detected by previous methods.
Materials and Methods
Primary tumor and cell line specimens. We obtained the following genomic DNA: lung adenocarcinoma (HOP-62, NCI-H23), large cell lung carcinoma (LC; HOP-92), and squamous cell lung carcinoma (NCI-H266) from the National Cancer Institute. We prepared genomic DNA from the following cell lines: adenocarcinoma (H1437, H1819, H1993, H2009, H2087, H2122, H2347, HCC193, HCC461, HCC515, HCC78, HCC827), adenosquamous lung carcinoma (HCC366), LC (H2126, HCC1359, HCC1171), unspecified NSCLC (H2882, H2887), squamous cell lung carcinoma (H157, HCC15, HCC95), bronchioloalveolar carcinoma (BAC; H358), and SCLC (H524, H526, H1184, H1607, H1963). The primary tumors were from anonymous patients and were surgically dissected and frozen at −80°C until use. All primary tumor specimens were examined histologically to ensure at least 70% neoplastic tissue, except SCLC samples that were considered to have high tumor contents. These tumors consisted of 19 SCLC and 51 NSCLC. These include SCLC (S0168T, S0169T, S0170T, S0171T, S0172T, S0173T, S0177T, S0185T, S0187T, S0188T, S0189T, S0190T, S0191T, S0192T, S0193T, S0194T, S0196T, S0198T, S0199T), lung adenocarcinoma (MGH1622T, MGH7T, MGH1028T, S0356T, S0372T, S0377T, S0380T, S0392T, S0395T, S0397T, S0405T, S0412T, S0464T, S0471T, S0479T, S0482T, S0488T, S0498T, S0500T, S0502T, S0514T, S0522T, S0524T, S0534T, S0535T, S0539T, AD157T, AD163T, AD309T, AD311T, AD327T, AD330T, AD334T, AD335T, AD336T, AD337T, AD347T), squamous cell lung carcinoma (S0446T, S0449T, S0458T, S0465T, S0480T, S0485T, S0496T, S0508T, S0515T, S0536T), adenocarcinoma/BAC (S0376T), and BAC (S0509T, AD338T, AD362T).
Single nucleotide polymorphism array. For each sample, SNPs were genotyped with two different arrays, CentXba and CentHind, in parallel (Affymetrix, Inc., Santa Clara, CA). Array experiments were done according to the manufacturer's recommendations. In brief, two aliquots of DNA (250 ng each) were first digested with XbaI or HindIII restriction enzyme (New England Biolabs, Boston, MA), respectively. The digested DNA was ligated to an adaptor before subsequent PCR amplification using AmpliTaq Gold (Applied Biosystems, Foster City, CA). Four 100 μL PCR reactions were then set up for each XbaI or HindIII adaptor-ligated DNA sample. The PCR products from four reactions were pooled, concentrated, and fragmented with DNase I to a size range of 250 to 2,000 bp. Fragmented PCR products were then labeled, denatured, and hybridized to the array. After hybridization, the arrays were washed on the Affymetrix fluidics stations, stained, and scanned using the Gene Chip Scanner 3000 and the genotyping software, Affymetrix Genotyping Tools Version 2.0.
Data analysis. Data were normalized at the probe intensity level to a baseline array with median signal intensity, using the invariant set normalization method described by Li and Wong (30). After normalization, the signal values for each SNP in each array were obtained with a model-based (PM/MM) method (31). Signal intensities at each probe locus were compared with a set of normal reference samples representing 12 individuals. From raw signal data, the inferred copy number at each SNP locus was estimated by applying the hidden Markov model (HMM; ref. 27). We applied the HMM based on the assumption of diploidy or triploidy; thus, possible normalized copy numbers are (0, 1, 2, 3, 4, …; diploid) or (0, 0.67, 1.33, 2, 2.67, 3.33, 4, …; triploid), leading to the possible copy number set (0, 0.67, 1, 1.33, 2, 2.67, 3, 3.33, 4, …). The analysis methods described above are implemented in the dChip software Version 1.3, which is freely available to academic users (http://biosun1.harvard.edu/~cli/dchip2004.exe). Mapping information for SNP locations and cytogenetic bands is based on curation of Affymetrix and University of California Santa Cruz hg 16 (http://genome.ucsc.edu).
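As an illustration of the copy number set described above, the following minimal Python sketch enumerates the normalized states under the diploid and triploid assumptions and merges them. It is not taken from dChip; the function name `copy_number_states`, the `max_copies` cap, and the rounding to two decimals are assumptions made only for this example.

```python
def copy_number_states(max_copies=4):
    """Return the sorted union of normalized copy-number states.

    Diploid baseline: n physical copies map to n.
    Triploid baseline: n physical copies map to n * 2/3.
    (Hypothetical helper; max_copies caps the normalized value.)
    """
    diploid = {float(n) for n in range(max_copies + 1)}
    # n * 2/3 <= max_copies  <=>  n <= max_copies * 3/2
    triploid = {round(n * 2 / 3, 2) for n in range(max_copies * 3 // 2 + 1)}
    return sorted(diploid | triploid)


states = copy_number_states(4)
print(states)
```

With `max_copies=4` the merged list reproduces the set quoted in the text up to 4 copies.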
We applied the circular binary segmentation algorithm (32) to our raw log2 ratio data. This algorithm recursively splits chromosomes into subsegments based on a maximum t statistic. The reference distribution for this statistic, estimated by permutation, is used to decide whether or not to split at each stage (see ref. 32 for details). We compared the (rounded) mean raw estimated copy number for each segment to our HMM results.
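The recursive splitting idea can be sketched in a few lines of Python. This is a simplified, non-circular binary segmentation (one change point per step) and not the circular variant of ref. 32; the function names, the permutation count, and the significance threshold `alpha` are assumptions chosen for the example.

```python
import math
import random


def mean_var(xs):
    """Sample mean and unbiased sample variance of a list (len >= 2)."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def max_t_split(values):
    """Best single change point: returns (index, max |t|) or (None, 0.0)."""
    n = len(values)
    best_i, best_t = None, 0.0
    for i in range(2, n - 1):  # keep at least two points on each side
        m1, v1 = mean_var(values[:i])
        m2, v2 = mean_var(values[i:])
        pooled = ((i - 1) * v1 + (n - i - 1) * v2) / (n - 2)
        diff = abs(m1 - m2)
        if pooled > 0:
            t = diff / math.sqrt(pooled * (1 / i + 1 / (n - i)))
        else:
            t = math.inf if diff > 0 else 0.0
        if t > best_t:
            best_i, best_t = i, t
    return best_i, best_t


def segment(values, start=0, alpha=0.01, n_perm=200, rng=None):
    """Recursively split values; returns half-open (start, end) index pairs."""
    rng = rng or random.Random(0)  # fixed seed keeps the example repeatable
    i, t_obs = max_t_split(values)
    if i is None:
        return [(start, start + len(values))]
    # Reference distribution of the max t statistic, estimated by permutation
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_t_split(shuffled)[1] >= t_obs:
            exceed += 1
    if (exceed + 1) / (n_perm + 1) > alpha:
        return [(start, start + len(values))]  # split not significant
    return (segment(values[:i], start, alpha, n_perm, rng)
            + segment(values[i:], start + i, alpha, n_perm, rng))


# Toy log2 ratios: 20 markers near 0 followed by 20 markers near 1
log2_ratios = [0.05, -0.05] * 10 + [1.05, 0.95] * 10
segments = segment(log2_ratios)
print(segments)
```

On the toy input the procedure should place a single breakpoint between the two plateaus; each resulting segment mean could then be rounded and compared with an HMM call.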
Quantitative real-time PCR. Relative gene copy numbers and gene expression were determined by quantitative real-time PCR using a PRISM 7500 Sequence Detection System (Applied Biosystems) with a QuantiTect SYBR Green PCR kit and a QuantiTect SYBR Green RT-PCR kit (Qiagen, Inc., Valencia, CA). The standard curve method was used to calculate target gene copy number in the tumor DNA sample normalized to a repetitive element Line-1 and normal reference DNA. The comparative threshold cycle method was used to calculate gene expression normalized to β-actin as a gene reference and normal human lung RNA as an RNA reference. Primers were designed using Primer 3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) and synthesized by Invitrogen (Carlsbad, CA). Primer sequences are available upon request.
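The comparative threshold cycle calculation is commonly written as 2^(−ΔΔCt). The sketch below assumes that standard form and roughly 100% amplification efficiency; the text does not spell out the formula, and the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change by the 2^(-ddCt) method (assumes ~100% PCR efficiency).

    sample      : tumor RNA
    calibrator  : normal lung RNA (the RNA reference)
    ref         : reference gene, e.g. beta-actin
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: target crosses threshold 3 cycles earlier in tumor
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold)
```

A ΔΔCt of −3 cycles corresponds to an eightfold higher normalized signal in the sample than in the calibrator.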
Interphase fluorescence in situ hybridization. Fluorescence in situ hybridization (FISH) probes were made from BAC clones RP11-805B16 and RP11-153K21 (Children's Hospital, Oakland Research Institute, Oakland, CA), identified to overlap the ASPH locus. BAC DNA was purified and 100 ng of each clone was labeled with digoxigenin-dUTP using random primers. The DNA was then purified with a MicroSpin S-200 HR column, ethanol precipitated, and resuspended in 100 μL hybridization solution. The control FISH probe, CEP 8 SpectrumOrange (Vysis, Downers Grove, IL), detects the centromeric region of chromosome 8. H2171 cells were grown in culture using standard methods and harvested by centrifugation after 3 days of growth. Slides for FISH analysis were prepared according to the control probe manufacturer's directions (Vysis). Briefly, cell pellets were resuspended in a fixative, a 3:1 solution of methanol and acetic acid. The cell suspension was dropped onto slides dipped in fixative and dried for 10 minutes over a 67°C water bath. The slides were pretreated with 2× SSC at 37°C for 1 hour, then digested for 5 minutes with a 1:25 solution of All III and rinsed with PBS. They were then incubated for 1 minute in 10% buffered formalin at room temperature, rinsed with PBS, dehydrated in an ethanol series (70%, 85%, 95%, and 100%), and air dried. Ten microliters of probe solution (6 μL hybridization buffer, 1 μL Cot-1 DNA, 1 μL centromere control probe, and 1 μL each of RP11-805B16 and RP11-153K21 probes) were incubated on the slide under a sealed coverslip for 3.5 minutes at 85°C and then placed in a humidified chamber overnight at 37°C. The slides were then washed in 0.5× SSC at 75°C for 5 minutes. The ASPH probes were detected using a 1:500 dilution of FITC antidigoxigenin in 10% normal goat serum, with 4′,6-diamidino-2-phenylindole as a counterstain.
Genome-wide analysis of lung cancer. One hundred one human lung carcinoma DNA samples, including 51 NSCLC primary tumor samples, 26 NSCLC cell line samples, 19 SCLC primary tumor samples, and 5 SCLC cell line samples, were hybridized to SNP arrays containing 115,593 mapped SNP loci. Two independent algorithms, the HMM in dChipSNP (27) and binary segmentation analysis (32), were used to infer copy number and thereby to identify genomic amplifications and deletions.
The genomes of lung carcinomas are often complex with numerous chromosomal alterations. For example, the lung adenocarcinoma cell line H2122 shows homozygous deletions on chromosome arms 2q, 8q, and 9p, and a significant amplification on chromosome arm 8q (Fig. 1A). Analyses of all tumor samples, throughout the genome, identify recurrent regions of copy number gain and loss in lung carcinomas (Fig. 1B and C; Supplementary Fig. S1 shows copy number estimates for each sample across the entire genome). In both NSCLC and SCLC, the most frequent copy number gains were found in chromosome arm 5p, with ≥3 copies found in 25% and 43% of samples, respectively (Fig. 1B and C). However, the region of 5p copy number gain is usually large, and we were not able to identify any area of focal amplification from our data set.
Maximum degrees of copy number loss at a given locus, with an inferred copy number <1.5 based on HMM analysis, were found most often in chromosome arms 8p (33%) and 9p (26%) in NSCLC (Fig. 1B) and in chromosome arms 3p (68%) and 4q (58%) in SCLC (Fig. 1C). These findings are broadly similar to results reported by CGH (33); however, overall, we are reporting a somewhat lower frequency of these chromosome alterations. Possible reasons include the following: (a) the presence of stromal admixture in primary tumor samples; (b) copy number signal is attenuated either at the level of primary hybridization intensity or due to the application of the HMM; or (c) we are analyzing maximal regions of loss rather than tallying the presence of loss at any site within a whole chromosome arm. For example, the SCLC cell lines NCI-H524 and NCI-H526 only show partial loss of chromosome 3p in SNP array analysis (Supplementary Fig. S1); similar results are seen for the same cell lines upon analysis with cDNA array CGH (L. Girard and J. Minna, unpublished data).
Homozygous deletions and amplifications are of particular interest because they may indicate tumor suppressor gene and oncogene loci, respectively. Regions of homozygous deletion were defined as segments of at least four SNP loci covering >5 kb with an inferred copy number of 0 by HMM. Similarly, regions of high-level amplification were defined as segments of at least four SNP loci covering >5 kb with an inferred copy number ≥7.
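These region definitions translate directly into a run-length filter over per-SNP inferred copy numbers. The following Python sketch is one possible reading of the criteria, not the code used in the study; the function name `call_regions`, the tuple layout, and the toy positions are assumptions for illustration.

```python
def call_regions(snps, is_hit, min_snps=4, min_span_bp=5000):
    """Find runs of consecutive SNPs that satisfy is_hit(copy_number).

    snps: (position_bp, inferred_copy_number) pairs, sorted by position
          along one chromosome.
    Returns (start_bp, end_bp, n_snps) for runs of at least min_snps loci
    that span more than min_span_bp.
    """
    regions, run = [], []
    for pos, cn in list(snps) + [(None, None)]:  # sentinel closes last run
        if pos is not None and is_hit(cn):
            run.append(pos)
            continue
        if len(run) >= min_snps and run[-1] - run[0] > min_span_bp:
            regions.append((run[0], run[-1], len(run)))
        run = []
    return regions


# Toy data (hypothetical positions and HMM-inferred copy numbers)
snps = [(1000, 0), (3000, 0), (5000, 0), (8000, 0), (12000, 0),
        (20000, 2),
        (30000, 0), (31000, 0), (32000, 0),   # only three loci: rejected
        (40000, 2),
        (50000, 8), (52000, 7), (56000, 9), (60000, 7)]

deletions = call_regions(snps, lambda cn: cn == 0)
amplifications = call_regions(snps, lambda cn: cn >= 7)
print(deletions, amplifications)
```

Passing the criterion as a predicate lets the same routine serve both the deletion (copy number 0) and amplification (copy number ≥7) definitions.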
Copy number alterations that recurred in multiple samples were verified by real-time PCR (Table 1). In general, copy number estimation was consistent between the HMM and binary segmentation approaches. Note that whereas quantitative real-time PCR results have been highly reproducible between duplicate experiments, the copy number increases detected by the arrays seem to saturate at a much lower level. Thus, an inferred copy number of ∼7 may correspond to a quantitative PCR copy number of as high as 175 or as low as 9 (MYC amplification; Table 1).
| Cytoband* | Start (Mb)* | Stop (Mb)* | Size (Mb)* | Sample | Mean dChip copy number† | Binary segmentation copy number | Real-time PCR copy number | Representative gene within interval | Total no. genes* |
| 1p34.2 | 39.51 | 40.55 | 1.04 | H1963 | 10.6 | 11 | 147.8 ± 32.3 | MYCL1 | 23 |
| | 39.55 | 40.91 | 1.36 | S0173T | 10.7 | 8 | 20.7 ± 4.9 | | 24 |
| 2p24.3-p24.2 | 14.20 | 16.38 | 2.18 | S0172T | 14.2 | 12 | 56.2 ± 13.7 | MYCN | 9 |
| | 15.25 | 17.06 | 1.81 | H526 | 7.0 | 5 | 42.3 ± 6.4 | | 9 |
| 2q22.1 | 141.71 | 142.45 | 0.73 | H2122 | 0 | 0 | 0.0 ± 0.0 | LRP1B‡ | 1 |
| | 141.79 | 142.78 | 0.99 | HCC95 | 0 | 1 | 0.0 ± 0.0 | | 1 |
| | 141.94 | 142.20 | 0.26 | H2126 | 0 | 0 | 0.0 ± 0.0 | | 1 |
| | 142.00 | 142.20 | 0.20 | H157 | 0 | 0 | 0.1 ± 0.0 | | 1 |
| 3p14.2 | 60.29 | 60.78 | 0.49 | HCC95 | 0 | 0 | 0.0 ± 0.0 | FHIT | 4 |
| | 60.32 | 60.40 | 0.08 | H2887 | 0 | 0 | 0.0 ± 0.0 | | 1 |
| 3q25.1 | 152.82 | 152.95 | 0.12 | H2882 | 0 | 0 | 0.0 ± 0.0§ | AADAC, SUCNR1 | 2 |
| | 152.82 | 152.95 | 0.12 | S0177T | 0 | 0 | 0.0 ± 0.0§ | | 2 |
| 3q26.31-q27.1 | 174.86 | 184.52 | 9.66 | S0465T | 7.8 | 5 | 10.3 ± 1.7 | PIK3CA | 41 |
| | 182.50 | 184.47 | 1.98 | S0515T | 13.2 | 11 | 3.9 ± 0.4 | | 12 |
| 7p12.1-q11.22 | 53.16 | 61.49 | 11.34 | HCC827 | 11.3 | 9 | 41.7 ± 8.6 | EGFR | 50 |
| | 54.24 | 69.62 | 9.73 | AD347T | 9.8 | 9 | 18.3 ± 4.8 | | 123 |
| | 54.37 | 55.63 | 13.65 | S0480T | 13.7 | 10 | 67.1 ± 10.9 | | 12 |
| 8p12-p11.22 | 38.05 | 39.97 | 1.93 | MGH1622T | 10.4 | 11 | 14.9 ± 5.9 | FGFR1 | 22 |
| | 38.73 | 39.84 | 1.11 | S0449T | 6.4 | 5 | 6.07 ± 1.7 | | 11 |
| 8q24.13-q24.21 | 126.60 | 128.89 | 2.3 | H524 | 6.6 | 5 | 174.5 ± 39.3 | MYC | 6 |
| | 127.46 | 128.89 | 1.43 | HCC827 | 6.9 | 6 | 8.6 ± 1.4 | | 6 |
| | 127.59 | 130.83 | 3.24 | NCI-H23 | 8.0 | 7 | 11.1 ± 2.8 | | 11 |
| | 127.90 | 129.62 | 1.72 | H2122 | 3.6 | 5 | 14.5 ± 4.3 | | 6 |
| | 128.44 | 129.60 | 1.16 | H2087 | 7.9 | 8 | 16.0 ± 5.8 | | 4 |
| 9p23 | 8.61 | 9.12 | 0.51 | S0177T | 0 | 0 | 0.0 ± 0.0 | PTPRD | 2 |
| | 8.79 | 9.55 | 0.77 | H358 | 0 | 0 | 0.1 ± 0.0§ | | 2 |
| | 9.41 | 9.61 | 0.20 | HCC1171 | 0 | 0 | 0.2 ± 0.0§ | | 1 |
| | 9.50 | 9.75 | 0.25 | H2347 | 0 | 0 | 0.0 ± 0.0§ | | 1 |
| 9p21.3 | 20.90 | 22.94 | 2.03 | H2126 | 0 | 0 | 0.0 ± 0.0 | CDKN2A | 35 |
| | 21.20 | 22.19 | 0.98 | HCC1359 | 0 | 0 | 0.0 ± 0.0 | | 21 |
| | 21.58 | 25.10 | 3.52 | HCC1171 | 0 | 0 | 0.0 ± 0.0 | | 11 |
| | 21.70 | 23.39 | 1.69 | H2882 | 0 | 1 | 0.0 ± 0.0 | | 6 |
| | 21.84 | 26.83 | 4.99 | HCC95 | 0 | 0 | 0.0 ± 0.0 | | 11 |
| | 21.95 | 22.09 | 0.14 | H2122 | 0 | 0 | 0.0 ± 0.0 | | 3 |
| | 24.34 | 24.70 | 0.36 | H157 | 0 | 0 | 0.0 ± 0.0§ | | 1 |
| 10q23.31 | 89.03 | 89.40 | 0.37 | H1607 | 0 | 0 | 0.0 ± 0.0 | PTEN | 4 |
| | 89.18 | 89.88 | 0.69 | S0187T | 0 | 0 | 0.1 ± 0.0 | | 4 |
| | 89.35 | 91.16 | 1.80 | S0189T | 0 | 0 | 0.1 ± 0.0§ | | 22 |
| 12p11.21 | 32.17 | 33.02 | 0.85 | S0515T | 8.8 | 7 | 10.8 ± 5.4 | PKP2 | 6 |
| | 32.69 | 36.59 | 3.90 | H2087 | 7.8 | 8 | 11.4 ± 5.7 | | 12 |
| 12q13.3-q14.1 | 56.26 | 57.37 | 1.10 | H2087 | 8.8 | 9 | 23.4 ± 11.3 | CDK4 | 20 |
| | 55.82 | 56.67 | 0.85 | HCC827∥ | 13.0 | 13 | 30.3 ± 11.8 | | 34 |
| 19q12 | 34.02 | 35.55 | 1.53 | S0524T | 6.7 | 7 | 6.8 ± 1.93 | CCNE1 | 12 |
| | 34.79 | 37.09 | 2.30 | S0188T | 7.9 | 7 | 10.9 ± 3.2 | | 12 |
| 22q11.21-q11.22 | 16.99 | 20.31 | 3.32 | H1819 | 6.8 | 7 | 12.6 ± 4.8 | CRKL | 92 |
| | 17.51 | 21.44 | 3.93 | HCC515 | 7.4 | 7 | 14.0 ± 3.6 | | 169 |
| | 18.47 | 20.61 | 2.14 | S0380T | 6.4 | 5 | 8.4 ± 1.2 | | 68 |
| | 19.45 | 20.75 | 1.29 | HCC1359 | 6.5 | 7 | 8.1 ± 1.7 | | 48 |
NOTE: Samples are those with EGFR amplification (copy number ≥ 4) or EGFR gene mutation or those with EGFR expression, EGFR copy number, and EGFR mutation status available. Abbreviation: N/A, not available.
120K set was from both XbaI and HindIII chips; 60K set was from only XbaI chip.
Ref. 12; Naoki et al., unpublished data.
Ref. 55; Naoki et al., unpublished data.
Comparisons of EGFR amplification, mutation, and expression data indicate that high and moderate level amplifications are not always associated with EGFR gene mutation (Table 2). In addition, 17 samples run on SNP arrays, with known EGFR mutation status, also had expression information available (Table 2). Samples with EGFR mutation and/or copy number gain (copy number ≥4) of the EGFR gene had higher average EGFR expression than samples with wild-type, unamplified EGFR, as measured by two probe sets on Affymetrix U95AV2 arrays [1537_at (EGFR): P = 0.035; 37327_at (EGFR): P = 0.004]. Our study shows that mutations in the EGFR gene are at least partially independent of EGFR gene amplification and that EGFR expression and amplification are correlated.
The present study represents the first application of genome-wide copy number analysis in lung cancer by SNP array. Our group and others recently showed that this technology provides a unique opportunity to assess DNA copy number changes and LOH simultaneously throughout the entire genome (27, 28). Our results illustrate the application of SNP arrays to the analysis of chromosomal alterations across lung cancer genomes. The high resolution of the present analysis improves the ability to identify aberrations in the chromosomal structure of cancer cells and the genes affected by them. The power of SNP arrays to detect small homozygous deletions with high reliability has been shown in this study with the identification of the novel chromosome 9p23 deletion. As used on unpaired samples in this study, SNP array analysis is comparable with oligonucleotide array CGH, although the SNP arrays also offer the ability to identify copy-neutral LOH when coupled with paired normal samples.
Targeted therapies based on the identification of specific chromosomal abnormalities in tumor cells have already shown great potential; inhibitors of kinases, such as BCR/ABL, KIT, ERBB2, and EGFR, have proven to be valuable as cancer therapies. Thus, kinases, such as CDK4, FGFR1, and MET, identified in this study may be future therapeutic targets in lung cancers upon further validation.
However, it is important to note that the presence of DNA copy number changes does not provide absolute proof of the involvement of the genes within these regions. Therefore, a detailed characterization should be done to elucidate the significance of the regions we have detected in this study, many of which may contain candidate tumor suppressor genes and oncogenes. Novel loci including the recurrent deletions within chromosome segments 3q25 and 9p23 and amplifications within 8q12-13, 12p11, and 22q11 will be of particular interest.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
X. Zhao and B. Weir contributed equally to this work.
Grant support: NIH grant 2P30 CA06516-39 (C. Li), American Cancer Society grant RSG-03-240-01-MGO (M. Meyerson), Flight Attendant Medical Research Institute (M. Meyerson), Arthur and Rochelle Belfer Foundation (C. Li and M. Meyerson), and Lung Cancer Specialized Programs of Research Excellence grant P50CA70907 (J. Minna and L. Girard).
Raw data will be available at the web site http://research2.dfci.harvard.edu/dfci/snp/ and the latest version of the dChip program is available at http://biosun1.harvard.edu/~cli/dchip2004.exe.
We thank Pamela Mole, Maura Berkeley, and Dr. Ed Fox at Dana-Farber Cancer Institute microarray facility for valuable assistance with the SNP array hybridization and Dr. Adi Gazdar for discussions and tumor cell lines.