We constructed a genome-wide transcriptome map of non-small cell lung carcinomas based on gene-expression profiles generated by serial analysis of gene expression (SAGE) using primary tumors and bronchial epithelial cells of the lung. Using the human genome working draft and the public databases, 25,135 nonredundant UniGene clusters were mapped onto unambiguous chromosomal positions. Of the 23,056 SAGE tags that appeared more than once among the nine SAGE libraries, 11,156 tags representing 7,097 UniGene clusters were positioned onto chromosomes. A total of 43 and 55 clusters of differentially expressed genes were observed in squamous cell carcinoma and adenocarcinoma, respectively. The number of genes in each cluster ranged from 18 to 78 in squamous cell carcinomas and from 20 to 165 in adenocarcinomas. The size of these clusters varied from 1.8 Mb to 65.5 Mb in squamous cell carcinomas and from 1.6 Mb to 98.1 Mb in adenocarcinomas. Overall, the clusters with genes over-represented in tumors showed an average 3–4-fold increase in gene expression compared with the normal control. In contrast, clusters of genes with reduced expression had about 50–65% of the normal gene expression level. Examination of clusters identified in squamous cell lung cancer suggested that 9 of 15 clusters with overexpressed genes and 13 of 28 clusters with underexpressed genes were concordant with previously reported cytogenetic, comparative genomic hybridization, or loss of heterozygosity studies. Therefore, at least a portion of the gene clusters identified via the transcriptome map most likely represented the transcriptional or genetic alterations that occurred in the tumors. Integrating chromosomal mapping information with gene expression profiles may help reveal novel molecular changes associated with human lung cancer.

Lung cancer accounts for >150,000 cancer deaths each year in the United States alone. NSCLC² is the predominant form of lung cancer and consists of two major histological subtypes: squamous cell carcinoma and adenocarcinoma. Cytogenetic studies have shown that most NSCLCs displayed complex karyotypes with multiple numerical and structural alterations (1, 2). LOH (3, 4, 5) and CGH studies (6, 7, 8, 9, 10, 11) disclosed significant differences in patterns of allelic imbalances between adenocarcinoma and squamous cell carcinoma, as well as between small cell lung cancer and NSCLC. Recently, using SAGE, we demonstrated that squamous cell carcinoma and adenocarcinoma of the lung have unique gene expression patterns (12, 13).

To achieve genome-wide high-resolution detection of genomic alterations in cancer, Pollack et al. (14) developed a cDNA microarray-based CGH method that used 3360 cDNAs representing 3195 RH-mapped genes as probes in an effort to determine DNA copy-number variation in cancer. Caron et al. (15) constructed a genome-wide mRNA expression profile (transcriptome map) of several tumors using the publicly available SAGE tag-mapping database. By assigning genes to RH-based chromosomal positions, clusterings of highly expressed genes in specific chromosomal regions were observed. Completion of the initial human genome working draft (16, 17) has now made it possible to directly assign genome-wide high-throughput gene expression profiles to the human genome working draft sequence. Using nine different SAGE libraries generated from primary lung tumors and normal respiratory epithelial cells, we have integrated gene expression profiles with the human genome working draft sequence. We observed clusters of genes that were differentially expressed between the tumors and the corresponding normal epithelial cells on the resulting transcriptome map. Many of these clusters corresponded to regions of gene amplification or deletion demonstrated previously by CGH and/or LOH analyses, indicating that the clustering of gene expression changes in specific chromosome regions may represent the underlying genetic and transcriptional alterations in lung cancer.

Tissue and SAGE Libraries.

Primary lung tumor tissues used for SAGE were obtained after surgical resection at Johns Hopkins Hospital, Baltimore, MD, as described previously (12). SAEC and NHBE cells were purchased from Clonetics/BioWhittaker (Walkersville, MD) and propagated following the manufacturer’s instructions. SAGE libraries were generated from a total of nine different samples: two each from primary cultures of normal small and large airway epithelial cells (SAEC and NHBE, respectively), two each from primary squamous cell carcinomas and adenocarcinomas of the lung, and a ninth library derived from the A549 lung adenocarcinoma cell line. A total of 374,643 tags were sequenced. Approximately 55,000 tags were analyzed for each of the squamous cell carcinoma, NHBE, and A549 libraries, whereas about 22,000 tags were generated from each of the adenocarcinoma and SAEC libraries (12, 13).

Public Databases and Software.

Gene identity and UniGene Cluster assignment of each SAGE tag were obtained using the SAGEmap reliable tag-to-gene mapping table (dated June 19, 2001).³ The expression level of each transcript was normalized in each experiment to represent the occurrences of each transcript per 100,000 transcripts in the library. The public databases used in this study were the April 1, 2001 freeze of the UCSC human genome working draft sequence and its annotation database (dated July 16, 2001),⁴ UniGene Clusters (build #138, dated July 29, 2001),⁵ and RefSeq (dated June 15, 2001).⁶ Sequence-matching searches over the human genome working draft sequence were performed using the BLAT program (courtesy of W. James Kent, UCSC, Santa Cruz, CA), which was run on the NIH Biowulf/Lobos3 Beowulf supercluster. The transcriptome-mapping database was constructed using the MySQL database management system (ver. 3.23.39)⁷ and Perl (ver. 5.6.0).
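The per-library normalization described above can be illustrated with a minimal Python sketch. It is not the authors' Perl/MySQL code; the function name `normalize_per_100k` and the toy tag sequences and counts are hypothetical and serve only to show the scaling to occurrences per 100,000 tags.

```python
from collections import Counter


def normalize_per_100k(tag_counts):
    """Scale raw SAGE tag counts to occurrences per 100,000 tags in one library."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: count * 100_000 / total for tag, count in tag_counts.items()}


# Hypothetical toy library: tag sequences and counts are invented for illustration.
library = Counter({"TAGAAAAAAA": 30, "TAGCCCCCCC": 15, "TAGGGGGGGG": 5})
normalized = normalize_per_100k(library)
```

Because every library is rescaled to the same total, tag levels from libraries of about 55,000 and about 22,000 tags can be compared on one scale, although smaller libraries remain noisier for rare transcripts.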

Clustering of Differentially Expressed Genes.

To identify the clusters of increased or decreased gene expression patterns in tumors, the moving-median ratio of tumor tissues versus normal cells was used to survey the SAGE-based transcriptome map. The median expression ratios of tumor over normal or normal over tumor in squamous tumors were calculated for a window size (W) of 15 positionally consecutive UniGene entities. A clustering of differentially expressed genes was defined as a run (R) of eight or more consecutive moving-medians having a lower limit (K) of 1.8 times the genomic median. For adenocarcinomas, more stringent parameters, W = 19, R = 10, and K = 2, were used because of the smaller size of the SAGE libraries. Finally, the size of each cluster was reduced to contain only the minimal number of consecutive genes with consistent gene expression patterns. The Monte-Carlo method was used according to Caron et al. (15) to evaluate whether the observed number of clusters with differentially expressed genes was more than what would be expected by chance alone.
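The moving-median criterion can be sketched as follows. This is an illustrative Python reading of the parameters W, R, and K, not the original implementation: the function names are hypothetical, the input is assumed to be a list of per-gene expression ratios already ordered by chromosomal position, and the final trimming of each cluster to its minimal consistent span is not reproduced.

```python
from statistics import median


def moving_medians(ratios, window):
    """Median of each run of `window` positionally consecutive ratios."""
    return [median(ratios[i:i + window]) for i in range(len(ratios) - window + 1)]


def find_clusters(ratios, window=15, min_run=8, fold=1.8):
    """Return inclusive (first_gene, last_gene) index pairs covered by runs of at
    least `min_run` consecutive moving medians >= fold * genomic median."""
    threshold = fold * median(ratios)
    medians = moving_medians(ratios, window)
    clusters, start = [], None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, m in enumerate(medians + [float("-inf")]):
        if m >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                # The last qualifying window starts at i - 1 and spans `window` genes.
                clusters.append((start, i - 1 + window - 1))
            start = None
    return clusters


# Toy profile: 30 genes with a 4-fold ratio flanked by unchanged genes.
toy_ratios = [1.0] * 40 + [4.0] * 30 + [1.0] * 40
toy_clusters = find_clusters(toy_ratios)
```

For clusters of decreased expression the same function could be applied to normal-over-tumor ratios, and the adenocarcinoma settings would correspond to `window=19, min_run=10, fold=2`.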

Assignment of UniGene Clusters on the Human Genome Working Draft.

An outline for the construction of the transcriptome map is shown in Fig. 1. We assigned UniGene clusters to chromosome positions, expressed as chromosomal coordinates, based on the UCSC human genome working draft sequence. A total of 1,271,925 accessions consisting of Known Genes, mRNA, or spliced EST entities in the annotation of the working draft sequence were used. An additional 371,123 UniGene and RefSeq sequences not included in the UCSC annotations at the time of this study were sequence-matched over the human genome working draft sequence using the BLAT program, which was run on the NIH Biowulf/Lobos3 Beowulf supercluster. Only the accession entry with the highest matching score was chosen when BLAT outputs were redundant. To reduce redundancy, accessions that had a shared UniGene cluster identity and a positional overlap were joined as a single positional cluster. Positional clusters that shared the same strand orientation and at least one common exon were also joined. Those that were represented by only one EST accession were removed to reduce the possibility of contamination by genomic DNA segments. When UniGene clusters matched multiple positions, the positional clusters derived from the most reliable category (i.e., RefSeq > mRNA > spliced EST) were used. Clusters having the same UniGene cluster identity and located within a 1-Mb distance were joined together. In doing so, 28,223 unique chromosomal positions were assigned to represent 27,542 UniGene clusters, and 26,992 of these clusters had a unique correspondence to chromosomal positions. These figures were considered reasonable given the current estimate of 25,000–40,000 genes predicted in the human genome (16, 17).
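One of the redundancy-reduction steps above, joining positional clusters that share a UniGene identity and lie within 1 Mb, can be sketched in Python. The record layout, the function name `join_within_distance`, the UniGene identifiers, and the definition of distance as the gap between the end of one record and the start of the next are all assumptions made for illustration.

```python
def join_within_distance(positions, max_gap=1_000_000):
    """Join records that share a UniGene ID and chromosome and are separated by
    at most `max_gap` bases. Each record is (unigene_id, chrom, start, end)."""
    merged = []
    for uid, chrom, start, end in sorted(positions):
        last = merged[-1] if merged else None
        if last and last[0] == uid and last[1] == chrom and start - last[3] <= max_gap:
            # Extend the previous positional cluster instead of adding a new one.
            merged[-1] = (uid, chrom, last[2], max(last[3], end))
        else:
            merged.append((uid, chrom, start, end))
    return merged


# Hypothetical records: identifiers and coordinates are invented.
records = [
    ("Hs.1", "chr3", 100, 200),
    ("Hs.1", "chr3", 500_000, 500_500),
    ("Hs.1", "chr3", 5_000_000, 5_000_100),
    ("Hs.2", "chr3", 150, 250),
]
joined = join_within_distance(records)
```

In this toy input the first two Hs.1 records are joined, whereas the third stays separate because it lies more than 1 Mb away, which mirrors how one UniGene cluster can still end up with more than one chromosomal position.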

Generation of the Transcriptome Map for NSCLC on the Basis of SAGE.

As summarized in Fig. 1, a total of 374,643 tags from nine lung SAGE libraries were included in the construction of the NSCLC transcriptome map. SAGE analysis of five cancer tissues and four normal respiratory epithelial cells identified 66,501 distinct tags. After removing tags that matched mitochondrial DNA or ribosomal RNA sequences, 23,056 of the unique tags appearing more than once among all nine of the libraries were subjected to UniGene cluster assignment using the SAGEmap reliable tag-to-gene mapping table. A total of 18,595 tags were assigned to one or more UniGene clusters, and 11,156 of these tags, representing 7,097 UniGene clusters, were assigned to unique chromosomal positions. The remaining 7,439 tags were excluded because of their multiple assignments to the genome. Of these clusters, 6,501 UniGene clusters were expressed in squamous cell carcinoma and/or NHBE libraries, whereas 5,512 UniGene clusters were expressed in adenocarcinoma and/or SAEC libraries. The discrepancy among the numbers of distinct SAGE tags, UniGene clusters, and chromosomal positions could be accounted for in part by alternatively spliced transcripts and the presence of multiple polyadenylation sites, which can result in multiple SAGE tags for the same gene (18). The resulting transcriptome map from SAGE in NSCLC is shown in Fig. 2.
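The tag-filtering logic above (keep tags seen more than once across all libraries, then keep only tags with a single chromosomal assignment) can be sketched as below. The data structures, function names, and toy tags are hypothetical; only the counts in the final check are taken from the text.

```python
from collections import Counter


def tags_seen_more_than_once(libraries):
    """Tags whose summed count across all libraries exceeds one."""
    totals = Counter()
    for lib in libraries:
        totals.update(lib)
    return {tag for tag, n in totals.items() if n > 1}


def uniquely_positioned(tags, tag_to_positions):
    """Keep only tags that map to exactly one chromosomal position."""
    return {
        tag: next(iter(tag_to_positions[tag]))
        for tag in tags
        if len(tag_to_positions.get(tag, ())) == 1
    }


# Hypothetical toy input: two libraries and a tag-to-position lookup.
toy_libraries = [Counter({"AAA": 1, "CCC": 2}), Counter({"AAA": 1, "GGG": 1})]
toy_positions = {"AAA": {("chr1", 100)}, "CCC": {("chr2", 5), ("chr7", 9)}}
kept = tags_seen_more_than_once(toy_libraries)
mapped = uniquely_positioned(kept, toy_positions)

# Counts reported in the text: tags assigned to UniGene clusters, minus tags with
# a unique chromosomal position, leaves the tags excluded for multiple assignments.
excluded = 18_595 - 11_156
```

The last line reproduces the 7,439 excluded tags stated in the text.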

Clustering of Differentially Expressed Genes.

Similar to previous reports (12, 19), a majority of the genes were expressed at comparable levels between the tumors and the corresponding normal cells based on SAGE tag counts. However, clustering of genes with similar expression patterns was present along segments of many chromosomes. To identify chromosomal clustering of highly differentially expressed genes, we used an unbiased approach based on a moving-median of the gene expression levels and then tested the window sizes and folds of differential expression based on existing cytogenetic, CGH, and LOH data. Not surprisingly, most chromosomes showed clustering of genes differentially expressed between normal and tumor tissues. For squamous cell carcinoma, 43 clusters were observed among the 6,501 UniGene clusters analyzed, whereas 55 clusters of genes were observed among the 5,512 UniGene clusters used in adenocarcinoma. The statistical significance of the resulting gene clustering was evaluated using Monte-Carlo simulation to determine whether the observed clustering of differentially expressed genes was a random event (Table 1). In this evaluation, we regarded all of the chromosomes as a continuum of ordered genes and permuted the genomic order of all of the genes for squamous cell carcinoma or adenocarcinoma. A total of 10,000 simulations were performed to determine the incidence of clustering for each cancer type. For squamous cell carcinoma, we identified 16 clusters with increased gene expression and 30 clusters with reduced gene expression in the tumors. In contrast, Monte-Carlo simulation yielded an expected average of 6.5 and 15.0 clusters, respectively, under random permutation. Therefore, it was highly unlikely that the clusters identified via the transcriptome map resulted from random variation in the gene clustering over the genome (P = 0.000064 and P = 0.000006 for over- and underexpressed clusters, respectively).
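The permutation test can be illustrated with a short Python sketch: the genomic order of the per-gene ratios is shuffled repeatedly, and clusters are recounted on each shuffled order. The function names and the toy ratio profile are hypothetical, the cluster definition is the simplified moving-median rule (W, R, K) without cluster trimming, and the random seed exists only to make the sketch repeatable.

```python
import random
from statistics import mean, median, pstdev


def count_clusters(ratios, window=15, min_run=8, fold=1.8):
    """Count runs of >= min_run consecutive moving medians >= fold * genomic median."""
    threshold = fold * median(ratios)
    run = clusters = 0
    for i in range(len(ratios) - window + 1):
        if median(ratios[i:i + window]) >= threshold:
            run += 1
            if run == min_run:  # count each qualifying run exactly once
                clusters += 1
        else:
            run = 0
    return clusters


def permutation_null(ratios, n_sim=10_000, seed=0, **params):
    """Mean, S.D., and raw cluster counts obtained after shuffling the gene order."""
    rng = random.Random(seed)
    shuffled = list(ratios)
    counts = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        counts.append(count_clusters(shuffled, **params))
    return mean(counts), pstdev(counts), counts


# Toy genome treated as one continuum of ordered genes (values are invented).
toy_ratios = [1.0] * 40 + [4.0] * 30 + [1.0] * 40
observed = count_clusters(toy_ratios)
null_mean, null_sd, null_counts = permutation_null(toy_ratios, n_sim=200, seed=1)
```

Comparing `observed` with `null_mean` and `null_sd` corresponds to the observed and expected (±S.D.) rows of Table 1; the sketch uses 200 shuffles only to stay fast, whereas the study reports 10,000 simulations.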

Overall, the genes increased in tumor had an average of 3–4-fold increase in expression, whereas clusters with decreased expression contained genes with an average of 50–65% of the normal level of expression. For adenocarcinoma, 41 and 23 clusters were observed for over- and underexpressed gene clusters, respectively, compared with 38.7 and 11.5 clusters by Monte-Carlo simulation. This result suggested that the observed clustering could have arisen by chance alone for genes overexpressed in adenocarcinomas (P = 0.5652). Possible explanations for this observation include: (a) adenocarcinomas are intrinsically more similar to normal lung parenchyma because of their peripheral location; (b) the number of SAGE tags available in this tumor type may be insufficient for accurate, reliable analyses of differential expression along each chromosome; and (c) the variable histological subtyping of lung adenocarcinoma. Although the clusters with decreased gene expression in adenocarcinoma were reliable (P = 0.00004), we focused our analysis on squamous tumors. When each chromosome was considered separately, there were 15 clusters of genes with increased expression levels and 28 clusters of genes with decreased expression levels. No clustering was observed on chromosomes 6, 15, 18, 21, 22, and X.
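The text does not state how the P values were derived from the simulations. As a hedged consistency check only, the sketch below assumes a two-sided normal approximation applied to the observed counts and the expected mean ± S.D. listed in Table 1. Under that assumption the results are of the same order as the reported values (roughly 0.57 for overexpressed adenocarcinoma clusters), but this should not be read as the authors' actual procedure. The function name and the dictionary keys are hypothetical.

```python
from math import erfc, sqrt


def two_sided_normal_p(observed, expected_mean, expected_sd):
    """Two-sided tail probability of a normal deviate at least this extreme."""
    z = abs(observed - expected_mean) / expected_sd
    return erfc(z / sqrt(2))


# Observed count, expected mean, and S.D. copied from Table 1.
table1 = {
    "squamous_over": (16, 6.5, 2.4),
    "squamous_under": (30, 15.0, 3.3),
    "adeno_over": (41, 38.7, 4.0),
    "adeno_under": (23, 11.5, 2.8),
}
p_values = {name: two_sided_normal_p(*row) for name, row in table1.items()}
```

An empirical P value computed as the fraction of 10,000 permutations reaching the observed count could not resolve values below 0.0001, which may be why a parametric summary of this kind is a plausible reading of Table 1.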

Clustering of Genes with Increased Expression in Squamous Cell Carcinoma.

Gene amplification is generally associated with tumor progression, occurring often in a subset of late-stage cancers, and has prognostic significance (20). Some of the amplicons are known to harbor oncogenes or growth-regulatory genes. Previous CGH analyses have shown that chromosomal arms 3q, 5p, 7p, and 8q were the most frequently over-represented in NSCLC, and gene amplification was often present in chromosomal regions 3q13, 3q26, 3q28-qter, 7q11.2, 8p11-12, 8q24, 12p12, and 19q13.1-13.2 (6). Of the 15 clusters containing genes over-represented in squamous cell carcinoma (Table 2), 9 were located at 1q, 3q, 5p, 7p, and 8q, and were consistent with previous reports (6, 7, 9, 10, 11).

Chromosome region 3q24-qter is one of the most frequent targets for amplification in squamous cell carcinomas of the lung, head and neck, and the cervix (21, 22, 23, 24). On the basis of the literature, amplification occurs at several locations on 3q and is especially frequent at 3q26.2-q26.3. Candidate targets for the 3q26 amplicon in squamous cell carcinomas of the head and neck and the esophagus include PIK3CA (25), SKIL (SNO; Ref. 26), and the RNA component of telomerase (27). Our transcriptome map revealed two distinct regions, 3q25-q26.3 and 3q27-q29 (clusters #13–14 and #15). On the basis of our SAGE data, the PIK3CA gene was not expressed in either normal or tumor tissues, whereas the SKIL gene was minimally expressed only in squamous cell carcinoma but not in NHBE cells. Therefore, other target gene(s) may remain to be discovered for the 3q25-q26 amplicon. In contrast, the p53 homologue p63 gene, mapped at 3q28, is one of the likely candidates responsible for our observed amplification of 3q27-q29 (cluster #15), as its expression was 33-fold higher in the tumor by SAGE analysis (13), which is consistent with published reports (28).

Clustering of Genes with Decreased Expression in Squamous Cell Carcinoma.

To identify potential tumor suppressor loci involved in the pathogenesis of lung cancer, high-resolution (10 cM), genome-wide LOH analyses have been reported (5). Among 36 NSCLC cell lines, 45 minimal areas of allelic loss were identified, including nine hotspots where >60% of the cell lines had LOH at one or more of these locations. These areas included 1p36.12-qter, 8p21.3-q23.1, 9p21.3-p22, 13q11, 13q12-q14, 17p12-p13, 19p13.3, Xp-q21, and Xq22.1. Previous CGH analyses in NSCLC (7) and squamous cell carcinoma of the lung (9) showed frequent DNA copy-number decreases on chromosomes 1p21-p31, 2q34-q36, 3p, 4p, 4q, 5q, 6q14-q24, 8p, 9p, 10q, 13q13-qter, 18q12-qter, and 21q21. These regions were common regions of allelic losses believed to harbor either known or unidentified tumor suppressor genes. Our results showed 28 clusters of underrepresented genes throughout the genome in squamous cell carcinoma. Thirteen of these 28 clusters were concordant with previous reports by LOH and/or CGH analyses.

Chromosome arm 1p is one of the most frequently deleted chromosomal regions in various neoplasms including neuroblastoma, breast, and lung cancers. Although TP73 is located at 1p36, no inactivation of TP73 by either somatic mutation or DNA methylation has been observed (29). In squamous cell carcinoma, we observed cluster #1 that corresponded to the shortest region of deletion located between D1S507 and TP73 at 1p36.2 (29). This cluster included TP73 but was located outside of the PRDM2 (RIZ1) gene inactivated by promoter methylation in liver and breast cancers (30).

Our SAGE-derived transcriptome map displayed three clusters of decreased gene expression on chromosome arm 3p (clusters #10–12). Cluster #10 at 3p24-p22 contained the MLH1 gene, which has been reported as deleted in association with the presence of p53 mutations and tobacco exposure (31). Although there was no difference in the expression levels of the MLH1 gene between normal and tumor tissues, genes surrounding MLH1 were underexpressed in our SAGE data. Cluster #12 corresponded to 3p21, a region most commonly deleted in lung cancer (32). On the basis of our SAGE data, the MST1R and RASSF1A genes at 3p21 were overexpressed in squamous cell carcinoma, whereas the SEMA3F, SEMA3B, IFRD2, FUS1, PL6, and MAPKAPK3 genes were underrepresented (32, 33, 34, 35, 36). Consistent with this observation, both SEMA3B and FUS1 genes have been shown to be epigenetically silenced by promoter methylation in lung cancers (32, 35, 36).

Allelic loss on chromosome arm 10q, especially at 10q21-qter, has been observed by LOH and CGH studies in various malignancies including renal cell carcinoma; bladder, endometrial, and prostatic cancers; glioma; malignant melanoma; lymphoma; squamous cell carcinomas of the head and neck; and NSCLCs (5, 9, 37, 38). LOH at 10q has been associated with a metastatic phenotype and poor prognosis in tumors of the upper respiratory tract (38, 39). The PTEN/MMAC1 gene is a candidate tumor suppressor responsible for the 10q23.3 deletion because it is mutated in multiple advanced cancers including renal cell carcinoma, prostatic cancer, breast cancer, and glioma. The PTEN gene was associated with the tumorigenesis of squamous cell carcinoma in the lung, and head and neck (37). However, other studies (39, 40) indicated that the PTEN gene was normally transcribed and expressed despite the presence of LOH close to the locus (40). In addition, genetic alterations were reported to be infrequent in squamous cell carcinomas of the head and neck (38), and lung (39). These results suggested that tumor-suppressor gene(s) other than PTEN, involved in the malignant progression of tumors of the upper respiratory tract, might exist in 10q. Our transcriptome map showed that the PTEN gene was located 8.2 Mb telomeric to the cluster at 10q24 (cluster #27), and the PTEN gene expression level was 2-fold higher in squamous tumors than in normal cells. Therefore, other genes more proximal to PTEN may be targeted for loss in NSCLC.

Implications.

We have constructed a transcriptome map based on gene expression profiles generated by SAGE in squamous cell carcinomas, adenocarcinomas, and normal lung epithelial cells. To validate the lung transcriptome map that was generated using the human genome working draft sequence, we compared it to the previous tag assignments of the Cancer Genome Anatomy Project, which was developed based on the Genebridge4 RH-map (42). Physical assignment of the genes to different chromosomes was observed in 1.9% of the UniGene clusters present on the RH-based Cancer Genome Anatomy Project map, indicating that the chromosome assignment of SAGE tags was at least 98% accurate. Furthermore, similar to the RH-based transcriptome map (15), we observed an uneven distribution of differentially expressed genes throughout the genome, as represented by clusterings of highly expressed genes in specific chromosomal regions. This observation is consistent with the fact that genetic alterations often affect a set of genes closely positioned in the genome. The knowledge of such clustering along the chromosomes may provide alternative markers for cancer detection and prognosis. However, the identification of a particular gene cluster can only be suggestive of a potential chromosomal region with relatively consistent gene expression patterns, because every cluster contained at least some genes whose expression patterns were unchanged or different from those of others in the same cluster.

In squamous tumors, about half of the clusters identified were consistent with the previous reports by CGH and/or LOH analyses. It is highly likely that some of the remaining clusters may represent novel gene expression changes at chromosome regions not implicated previously in NSCLCs. It is also possible that some clusters may have been identified by chance alone. Another possible caveat is the fact that not all highly expressed genes are amplified, nor are all amplified genes highly expressed (14). Nonetheless, our analyses suggest that a substantial number of gene clusters were supported by other studies, and some of them are likely involved in tumorigenesis and progression of lung cancers.

It is worth noting that the transcriptome map we have generated appeared to be more sensitive in detecting genes with increased expression levels, because no clustering was seen in several regions that were known to be deleted in squamous cell lung cancers. This lower sensitivity in detecting deleted regions on the transcriptome map could result if the normal expression levels of the genes were already low or close to the background. This is also reflected in Table 1, where the average tumor/normal ratio was 3–4-fold for overexpressed gene clusters but only 0.50–0.65 for those with reduced expression (a reduction of no more than 50%). Therefore, it appears that overexpressed genes were more likely to be identified, whereas those having reduced expression could have been under-represented on the transcriptome map. Additional studies using LOH, CGH, or fluorescence in situ hybridization will be needed to examine the candidate cluster regions and to better understand how the positional clustering of gene expression changes relates to chromosomal alterations at the corresponding regions.

Finally, the accuracy of the present transcriptome map is entirely dependent on and subject to the completeness of the human genome working draft and its annotations. It is limited by the methods chosen for tag-to-UniGene assignment and the criteria for redundancy reduction of the UniGene clusters. The detection of gene clusters with the most differential gene expression is also subject to the method of analysis, the size and reliability of the SAGE libraries, and the number of genes, as well as the genomic length of the clusters. Although the window size and the cluster identification criteria could affect the resulting clusters, altering the parameters almost always resulted in statistically significant clustering of genes for squamous cell carcinomas. In contrast, clusters of genes overexpressed in adenocarcinomas appeared to be much less reliable. Identification of gene clusters for this tumor type may rely on the identification of additional SAGE tags from the respective libraries or the inclusion of additional samples. Nevertheless, the results presented here suggest that the ability to map gene expression profiles onto specific chromosome locations will likely facilitate the identification of novel genetic changes that underlie lung tumorigenesis and the use of this knowledge to guide clinical management of cancer patients.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

² The abbreviations used are: NSCLC, non-small cell lung cancer; SAGE, serial analysis of gene expression; LOH, loss of heterozygosity; CGH, comparative genomic hybridization; EST, expressed sequence tag; RH, radiation-hybrid; NHBE, normal human bronchial epithelial; SAEC, small airway epithelial cell; UCSC, University of California Santa Cruz; RefSeq, Reference Sequences.

³ http://www.ncbi.nlm.nih.gov/SAGE/.

⁴ http://genome.ucsc.edu/.

⁵ http://www.ncbi.nlm.nih.gov/UniGene/.

⁶ http://www.ncbi.nlm.nih.gov/LocusLink/refseq.html.

⁷ http://www.mysql.com/.

Fig. 1.

Outline for the construction of the transcriptome map.

Fig. 2.

Transcriptome map of NSCLC based on SAGE. Expression is shown for eight SAGE libraries consisting of two each of NHBE, squamous cell carcinomas, SAEC, and adenocarcinomas. The expression level of each gene in each tissue type was the sum of tag values from the two libraries of the same tissue or cell type, normalized to tags per 100,000. The levels of gene expression are displayed from 0 to 100, with expression levels over 100 shown as 100. Vertical bars indicate the clustered regions of genes with increased (red) or decreased (green) expression levels. The average, median, and maximum expression levels of all genes were 14.7, 3.3, and 6452.6 for squamous cell carcinoma; 14.4, 3.9, and 1738.1 for NHBE cell; 17.8, 4.7, and 3279.6 for adenocarcinoma; and 14.9, 4.6, and 1481.3 for SAEC cell libraries. Yellow, orange, and dark brown shading on the chromosome diagrams represent centromeric regions, variable regions, and stalks, respectively.

Fig. 2B.

(continued).

Table 1

Validation of differentially expressed gene clusters by Monte-Carlo simulation

| | Squamous cell carcinoma: overexpressed in tumor | Squamous cell carcinoma: underexpressed in tumor | Adenocarcinoma: overexpressed in tumor | Adenocarcinoma: underexpressed in tumor |
| --- | --- | --- | --- | --- |
| Observed number of clusters in genomeᵃ | 16 | 30 | 41 | 23 |
| Expected average number (±S.D.) | 6.5 ± 2.4 | 15 ± 3.3 | 38.7 ± 4.0 | 11.5 ± 2.8 |
| Pᵇ | 0.000064 | 0.000006 | 0.5652 | 0.00004 |
| Average tumor/normal ratio (range) | 3.15 (2.76–3.52) | 0.65 (0.43–0.78) | 4.16 (1.54–4.39) | 0.50 (0.27–0.62) |
ᵃ The values shown are higher than the actual number of gene clusters because all of the chromosomes were regarded as a continuum and a few clusters had boundaries between chromosomes.

ᵇ The probability of obtaining, by chance alone, a difference between the expected average number and the actual number of clusters identified that is as large as or larger than the one observed.

Table 2

Clustering of differentially expressed genes between squamous cell carcinoma and normal bronchial epithelial cells on SAGE-based transcriptome map

Information on the genes included in each cluster can be found at http://lpg.nci.nih.gov/LPG/jen/proj1.

| Cluster no. | Cytogenetic band | Chromosomal positionᵃ | Status | Number of genes (total, increased, decreased, no change) | Median T/N ratio | Notable genes/tumor suppressor/oncogenesᵇ | Referencesᶜ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1p36.2–p36.3 | Chr1:2205808–7866192 | − | 20 12 | 0.43 | [TP73] | (5, 7, 29, 42) |
| 2 | 1p36.1 | Chr1:17749951–23736394 | − | 20 11 | 0.59 | | (5) |
| 4 | 2p22–p23 | Chr2:27865382–33722776 | − | 23 14 | 0.73 | | |
| 5 | 2p11.2–q11.1 | Chr2:80547374–96027377 | − | 25 18 | 0.52 | | |
| 6 | 2q14.2–q21 | Chr2:121891208–136835134 | − | 22 12 | 0.75 | | |
| 8 | 2q35–q36 | Chr2:223459509–231867469 | − | 20 12 | 0.70 | | (7, 9) |
| 9 | 2q37 | Chr2:238670940–247114424 | − | 22 14 | 0.60 | | (7, 9, 43) |
| 10 | 3p22–p24 | Chr3:29156374–43672061 | − | 22 13 | 0.76 | MLH1 | |
| 11 | 3p21 | Chr3:50161501–53381366 | − | 21 12 | 0.74 | | (5, 7, 9, 31, 44) |
| 12 | 3p21 | Chr3:55712479–58579957 | − | 28 15 | 0.78 | SEMA3B, FUS1, RASSF1 | (5, 7, 9, 32, 33, 34, 35, 36, 44) |
| 16 | 4p15.3–p16 | Chr4:6772157–18506596 | − | 16 11 | 0.58 | | (7, 9, 43) |
| 20 | 5q35 | Chr5:185901250–191841574 | − | 26 15 | 0.64 | | |
| 24 | 9p13–p21 | Chr9:35863844–38031975 | − | 20 12 | 0.62 | [11.5 Mb telomeric to CDKN2A] | (5, 6, 7, 9, 11, 42) |
| 27 | 10q24 | Chr10:104155005–111136770 | − | 39 24 | 0.68 | [Centromeric to PTEN] | (5, 9, 38, 39) |
| 28 | 11q13 | Chr11:79065935–84562231 | − | 23 12 | 0.70 | [2.7 Mb telomeric to CCND1] | |
| 29 | 12q13 | Chr12:55578725–59469204 | − | 29 17 | 0.70 | KRT7, KRT5, KRT6A, KRT1, KRT4, KRT8, HOXC, PFDN5 | |
| 32 | 16q12.2–q21 | Chr16:64645711–68405819 | − | 25 14 | 0.66 | MT2A, MT1E, MMP15, [2.6 Mb centromeric to CDH1] | |
| 33–35ᵈ | 17q21 | Chr17:42070147–54387502 | − | 78 41 30 | 0.72 | KRT17, KRT19, KRT14, KRT16, [BRCA1], WNT3, HOXB7 | |
| 36 | 17q25 | Chr17:81703388–85614916 | − | 18 12 | 0.57 | [TOC], SEC14L1, TIMP2 | |
| 37 | 19p13.3 | Chr19:3149667–5728422 | − | 25 14 | 0.74 | [0.9 Mb centromeric to APCL] | (5, 7, 11) |
| 38 | 19p13.1 | Chr19:17718229–21340745 | − | 20 12 | 0.76 | | |
| 39 | 19q13.1 | Chr19:36880234–46441554 | − | 40 25 | 0.60 | | |
| 40 | 19q13.1–q13.2 | Chr19:51112826–54555604 | − | 22 12 | 0.59 | | (5, 7) |
| 41 | 19q13.3 | Chr19:59444162–63343735 | − | 41 26 15 | 0.60 | | |
| 42 | 20q12–q13.1 | Chr20:41509642–49168634 | − | 38 10 22 | 0.61 | | |
| 43 | 20q13.3 | Chr20:62621763–64468901 | − | 17 10 | 0.58 | | |
| 3 | 1q42–q43 | Chr1:259270782–274578830 | + | 23 12 | 3.20 | | (6, 7, 9, 10, 11, 44) |
| 7 | 2q31–q32 | Chr2:172570929–194025818 | + | 29 15 13 | 2.76 | | (10, 45) |
| 13–14 | 3q25–q26.3 | Chr3:162192762–193593987 | + | 37 20 14 | 3.52 | PIK3CA, SKIL | (6, 7, 9, 10, 11, 42, 44) |
| 15 | 3q27–q29 | Chr3:197917861–215639073 | + | 53 31 18 | 3.52 | TP63 | (6, 7, 9, 10, 11, 42, 44) |
| 17 | 4q21–q24 | Chr4:80267829–105587989 | + | 30 16 | 3.05 | | |
| 18–19 | 5p11–q13 | Chr5:1663141–72954158 | + | 64 39 18 | 3.41 | | (6, 7, 9, 10, 11) |
| 21 | 7p14–p22 | Chr7:4316431–30797633 | + | 39 23 12 | 3.29 | | (6, 10, 44, 46) |
| 22–23 | 8q21.1–q24.1 | Chr8:77518537–128248010 | + | 56 30 22 | 2.96 | [4.3 Mb centromeric to MYC] | (6, 7, 9, 10, 11, 42, 44) |
| 25 | 9p12–q22 | Chr9:40952402–91352574 | + | 30 16 12 | 3.16 | | |
| 26 | 9q31–q32 | Chr9:107340066–115634147 | + | 21 13 | 3.17 | | |
| 30 | 13q12–q14 | Chr13:23162216–39892274 | + | 27 14 | 2.81 | | |
| 31 | 14q23–q24 | Chr14:55160283–65335623 | + | 25 13 10 | 2.97 | | |
^a Chromosomal positions are indicated as nucleotide coordinates based on the UCSC Human Genome Working Draft (April 1, 2001 freeze).

^b Genes in brackets are within the cluster but not observed in the lung SAGE libraries.

^c Publications with results consistent with those observed in this study.

^d Gaps up to 1.3 Mb exist among the clusters.
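Because the table reports each cluster as a pair of nucleotide coordinates, the physical span of a cluster can be derived directly from the position string. The following is a minimal sketch, not the authors' method: the function name `cluster_span_mb` is hypothetical, the span is assumed to be the end coordinate minus the start coordinate, and the input is assumed to follow the `ChrN:start–end` form used in the table.

```python
import re

# Matches position strings such as "Chr19:3149667–5728422".
# Either an en dash or a plain hyphen is accepted between the coordinates.
_POSITION = re.compile(r"^Chr(\w+):(\d+)[–-](\d+)$")


def cluster_span_mb(position: str) -> float:
    """Return the span of a cluster in megabases (end minus start).

    Hypothetical helper: the span definition is an assumption,
    not something stated in the table.
    """
    match = _POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return (end - start) / 1_000_000


# Cluster 43 (20q13.3) from the table.
print(round(cluster_span_mb("Chr20:62621763–64468901"), 1))
```

Rows that merge several clusters (for example 18–19 or 33–35) give the span of the merged region rather than of any single cluster.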

We thank Drs. Mariana Nacht and Stephen L. Madden at Genzyme Molecular Oncology (Framingham, MA) for sharing the SAGE data. We also thank Dr. Maxwell P. Lee for advice, Dr. Michael Lerman for critical review of the manuscript, and John Curran and Dr. Daoud Meerzaman for technical support.

1. Testa J. R., Siegfried J. M., Liu Z., Hunt J. D., Feder M. M., Litwin S., Zhou J. Y., Taguchi T., Keller S. M. Cytogenetic analysis of 63 non-small cell lung carcinomas: recurrent chromosome alterations amid frequent and widespread genomic upheaval. Genes Chromosomes Cancer, 11: 178-194, 1994.
2. Mertens F., Johansson B., Hoglund M., Mitelman F. Chromosomal imbalance maps of malignant solid tumors: a cytogenetic survey of 3185 neoplasms. Cancer Res., 57: 2765-2780, 1997.
3. Sato S., Nakamura Y., Tsuchiya E. Difference of allelotype between squamous cell carcinoma and adenocarcinoma of the lung. Cancer Res., 54: 5652-5655, 1994.
4. Sanchez-Cespedes M., Ahrendt S. A., Piantadosi S., Rosell R., Monzo M., Wu L., Westra W. H., Yang S. C., Jen J., Sidransky D. Chromosomal alterations in lung adenocarcinoma from smokers and nonsmokers. Cancer Res., 61: 1309-1313, 2001.
5. Girard L., Zochbauer-Muller S., Virmani A. K., Gazdar A. F., Minna J. D. Genome-wide allelotyping of lung cancer identifies new regions of allelic loss, differences between small cell lung cancer and non-small cell lung cancer, and loci clustering. Cancer Res., 60: 4894-4906, 2000.
6. Balsara B. R., Sonoda G., du Manoir S., Siegfried J. M., Gabrielson E., Testa J. R. Comparative genomic hybridization analysis detects frequent, often high-level, overrepresentation of DNA sequences at 3q, 5p, 7p, and 8q in human non-small cell lung carcinomas. Cancer Res., 57: 2116-2120, 1997.
7. Petersen I., Bujard M., Petersen S., Wolf G., Goeze A., Schwendel A., Langreck H., Gellert K., Reichel M., Just K., du Manoir S., Cremer T., Dietel M., Ried T. Patterns of chromosomal imbalances in adenocarcinoma and squamous cell carcinoma of the lung. Cancer Res., 57: 2331-2335, 1997.
8. Michelland S., Gazzeri S., Brambilla E., Robert-Nicoud M. Comparison of chromosomal imbalances in neuroendocrine and non-small-cell lung carcinomas. Cancer Genet. Cytogenet., 114: 22-30, 1999.
9. Petersen S., Aninat-Meyer M., Schluns K., Gellert K., Dietel M., Petersen I. Chromosomal alterations in the clonal evolution to the metastatic stage of squamous cell carcinomas of the lung. Br. J. Cancer, 82: 65-73, 2000.
10. Pei J., Balsara B. R., Li W., Litwin S., Gabrielson E., Feder M., Jen J., Testa J. R. Genomic imbalances in human lung adenocarcinomas and squamous cell carcinomas. Genes Chromosomes Cancer, 31: 282-287, 2001.
11. Luk C., Tsao M. S., Bayani J., Shepherd F., Squire J. A. Molecular cytogenetic analysis of non-small cell lung carcinoma by spectral karyotyping and comparative genomic hybridization. Cancer Genet. Cytogenet., 125: 87-99, 2001.
12. Hibi K., Liu Q., Beaudry G. A., Madden S. L., Westra W. H., Wehage S. L., Yang S. C., Heitmiller R. F., Bertelsen A. H., Sidransky D., Jen J. Serial analysis of gene expression in non-small cell lung cancer. Cancer Res., 58: 5690-5694, 1998.
13. Nacht M., Dracheva T., Gao Y., Fujii T., Chen Y. D., Player A., Akmaev V., Cook B., Dufault M., Zhang M., Zhang W., Guo M. Z., Curran J., Han S., Sidransky D., Buetow K., Madden S. L., Jen J. Molecular characteristics of non-small cell lung cancer. Proc. Natl. Acad. Sci. USA, 98: 15203-15208, 2001.
14. Pollack J. R., Perou C. M., Alizadeh A. A., Eisen M. B., Pergamenschikov A., Williams C. F., Jeffrey S. S., Botstein D., Brown P. O. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet., 23: 41-46, 1999.
15. Caron H., van Schaik B., van der Mee M., Baas F., Riggins G., van Sluis P., Hermus M. C., van Asperen R., Boon K., Voute P. A., Heisterkamp S., van Kampen A., Versteeg R. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science (Wash. DC), 291: 1289-1292, 2001.
16. Lander E. S., Linton L. M., Birren B., Nusbaum C., Zody M. C., Baldwin J., Devon K., Dewar K., Doyle M., FitzHugh W., et al. Initial sequencing and analysis of the human genome. Nature (Lond.), 409: 860-921, 2001.
17. Smith H. O., Yandell M., Evans C. A., Holt R. A., et al. The sequence of the human genome. Science (Wash. DC), 291: 1304-1351, 2001.
18. Pauws E., van Kampen A. H., van de Graaf S. A., de Vijlder J. J., Ris-Stalpers C. Heterogeneity in polyadenylation cleavage sites in mammalian mRNA sequences: implications for SAGE analysis. Nucleic Acids Res., 29: 1690-1694, 2001.
19. Zhang L., Zhou W., Velculescu V. E., Kern S. E., Hruban R. H., Hamilton S. R., Vogelstein B., Kinzler K. W. Gene expression profiles in normal and cancer cells. Science (Wash. DC), 276: 1268-1272, 1997.
20. Lengauer C., Kinzler K. W., Vogelstein B. Genetic instabilities in human cancers. Nature (Lond.), 396: 643-649, 1998.
21. Bjorkqvist A. M., Husgafvel-Pursiainen K., Anttila S., Karjalainen A., Tammilehto L., Mattson K., Vainio H., Knuutila S. DNA gains in 3q occur frequently in squamous cell carcinoma of the lung, but not in adenocarcinoma. Genes Chromosomes Cancer, 22: 79-82, 1998.
22. Kettunen E., el-Rifai W., Bjorkqvist A. M., Wolff H., Karjalainen A., Anttila S., Mattson K., Husgafvel-Pursiainen K., Knuutila S. A broad amplification pattern at 3q in squamous cell lung cancer-a fluorescence in situ hybridization study. Cancer Genet. Cytogenet., 117: 66-70, 2000.
23. Hashimoto Y., Oga A., Kawauchi S., Furuya T., Shimizu N., Nakano T., Imate Y., Yamashita H., Sasaki K. Amplification of 3q26 approximately qter correlates with tumor progression in head and neck squamous cell carcinomas. Cancer Genet. Cytogenet., 129: 52-56, 2001.
24. Singh B., Gogineni S. K., Sacks P. G., Shaha A. R., Shah J. P., Stoffel A., Rao P. H. Molecular cytogenetic characterization of head and neck squamous cell carcinoma and refinement of 3q amplification. Cancer Res., 61: 4506-4513, 2001.
25. Redon R., Muller D., Caulee K., Wanherdrick K., Abecassis J., du Manoir S. A simple specific pattern of chromosomal aberrations at early stages of head and neck squamous cell carcinomas: PIK3CA but not p63 gene as a likely target of 3q26-qter gains. Cancer Res., 61: 4122-4129, 2001.
26. Imoto I., Pimkhaokham A., Fukuda Y., Yang Z. Q., Shimada Y., Nomura N., Hirai H., Imamura M., Inazawa J. SNO is a probable target for gene amplification at 3q26 in squamous-cell carcinomas of the esophagus. Biochem. Biophys. Res. Commun., 286: 559-565, 2001.
27. Sugita M., Tanaka N., Davidson S., Sekiya S., Varella-Garcia M., West J., Drabkin H. A., Gemmill R. M. Molecular definition of a small amplification domain within 3q26 in tumors of cervix, ovary, and lung. Cancer Genet. Cytogenet., 117: 9-18, 2000.
28. Hibi K., Trink B., Patturajan M., Westra W. H., Caballero O. L., Hill D. E., Ratovitski E. A., Jen J., Sidransky D. AIS is an oncogene amplified in squamous cell carcinoma. Proc. Natl. Acad. Sci. USA, 97: 5462-5467, 2000.
29. Nomoto S., Haruki N., Tatematsu Y., Konishi H., Mitsudomi T., Takahashi T. Frequent allelic imbalance suggests involvement of a tumor suppressor gene at 1p36 in the pathogenesis of human lung cancers. Genes Chromosomes Cancer, 28: 342-346, 2000.
30. Du Y., Carling T., Fang W., Piao Z., Sheu J. C., Huang S. Hypermethylation in human cancers of the RIZ1 tumor suppressor gene, a member of a histone/protein methyltransferase superfamily. Cancer Res., 61: 8094-8099, 2001.
31. Zienolddiny S., Ryberg D., Arab M. O., Skaug V., Haugen A. Loss of heterozygosity is related to p53 mutations and smoking in lung cancer. Br. J. Cancer, 84: 226-231, 2001.
32. Lerman M. I., Minna J. D. The 630-kb lung cancer homozygous deletion region on human chromosome 3p21.3: identification and evaluation of the resident candidate tumor suppressor genes. The International Lung Cancer Chromosome 3p21.3 Tumor Suppressor Gene Consortium. Cancer Res., 60: 6116-6133, 2000.
33. Agathanggelou A., Honorio S., Macartney D. P., Martinez A., Dallol A., Rader J., Fullwood P., Chauhan A., Walker R., Shaw J. A., Hosoe S., Lerman M. I., Minna J. D., Maher E. R., Latif F. Methylation associated inactivation of RASSF1A from region 3p21.3 in lung, breast and ovarian tumors. Oncogene, 20: 1509-1518, 2001.
34. Burbee D. G., Forgacs E., Zochbauer-Muller S., Shivakumar L., Fong K., Gao B., Randle D., Kondo M., Virmani A., Bader S., Sekido Y., Latif F., Milchgrub S., Toyooka S., Gazdar A. F., Lerman M. I., Zabarovsky E., White M., Minna J. D. Epigenetic inactivation of RASSF1A in lung and breast cancers and malignant phenotype suppression. J. Natl. Cancer Inst., 93: 691-699, 2001.
35. Tomizawa Y., Sekido Y., Kondo M., Gao B., Yokota J., Roche J., Drabkin H., Lerman M. I., Gazdar A. F., Minna J. D. Inhibition of lung cancer cell growth and induction of apoptosis after reexpression of 3p21.3 candidate tumor suppressor gene SEMA3B. Proc. Natl. Acad. Sci. USA, 98: 13954-13959, 2001.
36. Kondo M., Ji L., Kamibayashi C., Tomizawa Y., Randle D., Sekido Y., Yokota J., Kashuba V., Zabarovsky E., Kuzmin I., Lerman M., Roth J., Minna J. D. Overexpression of candidate tumor suppressor gene FUS1 isolated from the 3p21.3 homozygous deletion region leads to G1 arrest and growth inhibition of lung cancer cells. Oncogene, 20: 6258-6262, 2001.
37. Okami K., Wu L., Riggins G., Cairns P., Goggins M., Evron E., Halachmi N., Ahrendt S. A., Reed A. L., Hilgers W., Kern S. E., Koch W. M., Sidransky D., Jen J. Analysis of PTEN/MMAC1 alterations in aerodigestive tract tumors. Cancer Res., 58: 509-511, 1998.
38. Petersen S., Rudolf J., Bockmuhl U., Gellert K., Wolf G., Dietel M., Petersen I. Distinct regions of allelic imbalance on chromosome 10q22–q26 in squamous cell carcinomas of the lung. Oncogene, 17: 449-454, 1998.
39. Gasparotto D., Vukosavljevic T., Piccinin S., Barzan L., Sulfaro S., Armellin M., Boiocchi M., Maestro R. Loss of heterozygosity at 10q in tumors of the upper respiratory tract is associated with poor prognosis. Int. J. Cancer, 84: 432-436, 1999.
40. Snaddon J., Parkinson E. K., Craft J. A., Bartholomew C., Fulton R. Detection of functional PTEN lipid phosphatase protein and enzyme activity in squamous cell carcinomas of the head and neck, despite loss of heterozygosity at this locus. Br. J. Cancer, 84: 1630-1634, 2001.
41. Clifford R., Edmonson M., Hu Y., Nguyen C., Scherpbier T., Buetow K. H. Expression-based genetic/physical maps of single-nucleotide polymorphisms identified by the cancer genome anatomy project. Genome Res., 10: 1259-1265, 2000.
42. Bockmuhl U., Wolf G., Schmidt S., Schwendel A., Jahnke V., Dietel M., Petersen I. Genomic alterations associated with malignancy in head and neck cancer. Head Neck, 20: 145-151, 1998.
43. Shivapurkar N., Virmani A. K., Wistuba I. I., Milchgrub S., Mackay B., Minna J. D., Gazdar A. F. Deletions of chromosome 4 at multiple sites are frequent in malignant mesothelioma and small cell lung carcinoma. Clin. Cancer Res., 5: 17-23, 1999.
44. Bergamo N. A., Rogatto S. R., Poli-Frederico R. C., Reis P. P., Kowalski L. P., Zielenska M., Squire J. A. Comparative genomic hybridization analysis detects frequent over-representation of DNA sequences at 3q, 7p, and 8q in head and neck carcinomas. Cancer Genet. Cytogenet., 119: 48-55, 2000.
45. Ubagai T., Matsuura S., Tauchi H., Itou K., Komatsu K. Comparative genomic hybridization analysis suggests a gain of chromosome 7p associated with lymph node metastasis in non-small cell lung cancer. Oncol. Rep., 8: 83-88, 2001.