Abstract
Purpose: About 15% of colorectal cancers harbor microsatellite instability (MSI). MSI-associated gene expression changes have been identified in colorectal cancers, but there is little overlap between the reported signatures, which hinders an assessment of overall consistency. Little is known about the causes and downstream effects of differential gene expression.
Experimental Design: DNA microarray data on 89 MSI and 140 microsatellite-stable (MSS) colorectal cancers from this study and 58 MSI and 77 MSS cases from three published reports were randomly divided into test and training sets. MSI-associated gene expression changes were assessed for cross-study consistency using training samples and validated as MSI classifier using test samples. Differences in biological pathways were identified by functional category analysis. Causation of differential gene expression was investigated by comparison to DNA copy-number data.
Results: MSI-associated gene expression changes in colorectal cancers were found to be highly consistent across multiple studies of primary tumors and cancer cell lines from patients of different ethnicities (P < 0.001). Clustering based on consistent changes separated additional test cases by MSI status, and classification of individual samples predicted MSI status with a sensitivity of 96% and specificity of 85%. Genes associated with immune response were up-regulated in MSI cancers, whereas genes associated with cell-cell adhesion, ion binding, and regulation of metabolism were down-regulated. Differential gene expression was shown to reflect systematic differences in DNA copy-number aberrations between MSI and MSS tumors (P < 0.001).
Conclusions: Our results show cross-study consistency of MSI-associated gene expression changes in colorectal cancers. DNA copy-number alterations partly cause the differences in gene expression between MSI and MSS cancers.
This study of MSI-associated gene expression changes in colorectal cancer shows the comparability and reproducibility of microarray data produced in different laboratories. This is an important issue, as cross-study consistency is key if gene expression-based classifiers are to be used in clinical practice. Consistent MSI-associated genes were successfully used for classification of additional colorectal cancer samples. Although gene expression-based MSI classification is unlikely to enter clinical use in isolation, given the ease of PCR-based MSI typing, it may find future application when combined with other signatures in a single assay. Our results improve our understanding of the MSI and MSS pathways of tumorigenesis by showing that DNA copy-number aberrations partly underlie MSI-associated gene expression changes. Genes associated with immune response were up-regulated in MSI cancers, whereas genes associated with cell-cell adhesion, ion binding, and regulation of metabolism were down-regulated. These candidate genes provide a good starting point for future study.
Gene expression profiling using DNA microarrays has been successfully applied in numerous studies of tumor classification and is gradually being introduced into clinical practice (1). However, the comparability and reproducibility of microarray data produced in different laboratories continues to be debated (2). As more data become available, systematic comparisons of studies with similar research goals are therefore gaining high importance.
Colorectal cancer is one of the most common malignancies and the second most common cause of cancer death in the western world (3). About 15% of sporadic colorectal cancers exhibit microsatellite instability (MSI) caused by mutation or epigenetic silencing of DNA mismatch repair genes (4). MSI colorectal cancers have characteristic clinical features including right-sided location in the colon, mucinous histology, poor differentiation, and pronounced lymphocyte infiltration (5). In addition, MSI cancers tend to have a better prognosis than microsatellite-stable (MSS) cancers (6). Most MSI tumors have a near-diploid karyotype and appear to follow a genetic pathway distinct from that of MSS tumors (7). For example, MSI cancers accumulate mutations at repeat sequences of genes, including TGFBR2 (8), IGF2R (9), BAX (10), and E2F4 (11), which are rarely mutated in MSS cancers.
Both oligonucleotide and cDNA microarrays have been used to characterize gene expression profiles in MSI and MSS colorectal cancers and in colorectal cancer cell lines. Mori et al. (12) and Koinuma et al. (13) found that MSI had a great effect on the global transcriptome, and Banerjea et al. (14) identified a gene expression cluster in MSI tumors that correlated with an activated immune response. Kruhoffer et al. (15) constructed a gene expression classifier that could identify sporadic MSI cancers as well as hereditary nonpolyposis colorectal cancers, and Watanabe et al. (16) reported signatures that could predict MSI status and differentiate distal from proximal MSI cancers. Giacomini et al. (17) constructed a MSI classifier in colorectal cancer cell lines that also predicted MSI status in primary colorectal cancers and gastric tumors. Although these studies provide good evidence that MSI-associated gene expression changes do exist, there is little overlap between reported signatures, which hinders an assessment of overall consistency.
In this study, we analyzed global gene expression in 89 MSI and 140 MSS primary colorectal cancers. In addition, we retrieved microarray data on 58 MSI and 77 MSS cases from three published reports (13, 16, 17) for which complete results were deposited in the Gene Expression Omnibus database (footnote 10).
The data under investigation were from primary colorectal cancers and colorectal cancer cell lines derived from different ethnic populations (European and Japanese), which have been analyzed using various microarray platforms. Our aims were to determine the extent of consistency of MSI-associated gene expression changes across independent studies and to evaluate consistent changes as a MSI classifier in additional test samples. Furthermore, we wished to establish the underlying causes and downstream effects of differential gene expression to improve our understanding of the MSI and MSS pathways of tumorigenesis.
Materials and Methods
Colorectal cancer specimens. Seventy-four fresh-frozen colorectal cancers were retrieved from the tissue bank of the Royal Melbourne Hospital in Melbourne, Australia. The study was approved by the hospital ethics committee, and all patients gave informed consent before surgery. The samples consisted of 6 Dukes stage A, 23 Dukes stage B, 30 Dukes stage C, and 15 Dukes stage D cancers, 48 of which were localized to the colon, 3 to the colorectal junction, and 23 to the rectum. Median patient age at surgery was 69 years (range, 30-92 years). Tumor DNA was extracted from cancer tissue containing >75% tumor cells as judged by histologic assessment. Control DNA was extracted from blood or from normal tissue derived from the resection margin. MSI status was determined using the Bethesda microsatellite panel (BAT25, BAT26, D2S123, D5S346, and D17S250; ref. 18). MSI was scored as present if instability was seen at two or more markers; 11 colorectal cancers were MSI and 63 were MSS. Total RNA was extracted from cancer tissue using Trizol reagent (Invitrogen). The total RNA was labeled and hybridized to HG-U133Plus2.0 GeneChip arrays (Affymetrix) according to the manufacturer's instructions at the H. Lee Moffitt Cancer Center. The probe sets on this array represent over 47,000 transcripts.
Additional total RNA from 78 MSI and 77 MSS colorectal cancers was collected as part of a retrospective international study involving eight different centers in Denmark, The Netherlands, and Finland. Informed consent was obtained from all patients according to local ethics regulations. The samples consisted of 9 Dukes stage A, 127 Dukes stage B, 11 Dukes stage C, and 8 Dukes stage D cancers, 122 of which were localized to the colon, 3 to the colorectal junction, and 30 to the rectum. Median patient age at surgery was 69.5 years (range, 28-88 years). The total RNA was labeled and hybridized to HG-U133Plus2.0 GeneChip arrays (Affymetrix) at Aarhus University Hospital.
Further gene expression data were retrieved from two independent studies of sporadic colorectal cancers with known MSI status. The first cohort consisted of 33 MSI and 51 MSS tumors analyzed by Watanabe et al. using HG-U133Plus2.0 GeneChip arrays (Affymetrix; GSE4554; ref. 16), and the second cohort consisted of 10 MSI and 10 MSS tumors analyzed by Koinuma et al. (13) using HG-U133A and HG-U133B GeneChip arrays (Affymetrix; GSE2138).
For each data set, MAS5.0-calculated signal intensities were normalized using the quantile normalization procedure implemented in robust multiarray analysis (19, 20), and the normalized data were log2 transformed. Filtering was done to exclude probe sets that were not expressed or that showed low variability across samples. Expression values were required to be above the median of all expression measurements in at least 25% of samples, and the interquartile range across the samples on the log2 scale was required to be at least 0.5. The statistical software package R was used for all subsequent statistical analyses (footnote 11).
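The normalization and filtering steps described above can be illustrated with a minimal sketch. It is not the robust multiarray analysis implementation used in the study: the function names `quantile_normalize` and `passes_filter` are hypothetical, ties are broken by input order, and the quartile convention (`method="inclusive"`) is an assumption that may differ from the one used in R.

```python
from statistics import quantiles

def quantile_normalize(arrays):
    """Give every array (list of intensities) the same empirical distribution.

    `arrays` holds one equal-length list per microarray. Ties are broken by
    input order, a simplification relative to production implementations.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

def passes_filter(log2_values, overall_median, min_fraction=0.25, min_iqr=0.5):
    """Keep a probe set that is expressed and variable across samples.

    Expressed: above the overall median in at least 25% of samples.
    Variable: interquartile range on the log2 scale of at least 0.5.
    """
    n_above = sum(v > overall_median for v in log2_values)
    q1, _, q3 = quantiles(log2_values, n=4, method="inclusive")
    return n_above >= min_fraction * len(log2_values) and (q3 - q1) >= min_iqr
```

In this sketch the log2 transformation would be applied to the output of `quantile_normalize` before `passes_filter` is called for each probe set.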
Colorectal cancer cell lines. Two-color cDNA microarray (Stanford Functional Genomics Facility) data on 15 MSI and 16 MSS colorectal cancer cell lines were retrieved from a study by Giacomini et al. (GSE2591; ref. 17). These arrays comprised 39,632 different human IMAGE clones, representing over 21,000 transcripts. Expression values were provided as log2-transformed intensity ratios of the test sample against a reference sample constituted from 11 human cell lines (17). To enable comparison across platforms, Affymetrix probe IDs corresponding to IMAGE clones were retrieved by matching Entrez identifiers, GenBank accession numbers, and gene symbols. Further matching was done using MatchMiner (footnote 12).
Array-based comparative genomic hybridization (CGH) data were available from a study by Douglas et al. (21) for 10 MSI and 13 MSS colorectal cancer cell lines analyzed by Giacomini et al. CGH arrays consisted of 3,452 bacterial artificial chromosome clones that covered the human genome at an average spacing of about 1 Mb. The threshold for scoring DNA copy-number gain or loss was defined as a log2 tumor/normal ratio of more than 0.2 or less than -0.2 (21).
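The copy-number scoring rule stated above amounts to a simple threshold on the log2 ratio. The sketch below is illustrative only; the function name `call_copy_number` and the label strings are hypothetical, and the comparison is strict because the text specifies "more than 0.2 or less than -0.2".

```python
def call_copy_number(log2_ratio, threshold=0.2):
    """Score one CGH clone from its log2 tumor/normal ratio."""
    if log2_ratio > threshold:
        return "gain"
    if log2_ratio < -threshold:
        return "loss"
    return "no change"

# Hypothetical clone ratios, for illustration only.
calls = [call_copy_number(r) for r in (0.35, -0.25, 0.05)]
```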
Assessment of consistency of MSI-associated gene expression changes. Genes differentially expressed between MSI and MSS cases were identified using the Wilcoxon rank-sum test and P < 0.05 for the following training sample sets: 48 MSI and 47 MSS colorectal cancers randomly selected from the samples analyzed at Aarhus University Hospital, 24 MSI and 36 MSS colorectal cancers randomly selected from Watanabe et al., all 10 MSI and 10 MSS colorectal cancers reported by Koinuma et al., and 10 MSI and 10 MSS colorectal cancer cell lines randomly selected from Giacomini et al. (Supplementary Table S1). Separate lists were generated for genes significantly up-regulated or down-regulated in MSI cancers for each comparison; genes mapping to sex chromosomes were excluded as cases were not matched by gender. For significant genes repeatedly identified between cohorts, consistency of up-regulation or down-regulation was assessed using the χ2 test.
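As an illustration of the per-gene test described above, the following sketch computes a two-sided Wilcoxon rank-sum test and assigns a direction at P < 0.05. It assumes a normal approximation with a tie correction, which may differ from the exact computation used in R for small cohorts; the names `rank_sum_test` and `call_direction` are hypothetical.

```python
import math

def rank_sum_test(msi, mss):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (z, p); z > 0 means higher expression in the MSI group.
    """
    pooled = sorted((v, g) for g, values in enumerate((msi, mss)) for v in values)
    n1, n2 = len(msi), len(mss)
    n = n1 + n2
    rank_sum_msi, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_msi += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    expected = n1 * (n + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all values identical, no evidence either way
        return 0.0, 1.0
    z = (rank_sum_msi - expected) / math.sqrt(variance)
    return z, math.erfc(abs(z) / math.sqrt(2))

def call_direction(msi, mss, alpha=0.05):
    """Label a gene as up- or down-regulated in MSI cases, or None."""
    z, p = rank_sum_test(msi, mss)
    if p >= alpha:
        return None
    return "up" if z > 0 else "down"
```

Applying `call_direction` to each probe set in each training cohort would yield the separate up-regulated and down-regulated lists that are then compared between cohorts.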
Evaluation of consistent MSI-associated genes as a MSI classifier. Consistent MSI-associated genes were evaluated as classifiers of MSI status using independent test sample sets: an additional 30 MSI and 30 MSS primary colorectal cancers analyzed at Aarhus University Hospital, 9 MSI and 15 MSS primary colorectal cancers from Watanabe et al., and 5 MSI and 6 MSS colorectal cancer cell lines from Giacomini et al. Furthermore, all 11 MSI and 63 MSS colorectal cancers from the Royal Melbourne Hospital were evaluated (Supplementary Table S1). Primary colorectal cancers, which have been analyzed using oligonucleotide microarrays, and colorectal cancer cell lines, which have been analyzed using cDNA microarrays, were evaluated separately. For primary colorectal cancers, quantile normalization was done across studies. Expression values of MSI-associated genes were mean-centered and scaled, followed by divisive hierarchical clustering with the pairwise distance between samples defined as one minus the Spearman correlation coefficient. The distribution of MSI and MSS cases within the two main branches of the resulting dendrogram was assessed for significance using the χ2 test or Fisher's exact test.
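The distance metric used for clustering can be sketched as follows. This covers only the distance (one minus the Spearman correlation between two samples' expression vectors over the MSI-associated genes); the divisive hierarchical clustering itself is not implemented here, and the function names are hypothetical.

```python
import math

def midranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2
        i = j
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_distance(sample_a, sample_b):
    """One minus the Spearman correlation: 0 for identical rank order, 2 for reversed."""
    return 1.0 - pearson(midranks(sample_a), midranks(sample_b))
```

A full pairwise matrix of `spearman_distance` values over the test samples would be the input to a divisive clustering routine.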
Single-sample MSI classification against a common reference set. Single-sample MSI classification was done by scoring individual test samples against a common reference set. Reference samples were selected using divisive hierarchical clustering from the MSI and MSS cases initially used for the identification of MSI-associated genes; colorectal cancer samples from Koinuma et al. were omitted, as expression data were derived from two separate microarray platforms (HG-U133A and HG-U133B), thus not permitting reliable integration of results. Only cases that “correctly” clustered into MSI or MSS branches were included in the reference set (Supplementary Table S1). Individual test samples were added to the reference set and quantile normalization was done for Affymetrix data followed by joint divisive hierarchical clustering as described above.
Functional category analysis of MSI-associated genes. Gene Ontology categories were analyzed using the Functional Annotation Clustering tool on the Database for Annotation, Visualization and Integrated Discovery (footnote 13). Genes were classified according to their annotated role in biological process, molecular function, and cellular components from Gene Ontology (The Gene Ontology Consortium). Category enrichment was tested against all human genes. P values were adjusted using the Benjamini and Hochberg False Discovery Rate multiple testing correction.
Results
Consistency of MSI-associated gene expression changes across independent studies of colorectal cancers. Consistency of MSI-associated gene expression changes in primary colorectal cancers was assessed using oligonucleotide microarray data from three independent studies representing European and Japanese patients: 48 MSI and 47 MSS cases analyzed at Aarhus University Hospital, 24 MSI and 36 MSS cases from Watanabe et al. (16), and 10 MSI and 10 MSS cases from Koinuma et al. (ref. 13; Supplementary Table S1). For each training cohort, genes (probe sets) differentially expressed between MSI and MSS cases were identified, and separate lists were generated for up-regulated and down-regulated genes. Genes overlapping between studies were assessed for consistency of up-regulation or down-regulation in MSI cancers (Table 1). All pair-wise comparisons between studies were found to be significant (P < 0.001, χ2 test), with 98.0% (6,600 of 6,732, Aarhus University Hospital versus Watanabe et al.), 93.9% (1,081 of 1,151, Aarhus University Hospital versus Koinuma et al.), and 95.1% (1,006 of 1,058, Watanabe et al. versus Koinuma et al.) of genes showing consistent changes in expression. A total of 829 genes were consistently up-regulated (424 genes) or down-regulated (405 genes) in MSI cancers when all three data sets were combined (Supplementary Table S2).
Table 1. Comparison of gene expression differences between MSI and MSS colorectal cancers across multiple studies
Primary colorectal cancers (MSI vs MSS), n (%)
Row study | Column study | Direction in row study | Up-regulated in column study | Down-regulated in column study | P
---|---|---|---|---|---
Watanabe et al. | Aarhus University Hospital | Up-regulated | 3,448 (51.2) | 49 (0.7) | < 0.001
Watanabe et al. | Aarhus University Hospital | Down-regulated | 83 (1.2) | 3,152 (46.8) |
Koinuma et al. | Aarhus University Hospital | Up-regulated | 553 (48.0) | 24 (2.1) | < 0.001
Koinuma et al. | Aarhus University Hospital | Down-regulated | 46 (4.0) | 528 (45.9) |
Koinuma et al. | Watanabe et al. | Up-regulated | 528 (49.9) | 25 (2.4) | < 0.001
Koinuma et al. | Watanabe et al. | Down-regulated | 27 (2.6) | 478 (45.2) |
NOTE: Analysis was done on 48 MSI and 47 MSS cases analyzed at Aarhus University Hospital, 24 MSI and 36 MSS cases from Watanabe et al. (16), and 10 MSI and 10 MSS cases from Koinuma et al. (13). For each cohort, genes (probe sets) differentially expressed between MSI and MSS cases were identified using the Wilcoxon rank-sum test and P < 0.05. For genes overlapping between cohorts, consistency of up-regulation or down-regulation in MSI cancers was assessed using the χ2 test.
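The consistency percentages quoted for Table 1 follow directly from the 2 × 2 counts, and a χ2 test on each table can be sketched as below. The sketch assumes a Pearson χ2 statistic without continuity correction (the source does not state which variant was used), and the function name `consistency_and_chi2` is hypothetical.

```python
import math

def consistency_and_chi2(up_up, up_down, down_up, down_down):
    """Return (fraction of consistent genes, chi-square statistic, P value)
    for a 2 x 2 table of expression-change directions in two studies."""
    total = up_up + up_down + down_up + down_down
    consistent = (up_up + down_down) / total
    row1, row2 = up_up + up_down, down_up + down_down
    col1, col2 = up_up + down_up, up_down + down_down
    chi2 = total * (up_up * down_down - up_down * down_up) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return consistent, chi2, p

# Counts taken from Table 1.
aarhus_watanabe = consistency_and_chi2(3448, 49, 83, 3152)
aarhus_koinuma = consistency_and_chi2(553, 24, 46, 528)
watanabe_koinuma = consistency_and_chi2(528, 25, 27, 478)
```

The first element of each result, expressed as a percentage, corresponds to the 98.0%, 93.9%, and 95.1% figures given in the text.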
Consistency of MSI-associated gene expression changes between primary colorectal cancers and colorectal cancer cell lines. Similar comparisons were done to assess the consistency of MSI-associated gene expression changes between primary colorectal cancers and colorectal cancer cell lines. Data for the latter were 10 MSI and 10 MSS cases randomly selected from a two-color cDNA microarray study done by Giacomini et al. (17; Supplementary Table S1). Again, all pair-wise comparisons between studies were significant (P < 0.001, χ2 test), with 69.3% (1,641 of 2,367, Giacomini et al. versus Aarhus University Hospital), 69.2% (1,660 of 2,398, Giacomini et al. versus Watanabe et al.), and 78.1% (339 of 434, Giacomini et al. versus Koinuma et al.) of genes showing consistent changes in expression (Table 2). These proportions were significantly lower than those seen in the pair-wise comparisons between studies of primary tumors (P < 0.03 for all comparisons, χ2 test). This finding may be partly due to differences in gene expression between primary and cultured tumor cells, increased noise levels for two-color cDNA microarrays compared with oligonucleotide microarrays due to the use of a total RNA reference, or nonspecific hybridization of probe sets and IMAGE clones. A total of 192 genes (229 IMAGE clones) were consistently up-regulated (93 genes/117 IMAGE clones) or down-regulated (99 genes/112 IMAGE clones) when all four data sets were combined (Supplementary Table S2).
Table 2. Comparison of MSI-associated gene expression changes between primary colorectal cancers and cancer cell lines
MSI vs MSS, n (%)
Primary colorectal cancers | Direction in primary cancers | Up-regulated in cell lines (Giacomini et al.) | Down-regulated in cell lines (Giacomini et al.) | P
---|---|---|---|---
Aarhus University Hospital | Up-regulated | 859 (36.3) | 305 (12.9) | < 0.001
Aarhus University Hospital | Down-regulated | 421 (17.8) | 782 (33.0) |
Watanabe et al. | Up-regulated | 913 (38.1) | 267 (11.1) | < 0.001
Watanabe et al. | Down-regulated | 471 (19.6) | 747 (31.2) |
Koinuma et al. | Up-regulated | 192 (44.2) | 44 (10.1) | < 0.001
Koinuma et al. | Down-regulated | 51 (11.8) | 147 (33.9) |
NOTE: Analysis was done on 48 MSI and 47 MSS colorectal cancers analyzed at Aarhus University Hospital, 24 MSI and 36 MSS colorectal cancers from Watanabe et al. (16), 10 MSI and 10 MSS colorectal cancers from Koinuma et al. (13), and 10 MSI and 10 MSS colorectal cancer cell lines from Giacomini et al. (17). For each cohort, genes (probe sets/IMAGE clones) differentially expressed between MSI and MSS cases were identified using the Wilcoxon rank-sum test and P < 0.05. For genes overlapping between cohorts of primary colorectal cancers and colorectal cancer cell lines, consistency of up-regulation or down-regulation in MSI cases was assessed using the χ2 test.
Consistent MSI-associated gene expression changes as a MSI classifier. The 829 and 192 gene sets found to be consistently up-regulated or down-regulated in MSI cases across multiple studies of primary colorectal cancers and across primary colorectal cancers and colorectal cancer cell lines were assessed as MSI classifiers in independent test samples. These consisted of a separate set of additional primary cancers analyzed at Aarhus University Hospital (30 MSI and 30 MSS) and Watanabe et al. (9 MSI and 15 MSS) as well as the 74 primary cancers from the Royal Melbourne Hospital (11 MSI and 63 MSS). Additional colorectal cancer cell lines were derived from Giacomini et al. (5 MSI and 6 MSS; Supplementary Table S1). Given that classification of a binary outcome, MSI or MSS, was desired, samples were clustered using divisive hierarchical clustering. Clustering was done separately for primary colorectal cancer samples, which have been analyzed using oligonucleotide microarrays, and colorectal cancer cell lines, which have been analyzed using cDNA microarrays.
For primary colorectal cancers, clustering using either the 829 or the 192 gene set produced two main branches comprising predominantly MSI or MSS cases, respectively (Fig. 1). For the 829 gene set, one branch contained 48 MSI and 14 MSS cases and the other contained 2 MSI and 94 MSS cases (P < 0.001, χ2 test); for the 192 gene set, one branch contained 49 MSI and 11 MSS cases and the other contained 1 MSI and 97 MSS cases (P < 0.001, χ2 test). Clustering by study was also evident in both dendrograms, but this was secondary to the MSI and MSS branches (Fig. 1). For colorectal cancer cell lines, clustering using the 192 (229 IMAGE clone) gene set separated all 5 MSI and all 6 MSS cases into two main clusters (P < 0.002, Fisher’s exact test; Fig. 1).
Fig. 1. Divisive hierarchical clustering of test colorectal cancers and cancer cell lines using the 829 and 192 consistent MSI-associated genes. A and B, primary colorectal cancers clustered using the 829 and 192 gene sets, respectively. C, colorectal cancer cell lines clustered using the 192 gene (229 IMAGE clone) set. Samples are arranged along the X axis and genes are arranged along the Y axis. Square, expression level of a given gene in an individual sample. Increased expression (red) and decreased expression (green) relative to the mean-centered and scaled expression of the gene across the samples following quantile normalization across studies. Genes are grouped into those down-regulated (top) and up-regulated (bottom) in MSI cases. For the dendrogram: orange lines, MSI cases; blue lines, MSS cases. Test samples included additional 30 MSI and 30 MSS cancers analyzed at the Aarhus University Hospital, 9 MSI and 15 MSS cancers from Watanabe et al. (16), 11 MSI and 63 MSS cancers from the Royal Melbourne Hospital, and 5 MSI and 6 MSS colorectal cancer cell lines from Giacomini et al. (17).
Single-sample MSI classification against a common reference set. In clinical practice, classification of individual patient samples is required. Given that divisive hierarchical clustering successfully clustered test samples from independent studies and from patients of different ethnicities by MSI status, we modified this approach to permit scoring of individual colorectal cancer cases against a common reference set.
Reference samples were selected from the primary colorectal cancers of Aarhus University Hospital and Watanabe et al. and the colorectal cancer cell lines of Giacomini et al., which had initially been used to identify MSI-associated gene expression changes (samples from Koinuma et al. were excluded). Divisive hierarchical clustering was done using the 829 or 192 gene set, and only samples “correctly” segregating into MSI and MSS branches were chosen, resulting in reference sets of 67 MSI and 77 MSS primary colorectal cancers for the 829 gene set and 68 MSI and 77 MSS primary colorectal cancers and 10 MSI and 10 MSS colorectal cancer cell lines for the 192 gene set (Supplementary Table S1).
For single-sample MSI classification, the test samples were added one at a time to the above reference set. Divisive hierarchical clustering was done, and a score was given as to whether the test sample clustered within the MSI or the MSS branch of the resulting dendrogram (Table 3). In all cases, the reference samples in the smallest branch containing the test and at least five other reference samples were either all MSI or all MSS cases. Taking PCR-based MSI typing as the gold standard, classification of primary colorectal cancers using the 829 gene set had an overall sensitivity of 96.0% (48 of 50) and specificity of 83.3% (90 of 108), with a positive predictive value of 72.7% (48 of 66) and negative predictive value of 97.8% (90 of 92). Classification using the 192 gene set appeared to slightly increase performance, with an overall sensitivity of 96.0% (48 of 50), specificity of 88.9% (96 of 108), positive predictive value of 80.0% (48 of 60), and negative predictive value of 98.0% (96 of 98). Importantly, classification using either the 829 or the 192 gene set showed similar sensitivity and specificity for test samples from Aarhus University Hospital/Watanabe et al. [sensitivity of 94.9% (37 of 39) and 94.9% (37 of 39) and specificity of 86.7% (39 of 45) and 88.9% (40 of 45), respectively] and the Royal Melbourne Hospital [sensitivity of 100.0% (11 of 11) and 100.0% (11 of 11) and specificity of 81.0% (51 of 63) and 88.9% (56 of 63), respectively], even though samples from the former studies, but not from the latter, contributed to the reference set. These results show the potential utility of this approach for single-sample MSI classification irrespective of study origin and further show the reproducibility and comparability of microarray data produced in different laboratories.
Table 3. Single-sample MSI classification of test colorectal cancers and cancer cell lines using the 829 and 192 consistent MSI-associated genes
Cohort | MSI typing (reference) | 829-gene classifier: MSS | 829-gene classifier: MSI | 192-gene (229 IMAGE clone) classifier: MSS | 192-gene (229 IMAGE clone) classifier: MSI
---|---|---|---|---|---
Primary colorectal cancers | | | | |
Aarhus University Hospital | MSS | 25 | 5 | 25 | 5
Aarhus University Hospital | MSI | 2 | 28 | 2 | 28
Watanabe et al. | MSS | 14 | 1 | 15 | 0
Watanabe et al. | MSI | 0 | 9 | 0 | 9
Royal Melbourne Hospital | MSS | 51 | 12 | 56 | 7
Royal Melbourne Hospital | MSI | 0 | 11 | 0 | 11
Colorectal cancer cell lines | | | | |
Giacomini et al. | MSS | NA | NA | 5 | 1
Giacomini et al. | MSI | NA | NA | 0 | 5
NOTE: Individual test samples were clustered against a common MSI/MSS reference set of primary colorectal cancers or colorectal cancer cell lines (Supplementary Table S1). Test samples included additional 30 MSI and 30 MSS cancers analyzed at the Aarhus University Hospital, 9 MSI and 15 MSS cancers from Watanabe et al. (16), 11 MSI and 63 MSS cancers from the Royal Melbourne Hospital, and 5 MSI and 6 MSS colorectal cancer cell lines from Giacomini et al. (17).
Similar results were obtained for colorectal cancer cell lines using the 192 gene (229 IMAGE clone) set, suggesting that this classification approach can also be applied to two-color cDNA microarray data provided that the same total RNA reference is being used. The sensitivity of MSI prediction was 100.0% (5 of 5) and specificity was 83.3% (5 of 6).
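The performance measures reported for single-sample classification can be recomputed from the counts in Table 3, taking PCR-based MSI typing as the reference. The sketch below pools the three primary cancer cohorts; the function name `diagnostic_metrics` and the dictionary keys are hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and predictive values, with MSI as the positive class."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Pooled primary colorectal cancers from Table 3 (Aarhus + Watanabe + Melbourne).
gene_set_829 = diagnostic_metrics(tp=28 + 9 + 11, fn=2 + 0 + 0, tn=25 + 14 + 51, fp=5 + 1 + 12)
gene_set_192 = diagnostic_metrics(tp=28 + 9 + 11, fn=2 + 0 + 0, tn=25 + 15 + 56, fp=5 + 0 + 7)
# Colorectal cancer cell lines (192 gene set only).
cell_lines_192 = diagnostic_metrics(tp=5, fn=0, tn=5, fp=1)
```

Multiplying each value by 100 gives the percentages; the negative predictive value is the number of true MSS calls divided by all MSS calls.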
Functional category analysis of discriminating genes between MSI and MSS cancers. For the 829 gene set, functional category analysis identified five significant annotation clusters, correlating with cell-cell adhesion, ion binding, positive and negative regulation of metabolism, and immune response (Supplementary Table S3). When the 829 gene set was separated into genes showing up-regulation and down-regulation in MSI cancers, the immune response cluster was specifically up-regulated in the MSI cancer group. In contrast, the cell-cell adhesion, ion binding, and positive and negative regulation of metabolism clusters were specifically associated with the down-regulated genes. Functional category analysis of the smaller 192 gene set did not reveal any significant associations, but down-regulated genes from the cell-cell adhesion, ion binding, and positive and negative regulation of metabolism clusters were represented at the expected ratios (observed 33/116; expected 99/405; P = 0.58, χ2 test). In contrast, there were significantly fewer up-regulated genes from the immune response cluster than expected (observed 10/105, expected 93/424; P < 0.001, χ2 test), consistent with an absence of such a response in cell culture. A total of 20 MSI-associated genes were identified by at least two Affymetrix probe sets and one IMAGE clone and may therefore be regarded as good candidates for further study (Table 4).
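Significance of annotation clusters depends on the Benjamini and Hochberg adjustment named in Materials and Methods. A minimal sketch of that step-up adjustment is given below; it is a generic implementation, not the code used by the annotation tool, and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value so that each adjusted
    # value is the minimum of p * m / rank over all equal or larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical enrichment P values for four categories.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```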
Table 4. Consistent MSI-associated candidate genes identified by at least two Affymetrix probe sets and one IMAGE clone
Gene ID . | Gene symbol . | Gene name . | MSI vs MSS . |
---|---|---|---|
NM_025113 | C13orf18 | Chromosome 13 open reading frame 18 | Down-regulated |
NM_003671 | CDC14B | CDC14 cell division cycle 14 homologue B (Saccharomyces cerevisiae) | Down-regulated |
BC003064 | DAB2 | Disabled homologue 2, mitogen-responsive phosphoprotein (Drosophila) | Down-regulated |
NM_013974 | DDAH2 | Dimethylarginine dimethylaminohydrolase 2 | Down-regulated |
NM_019114 | EPB41L4B | Erythrocyte membrane protein band 4.1-like 4B | Down-regulated |
NM_018267 | H2AFJ | H2A histone family, member J | Down-regulated |
BE566023 | KIAA0372 | KIAA0372 | Down-regulated |
NM_016436 | PHF20 | PHD finger protein 20 | Down-regulated |
AF131790 | SHANK2 | SH3 and multiple ankyrin repeat domains 2 | Down-regulated |
AB018322 | TMCC1 | Transmembrane and coiled-coil domain family 1 | Down-regulated |
NM_020182 | TMEPAI | Transmembrane, prostate androgen induced RNA | Down-regulated |
NM_005783 | TXNDC9 | Thioredoxin domain containing 9 | Down-regulated |
NM_021964 | ZNF148 | Zinc finger protein 148 | Down-regulated |
AA551142 | PHACTR2 | Phosphatase and actin regulator 2 | Down-regulated |
NM_030920 | ANP32E | Acidic (leucine-rich) nuclear phosphoprotein 32 family, member E | Up-regulated |
BC000751 | EIF5A | Eukaryotic translation initiation factor 5A | Up-regulated |
R60018 | RABEP1 | Rabaptin, RAB GTPase-binding effector protein 1 | Up-regulated |
BF343007 | TFAP2A | Transcription factor AP-2α (activating enhancer binding protein 2α) | Up-regulated |
AK021741 | TMF1 | TATA element modulatory factor 1 | Up-regulated |
M61715 | WARS | Tryptophanyl-tRNA synthetase | Up-regulated |
Molecular basis of MSI-associated gene expression changes. MSI colorectal cancers generally have near-diploid karyotypes, whereas MSS tumors tend to be aneuploid (22). In addition, MSS colorectal cancers and colorectal cancer cell lines show particularly high frequencies of DNA copy-number changes at certain chromosomal regions, such as loss of chromosome 17p and 18q or gain of chromosome 13 and 20q (23). Previous gene expression studies on primary cancers have shown that chromosomal losses and gains are associated with corresponding changes in gene expression (24, 25). We therefore hypothesized that systematic differences in the frequencies of DNA copy-number changes between MSI and MSS tumors might underlie the MSI-associated gene expression changes in colorectal cancers.
Array-based CGH data were available from a previous study by Douglas et al. (21) for the 10 MSI and 13 MSS colorectal cancer cell lines analyzed for gene expression by Giacomini et al. (17). The CGH data were used to determine frequencies of DNA copy-number changes for MSI and MSS cases across the genome, measured as the fraction of cases gained or lost at each bacterial artificial chromosome clone represented on the array. Differential frequencies between MSS and MSI cases were determined by subtracting frequencies in MSI from those in MSS cancers and plotted against bacterial artificial chromosome position (Fig. 2A and B). Similarly, for gene expression data, frequencies of genes significantly up-regulated or down-regulated in MSS cases were determined for 5 Mb windows spaced at 1 Mb intervals across the genome. Differential frequencies between MSS and MSI cases were obtained by subtracting frequencies of down-regulated genes from those of up-regulated genes and plotted against chromosome position (Fig. 2A and B).
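The two differential-frequency calculations described above can be illustrated with a minimal Python sketch. It is not the code used in this study. The input layout (per-case boolean gain or loss calls per clone, and a per-gene status of +1, -1, or 0 relative to MSS cases), the names `Gene`, `copy_number_differential`, and `expression_differential`, and the choice to express expression frequencies as a fraction of the genes falling in each window are all assumptions made for illustration.

```python
from dataclasses import dataclass

MB = 1_000_000  # base pairs per megabase


@dataclass
class Gene:
    position: int  # position in bp on one chromosome (hypothetical input)
    status: int    # +1 up-regulated in MSS, -1 down-regulated in MSS, 0 unchanged


def copy_number_differential(mss_calls, msi_calls):
    """Per-clone frequency in MSS cases minus frequency in MSI cases.

    Each argument is a list of cases; each case is a list of booleans,
    one per clone (True = clone shows the event, e.g. a gain).
    """
    n_clones = len(mss_calls[0])
    diff = []
    for i in range(n_clones):
        f_mss = sum(case[i] for case in mss_calls) / len(mss_calls)
        f_msi = sum(case[i] for case in msi_calls) / len(msi_calls)
        diff.append(f_mss - f_msi)
    return diff


def expression_differential(genes, chrom_length, window=5 * MB, step=1 * MB):
    """Sliding-window (fraction up-regulated) - (fraction down-regulated).

    Assumption: frequencies are fractions of the genes inside each window;
    windows without genes are reported as 0.0.
    """
    out = []
    start = 0
    while start < chrom_length:
        in_window = [g for g in genes if start <= g.position < start + window]
        if in_window:
            up = sum(g.status == 1 for g in in_window) / len(in_window)
            down = sum(g.status == -1 for g in in_window) / len(in_window)
            out.append((start, up - down))
        else:
            out.append((start, 0.0))
        start += step
    return out


# Toy data only; these values are invented for the example.
toy_genes = [Gene(500_000, 1), Gene(1_500_000, -1),
             Gene(2_500_000, 1), Gene(6_200_000, -1)]
toy_profile = expression_differential(toy_genes, chrom_length=8 * MB)
toy_cn = copy_number_differential(
    mss_calls=[[True, False], [True, False], [False, False], [True, True]],
    msi_calls=[[False, False], [True, False]],
)
```

Positive values mark regions where the event (gain, loss, or up-regulation) is more frequent in MSS cases, and negative values mark regions where it is more frequent in MSI cases. Gains and losses would be passed through `copy_number_differential` separately.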
Fig. 2. Comparison of DNA copy-number and gene expression differences between matched MSI and MSS colorectal cancer cell lines and unmatched primary tumors. A and B, differential DNA copy-number and gene expression frequencies for 10 MSI and 13 MSS colorectal cancer cell lines; data from Douglas et al. (21) and Giacomini et al. (17). C, differential DNA copy-number frequencies for 7 MSI and 102 MSS primary colorectal cancers from Nakao et al. (26). D-G, differential gene expression frequencies for 78 MSI and 77 MSS primary colorectal cancers analyzed at Aarhus University Hospital, 11 MSI and 63 MSS primary colorectal cancers analyzed at the Royal Melbourne Hospital, 33 MSI and 51 MSS primary colorectal cancers from Watanabe et al. (16), and 10 MSI and 10 MSS primary colorectal cancers from Koinuma et al. (13). For differential DNA copy-number frequencies: bottom bars, losses or deletions; top bars, gains or amplifications. For differential gene expression frequencies: bottom bars, regions for which genes in MSS cases show predominant down-regulation; top bars, regions for which genes in MSS cases show predominant up-regulation. Dashed lines, location of the centromeres.
There was good evidence that gene expression changes at least partly reflected differences in frequencies of DNA copy-number alterations between MSI and MSS colorectal cancer cell lines. At chromosomal regions for which MSS cases showed high frequencies of loss compared with MSI cases, genes tended to show reduced levels of expression, whereas at chromosomal regions for which MSS cases showed high frequencies of gain compared with MSI cases, genes tended to show increased levels of expression (r = 0.66; P < 0.001, Pearson's product-moment correlation test). These data suggest that DNA copy-number changes have profound effects on gene expression in vitro. Overall, 44.4% (1,345 of 3,028) of up-regulated and 72.1% (1,110 of 1,539) of down-regulated genes showed association with corresponding DNA copy-number changes (P < 0.001, χ2 test).
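As an illustration of the correlation statistic reported above, the following minimal Python sketch computes Pearson's product-moment correlation coefficient between two differential-frequency profiles. It assumes that both profiles have already been sampled at the same genomic positions. The function name `pearson_r` and the toy values are hypothetical, and the significance test is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy profiles (invented values): differential copy-number frequency and
# differential expression frequency at the same four genomic positions.
copy_number_diff = [0.4, 0.1, -0.3, -0.5]
expression_diff = [0.3, 0.2, -0.1, -0.6]
r = pearson_r(copy_number_diff, expression_diff)
```

A value of r near +1 means that regions gained more often in MSS cases also tend to contain up-regulated genes, and regions lost more often tend to contain down-regulated genes.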
We then examined MSI-associated gene expression changes in the primary colorectal cancers analyzed at Aarhus University Hospital and the Royal Melbourne Hospital and in those reported by Watanabe et al. (16) and Koinuma et al. (13) for evidence of causation by underlying DNA copy-number changes (Fig. 2C-G). As DNA copy-number data were not available for these tumors, alternative array-based CGH data were retrieved for 7 MSI and 102 MSS tumors published by Nakao et al. (26). Again, there was good evidence that differential frequencies in gene expression between MSS and MSI cases strongly resembled systematic differences in the frequencies of DNA copy-number changes. The Pearson's product-moment correlation coefficients for DNA copy-number frequencies against gene expression frequencies were 0.72 for Aarhus University Hospital, 0.69 for Royal Melbourne Hospital, 0.71 for Watanabe et al., and 0.41 for Koinuma et al. (P < 0.001 for all comparisons).
Discussion
We have found a high level of consistency of MSI-associated gene expression changes across independent studies of colorectal cancers from different ethnic populations. This finding shows that, despite differences in genetic background, limited study sizes, and the use of various analysis protocols, consistent changes in gene expression can be readily identified. A high level of consistency of MSI-associated gene expression changes was also found when comparing primary colorectal cancers and colorectal cancer cell lines, despite the former having been run on oligonucleotide microarrays and the latter on two-color cDNA microarrays. This concordance across platforms suggests that gene expression patterns in colorectal cancer cell lines broadly reflect those in primary tumors. Furthermore, divisive hierarchical clustering based on consistent gene sets successfully separated additional primary colorectal cancers from multiple studies into MSI and MSS cases, further showing the cross-laboratory reproducibility of this gene expression signature.
Single-sample classification of additional colorectal cancer samples from multiple studies was successfully achieved by divisive analysis clustering against a common MSI/MSS reference set. Compared with PCR-based MSI typing and irrespective of study origin of the test sample, the sensitivity and specificity of this approach were high, being ∼96% and 85%, respectively. Although unlikely to be applied for classification of MSI status in clinical practice unless combined with other signatures, given that current PCR-based MSI typing is technically less demanding and more cost-effective than DNA microarray analysis, this approach may provide a more general avenue for single-sample classification using gene expression signatures.
About 4% of MSI and 15% of MSS cases were not correctly classified using consistent sets of MSI-associated genes. This probably partly reflected experimental noise inherent to gene expression data from primary tissues (e.g., due to the presence of contaminating normal cells) and partly underlying genetic heterogeneity between cancers. For example, it has been reported that a small proportion of MSI cancers show aneuploid rather than near-diploid karyotypes, similar to MSS cancers (23). Conversely, a subset of MSS tumors appears to harbor only a few chromosomal changes. Given the observed association between MSI-associated gene expression changes and DNA copy number, this variation may account for some of the misclassifications. Furthermore, there is the possibility that some misclassified MSI samples were not from sporadic cases but instead derived from patients with hereditary nonpolyposis colorectal cancer. The latter tumors may follow a pathway of tumorigenesis distinct from sporadic MSI tumors (27). However, there was no evidence for this from our two misclassified MSI patients from Aarhus University Hospital, both of whom had presented late in life (66 and 55 years) and neither of whom had a family history of hereditary nonpolyposis colorectal cancer-associated cancers.
Our finding that immune response genes are up-regulated in MSI cancers is consistent with a previous report by Banerjea et al. (14) and with histopathologic data showing that MSI cancers tend to have more pronounced lymphocyte infiltration than MSS cancers (28). Furthermore, immune response-associated changes were not seen in the colorectal cancer cell lines. Although results from functional category analysis must be interpreted with caution, the newly observed down-regulation of genes involved in cell-cell adhesion, ion binding, and regulation of metabolism in MSI cancers is intriguing and may account for some differences in tumor behavior between MSI and MSS cases. Notably, 20 MSI-associated genes were identified by at least two Affymetrix probe sets and one IMAGE clone and may therefore be regarded as good candidates for further study. These genes include ZNF148, which, when overexpressed, has been shown to suppress adenoma growth in multiple intestinal neoplasia (ApcMin) mice, a widely used model of intestinal tumorigenesis (29). Loss of another candidate gene, TFAP2A, has been shown to deregulate E-cadherin and matrix metalloproteinase-9 and to increase tumorigenicity of colon cancer cells in vivo (30).
The comparison of array-based CGH and gene expression microarray data for primary colorectal cancers and colorectal cancer cell lines showed that MSI-associated gene expression changes broadly reflect systematic DNA copy-number differences between MSI tumors, which tend to be near-diploid, and MSS tumors, which tend to be aneuploid. These data show that DNA copy-number changes in cancer cells have profound effects on gene expression and therefore the potential to affect tumor cell behavior and phenotype. Taken together, our results suggest that this mechanism may contribute to the clinical differences between MSI and MSS tumors.
In conclusion, we found that MSI-associated gene expression changes were highly consistent across multiple independent studies of colorectal cancers. Consistency was observed across different ethnic populations and between primary colorectal cancers and cancer cell lines. Consistent MSI-associated genes were successfully used to predict MSI status of additional individual colorectal cancer samples with high sensitivity and specificity. Together, these results suggest that microarray data are broadly comparable and reproducible across different laboratories. Our study provides novel insights into the MSI and MSS pathways of tumorigenesis by showing that DNA copy-number aberrations at least partly underlie MSI-associated gene expression changes. Genes associated with immune response were found to be up-regulated in MSI cancers, whereas genes associated with cell-cell adhesion, ion binding, and regulation of metabolism were found to be down-regulated. The candidate genes identified provide a starting point for further study of MSI and MSS cancers and may ultimately elucidate the clinical differences between these two types of colorectal cancers.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant support: Victorian Cancer Agency fellowship (L. Lipton).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
Acknowledgments
We thank the Australian Genome Research Facility for excellent technical support and the patients for participating in this study.