Abstract
Purpose: Non–small cell lung cancers (NSCLC) comprise multiple distinct biologic groups with different prognoses. For example, patients with epithelial-like tumors have a better prognosis and exhibit greater sensitivity to inhibitors of the epidermal growth factor receptor (EGFR) pathway than patients with mesenchymal-like tumors. Here, we test the hypothesis that epithelial-like NSCLCs can be distinguished from mesenchymal-like NSCLCs on the basis of global DNA methylation patterns.
Experimental Design: To determine whether phenotypic subsets of NSCLCs can be defined on the basis of their DNA methylation patterns, we combined microfluidics-based gene expression analysis and genome-wide methylation profiling. We derived robust classifiers for both gene expression and methylation in cell lines and tested these classifiers in surgically resected NSCLC tumors. We validated our approach using quantitative reverse transcriptase PCR and methylation-specific PCR in formalin-fixed biopsies from patients with NSCLC who went on to fail front-line chemotherapy.
Results: We show that patterns of methylation divide NSCLCs into epithelial-like and mesenchymal-like subsets as defined by gene expression and that these signatures are similarly correlated in NSCLC cell lines and tumors. We identify multiple differentially methylated regions, including one in ERBB2 and one in ZEB2, whose methylation status is strongly associated with an epithelial phenotype in NSCLC cell lines, surgically resected tumors, and formalin-fixed biopsies from patients with NSCLC who went on to fail front-line chemotherapy.
Conclusions: Our data show that patterns of DNA methylation can divide NSCLCs into two phenotypically distinct subtypes of tumors and provide proof of principle that differences in DNA methylation can be used as a platform for predictive biomarker discovery and development. Clin Cancer Res; 18(8); 2360–73. ©2012 AACR.
See commentary by Easwaran and Baylin, p. 2121
This article is featured in Highlights of This Issue, p. 2119
Successful development of targeted therapeutics now and in the future will hinge on defining patient subsets that are most likely to benefit from new drug candidates. Here, we show that patterns of DNA methylation can divide NSCLC into two phenotypically distinct subtypes of tumors and provide proof of principle that these kinds of differences in DNA methylation can be used as a platform for predictive biomarker discovery and development.
Introduction
Non–small cell lung cancer (NSCLC) accounts for the largest proportion of cancer-related deaths in the United States and worldwide (1). NSCLC is composed of multiple histologic subtypes including adenocarcinoma, squamous cell carcinoma, large cell carcinoma, and others. Until recently, histology has made little difference in terms of patient outcome on therapy (2). However, the safety and activity profile of some recently approved agents including bevacizumab, erlotinib, and pemetrexed suggest that histology may be an important variable in clinical decision making (3–5). In addition, it is now clear that some molecularly targeted agents are more efficacious in specific, molecularly defined subsets of patients (6). The epidermal growth factor receptor (EGFR)-tyrosine kinase inhibitor, erlotinib, induces striking clinical responses in patients with activating mutations in the EGFR kinase domain (7); however, there is evidence that a subset of patients with EGFR wild-type tumors also derive benefit from erlotinib therapy (8–10). Recent efforts have therefore focused on identifying molecular biomarkers that identify further subsets of patients that may derive benefit from erlotinib.
We previously reported on the identification of a gene expression signature that is associated with in vitro sensitivity or resistance to erlotinib. This gene expression signature divides NSCLC cell lines into epithelial-like and mesenchymal-like subsets (11). In multiple tumor types, an epithelial-to-mesenchymal transition (EMT) induces an aggressive phenotype characterized by increased motility and invasiveness, and it has more recently been implicated in resistance to chemotherapy and other drugs (12–17). While the mechanisms responsible for maintaining an epithelial or mesenchymal phenotype in cancer are not completely understood, recent evidence suggests that chromatin states, and in particular DNA methylation, are involved. Promoter methylation and gene silencing of E-cadherin (CDH1), the best-characterized epithelial marker, are frequently observed in mesenchymal-like breast cancers (18, 19). Molecules that induce an EMT, such as TGFβ1, lead to reduced CDH1 expression driven by transcriptional repressors that bind directly to the CDH1 promoter and recruit histone deacetylases and other chromatin remodeling proteins (20). Furthermore, downstream regulators of EMT including the miR-200 family of microRNAs are specifically silenced and gain promoter hypermethylation and repressive chromatin marks in some invasive and poorly differentiated tumors (21–23). Recent work describing an in vitro model system of EMT induction suggests that these DNA methylation changes are acquired in a predictable rather than stochastic manner (20), which suggests that such patterns might be used as surrogate markers for cells in an epithelial versus a mesenchymal state. Taken together with evidence that genome-wide reprogramming of chromatin domains occurs during EMT (24), these data suggest that specific epigenetic profiles may be associated with the epithelial-like and mesenchymal-like phenotypes observed in NSCLCs.
Here, we took an integrated genomics approach to determine whether DNA methylation patterns could classify phenotypic subsets of NSCLC (see Supplementary Methods; Fig. 1 for a schematic of our experimental approach). By combining microfluidics-based gene expression analysis and genome-wide methylation profiling, we show that prognostic subsets of NSCLCs can be defined on the basis of differences in DNA methylation. Genome-wide DNA methylation profiling identified tumor-specific hyper- and hypomethylation patterns in the promoters and distal regulatory elements of genes involved in epithelial cell differentiation and transformation. We identified 2 differentially methylated regions (DMRs), one in ERBB2 and one in ZEB2, whose methylation status is strongly associated with an epithelial phenotype in NSCLC cell lines, surgically resected primary NSCLCs, and tumors from patients who had failed front-line chemotherapy. Our data suggest that DNA-based biomarkers can be used to infer the biologic state of tumors and provide proof of principle that DNA methylation differences can be used as a platform for predictive biomarker discovery and development.
A Fluidigm-based EMT gene expression panel classifies NSCLC cell lines as epithelial-like or mesenchymal-like. A, comparison of hierarchical clustering of NSCLC cell lines using the 100-gene expression signature described in the study of Yauch and colleagues to the refined 20-gene EMT panel (reported in Supplementary Table S1). B, micrographs of H1975 and gBEC1 before and after chronic (∼4 weeks) exposure to TGFβ (magnification, 100×). C, quantitative PCR for the 20 EMT genes in H1975 and gBEC1. D, hierarchical clustering of 82 NSCLC cell lines using the 20 genes. $2^{-\Delta C_t}$ values were used for clustering. The expression data were normalized and median centered (samples and genes). Green indicates low or no mRNA expression of the indicated genes; red indicates high expression. Hierarchical clustering characterizes 36 lines as epithelial-like and 34 lines as mesenchymal-like, with 12 forming a distinct intermediate group characterized by above-median expression of genes from both the epithelial and mesenchymal gene sets.
Materials and Methods
Please see Supplementary Methods for description of cell lines and tissues and for additional methods used in this study.
Fluidigm expression analysis
EMT gene expression analysis was conducted on 82 NSCLC cell lines using the BioMark 96 × 96 gene expression platform (Fluidigm) and a 20-gene EMT expression panel (Supplementary Table S1 and Methods). The ΔCt values were used to cluster cell lines according to EMT gene expression levels using Cluster v.3.0 and Treeview v.1.60 software (http://rana.lbl.gov/EisenSoftware.htm).
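The clustering step can be illustrated with a short Python sketch using NumPy. It is not the Cluster 3.0 implementation: it assumes a genes × samples matrix of ΔCt values, converts them to relative expression (2^−ΔCt) on a log2 scale, median-centers genes and then samples once each, and groups cell lines by average-linkage agglomerative clustering on 1 − Pearson correlation. The linkage rule, the distance metric, the toy ΔCt values, and the names `preprocess` and `average_linkage` are illustrative assumptions, not settings reported in this study.

```python
import numpy as np

def preprocess(delta_ct: np.ndarray) -> np.ndarray:
    """Turn a genes x samples matrix of ΔCt values into median-centred log2 expression."""
    rel_expr = 2.0 ** (-delta_ct)   # relative expression, 2^-ΔCt
    log_expr = np.log2(rel_expr)    # log2 scale (numerically equal to -ΔCt)
    log_expr = log_expr - np.median(log_expr, axis=1, keepdims=True)  # centre each gene
    log_expr = log_expr - np.median(log_expr, axis=0, keepdims=True)  # then each sample
    return log_expr

def average_linkage(data: np.ndarray, n_clusters: int) -> list:
    """Agglomerate the columns (samples) of `data` using average linkage on 1 - Pearson r."""
    dist = 1.0 - np.corrcoef(data.T)  # samples x samples correlation distance
    clusters = [[i] for i in range(data.shape[1])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean pairwise distance between members of the two clusters
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)

# Hypothetical ΔCt values: rows are 2 epithelial and 2 mesenchymal genes, columns are 4 cell lines
delta_ct = np.array([
    [2.0, 2.5, 9.0, 8.5],
    [3.0, 2.8, 9.5, 9.0],
    [9.0, 8.7, 2.5, 3.0],
    [8.5, 9.2, 3.0, 2.2],
])
clusters = average_linkage(preprocess(delta_ct), n_clusters=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Because a lower ΔCt means higher expression, lines 0 and 1 (high expression of the first two genes) separate from lines 2 and 3 in this toy example.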
Illumina Infinium analysis
Microarray data were collected at Expression Analysis, Inc. (www.expressionanalysis.com) using the Illumina HumanMethylation450 BeadChip (Illumina) as described (Supplementary Methods). Array data were analyzed and a methylation classifier was established using a “leave-one-out” cross-validation strategy (Supplementary Methods; refs. 25, 26). Array data have been submitted to the Gene Expression Omnibus database (accession number GSE36216).
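The key property of a leave-one-out design is that probe selection is repeated within every fold, so each held-out cell line is classified by a model that never saw it. The Python sketch below illustrates this under stated simplifications: probes are selected by a fixed difference in mean β-value (`min_delta`, an assumed threshold) rather than by the FDR-adjusted P values described in Supplementary Methods, and held-out lines are assigned with a nearest-centroid rule, which is a swapped-in classifier. The function names and toy data are hypothetical.

```python
import numpy as np

def select_probes(beta: np.ndarray, labels: np.ndarray, min_delta: float) -> np.ndarray:
    """Indices of probes whose mean β differs between 'E' and 'M' samples by more than min_delta."""
    mean_e = beta[labels == "E"].mean(axis=0)
    mean_m = beta[labels == "M"].mean(axis=0)
    return np.flatnonzero(np.abs(mean_e - mean_m) > min_delta)

def nearest_centroid(train_beta: np.ndarray, train_labels: np.ndarray, profile: np.ndarray) -> str:
    """Assign a held-out β profile to the class whose mean profile is closest (Euclidean)."""
    distances = {
        cls: np.linalg.norm(profile - train_beta[train_labels == cls].mean(axis=0))
        for cls in ("E", "M")
    }
    return min(distances, key=distances.get)

def loo_accuracy(beta: np.ndarray, labels: np.ndarray, min_delta: float = 0.3) -> float:
    """Leave-one-out accuracy; beta is samples x probes, labels holds 'E'/'M' calls."""
    n = len(labels)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        # Probe selection is redone inside every fold, so the held-out line
        # never influences which probes enter the classifier.
        probes = select_probes(beta[train], labels[train], min_delta)
        if probes.size == 0:
            continue  # no informative probes in this fold: counted as a miss
        predicted = nearest_centroid(beta[train][:, probes], labels[train], beta[i, probes])
        correct += int(predicted == labels[i])
    return correct / n

# Toy data (hypothetical): 12 lines x 20 probes; probes 0-4 are methylated only in 'M' lines.
rng = np.random.default_rng(0)
labels = np.array(["E"] * 6 + ["M"] * 6)
beta = np.clip(0.5 + rng.normal(0, 0.05, size=(12, 20)), 0, 1)
beta[:6, :5] = np.clip(0.1 + rng.normal(0, 0.05, size=(6, 5)), 0, 1)
beta[6:, :5] = np.clip(0.9 + rng.normal(0, 0.05, size=(6, 5)), 0, 1)
print(loo_accuracy(beta, labels))
```

Selecting probes once on the full data set and only then cross-validating would leak information from the held-out samples and inflate the estimated accuracy; keeping selection inside the loop avoids that.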
Results
Epithelial-like and mesenchymal-like expression signatures correlate with erlotinib sensitivity in vitro
We previously defined a gene expression signature that correlates with in vitro sensitivity of NSCLC cell lines to erlotinib (11). This gene set was highly enriched for genes involved in EMT. From this work and other recent reports, we developed a quantitative reverse transcriptase PCR–based EMT expression panel on the Fluidigm nanofluidic platform (Supplementary Table S1). A comparison of the 100-probe set from the study of Yauch and colleagues and the 20-gene EMT Fluidigm panel for 42 of the lines profiled in the study of Yauch and colleagues shows that this 20-gene expression panel is a representative classifier of EMT (ref. 11; Fig. 1A).
To further evaluate whether our 20-gene panel was representative of the phenotypic changes associated with an EMT, we treated 2 cell lines with TGFβ1. As shown in Fig. 1B, TGFβ1 induced morphologic changes associated with an EMT. We then tested whether TGFβ-induced gene expression changes were consistent with an EMT in these cell lines. As expected, the genes associated with an epithelial phenotype were downregulated and genes associated with a mesenchymal phenotype were upregulated in these cell lines, albeit to different degrees (Fig. 1C).
To determine whether DNA methylation profiling could be used to classify NSCLC cell lines into epithelial-like and mesenchymal-like groups, we used our 20-gene expression panel to assign epithelial-like versus mesenchymal-like status to 82 cell lines. The NSCLC cell lines used in this study include most of the lines profiled in the study of Yauch and colleagues and an additional 52 lines, including 6 lines with EGFR mutations (cell line descriptions, including histology, are summarized in Supplementary Table S2). Of the 82 cell lines, 36 were classified as epithelial-like and 34 as mesenchymal-like on the basis of their expression of these markers (Fig. 1D). Twelve lines (bottom cluster of Fig. 1D) were classified as epithelial-like but expressed a combination of epithelial and mesenchymal markers. We interpret these lines as representing a distinct biology and therefore designate them as intermediate. Thus, of the 82 NSCLC lines we profiled, 89% could be classified clearly as epithelial-like or mesenchymal-like. For the most part, this epithelial-like versus mesenchymal-like expression phenotype was mutually exclusive, possibly reflecting a distinct underlying biology, which we hypothesized might be linked to distinct DNA methylation profiles.
Genome-wide methylation profiles correlate with Fluidigm-based EMT signatures in NSCLC cell lines
We first evaluated the Illumina Infinium 450K array as a platform for high-throughput methylation profiling by comparing the β-values for 52 probes and sodium bisulfite sequencing data on a subset of cell lines (N = 12). We observed a highly significant, strong positive correlation between methylation calls by the Infinium array and direct bisulfite sequencing (r = 0.926; Supplementary Fig. S1).
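For context, an Infinium β-value is conventionally computed from the methylated (M) and unmethylated (U) probe intensities as β = M/(M + U + 100), and bisulfite sequencing yields the fraction of methylated CpGs across sequenced clones. The Python sketch below computes both and their Pearson correlation; the clone encoding ('1' methylated, '0' unmethylated, any other character undetermined), the function names, and the toy intensities and clones are assumptions for illustration only.

```python
import math

def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Illumina-style β-value: M / (M + U + offset); the offset damps low-intensity probes."""
    return methylated / (methylated + unmethylated + offset)

def bisulfite_fraction(clones: list) -> float:
    """Fraction of methylated CpGs across clones; '1' = methylated, '0' = unmethylated,
    any other character is treated as undetermined and skipped."""
    calls = [c for clone in clones for c in clone if c in "01"]
    return sum(c == "1" for c in calls) / len(calls)

def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (M, U) intensities and clone calls for three loci in one cell line
intensities = [(9000, 500), (300, 8000), (4000, 4200)]
clones = [["1111", "1101", "1111"], ["0000", "0010", "0000"], ["1100", "0101", "1-10"]]

array_calls = [beta_value(m, u) for m, u in intensities]
sequencing_calls = [bisulfite_fraction(c) for c in clones]
print(pearson_r(array_calls, sequencing_calls))
```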
To identify DMRs that distinguished epithelial-like from mesenchymal-like cell lines, we used a cross-validation strategy that simultaneously constructed a methylation-based classifier and assessed its prediction accuracy (see Supplementary Methods). When applied to our 69-cell line training set, this analysis yielded 549 DMRs, representing 915 individual CpG sites, that were selected as defining epithelial-like versus mesenchymal-like NSCLC cell lines with a false discovery rate–adjusted P value below 0.01 in 100% of the cross-validation iterations (Supplementary Table S3). The cross-validation estimate of the methylation-based classifier's accuracy was 88.0% (±2.4%, 95% confidence interval).
Next, we used the CpG sites included in our methylation-based EMT classifier to cluster the 69 NSCLC cell lines (including 6 EGFR-mutant, erlotinib-sensitive lines) and 2 primary normal lung cell strains and their immortalized counterparts. This analysis revealed a striking segregation of epithelial-like, mesenchymal-like, and normal lines (Fig. 2). Notably, the methylation signal from these CpG sites clustered the epithelial-like and mesenchymal-like cell lines into their respective groups with only 6 exceptions: the mesenchymal-like lines H1435, HCC4017, H647, H2228, H1755, and HCC15 clustered with the epithelial-like group (Fig. 2, indicated by Sample Type at the top). Interestingly, 5 of these 6 lines clustered closely together into a distinct subset of the mesenchymal-like lines by EMT gene expression analysis (Fig. 1D), suggesting that this gene expression phenotype is associated with a somewhat distinct underlying methylation signature. Importantly, the mesenchymal-like phenotype harbors a larger proportion of hypermethylated sites than the epithelial phenotype. This suggests that changes in methylation may be required to stabilize the phenotypic alterations acquired during an EMT in NSCLCs.
DNA methylation profiling delineates epithelial-like (E) and mesenchymal-like (M) NSCLC cell lines. Seventy-two NSCLC cell lines and normal lung epithelial cells were profiled using the Illumina Infinium 450K Methylation array platform. Supervised hierarchical clustering was conducted using 915 probes that were significantly differentially methylated between epithelial-like and mesenchymal-like cell lines (false discovery rate = 0.01; Supplementary Methods). Annotated probe sets used for the cluster analysis are listed in Supplementary Table S3. Each row represents an individual probe on the Infinium 450K array, and each column represents a cell line. Blue shading in the heat map represents unmethylated regions; red shading represents methylated regions. The top color bar indicates the epithelial-like or mesenchymal-like status of each cell line as determined by Fluidigm EMT gene expression analysis: green indicates epithelial-like and black indicates mesenchymal-like cell lines. The bottom color bar indicates the erlotinib response phenotype of each cell line: red indicates erlotinib-sensitive lines, black indicates erlotinib-resistant lines, and gray indicates lines with intermediate sensitivity to erlotinib. A Euclidean distance metric was used for clustering without centering; the color scheme represents absolute methylation differences.
EGFR-mutant NSCLCs typically present as well-differentiated adenocarcinomas in the peripheral lung. As anticipated, based on their epithelial-like expression phenotype and their characteristic histology, the EGFR-mutant cell lines behaved more similarly to epithelial-like lines than to mesenchymal-like lines. We also noted the segregation pattern of the cell lines based on in vitro sensitivity to erlotinib (Fig. 2, indicated by Sensitivity in the middle). As anticipated from prior studies (11), nearly all erlotinib-sensitive lines were associated with an epithelial-like phenotype, whereas nearly all mesenchymal-like lines were resistant to erlotinib. However, not all epithelial-like lines were sensitive to erlotinib. Ten of the erlotinib-resistant lines clustered with the epithelial-like lines, and 4 erlotinib-sensitive lines, H838, H2030, RERF-LC-MS, and SK-MES-1, clustered with the mesenchymal-like lines. Notably, H838 and SK-MES-1 behaved as outliers with regard to erlotinib sensitivity when clustered by gene expression using our previously defined EMT expression signature (11). Some of the other outliers with respect to erlotinib sensitivity have mutations that explain their apparent resistance. For example, the epithelial-like line H1975 harbors a T790M mutation in EGFR and H1993 harbors a MET amplification. These genetic alterations confer resistance to erlotinib specifically, suggesting that the epigenetic signatures we observed are surrogates for the biologic state of the cell line rather than for erlotinib sensitivity per se.
Sodium bisulfite sequencing of selected DMRs validates Infinium methylation profiling
We selected 17 DMRs identified by Infinium (Table 1) that were spatially associated with genes (in the 5′ CpG island or intragenic) and examined their methylation status by direct sequencing of cloned fragments of sodium bisulfite–converted DNA. We randomly selected 5 epithelial-like lines, 4 mesenchymal-like lines, and one intermediate line for sequencing validation. As shown in Fig. 3A and B, bisulfite sequencing of approximately 10 clones per cell line for 10 loci revealed that nearly all of these markers were almost completely methylated in at least 4 of the mesenchymal-like cell lines and in the intermediate line H522. In contrast, these loci were completely unmethylated in all 5 of the epithelial-like lines. Four of the 10 markers that were methylated in mesenchymal-like lines, ESRP1, CP2L3/GRHL2, miR200C, and MST1R/RON, are involved in epithelial differentiation (2, 27, 28). ESRP1 is an epithelial-specific regulator of alternative splicing that is downregulated in mesenchymal cells, and CP2L3/GRHL2 is a transcriptional regulator of the apical junctional complex (27, 28); miR200C is a known negative regulator of the EMT inducer ZEB1 (29). ESRP1 and GRHL2 expression was downregulated in a larger panel of mesenchymal-like lines relative to all of the epithelial-like lines (Supplementary Fig. S2), consistent with the known absence of ESRP proteins in mesenchymal cells and the ability of these proteins to regulate epithelial transcripts that switch splicing during EMT. Pyrosequencing analysis indicated that GRHL2 was also hypermethylated in this broader panel of mesenchymal-like lines relative to epithelial-like lines (Supplementary Fig. S3).
Sodium bisulfite sequencing of selected DMRs validates Infinium methylation profiling. A, sodium bisulfite sequencing confirms regions of differential methylation between epithelial-like and mesenchymal-like NSCLC cell lines. Candidate regions identified by Infinium methylation profiling were selected for sodium bisulfite sequencing analysis. Ten regions associated with the indicated genes are differentially methylated in mesenchymal-like NSCLC lines. Methylation status was determined at individual CpG sites for 10 to 12 clones per cell line for the target region of each indicated gene. A green bar indicates epithelial-like, a black bar indicates mesenchymal-like, and a blue bar indicates intermediate-like NSCLC lines. Each row represents one clone and each column represents an individual CpG site. Open boxes represent unmethylated CpG sites; filled boxes represent methylated sites; shaded boxes are undetermined. Four loci (CLDN7, LAMB3, STX2, and GJB3) from the 20-gene panel that were also part of the 915-probe classifier were included in this analysis. DMRs associated with 2 additional genes, NKX6.2 and STX2, were evaluated by qMSP (Table 1, Supplementary Figs. S6 and S7). B, seven candidate regions associated with the indicated genes are differentially methylated in epithelial-like NSCLC lines. The DMR associated with ERBB2 was evaluated by pyrosequencing (Table 1; Fig. 5). C, pyrosequencing of the CLDN7 promoter region differentiates 42 NSCLC cell lines on the basis of epithelial-like/mesenchymal-like phenotype. Quantitative methylation was determined at 7 CpG sites by PyroMark analysis software using the equation: % methylation = (C peak height × 100)/(C peak height + T peak height). Data are represented as the mean ± SD percentage of methylation at 7 CpG sites. D, relative expression of CLDN7 mRNA was determined using a standard ΔCt method in 42 (n = 20 epithelial-like, 19 mesenchymal-like, 3 intermediate) DMSO-treated and 5-aza-dC–treated NSCLC cell lines. Expression values were calculated as a fold change in 5-aza-dC–treated relative to DMSO-treated control cells. Data are normalized to the housekeeping gene GAPDH and represented as the mean of 2 replicates. DMSO, dimethyl sulfoxide; GAPDH, glyceraldehyde-3-phosphate dehydrogenase.
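The two calculations in this legend can be written out as a short Python sketch. It assumes that the "standard ΔCt method" refers to the comparative 2^−ΔΔCt calculation (target Ct minus GAPDH Ct, then 5-aza-dC minus DMSO) with roughly 100% amplification efficiency, and it averages replicate fold changes; the function names, peak heights, and Ct values are hypothetical.

```python
from statistics import mean, stdev

def percent_methylation(c_peak: float, t_peak: float) -> float:
    """% methylation at one CpG = (C peak height x 100) / (C peak height + T peak height)."""
    return c_peak * 100.0 / (c_peak + t_peak)

def region_methylation(peaks: list) -> tuple:
    """Mean and SD of % methylation over the CpG sites of a region; peaks = [(C, T), ...]."""
    values = [percent_methylation(c, t) for c, t in peaks]
    return mean(values), stdev(values)

def fold_change(ct_target_aza: float, ct_ref_aza: float,
                ct_target_dmso: float, ct_ref_dmso: float) -> float:
    """2^-ΔΔCt fold change of a target (e.g. CLDN7) in 5-aza-dC vs DMSO cells,
    with each ΔCt normalised to a reference gene (e.g. GAPDH)."""
    delta_aza = ct_target_aza - ct_ref_aza
    delta_dmso = ct_target_dmso - ct_ref_dmso
    return 2.0 ** -(delta_aza - delta_dmso)

# Hypothetical pyrosequencing peak heights for 7 CpG sites in one cell line
peaks = [(80, 20), (75, 25), (90, 10), (85, 15), (70, 30), (88, 12), (82, 18)]
print(region_methylation(peaks))

# Hypothetical Ct values for 2 replicates: (target_aza, ref_aza, target_dmso, ref_dmso)
replicates = [(24.0, 18.0, 28.0, 18.0), (24.5, 18.2, 28.1, 18.1)]
print(mean(fold_change(*r) for r in replicates))
```

In the first replicate the target's ΔCt drops from 10 (DMSO) to 6 (5-aza-dC), a 4-cycle shift that corresponds to a 16-fold induction under the efficiency assumption.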
Table 1. Annotation of DMRs selected for sodium bisulfite sequencing or qMSP and pyrosequencing array design
Gene symbol | Gene name | Chromosomal location | Regulatory element | 5-aza-dC induction | PBMCs | Normal lung | Epithelial/mesenchymal | P (E vs. M) | P (S vs. R) |
---|---|---|---|---|---|---|---|---|---|
ZEB2 | Zinc finger E-box binding homeobox 2 | chr2: 144,989,352-144,989,168 | Conserved regulatory potential | Yes | Unmethylated | No | E | 0.0012 | 0.3432 |
NKX6.2 | Homeobox protein Nkx-6.2 | chr10: 134,448,826-134,449,879 | CpG island | NE | Partially methylated | No | E | 0.0568 | 0.8353 |
PEX5L | Peroxisomal biogenesis factor 5-like | chr3: 181,236,933-181,237,780 | CpG island | No | Unmethylated | No | E | 0.0084 | 0.6566 |
GALR1 | Galanin receptor 1 | chr18: 73,090,412-73,090,797 | CpG island | NE | Unmethylated | No | E | NE | NE |
PTPRM | Protein tyrosine phosphatase, receptor type, M | chr18: 7,932,674-7,933,993 | Conserved regulatory potential | Yes | Methylated | Yes | E | NE | NE |
ME3 | NADP-dependent malic enzyme 3 | chr11: 86,060,344-86,061,158 | CpG island | Some | Unmethylated | No | E | 0.0295 | 0.5771 |
SYK | Spleen tyrosine kinase | chr9: 92,631,210-92,632,740 | None defined | Yes | Methylated | Yes | E | NE | NE |
PCDH8 | Protocadherin 8 | chr13: 52,321,012-52,321,485 | CpG island | Yes | Unmethylated | No | E | 0.0656 | 0.5107 |
HOXC5 | Homeobox C5 | chr12: 52,712,688-52,713,529 | CpG island | NE | Unmethylated | No | M | 0.7493 | 0.0008 |
miR200C | microRNA 200c | chr12: 6,942,800-6,943,200 | None defined | NE | Methylated | No | M | NE | NE |
SERPINB5 | Serine (or cysteine) proteinase inhibitor, clade B, member 5; Maspin | chr18: 59,294,906-59,295,319 | Conserved regulatory potential | Yes | Methylated | Yes | M | NE | NE |
BCAR3 | Breast cancer antiestrogen resistance 3 | chr1: 93,852,868-93,853,418 | Conserved regulatory potential | Some | Methylated | Yes | M | 0.001 | 0.2071 |
FAM110A | Family with sequence similarity 110, member A | chr20: 822,480-822,120 | None defined | Some | Methylated | No | M | <0.0001 | 0.0007 |
CLDN7 | Claudin 7 | chr17: 7,103,446-7,106,446 | CpG island | Yes | Partially methylated | No | M | <0.0001 | 0.0011 |
ESRP1 | Epithelial splicing regulatory protein 1 | chr8: 95,653,500-95,654,240 | Yes | Yes | Unmethylated | No | M | <0.0001 | 0.0043 |
GRHL2 | Grainyhead-like 2 | chr8: 102,575,373-102,575,793 | CpG island | Yes | Unmethylated | No | M | <0.0001 | 0.0004 |
RON | Macrophage stimulating 1 receptor | chr3: 49,916,089-49,916,545 | CpG island | Yes | Unmethylated | Some | M | 0.0009 | 0.0028 |
STX2 | Syntaxin 2 | chr12: 129,868,924-129,869,427 | CpG island | No | Partially methylated | Yes | M | 0.0001 | 0.0023 |
TBCD | Tubulin-specific chaperone D | chr17: 78,440,425-78,440,951 | None defined | NE | Methylated | Yes | M | NE | NE |
ERBB2 | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 | chr17: 37,861,100-37,863,650 | Putative enhancer | No | Methylated | No | M | <0.0001 | 0.0004 |
NOTE: NE, not evaluated; E, epithelial-like; M, mesenchymal-like; S, erlotinib-sensitive; R, erlotinib-resistant.
In addition to identifying methylation markers associated with a mesenchymal phenotype, we also found several DMRs that defined epithelial-like cell lines by Infinium and bisulfite sequencing (Fig. 3B). Not surprisingly, 2 of these epithelial-specific markers, PCDH8 and ZEB2, are associated with genes involved in cellular adhesion or implicated in the regulation of EMT. Interestingly, the epithelial-like lines were methylated in the first intronic region of ZEB2, a known inducer of EMT that has been negatively correlated with an epithelial gene signature in NSCLC lines. Collectively, these data establish distinct DNA methylation patterns in epithelial-like and mesenchymal-like cell lines that directly underlie differential expression of known mediators of EMT.
Biologic relevance of DMRs
To evaluate the role of methylation in regulating expression of the genes associated with select DMRs, we carried out quantitative PCR in a panel of 34 NSCLC cell lines treated with 5-aza-2′-deoxycytidine (5-aza-dC) or dimethyl sulfoxide (DMSO). Not all DMRs were associated with obvious gene expression changes following 5-aza-dC treatment (see ERBB2, Table 1), but we noted significant induction of GRHL2, ESRP1, and CLDN7 transcripts in mesenchymal-like versus epithelial-like lines (Supplementary Fig. S4). From this group of genes, we selected CLDN7 as a representative marker of EMT and quantified its methylation status by pyrosequencing in an extended panel of 42 cell lines. Nearly all of the mesenchymal-like lines were methylated at the CLDN7 promoter region and exhibited dramatic (>10-fold) induction of CLDN7 expression in response to 5-aza-dC treatment (Fig. 3C and D). In contrast, most epithelial-like cell lines expressed CLDN7, and its expression was not further induced by 5-aza-dC treatment. These data show a direct link between locus-specific DNA hypermethylation and transcriptional silencing in a subset of genes associated with epithelial-like and mesenchymal-like states in NSCLC cell lines.
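Fold-induction values such as the >10-fold induction of CLDN7 are conventionally derived with the comparative Ct (2^−ΔΔCt) method. The sketch below assumes a single reference gene and uses hypothetical Ct values; the study's own normalization (Fluidigm, reference genes described in Materials and Methods) may differ.

```python
# Sketch (assumption): fold induction after 5-aza-dC vs. DMSO by the 2^-ΔΔCt
# method, normalizing to ONE hypothetical reference gene. Ct values are invented.


def delta_ct(ct_target: float, ct_reference: float) -> float:
    """ΔCt = Ct(target) - Ct(reference); a lower ΔCt means more target transcript."""
    return ct_target - ct_reference


def fold_induction(ct_target_aza: float, ct_ref_aza: float,
                   ct_target_dmso: float, ct_ref_dmso: float) -> float:
    """Return 2^-ΔΔCt, where ΔΔCt = ΔCt(5-aza-dC) - ΔCt(DMSO)."""
    ddct = delta_ct(ct_target_aza, ct_ref_aza) - delta_ct(ct_target_dmso, ct_ref_dmso)
    return 2.0 ** (-ddct)


def strongly_induced(fold: float, threshold: float = 10.0) -> bool:
    """Flag induction above the >10-fold magnitude described for CLDN7."""
    return fold > threshold


# Hypothetical mesenchymal-like line: relative to the reference gene, the
# target Ct falls by about 3.8 cycles after 5-aza-dC treatment.
fold = fold_induction(ct_target_aza=26.0, ct_ref_aza=20.0,
                      ct_target_dmso=30.0, ct_ref_dmso=20.2)
print(f"{fold:.1f}-fold induced; strongly induced: {strongly_induced(fold)}")
```

Because Ct is logarithmic in template amount, each cycle of ΔΔCt corresponds to a twofold change, which is why a shift of under 4 cycles already exceeds 10-fold.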
Quantitative MSP classifies NSCLC cell lines into epithelial and mesenchymal subtypes and predicts erlotinib sensitivity
Following independent validation of the methylation status of 17 markers by direct sequencing analysis, we expanded our discovery set to 70 NSCLC cell lines to determine whether these markers could correctly classify epithelial-like and mesenchymal-like phenotypes. On the basis of sodium bisulfite sequencing analyses, we selected the methylated regions that best distinguished epithelial-like from mesenchymal-like lines and designed quantitative methylation-specific PCR (qMSP) assays based on TaqMan technology (Supplementary Fig. S5). We chose qMSP as the assay platform because previous work showed its utility in detecting tumor-specific promoter hypermethylation in specimens obtained from patients with cancer. This method is highly sensitive and specific for quantifying methylated alleles and is readily adaptable to high-throughput formats, making it suitable for clinical applications (30–33). For MSP, TaqMan designs are superior to SYBR-based designs because the fluorescent probe, which does not act as a primer, confers additional specificity. To normalize samples for DNA input, we designed a bisulfite-modified RNase P reference assay that amplifies input DNA independently of its methylation status. We generated titration curves using control methylated DNA, DNA derived from peripheral blood mononuclear cells (N = 20), and DNA from cell lines with known methylation status for each DMR (Supplementary Fig. S5). Of note, nearly all of the assays we developed yield essentially binary outputs (methylation present or absent), which obviates the need to define cutoff points.
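As a minimal sketch of the input normalization described above, assuming ΔCt = Ct(target) − Ct(RNase P) as in the Fig. 4 legend, the −ΔCt value for each sample can be computed as follows. Assigning non-amplifying wells a fixed maximum cycle number (the hypothetical MAX_CYCLES) is our assumption for illustration, not a documented step of the assay.

```python
# Sketch (assumption): qMSP normalization to the bisulfite-converted RNase P
# reference, reported as -ΔCt with ΔCt = Ct(target) - Ct(RNase P), so larger
# values mean more methylated template. Wells where the methylation-specific
# target never amplified are assigned a hypothetical maximum cycle number.
from typing import Optional

MAX_CYCLES = 40.0  # hypothetical run length used for non-amplifying wells


def neg_delta_ct(ct_target: Optional[float], ct_rnase_p: float) -> float:
    """Return -(Ct_target - Ct_RNaseP); None means the target did not amplify."""
    ct = MAX_CYCLES if ct_target is None else ct_target
    return -(ct - ct_rnase_p)


samples = {
    "line_A": (24.5, 23.0),  # methylated allele detected almost as early as RNase P
    "line_B": (None, 22.8),  # no methylated allele detected
}
for name, (ct_t, ct_ref) in samples.items():
    print(name, round(neg_delta_ct(ct_t, ct_ref), 2))
```

The large gap between the two hypothetical samples mirrors the essentially binary behavior noted above: a methylated template either amplifies close to the reference or not at all.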
Importantly, several of our most specific markers for epithelial-like versus mesenchymal-like status were heavily methylated in peripheral blood mononuclear cell (PBMC) DNA, precluding their clinical use (Table 1; Supplementary Fig. S5). These experiments illustrate that a major hurdle to using qMSP or other methylation assays for either early detection screening or predictive diagnostics is contamination by PBMCs or other immune cells. Thus, it is critical to test any new set of assays on multiple samples of PBMC DNA if the intent is to develop them for clinical applications.
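One way to operationalize the PBMC screen described above is to discard any marker whose methylated signal is already appreciable in blood DNA. In this sketch, the marker names, the −ΔCt values, and the −10 threshold are all hypothetical; a real cutoff would come from titration curves such as those in Supplementary Fig. S5.

```python
# Sketch (assumption): drop candidate markers whose qMSP signal is already high
# in PBMC DNA, because contaminating immune cells would then produce
# false-positive methylation calls in tumor specimens. Marker names, -ΔCt
# values, and the threshold are hypothetical placeholders.
from statistics import median


def pbmc_safe_markers(pbmc_signal: dict[str, list[float]],
                      max_median_neg_dct: float = -10.0) -> list[str]:
    """Keep markers whose median -ΔCt across PBMC donors stays below the threshold."""
    return sorted(marker for marker, values in pbmc_signal.items()
                  if median(values) < max_median_neg_dct)


pbmc = {
    "marker_1": [-16.0, -15.2, -17.1],  # essentially unmethylated in blood
    "marker_2": [-2.1, -1.8, -2.5],     # heavily methylated in blood: unusable
}
print(pbmc_safe_markers(pbmc))
```

Using the median over several donors, rather than a single PBMC sample, follows the recommendation above to test assays on multiple PBMC DNA samples.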
We first determined whether these assays differentiated epithelial-like from mesenchymal-like cell lines based on our EMT gene expression classification. Thirteen candidate markers of epithelial (E) or mesenchymal (M) status were tested, including RON/MST1R (M), STX2 (M), HOXC5 (M), PEX5L (E), FAM110A (M), ZEB2 (E), ESRP1 (M), BCAR3 (E), CLDN7 (M), PCDH8 (E), NKX6.2 (M), ME3 (E), and GRHL2 (M). Ten of 13 markers were significantly associated with epithelial-like or mesenchymal-like status at a P < 0.05 cutoff (Fig. 4; Table 1). We next examined whether these same markers were predictive of erlotinib sensitivity in vitro. Seven of 13 DMRs were strongly predictive of erlotinib resistance (individual P < 0.005; Supplementary Fig. S7), whereas 3 of 13 DMRs (PEX5L, ME3, and ZEB2) were significantly associated with an epithelial phenotype but were not predictive of erlotinib sensitivity.
qMSP assays differentiate epithelial-like from mesenchymal-like NSCLC cell lines. TaqMan-based methylation detection assays specific for DMRs associated with the genes (A) MST1R/RON, (C) FAM110A, (E) CP2L3/GRHL2, and (G) ESRP1 are presented. qMSP assays were used to determine methylation in epithelial-like (n = 36) and mesenchymal-like (n = 34) NSCLC cell lines. Total input DNA was normalized using a bisulfite-specific RNase P TaqMan probe. Methylation levels are plotted on the y-axis as −ΔCt for each sample, where ΔCt = Ct(indicated target gene) − Ct(RNase P); an increasing −ΔCt value indicates increasing methylation. Cell lines are grouped by epithelial-like/mesenchymal-like status on the x-axis. P values were determined using a 2-tailed, unpaired Student t test. Receiver operating characteristic (ROC) plots for (B) RON, (D) FAM110A, (F) GRHL2, and (H) ESRP1 are presented; additional ROC plots are provided in Supplementary Figs. S6 and S7. P values for the ROC analyses were determined using a Wilcoxon rank-sum test.
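The ROC analyses in Fig. 4 summarize how well each qMSP assay separates the two groups of cell lines. A standard identity, shown in this sketch, is that the ROC area under the curve equals the Mann–Whitney U statistic (an equivalent form of the Wilcoxon rank-sum statistic) divided by the product of the two group sizes. The −ΔCt values below are hypothetical, not study data.

```python
# Sketch: ROC AUC = U / (n1 * n2), i.e. the probability that a randomly chosen
# mesenchymal-like line has a higher -ΔCt than a randomly chosen
# epithelial-like line, counting ties as 1/2. Values are hypothetical.


def mann_whitney_u(group_a: list[float], group_b: list[float]) -> float:
    """U for group_a: number of (a, b) pairs with a > b, plus 0.5 per tie."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def roc_auc(positives: list[float], negatives: list[float]) -> float:
    """Area under the ROC curve when higher scores indicate the positive class."""
    return mann_whitney_u(positives, negatives) / (len(positives) * len(negatives))


mesenchymal = [-2.0, -3.1, -1.5, -12.0]   # marker expected to be methylated (M)
epithelial = [-15.0, -14.2, -13.8, -2.5]
print(roc_auc(mesenchymal, epithelial))
```

The pairwise loop is quadratic but trivial for panels of a few dozen cell lines; a rank-based formula gives the same U for larger datasets.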
Hypomethylation of the ERBB2 DMR correlates with ERBB2 expression and an epithelial-like phenotype in NSCLC cell lines and primary tumors
While many of the DMRs were associated with CpG islands, a substantial minority were located within genes. In some cases, the intragenic DMRs appeared to be hypomethylated relative to normal adjacent tissue (Supplementary Fig. S8). One such DMR, which was part of the methylation classifier, included a CpG site near exon 4 of the ERBB2 proto-oncogene. ERBB2 is a clinically validated drug target whose amplification and overexpression are associated with sensitivity to erlotinib and other inhibitors of HER signaling (34, 35). For these reasons, we further evaluated the relationship between ERBB2 expression and DNA methylation status in this region.
In silico analysis of this region using the UCSC genome browser suggested that the differentially methylated CpG site corresponding to probe cg00459816 overlapped with a potential regulatory element. Because this region was not within a CpG island and was not particularly GC rich, we designed pyrosequencing primers flanking this region to determine its methylation status in a panel of epithelial-like and mesenchymal-like cell lines. We observed a remarkable pattern of hypomethylation (mean methylation of 6 CpG sites ≤20%) in 13 of 16 epithelial-like lines relative to mesenchymal-like lines (mean methylation ≥70% in 20 of 21 mesenchymal-like lines; P < 0.001; Fig. 5A). Only one mesenchymal-like line, H1435, was hypomethylated at this locus. This exception was not surprising given our previous observation that H1435 was identified as a mesenchymal-like line by EMT expression analysis (Fig. 1D). Interestingly, epithelial-like lines exhibited significantly higher levels of ERBB2 expression (P < 0.001) than mesenchymal-like lines (Fig. 5B), although we did not observe induction of ERBB2 in these lines upon 5-aza-dC treatment (data not shown). We also observed that ERBB2 hypomethylation was strongly correlated with erlotinib sensitivity in vitro, suggesting its potential use as a predictive clinical biomarker of erlotinib response (Fig. 5C).
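The methylation ranges reported above (mean of 6 CpG sites ≤20% versus ≥70%) translate into a simple per-sample call. This sketch is illustrative only: the "intermediate" label for values between the two ranges is our addition, and the example percentages are hypothetical.

```python
# Sketch (assumption): summarize pyrosequencing as the mean percentage
# methylation across the CpG sites in the amplicon and label the ERBB2 DMR
# using the ranges reported in the text. "intermediate" is our own label.
from statistics import mean


def classify_erbb2(cpg_percent: list[float]) -> str:
    m = mean(cpg_percent)
    if m <= 20.0:
        return "hypomethylated"
    if m >= 70.0:
        return "methylated"
    return "intermediate"


print(classify_erbb2([8, 12, 10, 15, 9, 11]))    # hypothetical epithelial-like line
print(classify_erbb2([85, 90, 78, 88, 92, 80]))  # hypothetical mesenchymal-like line
```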
Hypomethylation of the ERBB2 DMR correlates with ERBB2 expression and an epithelial-like phenotype in NSCLC cell lines and primary tumors. A, pyrosequencing determines quantitative methylation of the ERBB2 DMR in epithelial-like and mesenchymal-like NSCLC cell lines. Data are represented as mean ± SD percentage of methylation at 6 consecutive CpG sites in the sequenced region. Percentages of methylation at each of the 6 individual CpG sites are also presented (Supplementary Fig. S9). P value was determined using a 2-tailed, unpaired Student t test. B, relative expression of ERBB2 mRNA was determined in NSCLC cell lines using TaqMan-based Fluidigm gene expression analysis (see Materials and Methods for detailed description). Data are represented as $2^{-\Delta C_t}$ values, where ΔCt = Ct(ERBB2) − Ct(reference genes). P value was determined using a 2-tailed, unpaired Student t test. C, ERBB2 pyrosequencing analysis of NSCLC cell lines correlates ERBB2 hypomethylation with erlotinib sensitivity. Data are plotted as the mean ± SD percentage of methylation at 6 CpG sites against erlotinib IC50 values (described in Materials and Methods). D, methylation status and relative expression of ERBB2 mRNA were determined in 42 NSCLC primary tumors derived from FFPE tissue using pyrosequencing and TaqMan-based Fluidigm gene expression analysis (see Materials and Methods for detailed description). Percentage of methylation is represented as the mean of 2 CpG sites. ERBB2 expression data are represented as $2^{-\Delta C_t}$ values, where ΔCt = Ct(ERBB2) − Ct(reference genes). A median cutoff point was used to dichotomize ERBB2-high and ERBB2-low tumors. P value was determined using a one-tailed Mann–Whitney U test.
Fresh-frozen samples are not typically obtained during diagnosis of NSCLC or as part of lung cancer clinical trials. Therefore, to be amenable to clinical applications, a pyrosequencing assay must be able to amplify the limited, degraded DNA recovered from formalin-fixed, paraffin-embedded (FFPE) tissue (commonly fragments <150 bp). Because the methylation states of the 6 adjacent CpG sites within the ERBB2 DMR were highly concordant in the 228-bp pyrosequencing assay (Supplementary Fig. S9), we redesigned the assay to examine just 2 CpG sites, allowing a shorter amplicon. We evaluated the methylation status of ERBB2 in 42 late-stage (stage IIIb/IV) FFPE NSCLC tumors for which gene expression data were also available. Hypomethylation of the putative ERBB2 enhancer correlated strongly with ERBB2 expression in biopsies obtained from patients who later went on to fail front-line chemotherapy (P < 0.011), recapitulating the pattern that we observed in cell lines (Fig. 5D).
Methylation status of ERBB2 and ZEB2 is associated with an epithelial-like phenotype in NSCLC primary tumors
Although loss of CDH1 expression is the best-established marker of EMT, it is a suboptimal classifier because CDH1 is expressed only when cells are epithelial; a negative result could mean that the tissue is mesenchymal, or it could be an artifact of tissue processing or staining. In addition, tumor heterogeneity and the sometimes transient nature of EMT are confounding variables. On the basis of these considerations and our own observations (Fig. 1C and D), we reasoned that a combination of several markers would more accurately represent epithelial-like or mesenchymal-like biologic phenotypes. To determine how robust our expression panel was in tumors, we used CDH1 expression as an EMT anchor and selected genes (13 in total) whose correlation with CDH1 had the same sign in both cell lines and tumor samples (inverse correlation for mesenchymal markers; positive correlation for epithelial markers). The 31 tumors were then assigned an EMT score according to the expression levels of these 13 genes (see Supplementary Table S4 and Methods). The tumors were also scored according to the methylation-based classifier described above. Remarkably, methylation scores correlated strongly with the expression-based EMT score in both cell lines (r = −0.880, P < 0.0001; Fig. 6A) and surgically resected primary tumors (r = −0.668, P < 0.0001; Fig. 6B). The correlations are negative because the methylation score increases with a mesenchymal phenotype, whereas the expression score increases with an epithelial phenotype. These data indicate that, as in cell lines, NSCLC primary tumors harbor DNA methylation patterns representative of their gene expression profiles and suggest that these methylation patterns can be used to classify tumors into epithelial-like and mesenchymal-like biologic subsets.
DNA methylation profiles and DNA methylation of ERBB2 and ZEB2 predict epithelial/mesenchymal phenotypes in late-stage NSCLC tumors. A, scoring systems were established to determine epithelial-like/mesenchymal-like status based on Infinium methylation profiling and Fluidigm gene expression analysis (see Supplementary Methods). Epithelial-like/mesenchymal-like status as determined by methylation score was correlated with epithelial-like/mesenchymal-like status as determined by gene expression score for 34 NSCLC cell lines. A methylation score (y-axis) for each cell line was computed on the basis of the methylation status at 915 CpG sites. A gene expression score (x-axis) was computed as described (Supplementary Methods). An increasing methylation score indicates an increasingly mesenchymal phenotype, whereas an increasing mRNA expression score indicates an increasingly epithelial gene expression phenotype. B, thirty-one NSCLC primary tumors profiled on the Infinium 450K array were clustered using the 915-probe EMT signature, and their methylation scores were correlated with Fluidigm EMT gene expression scores. C, analysis of ERBB2 methylation and epithelial/mesenchymal status in 47 NSCLC primary tumor samples derived from archival FFPE slides. Methylation status of ERBB2 was determined using pyrosequencing analysis (see Materials and Methods for detailed description). Data are represented as the mean of 2 CpG sites. Epithelial-like/mesenchymal-like status was determined using scores derived from TaqMan-based Fluidigm gene expression analysis (see Materials and Methods for detailed description). A median cutoff point was used to dichotomize epithelial-like/mesenchymal-like expression scores. P value was determined using a Student t test. D, analysis of ZEB2 methylation and epithelial-like/mesenchymal-like expression score in 60 samples derived from archival FFPE slides. ZEB2 methylation status was determined using qMSP (see Materials and Methods for detailed description). An increasing −ΔCt value indicates increasing methylation. Data were normalized to the bisulfite-specific RNase P reference assay. Epithelial-like/mesenchymal-like scores were determined using TaqMan-based Fluidigm gene expression analysis (see Materials and Methods for detailed description). A median cutoff point was used to dichotomize epithelial-like/mesenchymal-like expression scores. P value was determined using a 2-tailed, unpaired Student t test.
Because of tissue limitations and the effects of formalin fixation, we were unable to conduct genome-wide analysis in the second cohort of tissue samples (FFPE, chemotherapy failure). For the same reasons, we were unable to amplify sufficient product from the tumor samples used in Fig. 6 to show CLDN7 data comparable to those presented for cell lines in Fig. 3. We therefore investigated whether ERBB2 methylation could serve as a marker of an epithelial-like or mesenchymal-like phenotype in these tumor samples. We used a median cutoff point of the expression-based score to classify tumors as epithelial-like or mesenchymal-like. Tumors classified as epithelial-like were hypomethylated at the putative ERBB2 enhancer relative to tumors classified as mesenchymal-like (P < 0.046), indicating a significant association between ERBB2 methylation status and overall gene expression phenotype (Fig. 6C). Having established that the DMR associated with the known EMT regulator ZEB2 was also a marker of an epithelial phenotype in NSCLC cell lines, we evaluated its association with epithelial-like/mesenchymal-like status in 60 late-stage NSCLCs. Because qMSP requires less material than pyrosequencing, more tumors could be analyzed here than in the ERBB2 analysis. We determined ZEB2 methylation status using a qMSP preamplification method that we developed (described in Materials and Methods). qMSP analysis indicated that epithelial-like tumors were significantly hypermethylated at the ZEB2 locus relative to mesenchymal-like tumors (P < 0.008), again recapitulating the pattern that we observed in NSCLC cell lines (Fig. 6D).
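The median-cutoff classification used for Fig. 6C and D can be sketched as follows. Assigning tumors that fall exactly on the median to the mesenchymal-like group is an arbitrary choice made here, and the scores and methylation percentages are hypothetical.

```python
# Sketch (assumption): dichotomize tumors at the median expression-based EMT
# score and summarize locus methylation per group. Ties at the median go to
# the mesenchymal-like group (arbitrary). Data are hypothetical.
from statistics import median


def median_split(scores: list[float]) -> list[str]:
    cut = median(scores)
    return ["epithelial-like" if s > cut else "mesenchymal-like" for s in scores]


def group_medians(labels: list[str], values: list[float]) -> dict[str, float]:
    groups: dict[str, list[float]] = {}
    for label, value in zip(labels, values):
        groups.setdefault(label, []).append(value)
    return {label: median(vals) for label, vals in groups.items()}


emt_score = [2.1, -0.5, 1.4, -1.8, 0.3, -0.9]      # higher = more epithelial
erbb2_meth = [12.0, 71.0, 18.0, 80.0, 35.0, 66.0]  # mean % methylation, 2 CpG sites
labels = median_split(emt_score)
print(group_medians(labels, erbb2_meth))
```

A median split guarantees balanced groups without a prespecified threshold, at the cost of classifying tumors near the median somewhat arbitrarily.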
These data show that the methylation status of both ERBB2 and ZEB2 is predictive of an epithelial phenotype in NSCLC cell lines, surgically resected tumors, and formalin-fixed biopsies from patients with NSCLC who went on to fail front-line chemotherapy. Furthermore, they establish that pyrosequencing and qMSP assays can quantify methylation differences in patient DNA derived from FFPE tissue, indicating that DNA methylation analysis may be a suitable platform for predictive biomarker development.
Discussion
In this report, we used an integrated genomics approach combining gene expression analysis with whole genome methylation profiling to show that methylation biomarkers are capable of classifying epithelial and mesenchymal phenotypes in NSCLCs. To our knowledge, this is the first demonstration that genome-wide differences in DNA methylation patterns are associated with distinct biologic and clinically relevant subsets of NSCLCs. DNA- and RNA-based microarray analyses, such as those used in our study to classify phenotypic subsets of NSCLC, require adequate tissue quantity and quality, which cannot always be obtained in routine clinical samples. Here, we show that by refining the genome-wide analyses of both gene expression profiles and DNA methylation profiles in cell lines, we capture a representative signature of markers that can be translated into a classification system applicable to clinical samples where tissue is limited.
A major challenge in the development of predictive biomarkers is the need to establish a robust “cut-point” for prospective evaluation. This is particularly problematic for protein-based assays such as immunohistochemistry. Although widely used, immunohistochemistry is subject to a number of technical challenges that limit its use in predictive biomarker development, including antibody specificity and sensitivity, epitope availability and stability, and the inherent subjectivity of data interpretation by different pathologists (24, 25). Molecular assays that leverage the dynamic range and specificity of PCR are much more desirable. However, PCR-based assays also have limitations: RNA is highly unstable, and expression assays require a cutoff point to be defined prospectively. Mutation detection assays, while potentially binary, are limited by the availability of high-prevalence mutation hot spots and target sequences. As we have shown, PCR-based methylation assays potentially address many of these issues because they share many properties of mutation assays, including a broad dynamic range and an essentially binary readout with sensitivity similar to that of mutation assays; yet, because the methylation states of neighboring CpG sites are locally correlated, the target regions for assay design can be quite large. Most importantly, DNA methylation can be used to infer the biologic state of tumors in much the same way as gene expression has been used in the past.
The use of genomic and transcriptional profiling to identify molecular features that predict tumor response to targeted therapeutics is well established (36, 37). A perceived limitation of using DNA methylation as a biomarker is that it is, by definition, indirect: DNA methylation does not necessarily cause gene silencing but rather is likely a marker of the transcriptional state of a gene. However, recent work has shown that epigenetic profiles are prognostic of clinical outcome in patients with glioblastoma and breast cancer (38, 39). Here, we have shown that DNA methylation profiles are at least as informative as gene expression profiles in defining biologically and clinically relevant differences between NSCLC subtypes. Indeed, when viewed on a gene-by-gene basis across many samples using quantitative PCR–based methods, methylation has a larger quantitative range than gene expression.
Previous studies have shown that loss of E-cadherin expression, indicative of EMT, is correlated with poor prognosis in patients with NSCLC (11). With respect to the predictive value of E-cadherin expression in patients receiving erlotinib, further analysis of TRIBUTE patients indicated a significantly longer time to progression for E-cadherin–positive patients receiving erlotinib plus chemotherapy versus chemotherapy alone (11). To show the clinical relevance of our approach, we sought to identify biomarkers of epithelial and mesenchymal phenotypes that might also serve as surrogate markers of erlotinib sensitivity. Interestingly, more DMRs were predictive of a mesenchymal state and of erlotinib resistance than of erlotinib sensitivity. This finding may imply that a mesenchymal-like state is generally indicative of a more advanced tumor stage (19) and may represent tumors that have undergone more hypermethylation events than less advanced tumors. However, some advanced tumors remain well differentiated with an epithelial-like phenotype. Therefore, we favor an alternative interpretation, whereby the increase in overall methylation observed in mesenchymal-like cells that accompanies EMT occurs because the cells are changing from an epigenetic state acquired during normal differentiation into a fundamentally different state. We propose that tumor cells may undergo a malignant form of differentiation during EMT that becomes heritable over time through changes in DNA methylation. This interpretation is consistent with the recent observation by Dumont and colleagues that methylation changes associated with EMT appear to occur in a deterministic rather than a stochastic manner (20). Indeed, we identified consistent hypermethylation events that also correlate with the transcriptional activity of multiple genes previously described as functionally relevant to EMT.
One unexpected finding was a DMR within the ERBB2 proto-oncogene locus. In contrast to many of the loci discussed above, the DMR in the ERBB2 locus was hypomethylated relative to adjacent normal tissue. Tumor-specific differential methylation is ordinarily viewed in the context of tumor-acquired hypermethylation. However, tumor-specific differences in DNA methylation were first described by Feinberg and Vogelstein as tumor-acquired hypomethylation events involving both the HRAS and KRAS promoters (40). In addition, a number of recent studies have observed loss of imprinting, whereby allele-specific expression is lost as a result of changes in methylation of the target gene (39). Thus, the DMR in the ERBB2 locus could undergo some form of demethylation during oncogenesis to support high-level expression of ERBB2. Alternatively, differential methylation of this locus might occur during normal lung development; in this case, epithelial-like tumors harboring a hypomethylated ERBB2 pattern would arise from a specific cell lineage within the lung (6). Additional studies analyzing the methylation status of ERBB2 in epithelial subtypes of the normal lung will be required to address this question. Although the DMR we identified is not located within a CpG island, its association with known regulatory factors suggests a putative enhancer function. Of note, hypomethylation of the ERBB2 locus was highly correlated both with higher expression of HER2 in cell lines and with an epithelial phenotype as defined by our gene expression panel. Our finding that ERBB2 hypomethylation is associated with erlotinib sensitivity raises the possibility that differential methylation of this region could serve as a predictive biomarker for inhibitors of EGFR or HER2 signaling.
Similarly, the clinical significance of ZEB2 methylation in NSCLCs warrants further study, given the known role of ZEB2 expression in mediating EMT and the clinical association of EMT with chemoresistance.
One interesting possibility raised by these findings is that the markers identified here could identify subsets of NSCLC tumors that are amenable to treatment with epigenetic modifying drugs, including histone deacetylase inhibitors and demethylating agents, administered alone or in combination with other targeted therapeutics. The relationship between genetic and epigenetic mechanisms of drug response is not well understood; however, nonmutational mechanisms of drug response are emerging as important (41). Indeed, recent work by Sharma and colleagues identified a transiently acquired, epigenetically mediated drug-tolerant phenotype in NSCLC cells, and ongoing clinical trials are evaluating a chromatin-modifying agent in combination with erlotinib in patients with NSCLC (42). Furthermore, Sequist and colleagues described a histologic transformation consistent with EMT in tumors of 2 patients with NSCLC who developed acquired resistance to EGFR tyrosine kinase inhibitors in the absence of other known resistance mutations in components of the EGFR signaling pathway (13), suggesting a possible epigenetic mechanism of resistance in these tumors. Considered together with our findings, these studies indicate that epigenetic-based molecular assays for identifying potentially responsive or resistant patient populations and for monitoring tumor epigenetic states would be useful.
In summary, we have shown that genome-wide differences in DNA methylation are fundamentally associated with gene expression patterns that distinguish NSCLC biologic subtypes. These methylation profiles may underlie therapeutically relevant tumor subtypes and may serve as a surrogate for gene expression profiles, a finding with broad implications for identifying subsets of patients who may benefit from molecularly targeted therapy. Importantly, DNA-based methylation markers of epithelial and mesenchymal phenotypes and other clinically relevant tumor subtypes will be useful in settings where patient biopsy material is limited or material for expression-based analyses is unavailable. Further studies evaluating the use of DNA methylation-based assays as predictive biomarkers in NSCLC and other tumor types are warranted.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
The authors thank Joe Muller for constructing the tricistronic vector used for immortalization of gBECs and gSACs and David Davis for generating the recombinant virus and carrying out the viral transduction of the gBEC and gSAC cultures. They also thank Adi Gazdar and John Minna for providing NSCLC cell lines used in the study; Ashi Malekafzali for procuring the tissue specimens; Expression Analysis for conducting the Illumina Infinium 450K methylation array profiling; and Jeff Settleman, Chana Davis, David Dornan, and Mark Lackner for critically reviewing the manuscript. Finally, the authors thank Zach Boyd and An Do for their contributions to this article.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.