The echinoderm microtubule-associated protein-like 4–anaplastic lymphoma kinase (EML4-ALK) fusion gene has been identified as an oncogene in a subset of non–small cell lung cancers (NSCLC). We used profiling of cancer genomes on an exon array to develop a novel computational method for the global search of gene rearrangements. This approach led to the detection of EML4-ALK fusion in breast and colorectal carcinomas in addition to NSCLC. Screening of a large collection of patient tumor samples showed the presence of EML4-ALK fusion in 2.4% of breast (5 of 209), 2.4% of colorectal (2 of 83), and in 11.3% of NSCLC (12 of 106). Besides previously known EML4-ALK variants 1 (E13; A20) and 2 (E20; A20), a novel variant E21; A20 was found in colorectal carcinoma. The presence of an EML4-ALK rearrangement was verified by identifying genomic fusion points in tumor samples representative of breast, colon, and NSCLC. EML4-ALK translocation was also confirmed by fluorescence in situ hybridization assay, which revealed its substantial heterogeneity in both primary tumors and tumor-derived cell lines. To elucidate the functional significance of EML4-ALK, we examined the growth of cell lines harboring the fusion following EML4 and ALK silencing by small interfering RNA. Significant growth inhibition was observed in some but not all cell lines, suggesting their variable dependence on ALK-mediated cell survival signaling. Collectively, these findings show the recurrence of EML4-ALK fusion in multiple solid tumors and further substantiate its role in tumorigenesis. (Mol Cancer Res 2009;7(9):1466–76)
Chromosomal translocations and their corresponding gene fusions play an important role in the initiation of tumorigenesis and have been strongly associated with distinct tumor subtypes (1). The first cancer-causing translocation, t(9;22)(q34;q11), was identified in chronic myeloid leukemia, resulting in the discovery of the Philadelphia chromosome (2). This translocation juxtaposes the 5′ portion of the BCR gene with the 3′ portion of the tyrosine kinase ABL1, generating the BCR-ABL1 fusion gene with constitutive kinase activity. Similar to ABL1, a substantial number of genes involved in translocations have been causally implicated in carcinogenesis (3). However, the biological and clinical significance of translocations in solid tumors has been less appreciated than in hematologic malignancies, mainly because of the karyotypic complexity of solid tumors (4, 5).
An increasing number of translocations have been identified in the past two decades using technologies based on fluorescence in situ hybridization (FISH; ref. 6). The latest advances in microarray and sequencing technologies have also provided new tools for interrogating recurrent genomic aberrations at the whole genome level. For example, gene expression profiling of tumor samples coupled with a novel bioinformatics approach resulted in identification of a gene fusion between TMPRSS2 and ERG or ETV1 in the majority of prostate cancer samples (7). The presence of TMPRSS2-ERG fusion in prostate was further confirmed by profiling patient tumor samples on exon arrays (8). Although the feasibility of using exon array data for the detection of gene fusions has been shown, a more systematic approach is required when performing a global search for novel gene rearrangements across multiple cancer types.
Anaplastic lymphoma kinase (ALK) was first identified as a fusion partner of nucleophosmin in anaplastic large-cell lymphoma (ALCL) with the t(2;5)(p23;q35) chromosomal translocation (9, 10). Translocations linking ALK to multiple fusion partners were subsequently identified in ALCL as well as in other malignancies, including neuroblastomas and myofibroblastic tumors (11). A novel gene fusion of ALK and the echinoderm microtubule-associated protein-like 4 (EML4) was recently identified in non–small cell lung cancer (NSCLC; ref. 12, 13). The EML4-ALK fusion was shown to result from a small inversion within chromosome 2p and was detected in 7% of the NSCLC patient population. The presence of EML4-ALK fusion in NSCLC was confirmed in a number of subsequent studies (14-18); however, it has not yet been detected in other carcinomas, including breast and colorectal (19, 20). A number of EML4-ALK fusion variants have been identified to date (12, 18, 20-23); all of them involve an almost identical portion of ALK (exons 20-29) fused to EML4 exons 1 to 13 (E13; A20 or variant 1; ref. 12), 1 to 20 (E20; A20 or variant 2; ref. 12), 1 to 6 (E6a/b; A20 or variant 3; refs. 21, 22), 1 to 14 (E14; A20 or variant 4; ref. 20), 1 and 2 (E2; A20 or variant 5; ref. 20), and 1 to 18 (E18; A20 variant; ref. 18). Similar to other ALK gene fusions, all of the EML4-ALK transcript variants contain the cytoplasmic portion of ALK with the entire kinase domain.
The constitutive kinase activity of ALK is essential for proliferation of ALCL cells and its inactivation represents a feasible therapeutic approach for the treatment of ALCL (24). The intact ALK kinase domain within EML4-ALK possesses marked transforming as well as oncogenic activity in vitro and in vivo, respectively (12, 21, 25). Therefore, tumors harboring EML4-ALK fusion seem to represent a distinct subtype that might be responsive to ALK kinase inhibitors. Several studies showed that ALCL, NSCLC, and neuroblastoma cell lines harboring ALK rearrangements underwent cell cycle arrest and apoptosis when treated with ALK-selective inhibitors (26-29). Administration of two ALK inhibitors, TAE684 and PF-2341066, in ALK-positive tumor xenograft models caused significant tumor growth regression (22, 27, 28). Moreover, treatment with another small-molecule ALK inhibitor resulted in a disappearance of adenocarcinoma nodules in lungs of EML4-ALK transgenic mice (25). Thus, ALK kinase inhibitors alone or in conjunction with other chemotherapeutic agents may represent an effective treatment for patients whose tumors contain the EML4-ALK fusion.
Here, we used exon array profiling to develop a novel computational method for the identification of gene rearrangements in solid tumors. This approach revealed the presence of EML4-ALK fusion in breast, colorectal, and NSCLC samples. We used reverse transcription PCR (RT-PCR) to screen a large collection of breast, colorectal, and NSCLC patient samples for EML4-ALK fusion. The EML4-ALK rearrangement was found in multiple tumors and its underlying structure was deciphered by genomic PCR and FISH. In addition, we identified some tumor-derived cell lines positive for EML4-ALK and used them to investigate the functional significance of the fusion in cell growth and proliferation. Together, these data show the recurrence of EML4-ALK translocation in solid tumors and further substantiate its role in tumorigenesis.
Computational Identification of ALK Rearrangement
One hundred fifty-three samples, including 84 breast (34 HER2 positive, 26 luminal, and 24 basal), 26 colorectal adenocarcinoma, and 43 NSCLC (21 adenocarcinoma and 22 squamous), were profiled on Affymetrix human Exon 1.0 arrays. As chromosomal translocations often lead to up-regulation of the fusion gene at the 3′ end, we developed a whole genome search algorithm to recognize potential gene fusion candidates with a discordant expression between 5′ and 3′ exons in one or more samples (Fig. 1). Normalized expression values of all exon probe sets for a given gene were extracted into a matrix, with rows representing samples and columns representing exon probe sets. Each site between exon probe sets was examined and a putative breakpoint was predicted as a position that shows the maximum difference in expression between 5′ and 3′ exons in a subset of samples using the t statistic. If multiple breakpoints met the threshold in different samples, the best candidate breakpoint was selected for each gene by weighted majority voting, which took into consideration the magnitude of expression difference between 5′ and 3′ exons as well as the number of occurrences among all samples.
Following the whole genome search, ranking of genes with putative breakpoints was done using a summary score (M score), which was the median t statistic of samples containing the predicted breakpoint. The probability density distribution of the M scores for all examined genes was similar across colorectal, breast, and NSCLC samples showing the consistency of the scoring metric for different datasets (Supplementary Fig. S1A). The majority of genes had an M score of 0 as they did not meet the threshold for having a putative breakpoint. The M scores of all other genes had median and interquartile range of 2.2 and 1.15 for colorectal, 2 and 0.8 for breast, and 2.1 and 0.9 for NSCLC, respectively. A small fraction of genes were located at the tail of the density distribution, suggesting that they have remarkably larger M scores than the rest of the genome.
Because observed breakpoints may be present due to reasons other than gene fusions, we applied additional criteria to select biologically relevant genes. The predicted breakpoints between exons were compared with protein domain boundaries, and the exons to the 3′ of the predicted breakpoint were required to contain intact functional domains. Comparison to the Cancer Gene Census database was further used to prioritize genes previously known to be implicated in cancer (3). As a result, we identified ALK as a top candidate having the same breakpoint located between probe sets 2546434 (exon 19) and 2546433 (exon 20) in three types of tumor samples. Based on the Protein Family Database, we determined that the portion of ALK protein located downstream from the breakpoint contains an intact tyrosine kinase domain (30). A heatmap of ALK gene expression in colorectal, breast, and NSCLC samples is shown in Fig. 2. A significant difference between 5′ and 3′ expression was observed for one colorectal (HF-18092), two breast (HF-20579 and HF-21749), and three NSCLC samples (HF-5158, HF-11756, HF-15224). The M scores of ALK in all three datasets were located at the tail of the density distribution, with values of 12.3 (ranked 6th) in colorectal, 11.2 (ranked 9th) in breast, and 9.9 (ranked 26th) in NSCLC. The M scores of ALK were highly significant with estimated q values of 0 (Supplementary Fig. S1B).
RT-PCR Detection of EML4-ALK Fusion
The sequence located 5′ of the computationally predicted breakpoint in the ALK transcript was determined in breast tumor sample HF-21749 using RNA ligase–mediated rapid amplification of cDNA ends. Sequence analysis of the amplification products revealed that the first 13 exons of EML4 were located upstream of ALK exon 20, generating the EML4-ALK fusion variant E13; A20 or variant 1 (12). The presence of EML4-ALK fusion in the remaining five tumor samples predicted to have a breakpoint in ALK was subsequently detected by RT-PCR.
RT-PCR assay was used to screen a collection of tumor samples, including 209 breast, 83 colorectal, and 106 NSCLC, for the presence of the EML4-ALK fusion transcript. PCR products were run on an agarose gel and fragments representative of variants E13; A20 (247 bp) and E20; A20 (1,000 bp) were cloned and sequenced (see Materials and Methods). The EML4-ALK fusion was detected in five breast tumor samples (5 of 209, 2.4%); both transcript variants E13; A20 and E20; A20 were present (Fig. 3; Table 1). The tumors positive for EML4-ALK included HER2+ (HF-21744), luminal (HF-20260), and basal (HF-20579, HF-21749, HF-21788), suggesting that the presence of the fusion is not specific to a particular breast cancer subtype. Similar to breast cancer, a low incidence of EML4-ALK fusion was found in colorectal carcinoma, where only two adenocarcinoma samples were positive (2 of 83, 2.4%; Fig. 3; Table 1). Sequencing analysis of PCR products showed that the HF-18138 sample contained the previously known E20; A20 variant, whereas HF-18092 harbored a novel E21; A20 variant in which EML4 exons 1 to 21 were fused to ALK exon 20. Although the E21; A20 variant was predicted as a possible in-frame fusion of EML4 and ALK (20), it has not yet been found in NSCLC (16, 18). Compared with the breast and colorectal carcinomas, a higher frequency of EML4-ALK fusion transcript was found in NSCLC (12 of 106, 11.3%; Fig. 3; Table 1). The EML4-ALK fusion was detected only in adenocarcinomas, and a majority of the samples contained variant E13; A20. The HF-15224 sample harbored yet another EML4-ALK variant in which EML4 exon 17 was fused with a 75-bp sequence from EML4 intron 17 followed by a truncated ALK exon 20 with a 62-bp deletion at the 5′ end (E17; ΔA20); it has not been determined whether this transcript gives rise to a functional protein. 
Overall, RT-PCR experiments showed some inconsistencies; in half of NSCLC samples, only two of three independent PCR reactions verified the presence of EML4-ALK fusion. A similar inconsistency of EML4-ALK transcript detection was reported recently (17), suggesting low levels of the fusion transcript expression in NSCLC.
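The reported frequencies follow directly from the positive and total sample counts given above. The short Python sketch below recomputes them and, purely as an illustration, attaches a 95% Wilson score interval to each proportion. The interval is an assumption added here to convey the uncertainty of small counts; it is not an analysis reported in the study, and the helper name `wilson_interval` is hypothetical.

```python
from math import sqrt

def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k positives out of n (illustrative, not from the study)."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Positive / total counts as reported for the RT-PCR screen.
counts = {"breast": (5, 209), "colorectal": (2, 83), "NSCLC": (12, 106)}

for tissue, (k, n) in counts.items():
    lo, hi = wilson_interval(k, n)
    print(f"{tissue}: {k}/{n} = {100 * k / n:.1f}% "
          f"(95% Wilson interval {100 * lo:.1f}-{100 * hi:.1f}%)")
```

With only 2 to 12 positive samples per tissue, such intervals are wide, which is worth keeping in mind when comparing frequencies across studies.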
| No. | Sample Name | Tissue | Subtype | EML4-ALK Variant |
Abbreviation: ND, not detected (i.e., EML4-ALK fusion was not found).
RT-PCR assay was subsequently used to examine a panel of 46 breast, 28 colorectal, and 50 NSCLC tumor-derived cell lines for the presence of EML4-ALK. Based on the sequencing of PCR products, two breast (HCC1500 and ZR75-1), one colorectal (SW1417), and five NSCLC cell lines (H460, H1975, HOP18, RERF-LC-KJ, VWRC-LCD) were identified to harbor the EML4-ALK fusion variant E13; A20 (Supplementary Fig. S2). NSCLC cell line H2228 harboring EML4-ALK variant E6a/b; A20 was used as a positive control (22). H2228 cells were shown to express a much higher level of the fusion transcript than most NSCLC tumors (17). Compared with H2228, all of the identified cell lines showed lower levels of EML4-ALK transcript expression. Similar to NSCLC tumors, inconsistent RT-PCR detection of EML4-ALK fusion was observed in cell lines. Two identified EML4-ALK–positive NSCLC cell lines, H460 and H1975, were previously shown as negative for the expression of EML4-ALK protein (29). Similarly, our effort to detect the fusion protein in EML4-ALK–positive cell lines was unsuccessful, except in H2228 cells (data not shown). The discrepancy in detection of EML4-ALK transcripts and the fusion protein was also observed in NSCLC tumor samples (17).
Genomic Confirmation of EML4-ALK Translocation
The genomic structure of EML4-ALK rearrangement was identified in tumor samples representative of NSCLC and breast and colorectal carcinomas. For long-range genomic PCR, we used PCR primers residing in EML4 introns 13 and 20 and in ALK intron 19 (see Materials and Methods). Genomic EML4-ALK fusion points were identified in two NSCLC samples harboring variant E13; A20 (Fig. 4A). The breakpoints in HF-15512 were located 4,726 bp downstream of EML4 exon 13 and 1,021 bp upstream of ALK exon 20, whereas the ones in HF-15560 were located 3,116 bp downstream of EML4 exon 13 and 523 bp upstream of ALK exon 20. These fusion points were distinct and different from those reported previously (12, 16, 18, 19). The genomic structure of the EML4-ALK fusion was characterized in breast tumor samples harboring variant E13; A20 and E20; A20 (Fig. 4A). The breakpoints in HF-21749 were located 385 bp downstream of EML4 exon 13 and 1,066 bp upstream of ALK exon 20; these breakpoints were different from the ones identified in NSCLC samples. In HF-20579 sample, the breakpoints were located 370 bp downstream of EML4 exon 20 and 949 bp upstream of ALK exon 20 (Fig. 4B). Genomic EML4-ALK fusion points were also identified in colorectal tumor sample HF-18092 harboring the E21; A20 variant. Sequence analysis revealed that EML4 was disrupted at 1,831 bp downstream of EML4 exon 21 and was ligated to a position 726 bp upstream of ALK exon 20 (Fig. 4C).
FISH assay was used to confirm the EML4-ALK translocation in tumor samples and tumor-derived cell lines positive for the fusion transcript. Instead of using a separate break-apart probe for each EML4 and ALK gene, we devised a three-color assay that detects the fusion and associated copy number changes within the same cell in a single experiment. Based on our assay design (Fig. 5A), we expected to see an array of red-blue-green fluorescence in cells deficient for EML4-ALK (Fig. 5B) and a red-green fusion signal accompanied by a separate blue signal in cells harboring the fusion (Fig. 5C). The presence of EML4-ALK translocation was confirmed by FISH in all of the examined tumor samples and cell lines. FISH analysis revealed substantial tumor heterogeneity where only a subset of cells ranging from 41.5% to 52.5% harbored the EML4-ALK translocation. Compared with the tumors, cell lines showed even more heterogeneity where EML4-ALK fusion signal was detected in less than 33% of examined cells (Supplementary Table S1). Such vast heterogeneity of EML4-ALK rearrangement in tumors and cell lines was unexpected and has not been reported previously. In addition, a low-level copy number gain of the whole 2p21-23 region was frequently seen in cell lines as well as in tumors (Fig. 5D). Two or more EML4-ALK fusion signals per cell were also observed in several cell lines. There was no deletion of the sequences centromeric to the ALK and telomeric to the EML4 locus, suggesting that a small inversion is the underlying mechanism of the EML4-ALK translocation.
Small Interfering RNA Silencing of EML4-ALK Fusion
To elucidate the functional significance of EML4-ALK in cell growth and proliferation, we examined the number of viable cells following small interfering RNA (siRNA)–mediated silencing of the EML4 and ALK genes. Survival of cells treated with siRNAs targeting EML4-ALK fusion (5′ EML4 and 3′ ALK siRNA pools) was compared with that of cells treated with siRNAs targeting endogenous EML4 and ALK (3′ EML4 and 5′ ALK siRNA pools, respectively). Upon silencing of 5′ EML4 and 3′ ALK, more than 50% growth inhibition was seen in three cell lines: H2228, H460, and HCC1500 (Fig. 6A). The growth inhibition was first observed at 48 hours posttransfection and remained significant at the 72- and 96-hour time points (P ≤ 0.01 from one-tailed t test). Breast cell line ZR75-1 showed growth inhibition of less than 20% only at the 96-hour time point. A similar response was observed in two other cell lines, VWRC-LCD and RERF-LC-KJ (data not shown). The remaining three EML4-ALK cell lines, H1975, HOP18, and SW1417, did not show a change in cell viability under any condition (Fig. 6A). The different growth responses observed among the cell lines suggested a variable level of dependence on ALK-mediated cell survival signaling. To verify that the observed growth inhibition was specific to cells harboring the EML4-ALK fusion, we also measured cell viability in several cell lines lacking the fusion (A549, MDA-MB-231, and H838). All of them consistently showed no change in cell viability upon silencing of EML4 and ALK (Fig. 6B), suggesting that the EML4-ALK fusion is important for the growth of cells expressing this oncokinase.
To verify sufficient siRNA-mediated silencing of EML4-ALK, we measured the level of EML4 and ALK transcripts in several cell lines before and after transfection. Similar to the siRNA pools, pairs of gene-specific primers and probes located within 5′ and 3′ of EML4 and 5′ and 3′ of ALK were used (see Materials and Methods). A significant decrease of EML4 transcript ranging from 45% to 80% was observed upon silencing of 5′ EML4 in all cell lines, including the ones with (H2228, H460, HCC1500) and without the fusion (A549; Fig. 6C). Silencing of 3′ EML4 was equally efficient, resulting in a similar decrease of endogenous EML4 transcript (data not shown). A substantial decrease of ALK transcript ranging from 30% to 55% was detected upon silencing of 3′ ALK in all of the cell lines (Fig. 6C). However, the efficiency of silencing 5′ ALK could not be determined because the level of the endogenous ALK transcript was below the detectable range (high Ct values). Collectively, these data showed that silencing of EML4-ALK fusion was sufficient and, moreover, responsible for the observed cell growth inhibition in H2228, H460, and HCC1500 cells.
The discovery of translocations and their corresponding gene fusion products in solid tumors could potentially increase with the use of innovative approaches that enable their detection. Although one study showed the feasibility of using exon array data for the detection of TMPRSS2-ERG fusion (8), a systematic approach for the identification of gene rearrangements in multiple carcinomas has not yet been reported. Here, we describe for the first time one such approach, in which exon array profiling of tumor samples in combination with a novel computational method has resulted in the detection of ALK gene fusion in three widespread carcinomas: breast, colorectal, and, as previously known, NSCLC. The algorithm we developed searches for abrupt changes in the level of expression between two stretches of consecutive exons across multiple samples. From a global search of breast, colorectal, and NSCLC datasets, a putative breakpoint between exons 19 and 20 of ALK with an extremely high M score suggested the potential presence of the same underlying rearrangement in all three tumor types. As our approach infers genomic aberrations from expression change at the transcript level, gene rearrangements without significant change in expression could not be detected. In addition, many other events, including alternative splicing, may contribute to expression differences between exon probe sets. Therefore, it is important to incorporate other information such as protein domain composition when prioritizing novel, biologically relevant genomic aberrations. One obstacle to using exon array data was the poor performance of numerous probe sets. We sought to mitigate the effect of poor probes by excluding those that cross-hybridize to different genomic locations. 
Another limitation of identifying putative breakpoints from exon array data is that the predicted location is dependent on the genomic position of the probe sets and thus breakpoints located within intergenic regions could not be detected.
Although the presence of EML4-ALK fusion in NSCLC has been well documented (12, 14-18), our study is the first to report its occurrence in breast and colorectal carcinomas. Based on RT-PCR screening of patient samples, we detected the presence of EML4-ALK fusion in 2.4% of breast, 2.4% of colorectal, and 11.3% of NSCLC samples. Although others have searched for EML4-ALK fusion in breast and colorectal tumor samples (19, 20), it was not found, likely because of its low frequency. Despite the low frequency, the recurrence of EML4-ALK fusion in breast and colorectal carcinoma represents a significant increase in the number of patients (∼5,000 per year in the United States) and exceeds that of ALCL patients, in whom ALK translocation is present at a much higher frequency. The frequency of EML4-ALK fusion in NSCLC in our study is slightly higher than those reported previously (3-7%; refs. 12, 14-18). Patient samples from Asian and Caucasian populations were part of our collection; however, no other information on patient history was available. Factors such as ethnicity, age, gender, tumor histology, mutations in epidermal growth factor receptor (EGFR), KRAS and TP53, tobacco exposure, etc., may have contributed to the observed difference in EML4-ALK frequency in NSCLC. For example, a recent study reported that 4.9% of Chinese NSCLC patients contained EML4-ALK fusion; however, its frequency was much higher in lung adenocarcinomas from nonsmoking women that were wild-type for EGFR and KRAS (29%; ref. 18). Similar to other studies (14, 16, 18, 31), the presence of EML4-ALK fusion was detected only in adenocarcinomas. All of our EML4-ALK–positive NSCLC samples were wild-type for EGFR and KRAS (data not shown), confirming that the presence of fusion and EGFR/KRAS mutations are mutually exclusive (12, 14, 16, 18). 
In summary, the presence of EML4-ALK fusion in multiple carcinomas, including breast, colorectal, and NSCLC, shows that its occurrence is broader than previously thought and, furthermore, not specific to NSCLC. Similarly, a recent study reported the presence of EML4-ALK in nonneoplastic lung tissue, further questioning its specificity to NSCLC (17).
A number of different EML4-ALK transcript variants have been reported to date (12, 18, 20-23). The most frequent EML4-ALK variants in NSCLC are E13; A20 (variant 1; ref. 12), E20; A20 (variant 2; ref. 12), and E6a/b; A20 (variant 3; refs. 21, 22). A multiplex RT-PCR assay was recently developed for screening all possible in-frame EML4-ALK variants, including the one where EML4 exon 21 is fused to ALK exon 20 (20). However, variant E21; A20 has not yet been found, suggesting that it is rare or absent in NSCLC (16, 18). Here, we have shown that the E21; A20 variant occurs not in NSCLC but in colorectal cancer. We have also characterized the genomic structure underlying the EML4-ALK variant E21; A20 by identifying precise fusion points in the colorectal tumor sample HF-18092. Similarly, the genomic structure of EML4-ALK fusion was identified in breast and NSCLC samples harboring the E13; A20 and E20; A20 variants. All of the genomic breakpoints were diverse and different from the ones reported previously (12, 16, 18, 19), confirming that multiple genomic EML4-ALK rearrangements result in the production of the same transcript variant.
The FISH assay was used to confirm the presence of EML4-ALK translocation in tumor samples and cell lines harboring EML4-ALK fusion. FISH data verified that a simple inversion, rather than deletion, is the underlying mechanism of EML4-ALK translocation. Similar to a recent study (16), genomic rearrangements supporting the possibility of other fusion partners for ALK or EML4 were not observed. FISH analysis revealed substantial heterogeneity of the EML4-ALK translocation in both primary tumors and cell lines. For example, only up to 53% of the tumor cells were found to harbor EML4-ALK translocation. Moderate heterogeneity of EML4-ALK in NSCLC tumors (50-100% cells) was previously reported in a study that used tissue microarrays (15). In contrast to tissue microarrays, which provide information only for a small portion of the tumor, whole tissue sections used here offered an opportunity to examine a large number of cells throughout the tumor, resulting in a better assessment of sample heterogeneity. Interestingly, a recent study showed an even lower percentage of EML4-ALK–positive cells (∼2%) in nine NSCLC tumor samples harboring the fusion transcripts (17). Such vast heterogeneity should be carefully examined because it has significant consequences for future diagnostic and therapeutic approaches designed for patient populations harboring the EML4-ALK rearrangement.
The functional role of EML4-ALK fusion in cell growth and proliferation was assessed by measuring cell viability following siRNA-mediated silencing of EML4 and ALK. Cell growth inhibition was never observed in cell lines lacking the EML4-ALK fusion. On the contrary, cell lines harboring the EML4-ALK fusion showed variable growth response following siRNA silencing. Consistent and significant growth inhibition was observed in three cell lines, including lung H2228 and H460 and breast HCC1500. In contrast to our data, decreased cell growth of H2228 cells was not previously observed upon siRNA silencing of ALK (32). Similarly, H2228 cells exhibited drug sensitivity as well as resistance when treated with the ALK-selective inhibitor TAE684 (22, 29). In our study, the magnitude of growth inhibition did not seem to correlate with the number of cells harboring EML4-ALK translocation. For example, all three cell lines showed a similar growth inhibition upon silencing of EML4-ALK, whereas the number of cells harboring the translocation was 2-fold higher in H2228 (25%) and H460 (29%) than in HCC1500 (11.2%). Although how growth inhibition occurred is not understood, this finding, together with the absence of growth response in some EML4-ALK–positive cell lines (H1975, HOP18, SW1417), suggests that other signaling mechanisms independent of ALK may regulate cell growth and proliferation. One such mechanism involves coactivation of other receptor tyrosine kinases and was reported for the EML4-ALK–positive cell line DFCI032 (22). DFCI032 cells were found resistant to ALK inhibitor TAE684 due to coactivation of EGFR and ERBB2 and only a combination of TAE684 and EGFR/ERBB2 inhibitor was effective in inhibiting their growth. Thus, it appears that the functional role of EML4-ALK in tumorigenesis could vary based on the level of tumor dependence on ALK signaling as well as on the presence of coexisting oncogenic events. 
A better understanding of both phenomena will enable further progress in designing effective treatments for the patient population harboring the EML4-ALK gene fusion.
Materials and Methods
Patient tumor samples representative of breast, colorectal, and NSCLC were acquired from commercial sources and managed by Genentech's Human Tissue Bank. All tumor samples were classified by the Human Tissue Bank into the following subtypes: colorectal adenocarcinoma (83); breast HER2 positive (72), luminal (73) and basal (64); NSCLC adenocarcinoma (57), squamous (46) and small cell (3). Tumor-derived cell lines were obtained from the American Type Culture Collection.
RNA from tumor samples was extracted using AllPrep (Qiagen) following the manufacturer's instructions. The RNA quantity was measured using Nanodrop ND-1000 UV-spectrophotometer (NanoDrop Technologies) and RNA quality was assessed using Agilent 2100 Bioanalyzer (Agilent Technologies). rRNA was first removed with RiboMinus Human Transcriptome Isolation (Invitrogen) and cDNA synthesis was done with the Whole Transcript Sense Target Labeling (Affymetrix). The cDNA was fragmented and biotin-labeled using Whole Transcript Terminal Labeling (Affymetrix). Biotinylated targets were hybridized onto Affymetrix human Exon 1.0 ST arrays following the manufacturer's protocol. The arrays were washed in the Fluidics Station 450 and scanned on the GeneChip scanner 3000 7G. Microarray data generated by profiling samples on exon array are deposited in the National Center for Biotechnology Information GEO database under accession number GSE16534.
Expression intensities for the “core” probe sets were calculated using quantile normalization and the RMA-Sketch method from Affymetrix's Power Tools package. Probe sets having inconsistent cross-hybridization properties were excluded from the analysis. Expression intensities for each exon probe set were normalized across all samples by subtracting the mean and dividing by the SD. Normalized expression values of all probe sets for a given gene were extracted into a matrix with rows representing samples and columns representing exon probe sets. To identify putative breakpoints in a given sample, each position between exon probe sets was examined for its ability to differentiate the probe sets into two distinct groups. Student's t statistic was calculated, comparing the expression values between the two groups of probe sets for each possible position. A T score was calculated as the maximum t statistic among all possible positions for the sample. If the T score was above a given threshold t0 (we used t0 > 1), the position that gave the T score was identified as a putative breakpoint for the given sample. If the maximum t statistic from all possible breakpoints did not pass the threshold, no breakpoint was predicted and the T score was set to 0 for the given sample. When multiple breakpoints for a given gene were predicted in different samples, the best breakpoint was determined by weighted majority voting. For each putative breakpoint, every sample was labeled as positive or negative according to whether the sample was predicted to have the respective breakpoint. The breakpoint with the highest sum of T scores from positive samples was identified as the best candidate for the given gene in the whole sample set. To prioritize gene fusion candidates in the whole genome, we sought to design a ranking metric without bias toward gene length or breakpoint frequency. An M score was calculated as the median of T scores among the positive samples for the candidate breakpoint for each gene. 
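The per-gene procedure described above (per-probe-set normalization, a per-sample scan for the split with the maximum t statistic, weighted majority voting, and the median-based M score) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the pooled-variance form of Student's t, the sign convention (3′ minus 5′, so that 3′ up-regulation scores positive), the requirement of at least two probe sets on each side of a split, and the default threshold value `t0=1.0` are all choices made here for the example.

```python
import numpy as np

def zscore_columns(expr):
    """Normalize each probe set (column) across samples: subtract mean, divide by SD."""
    expr = np.asarray(expr, dtype=float)
    sd = expr.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant probe sets stay at 0 instead of dividing by zero
    return (expr - expr.mean(axis=0)) / sd

def t_statistic(left, right):
    """Pooled-variance Student's t; positive when the 3' group (right) is higher."""
    n1, n2 = len(left), len(right)
    pooled = ((n1 - 1) * left.var(ddof=1) + (n2 - 1) * right.var(ddof=1)) / (n1 + n2 - 2)
    if pooled == 0:
        return 0.0
    return (right.mean() - left.mean()) / np.sqrt(pooled * (1 / n1 + 1 / n2))

def sample_breakpoint(row, t0=1.0, min_side=2):
    """Return (position, T score) for one sample, or (None, 0.0) below threshold."""
    row = np.asarray(row, dtype=float)
    best_pos, best_t = None, 0.0
    for pos in range(min_side, len(row) - min_side + 1):
        t = t_statistic(row[:pos], row[pos:])
        if t > best_t:
            best_pos, best_t = pos, t
    return (best_pos, best_t) if best_t > t0 else (None, 0.0)

def gene_m_score(matrix, t0=1.0):
    """Pick the breakpoint with the largest summed T score, then take the
    median T score of the samples that support it (the M score)."""
    calls = [sample_breakpoint(row, t0) for row in np.asarray(matrix, dtype=float)]
    votes = {}
    for pos, t in calls:
        if pos is not None:
            votes[pos] = votes.get(pos, 0.0) + t
    if not votes:
        return None, 0.0, []
    best = max(votes, key=votes.get)
    positives = [i for i, (pos, _) in enumerate(calls) if pos == best]
    m_score = float(np.median([calls[i][1] for i in positives]))
    return best, m_score, positives
```

In this sketch a position `pos` means the split falls between probe set `pos - 1` and probe set `pos` (0-based, ordered 5′ to 3′). A gene with no sample above threshold receives an M score of 0, matching the behavior described in the text.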
The statistical significance of the M score was evaluated using a null distribution derived empirically by permuting the positions of exon probe sets in each sample 1,000 times. To correct for multiple hypothesis testing, we applied the Benjamini-Hochberg FDR procedure to obtain q values (33).
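A minimal sketch of the permutation and multiple-testing steps follows. It is illustrative only: the scoring function passed as score_fn is a hypothetical placeholder for the M score computation, the +1 pseudocount in the empirical P value is an assumption not stated in the text, and the q values are computed with the standard Benjamini-Hochberg step-up adjustment.

```python
import random


def permute_probe_sets(matrix, rng):
    """Shuffle probe-set positions independently within each sample (row)."""
    return [rng.sample(row, len(row)) for row in matrix]


def null_scores(matrix, score_fn, n_perm=1000, seed=0):
    """Scores of n_perm permuted matrices; score_fn is a hypothetical M score function."""
    rng = random.Random(seed)
    return [score_fn(permute_probe_sets(matrix, rng)) for _ in range(n_perm)]


def empirical_p(observed, null):
    """Fraction of null scores >= observed, with a +1 pseudocount (assumption)."""
    hits = sum(1 for s in null if s >= observed)
    return (hits + 1) / (len(null) + 1)


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        q[i] = running_min
    return q


q_values = bh_qvalues([0.01, 0.04, 0.03, 0.5])
```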
Detection of EML4-ALK Fusion
A fusion partner of ALK was determined by performing RNA ligase–mediated rapid amplification of 5′ and 3′ cDNA ends with the GeneRacer kit (Invitrogen). First-strand cDNA was amplified with Advantage HD DNA polymerase mix (Clontech) using the GeneRacer 5′ primer and the ALK-6R primer (CATGAGGAAATCCAGTTCGTCCTG). Subsequent nested PCR was done using the GeneRacer 5′ nested primer and the ALK-2R primer (GAGGTCTTGCCAGCAAAGCAGTAG). Amplification products were gel purified with QIAquick gel extraction (Qiagen) and cloned using pCR4-TOPO TA Cloning (Invitrogen). Sequencing was done at the Genentech Sequencing laboratory using a 3730xl DNA Analyzer (Applied Biosystems). Sequencing products were analyzed with Sequencher software (Gene Codes). Basic Local Alignment and Search Tool against the BLAT database4
RT-PCR Screening for EML4-ALK Transcripts
RT-PCR was carried out using One Step RT-PCR (Qiagen) and primers previously described for the detection of EML4-ALK variants 1, 2 (12), and 3 (21). PCR conditions for the detection of the EML4-ALK fusion transcript included cDNA synthesis at 50°C for 30 min; denaturation at 95°C for 15 min; 40 cycles consisting of denaturation at 95°C for 30 s, annealing at 60°C for 30 s, and strand elongation at 72°C for 1 min; and a final elongation step at 72°C for 10 min. As an internal control, we used primers for glyceraldehyde-3-phosphate dehydrogenase (GAPDH; CAACGACCACTTTGTCAAGCTC and CTCTCTTCCTCTTGTGCTCTTGC) and performed 20 cycles of amplification. PCR products were resolved on an agarose gel and their sizes were determined using the Trackit 1 kb Plus DNA ladder (Invitrogen). Fragments representing the EML4-ALK fusion product were excised, gel purified, cloned, and sequenced as described above.
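For planning purposes, the programmed hold time of the cycling conditions above can be worked out as in the following sketch. The step durations are those stated in the text; the function name programmed_seconds is hypothetical, instrument ramp times are ignored, and applying the same non-cycling steps to the 20-cycle GAPDH control is an assumption.

```python
# Step durations in seconds, taken from the RT-PCR conditions stated above.
CDNA_SYNTHESIS = 30 * 60        # 50 C for 30 min
INITIAL_DENATURATION = 15 * 60  # 95 C for 15 min
CYCLE = 30 + 30 + 60            # 95 C 30 s, 60 C 30 s, 72 C 1 min
FINAL_ELONGATION = 10 * 60      # 72 C for 10 min


def programmed_seconds(cycles):
    """Total programmed hold time, excluding ramping between temperatures."""
    return CDNA_SYNTHESIS + INITIAL_DENATURATION + cycles * CYCLE + FINAL_ELONGATION


fusion_minutes = programmed_seconds(40) / 60  # EML4-ALK detection, 40 cycles
gapdh_minutes = programmed_seconds(20) / 60   # GAPDH control, 20 cycles (assumed same steps)
```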
Identification of EML4-ALK Genomic Fusion Points
Genomic PCR was done with 50 to 100 ng of DNA in a 25 μL reaction containing LongAmp Taq DNA polymerase (New England Biolabs) under the following conditions: 3 min at 95°C followed by 30 cycles of 10 s at 95°C, 1 min at 55°C, and 10 min at 68°C plus a final extension for 20 min at 68°C. The genomic fusion points for the E13; A20 variant were identified using forward PCR primer Fusion-genome-S (12) or a primer residing in EML4 intron 13 (AGGAGAGAAAGAGCTGCAGTG) and reverse primer Fusion-genome-AS (12) or a primer located in ALK intron 19 (GCTCTGAACCTTTCCATCATACTT). For the detection of the E20; A20 and E21; A20 variants, the forward primers were placed within EML4 exon 20 (ACTGGTCCCCAGACAACAAG) or intron 20 (TTACTCTGTCAAATTGATGCTGCT), whereas the reverse primer was Fusion-genome-AS (12). The PCR products were resolved on agarose gel; if they appeared specific, the original PCR product was used for direct sequencing. However, if additional nonspecific fragments were present, the desired fragments were excised, gel purified, cloned, and sequenced as described above.
FISH assay was done on all tumor samples for which formalin-fixed, paraffin-embedded sections were available and on tumor-derived cell lines identified as EML4-ALK positive by RT-PCR. All the locus-specific probes used for the FISH experiments were developed using bacterial artificial chromosomes (BAC) based on the UCSC Genome Browser March 2006 assembly. The FISH probe for the ALK locus was composed of two overlapping BACs (RP11-328L16 and RP11-701P18). A single BAC (RP11-299C5) was used for the EML4 locus, and three overlapping BACs (RP11-77G15, RP11-257N21, and CTD-786A2) were used to develop a probe for the region between ALK and EML4 (Fig. 5A). The probes were designed so that nuclei harboring EML4-ALK exhibit a red-green fusion signal, whereas normal cells show an array of red, blue, and green signals. A commercially available probe for CEP2 (Abbott Laboratories) was used to confirm the localization of the above-described probes to chromosome 2. The hybridization efficiency of the FISH probes was >95%. At least 100 nuclei per sample were analyzed for the EML4-ALK rearrangement. Probe preparation and FISH on cytogenetic and formalin-fixed, paraffin-embedded samples were done as described previously (34). The slides were visualized using an Olympus BX61 microscope and analyzed using FISHView software (Applied Spectral Imaging).
Cells were plated in triplicate onto a 96-well plate, and each siRNA experiment was done at least three times. Four distinct siRNA pools targeting 5′ EML4, 3′ EML4, 5′ ALK, and 3′ ALK were used (Supplementary Fig. S3). Experimental controls included nontransfected cells and cells transfected with Lipofectamine 2000 (Invitrogen), TOX siRNA pool (Dharmacon), and nontargeting siRNA pool (Dharmacon). The CellTiter-Glo Luminescent Cell Viability assay (Promega) was used to determine the number of viable cells at 48, 72, and 96 h posttransfection. A relative decrease in cell growth was determined by comparing the number of cells upon silencing of 5′ EML4 and 3′ ALK (EML4-ALK fusion) with the number upon silencing of 3′ EML4 and 5′ ALK (endogenous EML4 and ALK, respectively). The P value for each comparison at a given time point was calculated using a one-tailed t test.
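The growth comparison can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used here: the luminescence readings are invented, an unpaired pooled-variance Student's t test is assumed because the text does not specify the variant, and the function names are hypothetical. The one-tailed P value would be read from the t distribution with the returned degrees of freedom, for example with a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def relative_decrease(fusion, endogenous):
    """Fractional drop in viable-cell signal, fusion-targeting vs endogenous-targeting siRNA."""
    return 1.0 - mean(fusion) / mean(endogenous)


def pooled_t(fusion, endogenous):
    """Student's t (pooled variance, assumed) and degrees of freedom.

    A negative t indicates fewer viable cells after fusion silencing, which is
    the direction tested by the one-tailed comparison.
    """
    n1, n2 = len(fusion), len(endogenous)
    sp2 = ((n1 - 1) * variance(fusion) + (n2 - 1) * variance(endogenous)) / (n1 + n2 - 2)
    t = (mean(fusion) - mean(endogenous)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Invented triplicate luminescence readings at one time point.
fusion_wells = [40.0, 50.0, 60.0]       # 5' EML4 or 3' ALK siRNA pool
endogenous_wells = [90.0, 100.0, 110.0]  # 3' EML4 or 5' ALK siRNA pool

decrease = relative_decrease(fusion_wells, endogenous_wells)
t_value, dof = pooled_t(fusion_wells, endogenous_wells)
```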
The efficacy of siRNA silencing was determined by measuring the relative quantity of EML4 and ALK transcripts in both untransfected and transfected cells. Cell pellets were collected 30 h posttransfection and RNA was prepared using the RNeasy Mini Kit (Qiagen). TaqMan assays were done in a two-step process using the High-Capacity cDNA Reverse Transcription kit and TaqMan Gene Expression Master Mix (Applied Biosystems). Primers and probes targeting 5′ EML4 (Hs01040675_m1), 3′ EML4 (Hs00219420_m1), 5′ ALK (Hs01058321_m1), 3′ ALK (Hs00608289_m1), and GAPDH (4333764F) were purchased from Applied Biosystems, and the relative quantity of transcripts was determined following the manufacturer's protocol.
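The text states only that relative quantities were determined following the manufacturer's protocol. The sketch below assumes the widely used comparative Ct (ΔΔCt) calculation with GAPDH as the reference and approximately 100% amplification efficiency; the Ct values are invented and the function name relative_quantity is hypothetical.

```python
def relative_quantity(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Comparative Ct estimate of target transcript level, treated vs control.

    Assumes the target and the reference (GAPDH) amplify with about 100% efficiency.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    ddct = delta_treated - delta_control
    return 2.0 ** (-ddct)


# Invented example: the 3' ALK signal appears two cycles later after siRNA
# transfection while GAPDH is unchanged, so about 25% of the transcript remains.
remaining = relative_quantity(27.0, 18.0, 25.0, 18.0)
```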
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
We thank the Genentech Human Tissue Bank group for providing tumor samples, the Sequencing laboratory for completing our sequencing requests, James Lee for his invaluable help with siRNA experiments, David Davis and Richard Neve for critical review of the manuscript, and Fred de Sauvage for guidance and support.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.