Abstract
Oncogene-induced senescence (OIS), a tumor-suppressive mechanism that is induced by the replicative and metabolic stress of oncogene activation, is a key barrier in the development of BRAF V600E colon cancer. Inhibition of this mechanism has been observed through epigenetic changes observed in sporadic serrated polyps, as well as through the germline mutations associated with those who develop serrated polyposis. We hypothesize that upregulated autocrine factors exist that are specific to the serrated pathway and also promote bypass of oncogene-induced senescence. To identify such autocrine factors, we integrate analyses of microarrays of sessile serrated polyps and two large colon cancer cohorts, the Cancer Genome Atlas (TCGA; n = 153), and French national Cartes d'Identité des Tumeurs (CIT) program (n = 462), with enhanced gene annotation through natural language processing techniques of the existing medical corpus. We reproducibly associate higher expression of the ligand–receptor axis of TFF2 and CXCR4 with BRAF V600E-mutant colon cancer (P = 3.0 × 10−3 and 0.077, respectively for TCGA; P = 3.0 × 10−8 and 5.1 × 10−7 for CIT). Given well-described oncogenic roles of TFF2 and CXCR4 in colon cancer, and availability of CXCR4 inhibitors for other clinical indications, this ligand–receptor axis may represent an actionable target for prevention and treatment of this molecular subtype of colorectal cancer. Cancer Prev Res; 8(7); 614–9. ©2015 AACR.
Introduction
Sessile serrated adenomas (SSA) represent an alternative pathway to colon carcinogenesis, representing only 2% to 9% of polyps found during colonoscopy, but accounting for approximately 15% of colon cancers (1, 2). Previously, we demonstrated that germline loss-of-function mutations in pathways of oncogene-induced senescence (OIS) predispose individuals to develop multiple SSAs, which typically harbor activating somatic mutations in BRAF (3). In addition, this OIS barrier to carcinogenesis has been observed in sporadic serrated polyps (4). Given the overall complexity of OIS signaling, we hypothesize that such lesions also secrete autocrine factors to promote bypass of this important tumor-suppressive mechanism during neoplastic development and progression. Targeting such factors, if concurrently upregulated in colorectal tumors, may represent a complementary approach to existing chemotherapeutic strategies, as well as provide new opportunities for cancer prevention.
To identify novel disease-specific and therapeutic-relevant targets, recent emphasis has been placed on “big data” approaches given the ever increasing number of publicly available genomic and transcriptome datasets for colon cancer. However, purely agnostic, discovery-based approaches to analyze such extensive datasets through either genome-wide association studies (GWAS) or unbiased transcriptomic signatures have yielded limited success in finding novel precision medicine targets for colon cancer prevention and treatment that are applicable to reasonable subsets of patients (5). Limitations from these approach are inherent in both their study design and appeal; discarding well-established mechanisms of pathogenesis in search of a wider array of targets, the necessity of a large Bonferroni correction for multiple hypothesis testing limits the number of confident hits (6). However, when such strategies are necessarily aggregated to the pathway level, they become heavily reliant on curated annotation libraries, which are often incomplete in the context of the existing medical corpus. For negative regulators of senescence, the associated GO annotation library (GO:2000773) contains only 9 unique genes despite the larger number of genes already implicated in this process.
To overcome these obstacles, we employ alternative “big data” approaches to facilitate more comprehensive hypothesis-directed analytics across large datasets. We independently analyze and integrate the results of several publicly-available microarray datasets spanning serrated neoplastic progression from polyps to cancers with natural language processing techniques to interrogate the entire medical corpus for relevant senescence mechanisms. These approaches rely on the aggregate knowledge of the entire scientific community. We demonstrate the strengths of this approach by identifying a previously unrecognized ligand–receptor axis upregulated in BRAF V600E colon cancer, for which an already existing FDA-approved therapy may be repurposed.
Materials and Methods
Microarray and protein array analyses
All publicly available microarray datasets for human sessile serrated polyps/adenomas were imported directly from the NCBI Gene Expression Omnibus (GSE12514, GSE43841, and GSE45270) and analyzed in the Nextbio cloud platform (Illumina Inc.; refs. 7, 8). Prospectively, the conventionally applied threshold of genes that had a 2-fold overexpression in SSAs over normal mucosa (if the latter not present, then compared against tubular adenomas) and a P < 0.05 in 2/3 datasets were selected for further analysis. This filter identified 104 genes. Level 3 gene expression data (log2 normalized expression from Agilent 244K Custom Gene Expression 4502A-07), stage, BRAF mutation status, and mismatch repair (MMR) status for The Cancer Genome Atlas (TCGA) colon cancer cohort were obtained from the TCGA Data Portal (National Cancer Institute) and Cbioportal (Memorial Sloan Kettering Cancer Center, New York, NY) accessed on February 5, 2014 (9). Log2 normalized gene expression from the Affymetrix U133 2.0 plus array (Affymetrix Inc, Santa Clara, CA), stage, MMR status, and BRAF V600E mutation of the French national Cartes d'Identité des Tumeurs (CIT) program were accessed from GSE39582. Because Affymetrix U133 2.0 arrays have multiple CXCR4 probesets, we utilized the probe 217028 which was predicted in silico to most accurately represent CXCR4 expression (by the Jetset algorithm) over combined calls from all the probesets (10).
Level 3 protein reverse phase protein array data were available for a subset of TCGA colon cancer cases used in the original publication (MD Anderson Cancer Center, Houston, TX). Phospho-ERK1/2 log2 normalized levels were available under values listed under probe MAPK_pT202_Y204-R-V.
Search algorithms and natural language processing
Full gene names were derived from the ExPASy GPSDB Gene/Protein Synonyms finder from the gene symbol hits obtained from the microarray data. Only “preferred” gene names and symbols designated by ExPASy were used for automated literature search queries. Paralog gene identification was performed by matching gene symbols, searching for an exact match after truncation of the numbers at the end. The intersection of full gene names from paralogs was stored for subsequent search.
To identify genes discovered during microarray analysis relevant for senescence, we created a python script that processed all titles and abstracts of the PubMed-indexed articles as of March 1, 2014, using the NCBI E-Utilities function (National Center for Biotechnology Information, Bethesda, MD). Relevant hits included articles that were searched with the gene symbol or gene stem (if paralogs also overexpressed) encoded as text words, along with the term “senescence” as a text word appearing in title or abstract. From these resultant hits, search terms were scanned to occur within two sentences of each other (proximity search) in the title or abstract by using the “.” character as a determinant of sentence boundaries.
Next, to find genes that inhibit senescence, we parsed sentences containing the query term and “senescence” and determined the smallest word distance (with preference occurring in the same sentence), and word order between them. For the case in which the gene preceded “senescence” in the word order, up to 5 words occurring in the same sentence proximal to the query term were examined to ensure no inhibition was discovered by searching for word stems (and all associated synonyms retrieved using the Big Huge Thesaurus API) of the following words: knockdown, knockout, silence, delete, inhibit, inactivate, downregulate, suppress (11). An exception to this rule was the term “tumor suppressor,” which was disregarded if found. Next, these similar word stems were examined in the 5 words preceding the term “senescence.” Cases with inhibition of the query term and no inhibition of “senescence,” and with no inhibition of the query term and inhibition of “senescence” were scored as positive hits.
In the case in which “senescence” preceded the query term, we limited searches in which both phrases occurred within the same sentence and analyzed words suggesting inhibition before each key word as described. We noted the location of the word “by” if it occurred in between terms to signify high likelihood of passive voice, and its location in relative to inhibitory words. In the scenario in which there was an inhibitory word before “senescence” or “by,” and no inhibitory term after “by” or before the query term, this was scored as a positive hit. Also, if there was no inhibitory word before “senescence” or “by,” and an inhibitory term after “by” and before the query term, this was also deemed a positive hit.
All positive hits determined by natural language processing were subsequently validated by manual curation. To further validate the success of algorithm, all initial hits of gene symbols intersecting with senescence were also manually evaluated by an independent reviewer (T. Austin) which confirmed that the algorithm had not missed any other autocrine/paracrine effectors secreted.
Statistical analyses
Linear regression analysis was performed using R 2.15.2 and the “lm” command with stage, BRAF mutation status (WT vs. V600E), and MMR status—microsatellite stable (MSS) or microsatellite-high (MSI-H). Stage information was condensed into one of four possible values (I, II, III, or IV) for the TCGA cohort. CIT cohort had stages 0, I, II, III, and IV. For the TCGA cohort, MSI status was grouped into MSS (MSS or MSI-L) or MSI-H. P values for linear regression reported represent Pr(>∣t∣) values, where P < 0.05 was considered statistically significant. Two-tailed t test was used to analyze the role of BRAF V600E mutational status on TFF2 expression in the CIT cohort after multivariate adaptive regression splines regression (R called “earth”) demonstrated that BRAF mutational status was the only relevant variable after the forward and backward pass. After satisfying the Shapiro–Wilk test for normality, a Pearson correlation coefficient and associated P value were calculated with normalized TFF2 mRNA expression and phospho-ERK1/2 protein expression from the TCGA cohort.
Results
To perform this integrative analysis, we initially analyzed three independent microarray studies of sessile serrated polyps in comparison with either normal mucosa, or if the latter was unavailable, with tubular adenomas to search for protein-coding genes that were at least 2-fold overexpressed at a P < 0.05 reproducible in 2/3 datasets (Supplementary Table S1). Prospectively, we created computational rules to implicate genes involved in senescence inhibition from free text of abstracts indexed within Pubmed. Of the resulting hits from the serrated polyp microarray data, we discovered that the query terms “TFF1,” “trefoil factor” (the stem encompassing paralog genes TFF1 and TFF2), “AGR2,” and “ME1” were linked to observations of senescence inhibition and tumor growth, with TFF1 explicitly linked to OIS (Table 1; ref. 12).
Identification of inhibitors of senescence overexpressed in sessile serrated adenomas
. | Microarray fold change . | . | Natural language processing . | |||
---|---|---|---|---|---|---|
Gene . | GSE12514 . | GSE43841 . | GSE45270 . | Secreted . | PMID . | Parsed Phrase . |
TFF1 | 6.3 | N/A | 5.4 | Yes | 21451135 | Trefoil factor 1 acts to suppress senescence |
TFF2 | 2.7 | 40.2 | 18.4 | Yes | (Paralog of TFF1) | |
AGR2 | 2.4 | 1.9 | 2.5 | No | 22467239 | Knockdown of AGR2 induces cellular senescence |
ME1 | 2.2 | 3.2 | 5.0 | No | 23334421 | Downregulation of ME1 and ME2 also modulates the outcome of p53 activation, leading to strong induction of senescence |
. | Microarray fold change . | . | Natural language processing . | |||
---|---|---|---|---|---|---|
Gene . | GSE12514 . | GSE43841 . | GSE45270 . | Secreted . | PMID . | Parsed Phrase . |
TFF1 | 6.3 | N/A | 5.4 | Yes | 21451135 | Trefoil factor 1 acts to suppress senescence |
TFF2 | 2.7 | 40.2 | 18.4 | Yes | (Paralog of TFF1) | |
AGR2 | 2.4 | 1.9 | 2.5 | No | 22467239 | Knockdown of AGR2 induces cellular senescence |
ME1 | 2.2 | 3.2 | 5.0 | No | 23334421 | Downregulation of ME1 and ME2 also modulates the outcome of p53 activation, leading to strong induction of senescence |
NOTE: 104 genes were found at least 2-fold reproducibly overexpressed in SSAs. Of these genes, natural language processing techniques of the existing medical corpus identified three genes (TFF1, AGR2, and ME1) to inhibit senescence. One paralog (TFF2) was also overexpressed. Bold text reflects query terms, where underlined word designates the negative modifier. PMID refers to PubMed ID number.
Given that TFF1 and TFF2 are already established to be secreted effectors for epithelial restitution of the gastrointestinal tract, we next assessed if overexpression of TFF1 and TFF2 extends beyond the polyp stage to colon cancers derived from the serrated pathway. We analyzed two independent, large datasets of colon cancer gene expression that are linked to well-annotated molecular and clinical data (BRAF mutation status, mismatch repair status, and stage) from the initially published Cancer Genome Atlas Project and the French national CIT program (Table 2; refs. 13, 14).
Characteristics of the TCGA and CIT cohorts
. | TCGA (%) . | CIT (%) . |
---|---|---|
Total number of cases | 153 | 462 |
BRAF mutation status | ||
BRAF V600E | 17 (11.1) | 45 (9.7) |
BRAF WT | 136 (88.9) | 417 (90.3) |
Microsatellite status | ||
MSS | 125 (81.7) | 390 (84.4) |
MSI-H | 28 (18.3) | 72 (15.6) |
Stage | ||
Stage 0 | 0 (0) | 4 (0.9) |
Stage I | 28 (18.3) | 32 (6.9) |
Stage II | 63 (41.2) | 204 (44.2) |
Stage III | 39 (25.5) | 164 (35.5) |
Stage IV | 23 (15.0) | 58 (12.6) |
. | TCGA (%) . | CIT (%) . |
---|---|---|
Total number of cases | 153 | 462 |
BRAF mutation status | ||
BRAF V600E | 17 (11.1) | 45 (9.7) |
BRAF WT | 136 (88.9) | 417 (90.3) |
Microsatellite status | ||
MSS | 125 (81.7) | 390 (84.4) |
MSI-H | 28 (18.3) | 72 (15.6) |
Stage | ||
Stage 0 | 0 (0) | 4 (0.9) |
Stage I | 28 (18.3) | 32 (6.9) |
Stage II | 63 (41.2) | 204 (44.2) |
Stage III | 39 (25.5) | 164 (35.5) |
Stage IV | 23 (15.0) | 58 (12.6) |
In the TCGA cohort, we applied multivariate linear regression incorporating the clinical categorical variables described. BRAF V600E tumors had significantly higher log2 normalized TFF2 expression in the TCGA cohort (P = 3.0 × 10−3). We next attempted to confirm these findings in the independent and larger CIT cohort. Given that the log2 normalized expression levels failed to satisfy conditions for linear regression, we used a multivariate adaptive regression splines technique. However, since model construction using forward and backward variable selection only left BRAF mutation status as the relevant variable, we instead performed a two-tailed t test. This analysis confirmed the strong association between BRAF V600E and TFF2 expression (3.0 × 10−8) in the CIT cohort (Fig. 1). Multivariate linear regression of log2 normalized TFF1 expression in the TCGA cohort was not significantly associated with BRAF V600E (P = 0.13), but demonstrated a stronger relationship in the CIT cohort (P = 3.8 × 10−3). TFF1 expression was not significantly associated with MSI status or stage in either the TCGA or CIT cohort.
Log2 normalized gene expression of TFF2 and CXCR4 in the TCGA Colon Cancer and CIT cohorts. In multivariate regression, BRAF V600E mutation was significantly associated with higher levels of TFF2 in TCGA (P = 3.0 × 10−3), which was also validated by t test in the CIT cohort (P = 3.0 × 10−8). BRAF V600E mutation was also associated in multivariate regression models with higher CXCR4 expression which approached statistical significance in TCGA (P = 0.077). This association was statistically significant in the larger CIT cohort (P = 5.1 × 10−7).
Log2 normalized gene expression of TFF2 and CXCR4 in the TCGA Colon Cancer and CIT cohorts. In multivariate regression, BRAF V600E mutation was significantly associated with higher levels of TFF2 in TCGA (P = 3.0 × 10−3), which was also validated by t test in the CIT cohort (P = 3.0 × 10−8). BRAF V600E mutation was also associated in multivariate regression models with higher CXCR4 expression which approached statistical significance in TCGA (P = 0.077). This association was statistically significant in the larger CIT cohort (P = 5.1 × 10−7).
Since increased TFF2 expression has been linked to increased ERK1/2 activation in other gastrointestinal cell types, as well as evidence that promotility effects of TFF2 are dependent on active ERK signaling, we next assessed if we could detect a positive relationship in between TFF2 mRNA levels and phospho-ERK1/2 protein expression in colonic tissues within the TCGA protein array data (15, 16). Because there is already a well-described association between BRAF V600E on ERK1/2 signaling, we focused this analysis on BRAF wild-type (WT) samples with available protein array data (93 samples; Supplementary Fig. S1). We did observe a statistically significant, positive correlation between TFF2 mRNA expression and phospo-ERK1/2 levels (Pearson correlation coefficient = 0.22, two-tailed P = 0.039).
Given the unambiguous upregulation of TFF2 in colon cancers derived from the serrated pathway, we next examined the relative expression of its putative receptor, CXCR4, which has been implicated in cancer recurrence and metastasis. In the TCGA cohort, multivariate regression demonstrated that higher CXCR4 expression was strongly associated with MSI-H status (P = 2.3 × 10−4); an association with BRAF V600E mutation status that approached statistical significance was also found (P = 0.077). In subgroup analyses restricted to MSS cancers, BRAF V600E was significantly associated with higher levels of CXCR4 (P = 0.043). We again confirmed these associations in the larger CIT cohort (Fig. 1). We also observed that CXCR4 expression was significantly associated with MSI-H status (P = 4.4 × 10−5) and BRAF V600E (5.1 × 10−7), with no substantial collinearity observed (variance inflation factor test) between these two clinical variables in the regression model. Thus, in two independent cohorts, CXCR4 expression in colon cancers is associated with the BRAF V600E mutation.
Discussion
In the context of a tumor's MMR status, the presence of the BRAF V600E mutation has been associated with higher mortality and worse outcomes (17). To identify novel therapeutic strategies for treating this molecular subtype, we searched for autocrine factors that would assist in the bypass of OIS, a key tumor-suppressive mechanism that is inactivated in the course of serrated tumorigenesis. We employed multiple, independent gene expression datasets of polyps and cancers with enhanced annotation strategies based on natural language processing techniques to identify and reproducibly associate higher levels of TFF2 and CXCR4 with BRAF V600E colon cancers. Furthermore, we demonstrate a modest, but statistically significant, positive correlation, between TFF2 mRNA levels and phospho-ERK1/2 status in BRAF wild-type colon cancers. In the context of the many numerous upstream signaling pathways that influence ERK1/2 signaling, this detectable correlation is supportive of the positive relationship between TFF2 and ERK signaling observed in vitro in other types of gastrointestinal cancers (16).
Trefoil factors such as TFF1 and TFF2 are well-described signaling molecules that orchestrate intestinal epithelial reconstitution after injury—inducing epithelial proliferation and migration, while reducing local inflammation. Moreover, excess of these factors have predominantly been associated with protumorigenic effects in the colon. Previous studies have highlighted higher mRNA and protein levels of TFF2 in serrated polyps, and this study extends these observations to serrated cancers with BRAF V600E mutations (7, 8, 18). Trefoil factors have been shown to promote colon cancer cell survival, anchorage-independent growth, and invasion (19). Recently, TFF1 has been demonstrated to block early-stage OIS in a variety of cell types (12). Blockade of its paralog, TFF2, in highly expressing colon cancers has also been demonstrated to induce apoptosis in vitro (20). In addition to sporadic lesions, mRNA and protein expression of trefoil factors are also strongly upregulated in the tumors of those with serrated polyposis syndromes (18). Taken together with our findings, these data support a general role for TFF1 and TFF2 as tumor promoters.
However, the function of TFF1 and TFF2 in carcinogenesis may also be context-dependent. Consistent with its well-described immunomodulatory role in tissue repair, an important exception to an oncogenic effect of TFF2 is its potential tumor-suppressive role within an inflammatory milieu central to pathogenesis of certain cancers of the stomach (Helicobacter pylori associated intestinal-type) and colon (colitis-associated cancer; refs. 21, 22). In this setting, global TFF2 deficiency appears to enhance the chronic inflammatory responses which predispose the transformation of gastrointestinal mucosa to neoplasia.
After demonstrating that TFF2 is reproducibly associated with serrated neoplasia from polyp to cancer, we subsequently demonstrated the novel association of BRAF V600E-mutant colon cancers with higher expression of CXCR4, the putative receptor for TFF2. Higher mRNA levels of CXCR4 in colon cancers have been previously validated at the protein level by immunohistochemistry and have been linked to higher rates of recurrence, distant metastases, and poor survival (23). CXCR4 activation has been described to activate EGFR and ERK signaling, pathways involved in drug resistance to the BRAF V600E inhibitor vemurafenib (24–26).
Given the present clinical availability of a CXCR4 inhibitor (plerixafor) for hematopoietic stem cell harvesting, as well as ongoing clinical trials of newer agents for indications such as myelokathexis, these results may have near-term clinical application. In vitro blockade of CXCR4 by plerixafor in highly expressing colon cancer cell lines has been recently demonstrated to chemosensitize cells to existing therapeutics and also independently reduce proliferation and migration (27). CXCR4 inhibition can also inhibit the oncogenic effects of high levels of TFF2 (16). Notably, conditional knockout mouse models of either TFF2 or CXCR4 demonstrate normal colonic development (28, 29). Also, TFF2 protein expression is relatively absent in normal colonic mucosa (30). Based on this evidence, the biologic effects of CXCR4 inhibitors may be relatively selective for neoplastic tissue, limiting potential toxicity in normal tissues, a critical component of successful clinical translation.
In this study, we also link higher levels of TFF2, previously shown to be elevated in sessile serrated polyps, and now CXCR4 with the already clinically utilized molecular marker, BRAF V600E, in colorectal cancers. Based on these findings, this autocrine signaling axis with higher expression levels of both ligand and receptor is a promising adjuvant therapeutic target for stage II–IV BRAF V600E mutated colorectal cancers, as well as a candidate for cancer prevention for those with serrated polyposis who are at high risk of developing such tumors (Fig. 2). Although TFF2 is not the sole ligand for CXCR4, the oncogenic effects of both proteins make CXCR4 an attractive therapeutic target for this molecular subtype. Experiments and clinical trials that repurpose CXCR4 inhibitors that have successfully completed phase I studies or are FDA approved should be considered for molecularly tailored treatment and prevention of the serrated pathway to colorectal cancer.
Interaction of the TFF2–CXCR4 axis with the active RAS–RAF–MEK–ERK pathway in the serrated pathway of colorectal cancer. Presently, cetuximab is an EGFR inhibitor approved for colorectal cancer. Clinical trials for the use of vemurafenib for BRAF V600E are ongoing. ERK, extracellular signal-regulated kinases.
Interaction of the TFF2–CXCR4 axis with the active RAS–RAF–MEK–ERK pathway in the serrated pathway of colorectal cancer. Presently, cetuximab is an EGFR inhibitor approved for colorectal cancer. Clinical trials for the use of vemurafenib for BRAF V600E are ongoing. ERK, extracellular signal-regulated kinases.
Disclosure of Potential Conflicts of Interest
M.K. Gala has ownership interest in New Amsterdam Genomics, Inc. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: M.K. Gala, A.T. Chan
Development of methodology: M.K. Gala, A.T. Chan
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): M.K. Gala, S. Ogino, A.T. Chan
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M.K. Gala, T. Austin, S. Ogino, A.T. Chan
Writing, review, and/or revision of the manuscript: M.K. Gala, T. Austin, S. Ogino, A.T. Chan
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M.K. Gala, A.T. Chan
Study supervision: M.K. Gala, A.T. Chan
Grant Support
M.K. Gala was supported by the American College of Gastroenterology (Junior Faculty Career Development Award) and the NIH (K23 DK103119). A.T. Chan was supported by the NIH (K24 DK098311, R01 CA137178, R01 CA176272).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.