Abstract
The 5-year survival rate of esophageal cancer is less than 10% in developing countries, where more than 90% of these cancers are esophageal squamous cell carcinomas (ESCC). Endoscopic screening is undertaken in high incidence areas. Biomarker analysis could reduce the subjectivity associated with histologic assessment of dysplasia and thus improve diagnostic accuracy. The aims of this study were therefore to identify biomarkers for esophageal squamous dysplasia and carcinoma. A publicly available dataset was used to identify genes with differential expression in ESCC compared with normal esophagus. Each gene was ranked by a support vector machine separation score. Expression profiles were examined before validation by qPCR and IHC. We found that 800 genes were overexpressed in ESCC compared with normal esophagus (P < 10⁻⁵). Of the top 50 genes, 33 were expressed in ESCC epithelium and not in normal esophagus epithelium or stroma according to the Protein Atlas website. These were taken to qPCR validation, and 20 genes were significantly overexpressed in ESCC compared with normal esophagus (P < 0.05). TNFAIP3 and CHN1 showed differential expression with IHC. TNFAIP3 expression increased gradually through normal esophagus, mild, moderate and severe dysplasia, and SCC (P < 0.0001). CHN1 staining was rarely present in the top third of normal esophagus epithelium and extended progressively towards the surface in mild, moderate, and severe dysplasia, and SCC (P < 0.0001). Two novel promising biomarkers for ESCC were identified, TNFAIP3 and CHN1. CHN1 and TNFAIP3 may improve diagnostic accuracy of screening methods for ESCC. Cancer Prev Res; 9(7); 558–66. ©2016 AACR.
Introduction
Cancer of the esophagus is the 6th most common cause of cancer-related deaths in the world (1). Although the esophageal squamous cell carcinoma (ESCC) subtype has declined to around 30% of all esophageal cancers in the Western world, it remains the most common subtype in developing countries and can represent up to 90% of cases in the highest risk areas of Iran and China (1). Patients with ESCC usually present late, with locally advanced disease or metastases, resulting in a 5-year survival rate in the United States of 18% (2). However, survival can be as low as 10% in high-risk populations, where the medical infrastructure is less well developed (3). Early detection and treatment are associated with improved survival (4, 5). Rapid advances in imaging (6), minimally invasive endoscopic therapies (7–9), and novel chemoradiotherapy regimes (10) provide the opportunity to improve patient outcomes when disease is diagnosed early.
In high incidence areas for ESCC, screening for cancer and dysplasia using Lugol iodine chromoendoscopy is in use (11, 12) and has been demonstrated to significantly reduce ESCC mortality in a recent 10-year prospective community assignment study in China (13). However, the diagnosis of dysplasia is difficult and subject to intra- and interobserver variation. There is currently a lack of suitable adjunctive diagnostic biomarkers for ESCC to facilitate the diagnosis of dysplasia (14, 15). Recent work has begun to identify candidate genes for differentiating ESCC premalignant changes (16–22); however, the studies have different designs and rarely examine the same genes, making cross comparisons difficult, especially given that some studies are qualitative rather than quantitative (15).
We hypothesized that we could identify protein biomarkers for squamous cell dysplasia and ESCC that would be suitable for adjunctive use to pathology diagnosis and may inform us on the molecular events leading to carcinogenesis. The aims of this study were therefore to identify candidate genes that are upregulated in ESCC and squamous dysplasia compared with normal esophageal epithelium and then to validate the putative targets at both the RNA and protein levels on samples from a cohort of patients with ESCC and healthy controls.
Materials and Methods
Microarray analysis
A publicly available cDNA microarray dataset (23) was used to identify gene expression profiles from 65 samples (26 ESCC and 39 normal esophageal epithelium controls, Fig. 1). A total of approximately 9,400 unique cDNA clones were available. The normalized test:reference hybridization signal intensity ratios were converted to log2 ratios and clear outliers were excluded (1 from normal control, 2 from ESCC). A one-dimensional support vector machine (SVM) separation score was calculated for each gene for high expression in ESCC compared with low expression in normal controls, using a soft 1-norm margin with weight C = 1,000. For each gene, these scores were divided by the fold change between the geometric average of low and high expression and the geometric average fold change of high expression against control.
Genes were ranked using the SVM score, with a low score reflecting: (i) consistent expression in each group of samples, (ii) a good separation between normal and ESCC expression levels, and (iii) a satisfactory level of expression in ESCC samples.
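As a rough illustration of the ranking idea, the sketch below fits a one-dimensional soft-margin SVM (hinge loss, C = 1,000) to the log2 ratios of a single gene and returns the minimized objective, so that a lower value corresponds to a wider separation with ESCC above normal. It is not the implementation used in this study: the function names, the use of the objective value as the score, the ternary search, and the toy values are all assumptions, and the fold-change division described above is not reproduced.

```python
import math


def objective(w, b, xs, ys, c):
    """Soft-margin SVM objective: 0.5 * w^2 + C * sum of hinge losses."""
    slack = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in zip(xs, ys))
    return 0.5 * w * w + c * slack


def best_objective_for_slope(w, xs, ys, c):
    # For a fixed w the objective is convex and piecewise linear in b,
    # so a minimum lies at one of the breakpoints b = y - w * x.
    return min(objective(w, y - w * x, xs, ys, c) for x, y in zip(xs, ys))


def separation_score(normal_log2, escc_log2, c=1000.0, iterations=200):
    """Hypothetical score: minimized objective with w >= 0 (ESCC high)."""
    xs = list(normal_log2) + list(escc_log2)
    ys = [-1.0] * len(normal_log2) + [1.0] * len(escc_log2)
    # The optimum satisfies 0.5 * w^2 <= objective at w = 0, which bounds w.
    lo = 0.0
    hi = math.sqrt(2.0 * best_objective_for_slope(0.0, xs, ys, c))
    # Ternary search is valid because the profile over w is convex.
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if best_objective_for_slope(m1, xs, ys, c) <= best_objective_for_slope(m2, xs, ys, c):
            hi = m2
        else:
            lo = m1
    return best_objective_for_slope((lo + hi) / 2.0, xs, ys, c)


# Toy log2 ratios (invented for illustration, not study data).
well_separated = separation_score([0.0, 0.5, 1.0], [3.0, 3.5, 4.0])
poorly_separated = separation_score([0.0, 1.0], [1.5, 2.0])
```

With separable data and a large C, the minimized objective reduces to 2 divided by the squared gap between the groups, which is why a low score in this sketch indicates good separation.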
Protein Atlas evaluation
The expected expression profile of the 50 genes with the lowest SVM scores was assessed using the Protein Atlas website (http://www.proteinatlas.org/) to ensure their suitability for paraffin-embedded tissues. Genes were excluded on the basis of reported protein expression in the epithelium of normal esophagus or if they were known to be solely expressed in the stroma.
Human specimens
The putative genes were validated using qRT-PCR in 30 samples each of: normal esophagus from patients who were endoscopically normal (NN), ESCC from the tumor (T), and normal esophagus taken from the same patients as far from the tumor site as possible (NT). The NT and T groups were a "matched cohort," as the corresponding NT and T samples were paired from each patient.
The protein expression of putative biomarkers validated by qRT-PCR was confirmed by IHC on paraffin-embedded sections from the NT and T samples from the matched cohort, and on 34 paraffin-embedded biopsies of normal esophagus, 31 mild dysplasia (Mild), 31 moderate dysplasia (Mod), and 31 severe dysplasia (Sev) samples from an “independent dysplastic cohort.”
In this study, the NN samples were collected from patients with an endoscopically normal esophagus attending endoscopy at Addenbrooke's Hospital (Cambridge, United Kingdom) for routine diagnostic procedures; the NT and T samples came from esophagectomy specimens used in a previous study in Linxian, China (24); and the biopsies of the “dysplastic cohort” came from another previous study in Linxian, China (25). Sample fixation and processing were performed according to local clinical standard operating procedures. All of the original studies, and the use of collected specimens for future evaluations, were approved by the appropriate IRBs.
RNA extraction and qRT-PCR
Total RNA was extracted from frozen samples using an AllPrep DNA/RNA Mini Kit (QIAGEN Ltd) and was then reverse transcribed using the QuantiTect Reverse Transcription Kit (QIAGEN Ltd). qRT-PCR was performed using the LightCycler 480 SYBR Green I Master Mix according to the manufacturer's instructions (Roche Diagnostics GmbH). PCR consisted of 45 cycles of 95°C denaturation (10 seconds), 60°C annealing, and extension (20 seconds). Positive controls were identified for each primer pair. The cycle threshold (Ct) was determined for each sample and the average Ct of the triplicate samples was calculated. The expression of each gene relative to the geometric mean of the triplicate average Ct values for β-actin and 40S ribosomal protein S18 (RPS18) was determined as ΔCt. A melt curve was constructed for each primer.
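The ΔCt normalization described above can be sketched as follows. This is a minimal illustration only: the function name, the dictionary keys, the sign convention (target minus reference, so that a lower ΔCt means higher expression), and the Ct values are assumptions introduced here and are not taken from the study.

```python
from statistics import geometric_mean, mean


def delta_ct(target_cts, reference_cts_by_gene):
    """Average triplicate Ct values, then subtract the geometric mean
    of the averaged reference-gene Ct values (assumed sign convention)."""
    target_mean = mean(target_cts)
    reference_means = [mean(cts) for cts in reference_cts_by_gene.values()]
    return target_mean - geometric_mean(reference_means)


# Invented triplicate Ct values for one sample.
example_delta_ct = delta_ct(
    [24.1, 24.3, 24.2],
    {"ACTB": [16.0, 16.0, 16.0], "RPS18": [25.0, 25.0, 25.0]},
)
```

Here the reference value is the geometric mean of 16 and 25, that is 20, so the example ΔCt is approximately 4.2.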
IHC
Sections of 3.5 μm each were stained using a Bond Max Autostainer with the Bond Polymer Refine Detection Kit according to the manufacturer's instructions (Leica Microsystems). Origin of the primary antibodies and staining conditions are detailed in Supplementary Table S1. A negative control was performed by omission of the primary antibody.
The extent of staining on each slide was double scored. For all genes except CHN1, extent was scored on the basis of the percentage of stained epithelium: 0 if absent, 1 for up to 33%, 2 for 34% to 66%, and 3 for ≥67%. For CHN1, extent was scored based on staining from the basement membrane to the epithelial surface: 0 if absent, 1 for any staining in the basal third of the epithelium, 2 for staining in the basal two thirds of the epithelium, and 3 for staining in the superficial third of the epithelium. Intensity was scored as 0 if absent, 1 for weak, 2 for medium, and 3 for strong staining.
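The extent scoring rules can be written down as a small lookup, shown here as a sketch. The function names and the labels for the epithelial thirds are hypothetical, and the code assumes that the bins are contiguous thirds (any staining below 34% scores 1) and that the CHN1 score is set by the most superficial third reached by the staining.

```python
def extent_score(percent_stained):
    """Extent score for all markers except CHN1, from the percentage
    of stained epithelium (bins assumed to be contiguous thirds)."""
    if not 0 <= percent_stained <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_stained == 0:
        return 0
    if percent_stained < 34:
        return 1
    if percent_stained < 67:
        return 2
    return 3


# Assumed labels for the most superficial third that shows staining.
CHN1_SCORES = {None: 0, "basal": 1, "middle": 2, "superficial": 3}


def chn1_extent_score(most_superficial_third):
    """CHN1 extent score from how far staining extends towards the surface."""
    return CHN1_SCORES[most_superficial_third]
```

For example, a slide with 50% of the epithelium stained would score 2 for extent, and CHN1 staining that reaches the superficial third would score 3.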
Statistical analysis
A one-way ANOVA analysis with Dunn's multiple comparisons test was performed to analyze differences in mRNA expression. A Kruskal–Wallis one-way ANOVA by ranks was performed to analyze differences in IHC scoring between all sample groups. A Wilcoxon matched-pairs signed-rank test was performed to analyze differences in IHC scoring between NT and T samples from the matched cohort. A one-way ANOVA analysis with Dunn's multiple comparisons test was used to compare IHC scoring between sample groups. All statistics were performed using Prism (GraphPad Software).
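For readers unfamiliar with the Kruskal–Wallis test, the sketch below computes its H statistic with average ranks and the standard tie correction, which matters for ordinal IHC scores where ties are frequent. The analyses in this study were run in Prism; this code is only an illustration, the function names and example scores are invented, and the P value (from a chi-square distribution with k − 1 degrees of freedom) is not computed.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Invented values for three groups (not study data).
h_example = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```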
Results
Identification of putative targets
The SVM score analysis of a publicly available cDNA microarray dataset (examples shown in Supplementary Fig. S1) yielded 800 genes that were overexpressed in ESCC compared with normal esophagus (P < 10⁻⁵, adjusted for multiple comparisons). Expected expression profiles were evaluated using the Protein Atlas website for the 50 most significant genes: 9 genes were found to be expressed in normal esophagus (BAP1, SERPINH1, GUCY1A3, LAMC2, MMP2, SQSTM1, MMP10, NT5E, and SOCSE), and 8 were expressed only in the stroma and would therefore not be suitable for a biopsy or cytology screening test (POSTN, CSPG2, MFAP2, CDH11, COL4A2, PODXL, SPARC, and D2S448; Fig. 1). These 17 genes were excluded from validation, although the 8 genes expressed solely in the stroma may be of importance in disease progression. A total of 33 genes were taken to mRNA validation.
mRNA validation
Altered expression in ESCC compared with normal esophagus was confirmed at the mRNA level by qRT-PCR in 30 histopathologically verified tissues each from normal esophagus (NN), normal esophagus from ESCC patients (NT), and ESCC (T). The matched cohort of NT and T samples allowed analysis for the specificity of biomarkers compared with histologically confirmed ESCC within and between cancer patients.
A suitable primer pair could not be designed for E2Ig4, and validation of this target gene was therefore not taken any further. Eight genes (LUM, THY1, PLAU, TIA-2, PTGS2_2, UCHL1, PTGS2_1, LAP1B) were not detected in either normal (NN, NT) or ESCC (T) samples. Four genes (ALCAM, LAMA3, LTBR, ERBB2) showed no statistically significant difference in expression between groups (Figs. 1 and 2A). These genes were excluded from further validation.
The expression of 20 genes was significantly higher in ESCC compared with normal samples (range P = 0.0002 to P < 0.0001). However, despite the significant difference in expression, 10 of these genes (FST, ITGA6, F2R, NELL2, DUSP6, SULF1, IL1B, FRP1, PLAT, and IGFBP7) displayed marked overlap in expression between sample groups. It was therefore unlikely that the difference in mRNA expression would translate to a clear difference in protein expression, and these genes were not taken forward to validation at the protein level (Figs. 1 and 2B). Notably, for 11 of these 20 genes, the level of expression in NT was intermediate between NN and T (Figs. 1 and 2; Supplementary Table S2), suggesting a strong field defect around ESCC.
The remaining 10 genes contained groups of homologous genes that were all validated at the mRNA level: TNFα-induced protein 3 and 6 (TNFAIP3, TNFAIP6), collagens type III α 1 and type I α 2 (COL3A1, COL1A2), and C-C motif chemokine ligands 18 and 3 and 3-like 1 (CCL18, CCL3, CCL3L1). The presence of multiple members of the same family suggests that these are biologically relevant in the disease pathogenesis. TNFAIP3, COL3A1, and CCL18 were selected over their homologues for protein validation, as they displayed a better separation in expression between sample groups (Figs. 1 and 2C).
Therefore, a total of 6 genes (CCL18, COL3A1, and TNFAIP3, together with CHN1, CTSL, and TNC), all overexpressed in ESCC compared with normal samples (P < 0.0001) and with less than 20% overlap between these groups, were selected for protein validation (Figs. 1 and 3). It is interesting to note that the expression of 5 of these genes (CCL18 P = 0.0233, COL3A1 P = 0.0008, CHN1 P = 0.0004, CTSL P < 0.0001, TNC P < 0.0001) in normal esophagus from normal patients was lower than in normal esophagus from cancer patients. Again, this suggests some degree of field defect in pathologically normal epithelium adjacent to the cancer.
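The candidate numbers reported at each step of the selection can be checked with simple arithmetic. The sketch below uses only counts and gene names stated in the text; the variable names are introduced here for illustration.

```python
# Protein Atlas evaluation of the 50 top-ranked genes.
top_ranked = 50
expressed_in_normal = 9
stroma_only = 8
taken_to_qpcr = top_ranked - (expressed_in_normal + stroma_only)

# qRT-PCR exclusions: no primer pair, not detected, no difference.
no_primer_pair = 1
not_detected = 8
no_difference = 4
overexpressed = taken_to_qpcr - (no_primer_pair + not_detected + no_difference)

# Genes dropped because of marked overlap between sample groups.
marked_overlap = 10
remaining = overexpressed - marked_overlap

# One member was kept from each family of homologous genes.
families = {
    "TNFAIP": ["TNFAIP3", "TNFAIP6"],
    "collagen": ["COL3A1", "COL1A2"],
    "CCL": ["CCL18", "CCL3", "CCL3L1"],
}
dropped_homologues = sum(len(members) - 1 for members in families.values())
taken_to_protein = remaining - dropped_homologues
```

The totals agree with the text: 33 genes taken to qRT-PCR, 20 significantly overexpressed, 10 remaining after the overlap filter, and 6 taken to protein validation.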
Protein validation
Increased expression in ESCC compared with normal esophagus was validated at the protein level by IHC in histopathologically verified sections from the "matched" and "dysplastic" cohorts of samples, containing normal esophagus from ESCC patients (NT), ESCC (T), normal esophagus, and mild, moderate, and severe dysplasia. CCL18 was not statistically overexpressed in ESCC compared with matched normal (Fig. 1 and Supplementary Fig. S2). Although TNC was statistically upregulated in ESCC compared with normal esophagus (P < 0.0002), TNC was not expressed at all in nearly 38% of cases. Furthermore, when it was expressed, its expression was limited to small foci of tumor cells in 45% of samples (Fig. 1 and Supplementary Figs. S2 and S3). Therefore, both CCL18 and TNC were excluded from further IHC (Fig. 1 and Supplementary Figs. S2 and S3). COL3A1 (Fig. 4A) showed a significant difference (P = 0.0001) in staining across the normal, dysplastic, and cancer groups. Dunn's multiple comparisons test demonstrated that COL3A1 was overexpressed in all groups compared with normal esophagus, but no statistical differences were seen with increasing severity of dysplasia. COL3A1 was expressed mainly in the stromal compartment with very limited epithelial staining (Supplementary Fig. S3) and was therefore unlikely to be a suitable diagnostic biomarker. CTSL was expressed in both stroma and epithelium in normal and cancer samples (Fig. 4B), but no significant difference (P = 0.3586) was seen across the normal, dysplastic, and cancer groups. In contrast, staining of CHN1 (Figs. 4C and 5) extended progressively towards the superficial layers of the esophageal epithelium, with staining in the superficial third of the epithelium (i.e., a score of 3) seen in 25% of normal esophagus, 34% of mild dysplasia, 65% of moderate dysplasia, 84% of severe dysplasia, and 97% of ESCC samples (P < 0.0001).
Furthermore, CHN1 expression was statistically higher in moderate dysplasia, severe dysplasia, and ESCC compared with normal esophagus (P < 0.0001 for each comparison), as well as in severe dysplasia and ESCC compared with mild dysplasia (P < 0.0001 for each comparison). TNFAIP3 (Figs. 4D and 5) also demonstrated stronger staining with increasing dysplasia (P < 0.0001), with staining scores of ≥1 (i.e., at least 1 cell staining positively) in 16% of normal esophagus, 18% of mild dysplasia, 30% of moderate dysplasia, 55% of severe dysplasia, and 63% of ESCC samples. Expression was significantly higher in ESCC compared with normal esophagus, mild dysplasia, and moderate dysplasia (P < 0.001, P < 0.01, and P < 0.05, respectively).
Discussion
We have demonstrated using microarray data analysis followed by subsequent validation at the mRNA and protein level, that CHN1 and TNFAIP3 are candidate biomarkers for ESCC to aid in the diagnosis of dysplasia and carcinoma. Furthermore, a number of genes, which may play a role in the progression to ESCC, were also identified and the functional role of these genes would be interesting to explore in the future.
Although assessment of mRNA expression by qRT-PCR is notably cheaper and less labor intensive than IHC, thus allowing parallel throughput of multiple prospective biomarkers, RNA species are relatively unstable. Although some progress has been made using such approaches to understand the pathogenesis of SCC (for example, ref. 26), RNA species rarely survive the paraffin-embedding process. We therefore aimed to identify protein biomarkers that could be of clinical use with a standard technique such as IHC (27), in keeping with other protein biomarker approaches. However, we acknowledge that one drawback of IHC is its reliance on subjective interpretation.
Of the 50 markers selected for validation, only two were validated at the protein level using very stringent criteria. This may appear to be a low validation rate, especially given that 20 of 33 genes were validated at the mRNA level. It illustrates the difficulties faced when trying to identify biomarkers for a particular cancer. Although significant differences can be seen in mRNA expression, they do not necessarily translate to levels of protein expression. Furthermore, it is possible that the changes in mRNA expression only equate to protein level changes that are too subtle to detect by IHC. Although the expression level of TNFAIP3 and the staining extent of CHN1 increased along the progression from normal esophagus to SCC, neither marker is perfect at defining the dysplastic or cancer states. Combining both markers might, however, offer a specific and sensitive test for esophageal dysplasia and early squamous cell cancer of the esophagus.
There were some limitations to the microarray experiments conducted; however, these did not detract from the results obtained. The microarray experiments were not designed specifically to identify markers distinguishing between ESCC and normal esophagus, but rigorous statistical measures were employed to reduce the effect of this shortfall, which also reduced the number of putative genes. It is interesting to note that only 20 of 33 targets were validated by qPCR. Eight genes were excluded because their expression was undetectable in biopsy samples. This is most likely due to the amplification of the RNA extracted from samples in the microarray protocol (23), which could account for the observed difference in base expression values.
The expression level of 13 of the 20 genes increased gradually between normal esophagus from normal patients, normal esophagus from cancer patients, and cancer samples (Figs. 2 and 3). As the pathology of all samples was confirmed by an expert pathologist, it is unlikely that dysplasia or cancer was present in the normal samples from cancer patients. This intermediate level of SCC biomarkers suggests that a field defect exists in SCC patients. This field defect could be utilized diagnostically. Even in the event of biopsy that misses the area of cancer and/or dysplasia, an abnormal biomarker could still be detected and patients could be recalled for further investigation.
The biological reason for alterations in the expression of CHN1 and TNFAIP3 is worthy of further study. Chimerin 1 (CHN1) is expressed in neurons and is predominantly found in the cerebral cortex. CHN1 is a Rho GTPase-activating protein that plays a role in dendritic morphology (28) and axon guidance (29). Missense mutations in CHN1 have been associated with variants of Duane retraction syndrome (30, 31) and cranial nerve abnormalities (32). CHN1 may therefore play a role in cellular remodeling in dysplastic and cancer cells. TNFα-induced protein 3 (TNFAIP3) is a ubiquitin-editing enzyme, which inhibits NFκB and TNF-mediated apoptosis. It is associated with many autoimmune conditions (33–36) and has been noted to have tumor suppressor functions in lymphomas and colorectal cancer (37–40). However, TNFAIP3 also has oncogenic properties, with implication in tamoxifen resistance in breast cancer and developing resistance to apoptosis to promote cancer cell survival (41). It would be interesting to understand the role of TNFAIP3 in squamous cell cancer and its link with possible resistance to chemotherapy.
These biomarkers could also be applicable to non-endoscopic cytologic screening methods because, on a population-wide scale, endoscopy-based methods are not logistically or economically feasible due to their high cost and requirement for expertise (12, 15, 42, 43). Non-endoscopic cell sampling techniques are less invasive and less costly, although the sensitivity and specificity of cytologic assessment have been disappointing (12, 25, 44, 45). Coupling a pan-esophageal non-endoscopic cell collection device with analysis of biomarkers could improve diagnostic accuracy. Equally, biomarker analysis of endoscopic specimens could reduce both the requirement for histopathologic expertise and the risk of sampling bias because of the molecular field defect, thus potentially reducing both the procedure length and the number of samples required. Hence, biomarker-assisted analysis could reduce the cost of endoscopic diagnosis to a level where it could be considered for screening high-risk populations (15, 46).
This work is complementary to other efforts to identify biomarkers for the diagnosis of ESCC using a variety of techniques including methylation, array CGH, expression arrays, and proteomics (26, 27, 47, 48). Most of these have focused on invasive cancer, whereas our focus is on the detection of dysplasia with a clinically applicable method.
In summary, the biomarker discovery/validation pipeline successfully identified markers for esophageal squamous dysplasia and squamous cell carcinoma of the esophagus. A clinical study assessing the value of CHN1 and TNFAIP3 as diagnostic biomarkers in high incidence areas for ESCC would be the next step. In the context of ESCC screening, it is envisaged that these biomarkers could help the identification of patients with moderate or severe dysplasia that would benefit most from endoscopic treatments to prevent the development of invasive squamous cell cancers.
Disclosure of Potential Conflicts of Interest
R.C. Fitzgerald and P. Lao-Sirieix are named on patents related to a non-endoscopic device called Cytosponge that has been licensed to Medtronic.
Authors' Contributions
Conception and design: G. Couch, P. Lao-Sirieix, R.C. Fitzgerald
Development of methodology: G. Couch, P. Lao-Sirieix, R.C. Fitzgerald
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): G. Couch, J.E. Redman, S.M. Dawsey, P. Lao-Sirieix, R.C. Fitzgerald
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): G. Couch, J.E. Redman, L. Wernisch, R. Newton, P. Lao-Sirieix, R.C. Fitzgerald
Writing, review, and/or revision of the manuscript: G. Couch, J.E. Redman, S.M. Dawsey, P. Lao-Sirieix, R.C. Fitzgerald
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J.E. Redman, P. Lao-Sirieix
Study supervision: R.C. Fitzgerald
Other (analysis and scoring of immunohistochemistry slides): S. Malhotra
Grant Support
The Addenbrooke's Hospital Human Research Tissue Bank, supported by the NIHR Cambridge Biomedical Research Centre (5-4690), supported this study. R.C. Fitzgerald received funding from the NIHR (NIHR-RP-R2-12-011) and the Evelyn Trust (11/23). This study was also supported in part by the intramural research program of the NCI (HHSNZ 61201100483P). R.C. Fitzgerald has programmatic funding from the Medical Research Council (4050375780) and a National Institute for Health Research (NIHR) Professorship. The Fitzgerald Group also has infrastructure support from the Biomedical Research Centre (812039) and the Experimental Medicine Centre (C507/A15580).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.