Abstract
Lung cancer accounts for more cancer-related deaths in the United States and worldwide than any other malignancy. There are more than 90 million smokers in the United States who represent a significant population at elevated risk for lung malignancy. In other epithelial tumors, it has been shown that if neoplastic lesions can be detected and treated at their intraepithelial stage, patient prognosis is significantly improved. Thus, new strategies to detect and treat lung preinvasive lesions are urgently needed in order to decrease the overwhelming public health burden of lung cancer. Limiting these advances is a poor knowledge of the earliest events that underlie lung cancer development and that would constitute markers and targets for early detection and prevention. This review summarizes the state of knowledge of human lung cancer pathogenesis and the molecular pathology of premalignant lung lesions, with a focus on the molecular premalignant field that associates with lung cancer development. Lastly, we highlight new approaches and models to study genome-wide alterations in human lung premalignancy in order to facilitate the discovery of new markers for early detection and prevention of this fatal disease. Cancer Prev Res; 9(7); 518–27. ©2016 AACR.
Introduction
Lung cancer is the leading cause of cancer-related deaths in the United States and worldwide (1, 2). The 5-year survival rates for advanced late-stage lung cancer remain poor (∼5%; refs. 3, 4), leading to the supposition that early diagnosis of lung cancer or its treatment in premalignant stages will reduce the public health burden of this fatal malignancy (5–8). The vast majority (∼85%) of lung cancers are causally linked to tobacco carcinogen exposure (3, 4). Heavy smokers as well as patients who have survived a prior aerodigestive cancer comprise a high-risk population that may be targeted for early detection and chemoprevention efforts (7). Also, while the risk of developing lung cancer decreases after smoking cessation, the risk never returns to baseline (9). Therefore, novel approaches for early detection and treatment of lung cancer among former smokers are urgently needed. Curtailing these key advances is a limited understanding of the earliest events in lung cancer pathogenesis.
Lung cancer consists of several histologic types, including small-cell lung carcinoma (SCLC) and non–small cell lung carcinoma (NSCLC), with the latter comprising the majority of diagnosed (∼85%) lung tumors (4, 10). NSCLC is composed of the following three major histologic subtypes: lung adenocarcinomas (LUAD), lung squamous cell carcinomas (LUSC), and large-cell lung carcinomas (LCLC; refs. 3, 4, 10). The different lung cancer subtypes are thought to develop through diverse and unique molecular pathways. This supposition is supported by the divergent anatomical locations and suggested cells of origin of the different lung cancer subtypes. For example, compared with LUSCs and SCLCs that typically arise from the major bronchi and are centrally located, LUADs are thought to develop from small bronchi, bronchioles or alveolar epithelial cells and are typically peripherally located as reviewed elsewhere (10, 11).
Accumulating evidence suggests that malignant lung tumors develop through progressive pathologic changes known as preneoplastic or premalignant lesions (12, 13). Limited molecular aberrations (discussed below) have been described in lung premalignant lesions. Studies have shown that cytologically normal airway epithelial cells lining the entire respiratory tract and that have been exposed to smoking carry molecular alterations that may signify the onset of lung cancers (14, 15), a paradigm commonly referred to as the “airway field of injury” (refs. 14, 15; Fig. 1A). Additional studies have also pointed to molecular alterations (e.g., tumor-associated gene expression patterns) that are shared between NSCLCs and adjacent cytologically normal airway epithelia (referred to as molecular and adjacent airway field of cancerization; ref. 14; Fig. 1B). These premalignant fields can provide biologic insights into the development of lung tumors (refs. 14, 15; Fig. 1). Particularly, the airway field of injury, because it is widespread in the lung and precedes the development of tumors and premalignant lesions, also provides potential clinical opportunities for early detection and chemoprevention in high-risk smokers and in patients who have survived a cancer of the upper aerodigestive tract (14, 16).
It is clear that early detection of lung cancer or intraepithelial lesions and effective prevention are crucial to reducing lung cancer mortality. To date, however, large-scale lung cancer chemoprevention trials have yielded neutral, or even harmful, results (6, 17). Additionally, early discovery and diagnosis of lung cancers remain suboptimal (5). These limitations are largely attributed to the scarcity of reliable molecular targets (e.g., genomic loci or genes) against which to assess the efficacy of agents in clinical trials (6, 16, 17) and of markers for early detection of lesions (5, 18). Identification of these markers and targets is thought to rest heavily on understanding early lung oncogenesis. Despite this urgency, we still do not know the molecular mechanisms by which the airway field of injury evolves into lung cancer (Fig. 1C), and we are unable to identify which individuals at higher risk (e.g., heavy smokers) are more likely to develop these lesions. In particular, we still have no effective means to predict which premalignant lesions will progress to malignant lung cancer or how this process can be prevented or blocked.
Here, we review the current state of knowledge of the molecular pathology of lung preneoplasia and oncogenesis. The review also describes the airway field of injury phenomenon and how molecular alterations in this epithelial field are relevant to the biology of lung cancer and to early detection and prevention of the disease. We will also consider new approaches to better understand genome-wide changes in lung premalignant phases as potential means to identify new markers that can guide strategies for early detection and prevention of lung cancer.
Lung Cancer Pathogenesis
Multistep tumorigenesis, in which tumors develop through a series of progressive pathologic changes (preneoplastic or precursor lesions) with corresponding genetic and epigenetic aberrations, has been demonstrated in various organs, including the skin, lung, and colon (19–23). Hyperplasia, squamous metaplasia, squamous dysplasia, and carcinoma in situ (CIS) comprise changes in the large airways that precede or accompany invasive LUSC (10, 12, 13). Dysplastic squamous lesions themselves are typically categorized by different intensities—mild, moderate and severe—based on a continuum of cytologic and histologic atypical changes (10, 12, 13). The sequential preneoplastic pathologic and molecular changes associated with LUSCs point to a multistage manner in the development of this lung cancer subtype from the respiratory mucosa (22). On the other hand, studies have suggested that the accumulation of molecular abnormalities beyond a certain threshold, rather than the sequence of alterations, is the basis for development of the malignant phenotype (14). For example, sequential premalignant changes have been poorly documented for LUADs, neuroendocrine tumors, and SCLCs. Atypical adenomatous hyperplasia (AAH) is the only sequence of morphologic change identified so far for the development of invasive LUADs (12, 13, 24, 25). Diffuse idiopathic pulmonary neuroendocrine cell hyperplasia (DIPNECH), a rare lesion that includes local extraluminal proliferation in the form of tumorlets, is considered to be the precursor lesion for neuroendocrine carcinoids of the lung (26). Notably, there are no known preneoplastic lesions for the most common type of neuroendocrine lung tumors, SCLCs. Increasing our understanding of early events in lung cancer pathogenesis would allow us to better pinpoint the manner (multistep vs. linear) in which the malignant lung phenotype arises (27). 
Development of the lung malignant phenotype appears to be both due to stepwise, sequence-specific and multistage molecular pathogenesis and due to accumulation and combination of genetic and epigenetic abnormalities.
Molecular pathogenesis of LUAD
A large body of evidence suggests that at least two molecular pathways are involved in the development of LUAD, the Kirsten rat sarcoma viral oncogene (KRAS) and epidermal growth factor receptor (EGFR) pathways in smoker and non-smoker patients, respectively. Mutations in EGFR, particularly in-frame deletions of exon 19 and L858R and L861Q variants in exon 21, are strongly associated with never-smoking status, female gender, and East Asian ethnicity (28–31). On the other hand, mutations in KRAS are strongly associated with development of LUADs linked to tobacco consumption (4, 28, 32–35). Other commonly occurring point mutations and copy-number alterations (CNA) have been described in LUAD (32, 36, 37) and are reviewed elsewhere (38).
As mentioned before, AAH is the only sequence of morphologic change identified so far for the development of LUADs (12, 13, 24, 25) and there is consensus that the pathogenesis of many adenocarcinomas is largely unknown. Earlier studies have proposed that both Clara cells and alveolar type II (AT2) cells are the progenitor cells of LUADs (10, 39). However, accumulating evidence now strongly suggests that surfactant protein C (SPC) expressing AT2 cells, and not Clara (also known as club) cells, are the cell of origin of LUADs (40, 41). Because AAHs are difficult to distinguish from bronchioloalveolar carcinoma (BAC), new classification guidelines for LUAD have discontinued the BAC terminology, which is now replaced with adenocarcinoma in situ (AIS; for LUADs with pure lepidic growth) or minimally invasive adenocarcinoma (MIA; predominant lepidic growth with less than 5-mm invasion; ref. 42). Notably, AAHs are thought to be involved in the linear progression of cells of the “terminal respiratory unit” (TRU) to AIS and subsequently invasive LUADs (11, 27, 43) due to the expression of common genes between the TRU and AAH. These studies in East Asian LUAD patients have postulated that most, if not all, peripheral LUADs progress from alveoli through AAH as a preneoplastic lesion. On the other hand, other studies suggested that LUADs may arise from bronchiolar epithelium and small bronchi based on the finding of driver EGFR mutations (e.g., EGFR p.L858R) in bronchioles adjacent to EGFR-mutant LUADs (44, 45).
Molecular alterations shared between LUADs and AAHs provide further evidence that AAHs represent preneoplastic lesions in the course of LUAD pathogenesis (46). Previous studies have reported that ∼30% to 40% of AAHs display KRAS (codon 12) mutations (25). In addition, EGFR mutations are shared at similar frequencies between AAHs, AIS lesions, and LUADs, particularly in the sequence of lesions preceding development of the TRU subtype of LUADs (11). It is worth noting that EGFR mutations were detected in adjacent normal-appearing peripheral respiratory epithelium in 43% of EGFR-mutant LUADs but not in patients wild-type for the oncogene (44), suggesting that there may be different cells of origin (bronchiolar epithelium and AT2 cells) for EGFR-mutant LUADs. Other molecular aberrations that were identified in AAHs include overexpression of Cyclin D1, survivin, and ERBB2 oncoproteins (24, 47, 48), loss-of-heterozygosity (LOH) in chromosomes 3p (18%), 9p (CDKN2A), 9q (tuberous sclerosis complex 1/TSC1), 17q, and 17p (TP53; refs. 10, 49) and reduced expression of the tumor suppressor serine threonine kinase 11 (STK11 also known as LKB1; ref. 50). Epigenetic alterations including DNA methylation of CDKN2A and PTPRN2 were also reported in AAHs (51). Studies aimed at interrogating differential gene expression and copy-number profiles between low-grade lesions (e.g., AAH and AIS) and LUADs found that amplification of the EGFR oncogene was the predominant differential molecular feature between early lesions and LUADs (27).
Studies have also shown that AAH and AIS lesions express high levels of NKX2-1 (also known as TITF-1; ref. 43), a homeodomain-containing transactivating factor predominantly expressed in the terminal lung bronchioles and lung periphery (52, 53). In addition, NKX2-1 is crucial for branching morphogenesis during normal lung development (52–54) and transactivates the expression of surfactant proteins important for the differentiation of AT2 cells (55). NKX2-1 is commonly gained or amplified (14q13.3 locus) in LUAD pointing to a cell-lineage specific oncogenic function for this transcriptional factor in LUADs (32, 36, 56, 57). It is important to mention that while various reports have pointed to a growth-promoting role for NKX2-1 in human LUAD cells (56–58), studies in Kras(LSL-G12D/+);p53(flox/flox) mice demonstrated that Nkx2-1 suppresses murine lung tumor growth and metastatic potential in vivo (59).
Molecular pathogenesis of LUSC
It has been postulated that basal cells in the large airways exhibit pluripotent capacity following cigarette smoke exposure, giving rise to metaplastic and dysplastic squamous cells, which in turn function as precursors of squamous cell carcinomas (10, 12, 13). Earlier studies aimed at examining sequential molecular abnormalities in the pathogenesis of LUSC indicated that genetic abnormalities commence in histologically normal epithelium and augment with progression of histopathologic changes (22, 60, 61). Allelic losses at multiple 3p (3p21, 3p14, 3p22–24 and 3p12) chromosome sites and 9p21 (CDKN2A) in normal-appearing bronchial epithelia were described as the earliest detected changes in the sequence of pathogenesis of LUSCs (22, 61). Allelic imbalance in 8p21–23, 13q14 (RB1) and 17p13 (TP53) were detected in squamous preinvasive lesions (22, 61, 62). DNA methylation of CDKN2A is also found in early squamous preinvasive lesions with frequencies that increase with histopathologic progression (24% in squamous metaplasia and 50% in CIS; ref. 63). Vascular endothelial growth factor (VEGF) isoforms and VEGF receptors (VEGFR) have been found to be elevated in bronchial squamous dysplastic lesions compared with normal bronchial epithelia (64), supporting the notion that angiogenesis develops early in lung carcinogenesis and that these abnormalities provide a rationale for the development of targeted antiangiogenic chemoprevention strategies. Further, angiogenic squamous dysplasias (ASD), a subset of squamous dysplasias, exhibit vascular budding in the subepithelial tissues with elevated microvessel density pointing to an architectural rearrangement of the capillary microvasculature (65). Additionally, the presence of this lesion in high-risk smokers suggests that aberrant patterns of microvascularization may occur at an early stage of bronchial carcinogenesis (65, 66). 
Besides altered angiogenesis, other pathways, including fatty acids and retinoic acid signaling, are also shared between squamous preinvasive lesions and LUSCs and are reviewed elsewhere (67, 68).
The SOX2 oncogene, demonstrated to be amplified (3q26.3) in LUSCs, promotes survival of tumors with amplification of the gene (69, 70). Additionally, SOX2 was found to foster the growth of lung tumor cells, particularly cells of squamous histology, and to mediate stemness of lung cancer stem cells (69, 71, 72). Alterations in SOX2 are also evident in premalignant phases of LUSC development. Yuan and colleagues demonstrated that SOX2 protein expression was completely absent in the course of LUAD pathogenesis but was high in preinvasive squamous lesions and in LUSCs (73). McCaughan and colleagues demonstrated, by analysis of the 3q chromosomal region, that high-grade bronchial dysplasias but not low-grade lesions exhibited amplification of SOX2 (74). Notably, SOX2 gain was associated with clinical progression of high-grade preinvasive squamous lesions (74). In a manner analogous to NKX2-1 in LUADs, it is plausible that SOX2 functions as a lineage-restricted oncogene in the early pathogenesis of LUSCs (73, 74).
In contrast to studies in malignant tumors, genome-wide analyses of lung premalignancy have been extremely rare. It is conceivable that molecular profiling of premalignant lung lesions will substantially increase our understanding of early events in lung cancer pathogenesis. An important step in this direction was the study by Nakachi and colleagues in which single-nucleotide polymorphism (SNP) arrays were used to characterize novel chromosomal alterations in preinvasive dysplastic lesions (75). The study revealed known and novel recurrent chromosomal alterations in dysplastic lesions (75). These chromosomal regions included losses of putative tumor suppressor genes RNF20 and SSBP2 and amplification of the RASGRP3 oncogene (75). Notably, some of the chromosomal alterations were found in multiple premalignant lesions (75), indicative of widespread field effects (discussed further below). In another study by Ooi and colleagues, laser-capture microdissected and paired normal basal cells, premalignant lesions and LUSCs were analyzed by RNA-sequencing (RNA-seq; ref. 76). This study pinpointed gene expression profiles modulated early on between normal cells and premalignant lesions as well as profiles that changed between preneoplastic tissues and the LUSCs (76). Using pathways and gene network analyses, these expression changes were found to be indicative of TP53 inhibition and MYC activation (76). The study by Ooi and colleagues provides important clues into early mechanisms of LUSC pathogenesis and potential biomarkers for chemoprevention. It is important to note that the findings were limited by a small number of LUSC patients (n = 4; ref. 76), thus warranting future studies with larger sample sizes. In addition, similar studies that comprise orthogonal (e.g., DNA sequencing) analyses will allow the identification of drivers in the development of squamous preinvasive lesions and in their progression to malignant lung tumors.
SCLC pathogenesis
To date, there is no phenotypically identifiable preneoplastic lesion for SCLC. Compared with NSCLC, very little is known about early events in the molecular pathogenesis of SCLC. A study assessing LOH at several chromosomal sites and microsatellite instability in histologically normal and hyperplastic bronchial epithelia adjacent to lung tumors demonstrated a significantly higher incidence of alterations in epithelia near SCLCs than in those adjacent to NSCLCs (77). These earlier findings point to widespread and extensive molecular damage in normal tissues surrounding SCLCs and suggest that SCLC may develop directly from histologically normal or mildly abnormal epithelium without passing through a more complex histologic sequence. It is noteworthy that studies using genetically engineered mouse (GEM) models demonstrated that SCLC development is driven by inactivation of TP53 and RB1 (78) as well as by activation of the hedgehog pathway (79, 80). In a study using integrative genomic analyses, human SCLCs were found to exhibit known alterations in TP53 and RB1, recurrent mutations in the histone modifiers CREBBP, EP300, and MLL as well as amplifications in the FGFR1 oncogene (81). More recently, proteomic profiling studies have identified alterations in SCLC that comprise therapeutically viable targets, including PARP1 and DNA repair proteins (82, 83). Also, Mohammad and colleagues described the histone demethylase lysine-specific demethylase 1 (LSD1) as a therapeutic target for SCLC (84). It is noteworthy that the roles of these aforementioned alterations and targets in normal and premalignant phases of SCLC development are still poorly understood.
Field of Injury and Field Carcinogenesis
Earlier work by Danely Slaughter assessing oral premalignant and malignant lesions revealed widespread histologic abnormalities in epithelial tissues that were distant from the oral cavity (e.g., in esophagus and the lung) and at the margins of invasive cancers (85). The studies by Slaughter and colleagues point to a “field cancerization,” an effect in which the entire carcinogen-exposed epithelial field is primed for cancerization and tumor development (85). Field carcinogenesis has been described in various epithelial cell malignancies, including gastric, esophageal, hepatic, cervical, skin, and lung cancers (14, 86–88). In 1961, Auerbach and colleagues suggested that cigarette smoke induces extensive histologic changes in the bronchial epithelia of smokers, leading to widespread premalignant lesions throughout the respiratory epithelium, suggestive of a field effect (89). The advent of molecular high-throughput assays allowed the extension of Slaughter's concept to molecular changes, a phenomenon termed airway field of injury (15). It is now thought that the injured (e.g., by tobacco) airway epithelium exhibits various molecular alterations (e.g., aberrant gene expression, LOH) that precede, and may thus signal, the onset of primary lung cancer (15, 90–95). Because airways of smokers invariably exhibit some degree of inflammation and inflammatory-related damage, the field of injury in the lung may also be explained by both the direct effect of tobacco carcinogens and initiation of the inflammatory response, and this concept has been extensively reviewed elsewhere (15).
Various studies have suggested that adjacent normal-appearing airway epithelia exhibit molecular alterations that are characteristic of the nearby lung tumor, an effect referred to by our group and others as the molecular and adjacent airway field of cancerization (14, 96, 97). As mentioned before, detailed analysis of histologically normal, premalignant, and malignant epithelia from LUSC patients revealed that multiple, sequentially occurring allelic imbalance events such as LOH commence in clonally independent foci early in the multistage pathogenesis of LUSC (22, 61). Notably, 31% of histologically normal epithelium adjacent to lung tumors exhibited LOH at chromosomal regions 3p and 9p (22). LOH was also described in normal bronchial epithelia of smokers without cancer and, importantly, appeared to have persisted for many years after smoking cessation (92, 95). Activating mutations in the KRAS and EGFR oncogenes have also been described in histologically normal tissue adjacent to lung tumors (44, 98) and widespread mutations in TP53 were reported in bronchial epithelia of cancer-free smokers (99). Epigenetic alterations are also present in the airway field of injury (63, 100–102). Belinsky and colleagues demonstrated aberrant methylation of CDKN2A and DAPK in the normal-appearing bronchial epithelium of lung cancer cases and cancer-free controls (100). The same study also revealed a high level of concordance in CDKN2A methylation status between lung tumors and adjacent normal bronchial epithelium (100). Other genes that were described as methylated in the airway field of injury in cancer-free smokers include RAR-β2, RASSF1A, and GSTP1 (101, 102).
Studies that used high-throughput approaches, such as gene expression profiling, to interrogate the airway field of injury that is widespread throughout the lung have provided new insights into early events in lung cancer pathogenesis (103). Hackett and colleagues analyzed expression of 44 antioxidant-related genes and demonstrated significant upregulation of 16 of them in the airways of smokers relative to non-smokers (91). Normal-appearing bronchial epithelia of healthy cancer-free smokers were shown to exhibit global changes in gene expression when compared with epithelium in non-smokers (93). Importantly, irreversible changes in airway gene expression following several years of smoking cessation were described and were proposed to be associated with the persistent risk former smokers exhibit for developing lung cancer (90, 93). An important milestone in field of injury studies was the development of a gene-expression signature in cytologically normal mainstem bronchus epithelium that can distinguish smokers with lung cancer from those with benign disease (94). Later, Gustafson and colleagues derived a PIK3CA pathway activation signature, which was found to be elevated in the cytologically normal bronchial airway of smokers with lung cancer or with premalignant lesions relative to cancer-free controls. Importantly, among smokers with premalignant lesions, this PIK3CA signature decreased in the field of injury following chemoprevention with the PIK3CA inhibitor myoinositol, and changes in this airway gene-expression signature were associated with clinical response as measured by regression of the dysplastic lesion (104). More recently, Shaykhiev and colleagues demonstrated that airway basal cells in healthy smokers undergo reprogramming toward an embryonic stem cell–like phenotype and postulated that this process represents an early event in the development of lung carcinomas in smokers (105).
Expression profiling studies also demonstrated the wide anatomic extent of the molecular field of injury, revealing common alterations in bronchial, nasal, and buccal epithelia of smokers (106). Also, 119 genes were found to be modulated similarly by smoking in both bronchial and nasal epithelium (107), although the bronchial and nasal compartments also exhibited notable differences in gene expression.
Field carcinogenesis has also been shown to comprise aberrant modulation of microRNAs (miRNA; ref. 108). Global alterations in miRNAs were found in the airways of smokers relative to non-smokers (109). More recently, a study using microRNA-sequencing (miRNA-seq) strategy by Perdomo and colleagues identified a novel miRNA, miR-4423, that exhibited lineage-specific properties because it was found to be exclusively restricted in expression to the airway epithelium (110). This study also revealed that miR-4423 was decreased in lung tumors and inhibited features of the lung malignant phenotype in vitro and in vivo (110), suggesting that miRNAs also lie at the intersection between field carcinogenesis and early events in lung cancer pathogenesis.
The spatiotemporal dynamics of the airway field of injury and of the adjacent (to lung tumor) molecular field of cancerization were analyzed in recent studies by assessment of multiple field samples per patient. A study analyzing airway gene expression at several time points (up to 3 years) following surgery in definitively treated early-stage smoker NSCLC patients, who were enrolled in a phase II surveillance trial, provided novel insights into spatiotemporal characteristics of the airway field of injury (96, 111). Although this study was limited by lack of airway samples prior to surgery and when the tumor was still in situ, it identified candidate pathways and markers that were differentially modulated with time in the field (96, 111). These included increased bronchial expression of phosphorylated forms of the AKT1 and ERK1/2 oncogenes (111). These findings suggest that the airway field of injury exhibits temporal molecular changes that may be associated with lung cancer recurrence in surgically treated early stage patients (96). In another study by the same group, paired NSCLCs, uninvolved normal lung tissues and multiple normal-appearing airways in the adjacent field of cancerization were analyzed by whole-transcriptome profiling (97). Importantly, the study demonstrated that gene expression patterns in the adjacent airway field of cancerization were enriched with a signature that can distinguish smokers with lung cancer from smokers with benign disease (97). This study also identified notable spatial properties in the airway field of cancerization in which gene profiles of adjacent but not distant airway cells closely embodied expression patterns of the nearby NSCLC (97). One of these genes, LAPTM4B, a lysosome-associated oncogene, was later shown to be a novel activator of the NRF2 stress response pathway in lung cancer cells (112). 
Like the report by Gustafson and colleagues (104), these studies (97, 111, 112) suggest that airway field of injury and the relatively more adjacent or local field of cancerization comprise activation of therapeutically pliable signaling pathways, thus creating an opportunity for the development of early targeted chemoprevention strategies.
The above studies in field carcinogenesis hold the potential to directly impact clinical care by providing novel strategies for early detection and diagnosis (7). Although bronchoscopy is routinely used as an initial diagnostic tool in smokers with a suspicious lesion found on chest imaging, it is frequently non-diagnostic, often resulting in additional invasive testing, including surgical lung biopsy even when the lesion is benign (113, 114). To address this unmet clinical need, Whitney and colleagues extended the initial observations of a cancer-specific gene-expression signature in the mainstem bronchus (94) by developing a 17-gene classifier, derived from bronchial epithelial cells collected during bronchoscopy, able to discern current or former smokers with lung cancer from those without the malignancy (115). In a recent translational study by Silvestri and colleagues, the performance of the 17-gene classifier as a bronchial biomarker for lung cancer detection was prospectively validated in 639 suspect smokers undergoing bronchoscopy in two multicenter trials, Airway Epithelial Gene Expression in the Diagnosis of Lung Cancer (AEGIS) 1 and 2 trials (116). The classifier and bronchoscopy exhibited a combined sensitivity of 97% for diagnosing lung cancer in both cohorts, with high sensitivity for the classifier irrespective of size, location, cell type, or stage of the cancer (116). Notably, among smokers with a non-diagnostic bronchoscopy and an intermediate pretest probability of malignancy (where the physician is most uncertain about cancer status), the negative predictive value of the classifier was 91%, potentially impacting clinical decision making by allowing patients without lung cancer to be followed with CT surveillance rather than undergo invasive biopsies (116).
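The performance measures quoted above (sensitivity, negative predictive value, and the combined sensitivity of two sequential tests) follow from simple counts. The sketch below is only illustrative: the class and function names are hypothetical, the counts are invented for the example, and none of the values are taken from the AEGIS trials.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a test result with the final diagnosis."""
    tp: int  # cancer present, test positive
    fp: int  # cancer absent, test positive
    tn: int  # cancer absent, test negative
    fn: int  # cancer present, test negative


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of patients with cancer who test positive."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of patients without cancer who test negative."""
    return c.tn / (c.tn + c.fp)


def npv(c: ConfusionCounts) -> float:
    """Negative predictive value: fraction of negative results that are truly cancer-free."""
    return c.tn / (c.tn + c.fn)


def combined_sensitivity(sens_first: float, sens_second_in_missed: float) -> float:
    """Sensitivity of 'either test positive' when the second test is applied
    to the cancers that the first test missed."""
    return sens_first + (1.0 - sens_first) * sens_second_in_missed


# Hypothetical counts, for illustration only (not trial data).
example = ConfusionCounts(tp=90, fp=40, tn=60, fn=10)
print(f"sensitivity={sensitivity(example):.2f}, "
      f"specificity={specificity(example):.2f}, NPV={npv(example):.2f}")
```

Because the negative predictive value depends on how common cancer is in the tested group, the same classifier yields different values in low-, intermediate-, and high-pretest-probability patients, which is why such results are reported for a specified risk stratum.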
Emerging Concepts and Approaches to Study Lung Cancer Pathogenesis
The application of standard genomic technologies (e.g., arrays and sequencing) to survey genome-wide alterations has largely enhanced our understanding of the biology of malignant lung tumors. In sharp contrast, our understanding of the molecular pathology of the airway fields of injury and cancerization and of lung premalignant lesions is extremely limited. Understanding the landscape of genomic changes in normal or premalignant phases in lung oncogenesis may provide valuable insights into the earliest events that drive the pathogenesis of lung cancer.
Deep next-generation sequencing
Next-generation sequencing (NGS) technology, through whole-genome, whole-exome, and whole-transcriptome approaches, holds great promise for providing invaluable insights into lung cancer biology, diagnosis, prevention, and therapy (117). NGS enables the sequencing of expressed genes, exons, and complete genomes, providing data on expression levels (with a substantially larger dynamic range than array technology), sequence alterations, and single-nucleotide variations as well as structural genomic aberrations (117). While various reports have used NGS to interrogate the molecular pathology of malignant lung tumors (32, 36–38, 118, 119), similar studies in premalignant lung tissues have been very limited.
Beane and colleagues interrogated the transcriptome of airway epithelia from a small number of smokers with and without lung cancer by RNA-seq (120). This study identified markers and operative signaling cues, in the airway field of injury, that appear pertinent to lung oncogenesis (120). More recently, using deep-sequencing technology, Perdomo and colleagues identified a novel lineage-specific miRNA, miR-4423, that is primarily expressed in normal airway epithelium (110). Following RNA-seq, the study then demonstrated that miR-4423 regulated airway epithelial cell differentiation in vitro and suppressed features of the lung malignant phenotype in vitro and in vivo (110). Recently and as described above, RNA-seq was used to interrogate genes that are differentially expressed between paired normal epithelia, preinvasive squamous lesions, and LUSCs (76). This study identified genes modulated “early” between normal epithelia and preneoplasia as well as markers modulated later in the progression of the lesions to LUSCs, although findings were limited by the small cohort size (76). Adequately powered NGS studies of the airway field and lung preneoplasia will undoubtedly expand our understanding of the biology of lung cancer pathogenesis.
Intra-lesional heterogeneity
Concepts in evolutionary biology have provided important insights into the evolution and progression of cancer (121, 122). Earlier work by Nowell suggested that tumor cells acquire mutations, some of which provide survival advantages to selected clones of cells (123). Accumulating evidence points to a branched evolution pattern for tumor pathogenesis in which subclones expand independently, acquiring different mutations over time (121, 122). In addition, these multiple subclonal populations can coexist (121). Evidence of heterogeneity among tumor cells was first observed by molecular pathologists through cytogenetic analysis and examination of karyotypes (121, 124, 125). Heterogeneity was also previously discerned genetically by the identification of differential copy-number signals among different cells in the same tumors (reviewed in ref. 126). With the advent of NGS technologies, intratumoral heterogeneity (ITH) can now be assessed genome wide at the single-nucleotide level (121). Genomic ITH has now been described in various malignancies, including lung cancer (121, 127, 128). Zhang and colleagues used whole-exome sequencing to survey the spectrum of ITH in 11 early-stage LUADs (128). Notably, this study alluded to an association between the extent of ITH and post-surgical relapse, although findings were limited by the relatively small cohort and warrant validation in additional studies (128). It is tempting to speculate that paired analysis of the molecular spectrum of ITH in matched normal-appearing airway epithelia, premalignant lesions, and primary NSCLCs will shed light on shared genealogy among the different stages of lung oncogenesis. It is conceivable that smoking-exposed airway epithelia in the fields of injury/cancerization or premalignant lesions may exhibit cellular subclones with different mutational patterns, and thus variable capacity to progress to NSCLC. Resolving these features will allow us to better understand the spatiotemporal dynamics of lung cancer evolution.
Identification of subtle alterations in lung premalignancy
Airway epithelia with field carcinogenesis or premalignant lesions present unique challenges when profiling their molecular changes via standard genomic technologies. These technologies were originally designed for scenarios of higher purity of the targeted tissue, or at least more balanced contributions of the components of interest, i.e., the cells of the premalignant lesion harboring somatic mutations (129, 130). When the proportion of aberrant cells in the targeted tissue is low, analogous to low tumor purities (“cellularity,” or proportion of DNA harboring an aberration), the task of identifying alterations becomes exceedingly difficult. For the normal-appearing airway fields of injury/cancerization and preinvasive lesions, genomic alterations will certainly be subtle, and in some cases rare (within samples and across individuals).
There are two main sources of information when assessing evidence for CNAs: (i) total copy-number gains and losses from allelic intensities (from SNP microarrays) or read coverage (from NGS); and (ii) imbalance in allele-specific signals (B-allele frequencies; BAF) at germline heterozygous loci (131). Generally, standard computational methods may discern BAF dispersions when the aberration proportions are above 10% to 15% (132–135). However, methods that model the dispersion in BAFs induced by a CNA ignore critical information in subtle shifts of the BAF values (136). Because it is likely that a single chromosomal event, such as a CNA (i.e., a loss, gain, or copy-neutral loss of heterozygosity, cn-LOH), induced the allelic imbalance in a particular contiguous region on a chromosome, the direction in which the BAFs move from the canonical value for a heterozygote (one-half) is not random (136). They instead move according to the allelic composition of an inherited chromosome, also known as a haplotype (136). Thus, methods that model the distribution of BAF values jointly can gain considerable power, particularly at low cell fractions, over standard algorithms that ignore this information and consider only the magnitude of the shifts but not their direction (136). Characterizing precancerous lesions presents greater challenges than characterizing cancer genomes, and incorporation of haplotype information in this space is even more critical.
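To give a sense of how small these BAF shifts become at low aberrant cell fractions, the following minimal Python sketch computes the expected BAF at a germline heterozygous locus under a simple mixture model. It is an illustration only and is not taken from the cited methods: it assumes a sample made of normal diploid cells plus a single aberrant clone carrying a one-copy loss, a one-copy gain, or cn-LOH, and the function name `expected_baf` and its parameters are hypothetical.

```python
def expected_baf(fraction, event, b_on_favoured=True):
    """Expected BAF at a germline heterozygous (A/B) SNP in a mixed sample.

    fraction: proportion of cells carrying the aberration (0 to 1).
    event: "loss", "gain", or "cnloh" (assumed single-copy events).
    b_on_favoured: True if the B allele lies on the haplotype that ends up
        over-represented (retained after a loss, or duplicated).
    """
    # (B-allele copies, total copies) in aberrant cells when B is on the
    # over-represented haplotype. Illustrative assumption, not a fitted model.
    favoured = {"loss": (1, 1), "gain": (2, 3), "cnloh": (2, 2)}
    b_aberrant, total_aberrant = favoured[event]
    if not b_on_favoured:
        # B sits on the other haplotype, so it gets the remaining copies.
        b_aberrant = total_aberrant - b_aberrant
    # Normal cells contribute one B copy out of two total copies.
    b_copies = fraction * b_aberrant + (1 - fraction) * 1
    total_copies = fraction * total_aberrant + (1 - fraction) * 2
    return b_copies / total_copies


if __name__ == "__main__":
    for event in ("loss", "gain", "cnloh"):
        up = expected_baf(0.05, event, b_on_favoured=True)
        down = expected_baf(0.05, event, b_on_favoured=False)
        print(event, round(up, 4), round(down, 4))
```

Under these assumptions, a 5% aberrant cell fraction moves the expected BAF away from one-half by only about 0.013 for a loss or gain and by 0.025 for cn-LOH, and the sign of the shift at each SNP depends on which haplotype carries the B allele. This is the directional information that magnitude-only approaches discard.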
Recently, computational methods have been developed that incorporate this information and deal directly with uncertainty in haplotype estimation, namely hapLOH (136) and J-LOH (137). These highly sensitive haplotype-based computational methods can deal with low-cellularity samples, or situations where the aberrant cell fraction is below 5%, by assessing consistency between BAF values and inherited chromosomes, estimated statistically (138, 139), capturing the direction of the BAF values (136, 137, 140, 141). This extra level of specificity is critical when attempting to characterize phenomena that are rare (infrequent along the genome or in few samples) and/or subtle (occurring in a low proportion of the cells from which the DNA was obtained), settings anticipated in the normal-appearing airway field of injury or in preneoplasia. When applied to study chromosomal alterations in the airway field of cancerization, these haplotype-based methods may be able to discern subtle events of allelic imbalance in the adjacent normal-appearing airway that are also present in the nearby NSCLC as well as within different multiregion intratumoral biopsies within the same early-stage NSCLC tumor. Applying these sensitive methods to infer the order of somatic events should lend valuable insights into studies of field carcinogenesis and preneoplasia and offer windows into the earliest genomic changes associated with lung cancer pathogenesis and, thus, targets for chemoprevention.
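The gain in power from using the direction of BAF shifts can be illustrated with a toy simulation. The Python sketch below is not the hapLOH or J-LOH algorithm: it assumes the haplotype phase at every SNP is known without error (real methods must estimate phase statistically and handle phasing errors), it assumes Gaussian noise with a known standard deviation, and all function names and parameter values are hypothetical. It contrasts a phase-aware statistic, which averages signed BAF deviations, with a magnitude-only statistic, which averages absolute deviations.

```python
import math
import random


def simulate_region(n_snps, shift, noise_sd, seed=0):
    """Simulate BAFs at heterozygous SNPs in one region with allelic imbalance.

    Each SNP gets a phase of +1 or -1 (which haplotype carries the B allele);
    the BAF moves up or down from 0.5 by `shift` according to that phase.
    """
    rng = random.Random(seed)
    phases, bafs = [], []
    for _ in range(n_snps):
        phase = rng.choice((1, -1))
        phases.append(phase)
        bafs.append(0.5 + phase * shift + rng.gauss(0.0, noise_sd))
    return phases, bafs


def phase_aware_z(phases, bafs, noise_sd):
    """Z-score of the mean phase-signed deviation (mean 0 if no imbalance)."""
    n = len(bafs)
    mean_signed = sum(p * (b - 0.5) for p, b in zip(phases, bafs)) / n
    return mean_signed * math.sqrt(n) / noise_sd


def magnitude_only_z(bafs, noise_sd):
    """Z-score of the mean absolute deviation against its half-normal null."""
    n = len(bafs)
    mean_abs = sum(abs(b - 0.5) for b in bafs) / n
    null_mean = noise_sd * math.sqrt(2 / math.pi)
    null_sd = noise_sd * math.sqrt(1 - 2 / math.pi)
    return (mean_abs - null_mean) * math.sqrt(n) / null_sd


if __name__ == "__main__":
    # A shift of 0.0125 roughly matches a loss or gain in ~5% of cells.
    phases, bafs = simulate_region(n_snps=2000, shift=0.0125, noise_sd=0.04)
    print("phase-aware z:", phase_aware_z(phases, bafs, 0.04))
    print("magnitude-only z:", magnitude_only_z(bafs, 0.04))
```

In this idealized setting, the signed deviations accumulate across the region while the absolute deviations are dominated by noise, so the phase-aware statistic is expected to be much larger than the magnitude-only statistic for the same data. With imperfect phasing the advantage would be smaller, which is why the cited methods model uncertainty in haplotype estimation explicitly.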
Concluding Remarks
Despite numerous efforts that have centered on characterizing molecular profiles in lung cancer, this malignancy still accounts for the biggest fraction of cancer-related deaths in the United States and worldwide (1, 2, 7). Despite encouraging findings from the National Lung Screening Trial (142, 143), early detection of lung cancer is challenging due to the lack of biomarkers for the early diagnosis of the disease and due to the presence of multiple neoplastic molecular pathways that mediate lung carcinogenesis (5, 18, 103, 114). Moreover, little progress has been made in personalized and targeted chemoprevention strategies for this fatal malignancy (6, 17). Preneoplastic changes have been used as surrogate endpoints for chemopreventive studies. However, it was suggested that this “shooting in the dark” approach may explain the general failures of clinical chemoprevention studies (16). Therefore, novel approaches to identify the best population to be targeted for early detection and chemoprevention should be devised, and risk factors for lung cancer development or relapse need to be better defined. Achieving this rests heavily on understanding early events (e.g., field carcinogenesis) in the molecular pathogenesis of lung cancer. This notion is exemplified by the recent and encouraging findings of the prospective AEGIS-1 and -2 clinical trials demonstrating sensitive and specific detection of lung cancer based on a molecular gene classifier derived from relatively accessible normal airway cells of suspect smokers (116). Comprehensive analysis of early molecular events in lung cancer pathogenesis will undoubtedly unravel biomarkers that can help early detection and prevention efforts deliver on their longstanding promise to oppose lung cancer.
Disclosure of Potential Conflicts of Interest
A.E. Spira is a consultant/advisory board member for Veracyte Inc. No potential conflicts of interest were disclosed by the other authors.
Grant Support
This study was supported in part by Department of Defense (DoD) grants W81XWH-10-1-1007 (H. Kadara, I.I. Wistuba, and A. Spira) and W81XWH-11-2-0161 (A. Spira), NIH grant R01HG005859 (P. Scheet), and Cancer Prevention and Research Institute of Texas (CPRIT) award RP150079 (P. Scheet and H. Kadara).