Understanding the earliest molecular and cellular events associated with cancer initiation remains a key bottleneck to transforming our approach to cancer prevention and detection. While TCGA has provided unprecedented insights into the genomic events associated with advanced stage cancer, there have been few studies comprehensively profiling premalignant and early-stage disease or elucidating the role of the microenvironment in premalignancy and tumor initiation. In this article, we make a call for development of a “Pre-Cancer Genome Atlas (PCGA),” a concerted initiative to characterize the molecular alterations in premalignant lesions and the corresponding changes in the microenvironment associated with progression to invasive carcinoma. This initiative will require a multicenter coordinated effort to comprehensively profile (cellular and molecular) premalignant lesions and their corresponding “field of injury” collected longitudinally as the lesion progresses towards or regresses from frank malignancy across multiple tumor types. Genomic characterization of alterations in premalignant lesions and their microenvironment, for both bulk tissue and single cells, will enable development of biomarkers for early detection and risk stratification as well as allow for the development of novel targeted cancer interception strategies. The multi-institutional and multidisciplinary collaborative “big-data” effort underlying the PCGA will help usher in a new era of precision medicine for cancer detection and prevention. Cancer Prev Res; 9(2); 119–24. ©2016 AACR.

One of the critical barriers to developing new approaches for cancer detection and prevention is the lack of understanding of the key molecular and cellular changes that cause cancer initiation and progression. Unlike the extensive work that has been done profiling advanced stage tumors, few studies have comprehensively profiled the genomic alterations found in precancerous tissues. Premalignant lesions are currently characterized by histologic changes that precede the development of invasive carcinoma (1, 2). These lesions can often be identified in regions surrounding an invasive tumor, in biopsies taken from patients undergoing diagnostic evaluation for suspicion of cancer, or in samples acquired during preventive screening. Currently, limited metrics exist to distinguish lesions that will likely progress to carcinoma and require intervention from those that will naturally regress or remain stable (3, 4). As imaging modalities and screening guidelines advance, the number of lesions identified will grow, resulting in a need for more precise risk stratification methods and effective early intervention. Characterization of the molecular alterations in premalignant lesions and the corresponding changes in the microenvironment associated with progression would hasten the development of biomarkers for early detection and risk stratification as well as suggest preventive interventions to reverse or delay the development of cancer.

In this article, we make a call for the development of a new collaborative initiative, the “Pre-Cancer Genome Atlas (PCGA),” in which comprehensive genomic profiling of premalignant lesions and their corresponding field of injury is performed longitudinally and combined with clinical data including histology and outcome (progression/regression). Just as The Cancer Genome Atlas (TCGA) has ushered in a new era of precision treatment for advanced stage cancers, we envision the PCGA leading to a new era of personalized approaches for early cancer detection and prevention.

Although the progression of cancer was initially described pathologically, the molecular processes that guide cancer initiation and development are continually being unraveled. The principle governing cancer development is the same process active in most domains of biology: evolution. Genetic alterations in individual cells occur randomly due to replication errors or as a result of exposure to carcinogens. Mutations, copy number changes, and potentially epigenetic alterations can alter the ability of the cell to proliferate and survive in different environments. A mutation, for example, can confer a selective advantage, allowing the cell and its progeny to proliferate and gradually out-compete other cells lacking the same alteration. Genetic alterations in different molecular pathways can alter various cellular phenotypes or “hallmarks.” The acquisition of a hallmark may occur by altering any one of several genes within the same underlying molecular pathway. Many cancer hallmarks have been characterized, including sustained cellular growth and proliferation, resistance to cell death, replicative immortality through telomere maintenance, and avoidance of immune surveillance, among others (5). The mechanisms driving invasion can be thought of as a multistep evolution in which the cumulative acquisition of driver genetic alterations allows the cells to exhibit multiple hallmarks and invade the proximal tissue.
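
The out-competition described above can be illustrated with a toy calculation. The sketch below is not a model from this article or its references; it assumes a textbook deterministic selection model in which cells carrying an alteration leave (1 + s) offspring for every offspring of an unaltered cell. The function name, the starting fraction, and the value of s are illustrative assumptions only.

```python
def mutant_fraction_over_time(initial_fraction, selective_advantage, generations):
    """Return the altered clone's fraction per generation (index 0 = start).

    Assumption: a simple deterministic selection model, not a fitted
    description of any real tissue.
    """
    fractions = [initial_fraction]
    p = initial_fraction
    for _ in range(generations):
        # Altered cells leave (1 + s) offspring per unaltered offspring;
        # dividing by the mean fitness keeps the two fractions summing to 1.
        p = p * (1 + selective_advantage) / (1 + p * selective_advantage)
        fractions.append(p)
    return fractions


# Hypothetical numbers: 1 altered cell in 1,000 with a 10% growth advantage.
trajectory = mutant_fraction_over_time(
    initial_fraction=0.001, selective_advantage=0.1, generations=200
)
for generation in (0, 50, 100, 200):
    print(f"generation {generation}: altered fraction = {trajectory[generation]:.4f}")
```

Under these assumptions, even a modest advantage lets a rare clone come to dominate the population, which is the sense in which a single alteration can seed a premalignant clonal expansion. Real tissues add spatial constraints, immune pressure, and random drift that this sketch ignores.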

To characterize the molecular alterations associated with cancer, the TCGA consortium has, over the course of the last 10 years, performed “multi-omic” profiling on over 11,000 advanced stage tumors from over 33 tumor types, including DNA sequencing for mutation detection, SNP microarrays for copy number analysis, RNA sequencing for fusion detection and gene expression analysis, methylation profiling for epigenomic alterations, reverse-phase protein arrays for protein quantification, and small RNA sequencing for miRNA expression analysis (6). These efforts have had a major impact in two areas. First, the identification of genes recurrently altered within and across tumor types has extended the number of putative cancer driver genes from several dozen to several hundred. For example, Lawrence and colleagues examined mutations and indels for 21 tumor types and identified over 250 genes as mutated significantly more often than expected by chance (7). Likewise, Zack and colleagues examined focal copy number alterations across 11 tumor types and observed 140 regions recurrently gained or lost, many with novel putative cancer genes (8). Second, TCGA efforts have led to molecular reclassification via multi-omic clustering. Most tumor types had previously been stratified into subgroups using histologic characteristics alone. However, in several tumor types, clustering across global gene expression, copy number, and methylation patterns revealed molecular subgroups largely distinct from their histologic counterparts. In lower-grade gliomas (LGG), for example, histologic classification suffers from observer variability and does not sufficiently predict clinical outcomes. Clustering of multi-omic data in LGG revealed 3 robust molecular subclasses defined by combinations of IDH1/IDH2 mutation status, 1p/19q codeletion, and TP53 mutation status (9). These molecular subtypes were strongly associated with prognosis and other distinct clinical characteristics beyond standard histology, suggesting they should be incorporated into clinical practice.

Despite the significant advances in the genomic characterization of advanced stage disease, a number of critical questions remain. Given the evolutionary model that a specific sequence of genomic events acquired over many years causes the transition from normal epithelium to invasive carcinoma (10), having a complete catalog of driver genes for each tumor type is only the first step in understanding cancer progression. Indeed, many driver genes are often altered within an advanced stage tumor; however, the order in which the events occurred can be difficult to ascertain. In some circumstances, the predicted clonality of the mutations can be used to infer early events. For example, clonal mutations harbored by all cancer cells occur earlier in the route to frank malignancy than subclonal mutations present only in a subset of cells (11, 12). This approach can struggle to reveal the path of progression in some tumor types because mutations in many driver genes are predicted to be clonal. Comprehensive genomic profiling of longitudinally sampled premalignant lesions as they progress toward cancer (as detailed below) will provide critical insights into the sequence of molecular events that drive progression to invasive cancer (Fig. 1). This molecular reclassification of premalignancy will greatly improve our ability to predict which lesions are at higher risk of progressing to invasive carcinoma and allow for the development of novel targeted early interventional and therapeutic strategies.
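
To make the clonal versus subclonal distinction concrete, the following sketch converts a variant allele fraction into an estimated cancer cell fraction. It is a simplified point estimate under stated assumptions, not the probabilistic method of the cited studies (11, 12): tumor purity and local copy number are taken as known, the mutation is assumed to sit on a fixed number of copies, and the 0.9 cutoff for calling a mutation clonal is an arbitrary illustrative choice. All function and parameter names are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, tumor_copy_number,
                         mutated_copies=1, normal_copy_number=2):
    """Point estimate of the fraction of cancer cells carrying a mutation.

    Assumes purity and copy number are known; real methods model the
    uncertainty in all of these quantities.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies of the locus per cell in the mixed sample.
    mean_copies = purity * tumor_copy_number + (1 - purity) * normal_copy_number
    ccf = vaf * mean_copies / (purity * mutated_copies)
    return min(ccf, 1.0)  # sampling noise can push the raw estimate above 1


def classify_clonality(ccf, clonal_threshold=0.9):
    """Label a mutation using an assumed, illustrative threshold."""
    return "clonal" if ccf >= clonal_threshold else "subclonal"


# Hypothetical sample: 70% tumor purity, diploid locus.
for vaf in (0.35, 0.10):
    ccf = cancer_cell_fraction(vaf, purity=0.7, tumor_copy_number=2)
    print(f"VAF {vaf:.2f} -> CCF {ccf:.2f} ({classify_clonality(ccf)})")
```

When most driver mutations in a tumor are classified as clonal in this way, their relative order cannot be resolved from the tumor alone, which is the limitation noted above.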

Figure 1.

Normal cells (orange), often after long-term exposure to carcinogens, can acquire changes in key molecular pathways that cause uncontrolled proliferation (light purple), leading to a premalignant or precancerous lesion. After accumulating additional alterations, cells are able to invade surrounding tissues, forming a tumor (dark purple); however, in some cases, precancerous lesions naturally regress, as marked by the bidirectional arrows. Changes in the microenvironment may also contribute to a lesion's progression or regression. For example, functional immune cells (blue) with the capability of recognizing and destroying abnormal cells may eventually be repressed (black) by activation of immune checkpoints. We propose the “Pre-Cancer Genome Atlas (PCGA)” initiative to comprehensively profile molecular alterations in premalignant lesions and the surrounding microenvironment throughout the stages of progression towards or regression from invasive cancer. In tissues that can be accessed via relatively noninvasive procedures, premalignant lesions can be sampled longitudinally to better understand the relationship between molecular alterations, progression/regression, and other clinical outcomes. In tissues that are difficult to access, these premalignant lesions can be cross-sectionally sampled in regions surrounding resected tumors to gain insights about the evolutionary history of the malignant cells. By identifying and understanding the initial events that drive initiation, progression, regression, and invasion, we will advance early detection biomarkers and identify effective intervention strategies to reduce the number of individuals with aggressive advanced stage disease.

Another critical component in any evolutionary process is the selective pressure imposed by the environment. As the majority of advanced stage tumor profiling has been done on bulk tumor tissue, characterizing the role of immune and stromal cell populations in the process of carcinogenesis has been challenging. In the past, only cell types with specific markers could be systematically identified by either IHC or advanced flow cytometry approaches in combination with gene expression analysis. However, the ability to agnostically characterize all cell populations within a sample is now achievable with the advent of single-cell RNA sequencing. With this technology, the expression state of each individual cell can be measured and used to determine both the cell type and the molecular pathways that may be active within the cell. Recent successes with cancer immunotherapies, such as antibodies blocking PD-1 or PD-L1 that are currently being developed for the treatment of over 30 cancer types (13), underscore the importance of characterizing the contributions of the different cell types in cancer development. Characterization of the immune cell populations in premalignant lesions with progressive versus regressive phenotypes is an opportunity to provide unprecedented insight into the role of the microenvironment in determining cancer initiation and progression, a critical step towards development of immunoprevention strategies.

Genomic characterization of premalignant lesions using bulk tissue or single cells will help elucidate the mechanisms of disease progression. In turn, these findings can be exploited to develop biomarkers to inform cancer screening/early detection strategies and treatment of cancer at the earliest stages. Identification of premalignant disease processes and their likelihood of progression may help prioritize individuals for cancer screening and dictate the appropriate screening intervals. Biomarker-driven cancer screening has the potential to maximize early detection while minimizing false positives that incur added costs as well as increase an individual's radiation exposure and/or procedure-related complications. Cancer prevention clinical trials could also utilize biomarkers of premalignant disease to select subjects with a high likelihood of progression or response. Molecular selection by adding premalignant biomarkers to the trial entrance criteria has recently been reported to greatly reduce the number of subjects needed to test drug efficacy (14). Biomarkers may also be used in addition to histology to monitor treatment response, thereby increasing trial speed and reducing trial cost. Improvements such as these could bring more prevention and targeted agents to the clinic, perhaps reducing the number of people who go on to develop cancer.

Studies demonstrating the ability of genomic profiling to provide insights into the biology of premalignancy have been recently reviewed (15). We summarize several recent examples that set the stage for a larger PCGA initiative outlined below. Stachler and colleagues characterized two major paths of esophageal adenocarcinoma development by performing whole exome sequencing on tumor and adjacent Barrett esophagus within the same patient (16). In contrast to previous hypotheses, the majority of esophageal adenocarcinomas developed by first obtaining a TP53 mutation, followed by whole genome doubling and genomic instability, and finally oncogene amplification resulting in frank malignancy. The remainder of tumors displayed progressive inactivation of tumor suppressors such as CDKN2A and SMAD4, followed by oncogene activation, and genome instability. Interestingly, some patients had lesions with different sets of somatic alterations suggesting that they had formed independently (i.e., were clonally unrelated). These results suggest that extensive sampling of suspect areas is necessary to accurately capture the diversity of alterations and that comprehensive methodologies capable of detecting complex events such as whole genome doubling are needed.

Shain and colleagues utilized targeted sequencing of cancer genes in primary melanomas and adjacent precursor lesions to uncover the order of key driver events (17). The well-characterized and targetable mutation BRAF V600E, in addition to other mutations in the MAPK pathway, was substantially enriched in benign lesions, suggesting these are early events in melanoma carcinogenesis. Alterations in other common driver genes, such as CDKN2A loss, TERT promoter mutations, or mutations in the SWI/SNF chromatin modifiers ARID1A, ARID2, or SMARCA4, were observed only in intermediate or later stages of disease. In the PCGA initiative, it will be important to sample various premalignant histologies, as was done by Shain and colleagues, to inform our understanding of cancer pathogenesis, the risk of cancer development, and how close to invasive malignancy a lesion may be.

Somatic mutations have been observed in tissue without clear histologic evidence of cancer. Mutations in genes such as DNMT3A, TET2, and ASXL1 were found in the blood of subjects without any appreciable hematologic abnormalities at the time of sample collection (18). However, the presence of somatic mutations was associated with an increased risk of developing a hematologic malignancy as well as an increase in overall mortality as determined by longitudinal follow-up. These results support the notion that many people may already have a “first hit” that can produce a premalignant clonal expansion of cells. This raises the question of when clinical intervention should be applied in the premalignant setting. The number of hits needed to warrant clinical intervention may differ among cancer types and may depend on a concurrent understanding of the interactions between precancerous cells and the immune system. We propose that these questions can be most readily answered with a combination of longitudinal sampling, molecular profiling, and thorough clinical characterization, ideally throughout the entire process of cancer development.

Advances in genomic profiling pioneered by TCGA and related studies have largely overcome many of the technical profiling challenges. The largest obstacle impeding the understanding of cancer initiation and progression and the development of early detection tools is the lack of systematic collection, annotation, and profiling of premalignant lesions. We propose a concerted multi-institutional effort to collect premalignant tissue across multiple tumor types followed by comprehensive genomic profiling to enhance our understanding of early-stage disease and build upon the foundation created by TCGA. Like TCGA, this will likely require more than 20 medical centers coordinating their efforts to collect and annotate the relevant clinical specimens. Recent examples of this type of coordinated effort include an NCI initiative to collect >1,000 surgically resected pancreatic cyst samples from 5 medical centers as well as a United States Department of Defense-funded consortium collecting airway samples via bronchoscopy from >1,000 smokers at risk for lung cancer at 11 military and Veteran hospitals.

Premalignant lesions can be identified and collected in a variety of ways. In tissues that can only be accessed via invasive surgery, premalignant lesions may be identified and sampled by histologic review of fresh or banked tumor specimens and their resection margins. Comparing the overlap of genomic alterations in the premalignant lesions to those found in the invasive tumor can identify early events in the process of carcinogenesis (19). However, cross-sectional studies like these may have limitations due to formalin fixation or the small quantity of tissue available after laser capture microdissection (LCM). These challenges may require researchers to use less comprehensive targeted approaches, such as those utilized in the study of AAH lesions adjacent to lung adenocarcinoma, and may preclude the use of some genomic technologies such as RNA sequencing (20). An additional limitation is the requirement that the tumor be evolutionarily related to some of the profiled premalignant lesions in order to infer early versus late events. Many lesions may in fact arise independently and will be clonally distinct from the tumor.
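
The overlap comparison described above can be sketched with simple set operations. This is a minimal illustration, assuming each sample has already been reduced to a list of alteration labels; the function name and the example labels are hypothetical and are not data from any cited study. Alterations shared by a lesion and its paired tumor are candidate early events, while the absence of any shared alteration is consistent with a lesion that arose independently.

```python
def compare_lesion_to_tumor(lesion_alterations, tumor_alterations):
    """Partition alterations into shared and private sets for one lesion/tumor pair."""
    lesion = set(lesion_alterations)
    tumor = set(tumor_alterations)
    shared = lesion & tumor
    union = lesion | tumor
    return {
        "shared": shared,                # candidate early events
        "lesion_only": lesion - tumor,   # private to the premalignant lesion
        "tumor_only": tumor - lesion,    # candidate later events
        # Jaccard index: 0 means no overlap (possibly clonally unrelated).
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Toy labels invented for illustration only.
lesion = ["TP53 p.R175H", "CDKN2A loss"]
tumor = ["TP53 p.R175H", "CDKN2A loss", "ERBB2 amplification", "SMAD4 loss"]

result = compare_lesion_to_tumor(lesion, tumor)
print("shared:", sorted(result["shared"]))
print("tumor only:", sorted(result["tumor_only"]))
print("overlap (Jaccard):", result["jaccard"])
```

In practice, a comparison like this would need to account for sequencing depth, tumor purity, and alterations missed in small LCM samples, so a low overlap alone does not prove independent origin.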

In contrast, other tissues accessible via relatively noninvasive procedures may be a useful starting point for PCGA studies, including bronchoscopy for the respiratory tract, endoscopy for the upper gastrointestinal tract, colonoscopy for the large intestine, the Papanicolaou test for the cervix, or visual examination of the skin and oral cavity. In fact, some of the earliest studies in cancer progression started with surveying genomic alterations in polyps in the colon (21). Importantly, the ability to identify suspect regions by visual examination or other fiber optic tools enables fresh samples, albeit often small in size, to be collected and stored in conjunction with formalin-fixed samples for histologic review. Fresh frozen tissue is more amenable to genomic profiling and is critical for technologies such as single-cell RNA-seq that currently require tissue dissociation and cell sorting soon after sample procurement. Similarly, single-cell DNA-seq can be achieved by performing single-nucleus sorting from fresh frozen tissue. This approach can offer even higher resolution for unraveling evolutionary relationships among subclonal populations (22, 23).

Sample collection from accessible tissues also lends itself to repeated sampling of the same site over time. As screening studies become more common and are increasingly implemented as standard of care, the opportunities for sample collection of premalignant tissue will become more prevalent. There is also the potential to leverage recent advances in “liquid biopsy” technology to longitudinally follow genomic alterations in circulating blood that may reflect alterations found within premalignant lesions (19). Despite the technical feasibility of collecting premalignant tissue detailed above, significant barriers remain in having relatively healthy patients contribute research samples whose collection increases risk and procedural time, as well as in the burden on physicians of collecting, annotating, and banking these additional research specimens. This type of tissue sampling will take careful thought and organization by the participating organizations and will likely proceed at a slower pace than with tumor samples. However, the return on this investment will be significant; longitudinal profiling of premalignant lesions will allow us to better elucidate both the order of somatic alterations and the corresponding changes specific to the premalignant microenvironment that enable the transformation and ultimate invasion leading to frank carcinoma.

Recent advances in cancer screening and next-generation sequencing technology have set the stage for an unprecedented opportunity to characterize the genomic alterations associated with premalignant disease progression. While TCGA has provided us with a comprehensive catalog of driver genes for each tumor type, the sequence of these genomic events that characterize the progression of premalignant lesions to invasive cancer remains to be unraveled. In addition, we know little about how changes in the immune cells and premalignant microenvironment contribute to disease initiation and progression. Comprehensive profiling of genomic and microenvironment changes that occur longitudinally in premalignant lesions as they progress towards (or regress away from) invasive cancer, a “Pre-Cancer Genome Atlas (PCGA),” will provide novel targets for disease interception that can be used both to develop early detection biomarkers and to enable personalized therapeutic approaches. Creation of this PCGA will require a multi-institutional and multidisciplinary collaborative big-data “pre-cancer moonshot” effort (consistent and aligned with the recent Obama/Biden initiative) to collect, annotate, and profile premalignant lesions across multiple tumor types. This initiative will also require development of novel high-throughput functional screens in the premalignant in vitro setting as well as in vivo models of premalignancy to test the functional role of candidate genes and immune cell types. Ultimately, the PCGA will help usher in a new era of precision medicine for cancer detection and prevention.

All authors except S. Platero have received a commercial research grant from Janssen Pharmaceuticals. S. Platero is an employee of Janssen Pharmaceuticals. A. Spira is a consultant to Veracyte Inc. No potential conflicts of interest were disclosed by the other authors.

This work was supported by Janssen Pharmaceuticals.

1. Wacholder S. Precursors in cancer epidemiology: aligning definition and function. Cancer Epidemiol Biomarkers Prev 2013;22:521–7.
2. Berman JJ. Precancer: the beginning and the end of cancer. Sudbury, MA: Jones and Bartlett Publishers; 2010.
3. Nasiell K, Nasiell M, Vaclavinkova V. Behavior of moderate cervical dysplasia during long-term follow-up. Obstet Gynecol 1983;61:609–14.
4. Merrick DT, Gao D, Miller YE, Keith RL, Baron AE, Feser W, et al. Persistence of bronchial dysplasia is associated with development of invasive squamous cell carcinoma. Cancer Prev Res 2016;9:96–104.
5. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell 2011;144:646–74.
6. The future of cancer genomics. Nat Med 2015;21:99.
7. Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA, Golub TR, et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 2014;505:495–501.
8. Zack TI, Schumacher SE, Carter SL, Cherniack AD, Saksena G, Tabak B, et al. Pan-cancer patterns of somatic copy number alteration. Nat Genet 2013;45:1134–40.
9. Brat DJ, Verhaak RG, Aldape KD, Yung WK, Salama SR, Cooper LA, et al. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N Engl J Med 2015;372:2481–98.
10. Vogelstein B, Kinzler KW. The path to cancer – three strikes and you're out. N Engl J Med 2015;373:1895–8.
11. Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol 2012;30:413–21.
12. Landau DA, Carter SL, Stojanov P, McKenna A, Stevenson K, Lawrence MS, et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. Cell 2013;152:714–26.
13. Ribas A. Releasing the brakes on cancer immunotherapy. N Engl J Med 2015;373:1490–2.
14. William WN Jr, Papadimitrakopoulou V, Lee JJ, Mao L, Cohen EE, Lin HY, et al. Erlotinib and the risk of oral cancer: the erlotinib prevention of oral cancer (EPOC) randomized clinical trial. JAMA Oncol 2015 Nov 5. [Epub ahead of print].
15. Kensler TW, Spira A, Garber JE, Szabo E, Lee JJ, Dong Z, et al. Transforming cancer prevention through precision medicine and immune-oncology. Cancer Prev Res 2016;9:2–10.
16. Stachler MD, Taylor-Weiner A, Peng S, McKenna A, Agoston AT, Odze RD, et al. Paired exome analysis of Barrett's esophagus and adenocarcinoma. Nat Genet 2015;47:1047–55.
17. Shain AH, Yeh I, Kovalyshyn I, Sriharan A, Talevich E, Gagnon A, et al. The genetic evolution of melanoma from precursor lesions. N Engl J Med 2015;373:1926–36.
18. Jaiswal S, Fontanillas P, Flannick J, Manning A, Grauman PV, Mar BG, et al. Age-related clonal hematopoiesis associated with adverse outcomes. N Engl J Med 2014;371:2488–98.
19. Ooi AT, Gower AC, Zhang KX, Vick JL, Hong L, Nagao B, et al. Molecular profiling of premalignant lesions in lung squamous cell carcinomas identifies mechanisms involved in stepwise carcinogenesis. Cancer Prev Res 2014;7:487–95.
20. Izumchenko E, Chang X, Brait M, Fertig E, Kagohara LT, Bedi A, et al. Targeted sequencing reveals clonal genetic changes in the progression of early lung neoplasms and paired circulating DNA. Nat Commun 2015;6:8258.
21. Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell 1990;61:759–67.
22. Navin N, Kendall J, Troge J, Andrews P, Rodgers L, McIndoo J, et al. Tumour evolution inferred by single-cell sequencing. Nature 2011;472:90–4.
23. Francis JM, Zhang CZ, Maire CL, Jung J, Manzo VE, Adalsteinsson VA, et al. EGFR variant heterogeneity in glioblastoma resolved through single-nucleus sequencing. Cancer Discov 2014;4:956–71.