Abstract
Chromosomal instability is central to the process of carcinogenesis. The genome-wide detection of somatic chromosomal alterations (SCA) in small premalignant lesions remains challenging because sample heterogeneity dilutes the aberrant cell information. To overcome this hurdle, we focused on the B allele frequency data from single-nucleotide polymorphism microarrays (SNP arrays). The difference of allelic fractions between paired tumor and normal samples from the same patient (delta-θ) provides a simple but sensitive detection of SCA in the affected tissue. We applied the delta-θ approach to small, heterogeneous clinical specimens, including endobronchial biopsies and brushings. Regions identified by delta-θ were validated by FISH and quantitative PCR in heterogeneous samples. Distinctive genomic variations were successfully detected across the whole genome in all invasive cancer cases (6 of 6), carcinoma in situ (3 of 3), and high-grade dysplasia (severe or moderate; 3 of 11). In addition to well-described SCAs in lung squamous cell carcinoma, several novel chromosomal alterations were frequently found across the preinvasive dysplastic cases. Within these novel regions, losses of putative tumor suppressors (RNF20 and SSBP2) and an amplification of the RASGRP3 gene, which has oncogenic activity, were observed. Widespread sampling of the airway during bronchoscopy demonstrated that field cancerization, reflected by SCAs at multiple sites, was detectable. SNP arrays combined with delta-θ analysis can detect SCAs in heterogeneous clinical samples and expand our ability to assess genomic instability in the airway epithelium as a biomarker of lung cancer risk. Cancer Prev Res; 7(2); 255–65. ©2013 AACR.
Introduction
Lung cancer is the leading cause of cancer-related deaths worldwide and in the United States (1, 2). This high mortality is due to the late diagnosis at a symptomatic advanced stage when surgical cure is impossible. To improve the outcome for lung cancer, new approaches for prevention and early detection are required (3–5).
In lung squamous cell carcinogenesis, the stepwise histopathologic changes in the bronchial epithelia that precede cancer development in heavy smokers, starting with normal epithelium and progressing through hyperplasia, squamous metaplasia, dysplasia, and carcinoma in situ (CIS), have been well documented (6). However, the rate and risk of progression of squamous dysplasia to CIS and ultimately to invasive cancer remain controversial and poorly understood (4, 7). Several publications have supported the concept that somatic chromosomal alterations (SCA) are better prognostic biomarkers than premalignant histology alone (8–13). The detection of these SCAs in small biopsies with significant cellular heterogeneity has been limited by dilution with normal cells, as well as by the inability of in situ techniques, such as multiprobe FISH, to interrogate the majority of the genome.
The loss of genome integrity is regarded as the most prominent “enabling characteristic” in the development of cancers, by which certain mutant genotypes expand and evolve (14). The consequences of genomic instability are SCAs such as amplifications and deletions in genome copy number (15, 16). Array-based comparative genomic hybridization (aCGH) is one tool employed to identify some of these alterations across the whole genome (17, 18). More recently, single-nucleotide polymorphism microarray (SNP array) technology has been used widely because it enables high-density genotyping, leading to more comprehensive SCA detection (19–23). Unlike aCGH, SNP arrays generate intensity differences as well as allelic ratios, and allow for analysis of not only copy-number change, but also loss of heterozygosity (LOH; including copy-neutral LOH) at high resolution. Furthermore, the power to detect SCAs is greatly increased when paired with normal samples, because the differences between subject and reference directly reflect somatic events (20, 24–26). The within-patient paired analysis removes unrelated germline copy-number changes that occur normally in all individuals.
Meanwhile, the cellular heterogeneity from typical clinical samples remains a basic obstacle to the sensitive and accurate genomic analysis of any type of cancer (20, 27, 28). The use of laser capture microdissection technology may improve the ability to collect more affected homogeneous cell populations from formalin-fixed slides. However, collecting an adequate amount of DNA from these microscopic sections to perform efficient SNP array analysis remains problematic (29, 30). To overcome these issues, a number of analytical methods that enable the detection and evaluation of SCAs even in heterogeneous specimens are under development (25, 31–36).
In this report, we initially describe and validate subtraction of allelic fraction (delta-θ) for SCA detection utilizing the quantitative measurement of allelic imbalances in paired sample analysis, which controls for natural chromosomal variations in copy number. Validation experiments show that SCAs can be detected in a background of approximately 90% normal cell content. Next, we demonstrate that when using delta-θ, even heterogeneous specimens such as bronchial biopsies and brushings can be reliable and informative sources for SNP array analyses. Using this strategy, we have found novel SCAs in preneoplastic lesions. Detection of SCAs in small, heterogeneous clinical samples will enable novel insights into the pathobiology of premalignant lesions.
Materials and Methods
Samples and experimental data sets
Genomic DNA from the non–small cell lung cancer cell line (CRL-5868D: NCI-H1395, adenocarcinoma) and its matched lymphoblastoid cell line (CRL-5957D: NCI-BL1395, B lymphoblast) was obtained from the American Type Culture Collection (ATCC). The DNA was prepared from cell lines grown by the ATCC and authenticated with the Promega PowerPlex Systems STR Profiling Kit. A titration series (100.0%, 25.0%, 12.5%, 6.3%, 3.1%, 1.6%, and 0% tumor DNA content) was made by mixing the cancer and the normal cell DNA. Each sample was diluted to 50 ng/μL, and 200 ng (4 μL) per sample was processed according to the manufacturer's protocol and hybridized to HumanOmni2.5-Quad BeadChips (Illumina).
Clinical specimens
Autofluorescence and white-light bronchoscopy as well as blood collection were carried out on current or former smokers with a smoking history of more than 20 pack-years after obtaining written informed consent under an Institutional Review Board–approved protocol. Endobronchial biopsy and/or brushing (after biopsy) was performed at the same visually concerning area. Multiple endobronchial sites were biopsied throughout the airway according to the study protocol.
These multiple biopsy specimens, including the biopsy at the concerning area, were formalin fixed, paraffin embedded, stained with hematoxylin and eosin, and histologically graded by the study pathologist (W.A. Franklin) as described (3). They were classified into 8 categories as defined by the WHO classification and assigned a score according to the following system: 1, normal; 2, reserve cell hyperplasia; 3, squamous metaplasia; 4, mild dysplasia; 5, moderate dysplasia; 6, severe dysplasia; 7, CIS; and 8, invasive carcinoma. All nonmalignant lesions, including normal histology, hyperplasia, squamous metaplasia, and dysplasia, are referred to as premalignant. We use the term “preinvasive” for lesions up to and including CIS. Because some patients had multiple bronchoscopies, some bronchial lesions were biopsied more than once. We define a biopsy sample as “persistent” when the histology grade score does not change or declines by only 1 grade at the next bronchoscopy (with an interval of more than 12 months between the two bronchoscopies). A sample is defined as “regressive” when the grade score declines by 2 or more at the next bronchoscopy.
For DNA isolation, fresh biopsies from each concerning site were homogenized in TRIzol reagent (Life Technologies) and after chloroform addition, the interface layer was saved. Protein was precipitated from the interface using sodium citrate, followed by ethanol precipitation of the supernatant. The DNA pellet was dissolved in Tris-EDTA (TE) buffer (10 mmol/L Tris–HCl pH 8.0, 0.1 mmol/L EDTA). For brushings, after overnight digestion with proteinase-K, DNA was isolated from protein using sodium chloride (salting out), precipitated by ethanol, and dissolved in TE buffer. For reference blood samples, the column-based extraction protocol was conducted using QuickGene-610 (AutoGen). Extracted DNA was quantified, verified to be of high molecular weight by agarose gel, diluted to 50 ng/μL, and labeled for SNP array analysis (HumanOmni 2.5-Quad BeadChip and Human 660W-Quad BeadChip; Illumina). Obtained call rates for all samples run were 98.8% ± 1.6% (mean ± SD). For one brushing specimen, we tested two DNA isolation methods: One is an extraction from saline in which the brush was vigorously vortexed to detach the epithelial cells. The other is a direct DNA extraction from residual cells adhering to the brush after vortexing. The same amounts of high-quality genomic DNA in these two conditions were processed and hybridized to the arrays.
Data analysis
Fluorescent signals were imported into GenomeStudio software (Illumina), where the genotype data were generated and transformed to normalized intensity (R) and allelic ratio (theta; θ) through the calculations below (20):
$$R = X_A + Y_B, \qquad \theta = \frac{2}{\pi}\arctan\!\left(\frac{Y_B}{X_A}\right)$$
where XA and YB denote transformed normalized signal intensities from the A and B alleles for a particular SNP locus. In paired sample analysis, these two parameters are conventionally transformed into two outputs: Log2(Rsubject/Rreference), referred to as the log2R ratio (LRR), and B allele frequency, shown individually as BAFsubject and BAFreference. In LRR, any deviation from zero is evidence of copy-number change, whereas BAF is a normalized measure of the relative signal intensity ratio of the B and A alleles. Deviations of BAF from the expected values (0.0, 0.5, and 1.0, representing AA, AB, and BB genotypes, respectively) are indicative of chromosomal alterations. To obtain a transformed BAF profile to which the genomic segmentation strategy can be applied (see the next section), noninformative homozygous alleles (AA and BB) in the reference (normal) sample were removed by comparison of genotype calls between the subject and the reference. Then, the BAF profile was reflected about the 0.5 axis to give a transformed BAF, named “modified BAF” in our study. This approach is derived from the mirrored BAF method (25).
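The two transformations described above can be sketched in a few lines. This is a minimal illustration, not GenomeStudio's implementation: it assumes the standard Illumina polar transform (R as the sum of the two normalized intensities, θ as the angle scaled to the range 0 to 1) and assumes that the mirrored value is 0.5 + |BAF − 0.5|. The function names and the `"AB"` genotype label are hypothetical.

```python
import math

def polar_transform(x_a, y_b):
    """Convert normalized A/B intensities to (R, theta).

    Assumed form: R = X_A + Y_B and theta = (2/pi) * arctan(Y_B / X_A),
    so theta is 0 for pure A signal and 1 for pure B signal.
    """
    r = x_a + y_b
    theta = (2.0 / math.pi) * math.atan2(y_b, x_a)
    return r, theta

def modified_baf(baf_subject, ref_genotypes):
    """Mirror subject BAF about 0.5, keeping only SNPs heterozygous in the reference."""
    mirrored = []
    for baf, genotype in zip(baf_subject, ref_genotypes):
        if genotype != "AB":
            continue  # homozygous reference SNPs carry no allelic-imbalance information
        mirrored.append(0.5 + abs(baf - 0.5))
    return mirrored

# Usage with made-up values: the third SNP is dropped because the reference is AA.
r, theta = polar_transform(1.0, 1.0)
profile = modified_baf([0.3, 0.7, 0.02, 0.5], ["AB", "AB", "AA", "AB"])
```

Mirroring folds the two symmetric BAF bands of an imbalanced region into one, which is what makes a single-band segmentation applicable.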
Subtraction of allelic fractions (delta-θ) is generated through the following calculation between θsubject and θreference:
$$\text{delta-}\theta = \left|\theta_{\text{subject}} - \theta_{\text{reference}}\right|$$
As for modified BAF, homozygous alleles in the reference sample were removed. If there is no somatic alteration at a locus, the delta-θ value is near zero. However, once a somatic change occurs in the subject, delta-θ takes a positive value (up to 0.5). In the rare case of balanced biallelic amplification (e.g., copy number 4 with AABB alleles), delta-θ and BAF both appear normal because there is no allelic imbalance.
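As a minimal sketch of this paired subtraction, the function below computes delta-θ per SNP. Taking the absolute difference is an assumption inferred from the statement that delta-θ is positive and at most 0.5. The function name, the `"AB"` genotype label, and the example values are hypothetical.

```python
def delta_theta(theta_subject, theta_reference, ref_genotypes):
    """Per-SNP |theta_subject - theta_reference| at reference-heterozygous SNPs.

    Assumption: the absolute difference is used, so values lie in [0, 0.5]
    when the reference theta is near 0.5.
    """
    return [
        abs(ts - tr)
        for ts, tr, genotype in zip(theta_subject, theta_reference, ref_genotypes)
        if genotype == "AB"  # homozygous reference SNPs are removed
    ]

# Made-up values: an unaltered SNP, a strongly imbalanced SNP, and a homozygous SNP.
values = delta_theta([0.52, 0.95, 0.00], [0.50, 0.50, 0.01], ["AB", "AB", "AA"])
```

Because each SNP is compared with the same SNP in the matched normal sample, germline deviations present in both samples cancel out.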
Delta-θ and modified BAF are both based on θ values, but are composed of different concepts. Delta-θ is a direct, intrapatient comparison of θsubject and θreference, which represents only somatic alterations. In modified BAF, the reference sample is used only for selecting heterozygous alleles to exclude uninformative homozygous alleles. This implies that the aberrant regions detected by modified BAF could include not only somatic changes but also germline ones. An example is shown in Supplementary Fig. S1. Subtraction of allelic fractions (delta-θ) is not a unique use of BAF or θ measures, but works efficiently to detect somatic changes in heterogeneous samples.
To efficiently identify SCA, we first started with the review of the delta-θ plot (SCAs are detected in this first step), then we referred to the LRR and BAF plots to identify the nature of the detected SCA. The SNP array data have been deposited at the Gene Expression Omnibus (GSE43168).
Data visualization and genomic segmentation methods
Visualization of the data was performed using Partek Genomics Suite 6.6 software (Partek Inc.). For visualizing delta-θ and modified BAF, each gray dot represents the raw delta-θ or modified BAF value at a SNP that is heterozygous in the reference, whereas red dots (black in black-and-white figures) represent the smoothed values for every 30 gray dots. For visualizing LRR, each gray dot represents an individual SNP marker, and blue (or black) dots indicate the smoothed values for every 30 gray dots. In BAFsubject visualization, gray dots reflect all θsubject values.
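The smoothing over every 30 markers might look like the sketch below. The text does not say whether Partek uses non-overlapping blocks or a sliding window, so the non-overlapping block mean here is an assumption, and `smooth_blocks` is a hypothetical name.

```python
from statistics import fmean

def smooth_blocks(values, block=30):
    """Average consecutive, non-overlapping blocks of `block` markers.

    The final block may be shorter than `block` if the marker count
    is not a multiple of the block size.
    """
    return [fmean(values[i:i + block]) for i in range(0, len(values), block)]

# Made-up profile: 30 unaltered markers, 30 altered markers, 5 trailing markers.
smoothed = smooth_blocks([0.0] * 30 + [0.3] * 30 + [0.1] * 5)
```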
To statistically delineate each SCA region and its breakpoints, genomic segmentation was conducted. Genomic segmentation results are labeled “GS” in the figures. Partek has implemented the circular binary segmentation (CBS) algorithm (37) in their product. Detailed information is available at the Partek website (38). We applied this algorithm for LRR, delta-θ, and modified BAF profiles setting the following three parameters: minimum genomic markers, P-value threshold, and signal to noise ratio. Practically, multiple parameter corrections and iterative optimizing processes were performed for each case to achieve or approach the most convincing segmentation data. However, conducting genomic segmentation especially for LRR is occasionally challenging in heterogeneous samples because the noise hinders the effective parameter setting, and the samples have variable noise in their SNP arrays. Thus, to achieve a balance of sensitivity and specificity, data from these three profiles were calculated using the same parameter settings for a given sample. In the comparative studies of titration cell line series, minimum markers, P value, and signal to noise ratio were set to 100, 0.001, and 0.3, respectively. In clinical samples, those parameters were decided individually after iterative optimizations (shown in Supplementary Table S1). Genomic position is based on human genome assembly (hg19).
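To illustrate how a minimum marker count and a signal to noise threshold shape the segments, the sketch below uses plain recursive binary segmentation on the mean. It is deliberately simpler than CBS and is not Partek's implementation: it has no P-value test, and defining signal to noise as the mean shift divided by the pooled within-segment SD is an assumption. The defaults 100 and 0.3 echo the titration settings quoted above; all function names are hypothetical.

```python
from math import sqrt

def _mean_var(xs):
    """Return the mean and population variance of a non-empty list."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

def best_split(values, min_markers):
    """Find the split with the largest mean shift relative to within-segment noise."""
    best_index, best_snr = None, 0.0
    n = len(values)
    for i in range(min_markers, n - min_markers + 1):
        (m_left, v_left), (m_right, v_right) = _mean_var(values[:i]), _mean_var(values[i:])
        shift = abs(m_left - m_right)
        noise = sqrt((v_left * i + v_right * (n - i)) / n)  # pooled SD
        if noise > 0:
            snr = shift / noise
        else:
            snr = float("inf") if shift > 0 else 0.0
        if snr > best_snr:
            best_index, best_snr = i, snr
    return best_index, best_snr

def segment(values, min_markers=100, min_snr=0.3, start=0):
    """Recursively split until no split reaches min_snr; return (start, end, mean) tuples."""
    index, snr = best_split(values, min_markers)
    if index is None or snr < min_snr:
        return [(start, start + len(values), _mean_var(values)[0])]
    return (segment(values[:index], min_markers, min_snr, start)
            + segment(values[index:], min_markers, min_snr, start + index))

# Made-up delta-theta profile: 200 unaltered markers followed by 200 altered ones.
noise = [0.01 if i % 2 == 0 else -0.01 for i in range(200)]
segments = segment([0.0 + e for e in noise] + [0.2 + e for e in noise])
```

In this sketch, raising `min_markers` suppresses short spurious segments at the cost of missing small alterations, which is the trade-off behind the per-sample parameter optimization described above.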
FISH
To validate selected SNP array results, FISH was performed using sections of endobronchial biopsy specimens obtained at the same location before the arrayed biopsy/brushing sample. Unstained slides with paraffin-embedded sections were hybridized with a number of FISH probes, including sequences of PIK3CA, TP63, D5S721-D5S23 (encompassing SEMA5A), CDKN2A, and NKX2–1, according to the previously published protocols (12). They are located at 3q26.32, 3q28, 5p15.2, 9p21, and 14q13.3, respectively. Centromere probes (CEP3 at 3p11.1–3q11.1 and CEP9 at 9p11–9q11) were used as references.
Copy-number qPCR assay
Quantitative PCR (qPCR) assays were used to validate the copy-number status of the genes located within aberrant chromosomal regions. The corresponding blood sample DNA was used as a calibrator control. The gene copy numbers of RARB (located at 3p24.2), SEMA3B (3p21.3), DNAH5 (5p15.3), GDNF (5p13.2), RNF20 (9q31.1), RASGRP3 (2p22.3), and SSBP2 (5q14.1) were determined by a duplex TaqMan copy-number assay (Life Technologies) with RNase P (14q11.2) as the reference assay. Assays were performed according to the manufacturer's protocol. Copy numbers were called by relative quantification in 4 to 6 replicate measurements using CopyCaller software version 2.0.
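The relative quantification behind such a duplex assay can be sketched with the comparative Ct (ΔΔCt) calculation. This is an illustration only, not CopyCaller's calling algorithm. It assumes equal amplification efficiency for the target and RNase P assays and two target copies in the blood calibrator. The function name and Ct values are hypothetical.

```python
from statistics import fmean

def copy_number_ddct(sample_target_ct, sample_ref_ct,
                     calib_target_ct, calib_ref_ct, calibrator_copies=2):
    """Estimate copy number as calibrator_copies * 2 ** (-ddCt).

    Each argument is a list of replicate Ct values; the reference assay
    (here RNase P) normalizes for DNA input within each sample.
    """
    d_ct_sample = fmean(sample_target_ct) - fmean(sample_ref_ct)
    d_ct_calibrator = fmean(calib_target_ct) - fmean(calib_ref_ct)
    dd_ct = d_ct_sample - d_ct_calibrator
    return calibrator_copies * 2 ** (-dd_ct)

# Hypothetical quadruplicate Ct values: the target amplifies one cycle later
# in the lesion than in blood, which corresponds to a one-copy loss.
loss = copy_number_ddct([26.0] * 4, [25.0] * 4, [25.0] * 4, [25.0] * 4)
gain = copy_number_ddct([24.0] * 4, [25.0] * 4, [25.0] * 4, [25.0] * 4)
```

In a heterogeneous specimen the estimate falls between 2 and the true copy number of the aberrant cells, so fractional values such as 1.4 or 1.6 are expected rather than integers.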
Results
Delta-θ detects SCA regions
The Illumina genotyping assay generates two independent values at each SNP locus: R and θ (22). R is a representation of normalized signal intensity, whereas θ indicates the allelic ratio at a given locus. These two parameters are conventionally visualized as LRR and BAF plots, respectively. Our study focused on the detection of somatic alterations in the comparison of subject (e.g., bronchial biopsy or brushing) and reference (e.g., blood) using a paired sample set from the same individual. Whereas LRR is represented as Log2(Rsubject/Rreference) comparing Rsubject to Rreference, BAF is plotted separately as BAFsubject or BAFreference (see Materials and Methods for additional details). To apply the pairwise concept to a θ-based plot, we used delta-θ, the difference of θreference and θsubject (subtraction of allelic fractions). Delta-θ can be analyzed through genomic segmentation and is a sensitive parameter to detect somatic chromosomal rearrangements, including copy-neutral events. The basic concept of delta-θ is illustrated using a cancer cell line genomic DNA (NCI-H1395; H1395) as subject and the lymphoblastoid cell line DNA (NCI-BL1395) as reference, both of which originated from the same individual (Fig. 1A). The genome-wide SCA profile of the NCI-H1395 cell line is also found at the Wellcome Trust Sanger Institute website. In Fig. 1B, the delta-θ plot for this pair is visualized together with conventional LRR and BAFsubject plots. Each plot represents a variety of different SCAs across chromosome 6. Delta-θ is an easily interpretable one-band plot in which differences between normal and abnormal regions are clearly discernible, including copy-neutral LOH regions, which are undetected using LRR.
Figure 1. The visualization of delta-θ and overlap rates of detected segments in the genomic DNA titration series. A, the process of delta-θ generation is illustrated in the polar coordinate plot: SNP markers in the NCI-H1395 cancer cell line (subject) result from the somatic change of the same markers compared with the NCI-BL1395 lymphoblastoid cell line (reference). 100 SNP markers are randomly selected from a hemizygous deletion region at chromosome 6q. B, schematic diagram of chromosome 6 showing the range of SCA determined in NCI-H1395 as annotated at the Wellcome Trust Sanger Institute: delta-θ (top), LRR (middle), and BAFsubject (bottom) represent a variety of different SCAs, including various degrees of amplifications and regions with LOH, as well as normal regions. C, the overall segments detected by each plot across the entire autosomal chromosomes in titrated tumor contents (25.0%, 12.5%, and 6.25%) are shown as concordance rates to the total detected segments in 100% tumor content. Modified BAF was used to apply the genomic segmentation strategy to the BAFsubject plot.
Delta-θ yields improved sensitivity in heterogeneous cancer models
Next, we modeled the effect of tumor heterogeneity to detect various types of SCAs by mixing the cancer cell line DNA with its matched lymphoblastoid cell line DNA, creating a titration series with the following cancer cell DNA content: 100.0%, 25.0%, 12.5%, 6.3%, 3.1%, 1.6%, and 0.0% (100.0% lymphoblastoid cell line DNA). Each sample in this series containing cancer cell DNA was paired for analysis with 100.0% lymphoblastoid cell line DNA as the reference sample.
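A back-of-the-envelope calculation shows why dilution shrinks the allelic signal. The sketch below uses idealized allele copy counts as a stand-in for θ, treats the titrated percentage as the fraction of aberrant genomes, and assumes the normal genome is AB with one copy of each allele. These are simplifying assumptions, not the array's normalization, and the function name is hypothetical.

```python
def expected_baf_shift(tumor_b, tumor_a, tumor_fraction):
    """Expected |BAF - 0.5| at a heterozygous SNP in a tumor/normal mixture.

    tumor_b and tumor_a are the B and A allele copy counts in the aberrant
    cells; the normal cells are assumed to carry one copy of each.
    """
    f = tumor_fraction
    b_copies = f * tumor_b + (1 - f) * 1
    total_copies = f * (tumor_a + tumor_b) + (1 - f) * 2
    return abs(b_copies / total_copies - 0.5)

# Expected shift for a hemizygous deletion (B retained, A lost) across the series.
for percent in (100.0, 25.0, 12.5, 6.3, 3.1, 1.6):
    shift = expected_baf_shift(1, 0, percent / 100.0)
    print(f"{percent:5.1f}% tumor DNA: expected shift = {shift:.4f}")
```

Under these assumptions, the shift for a hemizygous deletion is f / (2(2 − f)), about 0.07 at 25% and about 0.03 at 12.5% tumor content, while a balanced AABB amplification gives no shift at any dilution.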
For each tumor content case, LRR, BAFsubject, as well as delta-θ plots were generated. The CBS algorithm, which is implemented as genomic segmentation strategy (GS) in Partek Genomics Suite 6.6, was used to computationally define SCA regions (see Materials and Methods). BAFsubject was transformed into “modified BAF” profile borrowing the mirrored BAF concept, in which GS was applied (see Methods). We evaluated whether these three plots correctly detected SCA regions regardless of whether they identified the nature of SCA (amplification, deletion, or copy-neutral LOH). Figure 1C shows the concordance rates of detected SCA segments overlapping between 100% tumor content and other diluted tumor contents across all autosomal chromosomes. The genome-wide view of all detected segments represented from 100%, 25.0%, and 12.5% tumor contents are also shown (Supplementary Fig. S2).
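One way to compute such a concordance rate is sketched below. The exact overlap criterion used for Fig. 1C is not stated here, so this sketch assumes a segment from the 100% sample counts as recovered if any segment from the diluted sample overlaps it by at least one base. Segments are hypothetical `(chromosome, start, end)` tuples.

```python
def overlaps(seg_a, seg_b):
    """True if two (chromosome, start, end) segments share at least one position."""
    return seg_a[0] == seg_b[0] and seg_a[1] < seg_b[2] and seg_b[1] < seg_a[2]

def concordance_rate(reference_segments, test_segments):
    """Fraction of reference segments overlapped by at least one test segment."""
    if not reference_segments:
        raise ValueError("reference_segments must not be empty")
    recovered = sum(
        1 for ref in reference_segments
        if any(overlaps(ref, test) for test in test_segments)
    )
    return recovered / len(reference_segments)

# Made-up coordinates: four segments called at 100% tumor content, two when diluted.
full = [("6", 0, 100), ("6", 200, 300), ("9", 0, 50), ("9", 100, 200)]
diluted = [("6", 50, 120), ("9", 150, 160)]
rate = concordance_rate(full, diluted)
```

A stricter criterion, such as requiring a minimum fraction of reciprocal overlap, would lower the rates for fragmented profiles.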
In the 100.0% tumor content sample, almost identical aberrant segments were detected among those three approaches, except for copy-neutral LOH regions in the LRR. However, in the 25% tumor content sample, more than 50% of true SCA segments were missed by LRR, whereas almost all SCA regions were sensitively segmented by θ-based approaches (delta-θ and modified BAF). In the 12.5% tumor content sample, θ-based approaches were still able to correctly call approximately 50% of SCA segments. In samples with less than 12.5% tumor content, conducting accurate segmentation became more difficult. However, regions large in size or extensively amplified were still detected. These results imply that when investigating heterogeneous samples using Illumina's SNP microarrays, θ-based approaches can produce more reliable segmentation data than an R-based approach.
Genome-wide SNP array analysis of heterogeneous clinical samples
To explore the practical utility of SNP array analysis combined with delta-θ for heterogeneous clinical samples, we investigated bronchial biopsies and brushings containing cancer or dysplastic cells that were likely to be contaminated with normal stromal cells. In total, 30 whole bronchial biopsies/brushes from 18 patients with heavy smoking histories were investigated [6 invasive cancer, 3 CIS, 15 dysplasia (4 severe, 7 moderate, and 4 mild dysplasia), 3 hyperplasia, and 3 normal histology; Table 1 and Supplementary Fig. S3]. In spite of each specimen's heterogeneity, delta-θ sensitively detected SCAs across the entire genome (see Supplementary Table S1 for a complete listing). SCAs were detected in all 6 invasive cancer cases and in 6 of 14 samples with moderate dysplasia, severe dysplasia, or CIS (grades 5–7). None of the 10 samples with a histology grade below moderate dysplasia (grade <5) showed SCAs.
Table 1. Clinical features and SNP array results of 18 patients
Histology grade | Sample name | Bronchial location | Biopsy/brush | Patient number | SCA detection | Smoking history | Histology stability |
---|---|---|---|---|---|---|---|
SQ | IC-1 | LUL | Brush | 1 | Yes | Former | – |
SQ | IC-2 | RLL | Brush | 1 | Yes | Former | – |
SQ | IC-3 | RUL | Brush | 1 | Yes | Former | – |
SQ | IC-4 | RUL | Brush | 2 | Yes | Current | – |
SQ | IC-5 | RUL | Biopsy | 3 | Yes | Current | – |
LC | IC-6 | LUDB | Biopsy | 4 | Yes | Current | – |
CIS | CIS-1 | LUL | Biopsy | 5 | Yes | Current | Persistent |
CIS | CIS-2 | LUDB | Biopsy | 6 | Yes | Current | Persistent |
CIS | CIS-3 | LUL | Brush | 7 | Yes | Current | Persistent |
SD | SD-1 | RML | Biopsy | 5 | Yes | Current | – |
SD | SD-2 | LLL | Brush | 8 | No | Former | – |
SD | SD-3 | RUL | Brush | 9 | No | Current | Regressive |
SD | SD-4 | RUL | Brush | 10 | No | Former | – |
MD | MD-1 | LUL | Brush | 6 | Yes | Current | Persistent |
MD | MD-2 | RUL | Brush | 6 | No | Current | – |
MD | MD-3 | LUL(Li) | Brush | 7 | Yes | Current | Persistent |
MD | MD-4 | RLL | Brush | 7 | No | Current | – |
MD | MD-5 | RUL | Brush | 11 | No | Current | Regressive |
MD | MD-6 | LUL | Brush | 12 | No | Former | Persistent |
MD | MD-7 | RML | Brush | 12 | No | Former | Regressive |
MiD | MiD-1 | LUL | Brush | 12 | No | Former | Persistent |
MiD | MiD-2 | LUDB | Brush | 13 | No | Current | – |
MiD | MiD-3 | RUL | Brush | 14 | No | Current | Regressive |
MiD | MiD-4 | RML | Brush | 15 | No | Former | Regressive |
HP | HP-1 | RLL | Biopsy | 5 | No | Current | – |
HP | HP-2 | LUL | Brush | 16 | No | Current | – |
HP | HP-3 | RUL | Brush | 15 | No | Former | Persistent |
NH | NH-1 | RML | Brush | 11 | No | Current | – |
NH | NH-2 | LUDB | Brush | 17 | No | Former | Persistent |
NH | NH-3 | RML | Brush | 18 | No | Former | Persistent |
NOTE: The results of Fisher exact tests in arrayed patients/lesions with 5–7 histology grade; P = 0.20 in current versus former smokers with SCA detection compared with no SCA detection, and P = 0.048 in persistence versus regression of lesions with SCA detection compared with no SCA detection. Disease stability: persistent change, ≤ 1 histology grade; regressive change, ≥ 2 grades; and —, no repeated bronchoscopy available.
Abbreviations: HP, hyperplasia; LC, large cell lung cancer; LUL, left upper lung; LUDB, left upper divisional bronchus; LUL(Li), left upper lung (lingula); LLL, left lower lung; MiD, mild dysplasia; MD, moderate dysplasia; NH, normal histology; RLL, right lower lung; RML, right middle lung; RUL, right upper lung; SD, severe dysplasia; SQ, squamous cell lung cancer.
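The Fisher exact tests in the table note can be checked from the rows of Table 1. The sketch below is a plain two-sided test that sums the probabilities of all tables no more likely than the observed one. The 2 × 2 counts are our own tally of the grade 5–7 rows, not values reported by the authors: persistence is counted per lesion with follow-up, and smoking status is counted per patient, which is an assumption about the unit of analysis.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (prob(x) for x in range(low, high + 1))
               if p <= p_observed * (1 + 1e-9))

# Lesions with follow-up, grades 5-7. Rows: SCA detected / not detected.
# Columns: persistent / regressive.
p_stability = fisher_exact_two_sided(5, 0, 1, 3)   # approx. 0.048

# Patients with a grade 5-7 lesion. Rows: SCA detected / not detected.
# Columns: current / former smoker.
p_smoking = fisher_exact_two_sided(3, 0, 2, 3)     # approx. 0.20
```

Both tallies reproduce the P values given in the note. Counting smoking status per lesion instead of per patient (6 and 0 versus 4 and 4) gives a different value, about 0.08.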
Using delta-θ, we identified genomic regions with frequently overlapping SCAs in the 6 preinvasive samples in which SCAs were detected (Table 2). Among them, SCAs at chromosome 3p (3p26.3 to 3p12.3, 76.8 Mbp), 5p (5p15.33 to 5p11, 47.8 Mbp), 8p (8p23.3 to 8p11.21, 39.9 Mbp), 9p (9p24.3 to 9p21.2, 26.6 Mbp), and 13q (13q11 to 13q34, 95.8 Mbp) are known to be common genetic events in preinvasive bronchial dysplasias (4, 39). Other frequently overlapping regions, which have not previously been reported in bronchial preneoplasia, were also discovered. Some of these novel regions are relatively short but affect several genes [e.g., the minimum SCA overlap shared at 2p22.3 (3.1 Mbp) contains 13 genes].
Table 2. Frequently overlapping regions among preinvasive cases
Chromosome | Cytoband | Length (bp) | Potential gene of interest | Previous report | Estimated SCA type | SCA detected (n = 6) |
---|---|---|---|---|---|---|
2p | 2p22.3 | 3,112,242 | RASGRP3 | No | Amplification | 3 |
3p | 3p26.3–12.3 | 76,829,014 | RARB, SEMA3B | Yes | Deletion/CN-LOH | 4 |
5p | 5p15.33–11 | 47,814,582 | TERT, GDNF | Yes | Amplification | 3 |
5q | 5q14.1 | 613,020 | SSBP2 | No | Deletion/CN-LOH | 3 |
7p | 7p15.3 | 531,270 | DFNA5 | No | Deletion/CN-LOH | 4 |
7q | 7q36.3 | 858,066 | PTPRN2 | No | Deletion/CN-LOH | 3 |
8p | 8p23.3–11.21 | 39,942,807 | MTUS1 | Yes | Complex | 3 |
9p | 9p24.3–21.2 | 26,562,586 | CDKN2A | Yes | Deletion/CN-LOH | 5 |
9q | 9q31.1–31.2 | 5,599,950 | RNF20 | No | Deletion/CN-LOH | 5 |
13q | 13q11–34 | 95,827,527 | RB1, BRCA2 | Yes | Deletion/CN-LOH | 3 |
14q | 14q21.1–21.2 | 2,530,363 | FANCM | No | Deletion/CN-LOH | 3 |
21q | 21q22.2–22.3 | 6,154,199 | TFF1 | No | Deletion/CN-LOH | 3 |
NOTE: SCA regions shared by at least 2 patients are shown. The trend SCA type was estimated by investigating the mean value of LRR in the detected segment.
“Complex” indicates both deleted and amplified segments exist in the region.
Boldface types are the genes selected for FISH and qPCR assays to confirm the estimated trend of SCA type (shown in Figs. 2 and 3).
To validate the regions detected by delta-θ, several SCA regions containing the sequences of known cancer-related genes were selected and investigated by FISH and copy-number qPCR assays (Fig. 2A, 2B for FISH, and Fig. 2C for copy-number qPCR). The regions identified as SCAs by θ-based approaches were validated by those assays. On the other hand, the LRR plot was not necessarily useful as a reliable way to identify SCAs. In a sample with a noisy or deviated LRR plot, fragmented segments that resulted from highly variable LRR values failed to detect true SCA regions (Supplementary Fig. S4).
Figure 2. Detection and validation of SCAs by FISH and qPCR assays. A, in a brushing sample with CIS (CIS-3 in Table 1), contaminated with normal stromal cells, several SCA regions were suggested by delta-θ (top). Among them, the two regions at 5p and 9p were chosen (middle), and confirmed as amplification (copy number = 5.90 ± 2.31) and deletion (copy number = 1.36 ± 0.48), respectively, by FISH using 5p15 (green) and CDKN2A probes (red; bottom). LRR and modified BAF also detected those two regions. B, in a biopsy sample with CIS (CIS-1), which is relatively homogeneous, several SCA regions were detected by delta-θ (top). Using tricolor FISH probes (PIK3CA in red, TP63 in yellow at 3q, and NKX2–1 in green at 14q), 3q and 14q were confirmed as amplification (3.52 ± 1.07 and 4.30 ± 1.05) and deletion (0.48 ± 0.58), respectively. Modified BAF detected both regions, whereas fragmented segments by LRR missed the deletion at 14q. C, in a brushing sample with moderate dysplasia (MD-3), which is relatively homogeneous, several SCA regions were suggested by delta-θ. qPCR assays predicted copy numbers at four loci (1.44 ± 0.08, 1.63 ± 0.20, 5.28 ± 0.40, and 4.92 ± 0.98, shown as mean ± SD of quadruplicate measurements). Control indicates reference blood (predicted copy number is 2.03 ± 0.14).
Next, we used copy-number qPCR assays to confirm the delta-θ results in several of the novel SCA regions identified. The following genes were selected for this study: Ring Finger Protein 20 (RNF20) located at 9q31.1, RAS guanyl releasing protein 3 (RASGRP3) at 2p22.3, and single-stranded DNA binding protein 2 (SSBP2) at 5q14.1 (Table 2). Although heterogeneity in some of the preinvasive samples made LRR plots uninterpretable, qPCR assay results showed a trend of amplification or deletion/copy-neutral LOH in each SCA detected by delta-θ (Fig. 3).
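For readers unfamiliar with how a copy number is derived from qPCR replicates, the following is a minimal Python sketch. It assumes the common comparative Ct (ΔΔCt) calculation with a diploid calibrator such as blood DNA; this section does not state which calculation was used, so the formula, the function names `copy_number` and `summarize`, and the Ct values are illustrative assumptions rather than the study's procedure or data.

```python
from statistics import mean, stdev


def copy_number(ct_target, ct_reference, calibrator_delta_ct, calibrator_copies=2):
    """Estimate copy number for one well with the comparative Ct method (assumed)."""
    delta_ct = ct_target - ct_reference              # normalize to the reference assay
    delta_delta_ct = delta_ct - calibrator_delta_ct  # compare with the diploid calibrator
    return calibrator_copies * 2 ** (-delta_delta_ct)


def summarize(replicates, calibrator_delta_ct):
    """Return (mean, SD) of copy number over replicate wells.

    `replicates` is a list of (ct_target, ct_reference) pairs; at least two are needed.
    """
    values = [copy_number(t, r, calibrator_delta_ct) for t, r in replicates]
    return mean(values), stdev(values)


# Hypothetical Ct values for four replicate wells (not data from this study).
wells = [(27.0, 25.0), (27.1, 25.0), (26.9, 25.0), (27.0, 25.1)]
mean_cn, sd_cn = summarize(wells, calibrator_delta_ct=1.0)
print(f"copy number = {mean_cn:.2f} ± {sd_cn:.2f}")
```

In a heterogeneous sample the estimate falls between 2 and the copy number of the aberrant cells, which is why values such as 1.4 or 1.6 can still be consistent with a single-copy deletion in a subpopulation.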
The trend of SCA types estimated by qPCR at the regions containing three selected genes in preinvasive samples. Using copy-number probes for three different genes (RNF20, RASGRP3, and SSBP2), the common trend of amplification or deletion (or copy-neutral LOH) was estimated for each gene. In the delta-θ plots, the regions of 90–120 Mbp at 9q, 0–50 Mbp at 2p, and 50–100 Mbp at 5q are illustrated. Mean values of delta-θ, LRR from SNP array, and copy number by qPCR in each segmented SCA region are shown. The direction of either amplification or deletion was inferred by considering those mean values. qPCR showed a trend in copy-number estimation concordant with the SNP array data despite varying degrees of sample heterogeneity. *, P values from two-sample t tests comparing each preinvasive sample with blood in 4–6 replicate measurements.
Among the 14 cases with histology grade 5–7 (preinvasive), there was no significant difference in the average histology grades of the actual clinical samples prepared and run on the SNP array (P = 0.07; filled dots in Supplementary Fig. S3). However, an individual patient's overall average histology (filled and open circles) was significantly higher in SCA-positive patients than in SCA-negative patients (P = 0.021; Fig. 4). The biopsies were obtained from each patient by broad sampling of the airway mucosa (≥4 well-distributed bronchial areas were biopsied in addition to areas of concern). Patients whose bronchial mucosa showed any SCA had higher-grade lesions throughout the bronchus.
The correlation between an individual patient's overall average histology grade and SCA detection. Average histology grades (from all biopsies for each patient) were compared among three different groups: three patients whose arrayed samples with CIS/SD/MD (histology grade 5–7) showed positive SCAs (left), five patients whose arrayed samples with SD/MD (histology grade 5–6) resulted in negative SCA detection (middle), and six patients whose samples with less than MD (histology grade <5) were arrayed, but resulted in negative SCA detection (right). Each dot represents the average histology grade of a patient (see Supplementary Fig. S3), and each horizontal bar shows the average value of each group's average histology (4.5, 3.0, and 2.6, respectively). *, P = 0.021 in the t test between left and middle groups; **, P = 0.010 in the t test between left and middle + right groups; ns, not significant.
In addition, we compared two different DNA isolation methods from the brushing specimen shown above (MD-3; see Materials and Methods). The sample obtained directly from brushing resulted in higher delta-θ values, suggesting that an adequate amount of more homogeneously affected epithelial cells can be obtained directly from a brush after vortexing (Supplementary Fig. S5).
Discussion
The advances made in high-density chip technology have improved the sensitivity and specificity of detection of aberrant chromosomal rearrangements. Nevertheless, the heterogeneous nature of relatively smaller premalignant specimens compared with solid invasive cancer tissues has made the analysis and interpretation of data more challenging. By using blood genomic DNA from the same patient as a reference, we demonstrate how delta-θ (subtraction of allelic fraction) helps alleviate this problem.
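The subtraction described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in the study: the function names `delta_theta` and `segment_mean`, the restriction to SNPs that are heterozygous in the blood reference, the 0.3–0.7 heterozygosity window, and the example values are all hypothetical.

```python
from statistics import mean


def delta_theta(theta_sample, theta_reference, het_window=(0.3, 0.7)):
    """Per-SNP absolute difference of allelic fractions (sample minus paired reference).

    SNPs that are homozygous in the reference carry no allelic-imbalance
    information, so they are returned as None (an assumption of this sketch).
    """
    low, high = het_window
    result = []
    for t_sample, t_ref in zip(theta_sample, theta_reference):
        if low <= t_ref <= high:
            result.append(abs(t_sample - t_ref))
        else:
            result.append(None)
    return result


def segment_mean(values):
    """Mean delta-theta over the informative SNPs of one segment, or None if there are none."""
    informative = [v for v in values if v is not None]
    return mean(informative) if informative else None


# Hypothetical allelic fractions for six SNPs in one region (not study data).
blood = [0.50, 0.02, 0.48, 0.98, 0.52, 0.51]
lesion = [0.58, 0.02, 0.40, 0.98, 0.60, 0.43]
per_snp = delta_theta(lesion, blood)
print(per_snp, segment_mean(per_snp))
```

Because each SNP is compared with the same patient's germline value, probe-specific offsets in the allelic fraction cancel out, which is one plausible reason small shifts remain visible in diluted samples.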
Theta-based approaches are very sensitive in detecting regions of chromosomal abnormality, but do not provide information about the type of genomic change. Computationally integrated algorithms, such as Genome Alteration Print (40), consider both the LRR and BAF data to infer copy-number gain or loss. In the analysis of relatively homogeneous samples, such integrated algorithms can generate more information, including copy-number estimation, and take less time to complete a comprehensive analysis than delta-θ. However, in heterogeneous samples, delta-θ provides greater sensitivity, detecting alterations at 10% to 25% tumor/abnormal cell content, depending on the type of SCA. In cancer genome studies, finding somatically derived SCAs is critical to identify truly carcinogenic variants (26).
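Why the detectable cell fraction depends on the type of SCA can be illustrated with a standard two-population mixture model. The Python sketch below is an idealized calculation, not the authors' analysis: it treats the allelic fraction as the plain fraction of B-allele copies at a germline-heterozygous SNP, ignores array noise and the nonlinearity of θ, and uses hypothetical names (`expected_baf`, `baf_shift`, `SCA_TYPES`).

```python
def expected_baf(p, n_a, n_b):
    """Expected B allele fraction at a germline-heterozygous (AB) SNP.

    p    : fraction of cells carrying the alteration (0..1)
    n_a  : copies of allele A in altered cells
    n_b  : copies of allele B in altered cells
    Normal cells contribute one A and one B copy each.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    total_copies = (1 - p) * 2 + p * (n_a + n_b)
    if total_copies == 0:
        raise ValueError("no DNA at this locus (pure homozygous deletion)")
    b_copies = (1 - p) * 1 + p * n_b
    return b_copies / total_copies


# (A copies, B copies) in altered cells for three common alteration types.
SCA_TYPES = {
    "hemizygous deletion": (0, 1),
    "copy-neutral LOH": (0, 2),
    "single-copy gain": (1, 2),
}


def baf_shift(p, sca_type):
    """Absolute shift of the expected allelic fraction away from 0.5."""
    n_a, n_b = SCA_TYPES[sca_type]
    return abs(expected_baf(p, n_a, n_b) - 0.5)


for p in (0.10, 0.25):
    for name in SCA_TYPES:
        print(f"p={p:.2f}  {name:20s} shift={baf_shift(p, name):.3f}")
```

Under this model the shift is p/2 for copy-neutral LOH, p/(2(2 − p)) for a hemizygous deletion, and p/(2(2 + p)) for a single-copy gain, so at the same aberrant cell fraction copy-neutral LOH gives the largest signal and a single-copy gain the smallest.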
Initially, we were unsure whether biopsies or brushings would be better suited for SNP array analysis. Although both specimens can be used, brushing samples contain a higher proportion of epithelial cells. In general, genomic DNA isolated directly from the brush showed the highest signal-to-noise ratio (Supplementary Fig. S5). Overall, brushings seem more attractive than biopsies for SNP array-based genome research. We have not assessed microdissected biopsies.
In the analysis of preinvasive lesions, delta-θ revealed previously characterized SCAs as well as novel SCA regions, which were highly overlapping among preinvasive lesions. Some of these novel regions are small in size, but do contain at least one gene that may have a cancer-related function. For example, RNF20 deficiency has been recently reported to trigger genomic instability (41, 42). SSBP2 stabilizes transcriptional cofactor protein and regulates malignant transformation (43). Both of these genes are reported to act as putative tumor suppressors. Meanwhile, another study indicated that RASGRP3 has a tumorigenic function by activating the RAS signaling pathway (44; Table 2). Analysis of more than 200 lung squamous cell carcinoma samples (Cancer Genome Workbench; 45) showed that the genomic regions containing RNF20 and SSBP2 are frequently deleted (35% and 66%, respectively), and the region containing RASGRP3 is frequently amplified (45%). These novel overlapping SCAs across preinvasive cases can be added to the frequent somatic genomic rearrangements associated with the development of cancer. We speculate that these SCAs were not previously discovered due to their relatively small size and low signal, particularly in heterogeneous samples. Our findings demonstrate that, even at the preinvasive dysplastic stage in bronchial epithelium, more instances of genomic alteration are occurring across the genome than was previously appreciated.
The preinvasive specimens with positively detected SCAs were accompanied by multifocal and advanced histologic changes throughout the airway (Fig. 4), a pattern referred to as “field cancerization” (46, 47). Although a wide variety of chromosomal alterations are thought to be important in the development of invasive cancer, these changes were identified in the airways of current or former smokers without known lung cancer. The consistency of delta-θ implies that, even if a sample contains only a small portion of cytogenetically affected epithelial cells (∼10%), SCA regions are detectable in SNP array analysis. Considering these findings together, advanced preinvasive lesions, including high-grade dysplasia and CIS with detectable SCAs, may identify early-stage patients before the dominant and clonal expansion observed in late-stage invasive tumors.
Although the number of samples analyzed is limited, several patients in our study have undergone repeated bronchoscopies to monitor dysplastic lesions. SCAs are more likely to be detected in moderate or worse dysplasias that are persistent over multiple bronchoscopies (Table 1). Meanwhile, no significant difference was seen between current and former smokers whose specimens showed positive or negative SCA.
Because SNP arrays can sensitively monitor these emerging SCAs, testing early-stage lesions by this approach may identify patients at higher risk of developing invasive cancer, and these subjects may be excellent candidates for chemoprevention studies. As our study is cross-sectional and of limited size, both larger cross-sectional and longitudinal studies will be needed to test this hypothesis.
A growing number of studies now use next-generation sequencing (NGS) technology. NGS not only provides efficient mutation analysis, but can also detect copy-number variations under specific circumstances (48). Specific gene mutations can be detected by NGS technology in highly heterogeneous cases (49, 50). However, the high cost and analytical complexity currently limit this approach (51). In addition, concise and efficient analytical methods still seem to be needed to derive copy-number estimates in heterogeneous cases (52). We believe that a microarray-based approach, supported by established analytical methodologies, can be a more cost-effective approach for screening small, premalignant bronchial lesions.
In conclusion, distinctive genomic variations were successfully detected across the whole genome by SNP arrays even in the heterogeneous cell population found in bronchial premalignancy, by using subtraction of allelic fractions, delta-θ. Using this strategy, we have demonstrated the occurrence of at least three SCAs previously undescribed in preinvasive lesions. The genes contained within these regions show losses of putative tumor suppressors (RNF20 and SSBP2) and an amplification of a gene with oncogenic activity (RASGRP3). SNP array technology can expand our ability to assess genomic instability in the airway epithelium as a biomarker of lung cancer risk.
Disclosure of Potential Conflicts of Interest
M. Varella-Garcia has received honoraria as an educational speaker for Abbott Molecular Inc. D. Merrick is a consultant/advisory board member of Pfizer Inc. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: I. Nakachi, J.L. Rice, C.D. Coldren, M.G. Edwards, R.S. Stearman, W.A. Franklin, R.L. Keith, M.T. Lewis, Y.E. Miller, M.W. Geraci
Development of methodology: I. Nakachi, J.L. Rice, C.D. Coldren, M. Varella-Garcia, W.A. Franklin, M.T. Lewis, B. Gao, M.W. Geraci
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): I. Nakachi, J.L. Rice, M. Varella-Garcia, W.A. Franklin, R.L. Keith, M.T. Lewis, B. Gao, D.T. Merrick, Y.E. Miller
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): I. Nakachi, J.L. Rice, C.D. Coldren, M.G. Edwards, R.S. Stearman, S.C. Glidewell, M. Varella-Garcia, W.A. Franklin, M.W. Geraci
Writing, review, and/or revision of the manuscript: I. Nakachi, J.L. Rice, M.G. Edwards, R.S. Stearman, S.C. Glidewell, M. Varella-Garcia, W.A. Franklin, R.L. Keith, D.T. Merrick, Y.E. Miller, M.W. Geraci
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): I. Nakachi, S.C. Glidewell, W.A. Franklin, B. Gao, Y.E. Miller
Study supervision: C.D. Coldren, W.A. Franklin, B. Gao, Y.E. Miller, M.W. Geraci
Acknowledgments
The authors thank Okyong Cho (University of Colorado Denver Genomics and Microarray Core Lab, Aurora, CO) for technical assistance and the use of Cancer Center Shared Resource (P30 CA046934). The authors also thank Heather Malinowski (Department of Pathology, University of Colorado Denver) for technical assistance.
Grant Support
The study was partially supported by National Cancer Institute (NCI) grants SPORE in Lung Cancer P50 CA058187 (M. Varella-Garcia, W.A. Franklin, Y.E. Miller, and M.W. Geraci), Cancer Center Support Grant P30 CA046934 (M. Varella-Garcia, W.A. Franklin, Y.E. Miller, and M.W. Geraci), and R01 CA164780 (Y.E. Miller and M.W. Geraci), an NHLBI grant R21 HL094927 (C.D. Coldren), a LUNGevity Foundation Early Detection Grant (Y.E. Miller), and a Cancer League of Colorado research grant (Y.E. Miller and M.W. Geraci). This work was also supported in part by the Biostatistics/Bioinformatics Shared Resource of Colorado's NIH/NCI CCSG P30 CA046934 (M.G. Edwards).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.