Abstract
Chromosome copy gain, loss, and loss of heterozygosity (LOH) involving most chromosomes have been reported in many cancers; however, less is known about chromosome instability in premalignant conditions. 17p LOH and DNA content abnormalities have been previously reported to predict progression from Barrett's esophagus (BE) to esophageal adenocarcinoma (EA). Here, we evaluated genome-wide chromosomal instability in multiple stages of BE and EA in whole biopsies. Forty-two patients were selected to represent different stages of progression from BE to EA. Whole BE or EA biopsies were minced, and aliquots were processed for flow cytometry and genotyped with a paired constitutive control for each patient using 33,423 single nucleotide polymorphisms (SNP). Copy gains, losses, and LOH increased in frequency and size between early- and late-stage BE (P < 0.001), with SNP abnormalities increasing from <2% to >30% in early and late stages, respectively. A set of statistically significant events was unique to either early or late, or both, stages, including previously reported and novel abnormalities. The total number of SNP alterations was highly correlated with DNA content aneuploidy and was sensitive and specific to identify patients with concurrent EA (empirical receiver operating characteristic area under the curve = 0.91). With the exception of 9p LOH, most copy gains, losses, and LOH detected in early stages of BE were smaller than those detected in later stages, and few chromosomal events were common in all stages of progression. Measures of chromosomal instability can be quantified in whole biopsies using SNP-based genotyping and have potential to be an integrated platform for cancer risk stratification in BE.
Genomic instability has been hypothesized to play a crucial role in the development of cancer for decades (1), and there is substantial evidence to support this hypothesis. Numerous studies have investigated genome-wide copy number changes and loss of heterozygosity (LOH; refs. 2–5) in EA and other cancers, including renal cell (6), colon (7, 8), cervical (9), pancreas (10), lung (11, 12), and glioblastoma (13). These studies have reported that the cancer genome acquires extensive chromosomal changes during neoplastic progression from a normal cell to cancer. Many of these alterations confer a selective advantage for the cancers (“drivers”) relative to normal cells, but others seem to be hitchhikers that are evolutionarily neutral (“passengers”; ref. 14). Few studies have used single nucleotide polymorphism (SNP)–based technologies to evaluate genome-wide copy number and LOH in premalignant stages of neoplastic progression compared with those in cancer (15–17). Characterization of premalignant conditions is important to understand the biology of neoplastic progression as well as to identify potential targets for cancer prevention and to develop methods for cancer risk stratification or early diagnosis of an asymptomatic cancer.
The incidence of esophageal adenocarcinoma (EA) is increasing more rapidly than any other cancer in the United States (18, 19). EA is typically detected at an advanced stage in which it is rapidly fatal with a mortality of >80% (20, 21). Barrett's esophagus (BE) is a condition that develops in a subset of patients with chronic gastroesophageal reflux disease in which the normal squamous epithelium of the esophagus is replaced by intestinal metaplasia (22). Although BE is the only known precursor of EA, most patients with BE die of causes unrelated to BE or EA (23, 24). Further, recent evidence suggests that, in most cases, BE seems to be a successful adaptation to a harsh intraesophageal environment with chronic acid and bile reflux associated with erosion, ulceration, reactive oxygen species, and oxidative damage (25–28). Thus, there is a need for biomarkers that can distinguish patients with common, benign BE for whom surveillance can be safely prolonged, rare cases that will progress and therefore merit careful surveillance or intervention to prevent EA, and patients who require early detection because they have already developed a small EA that cannot be visualized endoscopically.
BE is one of the best models of human intraepithelial neoplasia because the premalignant epithelium can be safely visualized and biopsied so that genomic changes can be compared in different stages of neoplastic evolution and then studied longitudinally by endoscopic biopsy surveillance for EA risk management (29, 30). Genetic progression models and longitudinal studies of oral leukoplakia have also shown successes in using somatic genetic biomarkers for risk stratification in head and neck cancer (31–35). This is in contrast to many other premalignant conditions that are typically removed when detected, such as colonic adenomas, or are unable to be systematically sampled for biopsy examination over time because of limited accessibility or potential clinical complications. Studies of BE indicate that inactivation of CDKN2A by LOH, methylation, and/or mutation is selected as an early event that predisposes to large clonal expansions in the BE segment (36–38). Inactivation of TP53 by mutation and LOH is subsequently selected, typically in a CDKN2A-deficient background, and predisposes to progression to increased 4N fractions (G2/tetraploidy), aneuploidy, and EA (14, 29, 32–34, 38–42).
We reported recently in a 10-year prospective cohort study that a panel of chromosome instability biomarkers, including LOH spanning the CDKN2A and TP53 loci on chromosome arms 9p and 17p, and DNA content abnormalities (increased 4N and aneuploidy), identified a high-risk patient subset that had a 79% 5-year cumulative incidence of EA and a low-risk subset with a 0% cumulative EA incidence to nearly 8 years of follow-up after their baseline endoscopic biopsy surveillance (43). This and other studies support the hypothesis that progressive chromosomal instability and clonal evolution drive progression from a benign premalignant state to cancer (29, 32–34, 42, 44). However, it has been difficult to combine these biomarkers into a single platform for clinical use because various and complex sample processing methods are required, including flow cytometric cell sorting and multiple research assays for LOH detection. Further, regions of LOH evaluated in these studies were based on low-density microsatellite polymorphisms (45–47). In addition, genome-wide studies of chromosomal instability in EA have identified common regions of copy gain, loss, copy neutral LOH, and a small number of homozygous deletions (2), but these events have not been quantified in early stages of progression in BE.
Here, we evaluate genome-wide copy gain, loss, and LOH and assess SNP-based quantification of aneuploidy in a cross-sectional study of patients representing different stages of progression from early BE metaplasia to advanced EA using paired samples with Illumina multisample 33K SNP BeadChips. Selection of patients and biopsies in this study was guided by previous extensive characterization of the Seattle Barrett's Esophagus cohort, which allowed more precise estimation of risk based on molecular characterization and more endoscopic follow-up data than are typically available. Chromosome alterations, including genome-wide copy gain, loss, and LOH, were evaluated and compared across the progression stages using whole, unpurified biopsies. Overall genomic instability was quantified relative to DNA content aneuploidy in the same biopsy to assess feasibility of using SNP arrays as a common platform for biomarkers of EA risk.
Materials and Methods
Study participants and biological samples
Participants were enrolled in the Seattle Barrett's Esophagus Study, which was approved by the Human Subjects Division of the University of Washington in 1983 and renewed annually thereafter with reciprocity from the Fred Hutchinson Cancer Research Center Institutional Review Board from 1993 to 2001. Since 2001, the study has been approved by the Fred Hutchinson Cancer Research Center Institutional Review Board with reciprocity from the University of Washington Human Subjects Division.
Mucosal biopsies used in this cross-sectional study were obtained from 42 patients using previously defined protocols for endoscopic mucosal biopsy and evaluation of surgical resection specimens (48, 49). A constitutive control sample from gastric tissue was collected for each of the 42 participants and used as the reference for paired LOH and copy number analysis. The 42 participants each had histologically diagnosed specialized intestinal metaplasia and represented four different stages of neoplastic progression in BE defined as follows:
[BE early] 12 surveillance endoscopic biopsies from 12 participants with BE who never developed DNA content abnormalities (aneuploidy or increased 4N fraction), high-grade dysplasia, or EA during any follow-up endoscopies (mean 118.3 months, range 59.2-182.6 months) and in whom no TP53 mutation or 17p LOH was detected at baseline (F/M 1:11, mean age 58 years);
[BE instability] 12 endoscopic biopsies from 12 participants with BE who developed a DNA content abnormality (aneuploidy and/or increased 4N) during surveillance (mean 104.5 months, range 42.2-208.3 months) but in whom no TP53 mutation or 17p LOH was detected at baseline (F/M = 3:9, mean age 63 years). During this follow-up period, 6 of 12 participants had a diagnosis of high-grade dysplasia and one participant developed cancer after 14 years. DNA content abnormalities and high-grade dysplasia are known risk factors for progression to EA (48);
[Advanced BE] 10 Barrett's mucosal biopsies within premalignant Barrett's epithelium from 10 surgical resections in participants with microscopic EAs, including nine intramucosal and one with superficial submucosal invasion detected during endoscopic surveillance (mean BE surveillance 21.1 months, range 4.2-42.3 months, F/M 2:8, mean age 63); and
[EA] 8 large EA specimens averaging 4.8 cm in diameter (range 1.5-9 cm) from the surgical resection specimens of 8 patients (F/M 0:8, mean age 75 years).
DNA content flow cytometry to determine ploidy of BE/EA biopsies
Whole biopsy specimens and gastric controls, including fresh/frozen endoscopic biopsies (n = 48) or surgical resection specimens (n = 36), were processed. For the BE and EA samples, whole endoscopic biopsies (n = 24) or surgical resection specimens (n = 18) were minced in buffer used for preparing tissue for flow cytometry [146 mmol/L NaCl, 10 mmol/L Tris base (pH 7.5), 1 mmol/L CaCl2, 0.5 mmol/L MgSO4, 0.05% bovine serum albumin, 21 mmol/L MgCl2, 0.2% Igepal CA-630 Sigma I3021] as previously described (50). Approximately 90 μL of unfixed nuclei were saturated with 4′,6-diamidino-2-phenylindole (5 μg/mL, Accurate Chemical) and evaluated for DNA content on a Cytopeia inFlux flow cytometer. Listmode files were analyzed using Multicycle (Phoenix Flow Systems) as described previously (51, 52).
SNP-based copy gain, loss, and LOH using Illumina Infinium technology
DNA from the remaining minced tissue of each whole biopsy was extracted using standard Puregene DNA Isolation Kit as recommended by the manufacturer (Gentra Systems, Inc.), and quantitated with the Quant-iT PicoGreen dsDNA Assay Kit following the manufacturer's protocol (Invitrogen) on the CytoFluor II plate reader. DNA for SNP arrays was not purified by flow cytometry because this study is designed to identify abnormalities that ultimately may be assessed in whole biopsies without further purification in a clinical setting. Each sample was genotyped using an Illumina 12-sample Human Hap300_Pool10 Beadchip targeting 33,423 SNP loci (33K SNP array) from the HumanHap300 BeadChip and processed at Illumina using standard multisample Infinium protocols and reagents according to the manufacturer's instructions. DNA (average 223.8 ng input) was amplified, enzymatically fragmented, precipitated, resuspended in hybridization buffer, denatured, and hybridized to the BeadChip overnight at 48°C. After the overnight incubation, the BeadChip was washed, primer extended, and stained on a Tecan Genesis/EVO robot using a Tecan GenePaint slide processing system. After staining, BeadChips were washed, coated, and then scanned using Illumina's BeadArray Reader. The probe intensity data were extracted and analyzed using Illumina's BeadStudio 3.1 software.
Genotyping data consist of two-channel intensity data corresponding to the two alleles. Data are generated as rectangular coordinates of the raw A versus raw B allele intensities. After normalization with BeadArray Reader, the genotyping data were transformed to a polar coordinate plot of normalized intensity R = Xnorm + Ynorm and θ = (2/π) arctan(Ynorm/Xnorm), where Xnorm and Ynorm represent transformed normalized signals from alleles A and B for a particular locus. For each patient, a pair of normal gastric (reference) and a BE/EA sample (subject) was processed in paired mode by using default reference clustering. The B-allele frequency variable, based on the θ values, is calculated for each locus as described previously (53).
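As a minimal sketch of the polar transform described above, the following Python snippet computes R and θ from one pair of normalized allele intensities. The function name `polar_transform` and the idealized example intensities are illustrative assumptions; this is not BeadStudio's implementation.

```python
import math

def polar_transform(x_norm, y_norm):
    """Return (R, theta) for normalized A (x) and B (y) allele intensities."""
    r = x_norm + y_norm
    # For non-negative intensities atan2(y, x) equals arctan(y / x),
    # and it also handles x_norm == 0 (pure B signal) without dividing by zero.
    theta = (2.0 / math.pi) * math.atan2(y_norm, x_norm)
    return r, theta

# Idealized (hypothetical) intensities for the three diploid genotypes.
examples = {"AA": (1.0, 0.0), "AB": (0.5, 0.5), "BB": (0.0, 1.0)}
for genotype, (x, y) in examples.items():
    r, theta = polar_transform(x, y)
    print(genotype, round(r, 3), round(theta, 3))
```

With this scaling θ runs from 0 (only allele A signal) to 1 (only allele B signal), with a balanced heterozygote near 0.5.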
Statistical methods
We examined the MA plots (signal intensity versus signal ratio plot) for each sample and found no bias. Paired constitutive controls for each patient were used to ensure that all abnormalities detected were somatic lesions and not constitutive copy number variations that could arise when using a single reference sample for all patients (54, 55). Paired sample analysis of BE or EA samples (defined as “sub”) paired with the control gastric sample (defined as “ref”) from each patient generated two variables, the log-ratio log2(Rsub/Rref) and the |dAlleleFreq sub-ref|, that were used for copy number and LOH detection (53). The |dAlleleFreq sub-ref| represents the absolute value of the difference in allele frequency between the subject (BE or EA) and reference (paired constitutive control) samples. Because both samples are from the same patient, the difference of allele frequency at any given locus should be approximately zero, except in regions exhibiting somatic chromosomal anomalies such as copy number differences or LOH. Two stages of data analysis were done in this study. The first identified copy gain, copy loss, and LOH for each chromosome arm. The second tested whether the frequency of a gain, loss, or LOH identified in a given segment along the chromosome arm reached statistical significance across patients.
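The two paired variables can be illustrated per locus with a short Python sketch. The function name `paired_signals` and the example values are hypothetical, and the values are idealized for a pure cell population; real whole-biopsy signals are attenuated by admixed normal cells.

```python
import math

def paired_signals(r_sub, r_ref, baf_sub, baf_ref):
    """Return (log2 ratio, |dAlleleFreq|) for one locus of a subject/reference pair."""
    log_ratio = math.log2(r_sub / r_ref)
    d_allele_freq = abs(baf_sub - baf_ref)
    return log_ratio, d_allele_freq

# Constitutively heterozygous locus (reference B-allele frequency 0.5), idealized cases:
print(paired_signals(1.0, 1.0, 0.5, 0.5))  # no somatic change
print(paired_signals(0.5, 1.0, 0.0, 0.5))  # loss of the B allele
print(paired_signals(1.0, 1.0, 1.0, 0.5))  # copy neutral LOH retaining B
```

The third case shows why both variables are needed: copy neutral LOH leaves the log-ratio at zero and is visible only in the allele frequency difference.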
Specifically, we defined the normal baseline copy number (2N) on each chromosome using |dAlleleFreq sub-ref|. After wavelet processing (see below) of |dAlleleFreq sub-ref|, contiguous regions on either chromosome arm that contained at least 10% of the SNPs and had a |dAlleleFreq sub-ref| equal to zero were identified. These regions were used to set the baseline chromosome copy number [log2(Rsub/Rref) was adjusted to zero, and the adjustment was applied specifically to that chromosome]. If a contiguous region of 10% of SNPs could not be found on a chromosome, then no adjustment was made for a baseline copy number. A wavelet method was used to smooth both copy number log2(Rsub/Rref) and |dAlleleFreq sub-ref| signal values (56, 57). Daubechies3 wavelets were used in this study, and array signal was decomposed to the second level with the wavelets and reconstructed following dynamic thresholding (depending on the signal to noise ratio for that sample). For a specific constitutive genotype “AB,” after the wavelet processing described above (including thresholding), probes with log2(Rsub/Rref) signal values that deviated from zero were identified as either copy gain (allelic imbalance, e.g., AAB or balanced, e.g., AABB) or loss (deletion or homozygous deletion). Probes whose |dAlleleFreq sub-ref| deviated from zero after wavelet processing (including thresholding) were identified as LOH (allelic imbalance, e.g., A and AAB or copy neutral LOH, e.g., AA). Infrequently, there were a few noncontiguous probe signal levels that were more than 4 SDs from the mean signal of a chromosome arm. Because some of those signal values could be smoothed out by wavelets, those signal values were included as lesions for that patient and subjected to consistency testing in the second stage of analysis. A Kruskal-Wallis test was used to compare the median number of lesions and sizes of lesions in each group.
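The baseline-setting step can be sketched as follows. This is a sketch under stated assumptions, not the authors' Matlab code: it assumes the inputs have already been wavelet smoothed and thresholded, and it assumes the adjustment subtracts the mean log-ratio of the longest zero-|dAlleleFreq| run (the text does not specify the exact centering statistic). The function names are hypothetical.

```python
from statistics import mean

def longest_zero_run(d_af, tol=0.0):
    """Return (start, end) of the longest contiguous run with |d_af| <= tol (end exclusive)."""
    best_start, best_end = 0, 0
    run_start = None
    # A trailing sentinel closes a run that reaches the end of the chromosome.
    for i, value in enumerate(list(d_af) + [float("inf")]):
        if abs(value) <= tol:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_end - best_start:
                best_start, best_end = run_start, i
            run_start = None
    return best_start, best_end

def center_baseline(log_ratio, d_af, min_fraction=0.10, tol=0.0):
    """Shift a chromosome's log-ratios so the presumed 2N region sits at zero."""
    start, end = longest_zero_run(d_af, tol)
    if end == start or end - start < min_fraction * len(d_af):
        return list(log_ratio)  # no qualifying region: leave the signal unadjusted
    offset = mean(log_ratio[start:end])  # assumption: center on the mean of the run
    return [value - offset for value in log_ratio]

# Hypothetical smoothed signals for a 10-probe chromosome.
d_af = [0, 0, 0, 0.4, 0.4, 0, 0, 0, 0, 0]
log_ratio = [0.1, 0.1, 0.1, -0.9, -0.9, 0.1, 0.1, 0.1, 0.1, 0.1]
print(center_baseline(log_ratio, d_af))
```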
For the size of events, we defined a gain, loss, or LOH segment as continuous until there were at least two adjacent probes without that lesion. Thus, the boundaries of a lesion segment were defined by at least two or more probes without the lesion. Poisson regression was used to test the frequency of the lesion in each group.
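The segment definition above can be expressed as a short Python sketch: a single lesion-free probe does not end a segment, but two or more adjacent lesion-free probes do. The function name `call_segments` and the example flags are hypothetical.

```python
def call_segments(flags, min_gap=2):
    """Return inclusive (start, end) probe indices of lesion segments.

    flags: per-probe booleans (True = lesion called at that probe).
    A segment ends only when at least `min_gap` adjacent probes lack the lesion.
    """
    segments = []
    start = last = None
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        if start is None:
            start = last = i
        elif i - last - 1 >= min_gap:
            # Enough lesion-free probes since the last hit: close the segment.
            segments.append((start, last))
            start = last = i
        else:
            last = i
    if start is not None:
        segments.append((start, last))
    return segments

flags = [0, 1, 1, 0, 1, 0, 0, 1, 0]
print(call_segments(flags))
```

In this example the one-probe gap at index 3 is bridged, whereas the two-probe gap at indices 5 and 6 separates the final single-probe lesion.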
Percent of probes across the genome with copy gain, loss, and LOH in different stages of neoplastic progression measured in the 33K SNP array. *, %SNP probes with LOH was calculated using only those SNPs that were heterozygous in the paired constitutive control.
In the second stage of analysis, the significance of frequencies of lesions across the patients was tested. Copy number gain, loss, and LOH were tested separately along the chromosome arms. Specifically for each of the three types of lesions, every chromosome arm was divided into 0.5-Mb windows, although the last window may be smaller than 0.5 Mb. If one or more lesions were included in a window in the first stage of the analysis, then one event was counted for that window. If no lesion was called in the first stage, the window was given a value of zero. Fisher's exact test was used to test the number of events across the patients of a comparison group. We compared the lesion frequency within each window in the early and late stages separately, relative to the null hypothesis of no lesion in a window. The P values of these comparisons were adjusted to have a false-positive discovery number of 1 or less (58). The starting and ending positions of any given lesion on a chromosome may be different across multiple patients. Therefore, to test whether a lesion was common among patients, we used the window method as a practical approach, considering sample size and SNP informativity rates. The 0.5-Mb window size was determined by considering the specific array probe density (coverage of the genome) and SNP informativity (heterozygosity) rate for LOH detection (59). Larger windows have higher probability of containing an informative SNP, but the resolution for detection is lower, whereas a smaller window will have higher resolution but may not contain informative SNPs. Using an average SNP informativity rate of 0.26 in dbSNP and considering the multisample Illumina 33K BeadChip density, we selected a 0.5-Mb window size to have 0.8 LOH detection power. We evaluated window sizes from 0.1 to 1.0 Mb and obtained essentially identical results. Empirical receiver operating characteristic (ROC) curves were generated according to previously described methods (60).
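The windowing step and the window-size reasoning can be sketched in Python. Two assumptions are made here that the text does not state: detection power is taken as the probability that a window contains at least one informative SNP, with SNPs treated as independent and evenly spaced, and the genome length is taken as roughly 3,000 Mb. The function names are hypothetical.

```python
import math

WINDOW_MB = 0.5

def windows_hit(lesions_mb, arm_length_mb, window_mb=WINDOW_MB):
    """Return a 0/1 list with one entry per window; 1 if any lesion touches the window.

    lesions_mb: (start, end) positions in Mb measured from the start of the arm.
    The last window may be shorter than window_mb.
    """
    n_windows = math.ceil(arm_length_mb / window_mb)
    hit = [0] * n_windows
    for start, end in lesions_mb:
        first = int(start // window_mb)
        last = min(int(end // window_mb), n_windows - 1)
        for w in range(first, last + 1):
            hit[w] = 1  # one event per window, however many lesions fall in it
    return hit

def loh_detection_power(n_snps, genome_mb, informativity, window_mb=WINDOW_MB):
    """Probability that a window holds at least one informative (heterozygous) SNP."""
    snps_per_window = n_snps / genome_mb * window_mb
    return 1.0 - (1.0 - informativity) ** snps_per_window

print(windows_hit([(0.2, 0.7), (2.1, 2.1)], arm_length_mb=2.3))
# Assumed genome length of 3,000 Mb; 33,423 SNPs and informativity 0.26 are from the text.
print(round(loh_detection_power(33423, 3000.0, 0.26), 2))
```

Under these assumptions a 0.5-Mb window holds about 5.6 SNPs on average and the computed probability is close to the 0.8 power quoted above, which suggests, but does not confirm, that a calculation of this kind was used.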
All the analyses were carried out in Matlab (version 7.1, The Mathworks, Inc.) and Statistical Analysis System (SAS version 9.1, SAS Institute, Inc.).
Results
Molecular and pathology criteria defining stages of progression
The 42 pairs of biopsies evaluated in this study were obtained from 42 patients, including 6 women and 36 men of mean age 64.3 years (range 43-93 years). Table 1 depicts the participant selection criteria with regard to molecular and pathology measures for each stage of progression. Patients were classified using previously evaluated flow cytometrically purified biopsies (mean 9.64 biopsies per patient) for 17p LOH and TP53 mutation (at baseline only) and DNA content abnormalities by flow cytometry as described previously (29, 43).
Molecular and pathologic stages of progression
Stage | Group | Patient characteristics at time of sampling | EA risk biomarkers* in follow-up |
---|---|---|---|
“Early stage,” endoscopic biopsies from BE | BE early (n = 12) | No biomarker abnormalities* | 0/12 develop DNA content abnormality; 0/12 develop high-grade dysplasia |
 | BE instability (n = 12) | No biomarker abnormalities* | 12/12 develop DNA content abnormality; 6/12 with diagnosis of high-grade dysplasia |
“Late stage,” surgical resection specimens | Advanced BE (n = 10) | 17pLOH, TP53 mutation; aneuploidy; preoperative endoscopic diagnosis of small, early cancer; samples from premalignant epithelium | Not applicable |
 | EA tumor (n = 8) | 17pLOH, TP53 mutation; aneuploidy; samples from large EA tumor mass | Not applicable |
*Biomarkers = 17pLOH (spanning TP53), TP53 mutation, DNA content abnormality (aneuploidy and/or increased 4N), high-grade dysplasia.
Genome-wide chromosome instability in different stages of neoplastic progression
Genome-wide chromosome instability was evaluated in patient samples representing the four different stages of neoplastic progression described in Table 1. Figure 1 shows the average percentage of SNPs with copy gain or loss abnormalities and the percentage of informative SNPs with allelic imbalance due to copy gain or LOH, including copy loss and copy neutral events using paired constitutive controls (see Materials and Methods). The numbers of SNPs with copy gain, loss, and LOH each showed a highly statistically significant increasing trend across the four progression stages (P < 0.001). In early stages of progression, very few SNPs have copy number or LOH abnormalities (Fig. 1). In late stages of BE progression, the frequency of SNP alterations increased dramatically (Fig. 1). Direct comparisons of the number of SNP probes with gain, loss, and LOH between the early stages of progression (BE early versus BE instability groups), and also between the late stages of BE (advanced BE versus EA groups), did not reach the P = 0.01 significance level; the statistical power of these tests is low because the sample size of each group is small.
To distinguish between a high number of small chromosomal events and a small number of larger events (for example, 100 regions each including 10 SNPs compared with 10 regions each including 100 SNPs), we assessed the size of altered chromosomal copy number regions. Figure 2 shows the distribution of copy gain and loss segment sizes in the four stages of BE progression.
There was a trend for increased copy gain segment size with progression stages (P < 0.01). However, copy gain frequency and segment size did not reach statistical significance between early BE and BE instability stages, given the sample sizes of the comparison groups (P = 0.65, P = 0.70, respectively; Fig. 2). Similarly, the frequency and segment size of copy gain also did not reach statistical significance at P = 0.01 between advanced BE and EA stages (P = 0.84, P = 0.04, respectively). However, when the combined early stages (BE early and BE instability) were compared with the combined late stages (advanced BE and EA), the frequency and segment size of copy gain were both significantly different (P < 0.001).
There was a statistically significant difference in the number of regions of copy loss between the BE early and BE instability groups (P = 0.005, Fig. 2). However, many of these regions were based on loss of a single SNP. Neither the total number of SNPs with loss (P = 0.12, Fig. 1) nor the median sizes of loss (P = 0.20, Fig. 2) were different in the two groups. Similarly, neither frequency nor segment size of copy loss reached a P = 0.01 level of statistical significance between advanced BE and EA (P = 0.04, P = 0.23, respectively) given the sample size. However, when the combined early stages were compared with the combined late stages, frequency and segment size of copy loss were both significantly different (P < 0.001).
SNP distribution patterns along the chromosome and the heterozygosity rate of specific SNPs affect the ability to call LOH segment sizes precisely (59). Therefore, we plotted the distribution of LOH using allele frequency differences between paired constitutive controls and BE/EA samples (|dAlleleFreq sub-ref|) for each informative probe displayed by chromosome in each sample (Fig. 3).
Distribution of informative SNPs with LOH (black bars) on each chromosome (rows, oriented p-arm at the bottom and q-arm at the top of each chromosome) in individual samples from 42 patients (columns). Regions with no black bars represent either noninformative SNPs or no LOH. Patients are grouped along the X axis into the four stages of neoplastic progression and the numbers along the X axis refer to individual patients. A, chromosomes 1 to 5; B, chromosomes 6 to 12; C, chromosomes 13 to 22.
Somatic copy gain, loss, and LOH lesions might confer a selective advantage to the clone. Alternatively, a lesion may be the result of random, nonselected events arising during neoplastic evolution, especially in later stages of progression after inactivation of TP53. Therefore, we examined the frequency of each lesion across patients. The numbers of patients in each of the four stages are small, and the results from Figs. 1 and 2 indicate that the BE early and BE instability groups have similar (albeit not identical) magnitudes of copy gain, loss, and LOH frequencies and sizes relative to each other compared with the advanced BE and EA groups. Therefore, patients from the BE early and BE instability groups were combined into a single early-stage group to gain power for lesion detection. Likewise, advanced BE and EA have similar (albeit not identical) magnitudes of instability relative to each other compared with the two early groups. Therefore, the advanced BE and EA patients were combined into a late-stage group and were evaluated separately. Statistically significant levels of chromosome copy change and LOH events among patients are shown for early (Supplementary Fig. S1A and B) and late (Supplementary Fig. S1C and D) stages. A complete list of statistically significant abnormalities identified in each 0.5-Mb window detected in early and late stages of progression is presented in Supplementary Tables S1 and S2.
We also evaluated the relationship between the significant alterations identified independently in genome-wide analysis of early (BE early and BE instability) and late (advanced BE and EA) stages of progression. Figure 4 illustrates the relationships among the significant copy gain (Fig. 4A), copy loss (Fig. 4B), and LOH (Fig. 4C) windows. Of the 56 0.5-Mb windows identified with copy gain in late stages, only 2 of 56 (4%) also reached statistical significance in early stages of progression. Of the windows identified as reaching statistical significance in late stages, 22 of 350 (6%) copy loss and 71 of 878 (8%) LOH events also reached statistical significance in early stages of progression. The specific lesions and locations that were common in both early and late stages are shown in Table 2. For the patients in this study, the average rate of informative SNPs on this array is 0.33. Thus, there is a significant probability for small segments with copy loss to have no informative SNPs available for LOH analysis. The LOH events listed in Table 2 that did not reach statistical significance in the small regions of copy loss on chromosomes 1, 4, 10, 11, 12, 18, and 20 were mainly due to lack of informative SNPs.
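The overlap percentages and the effect of the 0.33 informativity rate on small segments can be checked with a few lines of Python. The probability calculation assumes that heterozygosity is independent from SNP to SNP, which is a simplification introduced here and not a statement from the analysis; the function name is hypothetical.

```python
def prob_no_informative(n_snps, informativity=0.33):
    """Probability that none of n_snps is heterozygous in the constitutive control."""
    return (1.0 - informativity) ** n_snps

# Late-stage significant windows that were also significant in early stages (counts from the text).
overlap = {"gain": (2, 56), "loss": (22, 350), "LOH": (71, 878)}
for lesion, (shared, late_total) in overlap.items():
    print(lesion, f"{100 * shared / late_total:.1f}%")

# Chance that a small copy-loss segment has no SNP usable for LOH calling.
for n in (1, 2, 3, 5):
    print(n, "SNPs:", round(prob_no_informative(n), 2))
```

Under this simplification, a copy loss covering a single SNP has roughly a two-in-three chance of being uninformative for LOH, consistent with the explanation given for the loss-only entries in Table 2.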
Number of 0.5-Mb windows with chromosome lesions that reached statistical significance across patients in early (BE early and BE instability) and late (advanced BE and EA) stages of neoplastic progression. White, the number of 0.5-Mb windows that reached statistical significance in early-stage (BE early and BE instability) samples only but were not significant in late-stage samples. Intersection of white and dotted, the number of events that reached statistical significance both in early-stage (BE early and BE instability) and late-stage (advanced BE and EA) samples. Dotted, the number of events that reached statistical significance in late-stage (advanced BE and EA) samples only but were not significant in early stages. A, gain. B, loss. C, LOH.
Lesions that reached statistical significance in both early and late stages of progression
Lesion type . | [Chromosome] (location and range in ±0.5 Mb) . |
---|---|
Gain | [8] (145.5); [10] (73.5) |
Loss | [1] (186.6); [3] (59.8-60.8); [4] (33.9); [9] (9.0-12.1, 20.5-25.0, 28.5-30.5, 44.5); [10] (68.0); [11] (38.5); [12] (28.3); [16] (77.3); [18] (64.9); [20] (22.0) |
LOH | [3] (59.8-60.8); [9] (0.5-38.1); [16] (77.3) |
Comparison of genome-wide SNP-based LOH to flow cytometric DNA content aneuploidy in unpurified, whole biopsies
DNA content abnormalities as measured by flow cytometry have been reported to be useful for cancer risk stratification in BE (48, 61, 62). To test the ability of a SNP platform to detect populations of cells with abnormal DNA content, we minced each biopsy, divided the homogenate in half, and processed half using the multisample Illumina 33K SNP arrays and the other half by DNA content flow cytometry using our standard protocol and analysis as described in Materials and Methods. Multiple genetic mechanisms can generate allelic imbalance resulting in LOH measured by allele frequency differences between paired constitutive controls and BE/EA samples (|dAlleleFreq sub-ref|), including copy gain in a heterogenous population, copy loss, and copy neutral LOH (55, 63). Although small regions of LOH may not be detected if there are insufficient informative SNPs, genome-wide chromosomal instability arising in a population of cells would be detected by the LOH signal of a SNP-based platform, with the exception of balanced copy number alterations such as a pure tetraploid population or pure population with homozygous deletions. We compared genome-wide LOH results obtained in unpurified biopsies to DNA content flow cytometry for aneuploidy detection in the same biopsies (Table 3). Aneuploidy was detected by flow cytometry in 11 of 42 biopsies. A threshold of ≥1,000 SNPs with LOH throughout the genome, which represents about 9% of the average number of informative heterozygous SNPs on the 33K platform, provided the optimal sensitivity and specificity for discriminating flow cytometric diploid and aneuploid samples. Ten of 11 (∼91%) of the aneuploid populations that were detected by flow cytometry had ≥1,000 SNPs with LOH. The one exception was the only near-diploid aneuploid sample in the study. This sample had a flow cytometric DNA content of 2.3N. Patients with near-diploid aneuploid populations are at lower risk of progressing to EA (61). 
Seven samples had ≥1,000 SNPs with LOH but were diploid by flow cytometry. Six of these seven samples were from patients with EA or advanced BE. Thus, genome-wide LOH has potential to be a surrogate for flow cytometric DNA content aneuploidy.
The relationship between genome-wide LOH and stages of progression
We also evaluated the sensitivity and specificity of genome-wide LOH in whole biopsies using the 33K array for distinguishing patients with or without concurrent cancer. The total number of informative SNPs with LOH (LOHTotal) across all 22 autosomes was summed for each patient. The mean LOHTotal for early stages (BE early and BE instability) was 456 (SE 123) compared with 3,445 (SE 587) for advanced stages (advanced BE and EA). When LOHTotal was used to distinguish between early and late stages, the area under the curve was 0.91, as shown in the empirical ROC curves with the raw data in Fig. 5. The results were essentially identical using paired samples for copy gain and loss.
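The empirical area under the ROC curve can be read as the fraction of (early, advanced) patient pairs in which the advanced-stage patient has the higher LOHTotal, with ties counted as one half. A minimal Python sketch of that calculation follows. The LOHTotal values in it are invented placeholders chosen only to make the example run, not data from this study, and the function name `empirical_auc` is hypothetical.

```python
from itertools import product


def empirical_auc(early_scores, advanced_scores):
    """Empirical ROC AUC via pairwise comparison (Mann-Whitney formulation).

    Each (early, advanced) pair scores 1 if the advanced value is higher,
    0.5 if the two are tied, and 0 otherwise; the AUC is the mean score.
    """
    if not early_scores or not advanced_scores:
        raise ValueError("both groups must contain at least one score")
    total = 0.0
    for early, advanced in product(early_scores, advanced_scores):
        if advanced > early:
            total += 1.0
        elif advanced == early:
            total += 0.5
    return total / (len(early_scores) * len(advanced_scores))


# Placeholder LOHTotal values (ASSUMED for illustration, not study data).
toy_early = [120, 300, 450, 900, 1500]
toy_advanced = [800, 2500, 3400, 5200]

toy_auc = empirical_auc(toy_early, toy_advanced)
print(f"toy AUC = {toy_auc:.2f}")
```

An AUC of 1.0 would mean every advanced-stage patient has a higher LOHTotal than every early-stage patient, whereas 0.5 would mean the measure does no better than chance.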
Figure 5. ROC curve for separating advanced stages (advanced BE and EA) versus early stages (BE early or BE instability) using total genome-wide LOH [area under the curve (AUC) = 0.91] or copy gain and loss (AUC = 0.91).
Finally, we compared genome-wide LOH measurements to DNA content flow cytometric aneuploidy for sample classification of early and late stages of progression (Table 4). The results are promising for SNP-based assessment of aneuploidy for cancer risk in whole, unpurified biopsies.
Table 4. Comparison of DNA content flow cytometry alone and genome-wide LOH for discriminating progression stages
Tissue type | Diploid (flow cytometry) | Aneuploid (flow cytometry) | gw-LOH (<1,000 SNPs) | gw-LOH (≥1,000 SNPs) |
---|---|---|---|---|
BE early or BE instability | 24 | 0 | 23 | 1 |
Advanced BE or EA | 7 | 11 | 2 | 16 |
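As a hedged illustration of how the counts in Table 4 translate into test performance, the following Python sketch computes sensitivity and specificity for each method, treating advanced BE or EA as the positive class. The names used are illustrative, and the derived proportions are arithmetic on the table rather than values reported in the text.

```python
def sensitivity_specificity(pos_advanced, neg_advanced, pos_early, neg_early):
    """Return (sensitivity, specificity) for detecting advanced BE or EA.

    sensitivity = test-positive advanced cases / all advanced cases
    specificity = test-negative early cases / all early cases
    """
    sensitivity = pos_advanced / (pos_advanced + neg_advanced)
    specificity = neg_early / (neg_early + pos_early)
    return sensitivity, specificity


# Counts transcribed from Table 4.
# Flow cytometry: aneuploid = test positive, diploid = test negative.
flow = sensitivity_specificity(pos_advanced=11, neg_advanced=7,
                               pos_early=0, neg_early=24)
# Genome-wide LOH: >=1,000 SNPs with LOH = test positive.
gw_loh = sensitivity_specificity(pos_advanced=16, neg_advanced=2,
                                 pos_early=1, neg_early=23)

for name, (sens, spec) in (("flow cytometry", flow), ("gw-LOH", gw_loh)):
    print(f"{name}: sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

On these counts, the genome-wide LOH threshold flags 16 of 18 advanced-stage samples compared with 11 of 18 for flow cytometry, at the cost of one early-stage sample above the threshold.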
Discussion
A previous chromosome instability biomarker panel for cancer risk stratification in BE required Ki67/DNA content multiparameter flow cytometric cell sorting for enrichment of proliferating epithelial cells, a panel of short tandem repeat polymorphisms on chromosomes 9 and 17 for LOH analysis, and DNA content flow cytometric assessment of tetraploidy and aneuploidy (43). Although this chromosome instability panel accurately identified patients at high and low risk for progression to EA, it required a constellation of technologies that are difficult to perform outside of research centers. Here, we report that 9p and 17p LOH could be detected in whole, unprocessed biopsies and that genome-wide SNP-based measures of LOH and copy number performed as well as or better than DNA content flow cytometry for detection of aneuploidy, providing a significant advancement toward validating the biomarkers in a clinically compatible SNP DNA array platform. A similar platform might also be useful in assessing cancer risk in oral leukoplakia, based on the previous panel of 9p and 17p LOH, TP53 abnormalities, and chromosome polysomy (43). We used SNP DNA arrays on neoplastic and paired constitutive control samples from each individual to provide better resolution for smaller genomic events, improve signal-to-noise ratios, and differentiate constitutive copy number variation from somatic genetic events that arise during neoplastic progression. Because DNA arrays can measure LOH, copy number, and known mutations as well as DNA methylation, in which bisulfite treatment produces a C→T SNP at CpG sites, this technology may provide a common platform to compare these biomarkers in validation studies to efficiently select those markers most appropriate for patient management.
In this study, we showed feasibility for SNP-based assessment of somatic copy number gain, loss, and LOH in patients at different stages of neoplastic progression as well as a genome-wide measure of aneuploidy. Validation of a common biomarker platform as well as validation of reliable markers for risk stratification in BE will require one or more large cohort studies. The ideal cohort study would include patients at all levels of EA risk and the biomarker(s) would be evaluated on prospectively collected samples. Testing of multiple biopsies, for example, one biopsy every 2 cm, from each endoscopic procedure for a given patient will likely increase reliability because neoplastic clones occupy variably sized regions of a Barrett's segment. Well-designed cohort studies using a common platform to detect LOH, copy change, and DNA methylation abnormalities should be able to identify a small set of unbiased, reliable markers for BE risk stratification. Validated biomarkers from the cohort studies could be translated into a less expansive, clinically compatible SNP analysis platform, such as custom GoldenGate, VeraCode BeadExpress, or Pyrosequencing assays, for routine, large-scale sample testing in the clinical setting. New tools such as massively parallel sequencing technologies are likely to significantly advance our understanding of the underlying somatic-genetic mechanisms in cancer, including detection of tandem duplications, balanced translocations or inversions, and discovery of as yet unidentified critical genes. However, these technologies are not yet feasible for large-scale cohort studies or clinical application.
Using paired constitutive controls, we consistently detected recurrent copy gains, losses, and LOH in unpurified biopsies even in early benign BE that showed no evidence of progression to EA for a mean of 10 years. At advanced stages of progression, the neoplastic genome contained frequent and extensive regions of copy gain, loss, and LOH, consistent with a recent report characterizing EA using high-density SNP arrays (2). To characterize chromosomal instability during neoplastic progression and identify regions that may contain genes useful for cancer prevention, cancer risk stratification, and early detection, we examined chromosomal abnormalities found only in patients at early stages of progression, those common in patients at both early and late stages, and those only detected in patients at late stages of progression. Only a small number of abnormalities were frequent in early stages of progression, and with the single exception of chromosome arm 9p, the sizes of chromosome abnormalities were small at early stages compared with late stages. Of the significant 0.5-Mb windows with copy gain, loss, and LOH identified in late stages of progression, only 4%, 6%, and 8%, respectively, also reached statistical significance in early stages of progression. Chromosome arm 9p was unusual in that it acquired large regions of copy loss and LOH spanning most of the arm at early stages of progression. Within these large 9p LOH events, we found three statistically significant, distinct regions of copy loss spanning 9.0-12.1, 20.5-25.0, and 28.5-30.5 Mb, with the most frequent being potential homozygous deletions at the CDKN2A/ARF locus, which have been reported previously in multiple cancer types (2, 64–67). CDKN2A abnormalities have been associated with large clonal expansions in BE as well as other conditions (14, 36, 37, 68) and may be selected in BE as part of an adaptation to a harsh acid reflux environment (25–28).
Additionally, chromosome 9 is highly structurally polymorphic with more than 1,000 annotated genes and has evolved many intrachromosomal and interchromosomal duplications (69, 70). Two other common abnormalities were significant in early-stage BE: copy loss spanning the FHIT (3p, 59.8-60.6 Mb) and WWOX (16q, 77.3 Mb) loci, detected in paired samples in 46% and 24% of early-stage patients, respectively. These regions have been extensively studied in EA and other cancer types (71–82), and recently Nancarrow et al. (2) reported these regions as the two most common regions of homozygous deletions in EA, detected in 74% and 35%, respectively. This is consistent with the present study, in which we detected FHIT and WWOX copy loss in 78% and 56%, respectively, of the late-stage biopsies from patients with advanced BE and EA. Other relatively small lesions that contain multiple genes reached statistical significance among patients at early stages, such as copy gain on 8q24.3 and 10q22.1.
The paradox of specific chromosomal abnormalities that seem to be selected both in early and late stages of progression is that these events occur too frequently in early stages to be sufficient for development of EA because the rate of progression to EA is very low in most persons with BE (43, 48, 61, 83–86). This suggests several possibilities. First, these early, high-frequency events might be necessary but not sufficient for progression. Second, the frequent, early events might be selected as part of the adaptation to gastroesophageal reflux, but have relatively little involvement in progression to EA, and could appear in late stages as hitchhikers (passengers) on selected events that drive progression to EA. Third, they might represent regions susceptible to chromosome damage such as fragile sites (87) that could then hitchhike on selected abnormalities such as an epigenetically modified progenitor cell population (88). Regardless of mechanism, patients with early-stage BE appear to have a low risk of progressing to, or dying of, EA (23, 24, 43, 48, 61, 83–86, 89, 90). Effective identification of high-risk patients will be required to guide cancer prevention strategies in BE, focusing on patients who may benefit from the intervention because their risk of cancer is high and reassuring low-risk patients who are unlikely to benefit from an intervention targeting only the BE mucosa.
We also observed frequent loss or LOH involving whole chromosomes or arms in late stages of progression, including 3p, 5, 9p, 13, 17p, and 18q (Figs. 1-3; Supplementary Fig. S1; Supplementary Table S2), many of which have recently been reported and reviewed (55), as well as in earlier low-density short tandem repeat allelotypes (45–47, 91). Large recurring regions of copy gain, loss, and LOH may provide reproducible assessment of aneuploidy by higher-density SNP platforms, and this could be combined with an array assessment of more localized regions of copy gain, loss, or LOH in a common platform to assess chromosomal instability for risk stratification in BE and other conditions. We also confirmed 17p status in the SNP arrays compared with previous short tandem repeat results in 38 of 42 patients (90%), although we did not use the same biopsies in the current study. At least two of the discordant cases could be proven to be due to sampling limitations because the samples also lacked aneuploid populations that were present in the original biopsies (data not shown).
Our study has limitations, including the relatively small sample size in each of the participant categories, and validation of our results will require large cohort studies. However, our results provide important insight into the level of chromosomal instability relative to stages of neoplastic progression in BE. In addition, supervised statistical methods could be used for sample classification, but we chose not to prematurely exclude or include markers from the present study because of the limited sample sizes and probe densities. Further, although the current SNP platforms are potentially technically limited in detecting a perfect 4N tetraploid population, a large, pure tetraploid clone may be rare in an evolving neoplastic population of cells. Despite these potential limitations, this study provides proof of concept that LOH, copy number, and aneuploidy can be assessed in unprocessed whole biopsies from BE using a SNP array. By assessing chromosome instability in premalignant and malignant stages of neoplastic progression, the results can guide future hypothesis-driven research to better evaluate genomic abnormalities for cancer risk stratification, early detection, and identification of candidate targets for cancer prevention.
In summary, our results indicate that SNP array assessment of genome-wide chromosomal changes that develop during neoplastic progression shows promise for improved EA risk stratification in patients with BE. This hypothesis, as well as others based on DNA methylation, can be efficiently tested in cohort studies using prospectively accumulated biospecimens (Early Detection Research Network phase 4; ref. 92). This single-platform assessment of chromosome instability biomarkers might also be used to identify patients at high risk of developing EA for cancer prevention trials.
Disclosure of Potential Conflicts of Interest
D.A. Peiffer, D. Pokholok, and K.L. Gunderson are employees of Illumina, Inc. No other potential conflicts of interest were disclosed.
Acknowledgments
We thank David Cowan for database management, Christine Karlsen for patient care coordination, and Valerie Cerera for flow cytometry. Illumina, BeadArray, and Infinium are registered trademarks or trademarks of Illumina, Inc.