Abstract
African Americans are disproportionately affected by early-onset, high-grade malignancies. A fraction of this cancer health disparity can be explained by genetic differences between individuals of African or European descent. Here the wild-type Pro/Pro genotype at the TP53Pro72Arg (P72R) polymorphism (SNP: rs1042522) is more frequent in African Americans with cancer than in African Americans without cancer (51% vs. 37%), and is associated with a significant increase in the rates of cancer diagnosis in African Americans. To test the hypothesis that Tp53 allele–specific gene expression may contribute to African American cancer disparities, TP53 hemizygous knockout variants were generated and characterized in the RKO colon carcinoma cell line, which is wild type for TP53 and heterozygous at the TP53Pro72Arg locus. Transcriptome profiling, using RNAseq, in response to the DNA-damaging agent etoposide revealed a large number of Tp53-regulated transcripts, but also a subset of transcripts that were TP53Pro72Arg allele specific. In addition, a shRNA-library suppressor screen for Tp53 allele–specific escape from Tp53-induced arrest was performed. Several novel RNAi suppressors of Tp53 were identified, one of which, PRDM1β (BLIMP-1), was confirmed to be an Arg-specific transcript. Prdm1β silences target genes by recruiting H3K9 trimethyl (H3K9me3) repressive chromatin marks, and is necessary for stem cell differentiation. These results reveal a novel model for African American cancer disparity, in which the TP53 codon 72 allele influences lifetime cancer risk by driving damaged cells to differentiation through an epigenetic mechanism involving gene silencing.
Implications: TP53 P72R polymorphism significantly contributes to increased African American cancer disparity. Mol Cancer Res; 12(7); 1029–41. ©2014 AACR.
Introduction
Colon cancer is the second leading cause of cancer death in the United States (1). African Americans have a disproportionately high incidence and mortality rate from many epithelial cancers, and colon cancer specifically, when compared with European Americans (1). This increased risk is associated with an earlier age of onset (5 years on average) as well as more poorly differentiated tumor phenotypes (2). This disparity in relative risk is likely due in part to genetic factors, as many environmental factors are shared between these 2 groups within the United States.
Differences in variant allele frequency within the TP53 gene may account for some of the disproportionate cancer risk associated with African ancestry. The TP53 gene is a tumor suppressor that is mutated or lost in more than half of human malignancies (3). It encodes the Tp53 protein, a regulator of many apoptosis, cell-cycle arrest, and DNA damage response pathways. A nonsynonymous polymorphism in codon 72 of the TP53 gene (SNP ID = rs1042522) gives rise to 2 alleles: one codes for proline (Pro) at amino acid 72 and the other for arginine (Arg; ref. 4). Multiple studies, including the 1000 Genomes Project (5–7), show that the Pro allele is more common in African Americans whereas the Arg allele is predominant in those of European descent. The Pro allele has demonstrated the ability to enhance the inflammatory response in a mouse knock-in model (8). Individuals exhibiting the Pro/Pro (P/P) genotype have a decreased mean age of onset in colorectal cancer (9). The P/P genotype is enriched in individuals over the age of 85, despite the increased cancer-related mortality (10, 11). In contrast, studies utilizing a temperature-sensitive Tp53 mutant suggest that the Arg allele has the ability to preferentially induce apoptosis (12, 13), and the Arg allele induces a stronger apoptotic response to chemotherapeutic treatments (14, 15).
Given this allele-specific predisposition toward increased longevity (P/P) or apoptosis (Arg), individuals expressing the Arg allele may be protected from DNA damage by more efficient removal of stem cells with somatic mutations. However, epidemiologic results are not clear; in some cases, the Arg allele seems protective whereas in others the Pro allele exhibits protection (4). Furthermore, the increased frequency of the P/P genotype in African Americans does not account for the increased percentage of high grade, undifferentiated tumors found within this group. We therefore sought to identify a unifying mechanism that could explain the influence of TP53 codon 72 polymorphisms on both cancer and longevity.
The normal differentiation program of stem cells involves the Tp53 protein (16, 17). Mutations in TP53 are found with greater frequency in poorly differentiated tumors (18), and these tumors exhibit a gene expression signature in common with embryonic and induced pluripotent stem cells (19, 20). This signature includes the targeted silencing of genes by the Polycomb Repressor Complex 2 (PRC2), a key regulator of stem cell differentiation (20). TP53 deletion or knockdown enhances the ability of differentiated cells to revert back to a pluripotent state (21, 22). Although a precise mechanism is yet to be defined, TP53 is clearly linked to stem cell differentiation. TP53 can control the expression of chromatin modifying genes like histone-lysine N-methyltransferase (EZH2), a catalytic subunit of PRC2 (23). This interaction between TP53 and chromatin modifying genes expands the influence of TP53 beyond the list of genes for which Tp53 has a direct transcriptional role in their expression.
We hypothesized that TP53 codon 72 polymorphism accentuates the previously described ethnic disparity in cancer risk by affecting stem cell differentiation. If a mutated stem cell undergoes differentiation, its daughter cells still carry the mutations present; but if the same stem cell is allowed to maintain pluripotency, it can pass those mutations on to a larger number of cells, and those cells retain greater potency. This would lead to less differentiated tumors with a higher grade, as seen in African Americans.
This hypothesis was tested by investigating TP53 codon 72 allele-specific gene expression profiles and by screening for allele-specific shRNA suppressors of Tp53-induced growth arrest. We used somatic cell knockout techniques (24) to create isogenic derivatives of RKO colon carcinoma cells that are hemizygous for each TP53 allele (Pro/- or Arg/-). RNA sequencing was then performed to characterize TP53-dependent transcripts. A second-site suppressor screen for shRNAs that bypassed Tp53-mediated growth arrest was also conducted, identifying candidate TP53 allele–specific suppressors. One Arg-specific candidate identified in both assays was PR domain containing 1β (PRDM1, also known as BLIMP1). Prdm1β recruits H3K9 and H3K27 trimethyl repressive chromatin marks and is required for adult intestinal stem cell differentiation (25, 26). We propose that this may be the mechanism by which TP53 enforces stem cell differentiation and protects from cancer, but causes aging. This would explain the disparity in the incidence of early-onset colon cancer in TP53 P/P individuals, who are overrepresented in the African American population.
Materials and Methods
Cell culture
Wild-type TP53 and double knockout (KO) RKO colon cancer cells were obtained from B. Vogelstein (27), and were grown at 37°C/5% CO2 in McCoy's 5A media (Fisher) supplemented with 10% FBS and pen/strep (Fisher). Cells were passaged every 2 to 3 days to maintain subconfluency.
Genotyping of patients with colon cancer
The TP53Arg72Pro polymorphism resides in exon 4 of the TP53 gene. We used exon-spanning primers to amplify exon 4 by PCR: TP53E4F = 5′-CCTGGTCCTCTGACTGCTCT-3′; TP53E4R = 5′-GCCAGGCATTGAAGTCTCAT-3′ (Integrated DNA Technologies). Sequences GCCTCCCTCGCGCCATCAG (454A) and GCCTTGCCAGCCCGCTCAG (454B; Integrated DNA Technologies) were appended to the 5′ and 3′ ends of each PCR primer, and used as universal priming sites for subsequent sequencing of the PCR products. PCR products were prepared using KAPA SYBR FAST Universal 2× qPCR reagents (Kapa Biosystems) and the following PCR protocol. A 20 μL reaction used 10 μL 2× Master Mix, 1 μL 5 μmol/L primer F, 1 μL 5 μmol/L primer R, 7 μL H2O, and 1 μL (3 ng) of genomic DNA template. Thermal cycling conditions were as follows: 98°C, 2 minutes; 98°C, 10 seconds, 63°C, 10 seconds, 70°C, 30 seconds, 3 repeats; 98°C, 10 seconds, 60°C, 10 seconds, 70°C, 30 seconds, 3 repeats; 98°C, 10 seconds, 57°C, 10 seconds, 70°C, 30 seconds, 3 repeats; and finally 98°C, 10 seconds, 56°C, 10 seconds, 70°C, 30 seconds, 49 repeats, for a total of 58 cycles. A melt curve analysis was performed to verify the PCR products, starting at 55°C for 30 seconds, and then heating slowly at 0.5°C per minute until 95°C. Bi-directional amplicon sequencing was performed by Beckman Coulter Genomics. Sequences were aligned to the appropriate reference sequence using CLC Genomics Workbench and all sequence trace files were inspected manually to determine genotype. Genotypes were determined from tumor and/or normal genomic DNA.
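The touchdown programme above can be checked with a few lines of arithmetic. The following is a minimal sketch that encodes the four annealing stages as listed and confirms that the repeats sum to 58 cycles; the constant and function names are illustrative, and the run-time estimate counts hold times only (ramp times are ignored, which is an assumption).

```python
# Touchdown stages as listed in the protocol: (annealing temperature in °C, repeats).
TOUCHDOWN_STAGES = [(63, 3), (60, 3), (57, 3), (56, 49)]

DENATURE_S = 10  # 98°C step, seconds
ANNEAL_S = 10    # annealing step, seconds
EXTEND_S = 30    # 70°C step, seconds


def total_cycles(stages):
    """Sum the repeats over all touchdown stages."""
    return sum(repeats for _, repeats in stages)


def cycling_minutes(stages, initial_denature_s=120):
    """Total hold time in minutes (assumption: ramp times are ignored)."""
    per_cycle_s = DENATURE_S + ANNEAL_S + EXTEND_S
    return (initial_denature_s + per_cycle_s * total_cycles(stages)) / 60


print(total_cycles(TOUCHDOWN_STAGES))               # 58
print(round(cycling_minutes(TOUCHDOWN_STAGES), 1))  # 50.3
```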
RKO P72R polymorphism clone construction
RKO human colon carcinoma cells are heterozygous Pro/Arg at TP53 codon 72 (28). A TP53 knockout (−/−) line (RKO 249–KO) was constructed as previously described. Pro/- (8B7, designated Pro1, and 10C5, designated Pro2) and Arg/- (9D4, designated Arg1, and 10H2, designated Arg2) lines were created by knocking out either the Pro or the Arg allele or both using an existing adeno-associated TP53-targeting vector (27, 29). All lines are available upon request. Genotyping was performed using 1 primer in the neomycin resistance gene and 1 in the TP53 gene: NeoF = 5′-CTG GGA GAA GGT GCC ATG AT-3′ and TP53R = 5′-TGT GCT GTG ACT GCT TGT AG-3′ in a 20 μL reaction. Thermal cycling conditions were as follows: 98°C, 2 minutes; 98°C, 20 seconds; 67.2°C, 30 seconds; 70°C, 120 seconds, 65 repeats; 55°C, 120 seconds; melt curve starting at 55°C increasing 0.5°C every 10 seconds for 81 repeats. The TP53 status of all cell lines was routinely verified by restriction digest using the BstUI method (30). Moreover, Western blotting did not detect any truncated forms of Tp53 expressed from the deleted alleles (whereas it clearly showed the full-length protein; Fig. 2B).
RNA-seq
RNA (4 μg) from all 6 RKO clones treated with either DMSO or 10 μmol/L etoposide was prepared for sequencing using the TruSeq RNA Sample Prep Kit (Illumina) following the manufacturer's protocol. Briefly, polyA mRNA was purified and fragmented. Then first and second strand cDNA synthesis was performed, ends were repaired and adenylated, adapters were ligated, and fragments were enriched with PCR amplification. The library was validated with an Agilent Technologies 2100 Bioanalyzer (Agilent Technologies).
RNA-seq libraries from all 6 RKO clones treated with either DMSO or 10 μmol/L etoposide were sequenced as paired-end reads on the Illumina HiSEQ2K platform, using PhiX as a reference control (Illumina). The resultant FASTQ files were imported into CLC Genomics Workbench and aligned to the human hg19/B37 reference sequence. Total exon reads or transcript reads per kilobasepair per million (RPKM) were used as gene expression values.
We utilized the R/Bioconductor package DESeq, which provides methods to test for differential expression in sequence count data based on the negative binomial distribution. We tested each gene for differential expression with respect to sample type after accounting for treatment effect, and used the Benjamini and Hochberg (BH) method to adjust P values to control the false discovery rate (FDR).
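As an illustration of the RPKM expression value, the sketch below applies the standard definition (reads divided by transcript length in kilobases and by library size in millions of mapped reads). It is not the CLC Genomics Workbench implementation; the function name and the example numbers are hypothetical.

```python
def rpkm(exon_reads, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return exon_reads / (kilobases * millions)


# Hypothetical gene: 400 reads on a 2,000 bp exon model in a library of
# 20 million mapped reads.
print(rpkm(400, 2_000, 20_000_000))  # 10.0
```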
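The BH adjustment itself is simple to state. The following is a minimal Python sketch of the step-up procedure, offered only to make the FDR correction concrete; the analysis described here was run in R with DESeq, and the function name and example P values are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the original input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative P values only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A gene is then called significant when its adjusted value falls below the chosen FDR threshold, such as 0.05.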
shRNA screen
The second-site suppressor screen was done on all 6 RKO cell line variants. The experiment took 12 total days of tissue culture. On Day 1, RKO clones were seeded at 1.5 × 10⁶ cells in one T-150 flask per cell line, and incubated at 37°C for 24 hours. On Day 2, each cell line was infected with a lentiviral shRNA library at multiplicity of infection ∼0.8 as described (31) with the addition of 8.0 μg/mL Polybrene for 24 hours. The library consists of 27,500 shRNAs targeting 5,043 genes (Decipher Project, Human Module 1, Signaling Pathway Targets).
On Day 3, the lentivirus-containing media was replaced with fresh growth media, and on Day 5, cells were treated with 5 μmol/L nutlin-3a and left in this media for 7 days. This enriched for cells infected with specific shRNAs that enable proliferation rather than senescence. After 7 days of nutlin-3a selection, the surviving cells were collected and genomic DNA of each cell line was purified (Agencourt DNAdvance; Beckman Coulter).
For high-throughput sequencing, shRNA-integrated DNA was real-time PCR amplified for each RKO cell line using KAPA HiFi HotStart (Kapa Biosystems) and then PCR purified (Agencourt AMPure XP). Primers Gex_F and Gex_R are part of the lentiviral plasmid construct (Decipher Project, pRSI9-U6-(sh)-UbiC-TagRFP-2A-Puro vector). Primers were designed with A and B linkers to complement 454 A and B sequencing primers. The following Gex-tagged A and B primers were used:
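To give a feel for what a multiplicity of infection of about 0.8 implies, the sketch below assumes that viral integrations per cell follow a Poisson distribution (a common simplification that is not stated in the protocol) and that the cell number at infection equals the number seeded (growth over the first 24 hours is ignored). Function names are illustrative.

```python
import math


def poisson_integration_fractions(moi):
    """Fractions of cells with 0, exactly 1, and 2 or more integrations."""
    none = math.exp(-moi)
    single = moi * math.exp(-moi)
    multiple = 1.0 - none - single
    return none, single, multiple


def infected_cells_per_shrna(cells, moi, library_size):
    """Average number of infected cells representing each shRNA."""
    none, _, _ = poisson_integration_fractions(moi)
    return cells * (1.0 - none) / library_size


none, single, multiple = poisson_integration_fractions(0.8)
print(round(none, 3), round(single, 3), round(multiple, 3))
# Values from the protocol: 1.5 x 10^6 cells, MOI ~0.8, 27,500 shRNAs.
print(round(infected_cells_per_shrna(1.5e6, 0.8, 27_500), 1))
```

Under these assumptions roughly 55% of cells carry at least one shRNA, and each shRNA is represented by about 30 infected cells per flask.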
TagA_Gex_F
GCCTCCCTCGCGCGATCAGCAAGCAGAAGACGGCATACGAGA
TagB_Gex_R
GCCTTGCCAGCCCGCTCAGAATGATACGGCGACCACCGAGA
Data were collected with a Roche 454 sequencer at the EnGenCore facility of the University of South Carolina, and analyzed using CLC Genomics Workbench.
Western blotting
Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) was performed at 200 V in a 10% polyacrylamide gel containing SDS (Criterion TGX; Bio-Rad). The proteins in the gel were then transferred to a polyvinylidene difluoride (PVDF) membrane (Immun-Blot PVDF Membrane, 0.2 μm; Bio-Rad) using a Criterion Blotter (Bio-Rad). After blocking with 5% bovine serum albumin (BSA; Fisher) in Tris-buffered saline with Tween-20 (TBST; Fisher), membranes were incubated with specific primary antibodies (monoclonal anti-Tp53 DO-1; Santa Cruz, dilution 1:1,000 and monoclonal anti-PARP 46D11; Cell Signaling, dilution 1:2,000) overnight at 4°C with gentle rocking. Membranes were washed 3 times in TBST for 7 minutes, incubated in the appropriate secondary antibody (anti-mouse IgG HRP-labeled; Vector Laboratories, dilution 1:10,000 or anti-rabbit IgG HRP-labeled; Santa Cruz Biotechnology, dilution 1:10,000) and then washed 3 times in TBST for 7 minutes again, all with gentle rocking. Bands were visualized with Pierce ECL Western Blotting Substrate (Thermo Scientific) according to the manufacturer's instructions. β-Actin was used as a loading control. To probe for this, membranes were first incubated in stripping buffer (Glycine/Tween/pH 2.2) for 15 minutes at 20°C, blocked with 5% BSA in TBST, and then probed with anti-β-actin (goat polyclonal, C11 Santa Cruz, dilution 1:1,000). Following washing as above, the membrane was incubated in the secondary antibody, anti-goat IgG HRP-labeled (Santa Cruz Biotechnology, dilution 1:10,000), then washed and visualized as above.
qPCR
Total RNA from RKO clones treated with 0, 10, 20, or 50 μmol/L etoposide or 5, 10, or 20 μmol/L (−)-nutlin3a was isolated using TRIzol according to the manufacturer's protocol. RNA quality was assayed with an Agilent Technologies 2100 Bioanalyzer (Agilent Technologies). Random decamers (Invitrogen) and 125 ng of RNA were used to make 20 μL of cDNA using SuperScriptII (Invitrogen) according to the manufacturer's protocol. The cDNA was diluted 1:100 before qPCR. PCR products were prepared using KAPA SYBR FAST Universal 2× qPCR Reagents (Kapa Biosystems) and the following PCR protocol. A 20 μL reaction used 10 μL 2× Master Mix, 1 μL 5 μmol/L primer F, 1 μL 5 μmol/L primer R, and 2 μL of etoposide-treated diluted cDNA or ChIP preparations or 5 μL of nutlin3a-treated diluted cDNA template. Thermal cycling conditions were as above for TP53 genotyping. Thermal cycling conditions for ChIP amplification were as follows: 98°C, 2 minutes; 94°C, 10 seconds, 63°C, 15 seconds, 3 repeats; 94°C, 10 seconds, 60°C, 15 seconds, 3 repeats; 94°C, 10 seconds, 57°C, 15 seconds, 3 repeats; 98°C, 10 seconds, 56°C, 15 seconds, 49 repeats, for a total of 58 cycles. A melt curve analysis was performed to verify the PCR products, starting at 95°C for 30 seconds, and then decreasing slowly at 0.1°C per 15 seconds until 65°C. The following cDNA primers were used:
cDNA_PRDM1_F CTTCCCTCCTTTGCATTGAA,
cDNA_PRDM1_R GAACCTTGCCTTTTTGTGGA;
GDF15_F2 AAGATTCGAACACCGACC,
GDF15_R2 GAGAGATACGCAGGTGCAG.
ChIP primers were as follows:
PRDM1_ChIP_bs4_F CTTCCCTCCTTTGCATTGAA,
PRDM1_ChIP_bs4_R GAACCTTGCCTTTTTGTGGA.
Chromatin immunoprecipitation
RKO clones grown in 10-cm plates were treated with 10 μmol/L (−)-nutlin3a or DMSO for 24 hours. Chromatin preparations were made with ChIP-It Express Enzymatic Kit (Active Motif) according to the manufacturer's protocol. Enzymatic shearing was performed for 30 minutes to produce short fragments for sequencing. Immunoprecipitation was performed with DO1 monoclonal antibody (Santa Cruz Biotechnology) or IgG (Santa Cruz Biotechnology) as a control overnight at 4°C.
shRNA inhibition of PRDM1 and TP53
A panel of shRNAs for PRDM1 (Invitrogen V2LHS_94736, 94739, 94740, 94741 and V3LHS_407846), TP53 (Invitrogen V3LHS_333919 and 333920) and nonsilencing (NS, Invitrogen) were transfected into RKO clone cells. On Day 1, 800,000 cells were plated on 60-mm culture dishes. On Day 2, cells were transfected using Lipofectamine Reagent (Invitrogen). On Day 3, cells were plated in T-75 flasks and treated with DMSO, puromycin 5 μg/mL, or nutlin-3a 10 μmol/L. After 1 week, cells were trypsinized and pelleted. Pelleted cells were resuspended in PBS + 1% BSA and counted on a FACScan XP5 (Becton Dickinson).
Analysis of TCGA data
TCGA COAD BAM files assembled to hg19 (https://cghub.ucsc.edu/) were imported into CLC Genomics Workbench (http://www.clcbio.com/) and variants were called using a probabilistic variant detector with a 99.9% likelihood threshold used as a cutoff. PRDM1 expression data (RNA-seq, total exon reads, normalized, log10 transformed) were obtained from the UCSC Cancer Genome Browser (https://genome-cancer.ucsc.edu/proj/site/hgHeatmap/; ref. 32). COAD clinical data were obtained from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm). TP53 genotype counts for European and African American individuals without cancer were obtained from the 1000 Genomes Project (http://www.1000genomes.org/).
Statistical associations between genotype frequency, PRDM1 expression, ethnicity, gender, cancer, and age of diagnosis were determined using Partek Genomics Suite (http://www.partek.com/) and Graphpad Prism (http://www.graphpad.com/scientific-software/prism/).
Results
Role of TP53Pro72Arg in colon cancer
We first investigated TP53Pro72Arg genotype frequencies in patients with colon cancer and normal controls of both European and African ancestry. We determined the TP53Pro72Arg genotype of 93 patients with colon cancer with known ethnic ancestry and age at diagnosis and combined these data with those obtained from 97 patients with colon cancer from the Cancer Genome Atlas Colorectal Adenocarcinoma dataset (ref. 33 and Supplementary Table S1). There is a positive association between the P/P genotype and a diagnosis of colon adenocarcinoma, but only in African Americans (P/P frequency 51% in patients vs. 37% in controls; Fisher exact P = 0.0505). We next hypothesized that the P/P genotype may be associated with a lower mean age of diagnosis for patients with colon cancer. The P/P genotype was significantly associated with a decrease in the mean age of diagnosis of 6.84 years (P = 0.0107; Fig. 1A), whereas being African American alone reduced the mean age of diagnosis by 6.26 years (P = 0.0076; Fig. 1B). When both genotype and ethnicity are analyzed together, African Americans with the P/P genotype have a decrease in mean age of diagnosis of 9.12 years (P = 0.0337; Fig. 1C). Non-African Americans with the P/P genotype had a reduction in mean age of diagnosis of 3.03 years, but this was not statistically significant (P = 0.3915; Fig. 1C). We considered 2 possible mechanisms by which the P/P genotype could accelerate colon cancer progression: either by reducing the number of sequential somatic mutations required, or by increasing the rate at which these mutations occur. To determine which mechanism is most consistent with the observed data, we fit the incidence data to a progression model (ref. 34; Fig. 1D, solid lines) given by
where S is the fraction of patients remaining undiagnosed, C is the rate coefficient, t is time, and nM is the number of mutations minus 1. We found that for both P/P and Not P/P individuals nM = 6, which is consistent with other predictions for the number of genetic events required for the development of colon cancers (29). This analysis revealed that the difference between P/P and Not P/P individuals was in the rate at which the mutations occurred; African American P/P individuals accumulated cancer-causing mutations more than twice as fast (C for AA-P/P divided by C for AA-Not P/P = 2.35). Thus, this analysis suggests that individuals with a P/P genotype are at elevated risk for early-onset colon cancer because colon cancer–causing mutations accumulate at an accelerated rate (number of mutations per person per year). We next identified all somatic mutations in TCGA COADREAD tumors using the UCSC Cancer Genome Browser (TCGA_COADREAD_mutation dataset), and compared the numbers of somatic mutations present within tumors derived from AA, EA, P/P, and Not P/P individuals. There was no significant difference in the numbers of somatic mutations observed or in the rate at which they accumulated (number of mutations per tumor per year). This contrasts with the significant increase in the number of mutations per person per year observed in African Americans with P/P genotypes, and indirectly suggests a mechanism of increased numbers of at-risk cancer progenitor (stem) cells within African Americans of the P/P genotype. Taken together, these results suggest that the TP53 Pro72Arg P/P genotype is a significant risk factor for colon cancer development in African Americans.
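The rate comparison can be reproduced from the fitted C values reported in the Fig. 1 legend. The sketch below is plain arithmetic on those four values; the dictionary keys and function name are illustrative, and the third legend value is assumed to belong to the Not AA-P/P group. Because the legend values are rounded, the AA ratio comes out at about 2.34, in line with the 2.35 quoted in the text.

```python
# Fitted rate coefficients C from the Fig. 1 legend (units as given there).
# Assumption: the third legend value belongs to the "Not AA-P/P" group.
RATE_COEFFICIENTS = {
    "AA-P/P": 2.19e-11,
    "AA-Not P/P": 9.35e-12,
    "Not AA-P/P": 8.67e-12,
    "Not AA-Not P/P": 7.70e-12,
}


def rate_ratio(group, reference, coefficients=RATE_COEFFICIENTS):
    """Ratio of the rate coefficient of `group` to that of `reference`."""
    return coefficients[group] / coefficients[reference]


# P/P versus Not P/P within African Americans.
print(round(rate_ratio("AA-P/P", "AA-Not P/P"), 2))

# Every group relative to Not AA-Not P/P.
for group in RATE_COEFFICIENTS:
    print(group, round(rate_ratio(group, "Not AA-Not P/P"), 2))
```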
Effect of race and the P/P genotype on mean age of colon cancer diagnosis. In all panels, error bars correspond to standard error. A–C, the mean age of diagnosis is plotted for individuals having either a P/P genotype or not (P = 0.0107) (A), being African American (AA) or not (P = 0.0076) (B), or the combination of genotype and ethnicity (the * indicates the difference between AA-P/P and AA-Not P/P is significant, P = 0.0337) (C). D, the % of undiagnosed patients is plotted against the age of diagnosis for P/P or Not P/P individuals. The solid lines correspond to a least squares best-fit of Eq. 1 using nM = 6 (the number of mutations minus 1). The C values for AA-P/P, AA-Not P/P, Not AA-P/P, and Not AA-Not P/P populations are 2.19 × 10⁻¹¹ y⁻¹, 9.35 × 10⁻¹² y⁻¹, 8.67 × 10⁻¹² y⁻¹, and 7.70 × 10⁻¹² y⁻¹, respectively.
TP53 allele–specific gene expression profiling
Because Tp53 acts primarily as a transcription factor, we reasoned that the tendency of P/P individuals to exhibit accelerated colon cancer progression may be related to TP53Pro72Arg allele–specific gene expression patterns. To investigate Pro and Arg allele-specific expression, we made use of a colon carcinoma cell line, RKO, which is heterozygous for the TP53 codon 72 polymorphism (denoted WT), and somatic cell knockout techniques [see Fig. 2 and Materials and Methods (24)] to derive 4 independent hemizygous knockout derivative cell lines, 2 having only a single Pro allele, and 2 having only a single Arg allele. We denote these single-allele cell lines as Pro1, Pro2, Arg1, and Arg2. We compared these single-allele derivatives to the starting WT cell line and to a knockout derivative in which both alleles had been disrupted using the same technology (27, 35). Tp53-dependent gene expression was observed by treatment of the panel with the known Tp53-inducers etoposide and (−)nutlin-3a [the (−) denotes the active enantiomer], as compared with treatment with the DMSO vehicle control (etoposide) and the inactive nutlin-3a enantiomer [(+)nutlin-3a] control (Fig. 2B). Relative to the controls, wild-type RKOs (Pro/Arg) and all 4 hemizygous knockout derivatives demonstrated equivalently robust increases in Tp53 levels 24 hours after drug treatment. The Arg variant of Tp53 clearly showed a higher electrophoretic mobility, consistent with previous reports (36), whereas the heterozygous, WT cell line showed a wider band spanning the effective molecular weights of both the Pro and Arg variants.
Generation of cell lines and experimental workflow. A, the parental RKO colon carcinoma cell line is heterozygous for the TP53 codon 72 polymorphism. Using somatic cell knockout techniques, 4 hemizygous cell lines were created, 2 having only the Pro allele, and 2 having only the Arg allele. A knockout cell line was also created. All 6 of these cell lines were (i) treated with etoposide or DMSO (control), and then subjected to RNAseq analysis and (ii) transduced with an shRNA library, treated with nutlin-3a, and then subjected to deep DNA sequencing. B, Western blot analysis for Tp53 expression levels of all 6 cell lines treated with either DMSO, 10 μmol/L etoposide, 10 μmol/L inactive nutlin-3a, or 10 μmol/L active nutlin-3a. Note the large Tp53 response in all cell lines with the exception of the KO, and the slight mobility differences in Arg vs. Pro.
To probe allele-specific function using these 6 cell lines, we performed 2 sets of global experiments. First, we treated the cells with 10 μmol/L etoposide for 24 hours (to induce DNA damage and thus a Tp53 response) or a control (DMSO), and analyzed the gene expression responses via RNA sequencing (RNA-seq). Second, we performed a second-site suppressor screen, where all 6 cell lines were infected with a lentiviral library consisting of ∼27,000 shRNAs targeting 5,043 mRNAs (Decipher project—Signaling Pathway Targets Set). The cells were then treated with the active enantiomer of nutlin-3a for 7 days, which inhibits the Mdm2-Tp53 interaction and thereby stabilizes Tp53, inducing cell-cycle arrest and a senescence-like state. By performing colony formation assays on uninfected cells, we noted that at this concentration of nutlin, all Tp53 containing RKO derivatives underwent permanent arrest, whereas the double knockout continued to proliferate unchecked. Infected cells that incorporated into their genomes shRNAs that enable proliferation in the presence of Tp53 would outgrow those cells that remained arrested. We reasoned that allele-specific transcripts overlapping with allele-specific shRNA suppressors would be highly relevant as functionally important Tp53 allele-specific target genes. Below we present analyses of these 2 global experiments.
RNA-seq analysis
We performed hierarchical cluster analysis of moderate depth (20 million reads per sample) gene expression data obtained from cells treated with etoposide (E) or vehicle control (C; Fig. 3A and Supplementary Table S2). The Tp53-containing cell lines showed robust Tp53-response gene activation, whereas Tp53-knockout cells were largely unresponsive to treatment. In the presence of etoposide, RKO clones clustered according to genotype, with Arg-containing clones activating a distinct subset of genes compared with Pro/- derivatives. To identify Tp53 and allele-specific transcripts, we had a biostatistician analyze the data using the R/Bioconductor package DESeq, which provides methods to test for differential expression in sequence count data based on the negative binomial distribution. The method improves upon existing methods on the issue of overdispersion by providing a shrinkage estimator for the distribution's variance (37). We analyzed Tp53-dependent, but allele-independent behavior and identified a list of genes including the Tp53 effectors BCL2A1, BTG2, and GDF15 (Fig. 3B–D). For the WT and ARG and PRO versus KO comparison, we detected 3,657 significant differentially expressed genes after adjusting for FDR (Fig. 4A and B and Supplementary Table S3). For the ARG versus PRO comparison, we detected 22 genes that showed significant differential expression after adjusting for FDR (Fig. 4C and D and Supplementary Table S4). To verify these results, we performed deep sequencing (>200 million reads per sample) of 1 Arg/- and 1 Pro/- clone in the presence of etoposide, querying the expression levels of 18,302 genes (Fig. 4E and F). We detected 125 genes that were significantly differentially expressed between Arg/- and Pro/- genotypes (FDR-corrected P-value <0.05). Tp53-responsive genes (allele-specific and allele-independent) are summarized in Supplementary Table S5.
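The figure legend for the deep-sequencing comparison combines two criteria: an FDR-adjusted P-value below 0.05 and a greater than 2-fold difference between the 2 samples. A minimal sketch of such a combined filter is shown below; the function name, the pseudocount used to guard against zero expression values, and the example numbers are all hypothetical and are not taken from the analysis itself.

```python
import math


def is_allele_specific_hit(expr_a, expr_b, adjusted_p,
                           alpha=0.05, min_fold=2.0, pseudocount=1.0):
    """True when the gene passes both the FDR and the fold-change criteria.

    The pseudocount is a hypothetical choice that avoids division by zero.
    """
    fold = (expr_a + pseudocount) / (expr_b + pseudocount)
    log2_fold = abs(math.log2(fold))  # direction-independent
    return adjusted_p < alpha and log2_fold > math.log2(min_fold)


# Made-up expression values for illustration.
print(is_allele_specific_hit(30, 10, 0.01))  # True: significant and > 2-fold
print(is_allele_specific_hit(15, 10, 0.01))  # False: fold change too small
print(is_allele_specific_hit(30, 10, 0.20))  # False: not significant
```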
Overview of the RNAseq data. A, after alignment using the CLC Genomics Workbench, the number of reads per exon for each gene was used as an input for hierarchical clustering analysis. Blue denotes low counts, whereas red denotes high counts. B–D, RNA-Seq transcript expression profile for Tp53 effectors BCL2A1 (B), BTG2 (C), and GDF15 (D). The x-axis indicates cell line, and the y-axis denotes the transcript expression levels.
RNAseq transcript expression profiles. A and B, scatter plot (A) and volcano plot (B) of expression profiles for Tp53-dependent, allele-independent genes. Points colored in red were significantly different (FDR adjusted P-value <0.05). C and D, scatter plot (C) and volcano plot (D) of expression profiles for Tp53 codon 72 dependent genes. Points colored in red were significantly different (FDR adjusted P-value <0.05). E and F, scatter plot (E) and volcano plot (F) of expression profiles for etoposide-treated Arg and Pro samples subjected to ultra-deep sequencing. Points colored in red were significantly different (FDR adjusted P-value <0.05, and greater than 2-fold different between the 2 samples).
Second-site suppressor screen identifies PRDM1 as an important allele-specific target
The RNA-seq experiments described above allowed us to identify Tp53 allele–specific target transcripts. However, it remained unclear which members of these lists may contribute to TP53 allele–specific phenotypes. Nutlin-3a induces growth arrest of RKO colon cancer cells in a Tp53-dependent manner. We used this phenotype as a proxy for stem cell differentiation, and proceeded to perform a second-site suppressor screen whereby all 6 cell lines were infected with 27,500 individual shRNAs targeting 5,043 mRNAs and subsequently put under selection with nutlin-3a. Nutlin-3a binds to the E3 ubiquitin ligase Mdm2 and inhibits its ability to downregulate Tp53 levels, thereby increasing the stability of the Tp53 protein and raising its expression levels (38). Thus, the targets of the shRNAs that enable cell growth in the presence of both Tp53 and nutlin-3a may be thought of as mediators of Tp53-dependent cell-cycle arrest. We isolated genomic DNA from each cell line after 7 days of nutlin-3a selection, prepared PCR products spanning the integrated lentiviral genomes, and then identified and enumerated the shRNAs via deep amplicon sequencing. Because nutlin-3a normally causes growth arrest, any shRNA that blocks the effects of nutlin-3a to allow cell proliferation would be significantly enriched after 7 days of growth. Any overlap between the allele-specific lists generated above and lists generated by this screen would yield interesting candidate genes for further analysis.
We first analyzed Tp53 allele–independent function by ranking shRNA gene targets by their counts in Tp53-containing cell lines versus the KO and by the corresponding P value (see Materials and Methods; Supplementary Table S6 and Fig. 5A). One of the top hits was TP53, as would be expected, providing confidence in the proposed screening method. We next analyzed Tp53 allele–specific function by ranking shRNA gene targets by their cumulative counts in the Pro1 and Pro2 cell lines versus the Arg1 and Arg2 cell lines, and vice versa (see Materials and Methods; Supplementary Tables S7 and S8 and Fig. 5B). The Polycomb-related proteins Eed (39) and Phf1 (40), which enforce transcriptional repression and differentiated states, were on the Arg list. This is consistent with the idea that Arg may have an allele-specific differentiation-related function. Most notably, the only gene that appears on both the allele-specific RNA-seq transcript lists and the second-site suppressor screen lists is PRDM1. We therefore wanted to analyze PRDM1 in greater detail as an Arg-specific transcript.
Volcano plots of the shRNA screen results. A, the x-axis is the difference in counts between the sum of those found in any Tp53 containing cell line (WT, Pro1, Pro2, Arg1, or Arg2) minus 5 times those found in the KO. A count corresponds to a sequenced shRNA against a particular gene. The y-axis is the negative log of the P-value of obtaining the result, which was calculated using the binomial distribution, with the probability of success corresponding to an shRNA being equally likely to be observed in any cell line, and the number of trials as the total number of counts across all cell lines for a particular gene. B, this plot is similar to A, with the exception that the x-axis corresponds to the difference in counts between Arg-containing (Arg1 and Arg2) and Pro-containing (Pro1 and Pro2) cell lines.
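The panel A coordinates described in this legend can be sketched as follows. This is an illustrative reading rather than the authors' code: with 6 cell lines and an shRNA equally likely to be observed in any of them, the null probability that a count falls in a Tp53-containing line is taken to be 5/6. The one-sided upper-tail test, the base-10 logarithm, and the function names `binomial_upper_tail` and `screen_point` are assumptions.

```python
from math import comb, log10


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def screen_point(counts_tp53, count_ko):
    """Return (x, y) for one gene in a panel A style volcano plot.

    counts_tp53: shRNA counts for the gene in each Tp53-containing line
                 (WT, Pro1, Pro2, Arg1, Arg2).
    count_ko:    shRNA counts for the gene in the KO line.
    """
    n_tp53_lines = len(counts_tp53)
    k = sum(counts_tp53)   # counts observed in Tp53-containing lines
    n = k + count_ko       # number of trials: total counts for the gene
    # x-axis: summed Tp53-line counts minus 5 times the KO counts.
    x = k - n_tp53_lines * count_ko
    # Null hypothesis: each count is equally likely to come from any line.
    p_null = n_tp53_lines / (n_tp53_lines + 1)
    # y-axis: negative log of the (assumed one-sided) binomial P value.
    y = -log10(binomial_upper_tail(k, n, p_null))
    return x, y
```

For example, `screen_point([2, 2, 2, 2, 2], 0)` places a gene with 10 counts, none of them in the KO, at x = 10. Panel B would follow the same pattern with Arg-line counts compared against Pro-line counts and a correspondingly different null probability.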
Validation of PRDM1 as an Arg-specific transcript
The RNAseq data suggested that the β isoform of PRDM1 is an Arg-Up transcript (Fig. 6A). To validate PRDM1β as an Arg-Up transcript, we performed several qPCR experiments. First, we treated all cell lines with various doses of etoposide and observed PRDM1β mRNA levels (Fig. 6B). Second, we treated all cell lines with various doses of Nutlin-3a and observed PRDM1β mRNA levels (Fig. 6C). These results provide strong evidence that PRDM1β mRNA expression levels are only etoposide responsive in cell lines containing the Arg allele. This case is made even stronger by the fact that in these same samples, significant drug-induced responses of 3 canonical Tp53 allele-independent response genes (CDKN1A, GDF15, and MDM2) are seen for both the Pro and Arg cell lines (Fig. 6D–I).
Validation of the Arg-specific transcript PRDM1β. A, RNAseq transcript expression profile for PRDM1β. B and C, PRDM1β expression induced by various doses of etoposide (B) or Nutlin-3a (C) in different cell lines as measured by qPCR. D–I, expression of Tp53 positive controls (CDKN1A, GDF15, MDM2) induced by various doses of etoposide (D–F) or Nutlin-3a (G–I) in different cell lines, as measured by qPCR. J, amount of PRDM1 or CDKN1A DNA pulled down by Tp53-ChIP as measured by qPCR. Cells were treated with Nutlin-3a or control, and the ratio of measurements obtained for these 2 treatment conditions is plotted for each gene and cell line. A value of 1 indicates that Nutlin-3a did not induce any additional Tp53 binding to the gene of interest. K, cells containing shRNA after 1 week treatment with Nutlin-3a, measured by FACS analysis selecting for GFP expression. Data shown as fold change compared with NS shRNA.
An obvious hypothesis for explaining these data is that the Arg variant of Tp53 binds more strongly to the promoter of PRDM1β than does the Pro variant. This is particularly relevant because of a recently reported Tp53 binding motif upstream of exon 4 of the PRDM1β isoform, which contains the start site for PRDM1β (41). To test this hypothesis, we performed chromatin immunoprecipitation using a Tp53 antibody in the presence or absence of Nutlin-3a, and then looked for CDKN1A (an allele-independent control) and PRDM1 (Fig. 6J). We found that there was greatly enhanced binding of Tp53 to the promoters of both genes in all cell lines relative to the KO, but there was no significant allele specificity. Thus, it seems that differential promoter binding cannot explain the Tp53 allele–specific behavior of PRDM1β. This is consistent with previous work that did not show any differential DNA binding capacity of these Tp53 variants (36). Finally, the data collected from the PRDM1 shRNA transfection panel (Fig. 6K) show that the PRDM1 shRNAs offer a survival advantage to the WT and 9D4 Arg cell lines (2.32- to 3.82-fold increase over NS), although the effect is less robust in the 10H2 cells (1.15- to 1.81-fold), and that they are ineffective in the KO and Pro cells (no significant change from NS). This further supports the hypothesis that the Arg allele affects expression and function of PRDM1.
To confirm that PRDM1 expression is associated with the Arg/Arg (R/R) genotype at the TP53P72R locus, we performed variant calls on TCGA COAD-READ normal BAM files to identify genotypes, and analyzed the TCGA COAD-READ gene expression dataset (TCGA_COADREAD_exp_HiSeqV2_exon) for PRDM1 expression (RNA-seq, total exon reads, normalized, log10 transformed). The mean PRDM1 expression level for 99 R/R individuals (8.84) was significantly higher (P = 0.032) than that found in 18 P/P individuals (8.55). The P value was determined by three-way ANOVA using genotype, gender, and ethnicity as factors.
Discussion
We have investigated the TP53Pro72Arg polymorphism in colon cancer from both an epidemiologic and molecular biology perspective. Epidemiologically, we found that the P/P genotype is significantly associated with accelerated cancer progression, and is likely a major factor in the early age of onset in African Americans (Fig. 1). Our data are consistent with the findings published by Katkoori and colleagues demonstrating that the P/P genotype is a significant risk factor for colon cancer in African Americans (42). Zhang and colleagues likewise found that the Pro allele and P/P genotype were enriched in Asian patients with gastric cancer compared with noncancer control individuals (43). Sullivan and colleagues reported that Pro72 cells primarily enter a G1 arrest with minimal apoptosis at chemotherapeutic doses that result in a strong apoptotic response in Arg72 cells (14). Jeong and colleagues found that the Arg allele increased expression of proapoptotic genes, such as PUMA, whereas the Pro allele favored expression of cell-cycle regulators, like CDKN1A (12). However, not all studies have been in agreement over the function of the Pro allele. Frank and colleagues found that the Pro allele enhanced the apoptotic response to ionizing radiation and increased transactivation of inflammatory response genes (8). The differential function of the Arg and Pro alleles between these studies may be because of species differences in the model systems deployed, or differences in the genetic backgrounds of human studies, as we and others have shown that the Pro allele is a colon cancer risk factor for humans of African, but not European, descent. In our view, the preponderance of data suggests that the Pro allele favors a temporary cell-cycle arrest and DNA repair response over a more permanent response like apoptosis or, in the case of stem cells, differentiation.
Further analysis suggests that the P/P genotype–related health disparity is based on an overall 2-fold faster accumulation of mutations, rather than a reduction in the number of mutations necessary for transformation. Our analysis of the mutation rate within tumors of P/P and Not P/P individuals found no significant difference in the mutation rate because of TP53 codon 72 polymorphism within the tumor. Multiple factors influence the mutation rate, but a larger stem cell population in individuals with the P/P genotype would increase the probability that driver mutations accumulate and are passed on. If this is the case, our analyses would predict that P/P individuals could have as many as twice the number of colonic crypt stem cells. This would help explain the increased longevity of P/P individuals (10, 11) and the propensity for high-grade undifferentiated tumors in African Americans (2).
Using a panel of single-allele hemizygous derivatives having only the Arg or only the Pro polymorphism, along with deep sequencing technology, we generated lists of Tp53-dependent, allele-independent, and TP53 codon 72 allele-dependent transcripts. We went on to perform an shRNA-based second-site suppressor screen for generating resistance to Tp53-induced growth arrest. We took a similar approach as with the RNA-seq data: first establish relevance with a Tp53 allele–independent study, then move onto allele-specific analysis. Predictably, shRNAs against TP53 showed up at the top of the Tp53 allele–independent study, and other identified targets are of interest for future study. The chromatin-modifying and differentiation-related factors EED, PHF1, and PRDM1 showed up as Arg-specific mediators of Tp53 function, supporting the ideas developed from the RNA-seq data analysis that Tp5372Arg has a novel, but as yet unidentified differentiation function.
The overlap between the RNA-seq studies and the shRNA screen was PRDM1, and we confirmed that the β isoform is specifically upregulated by Tp5372Arg. Prdm1 binds to a currently uncharacterized array of promoters and recruits a histone H3 K4 demethylase and a histone H3 K9 methyltransferase (G9a; ref. 25). Prdm1 is therefore able to silence active chromatin, and some of its targets may include PCNA and MKI67, well-known proliferation genes (44). Moreover, Prdm1 is critical for enforcing terminal B-cell differentiation (45, 46), is implicated in differentiation of T cells (47) and functions as a tumor suppressor (48, 49). It is interesting to speculate about a potential Arg-related aging mechanism, by which DNA damage induces a Prdm1 response, leading to a transient chromatin-silencing activity whose products persist throughout an individual's life. The build-up of these products would eventually increase the propensity of stem cells to differentiate and/or simply lose their pluripotent function, thereby reducing the size of the stem cell pool. A similar TP53-dependent mechanism has indeed been described in Fanconi's anemia for controlling the size of the hematopoietic stem cell pool (50). Such a mechanism is consistent with the possible tendency of P/P individuals to have a larger stem cell pool, thereby exhibiting increased longevity at the expense of accelerated cancer progression. More experiments, however, will of course be needed to more fully explore this hypothesis.
There are likely other germline variants that contribute to the accelerated cancer progression observed in African Americans, but the high frequency of P/P individuals within this population is clearly a contributing factor. The theorized stem cell differentiation induced by the Arg allele would remove damaged stem cells from the population. The model predicts that with successive waves of DNA-damaging signals, each transient activation of Tp53 would trigger the accumulation of additional H3K9Me3 repressive marks at genes required for stem-cell maintenance, thus slowly eroding the overall replicative potential of the stem cell pool but protecting from cancer in the process. Our hypotheses based on these data would help explain the disparity in the incidence of early onset colon cancers in TP53 Pro72/Pro72 individuals, who are overrepresented in the African American population. Finally, the novel cell lines described herein will be valuable reagents for the scientific community to use in pursuit of allele-specific pharmacologic intervention strategies to mitigate inherited cancer risk.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: C.C. Weige, M.R. Birtwistle, J. Tidwell, C. Farrell, F. Bunz, M. Shtutman, P.J. Buckhaults
Development of methodology: E. Cloessner, J. Tidwell, H. Ji, M. Shtutman, P.J. Buckhaults
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C.C. Weige, M.R. Birtwistle, Z. Berrong, E. Cloessner, K. Duff, J. Tidwell, B. Wilkerson, C. Farrell, C.E. Banister, P.J. Buckhaults
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C.C. Weige, M.R. Birtwistle, H. Mallick, N. Yi, Z. Berrong, K. Duff, J. Tidwell, M. Clendenning, C.E. Banister, P.J. Buckhaults
Writing, review, and/or revision of the manuscript: C.C. Weige, M.R. Birtwistle, H. Mallick, Z. Berrong, E. Cloessner, K. Duff, J. Tidwell, M. Clendenning, B. Wilkerson, K.E. Creek, P.J. Buckhaults
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C. Farrell, K.E. Creek, P.J. Buckhaults
Study supervision: M. Clendenning, P.J. Buckhaults
Acknowledgments
Data from the Cancer Genome Atlas Colorectal Adenocarcinoma dataset were included in our analyses; as such, the authors thank the TCGA Research Network.
Grant Support
This study was supported by funding from the NIH (grant no. U01CA158428). Additional support was provided by core facilities supported by NCI funding (grant nos. P30-AR048311 and P30-AI027767).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.