TP53 undergoes multiple RNA-splicing events, resulting in at least nine mRNA transcripts encoding at least 12 functionally different protein isoforms. Antibodies specific to p53 protein isoforms have proven difficult to develop, thus researchers must rely on the transcript information to infer isoform abundance. In this study, we used deep RNA-seq, droplet digital PCR (ddPCR), and real-time quantitative reverse transcriptase PCR (RT-qPCR) from nine human cell lines and RNA-seq data available for tumors in The Cancer Genome Atlas to analyze TP53 splice variant expression. All three methods detected expression of the FL/40TP53α_T1 variant in most human tumors and cell lines. However, other less abundant variants were only detected with PCR-based methods. Using RNA-seq simulation analysis, we determined why RNA-seq is unable to detect less abundant TP53 transcripts and discuss the implications of these findings for the general interpretation of RNA-seq data. Cancer Res; 76(24); 7151–9. ©2016 AACR.

The p53 tumor suppressor protein plays a central role in maintaining the integrity of the genome by promoting repair, senescence, or apoptosis of DNA-damaged cells (1, 2). Thus, it is not surprising that TP53 is the most frequently mutated gene in most human cancers (3). Recently, the clinical importance of assessing factors beyond just TP53 mutation status, such as the relative abundance of functionally distinct TP53 RNA variants and the protein isoforms they encode, has been recognized. In humans, at least 9 different protein-encoding transcripts are expressed from the TP53 locus (Fig. 1).

Figure 1.

The genomic complexity at the TP53 locus using the GRCh37/hg19 as a reference. A, Schematic of the human TP53 gene structure. The nine TP53 RNA transcripts encoding 12 protein isoforms (FLp53α, FLp53β, FLp53γ, Δ40p53α, Δ40p53β, Δ40p53γ, Δ133p53α, Δ133p53β, Δ133p53γ) generated by alternative splicing (α, β, and γ) and alternative promoter usage (P1 and P2) are indicated along with other TP53 transcripts reported in the literature. At the top of the figure, exons are numbered and illustrated in purple, with the regions of supplementary exon 9b that are included in the alternatively spliced β and γ variants shown in pink and blue, respectively. Intron 1 was truncated due to space constraints (represented by double vertical dashed lines). SINE repeat regions along the gene locus are shown in orange. Genomic sequences that correspond to the RT-qPCR products generated in this study are shown: pFL, pΔ40, and pΔ133 correspond to the full-length, Δ40, and Δ133 alternative 5′ ends of TP53 transcripts, respectively, while pα, pβ, and pγ correspond to the α, β, and γ alternative 3′ ends of TP53 transcripts, respectively. Four 125-nt RNA sequence regions against which RNA-seq reads were counted are also shown; R1 is located in exon 4, R2 is common to the AluJb element and the Δ133TP53 5′ UTR sequence, R3 is unique to the Δ133TP53 5′ UTR sequence and excludes the AluJb element, and R4 spans exons 5 and 6. B, The intron 1–exon 2 junction of FL/Δ40TP53_T2 and FL/Δ40TP53_T1 mRNA transcripts are compared, showing the additional CAG sequence incorporated at the start of exon 2 in the FL/Δ40TP53_T1 splice variant family. C, The partial overlap between the 5′ UTR of the Δ133P53 splice variant family and an AluJb repeat element is identified, with the sequence of this overlap shown below. D, The overlap between transcripts from adjacent gene WRAP53 and the exon 1 of TP53 relative to the forward strand of chromosome 17. Gene structures were drawn using Fancy Gene v1.4 (49).

Figure 1.

The genomic complexity at the TP53 locus using the GRCh37/hg19 as a reference. A, Schematic of the human TP53 gene structure. The nine TP53 RNA transcripts encoding 12 protein isoforms (FLp53α, FLp53β, FLp53γ, Δ40p53α, Δ40p53β, Δ40p53γ, Δ133p53α, Δ133p53β, Δ133p53γ) generated by alternative splicing (α, β, and γ) and alternative promoter usage (P1 and P2) are indicated along with other TP53 transcripts reported in the literature. At the top of the figure, exons are numbered and illustrated in purple, with the regions of supplementary exon 9b that are included in the alternatively spliced β and γ variants shown in pink and blue, respectively. Intron 1 was truncated due to space constraints (represented by double vertical dashed lines). SINE repeat regions along the gene locus are shown in orange. Genomic sequences that correspond to the RT-qPCR products generated in this study are shown: pFL, pΔ40, and pΔ133 correspond to the full-length, Δ40, and Δ133 alternative 5′ ends of TP53 transcripts, respectively, while pα, pβ, and pγ correspond to the α, β, and γ alternative 3′ ends of TP53 transcripts, respectively. Four 125-nt RNA sequence regions against which RNA-seq reads were counted are also shown; R1 is located in exon 4, R2 is common to the AluJb element and the Δ133TP53 5′ UTR sequence, R3 is unique to the Δ133TP53 5′ UTR sequence and excludes the AluJb element, and R4 spans exons 5 and 6. B, The intron 1–exon 2 junction of FL/Δ40TP53_T2 and FL/Δ40TP53_T1 mRNA transcripts are compared, showing the additional CAG sequence incorporated at the start of exon 2 in the FL/Δ40TP53_T1 splice variant family. C, The partial overlap between the 5′ UTR of the Δ133P53 splice variant family and an AluJb repeat element is identified, with the sequence of this overlap shown below. D, The overlap between transcripts from adjacent gene WRAP53 and the exon 1 of TP53 relative to the forward strand of chromosome 17. Gene structures were drawn using Fancy Gene v1.4 (49).

Close modal

Multiple studies have now shown that p53 isoforms have distinct functions and frequently contribute to diseases including cancer (4–18). Elevated expression of the Δ40p53 isoform has been associated with metastatic melanoma and triple-negative breast cancers (7, 19). In contrast, elevated levels of TP53β were found to be negatively associated with tumor size and positively associated with disease-free survival of breast cancer patients with TP53 mutations (7). Opposing functions of these two isoforms was further demonstrated in melanoma cell lines where Δ40p53 was shown to inhibit p53-dependent transcription of downstream target genes, whereas p53β stimulated transactivation (19). p53β also promoted replicative senescence in T cells, which was blocked by Δ133p53 (4). Δ133p53 has also been shown to inhibit p53-dependent apoptosis and G1 arrest, but not G2 arrest (20), suggesting that Δ133p53 does not just act as a dominant negative, and the zebrafish homolog of Δ133p53, Δ113p53, also inhibits p53-initiated apoptosis (21). Studies in mice suggest that the Δ40p53 isoform controls the switch from pluripotency to differentiation (22) as well as controling β-cell proliferation and glucose homeostasis (23). Δ133p53α has been shown to stimulate proliferation (6) and angiogenesis of tumor cells (6) and Δ133p53β promotes stem cell differentiation by upregulating pluripotency factors such as SOX2, OCT3/4, and NANOG (24). A mouse model of Δ133p53 designated Δ122p53 was shown to promote hyperproliferation, tumorigenesis, and inflammation (9) and overexpression of Δ122p53 promoted cell migration, invasion through three-dimensional (3D) matrices (16), and upregulation of metastasis-associated proteins (14). Finally, recent data from the Δ122p53 mouse has shown that there is also cooperativity as Δ122p53 enhanced the survival of p53-mutant mice (25) and full-length p53 increased the ability of Δ122p53 to promote cell migration (16). This is consistent with the isoforms functioning in a complex network to determine cell fate outcomes (17).

Thus, given the potential importance of p53 isoforms in normal cell physiology and in disease, it is important to be able to quantitate the expression levels of the isoforms in cells and tissues. Unfortunately, as antibodies specific to p53 protein isoforms have proved difficult to develop (26), analysis of isoform abundance can presently only be inferred from RNA transcript levels, although it is true that there will not always be a direct correlation between transcript and protein. Methods to detect TP53 RNA splice variants include RT-qPCR (27), digital droplet PCR (ddPCR), expression microarrays, and NanoString. ddPCR has the particular advantage of providing absolute quantitation of target TP53 sequences (28). At the whole transcriptome level, RNA-seq is progressively displacing traditional microarray methods, as it can in principle identify novel transcripts generated by alternative splicing, gene fusion events, strand-specific and antisense transcription, and small and noncoding RNAs, as well as assessing allelic bias (29–32).

Use of RNA-seq data from large cancer genomic data repositories (33, 34) potentially provides an excellent resource to investigate expression of TP53 transcript variants. While assessing the abundance of transcripts expressed from the TP53 locus as a whole is straightforward, identifying individual TP53 transcript variants is not trivial, due to the complexity of the TP53 gene locus encoding multiple transcripts, several of which appear to be expressed at low levels. TP53 gene transcription can be initiated either at a promoter upstream of exon 1 (P1) expressing several forms of full-length protein (FLp53) or from an internal promoter in intron 4 (P2), expressing the Δ133p53 isoforms. The Δ133TP53 transcript also encodes the Δ160p53 protein, which lacks the first 160 amino acids generated by alternative initiation of translation (10). In addition, internal ribosome entry site (IRES)-mediated translation of TP53 mRNA leads to expression of the Δ40p53 isoforms (35). Of note, there are two sets of transcripts capable of encoding both full-length (FL)/Δ40TP53_T1 and FL/Δ40TP53_T2, differ by only three nucleotides in their length, due to the incorporation of CAG at the intron exon junction of exon 2 (Fig. 1B). Another mechanism by which Δ40TP53 transcripts are generated depends on G-quadraplex structures in intron 3, which affect the splicing of intron 2 (36). Finally, alternative splicing between exon 9 and exons 9B/10 generates three alternative 3′ ends for TP53 transcripts (TP53α, β, and γ; ref. 12). Thus, the human TP53 gene can express at least 9 transcript variants (Fig. 1) encoding at least 12 protein isoforms (FLp53α; β and γ, Δ40p53α; β and γ, Δ133p53α, Δ133p53β, Δ133p53γ, Δ160p53α, β, and γ). In addition to these protein-coding transcripts, several noncoding transcripts generated through alternative splicing have been reported, for example, p53Ψ, a transcriptionally inactive p53 isoform with an ability to reprogram cells toward a metastatic-like state (Fig. 1; refs. 31, 37).

To characterize TP53 RNA splice variant expression, in this study, we quantified the expression of all known TP53 transcripts using RNA-seq data from The Cancer Genome Atlas (TCGA) and 9 human cell lines. Our results show that irrespective of tumor type and cell line, and even when using unusually high RNA-seq read depth, commonly used bioinformatic analysis pipelines struggled to detect low abundance TP53 RNA splice variants and had lower sensitivity compared with PCR-based methods. Our analysis of simulated RNA-seq data showed that in the presence of unequal TP53 splice variant abundance or low RNA-seq read counts, significant biases in TP53 splice variant quantification occur. Our data also suggest that only the FL/Δ40TP53α_T1 variant is highly expressed in human cancers and in the 9 cell lines, whereas other isoforms are expressed at much lower levels. These results have both biological and technical implications for TP53 research as well as broader implications for the many studies that quantify RNA splice variants in RNA-seq data.

Retrieval of RNA-seq data from TCGA datasets and data analysis

We downloaded TCGA RNA-seq data, which was processed using the RNA-seq by Expectation–Maximization (RSEM) method (38) and normalized to a fixed upper quartile (TCGA MapspliceRSEM version 0.7 pipeline using the GRCh37/hg19 reference), for 10,310 tumors over 32 cancer types (Supplementary Table S1; level 3 data downloaded on October 20, 2015 from the TCGA data portal). Detailed description of the processing protocol can be found in the TCGA open access FTP download directories (https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/). All data analyses and visualizations were performed using the R statistical framework.

RNA-seq of cell lines and data analysis

Total RNA was extracted using TRIzol reagent (Thermo Fisher Scientific) from 6 human osteosarcoma cell lines: HAL, KPD, U2OS, Saos-2, ZK58, OHS, and three human fibroblast cell lines: IIICF/c, LFS-05F-24 (LFS), and JFCF-6 (JFCF) obtained from R. Reddel (Children's Medical Research Institute, New South Wales, Australia) over the period of 2014–2015. All cell lines were validated for authenticity by CellBank Australia (http://www.cellbankaustralia.com/) using STR profiling. Strand-specific libraries for TruSeq total RNA sequencing were generated and 2 × 15 cycles of PCR amplification were carried out as per the manufacturer's recommendation prior to sequencing of the 9 samples. Sequencing was performed using 3 lanes of the Illumina HiSeq platform (2 × 125 bp, paired end) with each library multiplexed across all 3 lanes. Library and run parameters are summarized in Supplementary Table S2. Raw sequence files for the cell lines are deposited in SRA and their accession numbers are SAMN05725778 (HAL), SAMN05725779 (KPD), SAMN05725784 (IIICF/c), SAMN05725785 (LFS), SAMN05725780 (U2OS), SAMN05725781 (Saos-2), SAMN05725782 (ZK58), SAMN05725786 (JFCF), and SAMN05725783 (OHS), respectively. RNA-seq reads were quality trimmed using cutadapt v1.9.1 (39) to remove surplus adapters, bases with Phred score <20, and paired reads <50 bp post-trimming. Reads were aligned using bowtie-2 v2.2.7 (40) via RSEM v1.2.28 (38) with recommended RSEM parameters for stranded RNA-seq data. Gene and transcript abundance was quantified using RSEM. Sequencing reads were also aligned using bowtie-2 v2.2.7 (40) via TopHat v2.1.1 (41) and quantified using Cufflinks v2.2.1 (42), using the default settings.

The mRNA sequence for 7 TP53 transcripts (FL/Δ40TP53α_T1, FL/Δ40TP53α_T2 FL/Δ40TP53β, FL/Δ40TP53γ, Δ133TP53α, β, and γ) was downloaded from the UCSC genome browser (43) along with the Locus Reference Genomic (LRG) database IDs (44) for the reference track set to GRCh37/hg19 (Supplementary Table S3). Reads were then simulated for each of these 7 TP53 transcripts using dwgsim v0.1.11 (https://github.com/nh13/DWGSIM; ref. 45). Simulated reads were aligned and transcript abundance was quantified using either bowtie-2 (40) via RSEM (38), or bowtie-2 (40) via TopHat (41), and quantified using Cufflinks (42). Read counts in each sample overlapping 4 specific genomic regions were counted by using the sequences and their reverse complements as bespoke references then mapping the reads to these references using bowtie-2 (40), followed by counting with HTSeq Python scripts (46).

ddPCR and RT-qPCR

The RNA (2 μg) extracted from the 9 cell lines was DNase I treated (Thermo Fisher Scientific) and then reverse transcribed using qScript cDNA SuperMix (Quanta Biosciences), according to the manufacturer's instructions. Primers were designed for specific TP53 transcript subclasses (FL/Δ40TP53_T1, FL/Δ40TP53_T2 and Δ133TP53, TP53α, β, and γ). Primer sequences and amplification efficiency and the method used to determine amplification efficiency for each primer pair is provided in Supplementary Table S4. Transcript abundance was measured using the Bio-Rad QX200 ddPCR System (Bio-Rad). In brief, 0.5–50 ng of cDNA was added to 10 μL of QX200 ddPCR EvaGreen Supermix (Bio-Rad) with 100 nmol/L each of forward and reverse primers in a total volume of 20 μL. Droplets were generated with Droplet Generation Oil for EvaGreen (Bio-Rad), transferred to a 96-well PCR plate, and an endpoint PCR run in a C1000 Touch Thermal Cycler as follows: 95°C for 5 minutes, then 40 cycles of 95°C for 30 seconds and 60°C for 1 minute, then 1 cycle each of 4°C for 5 minutes, and then 90°C for 5 minutes. Droplets were read using the Bio-Rad QX200 reader and the data analyzed using QuantaSoft software (Supplementary Fig. S1). The limit of detection was based on obtaining >1 but <2 droplets for each primer pair in the volume used for the ddPCR reaction, which was corrected for the actual amount of cDNA in the reaction to calculate copies/μg. On the basis of this analysis, the limit of detection for ddPCR was defined as 30 copies/μg.

RT-qPCR was done as follows: 66 ng of cDNA was added to 5 μL of SYBR Premix Ex Taq II (Tli RNase H Plus; Takara Bio), 0.4 μL of Rox dye, and 200 nmol/L of each primer in a final volume of 10 μL. The PCRs were run on a QuantStudio 12K Flex Real-Time PCR System (Life Technologies) as follows: 95°C for 2 minutes, then 40 cycles of 95°C for 30 seconds, 60°C for 1 minute, followed by a melt curve stage to visualize the dissociation curves of the product (Supplementary Fig. S2). RT-qPCR was performed for each sample with each primer pair in triplicate. Transcript abundance from RT-qPCR was calculated using the equation:

formula

To avoid detection of any false positive products, the limit of detection was set at a Ct value of 32, as its precision decreases and the number of false positive hits increases at higher Ct values. The RT-qPCR products were Sanger sequenced and the sequences of the amplicons are provided in Supplementary Fig. S3.

RNA-seq detects abundantly expressed TP53 transcriptional variants

To estimate the expression of TP53 transcript variants, we analyzed TCGA RNA-seq data from 10,310 human tumors across 32 tumor types. In 99% of TCGA tumors, the overall TP53 gene expression was above the median expression of all genes and in 75% of tumors TP53 was expressed above the 75th percentile of all genes (Fig. 2A). However, despite the abundance of TP53, only two RNA splice variants were readily identified across the different tumor types. These included FL/Δ40TP53α_T1 and an uncharacterized variant (uc010cne.1), while other TP53 transcripts, including the canonical FL/Δ40TP53α_T2, FL/Δ40TP53β, and FL/Δ40TP53γ transcripts and the Δ133TP53α, β, and γ transcripts, were not detected (Fig. 2B).

Figure 2.

The transcript T1 is the most frequently detected TP53 transcript from RSEM-analyzed data. A, Percentile rank of TP53 gene expression within individual tumors of RNA-seq RSEM analyzed data. Within each tumor, genes were sorted by increasing expression of the TP53 gene along the x-axis, with the median gene expression across tumors shown in the dotted gray line. B, Distribution of the expression of either any RNA encoded by the TP53 gene locus or specific TP53 transcript variants from RNA-seq data analyzed using the RSEM method for 32 cancer types from TCGA. Details for each of the tumor types, including definition of the abbreviations, are summarized in Supplementary Table S1. C, Distribution of the expression of either the TP53 gene locus or transcript variant from RNA-seq data analyzed using the RSEM method for nine human cell lines. The line in the middle of each box represents the median, and the top and bottom outlines of the box represent the first and third quartile, respectively.

Figure 2.

The transcript T1 is the most frequently detected TP53 transcript from RSEM-analyzed data. A, Percentile rank of TP53 gene expression within individual tumors of RNA-seq RSEM analyzed data. Within each tumor, genes were sorted by increasing expression of the TP53 gene along the x-axis, with the median gene expression across tumors shown in the dotted gray line. B, Distribution of the expression of either any RNA encoded by the TP53 gene locus or specific TP53 transcript variants from RNA-seq data analyzed using the RSEM method for 32 cancer types from TCGA. Details for each of the tumor types, including definition of the abbreviations, are summarized in Supplementary Table S1. C, Distribution of the expression of either the TP53 gene locus or transcript variant from RNA-seq data analyzed using the RSEM method for nine human cell lines. The line in the middle of each box represents the median, and the top and bottom outlines of the box represent the first and third quartile, respectively.

Close modal

Similar to the tumors, the only transcript variant from RSEM-processed RNA-seq data detected in 6 of the 9 cell lines was the FL/Δ40TP53α_T1. All other TP53 transcript variants were barely detected, including uc010cne.1 (Fig. 2C).

Deep RNA-seq, RT-qPCR, and ddPCR of human cancer cell lines

To determine whether deep RNA-seq data (121–139 million reads per sample) analyzed using the RSEM protocol of the 9 cell lines was an accurate reflection of the transcript abundance, ddPCR and RT-qPCR were also performed for each of the TP53 transcript subclasses (FL/Δ40TP53_T1, FL/Δ40TP53_T2, and Δ133TP53, TP53α, β, and γ). All three methods detected expression from the TP53 locus in IIICF/c, LFS, U2OS, JFCF, and OHS cells, but not in Saos-2, HAL, KPD, or ZK58 cells (Fig. 3; Supplementary Table S5). As we observed for the TCGA data, deep RNA-seq detected the FL/Δ40TP53α_T1 transcript across the 5 cell lines with an expected read count of between 83 and 10,032 reads (Supplementary Table S5). Similarly, expression of the transcript subclasses FL/Δ40TP53_T1 and TP53α using ddPCR (Fig. 3A and B) and RT-qPCR (Supplementary Table S5) was much higher than other TP53 transcripts across these 5 cell lines. However, even deep RNA-seq struggled to detect low abundance TP53 transcript subclasses including FL/Δ40TP53_T2, Δ133TP53, and TP53β (Supplementary Table S5), whereas these were readily detected by ddPCR (Fig. 3C–E; Supplementary Table S5) and RT-qPCR (Supplementary Table S5). For example, in U20S cells where ddPCR detected 1,574 copies/μg of Δ133TP53 mRNA, while RNA-seq failed to detect any transcript except FL/Δ40TP53α_T1 (Fig. 3; Supplementary Table S5). Moreover, across these 5 cell lines, relative expression of the Δ133TP53 transcript was 1% of the FL/Δ40TP53_T1 transcript except in IIICF/c cells (Supplementary Fig. S4). Similarly, the relative levels of the FL/Δ40TP53_T2 and TP53β were 2% and 20% of the FL/Δ40TP53_T1 transcript across the cell lines, respectively (Supplementary Fig. S4). The relative expression of these transcripts to the FL/Δ40TP53_T1 transcript further highlights their rarity. These data suggest that even deep RNA-seq both fails to detect and underestimates the abundance of rare TP53 transcripts. None of the three methods detected the expression of the TP53γ across the cell lines implying that this transcript is expressed at very low levels in all samples examined. Despite RNA-seq data on tumors from TCGA barely detecting Δ133TP53 transcripts, high levels of this transcript have been detected using RT-qPCR in cohorts of brain, prostate, and colon cancers (unpublished data). These results suggest that there are marked differences in sensitivity between RNA-seq and PCR-based methods.

Figure 3.

ddPCR detects all TP53 transcript subclasses except TP53γ. ddPCR of the various TP53 transcript subclasses was performed on the nine cell lines, and the subclass abundance per microgram of RNA was calculated using the QuantaSoft software. The lower limit of detection was <30 copies/μg RNA. FL/Δ40TP53_T1 (A), TP53α (B), FL/Δ40TP53_T2 (C), Δ133TP53 (D), and TP53β (E). TP53 transcripts are expressed at different levels, hence the y-axis scales are different between the plots.

Figure 3.

ddPCR detects all TP53 transcript subclasses except TP53γ. ddPCR of the various TP53 transcript subclasses was performed on the nine cell lines, and the subclass abundance per microgram of RNA was calculated using the QuantaSoft software. The lower limit of detection was <30 copies/μg RNA. FL/Δ40TP53_T1 (A), TP53α (B), FL/Δ40TP53_T2 (C), Δ133TP53 (D), and TP53β (E). TP53 transcripts are expressed at different levels, hence the y-axis scales are different between the plots.

Close modal

Lack of detection of TP53 transcript subclasses with low levels of expression (FL/Δ40TP53_T2, Δ133TP53 and TP53β) by deep RNA-seq was partially explained by the fact that ≤5 individually counted RNA-seq reads were found to contain both PCR primers for FL/Δ40TP53_T2, Δ133TP53, and TP53β transcript subclasses (Supplementary Table S6). Also, the detection of the Δ133TP53 transcript subclass is complicated by the overlap between the AluJb sequence and the 5′-UTR of Δ133TP53 (Fig. 1C). Once again, ≤6 reads from deep RNA-seq spanned the 125-nt region unique to the 5′-UTR of Δ133TP53, suggesting low expression of this splice variant relative to other variants, given that much higher numbers of reads were counted in 125-nt regions of exon 4 and exons 5/6. A higher number of RNA-seq reads in the region overlapping the AluJb sequence than in the adjacent 5′-UTR of Δ133TP53 suggests that the artifactual read counts could have been recorded against this repeat region due to expressed AluJb sequences from elsewhere in the genome (Supplementary Table S7).

Simulation analysis of RNA-seq read assignment reveals mapping uncertainty for low abundance transcripts

In addition to a very low number of reads, assigning a set of short RNA-seq reads (125 nt), few of which cross RNA splice sites, to numerous alternatively spliced transcripts in a complex locus like TP53 is theoretically a difficult task. Thus, to test this, we replicated the RSEM analysis done by TCGA using simulated RNA-seq reads. Initially, input for RSEM was 1,000 simulated separately aligned and counted reads for FL/Δ40TP53_T1, FL/Δ40TP53_T2, FL/Δ40TP53β, FL/Δ40TP53γ, Δ133TP53α, β, and γ respectively. As expected, RSEM could accurately assign the reads to the correct TP53 transcript when the input data consisted of only one of the transcripts. Furthermore, when the simulated input consisted of an equal mixture of these transcripts each with 1,000 reads, RSEM assigned the reads with a maximum error of ±5% to the correct TP53 transcript variants. However, RSEM failed to accurately assign reads to the correct TP53 transcript variant when the simulated input consisted of a mixture of TP53α, β, and γ transcripts each with only 8 reads (Fig. 4). Similar results were observed when the same simulated RNA-seq data were analyzed using TopHat2-Cufflinks as an alternative method. This suggests that, perhaps unsurprisingly, the accuracy of multiple RNA-seq analysis methods for enumerating transcript variants degrades with very low read counts.

Figure 4.

Accurate quantitation of transcripts by RSEM is dependent on RNA-seq read depth and relative abundance. From left to right, the first stacked bar shows the ability of RSEM to assign simulated reads to the correct TP53 transcript variant with an error of about ±5% when 1,000 simulated input reads were mixed together for FL/Δ40TP53α_T1, FL/Δ40TP53α_T2, FL/Δ40TP53β, FL/Δ40TP53γ, Δ133TP53α, Δ133TP53β, Δ133TP53γ (reads = 1,000/transcript), respectively. The second stacked bar illustrates a slightly degraded ability of RSEM to assign the reads to the correct TP53 transcript when the number of simulated input reads was reduced 125-fold from 1,000 to 8 reads for all of FL/Δ40TP53α_T1, FL/Δ40TP53α_T2, Δ133TP53α, Δ133TP53β, Δ133TP53γ, FL/Δ40TP53β, and FL/Δ40TP53γ, respectively. Stacked bars 3–5 demonstrate the progressive loss of RSEM's ability to assign reads to the correct TP53 transcript variant as the number of simulated input reads are diluted 2.5-fold (400 reads), 25-fold (40 reads), and 125-fold (8 reads) for FL/Δ40TP53α_T2, Δ133TP53α, Δ133TP53β, Δ133TP53γ, FL/Δ40TP53β, and FL/Δ40TP53γ, respectively, representing low abundance transcripts in the presence of undiluted FL/Δ40TP53α_T1 transcript (1,000 simulated input reads) representing an abundantly expressed transcript. Each stacked bar plot shows the proportion of the expected counts (RSEM)/number of simulated input reads. For example, when RSEM accurately assigns the reads, the height of each stacked bar should equal 1. Inaccuracies by RSEM reduce the height of the stacked bar to <1 when the number of expected count for each variant is less than the input reads and >1 when the expected count for each variant is greater than the input reads.

Figure 4.

Accurate quantitation of transcripts by RSEM is dependent on RNA-seq read depth and relative abundance. From left to right, the first stacked bar shows the ability of RSEM to assign simulated reads to the correct TP53 transcript variant with an error of about ±5% when 1,000 simulated input reads were mixed together for FL/Δ40TP53α_T1, FL/Δ40TP53α_T2, FL/Δ40TP53β, FL/Δ40TP53γ, Δ133TP53α, Δ133TP53β, Δ133TP53γ (reads = 1,000/transcript), respectively. The second stacked bar illustrates a slightly degraded ability of RSEM to assign the reads to the correct TP53 transcript when the number of simulated input reads was reduced 125-fold from 1,000 to 8 reads for all of FL/Δ40TP53α_T1, FL/Δ40TP53α_T2, Δ133TP53α, Δ133TP53β, Δ133TP53γ, FL/Δ40TP53β, and FL/Δ40TP53γ, respectively. Stacked bars 3–5 demonstrate the progressive loss of RSEM's ability to assign reads to the correct TP53 transcript variant as the number of simulated input reads are diluted 2.5-fold (400 reads), 25-fold (40 reads), and 125-fold (8 reads) for FL/Δ40TP53α_T2, Δ133TP53α, Δ133TP53β, Δ133TP53γ, FL/Δ40TP53β, and FL/Δ40TP53γ, respectively, representing low abundance transcripts in the presence of undiluted FL/Δ40TP53α_T1 transcript (1,000 simulated input reads) representing an abundantly expressed transcript. Each stacked bar plot shows the proportion of the expected counts (RSEM)/number of simulated input reads. For example, when RSEM accurately assigns the reads, the height of each stacked bar should equal 1. Inaccuracies by RSEM reduce the height of the stacked bar to <1 when the number of expected count for each variant is less than the input reads and >1 when the expected count for each variant is greater than the input reads.

Close modal

We then tested the robustness of RSEM for detecting mixtures containing high and low abundance TP53 transcripts. The input for RSEM consisted of 1,000 simulated reads for FL/Δ40TP53α_T1, representing an abundantly expressed transcript, with low abundance transcripts represented by a simulated input of either 400, 40, or 8 reads for FL/Δ40TP53α_T2, FL/Δ40TP53β, FL/Δ40TP53γ, Δ133TP53α, β, and γ, respectively. In all 3 simulations (400, 40, or 8 reads), RSEM assigned the reads to the FL/Δ40TP53α_T1 with a maximum error of ±10%. However, as the number of simulated input reads for the various TP53 transcripts decreased compared with the FL/Δ40TP53α_T1, assignment error by the RSEM algorithm increased, resulting in overrepresentation of some TP53 transcripts and lack of detection of others (Fig. 4). For example, as the number of simulated input reads decreased for the Δ133TP53 subclass, it resulted in artifactual over-representation of the Δ133TP53α (blue bar), artifactual under-representation of Δ133TP53β (purple bar) and Δ133TP53γ (light blue bar), respectively (Fig. 4). Similarly, an over representation of the FL/Δ40TP53β (orange bar) and FL/Δ40TP53γ (pink bar) and under representation of the FL/Δ40TP53α_T2 (green bar), respectively, was observed across the different simulations (Fig. 4). To confirm that TP53 splice variant quantification was not influenced by TP53 gene mutation, we showed the 20 most frequent Cosmic TP53 single-nucleotide variants and indels caused no significant degradation in the mapping of sequence reads to the correct position in TP53 (Supplementary Table S8). These read assignment errors suggest that the accuracy of RNA-seq for quantifying splice variants is reduced when there is a mixture of high and very low abundance splice variants transcribed from complex loci such as TP53.

Analyses of RNA-seq data from TCGA (10,310 tumors over 32 cancer types) and deep RNA-seq, ddPCR, and RT-qPCR data from 9 human cell lines led to several key findings. First, FL/Δ40TP53α_T1 was the most abundantly expressed TP53 transcript across all tumor types (Fig. 2B) and in the cell lines detected by RNA-seq (Fig. 2C). Similarly, RT-qPCR and ddPCR detected abundant expression of the FL/Δ40TP53_T1 and TP53α transcript subclasses in these cell lines (Fig. 3A and B). In contrast, RNA-seq struggled to detect TP53 transcripts expressed with low abundance (e.g., Δ133p53) across the cell lines (Fig. 2), which were, however, detected by ddPCR (Fig. 3C) and RT-qPCR (Supplementary Table S1), consistent with previous reports (4, 6, 10–12, 37, 47). Interestingly, RNA-seq analyses of tumor RNA identified the noncanonical TP53 transcript uc010cne.1 to be abundantly expressed; however, this was not observed from the RNA-seq data in the 9 cell lines and therefore needs to be validated using PCR-based methods (Fig. 2B and C). Finally, lack of detection of TP53 transcripts expressed at low levels by RNA-seq is at least in part due to the very low numbers of reads produced for these transcripts, even from the relatively deep RNAseq performed here (Supplementary Tables S2 and S3). This appears to be compounded by reduced reliability of RNA-seq analysis methods when faced with a wide range of abundance of splice variant reads that need to be mapped to a single complex locus such as TP53 (Fig. 4). It is important, however, to remember that low transcript abundance and limited ability of detection does not imply a lack of biological or clinical significance, as the functional consequences of even low abundance TP53 isoform expression can be profound (see Introduction).

Several factors potentially influence the ability of current RNA-seq methods to measure the abundance of a large dynamic range of RNA transcripts. In addition to the bioinformatic factors, these include genomic factors such as: fragmentation site biases, biases in the efficiency of cDNA synthesis, representational bias of sense versus antisense transcripts, stochastic amplification biases, GC bias, template switching, and transcript length bias (30, 32). However, we show using simulations that over and above these genomic issues, there are significant bioinformatic challenges when assigning a mixture of very low and moderately high abundance short reads to a complex locus with largely overlapping splice variants, such as TP53.

On the basis of our findings, RNA-seq data for the TP53 locus and other transcriptionally complex loci, from publicly available information, can be reliably used to quantify total expression of all RNA splice variants from a gene, or individual RNA splice variants that are expressed at high levels, for example, FL/Δ40TP53α_T1. However, RNA-seq assignment of alternatively spliced transcripts from complex single loci such as TP53, expressed at low and/or wide ranging levels, may be inaccurate. In some instances, however, where good coverage of the targeted loci can be obtained, it may be possible to improve the detection of the low abundance transcripts by using deep targeted sequencing approaches. In the future, the use of PacBio and similar technologies that produce long reads (48), or long RT-qPCR, may circumvent the challenge of quantifying RNA transcripts with potential splice variants at both ends. However, for now, accurate quantification of low abundance transcripts in the TP53 and other complex loci can best be done using empirical PCR-based methods.

No potential conflicts of interest were disclosed.

Conception and design: S. Mehta, H. Campbell, A. Braithwaite, C. Print

Development of methodology: S. Mehta, A. Lasham, C. Print

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S. Mehta, A. Lasham, H. Campbell

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S. Mehta, P. Tsai, A. Lasham, A. Braithwaite, C. Print

Writing, review, and/or revision of the manuscript: S. Mehta, P. Tsai, A. Lasham, H. Campbell, R.R. Reddel, A. Braithwaite, C. Print

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S. Mehta, R.R. Reddel, C. Print

Study supervision: S. Mehta, A. Lasham, A. Braithwaite, C. Print

We are grateful for the support provided by The University of Auckland, New Zealand, University of Otago, New Zealand, and Children's Medical Research Institute, Sydney, Australia. We would also like to thank New Zealand Genomics Limited for preparing the RNA-seq libraries for the nine cell lines and Jane Noble for authenticating the cell lines.

This work was supported by the Health Research Council of New Zealandgrant (awarded to A.W. Braithwaite), the Royal Society of New Zealand Marsden Fund (awarded to A.W. Braithwaite), the Maurice Wilkins Centre, New Zealand(S.Y. Mehta, C.G. Print, and A.W. Braithwaite), and Cancer Council of New South Wales, Australia (R.R. Reddel).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Lane
DP
. 
Cancer. p53, guardian of the genome
.
Nature
1992
;
358
:
15
6
.
2.
Braithwaite
AW
,
Royds
JA
,
Jackson
P
. 
The p53 story: layers of complexity
.
Carcinogenesis
2005
;
26
:
1161
9
.
3.
Olivier
M
,
Hollstein
M
,
Hainaut
P
. 
TP53 mutations in human cancers: origins, consequences, and clinical use
.
Cold Spring Harb Perspect Biol
2010
;
2
:
a001008
.
4.
Mondal
AM
,
Horikawa
I
,
Pine
SR
,
Fujita
K
,
Morgan
KM
,
Vera
E
, et al
p53 isoforms regulate aging-and tumor-associated replicative senescence in T lymphocytes
.
J Clin Invest
2013
;
123
:
5247
57
.
5.
Hafsi
H
,
Santos-Silva
D
,
Courtois-Cox
S
,
Hainaut
P
. 
Effects of Δ40p53, an isoform of p53 lacking the N-terminus, on transactivation capacity of the tumor suppressor protein p53
.
BMC Cancer
2013
;
13
:
134
.
6.
Bernard
H
,
Garmy-Susini
B
,
Ainaoui
N
,
Van Den Berghe
L
,
Peurichard
A
,
Javerzat
S
, et al
The p53 isoform, Δ133p53α, stimulates angiogenesis and tumour progression
.
Oncogene
2013
;
32
:
2150
60
.
7.
Avery-Kiejda
KA
,
Morten
B
,
Wong-Brown
MW
,
Mathe
A
,
Scott
RJ
. 
The relative mRNA expression of p53 isoforms in breast cancer is associated with clinical features and outcome
.
Carcinogenesis
2013
;
35
:
586
96
.
8.
Hofstetter
G
,
Berger
A
,
Berger
R
,
Zoric
A
,
Braicu
EI
,
Reimer
D
, et al
The N-terminally truncated p53 isoform Δ40p53 influences prognosis in mucinous ovarian cancer
.
Int J Gynecol Cancer
2012
;
22
:
372
9
.
9.
Slatter
TL
,
Hung
N
,
Campbell
H
,
Rubio
C
,
Mehta
R
,
Renshaw
P
, et al
Hyperproliferation, cancer, and inflammation in mice expressing a Δ133p53-like isoform
.
Blood
2011
;
117
:
5166
77
.
10.
Marcel
V
,
Perrier
S
,
Aoubala
M
,
Ageorges
S
,
Groves
MJ
,
Diot
A
, et al
Delta160p53 is a novel N-terminal p53 isoform encoded by Δ133p53 transcript
.
FEBS Lett
2010
;
584
:
4463
8
.
11.
Fujita
K
,
Mondal
AM
,
Horikawa
I
,
Nguyen
GH
,
Kumamoto
K
,
Sohn
JJ
, et al
p53 isoforms Δ133p53 and p53β are endogenous regulators of replicative cellular senescence
.
Nat Cell Biol
2009
;
11
:
1135
42
.
12.
Bourdon
J-C
. 
p53 and its isoforms in cancer
.
Br J Cancer
2007
;
97
:
277
82
.
13.
Campbell
H
,
Slatter
T
,
Jeffs
A
,
Mehta
R
,
Rubio
C
,
Baird
M
, et al
Does Δ133p53 isoform trigger inflammation and autoimmunity?
Cell Cycle
2012
;
11
:
446
50
.
14.
Sawhney
S
,
Hood
K
,
Shaw
A
,
Braithwaite
AW
,
Stubbs
R
,
Hung
NA
, et al
Alpha-enolase is upregulated on the cell surface and responds to plasminogen activation in mice expressing a Δ133p53α mimic
.
PLoS One
2015
;
10
:
e0116270
.
15.
Slatter
T
,
Hung
N
,
Bowie
S
,
Campbell
H
,
Rubio
C
,
Speidel
D
, et al
Δ122p53, a mouse model of Δ133p53α, enhances the tumor-suppressor activities of an attenuated p53 mutant
.
Cell Death Dis
2015
;
6
:
e1783
.
16.
Roth
I
,
Campbell
H
,
Rubio
C
,
Vennin
C
,
Wilson
M
,
Wiles
A
, et al
The Δ133p53 isoform and its mouse analogue Δ122p53 promote invasion and metastasis involving pro-inflammatory molecules interleukin-6 and CCL2
.
Oncogene
2016
;
35
:
4981
9
17.
Joruiz
SM
,
Bourdon
J-C
. 
p53 isoforms: key regulators of the cell fate decision
.
Cold Spring Harb Perspect Med
2016
;
6
:
pii: a026039
.
18.
Nutthasirikul
N
,
Limpaiboon
T
,
Leelayuwat
C
,
Patrakitkomjorn
S
,
Jearanaikoon
P
. 
Ratio disruption of the Δ133p53 and TAp53 isoform equilibrium correlates with poor clinical outcome in intrahepatic cholangiocarcinoma
.
Int J Oncol
2013
;
42
:
1181
8
.
19.
Avery-Kiejda
KA
,
Zhang
XD
,
Adams
LJ
,
Scott
RJ
,
Vojtesek
B
,
Lane
DP
, et al
Small molecular weight variants of p53 are expressed in human melanoma cells and are induced by the DNA-damaging agent cisplatin
.
Clin Cancer Res
2008
;
14
:
1659
68
.
20.
Aoubala
M
,
Murray-Zmijewski
F
,
Khoury
MP
,
Fernandes
K
,
Perrier
S
,
Bernard
H
, et al
p53 directly transactivates Δ133p53α, regulating cell fate outcome in response to DNA damage
.
Cell Death Differ
2011
;
18
:
248
58
.
21.
Chen
J
,
Ng
SM
,
Chang
C
,
Zhang
Z
,
Bourdon
J-C
,
Lane
DP
, et al
p53 isoform Δ113p53 is a p53 target gene that antagonizes p53 apoptotic activity via BclxL activation in zebrafish
.
Genes Dev
2009
;
23
:
278
90
.
22.
Ungewitter
E
,
Scrable
H
. 
Delta40p53 controls the switch from pluripotency to differentiation by regulating IGF signaling in ESCs
.
Genes Dev
2010
;
24
:
2408
19
.
23.
Hinault
C
,
Kawamori
D
,
Liew
CW
,
Maier
B
,
Hu
J
,
Keller
SR
, et al
Δ40 Isoform of p53 controls beta-cell proliferation and glucose homeostasis in mice
.
Diabetes
2011
;
60
:
1210
22
.
24.
Arsic
N
,
Gadea
G
,
Lagerqvist
EL
,
Busson
M
,
Cahuzac
N
,
Brock
C
, et al
The p53 isoform Δ133p53β; promotes cancer stem cell potential
.
Stem Cell Reports
2015
;
4
:
531
40
.
25.
Slatter
TL
,
Ganesan
P
,
Holzhauer
C
,
Mehta
R
,
Rubio
C
,
Williams
G
, et al
p53-mediated apoptosis prevents the accumulation of progenitor B cells and B-cell tumors
.
Cell Death Differ
2010
;
17
:
540
50
.
26.
Marcel
V
,
Khoury
MP
,
Fernandes
K
,
Diot
A
,
Lane
DP
,
Bourdon
JC
. 
Detecting p53 isoforms at protein level
.
Methods Mol Biol
2013
;
962
:
15
29
.
27.
Khoury
MP
,
Marcel
V
,
Fernandes
K
,
Diot
A
,
Lane
DP
,
Bourdon
JC
. 
Detecting and quantifying p53 isoforms at mRNA level in cell lines and tissues
.
Methods Mol Biol
2013
;
962
:
1
14
.
28.
Hindson
CM
,
Chevillet
JR
,
Briggs
HA
,
Gallichotte
EN
,
Ruf
IK
,
Hindson
BJ
, et al
Absolute quantification by droplet digital PCR versus analog real-time PCR
.
Nat Methods
2013
;
10
:
1003
5
.
29.
Prensner
JR
,
Cao
X
,
Wu
Y-M
,
Robinson
D
,
Wang
R
,
Chen
G
, et al
The landscape of antisense gene expression in human cancers
.
Genome Res
2015
;
25
:
1068
79
.
30.
Xuan
J
,
Yu
Y
,
Qing
T
,
Guo
L
,
Shi
L
. 
Next-generation sequencing in the clinic: promises and challenges
.
Cancer Lett
2013
;
340
:
284
95
.
31.
Mercer
TR
,
Gerhardt
DJ
,
Dinger
ME
,
Crawford
J
,
Trapnell
C
,
Jeddeloh
JA
, et al
Targeted RNA sequencing reveals the deep complexity of the human transcriptome
.
Nat Biotech
2012
;
30
:
99
104
.
32.
Ozsolak
F
,
Milos
PM
. 
RNA sequencing: advances, challenges and opportunities
.
Nat Rev Genet
2011
;
12
:
87
98
.
33.
The Cancer Genome Atlas Research Network
,
Weinstein
JN
,
Collisson
EA
,
Mills
GB
,
Shaw
KRM
,
Ozenberger
BA
, et al
The Cancer Genome Atlas Pan-Cancer analysis project
.
Nat Genet
2013
;
45
:
1113
20
.
34.
Zhang
J
,
Baran
J
,
Cros
A
,
Guberman
JM
,
Haider
S
,
Hsu
J
, et al
International Cancer Genome Consortium Data Portal–a one-stop shop for cancer genomics data
.
Database
2011
;
2011
:
bar026
.
35.
Sharathchandra
A
,
Katoch
A
,
Das
S
. 
IRES mediated translational regulation of p53 isoforms
.
Wiley Interdiscip Rev RNA
2014
;
5
:
131
9
.
36.
Marcel
V
,
Tran
PL
,
Sagne
C
,
Martel-Planche
G
,
Vaslin
L
,
Teulade-Fichou
MP
, et al
G-quadruplex structures in TP53 intron 3: role in alternative splicing and in production of p53 mRNA isoforms
.
Carcinogenesis
2011
;
32
:
271
8
.
37.
Senturk
S
,
Yao
Z
,
Camiolo
M
,
Stiles
B
,
Rathod
T
,
Walsh
AM
, et al
p53Ψ is a transcriptionally inactive p53 isoform able to reprogram cells toward a metastatic-like state
.
Proc Natl Acad Sci
2014
;
111
:
E3287
96
.
38.
Li
B
,
Dewey
CN
. 
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
.
BMC Bioinformatics
2011
;
12
:
1471
2105
.
39.
Martin
M
. 
Cutadapt removes adapter sequences from high-throughput sequencing reads
.
EMBnet J
2011
;
17
:
10
2
.
40.
Langmead
B
,
Salzberg
SL
. 
Fast gapped-read alignment with Bowtie 2
.
Nat Methods
2012
;
9
:
357
9
.
41.
Trapnell
C
,
Pachter
L
,
Salzberg
SL
. 
TopHat: discovering splice junctions with RNA-Seq
.
Bioinformatics
2009
;
25
:
1105
11
.
42.
Trapnell
C
,
Williams
BA
,
Pertea
G
,
Mortazavi
A
,
Kwan
G
,
Van Baren
MJ
, et al
Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
.
Nature Biotechnol
2010
;
28
:
511
5
.
43.
Kent
WJ
,
Sugnet
CW
,
Furey
TS
,
Roskin
KM
,
Pringle
TH
,
Zahler
AM
, et al
The human genome browser at UCSC
.
Genome Res
2002
;
12
:
996
1006
.
44.
Dalgleish
R
,
Flicek
P
,
Cunningham
F
,
Astashyn
A
,
Tully
RE
,
Proctor
G
, et al
Locus Reference Genomic sequences: an improved basis for describing human DNA variants
.
Genome Med
2010
;
2
:
1
7
.
45.
Li
H
,
Handsaker
B
,
Wysoker
A
,
Fennell
T
,
Ruan
J
,
Homer
N
, et al
The sequence alignment/map format and SAMtools
.
Bioinformatics
2009
;
25
:
2078
9
.
46.
Anders
S
,
Pyl
PT
,
Huber
W
. 
HTSeq—a Python framework to work with high-throughput sequencing data
.
Bioinformatics
2015
;
31
:
166
9
.
47.
Boldrup
L
,
Bourdon
JC
,
Coates
PJ
,
Sjostrom
B
,
Nylander
K
. 
Expression of p53 isoforms in squamous cell carcinoma of the head and neck
.
Eur J Cancer
2007
;
43
:
617
23
.
48.
Rhoads
A
,
Au
KF
. 
PacBio sequencing and its applications
.
Genomics Proteomics Bioinformatics
2015
;
13
:
278
89
.
49.
Rambaldi
D
,
Ciccarelli
FD
. 
FancyGene: dynamic visualization of gene structures and protein domain architectures on genomic loci
.
Bioinformatics
2009
;
25
:
2281
2
.