Abstract
Whole-exome (WES) and RNA sequencing (RNA-seq) are key components of cancer immunogenomic analyses. To evaluate the consistency of tumor WES and RNA-seq profiling platforms across different centers, the Cancer Immune Monitoring and Analysis Centers (CIMAC) and the Cancer Immunologic Data Commons (CIDC) conducted a systematic harmonization study.
DNA and RNA were centrally extracted from fresh frozen and formalin-fixed paraffin-embedded non–small cell lung carcinoma tumors and distributed to three centers for WES and RNA-seq profiling. In addition, two 10-plex HapMap cell line pools with known mutations were used to evaluate the accuracy of the WES platforms.
The WES platforms achieved high precision (> 0.98) and recall (> 0.87) on the HapMap pools when evaluated on loci using > 50× common coverage. Nonsynonymous mutations clustered by tumor sample, achieving an index of specific agreement above 0.67 among replicates, centers, and sample processing. A DV200 > 24% for RNA, as a putative presequencing RNA quality control (QC) metric, was found to be a reliable threshold for generating consistent expression readouts in RNA-seq and NanoString data. MedTIN > 30 was likewise assessed as a reliable RNA-seq QC metric, above which samples from the same tumor across replicates, centers, and sample processing runs could be robustly clustered and HLA typing, immune infiltration, and immune repertoire inference could be performed.
The CIMAC collaborating laboratory platforms effectively generated consistent WES and RNA-seq data and enable robust cross-trial comparisons and meta-analyses of highly complex immuno-oncology biomarker data across the NCI CIMAC-CIDC Network.
Given the biological complexity of immunotherapy strategies, data generated from cross-site clinical trials are often confounded by technical variations or artifacts. The Cancer Immune Monitoring and Analysis Centers (CIMAC) function to interface collaboratively with the Cancer Immunologic Data Commons (CIDC) to enable data standardization and the development of uniform analysis pipelines across clinical trials within the network.
This study details the CIMACs-CIDCs harmonization strategy for evaluating DNA and RNA sequencing data generated across distinct platforms and tissue preparation methods. This work also provides a roadmap for harmonizing diverse sequencing assays that employ different chemistries and data analysis pipelines. The key metrics for successful harmonization described herein are expected to further advance interlaboratory data comparison and database development to facilitate integrative cross-cohort analysis. This work is anticipated to facilitate biomarker development across trial networks and organizations and ultimately address the critically important mission of improving the therapeutic management of patients with cancer.
Introduction
The Cancer Immune Monitoring and Analysis Centers-Cancer Immunologic Data Commons (CIMAC-CIDC) Network (https://cimac-network.org/) is an NCI Cancer Moonshot initiative that provides cutting-edge technology and expertise in genomic, proteomic, and functional molecular analysis to enhance clinical trials in cancer immune therapies. CIMACs serve as the main units of the network for correlative studies in clinical trials involving cancer immunotherapy, functioning as platforms for deep molecular characterization of tumor and immune profiling using state-of-the-art analytically validated and standardized platforms. The CIDC, hosted by Dana-Farber Cancer Institute, is dedicated to providing a bioinformatics infrastructure for CIMACs as well as to build a biomarker database. The CIMACs work collaboratively with the CIDC to enable data standardization and the development of uniform analysis pipelines across studies within the Network.
Given the biological complexity of most immunotherapy strategies, data generated from cross-site clinical trials are often confounded by technical variations or artifacts. Objective quality control (QC) standards are indispensable for minimizing the variations due to differences in reference genomes, gene models, analytic algorithms, and processing pipelines. Harmonization of center-specific protocols and assay performance is necessary to establish standard operating procedures to overcome the variability of methods and data collection (1–5). In addition, assay harmonization is expected to facilitate objective interpretation and data comparison across different studies and multiple sites, thereby achieving a unified network for cross-trial comparisons and meta-analyses.
Whole-exome sequencing (WES) and RNA sequencing (RNA-seq) data provide a wealth of information for understanding tumor immune responses in clinical studies (6–12). WES can provide a comprehensive characterization of tumor mutations, from which neoantigens, mutational burden, and clonality can be inferred (6–8). Accumulating evidence has suggested the usage of tumor mutation burden and tumor neoantigen load as biomarkers for cancer immunotherapy response (7, 13–16). Likewise, RNA-seq provides a powerful tool to define response-driving factors from the tumor microenvironment (9, 10). Advanced computational methods are now making it possible to utilize RNA-seq data to estimate the composition of the tumor immune infiltrates (17–21) and infer infiltrating immune B- and T-cell receptor repertoires (22, 23). These immunologic characterizations have yielded valuable insights, with the potential to guide immunotherapy (9, 11, 12, 17, 22, 24, 25). Formalin fixation of tissue sample remains the standard protocol for tissue preservation in the clinical arena (26, 27). Successful use of formalin-fixed paraffin-embedded (FFPE) derived material in next-generation sequencing (NGS) applications has been reported previously (28–30). However, data evaluating whether sequencing data generated from FFPE can be used to robustly estimate immunologic characteristics, such as immune gene expression, neoantigens, HLA typing, immune infiltration, and immune repertoires, is lacking. In addition, many existing studies to date have not consistently used matched normal samples as germline comparison and therefore somatic mutation detection could not be rigorously evaluated (26, 27, 31–33).
In this study, the CIMACs and CIDC performed a cross-site harmonization of WES and RNA-seq data generated from three centers (A, B, and C). They include, but not necessarily in this order, the MD Anderson Cancer Center (MDACC, Houston, TX), the Broad Institute of Harvard and MIT, and the Molecular Characterization (MoCha) Lab at the Frederick National Laboratory for Cancer Research. Here, we describe the CIMAC-CIDC harmonization strategy for evaluating DNA sequencing and RNA-seq data generated across distinct platforms and tissue preparation methods. Moreover, we discuss the key metrics needed for successful harmonization within and among the three sites.
Materials and Methods
Sample preparation and sequencing
Two mixed HapMap cell line pools with well-characterized mutational profiles were used as truth data for the evaluation of WES (Supplementary Table S1; ref. 34). Matched FFPE tumor, fresh frozen (FF) tumor, and peripheral blood mononuclear cells (PBMC) from 8 patients with non–small cell lung cancer (NSCLC) of squamous cell carcinoma histology were also studied; the tumors were collected between the years 2012 and 2015. Ethical approval for this study was obtained under a lab protocol (ProtocolLAB90-020) and was reviewed by The University of Texas MD Anderson Cancer Center Institutional Review Board (IRB). All samples used in this study and article were obtained from patients consented under an IRB-approved informed consent. For the tumor specimens, percent tumor content, quantity, and quality of DNA and RNA were assessed at the originating center for sample preparation before distribution to all three centers for WES and RNA-seq (Supplementary Tables S2 and S3). All samples were sequenced to at least 200× mean target coverage (Supplementary Table S4) for WES and at a minimum depth of 50M paired-end fragments for RNA-seq following each center’s QC criteria (Supplementary Table S5). As an alternative to RNA-seq, RNA from FFPE samples were profiled by NanoString using the nCounter PanCancer Immune Profiling Panel (35). RNA quantity (ng) was determined with the qubit fluorometer while the quality was determined with the TapeStation by measuring the DV200 (Supplementary Table S6). Center C distributed the extracted RNA from the macrodissected and non-macrodissected samples to each site. All NanoString data were analyzed with NanoString’s nSolver (v4.0).
Centralized data processing by the CIMAC-CIDC bioinformatics pipeline
After sequencing, raw data were transferred to the CIDC for centralized analyses using CIDC common pipelines (Supplementary Figs. S1 and S2). Reference files for human hg38 (GRCh38.d1.vd1) were obtained from the NCI Genomic Data Commons. For the WES analysis, the CIMAC-CIDC platform incorporated the Sentieon (2018.08.05) workflow for read alignment and variant calling, in which read alignment was performed with Burrows-Wheeler Aligner (ref. 36; 0.7.15-r1140). Aligned and recalibrated BAM files were subjected to somatic mutation calling using Sentieon TNsnv algorithm. Low-quality mutations were filtered by VCFtools (ref. 37; 0.1.16), and remaining somatic mutations were annotated by VEP (ref. 38; v91). For RNA-seq analysis, read alignment was performed with STAR (ref. 39; v.2.4.2a). RNA-seq QC examination was performed on the aligned BAM files using RSeQC (40). Expression levels were quantified by SALMON (ref. 41; v.0.14.0). Batch effect removal was performed with Limma (ref. 42; 3.42.2). The immune cell repertoires were inferred from aligned BAM files using TRUST4 (ref. 22; v0.1.2). Expression profiles were subjected to immune infiltration estimation using Immunedeconv (43), which integrates six state-of-the-art estimation algorithms, including TIMER (17), xCell (18), MCPCounter (19), CIBERSORT (20), EPIC (21), and quanTIseq (44). Patient HLA types were estimated from both RNA-seq and WES using Optitype (ref. 45; 1.3.2). Expression profiles, somatic mutations, and HLA types from WES were integrated for neoantigen prediction using pVAC-Seq (ref. 24; 4.0.10) with NetMHC (ref. 46; v4.0), that leveraged information from both binding affinity and eluted ligand data (46).
Statistical analysis for assay and sample harmonization
Index of specific agreement (ISA), defined as 2 * Jaccard/(1 + Jaccard), was used to measure mutation agreements. ISA between samples was used as it has the potential to address downward bias in platform agreement on mutation detection when the true mutations are not prespecified (47). RNA-seq harmonization was assessed by the correlation level between replicates, sample tissue type (FF, FFPE), and sequencing centers. Spearman correlation was used to measure the agreements because it is a more robust measure in settings when the data deviate from a Gaussian distribution and is less influenced by outliers (47). Hierarchical clustering was performed on the ISA or Spearman correlation coefficient derived distance matrix with the average linkage to measure sample similarities.
WES and RNA-seq harmonization baseline and concordance evaluation
The Cancer Genome Atlas (TCGA) lung cancer (NSCLC) cohort (519 adenocarcinomas) was retrieved, processed, and analyzed to establish reference data from which to assess the agreement of mutations from different callers (48). Mutation calls made by different TCGA-approved mutation callers (MuSE, MuTect2, SomaticSniper, VarScan2) on identical WES raw data were found to have an ISA concordance between 0.22 and 0.90 (mean = 0.71). It has been noted that although there is no uniform criterion of “acceptable” agreement, a correlation of greater than 0.7, 0.8, and 0.9 can be considered as having adequate, good, and excellent correlation, respectively. Of the published studies, depending on the sequencing depth, mutation allele frequency, mutation calling tools, and sample processing, there is a large variation in the reported concordance level. The concordance levels between FF and FFPE have been reported to be around 70% in previous studies (1, 49). Therefore, if the different WES platforms applied to the same DNA sample yielded mutation calls with similar or higher ISA, these sequencing platforms were considered reasonably harmonized. Mutation agreement assessment was performed on the overlapping exon regions to ensure that data generated by different capture kits across centers were comparable. Mutation concordance in cancer driver genes was evaluated, wherein 50 cancer driver genes from Ion AmpliSeq Cancer Hotspot Panel (v2) and the 310 lung cancer oncogenes from COSMIC database (50, 51) were selected.
From the same TCGA NSCLC samples, we evaluated the pairwise correlations among RNA-seq data to create a harmonization baseline. Because tumors of the same cancer type are expected to have similar gene expression levels, we set minimum acceptance criteria of the RNA-seq platform harmonization at 0.94, which is the top 95% Spearman correlation coefficient of the studied TCGA samples (Supplementary Fig. S3). If the analysis of the same RNA-seq data by different transcriptome platforms revealed the samples to have expression levels with similar or higher correlation than TCGA baseline, then these RNA-seq platforms were considered reasonably harmonized. Secondary analyses, including expression-based immune cell infiltration estimated by TIMER (17), xCell (18), MCPCounter (19), CIBERSORT (20), EPIC (21), and quanTIseq (44), immune repertoires estimated by TRUST4 (22), and HLA typing inferred by Optitype (45) were evaluated for their concordance. The Spearman correlation coefficient, the proportion of overlapped unique CDR3s, and the Jaccard index were used as concordance metrics for the immune cell infiltration estimates, immune cell repertoires, and HLA types, respectively.
Data availability
All human WES and RNA-seq data presented in this article have been deposited at The database of Genotypes and Phenotypes (dbGaP) under accession number phs002295.v1.p1.
Results
Central sample preparation and distributed sequencing
We generated data from two sample formats: (i) HapMap cell line pools (n = 2); and (ii) NSCLC tumors with squamous cell carcinoma histology (n = 8). DNA from two HapMap cell line pools (xx and yy), each consisting of a mixture of 10 well-characterized HapMap cell lines, was equally mixed at Center C (Fig. 1A; ref. 34). In addition, DNA and RNA were centrally extracted from matched FF tumor and FFPE tumor of 8 patients with NSCLC at Center B (Fig. 1B). For tumor samples, germline DNA was also extracted from matched PBMC from the corresponding patients. Library preparation and sequencing were performed on two different days as technical replicates in all three centers (Fig. 1A and B). For both WES and RNA-seq, the capture kits used per sequencing center were distinct (Fig. 1C and D; Supplementary Table S7). For WES-seq, the overlap target regions between kits was increased if we focused on the exons (overlap region increased from 59.4% to 88.7%; Fig. 1C).
Illustration of study design and capture kits. A, Two HapMap cell line pools were generated and used to provide “ground truth” data. The HapMap cell lines xx and yy, each consisting of 10 individual HapMap cell lines mixed in equal proportions, were prepared and processed at Center C and distributed to all three centers for WES. Each HapMap pool was paired with a single cell line as germline control for mutation calls. The sequenced data were transferred to CIDC for centralized analyses. B, Tumor samples from 8 patients diagnosed with NSCLC were selected. DNA and RNA extraction was performed by Center B from both FF and FFPE processing. Germline DNA was also extracted from matched PBMCs from the corresponding patients. For all samples, two sets of aliquots were prepared as technical replicates. Extracted DNA and RNA were distributed to the three centers: Center A, B, and C for WES and RNA-seq. C, Overlap of WES target regions between the three centers were evaluated. Left, Venn diagram of overall covered regions from the different centers. Right, Venn diagram of the overlap in exome regions. D, Overlap regions were evaluated on the different RNA-seq capture kits used by the three centers.
Illustration of study design and capture kits. A, Two HapMap cell line pools were generated and used to provide “ground truth” data. The HapMap cell lines xx and yy, each consisting of 10 individual HapMap cell lines mixed in equal proportions, were prepared and processed at Center C and distributed to all three centers for WES. Each HapMap pool was paired with a single cell line as germline control for mutation calls. The sequenced data were transferred to CIDC for centralized analyses. B, Tumor samples from 8 patients diagnosed with NSCLC were selected. DNA and RNA extraction was performed by Center B from both FF and FFPE processing. Germline DNA was also extracted from matched PBMCs from the corresponding patients. For all samples, two sets of aliquots were prepared as technical replicates. Extracted DNA and RNA were distributed to the three centers: Center A, B, and C for WES and RNA-seq. C, Overlap of WES target regions between the three centers were evaluated. Left, Venn diagram of overall covered regions from the different centers. Right, Venn diagram of the overlap in exome regions. D, Overlap regions were evaluated on the different RNA-seq capture kits used by the three centers.
CIMAC genomic platforms achieved high precision and recall in WES calling from the HapMap cell line pools
Utilizing the known mutations and allele fractions in the HapMap cell line pools, we evaluated key determinants of WES platform harmonization. Despite the inherent complexity of the assays and independent protocol development between sites, the WES data generated at different sites and replicates had comparable read coverage and variant allele frequency (VAF; Fig. 2A). The sequencing data for the HapMap samples were highly concordant between technical replicates for all mutations, as well as for nonsynonymous mutations only (ISA >0.874 and ISA >0.875, respectively) (Fig. 2B). These results suggested that potential technical bias and variation introduced during library preparation and sequencing within centers are acceptable.
Evaluation and harmonization of somatic mutations identified in the HapMap cell line pools. A, The read depth and VAF of the somatic mutations detected in the HapMap pools xx and yy across the three centers. B, Reproducibility between technical replicates in each center measured by ISA. The evaluations were performed using all mutations and nonsynonymous mutations (NS). C, Agreement assessment between centers and the truth data, based on mutation call agreement (ISA concordance score) on overlap of target exons. D, Mutation agreement between the three centers was evaluated as a function of coverage and VAF. Red, mutations identified in only one center; Gray, mutations identified by two centers; Blue, mutations identified by all three centers. Numbers indicate the percentage of mutations called by at least two centers. E, Precision and recall for mutation calling at different VAFs when evaluated against truth data.
Evaluation and harmonization of somatic mutations identified in the HapMap cell line pools. A, The read depth and VAF of the somatic mutations detected in the HapMap pools xx and yy across the three centers. B, Reproducibility between technical replicates in each center measured by ISA. The evaluations were performed using all mutations and nonsynonymous mutations (NS). C, Agreement assessment between centers and the truth data, based on mutation call agreement (ISA concordance score) on overlap of target exons. D, Mutation agreement between the three centers was evaluated as a function of coverage and VAF. Red, mutations identified in only one center; Gray, mutations identified by two centers; Blue, mutations identified by all three centers. Numbers indicate the percentage of mutations called by at least two centers. E, Precision and recall for mutation calling at different VAFs when evaluated against truth data.
We next examined the extent to which there was agreement in mutational burden among the three centers. Agreement assessments were performed on the overlap exon regions to ensure that data generated by different capture kits were comparable (Fig. 1C). Upon comparison of mutations called between center-specific data and ground truth data, we obtained an ISA of 0.827 and 0.817 for the xx and yy pools, respectively (Fig. 2C). Mutation agreement among the centers was further investigated as a function of coverage and VAF. At each coverage and VAF cutoff, agreement was evaluated on the basis of the likelihood that a mutation would be detected in common by at least two centers. Overall, a higher level of concordance was observed with increased VAF and with greater in-depth coverage (Fig. 2D). Specifically, a VAF of 10% and 50× coverage cutoff yielded a 95% likelihood that a mutation would be called in common by at least two centers (Fig. 2D). The truth data provided by the HapMap pools gave us an opportunity to evaluate cross-site data variability and reproducibility. In evaluating the VAFs derived from the overlap target regions with common coverage greater than 50×, precision was greater than 0.98 and recall was greater than 0.87 at 10% VAF at all three centers (Fig. 2E). Altogether, the clustering results, the high precision and the high recall in WES called from the HapMap cell line pools lead us to conclude that the CIMAC genomic platforms and the CIDC analysis pipelines have been adequately optimized to ensure reliability and reproducibility in data generation.
Biological differences between tumors are much greater than platform/process-specific differences on WES
To validate the robustness of the center-specific reagents, protocols, data-transferring procedures, and range of acceptance criteria, we performed additional evaluations using the FF and FFPE NSCLC samples. Of note, deamination of nucleotides causes C:G>T:A changes in FFPE tissue samples and can produce false positives during NGS (52, 53). DNA from matched FF and FFPE NSCLC tumors and PBMC in the corresponding patients were subjected to WES. Although generated at different centers using distinct protocols, the coverage, VAF calls, and nonsynonymous mutation loads were comparable across replicates, centers, and sample preparations (FF vs. FFPE; Fig. 3A and B). Mutations called from FF and FFPE were comparable with an overlap rate of approximately 85% in the 50-gene panel (Materials and Methods; Fig. 3C). Furthermore, nucleotide changes shared similar distributions between mutations derived from FF and FFPE tissues (Fig. 3D), suggesting that the deamination effect was not a dominant bias in mutation call from the FFPE specimens. Of note, the similar mutational signature patterns between FF and FFPE were consistent with observations previously detected in large cohorts of whole-genome sequencing data (1, 54). These data together suggested that the mutations obtained from FFPE tissues collected in clinical settings are comparable with FF samples.
Evaluation and harmonization of somatic mutations identified in NSCLC samples. A, The read depth and VAF of the somatic mutations generated from the three centers, separated by FF and FFPE samples. B, The nonsynonymous mutation loads of NSCLC samples across the three centers, separated by FF and FFPE samples. C, Agreement between somatic mutations derived from FF and FFPE samples. The bars are the proportions of FF- and FFPE-unique mutations, and their overlaps. D, Distributions of the nucleotide changes for the FF and FFPE mutations, and their overlaps. E, Agreement assessment between mutations generated across replicates, centers, and sample processing (FF and FFPE). Clustering was performed upon the pairwise mutation call agreements reflected by ISA scores. F, Mutation agreement between the three centers was evaluated as a function of coverage and VAF. Red, mutations identified in one center; Gray, mutations identified in common by two centers; Blue, mutations identified by all three centers. Numbers indicate the percentage of mutations called by at least two centers at the corresponding cutoffs.
Evaluation and harmonization of somatic mutations identified in NSCLC samples. A, The read depth and VAF of the somatic mutations generated from the three centers, separated by FF and FFPE samples. B, The nonsynonymous mutation loads of NSCLC samples across the three centers, separated by FF and FFPE samples. C, Agreement between somatic mutations derived from FF and FFPE samples. The bars are the proportions of FF- and FFPE-unique mutations, and their overlaps. D, Distributions of the nucleotide changes for the FF and FFPE mutations, and their overlaps. E, Agreement assessment between mutations generated across replicates, centers, and sample processing (FF and FFPE). Clustering was performed upon the pairwise mutation call agreements reflected by ISA scores. F, Mutation agreement between the three centers was evaluated as a function of coverage and VAF. Red, mutations identified in one center; Gray, mutations identified in common by two centers; Blue, mutations identified by all three centers. Numbers indicate the percentage of mutations called by at least two centers at the corresponding cutoffs.
Across multiple studies conducted over the years, no single best strategy has been identified for somatic mutation calling from cancer specimens (55, 56). VAF, sequencing depth, and sequencing technique are multiple factors that determine whether a variant can be detected (55). We found that although the NSCLC specimens were sequenced and processed at different centers as technical replicates, the mutations clustered by patient with concordance levels above 0.67 among all samples (Fig. 3E; Supplementary Fig. S4). Of note, the ISAs we reported were based on the lowest ISA among samples generated in different replicates, centers, and sample preparation (FF vs. FFPE) for the same tumor. The vast majority of ISAs (96.1%) we obtained is greater than 0.7, with a median of 0.81, which outperformed previously reported concordances between FF and FFPE samples (1, 49). To evaluate the key determinants of harmonization, we further evaluated agreement in mutation burden as a function of coverage and VAF. Overall, filtering by increasing VAF and coverage yielded fewer mutations and higher accuracy (Fig. 3F). For FF and FFPE, a cutoff at 10% VAF and 50× coverage resulted in a 93% and 87% likelihood for a mutation to be called in common by at least two centers (Fig. 3F). The high concordance level indicated to us that technical differences between replicates, centers, and sample preparation (FF vs. FFPE) were much smaller than the biological differences across tumors.
RNA-seq data generated from NSCLC are comparable between replicates, sample preparations, and centers
For the RNA-seq data generated across centers, concordance was assessed by correlating the gene expression among replicates, by sample preparation (FF, FFPE) and per sequencing center. Supplementary Figure S5 shows the clustering result of the 150 samples based on Spearman correlation coefficient of log-transformed expression data. Multiple medTIN cutoffs were evaluated to determine the minimum cutoff at which the RNA samples could harmonize. Of note, medTIN score is a postsequencing QC metric to measure RNA integrity and RNA degradation (57). At a medTIN cutoff of >50, the resultant 91 samples clustered by patient, with a minimum Spearman correlation above 0.94 among samples from the same tumor, thereby achieving a concordance level consistent with the prespecified TCGA-based acceptance criteria (Materials and Methods; Fig. 4A). In addition, a cutoff of medTIN >30 was tested, and the resultant 134 samples achieved concordance levels above 0.90. Although the concordance level did not satisfy the prespecified criterion (0.94, based on TCGA NSCLC data), samples still clustered by patient regardless of replicates, centers, or sample preparations (FFPE vs. FF; Fig. 4B). Together, these data suggested that medTIN > 50 could be used as a postsequencing QC criterion to ensure all the samples cluster by tumors and to meet the prespecified concordance cutoff, while medTIN > 30 could be used to ensure that all samples cluster by tumors.
RNA-seq harmonization and QC matrix evaluation. A and B, Hierarchical clustering based on Spearman correlation coefficient of log2-transcripts per kilobase million (TPM) values for FF and FFPE samples with medTIN > 50 (91 samples, A), along with QC metrics. B, With medTIN > 30 (DV200 > 24%; 134 samples). C, Scatterplot of medTIN and DV200 scores for the 150 sequenced samples. Outliers (yellow) are the samples that did not cluster by patients. D, Scatterplot of medTIN scores and exome mapping rate for the 150 sequenced samples. Outliers are the samples that did not cluster by patient. Spearman correlation was performed to calculate the association between the two QC metrics in C and D.
RNA-seq harmonization and QC matrix evaluation. A and B, Hierarchical clustering based on Spearman correlation coefficient of log2-transcripts per kilobase million (TPM) values for FF and FFPE samples with medTIN > 50 (91 samples, A), along with QC metrics. B, With medTIN > 30 (DV200 > 24%; 134 samples). C, Scatterplot of medTIN and DV200 scores for the 150 sequenced samples. Outliers (yellow) are the samples that did not cluster by patients. D, Scatterplot of medTIN scores and exome mapping rate for the 150 sequenced samples. Outliers are the samples that did not cluster by patient. Spearman correlation was performed to calculate the association between the two QC metrics in C and D.
To determine a set of criteria for the generation of reliable RNA-seq data, we further evaluated other QC metrics, including DV200 and exon mapping rate. DV200, a presequencing quality metric, was highly associated with medTIN score (Spearman correlation = 0.63; Fig. 4C). While the manufacturer has recommended that samples with DV200 > 30% usually yield better RNA-seq data quality, our data showed concordance among samples even at DV200 > 24% (Fig. 4B). Using DV200 > 24% as cutoff, we could rule out the samples that did not cluster by tumor ID (Fig. 4C). Exon mapping rate (EMR), another commonly used QC metric to quantify the percentage of reads mapping to exon regions, was also associated with medTIN score (P = 0.04). However, we did not find EMR as a useful QC metric for ruling out outliers (Fig. 4D). These analyses together showed that DV200 of 24% is an effective presequencing QC metric for the generation of RNA-seq data.
We next performed simulation studies to investigate whether the read number or gene number is a key determinant for successful harmonization. We downsampled the data from FFPE samples of Center C from 113M paired-end reads to 50M. Using the expression profiles derived from the downsampled reads, we could cluster the samples by tumor with concordance levels above 0.97 (Supplementary Fig. S6). These high correlations suggested to us that 50M paired-end reads was an adequate read number to yield concordance. The effects of gene number on the harmonization were evaluated as well. The top 3,000 most variable genes were selected on the basis of variance distribution in the log-transformed expression profile. When the 3,000 most variable genes were used for clustering, the lower bound correlation level decreased to 0.88 (Supplementary Fig. S7). Despite the decreased concordance result, the samples still clustered by tumor ID. In addition, the clustering result was better than the baseline derived from TCGA NSCLC samples using the 3,000 most variable genes (0.88 vs. 0.85; Supplementary Fig. S3).
QC metrics were evaluated to determine optimal cutoffs to generate acceptable secondary immunogenomic characteristics from RNA-seq data, including HLA typing, immune cell infiltration, and immune repertoire. Attempts were made to evaluate the sample data quality across different medTIN scores. All samples from the same tumor were inferred to have identical HLA type when a medTIN cutoff of 50 was used. In contrast, seven samples (8.3%) were noted to have off-target HLA typing when a medTIN cutoff of 30 was used. Agreement between FF and FFPE, measured by the Jaccard index, was similar whether a medTIN cutoff of 50 or 30 was used (1.0 and 0.99, respectively). Overall, FF and FFPE samples clustered by tumor in both medTIN 50 and 30 cut-off groups (Fig. 5A and B), and matched FF and FFPE samples per tumor shared similar immune infiltration patterns (Fig. 5C and D). The average Spearman correlations between the FF and FFPE samples were 0.88 and 0.87 for the medTIN 50 and 30 cut-off groups, respectively. When we examined the immune repertoires estimated from the RNA-seq data using TRUST4 (22), in which the inferred CDR3 clonotypes included TCRA, TCRB, TCRD, IGH, IGK, and IGL, the immune repertoires inferred from the matched FF and FFPE were highly concordant among samples from the same tumor regardless of medTIN cutoff (50 or 30; Fig. 5E and F). The immune repertoire clonality correlation between FF and FFPE was slightly higher in the medTIN 50 cut-off group, compared with the medTIN 30 group (Rho 0.58 vs. 0.55; Fig. 5G and H). Overall, when the cutoff of medTIN was above 30, immune cell infiltration and repertoires mostly clustered on a per patient basis; HLA typing estimation clearly distinguished between tumors. Together, these results suggested that the quality of secondary immunogenomic characteristics were acceptable when inferred from RNA-seq data with medTIN above 30 (or, equivalently, DV200 > 24%).
HLA typing, immune cell infiltration, and immune repertoire inferred from RNA-seq of FF and FFPE tumors robustly cluster together. A and B, Clustering assessment between FF and FFPE samples using HLA typing inferred from RNA-seq using the tool Optitype (45). A, medTIN >50 as a cutoff for sample selection, or medTIN >30 as cutoff (B). C and D, Clustering analysis between FF and FFPE using immune infiltration as features. The infiltration was estimated by immunedeconv (43), which integrates six state-of-the-art estimation algorithms, including TIMER (17), xCell (18), MCPCounter (19), CIBERSORT (20), EPIC (21), and quanTIseq (44). As cutoffs, medTIN >50 (C) and medTIN >30 (D) were assessed. E and F, Clustering results between FF and FFPE using immune repertoires as features. Immune repertoires were estimated by TRUST4, which is an updated version of the original TRUST (22) to infer CDR3 clonal types for TCRA, TCRB, TCRD, IGH, IGK, and IGL in tumor immune repertoires, using medTIN >50 (E) or medTIN >30 (F) as cutoffs. G and H, Scatterplot of immune repertoire clonality inferred in FF and FFPE tumors (using Spearman correlations) using medTIN >50 (G) or medTIN >30 (H) as cutoffs.
HLA typing, immune cell infiltration, and immune repertoire inferred from RNA-seq of FF and FFPE tumors robustly cluster together. A and B, Clustering assessment between FF and FFPE samples using HLA typing inferred from RNA-seq using the tool Optitype (45). A, medTIN >50 as a cutoff for sample selection, or medTIN >30 as cutoff (B). C and D, Clustering analysis between FF and FFPE using immune infiltration as features. The infiltration was estimated by immunedeconv (43), which integrates six state-of-the-art estimation algorithms, including TIMER (17), xCell (18), MCPCounter (19), CIBERSORT (20), EPIC (21), and quanTIseq (44). As cutoffs, medTIN >50 (C) and medTIN >30 (D) were assessed. E and F, Clustering results between FF and FFPE using immune repertoires as features. Immune repertoires were estimated by TRUST4, which is an updated version of the original TRUST (22) to infer CDR3 clonal types for TCRA, TCRB, TCRD, IGH, IGK, and IGL in tumor immune repertoires, using medTIN >50 (E) or medTIN >30 (F) as cutoffs. G and H, Scatterplot of immune repertoire clonality inferred in FF and FFPE tumors (using Spearman correlations) using medTIN >50 (G) or medTIN >30 (H) as cutoffs.
The NanoString platform was evaluated for its potential to serve as an alternative approach for transcriptome profiling in cases of low-quality RNA samples. RNA extracted and processed from 7 patients with NSCLC of squamous cell carcinoma histology were subjected to the NanoString PanCancer Immune Profiling Panel for transcriptomic quantification (35) at Centers B and C. DV200 cutoffs were evaluated by hierarchical clustering to determine the minimum cutoff at which the NanoString-generated data could harmonize between different sample processing (macrodissected and non-macrodissected) and centers. Overall, the majority of samples with DV200 below 24% failed to cluster by patient (15/20 failed, 75%; Supplementary Fig. S8), a few samples with DV200 above 24% failed to cluster as well (3/44 failed, 6.81%). In summary, while NanoString gene expression data can be generated even from samples with very low DV200 that failed to produce RNA-seq libraries, our hierarchical clustering analysis indicates that the quality of such NanoString data originating from samples with very low DV200 may not be reliable.
Integrated DNA and RNA analyses revealed important immunogenomic features in NSCLC
Transcriptomics is a critical adjunct to genomics when interrogating patient tumors for actionable alterations (58). We therefore explored the potential of utilizing matched WES and RNA-seq to derive reliable cancer immunogenomic characteristics across centers and sample preparations (FF vs. FFPE). Analysis of the somatic mutations among the samples highlighted the consistent detection of multiple known recurrently mutated drivers of NSCLC across replicates, centers, and sample preparation (Fig. 6A; Materials and Methods). The majority (48/51, 94%) of mutated cancer driver genes were confirmed to have high expression levels (Fig. 6A, left). TP53 was the most frequent mutated cancer driver gene (6/7 samples), consistent with TCGA squamous cell lung carcinomas data (48). These results suggested that the CIMAC-CIDC analysis pipeline can reliably identify cancer driver mutations across replicates, centers, and preparations.
Integrated DNA and RNA analyses in FF and FFPE NSCLC tumors. A, Comutation plot using WES and RNA-seq of the NSCLC tumors. The average log TPM expression is shown on the left panel. Mutations were called by the TnSnv algorithm from the Sentieon pipeline. Average log expression was calculated from SALMON counts. B, HLA types were estimated for both WES and RNA-seq data using the tool Optitype (45). Jaccard index was calculated per patient using RNA-seq (y-axis) and WES (x-axis) data. C, Comparison of mutation and neoantigen load per patient specimen between FF and FFPE. Neoantigens were inferred by pVAC-Seq (24). Mutation load is the total number of nonsynonymous mutations.
Integrated DNA and RNA analyses in FF and FFPE NSCLC tumors. A, Comutation plot using WES and RNA-seq of the NSCLC tumors. The average log TPM expression is shown on the left panel. Mutations were called by the TnSnv algorithm from the Sentieon pipeline. Average log expression was calculated from SALMON counts. B, HLA types were estimated for both WES and RNA-seq data using the tool Optitype (45). Jaccard index was calculated per patient using RNA-seq (y-axis) and WES (x-axis) data. C, Comparison of mutation and neoantigen load per patient specimen between FF and FFPE. Neoantigens were inferred by pVAC-Seq (24). Mutation load is the total number of nonsynonymous mutations.
With the matched WES and RNA-seq data available, HLA typing derived from both assays were compared. The four-digit level of accuracy for HLA typing was inferred using Optitype (45) on both assay data. Overall, HLA typing inferred from WES and RNA-seq was highly concordant for both FF and FFPE. The Jaccard index of HLA typing between the two platforms was 1.0 and 0.98 for FF and FFPE. HLA typing inferred from WES and RNA-seq clustered by tumor, suggesting the CIMAC-CIDC genomics platforms could generate reliable HLA typing regardless of sample preparations, sequencing centers, and sequencing platforms (Fig. 6B). The number of neoantigen calls, using the pVAC-Seq analysis pipeline (24), was performed on both FF and FFPE samples, and was highly associated with mutation burden in both FF (Rho = 0.71) and FFPE (Rho = 0.66) across centers.
Discussion
The CIMAC-CIDC network undertook an effort to establish harmonized platforms for the genomic analyses of clinical specimens from immunotherapy trials including WES and RNA-seq. This study provides a roadmap for how to harmonize diverse sequencing assays that may employ different chemistries and data analysis pipelines. The cross-center concordance evaluation assessed the factors that contributed to the discrepancies and those that facilitate sample harmonization. During the harmonization process, each participating center evaluated and confirmed the validity of center-specific reagents, standards, analytic methods, protocols, and data-reporting procedures throughout assay development and implementation. The discrepancies in somatic mutation calls and expression levels were found to be acceptable between replicates, sample preparation (FF vs. FFPE), and centers. Overall, this study demonstrated the feasibility of leveraging the resources available at different facilities to achieve high throughput at an acceptable level of consistency.
In this harmonization effort, CIMAC-CIDC rigorously evaluated sequencing data generated from multiple assays, including WES, RNA-seq, and NanoString. This study affirmed multiple key determinants to achieve sequencing-assay harmonization, including (i) use of rigorously validated assays at all centers, (ii) focus on the overlap of capture regions, (iii) use of a common data analysis pipeline, and (iv) application of appropriate metrics for reporting the data, including a requirement for 50× coverage and 10% VAF for WES data, and medTIN > 30 and DV200 > 24% for RNA-seq. Studies have reported concordance level of somatic mutation calls generated from different sequencing centers and different pipelines (59, 60). In our study, we leveraged the replicated sequencing data [3 centers × 2 replicates × 2 processing (FF and FFPE)] and the HapMap data to systematically evaluate the key determinants for harmonization. In addition to evaluating somatic mutation calls, we have investigated a set of criteria to harmonize the expression profile, immune infiltration, CDR3 immune repertoire, and neoantigen calls. Together, these evaluation efforts will provide an analysis roadmap for our multisite sequencing data harmonization.
One caveat in our study is the limited number of samples evaluated, namely, two HapMap cell lines and eight NSCLC tumor samples. This study aimed to establish protocols for WES and RNA-seq library preparation and QC metrics to allow reliable and robust cross-site data generation. We have replicated the WES and RNA-seq in twelve aliquots [3 centers × 2 replicates × 2 processing (FF and FFPE)] for each of the eight NSCLC samples and each of the two mixed HapMap cell pools. The high replicate number allowed us to robustly identify factors introducing variability and to set up criteria to ensure comparable results across CIMAC sites while using a centralized analysis pipeline.
WES provides the opportunity to evaluate a spectrum of somatic alterations, whereas RNA-seq provides cell immunologic phenotypes, including tumor immune infiltration, HLA typing, and immune repertoire. Here we report a harmonization effort carried out by CIMAC-CIDC to ensure reproducible WES and RNA-seq results both between centers and sample preparation (FF vs. FFPE) that meet the minimum prespecified QC criteria. The high level of concordance found supports interpretability of datasets across CIMACs and studies and will facilitate development of a database for secondary analyses. These efforts are particularly important and relevant in an era when evidence-based precision medicine is becoming more prevalent.
Authors’ Disclosures
B. Das reports other from U.S. Government (under contract from HHS) during the conduct of the study and nonfinancial support from Illumina Inc. outside the submitted work. B. Sanchez-Espiridion reports grants from NIH during the conduct of the study and grants from NIH outside the submitted work. D.Y. Duose reports other from Chrysalis Biomed outside the submitted work. D.S. Neuberg reports grants from NIH U24 CA224331 during the conduct of the study. S. Shukla reports nonfinancial support from Bristol-Myers Squibb during the conduct of the study and other from Agenus, Agios Pharmaceuticals, Breakbio Corp, Bristol-Myers Squibb, and Lumos Pharma outside the submitted work. J. Zhang reports grants from NIH during the conduct of the study, as well as grants from Merck; grants and personal fees from Johnson & Johnson; personal fees and other from Bristol Myers Squibb; and personal fees from Bristol Myers Squibb, AstraZeneca, Geneplus, OrigMed, Innovent, and Roche outside the submitted work. I.I. Wistuba reports grants and personal fees from Genentech, Bayer, Bristol Myers Squibb, AstraZeneca/Medimmune, Pfizer, HTG Molecular, and Merck; personal fees from Roche, Asuragen, Glaxo Smith Klane, Guardant Health, Oncocyte, Flame Inc, and MSD; and grants from Oncoplex, Adaptive, Adaptimmune, EMD Serono, Takeda, Amgen, Karus, Johnson & Johnson, Iovance, 4D, Novartis, and Akoya outside the submitted work. X.S. Liu reports grants from NIH and Foundation for the NIH during the conduct of the study and is a cofounder, board member, and consultant of GV20 Oncotherapy and its subsidiaries; is a SAB of 3DMedCare; is a consultant for Genentech; is a stockholder of BMY, TMO, WBA, ABT, ABBV, and JNJ; and receives research funding from Takeda and Sanofi. C.J. Wu reports other from BioNTech outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
Z. Zeng: Data curation, formal analysis, methodology, writing–original draft. J. Fu: Formal analysis, visualization, writing–original draft. C. Cibulskis: Resources, data curation, supervision, project administration. A. Jhaveri: Data curation, validation. C. Gumbs: Conceptualization, resources, data curation, supervision, project administration, writing–review and editing. B. Das: Conceptualization, resources, methodology, writing–review and editing. B. Sanchez-Espiridion: Data curation, investigation, writing–review and editing. S. Janssens: Project administration. L. Taing: Data curation, software, visualization, methodology. J. Wang: Data curation, software. J. Lindsay: Data curation, supervision. T. Vilimas: Supervision, investigation. J. Zhang: Data curation, supervision, writing–review and editing. C. Tokheim: Data curation, formal analysis, visualization, writing–review and editing. A. Sahu: Data curation, investigation, visualization, writing–review and editing. P. Jiang: Data curation, formal analysis, visualization. C. Yan: Data curation, formal analysis, writing–review and editing. D.Y. Duose: Data curation, formal analysis, writing–review and editing. E. Cerami: Conceptualization, resources, supervision, writing–review and editing. L. Chen: Data curation, investigation. D. Cohen: Data curation, formal analysis, visualization. Q. Chen: Formal analysis, methodology. R. Enos: Supervision, writing–review and editing. X. Huang: Data curation, formal analysis. J.J. Lee: Supervision, methodology, writing–review and editing. Y. Liu: Data curation, formal analysis. D.S. Neuberg: Data curation, methodology. C. Nguyen: Data curation, formal analysis. C. Patterson: Resources, data curation, supervision, writing–review and editing. S. Sarkar: Data curation, formal analysis, writing–review and editing. S. Shukla: Supervision, methodology, writing–review and editing. M. Tang: Data curation, formal analysis. J. Tsuji: Data curation, methodology, writing–review and editing. M. Uduman: Formal analysis, writing–review and editing. X. Wang: Data curation, writing–review and editing. J.L. Weirather: Formal analysis, writing–review and editing. J. Yu: Data curation, formal analysis. J. Yu: Project administration. J. Zhang: Supervision, validation, methodology, writing–review and editing. J. Zhang: Data curation, methodology, writing–review and editing. D. Meerzaman: Conceptualization, resources, data curation, supervision, funding acquisition, writing–review and editing. M. Thurin: Conceptualization, resources, supervision, writing–review and editing. A. Futreal: Conceptualization, resources, funding acquisition, writing–review and editing. C. Karlovich: Conceptualization, resources, supervision, investigation, writing–review and editing. S.B. Gabriel: Resources, data curation, formal analysis, methodology. I.I. Wistuba: Conceptualization, resources, supervision, writing–review and editing. X.S. Liu: Conceptualization, resources, supervision, funding acquisition, writing–original draft, writing–review and editing. C.J. Wu: Conceptualization, resources, supervision, funding acquisition, methodology, writing–original draft, writing–review and editing.
Acknowledgments
Scientific and financial supports for the CIMAC-CIDC Network are provided through the NCI Cooperative Agreements U24CA224319 (to the Icahn School of Medicine at Mount Sinai CIMAC), U24CA224331 (to the Dana-Farber Cancer Institute CIMAC), U24CA224285 (to the University of Texas MD Anderson Cancer Center CIMAC), U24CA224309 (to the Stanford University CIMAC), and U24CA224316 (to the CIDC at Dana-Farber Cancer Institute). This work is also supported by grants from the Lung SPORE P50CA070907, NIH CCSG Award (CA016672), MD Anderson Cancer Center Support Grant (CA016672), and NCI under contract HHSN261200800001E. Additional support is made possible through the NCI CTIMS Contract HHSN261201600002C (to the Emmes Company, LLC).
Scientific and financial supports for the PACT projects are made possible through funding support provided to the FNIH by AbbVie Inc., Amgen Inc., Boehringer-Ingelheim Pharma GmbH & Co. KG., Bristol-Myers Squibb, Celgene Corporation, Genentech Inc, Gilead, GlaxoSmithKline plc, Janssen Pharmaceutical Companies of Johnson & Johnson, Novartis Institutes for Biomedical Research, Pfizer Inc., and Sanofi.
The authors thank Anita Giobbie-Hurder, Mickey Williams, Rajesh Patidar, Chip Stewart, and Gad Getz for their valuable discussions and suggestions. They also acknowledge helpful suggestions from Helen Chen, Holden Maecker, and Sacha Gnjatic.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.