In this article (Cancer Res 2015;75:2445–56), which appeared in the June 15, 2015, issue of Cancer Research (1), the authors present translational studies supporting a causal link between the expression of the endothelial cell–expressed TGFβ family receptor ALK1 and metastatic dissemination of breast cancer. For a subset of their studies, the authors used gene expression data from patient samples in a clinical cohort designed as a nested case–control study (data presented in Table 1). In subsequent follow-up studies, the authors uncovered a potential bias in this dataset. Importantly, however, the analyses included in the article are unaffected, and the conclusions of the work are not in question.
As background, a metastatic breast cancer cohort study was designed first (2). A case–control study nested in the corresponding primary breast cancer cohort was then designed by selecting distant metastasis–free controls for each case. Tumor RNA was extracted in the same order in which the studies were designed. All RNAs were profiled on microarrays in randomized order. For quality control, RNA was also reextracted in a randomized order for randomly selected case–control sets and profiled with the rest. The potential bias in the data from the nested case–control study arises from apparent RNA extraction batch effects confounded with case–control status. Reassuringly, gene expression data for endothelial ALK1 are consistent in a substudy in which RNA was reextracted from a new tumor piece in a randomized order.
The correlation between gene expression data for original and reextracted RNA is excellent for key breast cancer genes, for example, ESR1 (r = 0.95) and ERBB2 (r = 0.96). For the primary comparison, case–control set differences (n = 40) for ACVRL1 and the ACVRL1:endothelial metagene index that the authors reported are consistent between the two extractions (Fig. 1). A case–control set difference is the value for the case minus the (average) value of the matched control(s).
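The set difference defined above, and a correlation between the two extractions, can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names `set_difference` and `pearson_r` are hypothetical, the expression values are made-up toy numbers rather than data from GSE81954, and the use of a Pearson coefficient is an assumption, as the notice does not state which correlation measure was used.

```python
from math import sqrt


def set_difference(case_value, control_values):
    """Case value minus the mean of the matched control value(s)."""
    if not control_values:
        raise ValueError("each case needs at least one matched control")
    return case_value - sum(control_values) / len(control_values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Toy (made-up) expression values for three case-control sets,
# each given as (case value, [matched control values]).
sets_original = [(8.0, [7.0, 7.5]), (6.5, [6.0]), (9.0, [8.0, 9.0])]
sets_reextracted = [(8.2, [7.1, 7.5]), (6.4, [6.0]), (9.1, [8.2, 9.0])]

diffs_original = [set_difference(c, ctrls) for c, ctrls in sets_original]
diffs_reextracted = [set_difference(c, ctrls) for c, ctrls in sets_reextracted]

# Agreement of set differences between the two extractions.
r = pearson_r(diffs_original, diffs_reextracted)
print(diffs_original, diffs_reextracted, r)
```

Because each difference is computed within a matched set, comparing the differences between extractions reflects agreement on the case-versus-control contrast rather than on absolute expression levels.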
Although the potential bias of the dataset does not affect the outcome of the current study, the authors recommend careful scrutiny of the data and inclusion of proper controls when attempting other analyses based on these data. Microarray data for the reextracted RNA have been deposited in the Gene Expression Omnibus database under accession number GSE81954.
All authors have been informed of and agree to this correction.