The Cancer Imaging Archive (TCIA) hosts publicly available deidentified medical images of cancer from over 25 body sites and over 30,000 patients. Over 400 published studies have utilized freely available TCIA images. Images and metadata are available for download through a web interface or a REST API. Here, we present TCIApathfinder, an R client for the TCIA REST API. TCIApathfinder wraps API access in user-friendly R functions that can be called interactively within an R session or easily incorporated into scripts. Functions are provided to explore the contents of the large database and to download image files. TCIApathfinder provides easy access to TCIA resources in the highly popular R programming environment. TCIApathfinder is freely available under the MIT license as a package on CRAN (https://cran.r-project.org/web/packages/TCIApathfinder/index.html) and from https://github.com/pamelarussell/TCIApathfinder.

Significance: These findings present a new tool, TCIApathfinder, the first client for The Cancer Imaging Archive (TCIA) for use in the highly popular R computing environment, that will dramatically lower the barrier of access to the valuable tools in TCIA. Cancer Res; 78(15); 4424–6. ©2018 AACR.

The Cancer Imaging Archive

The Cancer Imaging Archive (TCIA) provides deidentified clinical images of cancer for use by the research community (1). Currently, TCIA includes images from over 25 cancer types and over 30,000 patients. TCIA also hosts supporting data related to the images, as well as analysis results from the research community based on TCIA data. TCIA is contributing to research efforts toward understanding the genomic basis of cancer by providing over 20 collections of clinical images from patients whose matched tumor genomic profiles are freely available from The Cancer Genome Atlas (2).

TCIA datasets, referred to as “collections,” typically represent sets of patients sharing a common disease. Descriptions of each collection are available from the TCIA website. A variety of imaging modalities are represented, such as magnetic resonance imaging, computed tomography, and positron emission tomography. The image files in TCIA conform to the widely adopted DICOM standard (3). For each patient, one or more image studies are included. A study may include one or more image series. In turn, an image series is a stack of two-dimensional images from a single run of an instrument. Patient names and birth dates have been deidentified; patient sex and age are provided. TCIA supports reproducibility through the use of Digital Object Identifiers to refer to subsets of data.
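The nesting described above (patient, then study, then series, then individual images) can be pictured as a flat index with one row per image series. The following is only an illustrative sketch in base R: every identifier, count, and column name is invented for the example and is not taken from TCIA.

```r
# Toy index of image series: one row per series.
# All identifiers, counts, and column names are hypothetical.
series_index <- data.frame(
  patient_id = c("P001", "P001", "P001", "P002"),
  study_id   = c("ST1", "ST1", "ST2", "ST3"),
  series_id  = c("SE1", "SE2", "SE3", "SE4"),
  modality   = c("MR", "MR", "CT", "PT"),
  n_images   = c(120L, 60L, 300L, 250L),
  stringsAsFactors = FALSE
)

# Studies per patient: count the distinct study IDs within each patient.
studies_per_patient <- tapply(
  series_index$study_id, series_index$patient_id,
  function(x) length(unique(x))
)

# Images per study: sum the image counts of the series in each study.
images_per_study <- tapply(series_index$n_images, series_index$study_id, sum)

print(studies_per_patient)
print(images_per_study)
```

Keeping one row per series makes it easy to aggregate upward to studies or patients with ordinary data frame operations.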

Radiomic analysis

TCIA represents a valuable resource for the field of radiomics. Radiomics involves the conversion of digital medical images into higher-dimensional data and the subsequent mining of these data (4). Hundreds or more image features can be extracted through radiologic image analysis and associated with biological or clinical endpoints to develop diagnostic, prognostic, and predictive models (5). While this approach can be applied to many biomedical areas, oncologic applications are of particular interest.

Oncologic treatment failure is often attributed to heterogeneity within tumors at the phenotypic, physiologic, and genomic levels (6–9). A major aim of radiomics is to provide quantitative measurements of intra- and intertumoral heterogeneity, thereby individualizing treatment (10). Unlike tissue biopsy, which offers a small sample of the tumor for analysis, radiomics has the potential to evaluate the whole tumor in its native environment. Unfortunately, the process of radiomics faces multiple challenges. The need for data and the ability to share data have been cited as one of the largest hurdles to advancing the field (5, 10). A related issue is the high cost of data preprocessing and analysis, including initial segmentation of tumor volumes. Large centralized data repositories such as TCIA offer a solution. TCIA not only provides publicly available images for thousands of patients, but also hosts processed data for a subset of studies (11), potentially leading to dramatic time savings or making radiomic analysis possible for research groups with limited resources.

TCIApathfinder provides a powerful gateway to TCIA

TCIA provides programmatic access to its data through a REST API that features many endpoints and return object types (12). Programmers can use the API through any preferred method of sending HTTP requests and parsing the structured responses. Here we describe a novel R package, TCIApathfinder, which provides powerful access to the resources in TCIA without the need to understand or program against the TCIA REST API. TCIApathfinder wraps all API functionality in clearly documented, user-friendly R functions, empowering users to quickly and interactively explore the available resources in TCIA without the need to construct HTTP requests or parse responses. Information is returned in native R data formats such as lists and data frames that are familiar to casual R users, as opposed to JSON and other structured data formats that are returned by the API.

A typical use case of TCIApathfinder is to download all available images for a specific cancer type and imaging modality. TCIApathfinder allows users to interactively explore the available data using these and other filters, save all metadata to R data structures, and download the images to their local machine. Other use cases involve exploring the available data in TCIA. With one or a few commands, users can quickly list and slice the available data along many dimensions. Such analysis would require a substantial amount of programming against the API itself; the interaction with the API is abstracted away from users of TCIApathfinder.
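The typical use case above might be scripted along the following lines. This is a hedged sketch rather than the package's documented workflow: `download_series` is a hypothetical helper, and the argument names (`collection`, `modality`, `series_instance_uid`, `out_dir`, `out_file_name`), the `series` element of the result, and its `series_instance_uid` column are assumptions to check against `?get_series_info` and `?save_image_series`.

```r
# Hypothetical helper: download every image series in one collection
# that matches one imaging modality. The two TCIApathfinder functions are
# passed in as defaults so they can be replaced by stubs for a dry run.
download_series <- function(collection, modality, out_dir,
                            list_series = TCIApathfinder::get_series_info,
                            save_series = TCIApathfinder::save_image_series) {
  # Assumed: the result has a data frame element named `series`
  # with a `series_instance_uid` column.
  info <- list_series(collection = collection, modality = modality)
  uids <- unique(as.character(info$series$series_instance_uid))

  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  for (uid in uids) {
    # Each series is saved as one zip file named after its series ID.
    save_series(series_instance_uid = uid,
                out_dir = out_dir,
                out_file_name = paste0(uid, ".zip"))
  }
  invisible(uids)
}

# Example call (needs network access and an API key, and may download a
# large volume of data):
# download_series("TCGA-BRCA", "MR", "tcia_images")
```

Passing the listing and saving functions as arguments is a design choice for this sketch only; it lets the loop be checked with stand-in functions before any real download is started.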

Description of TCIApathfinder

TCIApathfinder is hosted on the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/TCIApathfinder/index.html, and the development version can be obtained from https://github.com/pamelarussell/TCIApathfinder. Package documentation is available as a PDF manual from CRAN, from within an R session using R's documentation system, or on GitHub. For the package to function correctly, an API key must be obtained from TCIA.
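A minimal setup sketch follows. It assumes that the package reads the key from an environment variable named `TCIA_API_KEY` (for example, set in `~/.Renviron`); that variable name and the helper `api_key_is_set` are assumptions of this example and should be checked against the package documentation.

```r
# Install the released version from CRAN (run once, then remove the comment):
# install.packages("TCIApathfinder")

# Hypothetical helper: report whether an API key is visible to this session.
# Assumption: the key is stored in the TCIA_API_KEY environment variable.
api_key_is_set <- function(var = "TCIA_API_KEY") {
  nzchar(Sys.getenv(var))
}

if (!api_key_is_set()) {
  message("No TCIA API key found. Add a line such as ",
          "TCIA_API_KEY=<your key> to ~/.Renviron and restart R.")
}
```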

In TCIApathfinder, function calls are used to explore the available data in TCIA and to download image files to the local machine. Two functions download image files from TCIA. The remaining functions in the package are used to explore the available data in TCIA. Each exploratory function returns an object containing simplified summarized data, a parsed JSON response, and the raw API response. Details on all available functions are provided in Table 1.
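The three-part return object can be inspected as in the sketch below. The helper `describe_result` is hypothetical and makes no assumption about element names; the element name `collection_names` used in the live call is an assumption and can be confirmed with `names()` on the returned object.

```r
# Hypothetical helper: summarise the top-level elements of a result object
# as a named character vector of their classes.
describe_result <- function(result) {
  vapply(result, function(x) class(x)[1], character(1))
}

# Live call, skipped in non-interactive runs. Requires the TCIApathfinder
# package, network access, and an API key.
if (interactive() && requireNamespace("TCIApathfinder", quietly = TRUE)) {
  collections <- TCIApathfinder::get_collection_names()
  print(describe_result(collections))
  # Assumed name of the simplified summary element; check names(collections).
  print(head(collections$collection_names))
}
```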

Table 1. Available functions in TCIApathfinder

| Function name | Description | Available filters: required (R) or optional (O) |
| --- | --- | --- |
| get_collection_names | Returns the names of all TCIA collections | None |
| get_modality_names | Returns the names of available imaging modalities | TCIA collection (O); body part (O) |
| get_body_part_names | Returns the names of available body parts | TCIA collection (O); modality (O) |
| get_manufacturer_names | Returns the names of available instrument manufacturers | TCIA collection (O); modality (O); body part (O) |
| get_patient_info | Returns a table of available information for each patient | TCIA collection (O) |
| get_patients_by_modality | Returns patient IDs for a given TCIA collection and imaging modality | TCIA collection (R); modality (R) |
| get_studies_in_collection | Returns a table of patient image study information for a given TCIA collection | TCIA collection (R); patient ID (O) |
| get_patient_studies | Returns a table of patient image study information and available patient demographic information | TCIA collection (O); patient ID (O); study ID (O) |
| get_series_info | Returns a table of available information for each image series | TCIA collection (O); patient ID (O); study ID (O); series ID (O); modality (O); body part (O); manufacturer model name (O); manufacturer (O) |
| get_series_size | Returns the number of images in a series | Series ID (R) |
| get_sop_instance_uids | Returns individual DICOM image IDs for a given image series | Series ID (R) |
| get_new_patients_in_collection | Returns IDs of patients that have been added to a given collection since a given date | TCIA collection (R); date (R) |
| get_new_studies_in_collection | Returns a table of image studies that have been added to a given collection since a given date | TCIA collection (R); date (R); patient ID (O) |
| save_single_image | Saves a single DICOM image file to the local machine | Series ID (R); image ID (R); target directory (R); target file name (O) |
| save_image_series | Saves a series of DICOM images to the local machine as a zip file | Series ID (R); target directory (R); target file name (O) |

TCIApathfinder can be loaded into an active R session and used directly from the R console, or incorporated into R scripts. See Supplementary Video S1 for a demonstration of package installation and usage.

TCIApathfinder makes the extensive resources in TCIA easily available and accessible in the highly popular R programming environment. Simple functions allow the full collection to be easily explored before patients are selected for analysis. Images and supporting data can be imported directly into R scripts for further analysis using packages such as oro.dicom (13) and other image analysis packages, or simply saved to the local machine for any type of downstream analysis. For patients also included in The Cancer Genome Atlas, tumor genomic data can be imported into R via the matched patient ID using the TCGAbiolinks package (14) and analyzed using the extensive tools in Bioconductor (15). Vignettes included with the package demonstrate TCIApathfinder usage as well as downstream radiomic analysis with other packages. TCIApathfinder will significantly lower the barrier for researchers to leverage the valuable resources in TCIA.
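As a hedged illustration of the hand-off to oro.dicom, the sketch below unzips a previously downloaded series and reads it with `oro.dicom::readDICOM`. The helper `read_series_zip` and the file name `series.zip` are hypothetical; the sketch assumes the zip file contains plain DICOM files.

```r
# Hypothetical helper: extract a downloaded series archive and read the
# DICOM files it contains. Requires the oro.dicom package.
read_series_zip <- function(zip_path, exdir = tempfile("tcia_series_")) {
  stopifnot(file.exists(zip_path))
  # Extract into a fresh directory, which unzip() creates if needed.
  utils::unzip(zip_path, exdir = exdir)
  # readDICOM() returns a list with headers (hdr) and pixel data (img).
  oro.dicom::readDICOM(exdir, verbose = FALSE)
}

# Example use, skipped unless run interactively with the file present.
if (interactive() && file.exists("series.zip") &&
    requireNamespace("oro.dicom", quietly = TRUE)) {
  dcm <- read_series_zip("series.zip")  # hypothetical file name
  print(length(dcm$img))     # number of images read
  print(dim(dcm$img[[1]]))   # pixel dimensions of the first image
}
```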

No potential conflicts of interest were disclosed.

We thank Bernard Jones and Julio Carballido-Gamio for valuable conversations about medical image analysis and the DICOM standard. This work, including financial support for D. Ghosh and P. Russell, was supported by the Grohne-Stapp Endowed Chair for Cancer Research (University of Colorado Cancer Center).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging 2013;26:1045–57.
2. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 2013;45:1113–20.
3. Mustra M, Delac K, Grgic M. Overview of the DICOM standard. Proceedings of the 50th International Symposium ELMAR [Online]; 2008. Available from: http://ieeexplore.ieee.org/abstract/document/4747434/?reload=true.
4. Kumar V, Gu Y, Basu S, Berglund A, Eschrich SA, Schabath MB, et al. Radiomics: the process and the challenges. Magn Reson Imaging 2012;30:1234–48.
5. Avanzo M, Stancanello J, El Naqa I. Beyond imaging: the promise of radiomics. Phys Med 2017;38:122–39.
6. Gerlinger M, Rowan AJ, Horswell S, Math M, Larkin J, Endesfelder D, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 2012;366:883–92.
7. Sequist LV, Waltman BA, Dias-Santagata D, Digumarthy S, Turke AB, Fidias P, et al. Genotypic and histological evolution of lung cancers acquiring resistance to EGFR inhibitors. Sci Transl Med 2011;3:75ra26.
8. Sottoriva A, Spiteri I, Piccirillo SGM, Touloumis A, Collins VP, Marioni JC, et al. Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc Natl Acad Sci U S A 2013;110:4009–14.
9. Yachida S, Jones S, Bozic I, Antal T, Leary R, Fu B, et al. Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature 2010;467:1114–7.
10. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology 2016;278:563–77.
11. Cancer Imaging Archive. TCIA Analysis Results – TCIA DOIs – Cancer Imaging Archive Wiki [Online] [cited 2018 Apr 26]. Available from: https://wiki.cancerimagingarchive.net/display/DOI/TCIA+Analysis+Results.
12. Cancer Imaging Archive. TCIA Programmatic Interface (REST API) Usage Guide – The Cancer Imaging Archive (TCIA) Public Access – Cancer Imaging Archive Wiki [Online] [cited 2018 Apr 26]. Available from: https://wiki.cancerimagingarchive.net/display/Public/TCIA+Programmatic+Interface+%28REST+API%29+Usage+Guide.
13. Whitcher B, Schmid V, Thorton A. Working with the DICOM and NIfTI Data Standards in R. J Stat Softw 2011;44:1–29.
14. Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, et al. TCGAbiolinks: an R/bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res 2016;44:e71.
15. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004;5:R80.