Since June 2019, under the umbrella of the national health insurance system, Japan has started cancer genomic medicine (CGM) with comprehensive genomic profiling (CGP) tests. The Ministry of Health, Labour and Welfare (MHLW) of Japan constructed a network of CGM hospitals (a total of 233 institutes as of July 1, 2022) and established the Center for Cancer Genomics and Advanced Therapeutics (C-CAT), the national datacenter for CGM. Clinical information and genomic data from the CGP tests are securely transferred to C-CAT, which then generates “C-CAT Findings” reports containing information of clinical annotation and matched clinical trials based on the CGP data. As of June 30, 2022, a total of 36,340 datapoints of clinical/genomic information are aggregated in C-CAT, and the number is expected to increase swiftly. The data are now open for sharing with not only the CGM hospitals but also other academic institutions and industries.
INTRODUCTION
Cancer genomic medicine (CGM), which optimizes therapeutic interventions based on tumor genome profiling, has demonstrated clinical benefit for a wide range of cancer subtypes (1–3). Large-scale, multicenter (sometimes, multicountry) initiatives for CGM have already attracted great attention from both academia and pharmaceutical companies (4–6). Tens of thousands of patients with cancer have already undergone tumor profiling through such initiatives, the result of which is utilized for the recruitment of patients into biomarker-matched clinical trials and/or for the stratification of the patients.
In Japan, more than 1 million people were estimated to be newly diagnosed with cancer in 2021, and approximately 380,000 individuals died of cancer in the same year (https://ganjoho.jp/public/qa_links/report/statistics/2022_en.html). The mortality rate for cancer per 100,000 population in Japan has been steadily increasing, surpassing that for other disorders since 1981. Cancer thus remains one of the major burdens for health care (and economics) in Japan, similar to other countries.
The national health insurance system in Japan covers almost the entire Japanese population. Therefore, if Japan starts CGM under the umbrella of this national insurance system, it has to construct an infrastructure to accommodate at least tens of thousands of patients with cancer. To actualize this national platform, the Ministry of Health, Labour and Welfare (MHLW) of Japan assembled an Expert Meeting for the Cancer Genomic Medicine Promotion Consortium (EMCGMPC) in the spring of 2017. After careful discussion about the ideal CGM platform, the meeting proposed two important steps for this national system. First, the MHLW should designate qualified hospitals for CGM that can hold multidisciplinary tumor boards, called expert panels, to discuss optimal therapeutic interventions based on comprehensive genomic profiling (CGP) data, provide suitable genetic counseling to patients, and safely conduct clinical trials with targeted drugs for matched patients. The number of such hospitals should increase gradually. Second, Japan should establish a central datacenter, the Center for Cancer Genomics and Advanced Therapeutics (C-CAT), to collect both genomic and clinical information of the patients who undergo CGP testing. Furthermore, the datacenter should facilitate the wide utilization of its aggregated data with other hospitals, academia, and industries (Fig. 1A).
Figure 1. The Japanese CGM platform. A, C-CAT receives both clinical information and genomic data of the patients who undergo CGP testing from the CGM hospitals and testing laboratories, respectively, upon the consent of the patients. C-CAT also generates C-CAT Findings reports and sends them back to the hospitals. CKDB, cancer knowledge database. B, The total number and the monthly number of patients’ data sent to C-CAT are shown in a line graph and a bar graph, respectively. The y-axis denotes the number of patients (the monthly number on the left and the total one on the right), and the x-axis denotes the period (from June 2019 to March 2022). The timing of reimbursement for FoundationOne CDx (F1CDx), OncoGuide NCC Oncopanel (NOP), and FoundationOne Liquid CDx (F1LCDx) is also indicated.
Based on the EMCGMPC report (in Japanese, https://www.mhlw.go.jp/file/05-Shingikai-10901000-Kenkoukyoku-Soumuka/0000169236.pdf), a total of 111 hospitals were qualified to be CGM institutes in March 2018, and the number of such hospitals increased to 233 as of July 1, 2022. Requirements for CGM hospitals were described previously (7). CGM officially started in June 2019 in Japan with two approved CGP tests: OncoGuide NCC Oncopanel and FoundationOne CDx (7, 8). C-CAT has collected genomic and clinical data of 36,340 patients with cancer as of June 30, 2022 (Fig. 1B), and the sharing systems for such aggregated data have already been started.
RESULTS
The CGM Network between Hospitals and C-CAT
In the Japanese CGM platform, participating hospitals have several layers of connection networks to/from C-CAT. Each CGM hospital is equipped with “C-CAT Cancer Genomic Portal” computers through which the CGM hospitals send clinical information to the C-CAT repository database and receive “C-CAT Findings,” a report for clinical annotation and, if present, biomarker-matched clinical trials for every mutation found in a given patient (Fig. 1A; Supplementary Fig. S1A). Some hospitals have C-CAT–specific pages in their electronic medical record (EMR) system to directly send patients’ information through the EMR system. The Liaison Council of the Cancer Genomic Medicine Hospitals specified the clinical information to be aggregated to C-CAT, including patient background, medical history, best overall response, and severe adverse events.
To generate C-CAT Findings reports, a cancer knowledge database (CKDB) for clinical annotation and related drugs/clinical trials was constructed and is constantly updated within C-CAT. Certified laboratories for CGP testing also electronically send sequence data (such as FASTQ/BAM files and mutation lists) to C-CAT, mainly through a secure direct network. As shown below, several systems to share/utilize C-CAT data are now open to CGM hospitals and other parties.
Importantly, the MHLW indicated that data registration to C-CAT is a part of medical care under national health insurance (but only upon patient consent), and thus, the amount of registered clinical/genomic data is expected to increase continuously.
The CAncer genomic Test Standardized Format: A Unified File Format to Summarize Genomic Aberrations
From the CGP testing companies (and through CGM hospitals for some tests), C-CAT receives both raw sequencing data (FASTQ/BAM) and a mutation list for every patient, the latter of which is then used to generate a C-CAT Findings report. Because each testing company uses its own file format to designate mutations (and because such companies often make minor changes in their formats), C-CAT requests testing companies to send mutation lists in a standardized format that we call the CAncer genomic Test Standardized (CATS) format (Supplementary Fig. S1B; also see the latest version at https://www.ncc.go.jp/en/c_cat/section/070/index.html).
The CATS format is a structured JSON file optimized for listing various genomic alterations detected in CGP tests in a plain way and is made to be compatible with other standard file formats. The CATS format does not use its own ontologies or symbol notations, but rather relies on well-defined public resources such as HGVS (9) and VCF4.3 (10). The format addresses single-nucleotide variants (SNV), insertions/deletions (indel), copy-number alterations (CNA), rearrangements (in DNA and RNA), more complex composite markers, RNA expression, and virus data, as well as other biomarkers such as tumor mutation burden (TMB) and microsatellite instability (MSI), in just one file (an example of the CATS format is provided as a Supplementary File). The CATS format is built to efficiently annotate biomarkers with knowledge databases such as the CKDB of C-CAT (Supplementary Fig. S1C) and OncoKB (11), but does not include clinical data or some information of testing companies. The format also handles “meta information,” such as quality control data, sample information (e.g., tissue/liquid and paired/nonpaired), and comments to appear in testing annotation documents.
The CATS format adopts “term mapping”; for example, a map function is set where the term “gain” used by a testing company is mapped to “amplification” in the CATS format. Therefore, even if different terms that represent copy-number amplification are used by testing companies, the same program is applicable to annotate amplification as long as such different terms are mapped to the same “amplification” in the CATS format (Supplementary Fig. S1D).
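The term-mapping idea can be sketched in a few lines of Python. This is a minimal illustration, not the CATS implementation: only the "gain" to "amplification" mapping comes from the text, and the other vendor terms, the `TERM_MAP` table, and the function names are hypothetical.

```python
# Hypothetical term map. Only "gain" -> "amplification" is taken from the text;
# the remaining entries are illustrative assumptions, not the real CATS vocabulary.
TERM_MAP = {
    "gain": "amplification",
    "amp": "amplification",
    "amplification": "amplification",
    "loss": "deletion",
    "deletion": "deletion",
}


def normalize_term(vendor_term: str) -> str:
    """Map a vendor-specific copy-number term to a single standardized term."""
    key = vendor_term.strip().lower()
    if key not in TERM_MAP:
        raise KeyError(f"unmapped vendor term: {vendor_term!r}")
    return TERM_MAP[key]


def is_amplification(vendor_term: str) -> bool:
    """One downstream check works for every vendor once terms are normalized."""
    return normalize_term(vendor_term) == "amplification"


# Toy calls from two imaginary vendors that use different words for the same event.
calls = [("ERBB2", "gain"), ("MET", "Amplification"), ("CDKN2A", "loss")]
amplified = [gene for gene, term in calls if is_amplification(term)]
print(amplified)  # ['ERBB2', 'MET']
```

Because the annotation step only ever sees the normalized term, supporting a new vendor vocabulary would mean adding entries to the map rather than changing the annotation program.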
In addition, we are developing catstools (CATS|TOOLS), a program that manipulates CATS-formatted files. For example, catstools can convert multiple VCF files for SNVs/indels/CNAs/rearrangements into a single CATS file, and vice versa, with a single command. The catstools program can also annotate alterations, calculate statistics, and draw circos graphs and onco-plots for CATS files.
C-CAT Findings: A Report to Implement CGP Data into Clinical Practice
Under the national insurance program, expert panels in the CGM hospitals will use C-CAT Findings as a reference to optimize treatment strategies for patients with cancer (Supplementary Fig. S1A). C-CAT has developed an in-house pipeline to automatically create a C-CAT Findings report from a CATS file by using CKDB. To make swift clinical decisions possible, C-CAT keeps the turnaround time (TAT) of C-CAT Findings to no more than 3 days (average TAT = 1.2 days).
The CKDB comprises four databases: Marker DB, Drug DB, Evidence DB, and Clinical DB. Marker DB is a list of gene markers relevant to cancer treatments, such as a somatic EGFR(T790M) SNV and germline BRCA1 variants. Drug DB is a list of anticancer reagents that are approved/under development in Japan or approved by the FDA, accompanied by information on corresponding biomarkers/genes. Evidence DB contains the clinical evidence for a given mutation, such as predictive, prognostic, diagnostic, predisposing, and oncogenic evidence obtained from our original C-CAT database and from public ones (CIViC, ref. 12; BRCAExchange, ref. 13; ClinVar, ref. 14; and COSMIC, ref. 15). Evidence levels of C-CAT are defined as A for biomarkers that predict a response to Japanese Pharmaceuticals and Medical Devices Agency (PMDA)– or FDA-approved therapies or are described in professional guidelines, B for biomarkers that predict a response based on well-powered studies with consensus of experts in the field, C for biomarkers that predict a response to therapies approved by the PMDA or FDA in another type of tumor or that predict a response based on clinical studies, D for biomarkers that predict a response based on case reports, E for biomarkers that show plausible therapeutic significance based on preclinical studies, F for oncogenic markers, and R for biomarkers that predict resistance to certain therapies (Supplementary Fig. S1E). The C-CAT evidence-level classification was established in accordance with Japanese guidance (16) and is compatible with the guidelines from the Association for Molecular Pathology, the American Society of Clinical Oncology, and the College of American Pathologists (17).
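The evidence-level scheme described above can be written down as a small lookup table, which also makes a tabulation by level straightforward. The sketch below is illustrative only: the level descriptions paraphrase the definitions in the text, while the data structure, the function name, and the placeholder variants are assumptions and do not reflect how CKDB stores or processes evidence.

```python
from collections import Counter

# Level descriptions paraphrased from the text; the dictionary layout is an assumption.
EVIDENCE_LEVELS = {
    "A": "predicts response to PMDA- or FDA-approved therapies, or described in professional guidelines",
    "B": "predicts response based on well-powered studies with expert consensus",
    "C": "predicts response to therapies approved in another tumor type, or based on clinical studies",
    "D": "predicts response based on case reports",
    "E": "plausible therapeutic significance based on preclinical studies",
    "F": "oncogenic marker",
    "R": "predicts resistance to certain therapies",
}


def tabulate_by_level(annotations):
    """Count annotated variants per evidence level, ignoring unannotated ones."""
    counts = Counter(level for _, level in annotations if level in EVIDENCE_LEVELS)
    # Report every defined level, including those with zero variants.
    return {level: counts[level] for level in EVIDENCE_LEVELS}


# Placeholder variants with made-up levels, purely to exercise the function.
toy_annotations = [
    ("variant_1", "A"),
    ("variant_2", "C"),
    ("variant_3", "C"),
    ("variant_4", "R"),
    ("variant_5", None),  # no evidence level assigned
]
table = tabulate_by_level(toy_annotations)
print(table)
```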
Clinical DB contains a list of cancer-related clinical trials in Japan with biomarker information. The database is updated bimonthly with information from the Japan Pharmaceutical Information Center (JAPIC; https://www.japic.or.jp/), the University Hospital Medical Information Network (UMIN; https://www.umin.ac.jp/english/), the Japan Medical Association Center for Clinical Trials (JMACCT; http://www.jmacct.med.or.jp/en/), the Japan Registry of Clinical Trials (jRCT; https://jrct.niph.go.jp/), and ClinicalTrials.gov (https://clinicaltrials.gov). C-CAT transforms these data into a structured database with corresponding biomarker information.
More importantly, C-CAT maintains a curator team of medical oncologists selected from all over Japan, which facilitates the inclusion of the latest information of clinical trials in CKDB. As of February 2022, CKDB has collected 150,000 evidence elements from the literature, information on 752 clinical trials in Japan, and information on 1,208 drugs.
By checking each mutation (from a CATS file) against CKDB, we generate a C-CAT Findings report for every patient, which comprises nine sections: basic information of the patient and the CGP test, evidence information of each mutation, information of matched clinical trials, descriptions of mutated genes, evidence reference, CKDB software version, evidence level definition, U.S. evidence level, and disclaimer. After reports are generated by our system, expert reviewers within C-CAT manually inspect the individual reports to assure quality. Then, C-CAT sends C-CAT Findings reports to CGM hospitals through a secure network. In Japan, therefore, the same quality of clinical annotation information for CGP testing is provided to every patient by C-CAT, no matter which CGP test a given patient chooses.
C-CAT Data Portals: Sharing Systems for the C-CAT Repository Database
C-CAT currently receives data of >1,500 patients monthly; hence, the C-CAT repository database is projected to include the data of more than 100,000 patients within 4 years. The C-CAT database will therefore be a large-scale CGM dataset linked with real-world clinical outcomes for patients with cancer treated at multiple institutions throughout Japan. The genomic data are clinical grade and are produced in quality-certified laboratories. The clinical data contain the OncoTree cancer-type classification (http://oncotree.mskcc.org/#/home), information for prescribed drugs defined by the YJ codes according to the standard master for pharmaceutical products in Japan (18), therapeutic response (tumor shrinkage), therapeutic duration, and adverse events of grade 3 or more.
The C-CAT data released on January 31, 2022 (version 20220120) contain information from the first 25,991 patients. The registered patient number per tumor type is summarized in Fig. 2A. Because CGP testing is covered by insurance only for advanced solid tumors in patients who have completed or will soon complete standard therapies, cancer types with few standard therapies may be overrepresented in the C-CAT data compared with the actual prevalence in Japan (https://ganjoho.jp/public/qa_links/report/statistics/2022_en.html; Supplementary Fig. S2). The number of patients with lung cancer (n = 1,517) in the C-CAT data is, for instance, approximately half that of pancreatic cancer (n = 3,125), whereas the prevalence of the former in Japan (n = 129,900) is approximately 3.5 times that of the latter (n = 36,800). Similarly, the number of soft-tissue tumors (n = 1,294) is relatively large in the C-CAT data (Supplementary Fig. S2).
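The lung versus pancreatic comparison above can be checked with a short calculation. The four counts are those quoted in the text; the variable names and the "relative enrichment" summary are our own illustrative additions, not a statistic reported by C-CAT.

```python
# Counts quoted in the text for the C-CAT release and for Japan as a whole.
ccat = {"lung": 1_517, "pancreas": 3_125}
japan = {"lung": 129_900, "pancreas": 36_800}

# Lung-to-pancreatic ratio within each dataset.
ccat_ratio = ccat["lung"] / ccat["pancreas"]
japan_ratio = japan["lung"] / japan["pancreas"]

# How much more heavily pancreatic cancer is represented in C-CAT, relative to
# lung cancer, than the national figures would suggest (illustrative summary only).
relative_enrichment = japan_ratio / ccat_ratio

print(f"C-CAT lung/pancreas ratio: {ccat_ratio:.2f}")   # about 0.49
print(f"Japan lung/pancreas ratio: {japan_ratio:.2f}")  # about 3.53
print(f"Relative enrichment of pancreatic cancer: {relative_enrichment:.1f}x")
```

Under these figures, lung cancer is roughly half as frequent as pancreatic cancer in C-CAT but about 3.5 times as frequent nationally, consistent with the overrepresentation of cancer types that have few standard therapies.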
Figure 2. Overview of the current C-CAT dataset. A, The numbers of male and female patients per tumor type are shown. CNS, central nervous system. B, Numbers of genetic variants according to the evidence level. Genetic variants observed in each case were tabulated according to levels of evidence A to E defined in Supplementary Fig. S1E. C, Fraction of patients with genomically matched drug treatment. Cases whose information on recommendation of genomically matched drug treatment by expert panels is available were subclassified by the recommendation status as well as actual treatments. D, Fraction of druggable alterations in patients with KRAS mutation–negative pancreatic cancer. Druggable alterations detected in 98 KRAS mutation–negative patients are shown on the right.
The genomic data consist of 114,431 SNVs/indels including germline variants across 344 genes and the TMB/MSI information. The data also contain 330 gene fusions, amplification in 312 genes, and deletion in 231 genes (Supplementary Fig. S3A–S3C). It should be noted that such mutation data are collected directly from the results of CGP tests, and thus, the computational pipeline used for each CGP test may have its own bias for mutation calling.
In addition, the clinical data in C-CAT include drug names (Supplementary Fig. S4A) coupled with clinical outcome data, such as therapeutic response, therapeutic duration, and adverse events. The C-CAT data are enriched with information on Asian-prevalent tumors, such as those of the biliary tract (n = 1,858) and esophagus/stomach (n = 1,608), and on uncommon tumors, such as ovarian (n = 1,557), adolescent and young adult (n = 1,809; 15–39 years old), and pediatric (n = 578; 0–14 years old).
As shown in Fig. 2B, a significant portion (approximately 40%) of the variants in the C-CAT data have an evidence level of A to F defined in Supplementary Fig. S1E. In fact, 7.8% of all patients received genomically matched drug treatments (Fig. 2C). In addition, druggable gene alterations in rare fractions of cancers, such as KRAS mutation–negative pancreatic cancer (Fig. 2D), can also be assessed. C-CAT data thus contribute to accelerating the off-label use of genomically matched drugs and to the design of umbrella trials focusing on rare types of cancers (Fig. 2D). The C-CAT data are also useful to estimate Japanese-specific characteristics of germline variants in tumor suppressor genes, as shown for the deleterious mutations within BRCA1 and BRCA2 genes (Supplementary Fig. S4B; refs. 19, 20). Recently, data from a Korean precision medicine initiative, K-MASTER, were published (21). Even with similar inclusion criteria, the prevalence of cancer subtypes in each database is different between C-CAT and K-MASTER; for instance, the three most frequent cancer types are colorectal, pancreatic, and biliary tract in C-CAT, but colorectal, breast, and stomach in K-MASTER. Further detailed comparison between the two datasets would be interesting to discover similarities/differences in mutation profiles among the East Asian countries.
C-CAT has already opened various data-sharing systems of its repository database (Supplementary Fig. S5A). Since September 2020, for instance, C-CAT data have been shared by physicians and medical staff within the CGM hospitals through the C-CAT Medical-Use Portal. This portal shows the genetic alterations of all registered patients with corresponding evidence levels defined by the guidance in Japan (16). The site enables physicians to identify patients with similar genetic and clinical findings to their own patients and to know the outcome for such patients treated in different hospitals. The portal also provides updated information on investigator- and sponsor (industry)-initiated clinical trials into which the patients could be enrolled on the basis of the latest CKDB.
In addition to the C-CAT Medical-Use Portal, C-CAT has launched another data-sharing system, the C-CAT Research-Use Portal, through which researchers in academic institutions and industries can access the repository data (clinical information and mutation lists) under the approval of the C-CAT Data Utilization Review Board. Every C-CAT datapoint includes the consent information as to whether a patient agrees to the utilization of his or her data by industry. Encouragingly, as of March 2022, 99.7% of patients gave consent to share their data with third parties; therefore, nearly all C-CAT data can be accessed by nonacademic institutes in addition to academia. This portal enables combined searches by gene alterations, drug names, and therapeutic responses; hence, the characteristics of specified patient populations as well as their individual genomic and clinical data can be rapidly assessed (e.g., 496 responders to immune-checkpoint therapy). This portal also presents real-world drug prescription data. For instance, the present C-CAT data indicate that the detection of deleterious mutations within BRCA1 and BRCA2 genes by CGP has led to the selection of olaparib, a PARP inhibitor (Supplementary Fig. S5B). The C-CAT Research-Use Portal is now permitted for use by >30 academic and industrial institutions in Japan (including Japanese branches of foreign pharmaceutical companies) as of the end of June 2022 and will soon be open to the broad research community outside of Japan.
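A search that combines gene alterations, drug names, and therapeutic responses amounts to applying several filters to the same set of case records. The sketch below illustrates that logic on toy data; it is not the portal's interface, and the `Case` record, its field names, the response codes, and the example cases are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """Hypothetical, simplified case record; not the C-CAT data model."""
    case_id: str
    altered_genes: frozenset
    drugs: frozenset
    best_response: str  # assumed response codes such as "CR", "PR", "SD", "PD"


def search(cases, gene=None, drug=None, responses=None):
    """Return cases matching every criterion that is given (None means no filter)."""
    hits = []
    for case in cases:
        if gene is not None and gene not in case.altered_genes:
            continue
        if drug is not None and drug not in case.drugs:
            continue
        if responses is not None and case.best_response not in responses:
            continue
        hits.append(case)
    return hits


# Toy records invented for illustration; they carry no real patient information.
cases = [
    Case("case-001", frozenset({"BRCA2"}), frozenset({"olaparib"}), "PR"),
    Case("case-002", frozenset({"BRCA1"}), frozenset({"olaparib"}), "PD"),
    Case("case-003", frozenset({"KRAS"}), frozenset({"gemcitabine"}), "SD"),
]

responders = search(cases, drug="olaparib", responses={"CR", "PR"})
print([c.case_id for c in responders])  # ['case-001']
```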
DISCUSSION
Japan started CGM in 2019 with two approved CGP tests under the national health insurance umbrella and is gradually expanding its infrastructure. FoundationOne Liquid CDx was additionally reimbursed starting in August 2021, and some other CGP tests have already been submitted to the MHLW for approval. Because all data of patients with approved CGP testing are stored in the C-CAT database (but only upon patients’ consent), the growth of C-CAT data is expected to accelerate in the future.
What makes the CGM hospitals–C-CAT platform unique is that this network is an integral part of standard cancer care within the national health insurance system, not an academia-driven research initiative or a screening platform for the enrollment into clinical trials. Successful construction of this Japanese network is a result of enormous efforts made by hospitals, testing companies (all of them send mutation lists as well as raw sequencing data to C-CAT), pharmaceutical companies, the Japanese government, and C-CAT. This platform is also supported by many Japanese academic societies such as the Japanese Society of Medical Oncology, the Japan Society of Clinical Oncology, the Japanese Cancer Association, and the Japanese Society for Genetic Counseling, which helped to educate medical and paramedical staff in CGM. Realization of the CGM hospitals–C-CAT platform is thus a gigantic shift within the national health system toward precision medicine in Japan.
Our national platform not only provides a C-CAT Findings report for every patient with cancer but also has further impacts on CGM in Japan and worldwide. First, the network of CGM hospitals and C-CAT provides novel tracks to deliver drugs to patients with cancer. Japan has not yet introduced a patient-oriented compassionate use system, and medical care that involves off-label treatment is not covered by the national health insurance (22, 23). However, the Patient-Proposed Health Services program (PHS), in which an unapproved/off-label treatment proposed by a patient is undertaken upon the approval of the relevant committees in the MHLW, is an exception. A pan-cancer multidrug off-label treatment trial, BELIEVE (NCCH1901; ref. 24), is a pan-Japan clinical trial of CGM hospitals undertaken as a PHS (https://jrct.niph.go.jp/en-latest-detail/jRCTs031190104), which enables patients to rapidly access off-label treatments recommended from CGP testing. As of the end of May 2022, >370 patients have received off-label treatments based on their genetic alterations, each with one of 17 targeted drugs (kinase inhibitors, immune-checkpoint inhibitors, or antibody drugs), all of which are provided free of charge by pharmaceutical companies.
Second, the C-CAT data would be highly valuable to efficiently design clinical trials in Japan. Hospital IDs can be displayed in the C-CAT Research-Use Portal upon the approval of the C-CAT Data Utilization Review Board. Therefore, if a pharmaceutical company is going to start a clinical trial with a targeted drug against, for example, BRAF mutations, the C-CAT data instantly provide information on how many patients with BRAF mutations are present and in which CGM hospitals throughout Japan.
Third, through the C-CAT Research-Use Portal, researchers can access clinical-grade sequencing data with clinical outcomes (as shown in Supplementary Fig. S5A). Such information (especially on a large scale) would be essential to discover novel therapeutic targets or drug-resistance biomarkers. Researchers can, for instance, readily identify in the C-CAT data compound mutations within EGFR in lung cancer positive for EGFR(L858R) that failed to respond to EGFR inhibitors (25, 26). Importantly, any results/intellectual property obtained from the use of the C-CAT data belong to the users, not to C-CAT or the Japanese government. Researchers in academia/industries are thus encouraged to take advantage of our C-CAT data.
The broad-based sharing of genomic and clinical data is critical to realize the full potential of precision oncology. The C-CAT database fulfills an unmet need in oncology by providing a data-sharing platform to enable scientific and clinical discovery. The C-CAT platform will enable researchers to better understand the distribution of genetic alterations across cancer types in Asia, deduce the feasibility of genotype-specific clinical trials, validate genomic biomarkers, reposition or repurpose approved drugs, and expand existing drug labels by the addition of new cancer subtypes. To extend our data sharing, we are planning to launch an additional cloud system, CALculation & Investigation ClOud (C-CAT CALICO), which will enable users to analyze raw sequence data to examine novel genetic alterations that are not included in the reports provided by the testing companies. We hope that this national database will play a significant role in expanding/advancing cancer treatments and improving cancer care worldwide.
Authors’ Disclosures
T. Kohno reports grants from Sysmex and Chugai and personal fees from Eli Lilly outside the submitted work. M. Kato reports grants from Mitsui Knowledge Industry outside the submitted work. S. Kohsaka reports grants from Konica Minolta outside the submitted work. Y. Shiraishi reports grants from Hitachi, Ltd. during the conduct of the study. Y. Okuma reports grants and personal fees from AstraZeneca, Ono Pharmaceutical, and Takeda Pharmaceutical, grants from AbbVie G.K. and Chugai Pharmaceutical, and personal fees from Eli Lilly and Eisai Co., Ltd. outside the submitted work. H. Mano reports a patent for EML4-ALK issued and licensed to CureGene. No disclosures were reported by the other authors.
Acknowledgments
This work was not supported by any grants or foundations. We give our deepest thanks to the patients with cancer in Japan who contributed to the C-CAT database. We are also grateful to physicians, pathologists, and other staff in the CGM hospitals for their dedication to running this national platform smoothly, to testing companies for providing their sequencing data, and to pharmaceutical companies for providing drugs to the BELIEVE trial for free. Finally, we wish to acknowledge all C-CAT members, Fujitsu Japan, Mitsui Knowledge Industry, and Hitachi for constructing/improving the C-CAT system and operating it seamlessly.
Note: Supplementary data for this article are available at Cancer Discovery Online (http://cancerdiscovery.aacrjournals.org/).