Current models for correlating electronic medical records with -omics data largely ignore clinical text, which is an important source of phenotype information for patients with cancer. This data convergence has the potential to reveal new insights about cancer initiation, progression, metastasis, and response to treatment. Insights from this real-world data will catalyze clinical care, research, and regulatory activities. Natural language processing (NLP) methods are needed to extract these rich cancer phenotypes from clinical text. Here, we review the advances of NLP and information extraction methods relevant to oncology based on publications from PubMed as well as NLP and machine learning conference proceedings in the last 3 years. Given the interdisciplinary nature of the fields of oncology and information extraction, this analysis serves as a critical trail marker on the path to higher fidelity oncology phenotypes from real-world data.

Data produced during the processes of clinical care and research in oncology are proliferating at an exponential rate. In the past decade, use of electronic medical records (EMR) has increased significantly in the United States (1), driven at least in part by incentivization from the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 (2). In parallel, large databases such as the NCI's Surveillance, Epidemiology, and End Results program (SEER; ref. 3), the National Cancer Database (NCDB; ref. 4), The Cancer Genome Atlas (TCGA; ref. 5), and the Human Tumor Atlas Network (HTAN; ref. 6) are increasingly important avenues for clinical and translational oncology research. However, significant nuanced phenotype data are locked in clinical free-text, which remains the primary form of documenting and communicating clinical presentations, provider impressions, procedural details, and management decision-making (7). Despite the proliferation of EMR and -omics data, critical and precise phenotype information is often detailed only in these clinical texts. Natural language processing (NLP), broadly defined as the transformation of language into computable representations, is key to large-scale extraction of nuanced data within clinical texts. As a subfield of artificial intelligence, clinical NLP (cNLP), which refers to the analysis of clinical or health care texts (as opposed to clinical application, per se) has been around for decades. However, only in recent years have compute power and algorithms advanced sufficiently to demonstrate its power toward broadening oncologic investigation.

There are excellent prior review articles of cNLP. Spyns (8) covers the period before 1995. Meystre and colleagues (9) survey the 1998 to 2008 developments. Yim and colleagues (10) provide an overview with a special emphasis on oncology for the period of 2008 to 2016. Neveol and colleagues (11) offer the first broad overview of cNLP for languages other than English. These surveys capture three distinct methodology phases in NLP, from exclusively rule-based systems through the shift toward probabilistic methods to the dominance of machine learning. Kreimeyer and colleagues (12) review existing cNLP systems. Some popular cNLP systems are MetaMap (concept mapping; refs. 13, 14), Apache cTAKES (classic NLP components, concept mapping, entities and attributes, relations, temporality; refs. 15, 16), YTex (entity and attributes; ref. 17), OBO annotator (concept mapping; ref. 18), TIES (linking of pathology reports to tissue bank data; ref. 19), MedLEE (entities and attributes, relations; ref. 20), CLAMP (entities and attributes; ref. 21), and NOBLE (entities and attributes; ref. 22).

The mid-2010s marked a transformational milestone for the field: plentiful digitized textual data and hardware advances met powerful mathematical abstractions in a hyperconnected world, leading to explosive interest in artificial intelligence in general (e.g., autonomous cars) and NLP in particular (e.g., Google Translate, Apple Inc.'s Siri, movie recommenders). Herein, we review major recent developments in cNLP methods for cancer since that watershed point. We discuss their applications for translational investigation and future directions. We cover publications since the 2016 review by Yim and colleagues (10) that are: (i) focused on cNLP of EMR text related to cancer; (ii) peer-reviewed; (iii) published in English and use English EMR text; (iv) sourced from MEDLINE and major computational linguistics and machine learning venues: the annual conferences of the Association for Computational Linguistics, North American Chapter of the Association for Computational Linguistics, European Chapter of the Association for Computational Linguistics, Empirical Methods in Natural Language Processing, International Conference on Machine Learning, Neural Information Processing Systems Conference, Machine Learning for Healthcare, SemEval, International Conference for High Performance Computing, and IEEE International Conference on Biomedical Health Informatics. Our goal is to highlight recent exceptional articles with implications for the broader cancer research community; thus, this survey is not a systematic meta-review. We acknowledge that much work is taking place outside traditional academic environments (i.e., industry), and we attempt to include it to the extent it meets this survey's inclusion criteria. For ease of reading, terms and definitions are presented in Table 1.

Table 1.

Terms and definitions

| Term | Definition |
| --- | --- |
| Accuracy | $\frac{TP + TN}{TP + FP + FN + TN}$, where TP is true positive; TN is true negative; FP is false positive; and FN is false negative. |
| Artificial intelligence | A process through which machines mimic "cognitive" functions that humans associate with other human minds, such as language comprehension. |
| Area under the curve (AUC) | A metric of binary classification; ranges from 0 to 1, 0 being always wrong, 0.5 representing random chance, and 1, the perfect score. |
| Artificial neural network | Computing systems that are inspired by, but not necessarily identical to, the biological neural networks that constitute the human brain. |
| Attribute | Facts, details, or characteristics of an entity. |
| Autoencoder | A class of artificial neural networks. |
| Concept mapping | The linking of mentions in text to the corresponding concepts in a standardized terminology or ontology. |
| Convolutional neural network | A class of artificial neural networks. |
| Decision tree | A tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. |
| Deep learning | A subclass of a broader family of machine learning methods based on artificial neural networks. The designation "deep" signifies multiple layers of the neural network. |
| Entities | A person, place, thing, or concept about which data can be collected. Examples in the clinical domain include diseases/disorders, signs/symptoms, procedures, medications, anatomical sites. |
| F1 score | $\frac{2 \times Recall \times Precision}{Recall + Precision}$. Values range from 0 to 1 (perfect score). |
| Graphics processing unit | A specialized electronic circuit designed to perform very fast calculations needed for training artificial neural networks. |
| K-nearest neighbors | A nonparametric method used for classification and regression in pattern recognition. |
| Latent representation | Word representations that are not directly observed but are rather inferred through a mathematical model. |
| Machine learning | The scientific study of algorithms and probabilistic models that computer systems use in order to perform a specific task effectively without using explicit instructions, relying on patterns and inference instead. |
| Precision | $\frac{TP}{TP + FP}$, where TP is true positive, and FP is false positive. |
| Probabilistic methods | Methods that model language statistically, assigning probabilities estimated from data to candidate outputs rather than relying solely on hand-written rules. |
| Recall | $\frac{TP}{TP + FN}$, where TP is true positive, and FN is false negative. |
| Recurrent neural network | A class of artificial neural networks. |
| Rule-based system | Systems involving human-crafted or curated rule sets. |
| Semantic representation | Ways in which the meaning of a word or sentence is interpreted. |
| Supervised learning | Machine learning method that infers a function from labeled training data consisting of a set of training examples. |
| Support vector machine | Supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. |
| Tensor | A mathematical object analogous to but more general than a vector, represented by an array of components that are functions of the coordinates of a space. |
| Transfer learning | A machine learning technique where a model trained on one task is repurposed on a second related task. |
| Unsupervised learning | Machine learning that finds previously unknown patterns in a data set without pre-existing labels. |
| Word embedding | The collective name for a set of language modeling and feature learning techniques in natural language processing (NLP), where words or phrases from the vocabulary are mapped to vectors of real numbers. |

We highlight results measured in either accuracy, harmonic mean of recall/sensitivity and precision/positive predictive value (F1 score), or AUC (trade-off between true positive and false positive rates). These performance metrics reflect a comparison against human-generated data (referred to as gold-standard annotations); thus, they capture agreement between NLP systems and humans. Gold-standard annotations are also used for training algorithms (supervised learning). The interannotator agreement (IAA) measures human performance and serves as a system performance target.
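The metrics defined in Table 1 can be computed directly from counts of agreement between a system and the gold-standard annotations. The following is a minimal sketch; the `Confusion` class, the function names, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing system output against gold-standard annotations."""
    tp: int      # true positives
    fp: int      # false positives
    fn: int      # false negatives
    tn: int = 0  # true negatives (often undefined for span extraction tasks)


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return numerator / denominator if denominator else 0.0


def precision(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def f1(c: Confusion) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return _ratio(2 * p * r, p + r)


def accuracy(c: Confusion) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


# Hypothetical counts, for illustration only.
counts = Confusion(tp=80, fp=20, fn=10, tn=90)
print(f"P={precision(counts):.3f} R={recall(counts):.3f} "
      f"F1={f1(counts):.3f} Acc={accuracy(counts):.3f}")
```

The same functions can be applied to one annotator's labels scored against another's, which is one common way to express IAA as an F1 score.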

The past 3 years have seen the development of a variety of methodologies for NLP, with a general shift toward a particular machine learning category: deep learning (DL; ref. 23). DL techniques were initially conceived in the 1980s but not operationalized until the convergence of three critical elements: massive digital text corpora; novel but compute- and data-intensive algorithms; and powerful, massively parallel computing architectures, currently using graphics processing units (GPU; ref. 24). For many tasks, DL is considered state-of-the-art in artificial intelligence (25–27). The key differentiator between DL and feature-rich machine learners is the concept of representation learning (28). Feature-rich algorithms require expert knowledge (linguistic, semantic, biomedical, or world knowledge) to determine the information of interest. Some examples of feature-rich learners are support vector machines (SVM) and random forests (RF; ref. 29). In the clinical domain, the engineered features are often guided by biomedical dictionaries, clinical ontologies, or biomedical knowledge from domain experts. In contrast, DL models automatically discover from raw data the mathematically and computationally convenient abstractions needed for classification, without the need for explicitly defined features (23, 25). These representations can range from simple word representations and word embeddings (30) to complex hierarchies that capture contextual meaning and relationships between words, phrases, and other compositional derivatives. This capability of DL algorithms can potentially unmask unknown relationships buried within large quantities of data, which can be particularly advantageous in cancer research and practice (25). Furthermore, DL algorithms can uniquely take advantage of transfer learning (26), the ability to learn from data outside the target domain and then apply this knowledge to other domains. For example, a DL model may be trained on large, openly available nonmedical text data (e.g., Wikipedia), and this model's knowledge can then be applied effectively to cNLP tasks by fine-tuning the model's parameters on smaller but directly relevant clinical text corpora.
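To make the idea of word embeddings concrete, the sketch below compares word vectors by cosine similarity, a common way to measure how close two words are in an embedding space. The three-dimensional vectors are invented toy values, not output from any trained model; real embeddings typically have hundreds of dimensions and are learned from large corpora.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy, hand-made vectors (assumption: purely illustrative, not learned).
toy_embeddings = {
    "carcinoma": [0.9, 0.8, 0.1],
    "tumor": [0.8, 0.9, 0.2],
    "tuesday": [0.1, 0.0, 0.9],
}

related = cosine_similarity(toy_embeddings["carcinoma"], toy_embeddings["tumor"])
unrelated = cosine_similarity(toy_embeddings["carcinoma"], toy_embeddings["tuesday"])
print(f"carcinoma~tumor: {related:.2f}, carcinoma~tuesday: {unrelated:.2f}")
```

In a learned embedding space, semantically related clinical terms would be expected to score higher than unrelated ones, which is the property the toy values imitate.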

Most DL architectures are built on the artificial neural network with interconnected nodes (neurons) arranged in layers (23). The variations in the arrangement and interconnections of these layers result in various elaborate networks, or architectures, suitable for addressing a variety of tasks. The most popular among these include: convolutional neural networks (CNN), optimal for data where spatial relationships encode critical information; recurrent neural networks (RNN), advantageous for sequentially ordered data (e.g., time-series data); and autoencoders, suitable for learning problems from noisy data, or data where prior information about data are partially or entirely unknown (23). There is a substantial amount of research in the general (as opposed to clinical) application of DL, demonstrating its potential in NLP (31).

Linguistic variability, combined with the abundance of medical terminology, abbreviations, synonyms, jargon, and spelling inconsistencies prevalent in clinical text, makes cNLP a particularly challenging problem. DL has shown remarkable results in extracting low- and high-level abstractions from raw text data with semantic and syntactic capabilities. This ability is often accompanied by excellent performance across translational science applications (25, 32), as highlighted below.

Task: extracting temporality and timelines

Longitudinal representations of patients' cancer journeys are a cornerstone of translational research, enabling rich studies across variables (e.g., tumor molecular profile) and outcomes (e.g., treatment efficacy). Extracting timelines from EMR free-text has become a line of cNLP research on its own. Since 2016, under the auspices of SemEval, Clinical TempEval shared tasks have challenged the NLP research community to establish state-of-the-art methods and results for temporal relation extraction with a focus on oncology. The dataset for these shared tasks consists of 400 patients with cancer distributed evenly between colon and brain cancers, each represented by pathology, radiology, and clinical notes (the THYME corpus described in ref. 33 and available from ref. 34). The tasks consisted of identifying event expressions, time expressions, and temporal relations (see Fig. 1 for an example). The relation between the event and the document creation time is called DocTimeRel, with values of BEFORE, OVERLAP, BEFORE-OVERLAP, and AFTER, which provide a coarse-level temporal positioning on a timeline.

Figure 1.

Clinical TempEval example: two events, one time expression, two temporal relations, two relations to the document creation time (DocTimeRel).
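The annotation types described above (events, time expressions, temporal relations, and DocTimeRel) can be sketched as a small data model. This is an illustrative assumption, not the THYME annotation schema or any participant's code: the class names, the example sentence, and the relation labels are hypothetical; only the four DocTimeRel values come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class DocTimeRel(Enum):
    """Relation of an event to the document creation time."""
    BEFORE = "BEFORE"
    BEFORE_OVERLAP = "BEFORE-OVERLAP"
    OVERLAP = "OVERLAP"
    AFTER = "AFTER"


@dataclass(frozen=True)
class Event:
    text: str
    doc_time_rel: DocTimeRel


@dataclass(frozen=True)
class TimeExpression:
    text: str


@dataclass(frozen=True)
class TemporalRelation:
    source: str    # text of the source event or time expression
    relation: str  # hypothetical label, e.g. "CONTAINS" or "BEFORE"
    target: str    # text of the target event


def coarse_timeline(events: list[Event]) -> dict[str, list[str]]:
    """Bucket events by DocTimeRel, giving a coarse position on a timeline."""
    return {
        rel.value: [e.text for e in events if e.doc_time_rel is rel]
        for rel in DocTimeRel
    }


# Hypothetical sentence: "A colonoscopy on March 3 revealed a mass."
events = [
    Event("colonoscopy", DocTimeRel.BEFORE),
    Event("mass", DocTimeRel.OVERLAP),
]
times = [TimeExpression("March 3")]
relations = [
    TemporalRelation("March 3", "CONTAINS", "colonoscopy"),
    TemporalRelation("colonoscopy", "BEFORE", "mass"),
]
print(coarse_timeline(events))
```

Like the example in Fig. 1, the hypothetical sentence carries two events, one time expression, two temporal relations, and two DocTimeRel values.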

Clinical TempEval 2016 (35) focused on developing methods from colon cancer EMR data and testing on colon cancer data (within-domain evaluation). The results suggest that current state-of-the-art systems perform extremely well on most event- and time expression-related tasks, with a gap between system performance and IAA (or human performance) of less than 0.05 F1. However, the temporal relation tasks remained a challenge. Systems that predict the DocTimeRel relation lagged about 0.09 F1 behind IAA. For other types of temporal relations, systems lagged about 0.25 F1 behind IAA.

Clinical TempEval 2017 (36) addressed the question of how well systems trained on one cancer medical domain (colon cancer) perform in predicting timelines in another cancer medical domain (brain cancer). The results showed that this remains an open research question, with a drop of more than 0.20 F1 across domains. Providing a small amount of target domain training data improved performance.

Methods employed by the Clinical TempEval participants range from classic methods (logistic regression, conditional random fields, SVMs, pattern matching) to various architectures of the latest DL techniques (RNNs, CNNs with inputs of word and character embeddings). Clinical TempEval 2017 showed that no one specific method provides the best results, although the combination of various approaches appeared to be a promising path.

Outside of Clinical TempEval, experimentation with advanced DL architectures and various data streams for timeline extraction from cancer patient EMRs has intensified. Tourille and colleagues explored neural networks and domain adaptation strategies (37). Chen and colleagues (38) and Dligach and colleagues (39) dealt with simplifications of time expression representations in a neural approach. Recent trends include DL models that combine a small portion of labeled data with unlabeled publicly available data [Google News (30) and social media] to achieve results about 0.02 F1 below IAA (40). The current best reported result is 0.684 F1 (41).

Open source systems for timeline extraction include the Apache cTAKES temporal module (42), HeidelTime (for temporal expressions and their normalization; ref. 43), and rule-based extensions of Stanford CoreNLP (44).

The task of extracting temporality from EMR clinical narrative has advanced dramatically since 2016. In the last 3 years, the performance on the Clinical TempEval test set moved from 0.573 to 0.684 F1 for finer grained temporal relations and reached 0.835 F1 for DocTimeRel. This last result enables exploring select temporally sensitive applications such as outcomes extraction, which was pointed out as one of the most challenging yet to be addressed use cases in the 2016 survey article.

Application: extracting tumor and cancer characteristics

Information extraction from pathology reports, which have a more consistent structure than other free text EMR documents, presents a tractable challenge to the field of cNLP (45). Since the 2016 survey, the oncology NLP field has moved beyond cancer stage and TNM extraction into the extraction of more comprehensive cancer and tumor attributes. Qiu and colleagues (46) presented a CNN for information abstraction of primary cancer site topography from breast and lung cancer pathology reports from the Louisiana Cancer Registry, reporting 0.72 F1. Using the same corpus, Gao and colleagues (47) boosted performance using a more elaborate DL architecture (hierarchical attention neural network). The authors reported 0.80 F1 for cancer site topography and 0.90 F1 for histologic grade. However, the authors noted significant computational demands of their DL solution.

Alawad and colleagues (48) showed that for extraction of cancer primary site, histologic grade, and laterality, training CNN to make multiple predictions simultaneously (multi-task learning) outperformed single task models. In a later study, the authors explored the computational demands of CNN cNLP models and the role of high-performance computing for achieving population-level automated coding of pathology documents to achieve near real-time cancer surveillance for cancer registry development (49). Using a corpus of 23,000 pathology reports, they reported 0.84 F1 for primary cancer site extraction across 64 cancer sites using their CNN model, significantly outperforming a random forest classifier with 0.76 F1.

Yala and colleagues (50) used boosting (51) to extract tumor information from breast pathology reports and achieved 90% accuracy for extracting carcinoma and atypia categories. Because gold-standard datasets are a necessary but resource-intensive requirement of ML algorithms, this study also investigated the minimum number of annotations needed to maintain at least 0.9 F1 without the system being pretrained. They reported this to be approximately 400. Using similar methods, Acevedo and colleagues (52) found the rate of abnormal findings in asymptomatic patients to be 7%, and to increase with age. These results are higher than previously reported, suggesting the clinical value of these algorithms over current epidemiologic methods to measure cancer incidence and prevalence. In a study of multiple diseases, Gehrmann and colleagues (25) reported an improvement in F1 score and AUC for advanced cancer using CNNs over rule-based systems.

The open source DeepPhe platform (53, 54) is a hybrid system for extracting a number of tumor and cancer attributes. It implements a variety of artificial intelligence approaches (rules, domain knowledge bases, and machine learning, both feature-rich and DL) to crawl the entire cancer patient chart (not restricted to pathology notes) and to extract and summarize the information related to tumors and cancers and their characteristics. The IAA ranged from 0.46 to 1.00 F1, and system agreement with humans ranged from 0.32 to 0.96 F1. The system's highest result was for primary site extraction (0.96 F1) and its lowest for PR method extraction (0.32 F1).

Castro and colleagues (55) developed an NLP system to annotate and classify all BI-RADS mentions present in a single radiology report, which can serve as the foundation for future studies that will leverage automated BI-RADS annotation, providing feedback to radiologists as part of a learning health system loop (56).
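As a rough illustration of what finding BI-RADS mentions in report text involves, the sketch below uses a simple regular expression. It is not the system of Castro and colleagues, whose method is not detailed here; the pattern, the function name `extract_birads`, and the sample report text are assumptions made for illustration, and a production system would need to handle negation, laterality, and many more surface forms.

```python
import re

# Assumed surface forms: "BI-RADS 4", "BIRADS: 2", "BI-RADS category 4A".
BIRADS_PATTERN = re.compile(
    r"\bBI-?RADS\b"                  # "BI-RADS" or "BIRADS"
    r"(?:\s+(?:category|cat\.?))?"   # optional "category" or "cat."
    r"\s*[:\-]?\s*"                  # optional separator
    r"(4[ABC]|[0-6])\b",             # categories 0-6, with 4A/4B/4C subcategories
    re.IGNORECASE,
)


def extract_birads(report_text: str) -> list[str]:
    """Return every BI-RADS category mentioned in the text, in order."""
    return [m.group(1).upper() for m in BIRADS_PATTERN.finditer(report_text)]


# Hypothetical report fragment, for illustration only.
sample = "Left breast: BI-RADS category 4A. Right breast: BIRADS 2."
print(extract_birads(sample))
```

Returning every mention rather than only the first reflects the fact that a single report can carry more than one assessment, for example one per breast.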

Application: clinical trials matching

Clinical trials determine safety and effectiveness of new medical treatments; with the successes of recent years including new classes of therapies (e.g., immunotherapy; CAR-T cells), the clinical trial landscape has exploded. Nevertheless, adult patient participation in clinical trials remains low, especially among underrepresented minorities. This limits trial completion, generalizability, and interpretation of trial findings. Thus, there is a great deal of interest in clinical trial matching. This is not a simple problem, given the need to extract information from trial protocols written in natural language and match the findings with characteristics from individual EMRs.

Since the 2016 survey article (10), researchers have explored DL technology to identify relevant information found in patients' EMRs to establish eligibility for clinical trials. Bustos and colleagues developed a CNN, leveraging its representation learning capability, to extract medical knowledge reflecting eligibility criteria from clinical trials (57). They reported promising results using CNNs compared with state-of-the-art classification algorithms including FastText (58), SVM, and k-Nearest Neighbors (kNN). Shivade and colleagues (59) and Zhang and colleagues (60) developed SVMs to automate the classification of eligibility criteria to facilitate trial matching for specific patient populations.

Yala and colleagues (50) and Osborne and colleagues (61) used BoosTexter (62) and MetaMap (13, 14), respectively, on rule-based regular expressions to automatically extract relevant patient information from EMRs, predominantly free-text reports, to identify patient cohorts with characteristics of interest for clinical trials or other relevant reporting. There is also a panoply of commercial solutions emerging in this space, but our search did not reveal any publications by these commercial entities.

Application: pharmacovigilance and pharmacoepidemiology

Pharmacovigilance, drug–safety surveillance, and factors associated with nonadherence play an important role in improving patient outcomes by personalizing cancer treatments, monitoring, and understanding adverse drug events (ADE) as well as minimizing risks associated with different therapies. The 2016 survey article identifies outcomes extraction as one of the challenges for cNLP because temporality extraction plays a key role. With the advances in temporality extraction in the last three years (see section Extracting Temporality and Timelines), methods for outcomes extraction have also improved.

A variety of methods have been explored including logistic regression, SVM, random forest, decision tree, and DL to analyze EMR data to predict treatment prescription, quality of care, and health outcomes of patients with cancer. Using data from the SEER (3) cancer registry as gold-standard for cancer stages, and variables extracted from linked Medicare claims data, Bergquist and colleagues (63) classified patients with lung cancer receiving chemotherapy into different stages of severity, with a hybrid method of rules and ensemble ML algorithms. This system achieved 93% accuracy demonstrating its potential applications to study the quality of care for patients with lung cancer and health outcomes.

Survival analysis plays an important role in clinical decision support. In oncology care, the choice of treatment depends greatly on prognosis, which is sometimes difficult for physicians to determine. Gensheimer and colleagues (64) proposed a hybrid pipeline that combines semantic data mining with neural embeddings of sequential clinical notes and outputs a probability of >3 months life expectancy.

Yang and colleagues (65) applied a tensorized RNN on sequential clinical records to extract a latent representation from the entire patient history, and used it as the input to an Accelerated Failure Time model to predict the survival time of metastatic breast cancer patients. Yin and colleagues (66) applied word embeddings to discover topics in patient-provider communications associated with an increased likelihood of early treatment discontinuation in the adjuvant breast cancer setting. Overall, treatment toxicity extraction remains an open research area.
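For readers unfamiliar with the Accelerated Failure Time model mentioned above, its standard general form is given below. This is the textbook formulation, not necessarily the exact parametrization used by Yang and colleagues; the symbols $\mathbf{z}_i$, $\mathbf{w}$, $\sigma$, and $\epsilon_i$ are notation introduced here for illustration.

```latex
% Textbook AFT model: the log survival time is linear in the covariates.
% z_i     : covariates for patient i (here, assumed to be the latent
%           representation produced by the RNN)
% w       : regression weights
% sigma   : scale parameter
% eps_i   : error term with a chosen distribution (e.g., normal, logistic)
\log T_i = \mathbf{w}^{\top} \mathbf{z}_i + \sigma \, \epsilon_i
```

Under this form, covariates act multiplicatively on survival time, stretching or compressing it, rather than acting on the hazard.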

Recent years have seen cancer cNLP tasks tackled occasionally at mainstream NLP conferences and affiliated workshops (in open-domain NLP, top research is preferentially presented at conferences). Although still relatively rare, this has the potential to greatly benefit cancer cNLP research, with a larger community of NLP researchers working directly on these problems in addition to the more specialized cNLP community. The prerequisite for this trend to continue is access to shareable data resources, as also pointed out in the 2016 survey article. The colon and brain cancer THYME corpus was used in several general domain conference and workshop articles (37, 38, 40, 67–69), whereas a radiology report dataset from a 2007 challenge (available from ref. 70) was used in another (71), and a SEER-provided corpus (unshared and thus not available for distribution) was used in yet another (72). Other work has used ad hoc resources for methods development, but this is a less sustainable model due to the rarity of expertise in both cancer and NLP (73–75). A recently developed resource created gold-standard annotations of the semantics of sentences in notes describing patients with cancer (76). More shared resources, community challenges, and publicity for both will likely lead to more focused development of new methods for cancer information extraction, a challenge that the community needs to address.

The focus of our survey article is on NLP technologies for cancer translational studies. However, we briefly review the applications of these technologies for direct patient care, which has rightfully proceeded with caution given that even small system error rates could lead to harm. Lee and colleagues (77) studied the concordance of IBM Watson for Oncology, a commercial NLP-based treatment recommendation system, with the recommendations of local experts and found it to be 48.9%. Similar results are reported elsewhere (78, 79). Furthermore, such applications are treated as Software as a Medical Device (SaMD) by the FDA, which, justifiably, is a high bar to clear (80, 81). Some cautious use cases provide assistance to physicians (82, 83) in the form of question-answering and summarization. Voice tools in health care, which represent a distinct subdomain of NLP, are primarily used for (i) documentation; (ii) commands; and (iii) interactive response and navigation for patients (84).

As discussed above, NLP technology for cancer has made strides since the 2016 survey article, which stated that at that time “oncology-specific NLP is still in its infancy.” Given the breadth and depth of the research surveyed in the current article, we believe the field has expanded, enabled by state-of-the-art methods and abundant digital EMR data. We observe more collaborations between NLP researchers and oncologists, which was one of the take-away lessons from Yim and colleagues.

State-of-the-art machine learning methods require significant amounts of human-labeled data to learn from, which is expensive and time-consuming. This presents a methodologic challenge toward learning paradigms from vast unlabeled datasets (lightly supervised or unsupervised methods). Another challenge lies in the portability of the machine learners as they represent the distributions of the data they learned from. If translated to a domain with a different distribution (e.g., colorectal to brain cancer), there is a substantial drop in performance (see section Extracting Temporality and Timelines). Thus, domain adaptation remains an unsolved and hot scientific problem. Large-scale translational science is likely to cross country boundaries and harvest data from EMRs written in a variety of languages. Therefore, the cNLP research community needs to think about multilingual machine learning to enable such bold studies. On the hardware side, DL methods require vast computational resources available only to a very few and not necessarily solvable by a cloud computing environment. Last but not least, ethical considerations of the application of these powerful technologies should be discussed, at the bare minimum whether the underlying data on which machine learners are trained represents the whole of human diversity.

In research, real-world big data have great potential to improve cancer care. For example, Gregg and colleagues present a risk stratification study for prostate cancer (85). The utilization of real-world big data is a key focus area of the NCI (86). SEER and NCDB, the two major cancer registry databases in the United States, have limitations in terms of coverage, accuracy, and granularity that introduce bias (3, 4, 87–90). Currently, database building requires manual annotation of clinical free-text, which is resource intensive and prone to human error. cNLP can support more rapid, large-scale, and standardized database development. Accurate automated or semiautomated identification of cancer cases will be particularly helpful in studying underrepresented patient populations and rare cancers. In addition, cNLP can facilitate analysis of unstructured data that are poorly documented in databases but widely accepted to be critical for prognostication and management decision-making, most notably patient-reported outcomes (91). Our hope is that larger, more accurate, and more granular clinical databases can be integrated with -omics databases to enable translational research that better characterizes oncologic phenotype relationships. This data convergence has the potential to enable new insights about cancer initiation, progression, metastasis, and response to treatment.

Although NLP has yet to make major inroads in the clinical setting, some of the potential applications are clear. Direct extraction of cancer phenotypes from source data (pathology and radiology reports) could reduce redundancy and prevent ambiguity within a patient's chart, minimizing confusion and medical errors. Summarization and information retrieval applications can reduce search burden and enable clinicians to spend more time with their patients. Clinical decision support tools could help reduce the increasingly burdensome cognitive load placed on clinicians, although the results reported thus far by efforts such as IBM Watson for Oncology raise serious concerns about what the bar for accuracy of clinical recommendations should be for routine use. In fact, these results are a cautionary tale of the challenges of domain adaptation; the software was widely reported to have been trained on hypothetical cases at a highly specialized cancer center, leading to incorrect and possibly unsafe recommendations (92). At this time, NLP technology is not yet ripe for direct patient care except in carefully observed scenarios.

cNLP has the potential to affect almost all aspects of the cancer care continuum, and multidisciplinary collaboration is necessary to ensure optimal advancement of the field. As there are few individuals with expertise in both oncology and NLP, clinical oncologists, basic and translational scientists, bioinformaticians, and epidemiologists should work with computer scientists to identify and prioritize the most important clinical questions and tasks that can be addressed with this technology. Furthermore, oncology subject matter experts will be needed to create gold datasets. Once an NLP technology is developed, oncologists and cancer researchers should take a primary role in evaluating it to determine its utility for research and its clinical value. Although standards for clinical evaluation of software, including artificial intelligence systems, are evolving (93), NLP tools that directly affect management decisions should be considered for evaluation in a trial setting by clinical investigators familiar with the technology and FDA guidelines (80). In partnership, computer scientists, oncology researchers, and clinicians can take full advantage of the recent advances in NLP technology to leverage the wealth of data stored and rapidly accumulating in our EMRs.

No potential conflicts of interest were disclosed.

The work was supported by funding from U24CA184407 (NCI), U01CA231840 (NCI), R01 LM 10090 (LM), and R01GM114355 (NIGMS). This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC5206NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

1. Cohen MF. Impact of the HITECH financial incentives on EHR adoption in small, physician-owned practices. Int J Med Inf 2016;94:143–54.
2. GovTrack.us. H.R. 1 (111th): American Recovery and Reinvestment Act of 2009 – House Vote #46 – Jan 28, 2009. [cited 2019 Feb 11]. Available from: https://www.govtrack.us/congress/votes/111-2009/h46.
3. National Cancer Institute. Surveillance, Epidemiology, and End Results Program. SEER. [cited 2019 Feb 11]. Available from: https://seer.cancer.gov/index.html.
4. National Cancer Database. American College of Surgeons. [cited 2019 Feb 11]. Available from: https://www.facs.org/quality-programs/cancer/ncdb.
5. The Cancer Genome Atlas Home Page. The Cancer Genome Atlas - National Cancer Institute. 2011 [cited 2019 Feb 11]. Available from: https://cancergenome.nih.gov/.
6. National Cancer Institute. Human Tumor Atlas Network (HTAN). [cited 2019 Feb 11]. Available from: https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/implementation/human-tumor-atlas.
7. Rosenbloom ST, Denny JC, Xu H, Lorenzi N, Stead WW, Johnson KB. Data from clinical notes: a perspective on the tension between structure and flexible documentation. J Am Med Inform Assoc 2011;18:181–6.
8. Spyns P. Natural language processing in medicine: an overview. Methods Inf Med 1996;35:285–301.
9. Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF. Extracting information from textual documents in the electronic health record: a review of recent research. Yearb Med Inform 2008;128–44.
10. Yim WW, Yetisgen M, Harris WP, Kwan SW. Natural language processing in oncology: a review. JAMA Oncol 2016;2:797–804.
11. Névéol A, Dalianis H, Velupillai S, Savova G, Zweigenbaum P. Clinical natural language processing in languages other than English: opportunities and challenges. J Biomed Semant 2018;9:12. doi: 10.1186/s13326-018-0179-8.
12. Kreimeyer K, Foster M, Pandey A, Arya N, Halford G, Jones SF, et al. Natural language processing systems for capturing and standardizing unstructured clinical information: a systematic review. J Biomed Inform 2017;73:14–29.
13. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp 2001;17–21.
14. Aronson AR, Lang FM. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc 2010;17:229–36.
15. Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, et al. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc 2010;17:507–13.
16. ctakes.apache.org. [homepage on the Internet]. The Apache Software Foundation. [cited 2019 Feb 11]. Available from: ctakes.apache.org.
17. Garla V, Lo Re V, Dorey-Stein Z, Kidwai F, Scotch M, Womack J, et al. The Yale cTAKES extensions for document classification: architecture and application. J Am Med Inform Assoc 2011;18:614–20.
18. www.obofoundry.org [homepage on the Internet]. [cited 2019 Feb 11]. Available from: www.obofoundry.org.
19. TIES v5; clinical text search engine. [cited 2019 Feb 11]. Available from: http://ties.dbmi.pitt.edu/.
20. Friedman C. A broad-coverage natural language processing system. Proc AMIA Symp 2000;270–4.
21. Soysal E, Wang J, Jiang M, Wu Y, Pakhomov S, Liu H, et al. CLAMP - a toolkit for efficiently building customized clinical natural language processing pipelines. J Am Med Inform Assoc 2017 Nov 24 [Epub ahead of print]. doi: 10.1093/jamia/ocx132.
22. Tseytlin E, Mitchell K, Legowski E, Corrigan J, Chavan G, Jacobson RS. NOBLE – Flexible concept recognition for large-scale biomedical natural language processing. BMC Bioinformatics 2016;17:32.
23. Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press; 2016 [cited 2019 Feb 12]. Available from: http://www.deeplearningbook.org.
24. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323:533.
25. Gehrmann S, Dernoncourt F, Li Y, Carlson ET, Wu JT, Welt J, et al. Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives. PLoS ONE 2018;13:e0192360.
26. Young T, Hazarika D, Poria S, Cambria E. Recent trends in deep learning based natural language processing. IEEE Comput Intell Mag 2018;13:55–75.
27. Goldberg Y. A primer on neural network models for natural language processing. J Artif Intell Res 2016;57:345–420.
28. Bengio Y, Courville A, Vincent P. Representation Learning: A Review and New Perspectives. ArXiv12065538 Cs. 2012 Jun 24 [cited 2019 Feb 13]. Available from: http://arxiv.org/abs/1206.5538.
29. Manning CD, Raghavan P, Schütze H. Introduction to information retrieval. Cambridge University Press; 2008.
30. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, editors. Advances in Neural Information Processing Systems 26. Curran Associates, Inc.; 2013 [cited 2019 Jan 3]. p. 3111–9. Available from: http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf.
31. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44.
32. Banerjee I, Ling Y, Chen MC, Hasan SA, Langlotz CP, Moradzadeh N, et al. Comparative effectiveness of convolutional neural network (CNN) and recurrent neural network (RNN) architectures for radiology text report classification. Artif Intell Med 2019;97:79–88.
33. Styler WF, Bethard S, Finan S, Palmer M, Pradhan S, de Groen PC, et al. Temporal annotation in the clinical domain. Trans Assoc Comput Linguist 2014;2:143–54.
34. THYME corpus (available through hNLP Center membership). Available from: https://healthnlp.hms.harvard.edu/center/pages/data-sets.html.
35. Bethard S, Savova G, Chen W-T, Derczynski L, Pustejovsky J, Verhagen M. SemEval-2016 Task 12: clinical TempEval. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016). San Diego, CA: Association for Computational Linguistics; 2016 [cited 2019 Jan 3]. p. 1052–62. Available from: http://www.aclweb.org/anthology/S16-1165.
36. Bethard S, Savova G, Palmer M, Pustejovsky J. SemEval-2017 Task 12: Clinical TempEval. In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). Vancouver, Canada: Association for Computational Linguistics; 2017 [cited 2019 Jan 2]. p. 565–72. Available from: http://www.aclweb.org/anthology/S17-2093.
37. Tourille J, Ferret O, Neveol A, Tannier X. Neural Architecture for Temporal Relation Extraction: A Bi-LSTM Approach for Detecting Narrative Containers. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (volume 2: short papers). Vancouver, Canada: Association for Computational Linguistics; 2017 [cited 2019 Jan 3]. p. 224–30. Available from: http://aclweb.org/anthology/P17-2035.
38. Lin C, Miller T, Dligach D, Bethard S, Savova G. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks. In: BioNLP 2017. Vancouver, Canada: Association for Computational Linguistics; 2017 [cited 2019 Jan 3]. p. 322–7. Available from: http://www.aclweb.org/anthology/W17-2341.
39. Dligach D, Miller T, Lin C, Bethard S, Savova G. Neural Temporal Relation Extraction. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers). Valencia, Spain: Association for Computational Linguistics; 2017 [cited 2019 Jan 3]. p. 746–51. Available from: http://www.aclweb.org/anthology/E17-2118.
40. Lin C, Miller T, Dligach D, Amiri H, Bethard S, Savova G. Self-training improves recurrent Neural Networks performance for Temporal Relation Extraction. In: Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis. Brussels, Belgium: Association for Computational Linguistics; 2018 [cited 2019 Jan 3]. p. 165–76. Available from: http://www.aclweb.org/anthology/W18-5619.
41. Lin C, Miller T, Dligach D, Bethard S, Savova G. A BERT-based Universal Model for Both Within- and Cross-sentence Clinical Temporal Relation Extraction. In: Clinical NLP Workshop. Minneapolis, MN; 2019.
42. Lin C, Dligach D, Miller TA, Bethard S, Savova GK. Multilayered temporal modeling for the clinical domain. J Am Med Inform Assoc 2016;23:387–95.
43. Strötgen J, Gertz M. Multilingual and cross-domain temporal tagging. Lang Resour Eval 2013;47:269–98.
44. Manning C, Surdeanu M, Bauer J, Finkel J, Bethard S, McClosky D. The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Baltimore, Maryland: Association for Computational Linguistics; 2014 [cited 2019 Jan 3]. p. 55–60. Available from: http://aclweb.org/anthology/P14-5010.
45. Liu K, Hogan WR, Crowley RS. Natural language processing methods and systems for biomedical ontology learning. J Biomed Inform 2011;44:163–79.
46. Qiu JX, Yoon HJ, Fearn PA, Tourassi GD. Deep Learning for automated Extraction of Primary Sites From Cancer Pathology Reports. IEEE J Biomed Health Inform 2018;22:244–51.
47. Gao S, Young MT, Qiu JX, Yoon H-J, Christian JB, Fearn PA, et al. Hierarchical attention networks for information extraction from cancer pathology reports. J Am Med Inform Assoc 2017 Nov 16 [Epub ahead of print]. doi: 10.1093/jamia/ocx131.
48. Alawad M, Yoon H, Tourassi GD. Coarse-to-fine multi-task training of convolutional neural networks for automated information extraction from cancer pathology reports. In: 2018 IEEE EMBS International Conference on Biomedical Health Informatics (BHI); 2018. p. 218–21.
49. HPC-Based Hyperparameter Search of MT-CNN for Information Extraction from Cancer Pathology Reports. [cited 2019 Feb 12]. Available from: https://sc18.supercomputing.org/proceedings/workshops/workshop_pages/ws_cafcw107.html.
50. Yala A, Barzilay R, Salama L, Griffin M, Sollender G, Bardia A, et al. Using machine learning to parse breast pathology reports. Breast Cancer Res Treat 2017;161:203–11.
51. Schapire RE. The boosting approach to machine learning: an overview. Nonlinear Estimation and Classification. Springer; 2003 [cited 2019 Feb 11]. Available from: https://www.cs.princeton.edu/courses/archive/spring07/cos424/papers/boosting-survey.pdf.
52. Acevedo F, Armengol VD, Deng Z, Tang R, Coopey SB, Braun D, et al. Pathologic findings in reduction mammoplasty specimens: a surrogate for the population prevalence of breast cancer and high-risk lesions. Breast Cancer Res Treat 2019;173:201–7.
53. Savova GK, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, et al. DeepPhe: a natural language processing system for extracting cancer phenotypes from clinical records. Cancer Res 2017;77:e115–8.
54. Public release of the DeepPhe analytic software. DeepPhe; 2019 [cited 2019 Feb 14]. Available from: https://github.com/DeepPhe/DeepPhe-Release.
55. Castro SM, Tseytlin E, Medvedeva O, Mitchell K, Visweswaran S, Bekhuis T, et al. Automated annotation and classification of BI-RADS assessment from radiology reports. J Biomed Inform 2017;69:177–87.
56. Chandran UR, Medvedeva OP, Barmada MM, Blood PD, Chakka A, Luthra S, et al. TCGA expedition: a data acquisition and management system for TCGA Data. PLoS ONE 2016;11. [cited 2019 May 29]. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5082933/.
57. Bustos A, Pertusa A. Learning eligibility in cancer clinical trials using deep neural networks. Appl Sci 2018;8:1206.
58. Joulin A, Grave E, Bojanowski P, Mikolov T. Bag of tricks for efficient text classification. ArXiv160701759 Cs. 2016 Jul 6 [cited 2019 Feb 15]. Available from: http://arxiv.org/abs/1607.01759.
59. Shivade C, Hebert C, Regan K, Fosler-Lussier E, Lai AM. Automatic data source identification for clinical trial eligibility criteria resolution. AMIA Annu Symp Proc 2017;2016:1149–58.
60. Zhang K, Demner-Fushman D. Automated classification of eligibility criteria in clinical trials to facilitate patient-trial matching for specific patient populations. J Am Med Inform Assoc 2017;24:781–7.
61. Osborne JD, Wyatt M, Westfall AO, Willig J, Bethard S, Gordon G. Efficient identification of nationally mandated reportable cancer cases using natural language processing and machine learning. J Am Med Inform Assoc 2016;23:1077–84.
62. Schapire RE, Singer Y. BoosTexter: a boosting-based system for text categorization. Mach Learn 2000;39:135–68.
63. Bergquist SL, Brooks GA, Keating NL, Landrum MB, Rose S. Classifying lung cancer severity with ensemble machine learning in health care claims data. Proc Mach Learn Res 2017;68:25–38.
64. Gensheimer MF, Henry AS, Wood DJ, Hastie TJ, Aggarwal S, Dudley SA, et al. Automated survival prediction in metastatic cancer patients using high-dimensional electronic medical record data. J Natl Cancer Inst 2018 Oct 21 [Epub ahead of print].
65. Yang Y, Fasching PA, Tresp V. Modeling Progression Free Survival in Breast Cancer with Tensorized Recurrent Neural Networks and Accelerated Failure Time Models. Proceedings of Machine Learning for Healthcare 2017. [cited 2019 Feb 11]. Available from: http://mucmd.org/CameraReadySubmissions/37%5CCameraReadySubmission%5CPFS_TTRNN_AFT_CameraReady.pdf.
66. Yin Z, Harrell M, Warner JL, Chen Q, Fabbri D, Malin BA. The therapy is making me sick: how online portal communications between breast cancer patients and physicians indicate medication discontinuation. J Am Med Inform Assoc 2018;25:1444–51.
67. Lin C, Miller T, Dligach D, Bethard S, Savova G. Improving temporal relation extraction with training instance augmentation. In: Proceedings of the 15th Workshop on Biomedical Natural Language Processing. Berlin, Germany: Association for Computational Linguistics; 2016. p. 108–13.
68. Galvan D, Okazaki N, Matsuda K, Inui K. Investigating the challenges of temporal relation extraction from clinical text. In: Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis. Brussels, Belgium: Association for Computational Linguistics; 2018. p. 55–64.
69. Leeuwenberg A, Moens MF. Word-Level loss extensions for neural temporal relation classification. In: Proceedings of the 27th International Conference on Computational Linguistics. Santa Fe, NM: Association for Computational Linguistics; 2018. p. 3436–47.
70. ICD-9 radiology corpus (available through hNLP Center membership). [cited 2019 Feb 11]. Available from: https://healthnlp.hms.harvard.edu/center/pages/data-sets.html.
71. Karimi S, Dai X, Hassanzadeh H, Nguyen A. Automatic diagnosis coding of radiology reports: a comparison of deep learning and conventional classification methods. BioNLP 2017 2017;328–32.
72. Zamaraeva O, Howell K, Rhine A. Improving feature extraction for pathology reports with precise negation scope detection. In: Proceedings of the 27th International Conference on Computational Linguistics; 2018. p. 3564–75. Available from: https://www.aclweb.org/anthology/C18-1302/.
73. Jagannatha A. Structured prediction models for RNN based sequence labeling in clinical text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; 2016. p. 856–65. Available from: https://www.aclweb.org/anthology/D16-1082/.
74. Jagannatha AN, Yu H. Bidirectional RNN for Medical Event Detection in Electronic Health Records. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, California: Association for Computational Linguistics; 2016. p. 473–82. [cited 2019 Jan 18]. Available from: http://aclweb.org/anthology/N16-1056.
75. Shivade C, de Marneffe M-C, Fosler-Lussier E, Lai AM. Identification, characterization, and grounding of gradable terms in clinical text. In: Proceedings of the 15th Workshop on Biomedical Natural Language Processing. Berlin, Germany: Association for Computational Linguistics; 2016. p. 17–26.
76. Roberts K, Si Y, Gandhi A, Bernstam E. A framenet for cancer information in clinical narratives: schema and annotation. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018). Miyazaki, Japan: European Language Resource Association; 2018 [cited 2019 Jan 3]. Available from: http://aclweb.org/anthology/L18-1041.
77. Lee WS, Ahn SM, Chung JW, Kim KO, Kwon KA, Kim Y, et al. Assessing concordance with Watson for oncology, a cognitive computing decision support system for colon cancer treatment in Korea. JCO Clin Cancer Inform 2018;2:1–8.
78. Kim EJ, Woo HS, Cho JH, Sym SJ, Baek JH, Lee WS, et al. Early experience with Watson for oncology in Korean patients with colorectal cancer. PLoS ONE 2019;14:e0213640.
79. Choi YI, Chung JW, Kim KO, Kwon KA, Kim YJ, Park DK, et al. Concordance rate between clinicians and Watson for oncology among patients with advanced gastric cancer: early, real-world experience in Korea. Can J Gastroenterol Hepatol 2019;2019:8072928.
80. U.S. Food and Drug Administration. Artificial intelligence and machine learning in software as a medical device. 2019 Apr 2 [cited 2019 Jun 6]. Available from: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device.
81. U.S. Food and Drug Administration. Proposed regulatory framework for modifications to artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD). [cited 2019 Jun 6]. Available from: https://www.fda.gov/media/122535/download.
82. Schuler A, Callahan A, Jung K, Shah NH. Performing an informatics consult: methods and challenges. J Am Coll Radiol JACR 2018;15:563–8.
83. Hirsch JS, Tanenbaum JS, Lipsky Gorman S, Liu C, Schmitz E, Hashorva D, et al. HARVEST, a longitudinal patient record summarizer. J Am Med Inform Assoc 2015;22:263–74.
84. Kumah-Crystal YA, Pirtle CJ, Whyte HM, Goode ES, Anders SH, Lehmann CU. Electronic health record interactions through voice: a review. Appl Clin Inform 2018;9:541–52.
85. Gregg JR, Lang M, Wang LL, Resnick MJ, Jain SK, Warner JL, et al. Automating the determination of prostate cancer risk strata from electronic medical records. JCO Clin Cancer Inform 2017;1. doi: 10.1200/CCI.16.00045.
86. National Cancer Institute. Hope and challenge: the NCI annual plan and budget proposal for fiscal year 2020. 2018 [cited 2019 Feb 11]. Available from: https://www.cancer.gov/news-events/cancer-currents-blog/2018/sharpless-nci-annual-plan-2020.
87. Giordano SH, Kuo YF, Duan Z, Hortobagyi GN, Freeman J, Goodwin JS. Limits of observational data in determining outcomes from cancer therapy. Cancer 2008;112:2456–66.
88. Noone AM, Lund JL, Mariotto A, Cronin K, McNeel T, Deapen D, et al. Comparison of SEER treatment data with Medicare claims. Med Care 2016;54:e55–64.
89. Baldwin LM, Adamache W, Klabunde CN, Kenward K, Dahlman C, Warren JL. Linking physician characteristics and Medicare claims data: issues in data availability, quality, and measurement. Med Care 2002;40(8 Suppl):IV-82–95.
90. Lerro CC, Robbins AS, Phillips JL, Stewart AK. Comparison of cases captured in the National Cancer Data Base with those in population-based central cancer registries. Ann Surg Oncol 2013;20:1759–65.
91. Hernandez-Boussard T, Tamang S, Blayney D, Brooks J, Shah N. New paradigms for patient-centered outcomes research in electronic medical records: an example of detecting urinary incontinence following prostatectomy. EGEMS (Wash DC) 2016;4:1231.
92. STAT. IBM's Watson recommended “unsafe and incorrect” cancer treatments. 2018 [cited 2019 Jun 13]. Available from: https://www.statnews.com/2018/07/25/ibm-watson-recommended-unsafe-incorrect-treatments/.
93. U.S. Food and Drug Administration. Developing a software precertification program: a working model. [cited 2019 Feb 11]. Available from: https://www.fda.gov/downloads/MedicalDevices/DigitalHealth/DigitalHealthPreCertProgram/UCM605685.pdf.