We present NDEx 2.0, the latest release of the Network Data Exchange (NDEx) online data commons (www.ndexbio.org) and the ways in which it can be used to (i) improve the quality and abundance of biological networks relevant to the cancer research community; (ii) provide a medium for collaboration involving networks; and (iii) facilitate the review and dissemination of networks. We describe innovations addressing the challenges of an online data commons: scalability, data integration, data standardization, control of content and format by authors, and decentralized mechanisms for review. The practical use of NDEx is presented in the context of a novel strategy to foster network-oriented communities of interest in cancer research by adapting methods from academic publishing and social media. Cancer Res; 77(21); e58–61. ©2017 AACR.
NDEx 2.0 is the latest release of the Network Data Exchange (NDEx; refs. 1, 2) online resource (www.ndexbio.org), a framework in which users can store, share, access, and disseminate networks. In this article, we discuss workflows and methods used by NDEx in the context of its role as an online data commons and its mission to foster the emergence of network-centered communities of scientists involved in all aspects of disease biology, from basic research to personalized medicine.
Highlights of innovations in NDEx 2.0 include:
A novel, modular network data exchange standard, CX, developed in collaboration with the Cytoscape (3) project.
A total redesign of the NDEx server for scalability and input/output speed, supporting large communities, large networks, and high access rates.
A scoring system for network annotation combined with search engine prioritization of high-scoring networks, rewarding authors for compliance with best annotation practices.
"Network Sets," a facility for managing and publishing collections of networks.
In NDEx 2.0, every page in the website includes a search interface featuring a menu of search examples (see Supplementary Video S1). The interface conforms to standard query practices, enabling networks to be found by specific document attributes, such as labels, title, author, or description and by biological attributes, such as tissue or organism. Networks can also be found on the basis of the identifiers and names associated with network nodes (e.g., "TP53," "P04637," "ENSG00000141510," or "GO:0006915").
Because NDEx is an open scientific commons, finding networks presents a novel challenge: search would be more effective if networks were required to be copiously annotated using standard vocabularies, but NDEx must also promote the contribution of content by minimizing the burden on authors and respecting their decisions when designing networks. In response to the diversity of identifier systems and common aliases used in both networks and queries, we have improved search recall using domain-specific "term expansion," where NDEx preprocesses queries to add aliases for genes, proteins, and other entities. To encourage but not impose the addition of annotations and the use of standard vocabularies, we have implemented an NDEx server algorithm that evaluates each network for compliance with annotation best practices and provides a score as immediate feedback to the author. A higher annotation score rewards the author's network with a better ranking in search results, thus promoting "search engine optimization" for networks.
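The two mechanisms described above, query term expansion and annotation-weighted ranking, can be illustrated with a minimal sketch. It is not the NDEx server implementation: the alias table, the checklist of recommended fields, and the formula that combines relevance with the annotation score are all hypothetical and chosen only to make the idea concrete.

```python
from __future__ import annotations

# Hypothetical alias table; a production service would draw on curated
# gene and protein identifier resources rather than a hard-coded dict.
ALIASES = {
    "TP53": ["P53", "P04637", "ENSG00000141510"],
}

# Hypothetical checklist of annotation fields. The criteria that NDEx
# actually scores are not listed in the text.
RECOMMENDED_FIELDS = ("name", "description", "version", "organism", "tissue", "reference")


def expand_query(query: str, aliases: dict[str, list[str]] = ALIASES) -> list[str]:
    """Return the query terms followed by their known aliases, without duplicates."""
    expanded: list[str] = []
    for term in query.split():
        for candidate in [term, *aliases.get(term.upper(), [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def annotation_score(properties: dict[str, str]) -> float:
    """Fraction of recommended fields that are present and non-empty."""
    present = sum(1 for name in RECOMMENDED_FIELDS if properties.get(name, "").strip())
    return present / len(RECOMMENDED_FIELDS)


def rank(hits: list[tuple[float, dict[str, str]]]) -> list[tuple[float, dict[str, str]]]:
    """Order (relevance, properties) hits so that better-annotated networks rise.

    The boost factor (1 + score) is an assumption made for illustration.
    """
    return sorted(hits, key=lambda hit: hit[0] * (1 + annotation_score(hit[1])), reverse=True)
```

In this sketch, two networks that match a query equally well are ordered by how completely they are annotated, which is the incentive the scoring system is meant to create.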
Every network in NDEx is assigned a stable, universally unique identifier (UUID) that can be used to access the network data or to link to a corresponding "network page" at the NDEx website (Fig. 1A). Authors are encouraged to create a separate network for each version, annotated with its version identifier, supporting reproducible access to specific data. Alternatively, a UUID may identify a dynamically updated public network, where the network essentially serves as an online database resource maintained by its authors, as is the case for the “RAS Machine” (4), described later in this article.
Scripts and other applications can find, download, create, or update NDEx networks using their UUIDs and a REST application programming interface (API; ref. 5) for programmatic web access. For example, when the public NDEx web application presents networks either as a graphic visualization or as tables of nodes and edges, it uses this API to retrieve the network data. The API can also be used to retrieve a small sample of a network, enabling the interface to handle NDEx networks too large for practical display.
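As a rough illustration of UUID-based access, the sketch below builds request URLs for a network and fetches JSON over HTTP using only the Python standard library. The base URL and the `/network/{uuid}` route pattern, including the `summary` and `sample` views, are assumptions made for illustration and should be checked against the NDEx API documentation (ref. 5). The UUID in the usage note is a placeholder, not a real network.

```python
import json
import uuid
from urllib.request import urlopen

# Assumed base URL for a version 2 REST API; verify against the NDEx documentation.
BASE_URL = "https://www.ndexbio.org/v2"


def network_url(network_id: str, view: str = "") -> str:
    """Build a URL for a network, optionally for a sub-view such as 'summary' or 'sample'.

    uuid.UUID raises ValueError for a malformed identifier, so bad input
    is rejected before any request is made.
    """
    canonical = str(uuid.UUID(network_id))
    url = f"{BASE_URL}/network/{canonical}"
    return f"{url}/{view}" if view else url


def fetch_json(url: str, timeout: float = 30.0):
    """Retrieve and decode a JSON document from the given URL."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A script might call `fetch_json(network_url("123e4567-e89b-12d3-a456-426614174000", "sample"))` to preview a large network before deciding whether to download it in full.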
All networks are initially private, accessible only to the individual scientist or organization that owns them. For prepublication collaboration, network owners can grant access to other NDEx users or groups of users via a permission management interface that can be reached from each network page. In NDEx 2.0, users can also generate a special "shareable URL" to grant access to a private network, whether or not the recipient is a registered NDEx user. A primary use of shareable URLs in NDEx is to streamline the submission and peer review of networks supporting publications by providing editors and reviewers with instant, interactive, and anonymous access to the networks. Finally, users and organizations can make their networks publicly accessible for broad dissemination, can designate them as stable "read-only" resources, and can select them for display on their NDEx account homepage using the "showcase" facility.
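The access rules in this paragraph can be summarized as a small model. The sketch below is a hypothetical illustration, not the NDEx permission system: the class, its field names, and the use of a random key embedded in a shareable URL are assumptions.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NetworkAccess:
    """Hypothetical access state for one network."""

    owner: str
    public: bool = False          # networks start out private
    read_only: bool = False       # a stable resource that is not modified
    grants: set[str] = field(default_factory=set)  # users given explicit access
    share_key: Optional[str] = None                # secret carried by a shareable URL

    def enable_share_url(self) -> str:
        """Create an unguessable key; anyone holding it can read the network."""
        self.share_key = secrets.token_urlsafe(16)
        return self.share_key

    def can_read(self, user: Optional[str] = None, key: Optional[str] = None) -> bool:
        if self.public:
            return True
        if user is not None and (user == self.owner or user in self.grants):
            return True
        # Anonymous access through the shareable key, compared in constant time.
        return (
            key is not None
            and self.share_key is not None
            and secrets.compare_digest(key, self.share_key)
        )

    def can_write(self, user: str) -> bool:
        return user == self.owner and not self.read_only
```

The key-based branch is what lets a reviewer open a private network without an account, while the owner keeps the ability to make it public later.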
When searching NDEx, users want to find networks that they deem credible, backed by evidence and informed analysis. This need is addressed by features that facilitate the review and recommendation of networks by researchers, following the successful models of academic peer review and of user reviews in Internet media. This strategy promotes the growth of NDEx by providing both motivation for authors and benefit to users. The introduction of "Network Sets" in NDEx 2.0 enables users to create and manage named collections of networks with stable UUIDs, making these collections destinations in their own right. Individuals and organizations can publish documented collections of networks, collections that can even include public networks owned by other users. An individual might choose to create a specialized collection such as "Cell adhesion pathways in hepatocytes," sharing their expert selections with the community. In the same way, organizations such as academic publishers or leading laboratories can publish sets of networks that they select in formal review processes, conferring recognition to the authors and helping researchers locate useful, well-founded networks with confidence.
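A Network Set, as described here, is essentially a named, documented collection with its own stable identifier that refers to networks by UUID. The sketch below shows one way to model that idea. The class and field names are hypothetical, and the membership rule (own networks, or public networks owned by others) is our reading of the paragraph above rather than a statement of how the server enforces it.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NetworkRecord:
    """Hypothetical summary of a network: its UUID, owner, and visibility."""

    network_id: uuid.UUID
    owner: str
    public: bool


@dataclass
class NetworkSet:
    """A named collection that references networks by UUID."""

    name: str
    owner: str
    description: str = ""
    set_id: uuid.UUID = field(default_factory=uuid.uuid4)  # stable ID of the set itself
    members: list[uuid.UUID] = field(default_factory=list)

    def add(self, network: NetworkRecord) -> None:
        """Add a network the curator owns, or a public network owned by someone else."""
        if network.owner != self.owner and not network.public:
            raise PermissionError("only public networks of other users can be added")
        if network.network_id not in self.members:
            self.members.append(network.network_id)
```

Because the set stores references rather than copies, a curated collection stays small and always points at the authors' own networks.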
An NDEx Workflow for Collaboration and Publication
The following example (Fig. 1B) describes a workflow that starts when a team of scientists uploads a network to NDEx directly from an analysis script. The network is initially private, invisible, and inaccessible to other users. The authors share it with a group of collaborators who use NDEx as a data hub as they analyze and improve the network. They import it into Cytoscape via the CyNDEx App (see Supplementary Video S1; ref. 6), which uses the same REST API as the initial analysis script. They then add a layout, choose a graphic style, and use a heat diffusion algorithm to highlight a subnetwork of interest. Communication between NDEx and Cytoscape is mediated by the CX exchange format, in which each "aspect" of the network is expressed in a distinct module. In this case, CX facilitates subsequent reuse of the network because the formatting of the network is separated from the scientific content. The improved network is uploaded to NDEx again and included in a Network Set supporting an upcoming publication. When the manuscript is submitted for publication, editors and reviewers anonymously access the Network Set via a shareable URL. On publication, the authors make their networks publicly accessible, allowing online journal readers to immediately view the networks, copy them to their private accounts, or use them in applications. Databases and websites can also reference the Network Set by its URL, providing alternative dissemination channels. The Cancer Cell Map Initiative (CCMI; refs. 7, 8), a multi-institution NCI center for systems biology, has adopted NDEx for these purposes, facilitating collaboration by researchers, data sharing between applications, and access to CCMI networks by other scientists.
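The benefit of a modular, aspect-based format can be shown with a small sketch. The document below is a simplified, CX-like list of fragments, each holding elements of one named aspect. The aspect names and element keys follow our understanding of CX but should be treated as assumptions and checked against the CX specification; the edge and coordinates are invented for illustration.

```python
from collections import defaultdict

# Aspects assumed to carry presentation only (layout and visual style).
PRESENTATION_ASPECTS = {"cartesianLayout", "cyVisualProperties"}

# A toy CX-like document: a list of fragments, one aspect per fragment.
example = [
    {"nodes": [{"@id": 1, "n": "KRAS"}, {"@id": 2, "n": "BRAF"}]},
    {"edges": [{"@id": 3, "s": 1, "t": 2, "i": "activates"}]},
    {"cartesianLayout": [{"node": 1, "x": 0.0, "y": 0.0}, {"node": 2, "x": 100.0, "y": 0.0}]},
]


def group_by_aspect(fragments):
    """Collect all elements of each aspect, even if split across fragments."""
    aspects = defaultdict(list)
    for fragment in fragments:
        for name, elements in fragment.items():
            aspects[name].extend(elements)
    return dict(aspects)


def strip_presentation(fragments):
    """Return the scientific content only, dropping layout and style aspects."""
    kept = []
    for fragment in fragments:
        content = {n: e for n, e in fragment.items() if n not in PRESENTATION_ASPECTS}
        if content:
            kept.append(content)
    return kept
```

Because layout lives in its own aspect, a downstream tool can call `strip_presentation(example)` and apply its own styling without touching the nodes and edges.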
The team of scientists also maintains a large, dynamically updated network of inferred molecular relationships, providing an up-to-date data resource without the need for a specialized web portal. The RAS Machine, a software agent developed under the DARPA Big Mechanism program (9), reads literature daily, adds new knowledge to its model of RAS signaling, and publishes the changes as an update to its public NDEx network.
Another collaborating institution then uses the dynamically updated network as input to its web-based genomic analysis tool, creating a channel from the team's output to the users of the application. The Cancer-Related Analysis of Variants Toolkit (10) is a web application that uses NDEx to access pathway networks for enrichment analysis of lists of mutated genes.
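To make "enrichment analysis" concrete, the sketch below computes a one-sided hypergeometric p-value for the overlap between a list of mutated genes and the genes named in a pathway network. This is one common enrichment statistic, shown under the assumption that the pathway's node names are gene symbols; the text does not state which statistic the toolkit uses, and the function name and arguments are hypothetical.

```python
from math import comb


def enrichment_p_value(mutated, pathway, background):
    """Probability of seeing at least the observed overlap by chance.

    mutated:    gene symbols carrying mutations
    pathway:    gene symbols taken from the nodes of a pathway network
    background: all gene symbols that could have been observed
    """
    background = set(background)
    mutated = set(mutated) & background
    pathway = set(pathway) & background
    total, in_pathway, drawn = len(background), len(pathway), len(mutated)
    overlap = len(mutated & pathway)
    # Count the ways to draw `drawn` genes with at least `overlap` pathway members.
    favorable = sum(
        comb(in_pathway, i) * comb(total - in_pathway, drawn - i)
        for i in range(overlap, min(in_pathway, drawn) + 1)
    )
    return favorable / comb(total, drawn)
```

A small p-value suggests that the mutated genes are concentrated in the pathway more than random sampling from the background would explain.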
Conclusions and Future Work
NDEx 2.0 is a significant step in the evolution of the NDEx data commons, supporting its mission to foster network-centered communities of scientists. Recent NDEx innovations have addressed challenges of scalability, data integration, and support for annotation and review of networks. In future work, we will explore community building via Internet media techniques, such as user reviews of networks, recommendation algorithms, and in-system messaging. We will facilitate the association of digital object identifiers (11) with published networks, integrate network-authoring tools, and recruit disease and mechanism experts as authors and reviewers. We conclude by inviting readers to be pioneering users of NDEx as authors, reviewers, publishers, creators of communities of interest, and developers of NDEx-enabled workflows, applications, and analysis pipelines.
Disclosure of Potential Conflicts of Interest
T. Ideker has ownership interest (including patents) in Data4Cure, Inc. and Ideaya Biosciences, Inc. and is a consultant/advisory board member for Ideaya Biosciences, Inc. and NIH NHGRI Council. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: D. Pratt, B. Demchak, T. Ideker
Development of methodology: D. Pratt, J. Chen, R. Pillich, A. Gary, B. Demchak
Writing, review, and/or revision of the manuscript: D. Pratt, R. Pillich, T. Ideker
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): V. Rynkov, B. Demchak
Acknowledgments
We thank Dan Carlin, other members of the Ideker laboratory, and the Sorger Laboratory at Harvard Medical School for their early adoption of the NDEx platform and their valuable feedback.
This work was supported by NIH ITCR U24 CA1884427 and DARPA W911NF-14-1-0397.