The World Cancer Research Fund (WCRF) International and the University of Bristol have developed a novel framework for providing an overview of mechanistic pathways and conducting a systematic literature review of the biologically plausible mechanisms underlying exposure–cancer associations. Two teams independently applied the two-stage framework to mechanisms underpinning the association between body fatness and breast cancer to test the framework's feasibility and reproducibility as part of a WCRF-commissioned validation study. In stage I, a “hypothesis-free” approach was used to provide an overview of potential intermediate mechanisms between body fatness and breast cancer. Dissimilar rankings of potential mechanisms were observed between the two teams due to different applications of the framework. In stage II, a systematic review was conducted on the insulin-like growth factor 1 receptor (IGF1R) chosen as an intermediate mechanism. Although the studies included differed, both teams found inconclusive evidence for the body fatness–IGF1R association and modest evidence linking IGF1R to breast cancer, and therefore concluded that there is currently weak evidence for IGF1R as a mechanism linking body fatness to breast cancer. The framework is a good starting point for conducting systematic reviews by integrating evidence from mechanistic studies on exposure–cancer associations. On the basis of our experience, we provide recommendations for future users. Cancer Epidemiol Biomarkers Prev; 26(11); 1583–94. ©2017 AACR.

Recently, there has been increased interest in studying the biological mechanisms underpinning well-known epidemiologic associations between exposures and diseases (1). Increased understanding of mechanistic pathways may impact disease prevention, early detection, and treatment. However, it remains a challenge to summarize the heterogeneous body of mechanistic evidence for a specific exposure–disease association. In particular, integrating results from human (i.e., epidemiologic), animal, and cell studies is difficult.

Systematic reviews aim at delivering an exhaustive summary of relevant literature for research questions and can thereby provide an overall assessment of the total body of evidence (2). There are well-established methodologies for reviews of epidemiologic data (3, 4). However, no guidelines for performing a systematic review of mechanistic evidence underlying well-known epidemiologic associations have been proposed previously.

The World Cancer Research Fund (WCRF) International/University of Bristol recently developed a methodology for conducting systematic reviews of biological mechanisms underpinning exposure–cancer associations, published in an article in the same issue of this journal (5). The two-stage methodology described by Lewis and colleagues offers an organized framework for providing an overview of potential mechanisms underlying an exposure–outcome association (stage I), and performing a systematic review of the evidence underlying one or more specific mechanisms including the integration of mechanistic evidence derived from human, animal, and cell studies (stage II). At the heart of their approach is the notion of “intermediate phenotypes” (IP), which are candidate “mechanisms” that may be linked to both exposure and outcome of interest.

Independently from the scientists who developed the framework, and independently from each other, two study teams set out to evaluate the feasibility and reproducibility of this novel framework by applying it to investigate biological mechanisms underlying the association of higher body fatness with an increased risk of postmenopausal breast cancer. In 2012, 10% of postmenopausal breast cancers, adding up to a total of 113,676 cases, were estimated to be attributable to high body mass index (BMI; ref. 6). The biological mechanisms underlying this association are not yet fully explained (7). Two multidisciplinary research teams, one from the German Cancer Research Center (DKFZ; Team A) and one from Maastricht University in the Netherlands (Team B), conducted both stages of the framework independently and without direct contact to investigate feasibility and reproducibility of the methodology. In stage I, the framework was applied to generate an overview of biological mechanisms (referred to according to the framework as IPs) underpinning the body fatness–breast cancer association. During stage II, a systematic literature review was conducted of the mechanistic literature on one specific IP, namely the insulin-like growth factor 1 receptor (IGF1R) as part of the proposed insulin-IGF hypothesis (7), in linking body fatness to breast cancer.

This manuscript describes and compares the implementation of the methodology and the results obtained by the two teams. Furthermore, we provide recommendations for future users of the framework to obtain more reproducible results and gain efficiency based on our experience.

The two teams conducted both stages of the framework independently following the guidelines published by Lewis and colleagues (5), and incorporated alternative approaches to improve the feasibility of the framework. The similarities and differences in the approaches of the two teams, which arose from different interpretations of the framework instructions and from suggested improvements, are described below. During both stages, the focus was on breast cancer overall and not postmenopausal breast cancer specifically, as preliminary searches indicated that the term “postmenopausal” is not commonly used in mechanistic literature, in particular in animal and cell studies. Both teams deemed that restricting the search to only postmenopausal breast cancers would bias the results towards human studies.

Stage I

Similar approaches.

The aim of stage I was to summarize the literature on potential IPs linking body fatness (exposure) to breast cancer (outcome). Included search results were the overlap between exposure, IPs, and outcome (A+B+C+D; see Fig. 1A). Accordingly, search terms were combined for exposure and outcome (A+C), exposure and IPs (A+D), and IPs and outcome (A+B). Query strings for both teams are presented in Supplementary Material S1. The lists of search terms for the exposure, IPs, and outcome were compiled on the basis of published high quality reviews found by team A (7–11) and team B (7, 9, 12–17), and preliminary database searches providing additional relevant terms. According to the framework, Hanahan and Weinberg's article on the “Hallmarks of cancer” (18) was additionally used to identify search terms for IPs, enabling a broad and inclusive approach. Pathways that were considered included sex steroid hormones, growth factors, cytokines, inflammatory and metabolic markers, as well as potential mechanistic mediators impacting gene expression and DNA, and other terms related to cancer development. Searches were conducted in PubMed/MEDLINE and were restricted to MeSH terms, as MeSH terms are required for the Text Mining for Mechanism Prioritisation (TeMMPo; ref. 19) tool developed as part of the framework (5). TeMMPo was used to summarize exposure–IP and IP–outcome relationships within Sankey plots (Fig. 2A), and scores were assigned by TeMMPo to IPs indicating their importance as potential intermediate mechanisms.

Figure 1.

Venn diagram illustrating literature searches: A, Venn diagram illustrating literature searches with IPs, exposure (E), and outcome (O) and their intersections (A, B, C and D). B, Team A: PubMed search in stage I. C, Team B: PubMed search in stage I.

Figure 2.

Visualization of the mechanisms found between body fatness and breast cancer in PubMed – Framework stage I: A, Team A - TeMMPo - Sankey plot for sex steroids, prolactin, and related factors and the related scores. B, Team B: Bubble chart showing visualization of top 20 mechanisms from search in PubMed based on adjusted score. For the first 10 mechanisms according to the adjusted scores, the numbers are indicated in the bubbles for clarity.


Different approaches.

Team B adjusted the scores obtained from TeMMPo for the total number of records assigned the MeSH term of that IP within PubMed. This was done to account for certain MeSH headings being studied more often in general without necessarily representing a more relevant IP. Team B therefore calculated an adjusted score by dividing the original score derived from TeMMPo by the total number of PubMed records assigned that MeSH term. In addition, Team B proposed a novel way of visualizing results using Bubble charts showing the adjusted score, number of records relating IPs to the exposure, number of records relating IPs to the outcome, and balance in these numbers (Fig. 2B).
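The adjustment described above is a simple ratio, and a minimal sketch may help make it concrete. The class and function names below are hypothetical, the numbers are made up for illustration, and nothing here reproduces TeMMPo itself or Team B's actual scripts.

```python
from dataclasses import dataclass


@dataclass
class IntermediatePhenotype:
    mesh_term: str
    temmpo_score: float   # raw score as reported by TeMMPo
    pubmed_records: int   # all PubMed records indexed with this MeSH term


def adjusted_score(ip: IntermediatePhenotype) -> float:
    """Divide the raw TeMMPo score by the overall PubMed record count."""
    if ip.pubmed_records <= 0:
        raise ValueError(f"no PubMed records for {ip.mesh_term!r}")
    return ip.temmpo_score / ip.pubmed_records


def rank_by_adjusted_score(ips):
    """Return IPs sorted from highest to lowest adjusted score."""
    return sorted(ips, key=adjusted_score, reverse=True)


# Illustrative, made-up values (not taken from the study)
example = [
    IntermediatePhenotype("Term A", temmpo_score=120.0, pubmed_records=60000),
    IntermediatePhenotype("Term B", temmpo_score=30.0, pubmed_records=5000),
]
ranked = rank_by_adjusted_score(example)
```

In this toy example the heavily indexed "Term A" has the higher raw score but drops below "Term B" after adjustment, which is the effect the adjustment is intended to have.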

Stage II

The aim of stage II was to conduct a systematic review of evidence linking body fatness to the IGF1R, and evidence linking the IGF1R to breast cancer. IGF1R expression and/or function were considered as the IP. This IP was selected given that preliminary searches indicated that this would result in a reasonable number of abstracts/articles to review within the allotted time (i.e., four months), and a potentially etiologically interesting IP within the insulin–IGF pathway (20).

Steps 1 & 2: specify the research question and searching for studies

Similar approaches.

Two subreviews were defined including systematic reviews of the mechanistic literature linking (i) body fatness to the IGF1R (exposure-IP) and (ii) the IGF1R with breast cancer (IP-outcome). Thus, two independent searches were conducted. The search terms and syntax of both teams are included in Supplementary Material S2.

Different approaches.

Team A searched PubMed (January 26, 2016), Web of Science (January 27, 2016), BIOSIS (January 27, 2016), and EMBASE (February 3, 2016). Standardized search terms were used as applied within the databases [PubMed: MeSH headings; Web of Science, BIOSIS, and EMBASE: Topic (i.e., TS=) or Disease (i.e., DS=)].

Team B conducted their search in MEDLINE (February 17, 2016) using both standardized MeSH headings and free text synonyms. Free text terms were included to retrieve the most recent literature, which had not been indexed with MeSH headings yet.

Step 3: inclusion and exclusion of studies

Similar approaches.

Both teams considered human, animal, and cell studies eligible (any language and any design). Two reviewers from each team reviewed all potentially eligible studies independently. If any differences occurred, the two reviewers discussed them to reach a consensus. Animal studies were classified into those mimicking human cancers or not, by evaluating whether the tumors were xenografts or actually developed in vivo, as recommended in the framework (for IGF1R–breast cancer sub-review only). Xenograft studies were assessed alongside cell studies (Step 9).

Different approaches.

Team A assigned individual inclusion and exclusion criteria to exposure, IP, and outcome:

  • Exposure (body fatness): articles investigating obesity, overweight, abdominal adiposity, waist–hip ratio, adipocytes, adipose tissue as the primary exposure, in either men or women, were included.

  • IP (IGF1R): articles investigating IGF1R expression and/or signaling measured by IHC staining for IGF1R or by RT-PCR for mRNA levels were included. Articles on hybrid receptors combining insulin receptor and IGF1R, as well as articles on the IGF1R gene, methylation, or SNPs, were excluded.

  • Outcome (breast cancer): articles investigating breast or mammary neoplasms, mammographic density, breast cancer cell growth, proliferation, differentiation, or apoptosis were included. Studies on breast cancer survival, prognosis, or recurrence were excluded, as well as studies investigating women treated for breast cancer before IGF1R measurement and studies on male breast cancer.

Further reasons for exclusion were: reverse causation implying either an effect from IGF1R on adipocytes or from breast cancer cells on IGF1R levels, cross-talk between IGF1R and other receptors, such as EGFR or leptin receptors, and additional proteins in signaling pathways, and IGF1R blockade for drug development.

After excluding duplicate papers across databases, the first selection was based on title and abstract in the records obtained from PubMed, Web of Science, and EMBASE (BIOSIS records were not screened due to time constraints). Subsequently, full-text articles were retrieved and sorted by study type/evidence stream (i.e., human experimental, human observational, animal, and cell studies). From this point, given time considerations, Team A restricted the remaining review process on articles identified via PubMed. Full-text review by two reviewers was done by study type and final decision for inclusion was made after full-text consideration. Available review articles on cell line studies were used to evaluate the evidence for IGF1R and breast cancer for that evidence stream.

Team B defined slightly different inclusion and exclusion criteria, in particular, that the molecular level of the IP needed to be IGF1R expression, activation, regulation, or protein abundance. Furthermore, for the IGF1R and breast cancer sub-review, studies needed to be performed in women (including patient-derived cell lines), female animal models, or commercially available female cell lines.

After excluding reviews and duplicate papers, articles were screened in three steps: title, abstract, and full-text screening. Each step was done independently by two reviewers who assigned three scores: inclusion, exclusion, or unclear. For title and abstract screening, spreadsheets were generated which only included the title, or title and abstract, respectively. All other information was deleted to avoid any influence of other characteristics such as author and journal names on the screening process. Kappa statistics were calculated to assess the level of agreement in scores between reviewers.
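As an illustration of the agreement statistic, the sketch below computes an unweighted Cohen's kappa for two reviewers scoring the same records as inclusion, exclusion, or unclear. The text does not state which kappa variant Team B used, so the unweighted form is an assumption, and the function name and example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters over the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Made-up screening decisions for six titles
reviewer_1 = ["include", "exclude", "unclear", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "unclear", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

Here the reviewers agree on 4 of 6 titles (observed agreement 0.67) while chance agreement is 15/36 (0.42), giving a kappa of 3/7, or about 0.43.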

Steps 4 and 5: data extraction and risk of bias assessment

Similar approaches.

Both teams extracted data and assessed the quality of studies by study type. Appropriate tools for each study type were used to assess the risk of bias as recommended in the framework. Although the approaches of both teams differed slightly (described below), both applied to human observational studies the Critical Appraisal Skills Programme (CASP) Checklists of Case-Control and Cohort Studies (21), to animal studies the SYstematic Review Centre for Laboratory animal Experimentation (SYRCLE)'s risk of bias tool (22), and to cell studies the criteria provided in the framework.

Different approaches.

Team A developed a database to extract and assess included full-text articles. They combined Steps 4 and 5 with full-text review (Step 3) and developed a quantitative scoring system for the risk of bias assessment based on the 15 to 18 signaling questions provided by the framework (Supplementary Tables S1A and S1B) based on CASP and SYRCLE. Each signaling question was assigned with “yes” (1 point: low risk of bias) or “no” (0 points: high risk of bias, or not applicable). The average score for each study was computed by summing the points and dividing this by the number of signaling questions answered. On the basis of this score, a qualitative risk of bias was attributed: the lowest tertile with a high risk of bias, middle tertile with medium risk, and highest tertile with low risk. In addition, P values and the reported direction of association required for Step 6, and judgment of imprecision and indirectness required in Step 7 were assessed for each study. No randomized controlled trial (RCT) was included and therefore no tool was used for risk of bias assessment.
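The scoring scheme can be sketched as follows. The description does not specify whether the tertiles were taken from the observed score distribution or from the 0 to 1 scale; the sketch assumes fixed cut points at one third and two thirds of the scale, and all names are hypothetical.

```python
def average_score(answers):
    """Mean score over answered signaling questions.

    answers: list of booleans, True for "yes" (1 point, low risk of bias)
    and False for "no" or not applicable (0 points).
    """
    if not answers:
        raise ValueError("at least one signaling question must be answered")
    return sum(1 for a in answers if a) / len(answers)


def risk_of_bias(score):
    """Map an average score to a qualitative label.

    Assumption: tertiles are fixed thirds of the 0-1 scale, not
    tertiles of the empirical distribution across studies.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score < 1 / 3:
        return "high"
    if score < 2 / 3:
        return "medium"
    return "low"


# A hypothetical study answering "yes" to 12 of 15 signaling questions
score = average_score([True] * 12 + [False] * 3)
label = risk_of_bias(score)
```

If tertiles of the observed distribution were intended instead, the two cut points would be computed from the scores of all included studies rather than fixed in advance.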

In addition, Team A classified the evidence of human studies as “indirect” when the IP was IGF1R mRNA or IGF1R protein levels measured in any blood component, or “direct” when the IP was observed in tissue. In animal studies, the specific guidance provided in the framework was followed to judge indirectness. In case of studies using transgenic mice or when the IP was IGF1R mRNA levels the evidence was considered as indirect.

In Team B, data extraction and risk of bias assessment was performed by the reviewers based on expertise (i.e., one reviewer evaluated human studies, while the other reviewer evaluated animal/cell studies). For human experiments, the Cochrane risk of bias tool for RCTs (23) was applied. For observational studies, the criteria were based on the CASP (21), as well as tools developed by the NIH (Bethesda, MD; ref. 24). All tools included 8–10 criteria, which were assigned as having a high, low, or unclear risk of bias. An overall assessment of risk of bias was assigned (low vs. high), in which studies were assigned a high risk of bias if they had ≥2 criteria with a high risk, or ≥4 criteria with an unclear risk. Otherwise, the study was assigned with a low risk of bias. For studies on body fatness and IGF1R, Team B classified as “direct” studies using a direct measure of body fatness such as BMI, while dietary intervention studies aiming to reduce body fatness or studies with closely related parameters, such as insulin levels, were classified as “indirect”. For studies on IGF1R and breast cancer, studies that investigated the occurrence/presence of breast cancer (cells) were classified as “direct”, while studies that used proxy measures such as proliferation markers were classified as “indirect.”
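Team B's decision rule for the overall risk of bias translates directly into a short function. The sketch below follows the thresholds stated above (high if at least 2 criteria are rated high, or at least 4 are rated unclear); the function name and the example ratings are hypothetical.

```python
def overall_risk_of_bias(criteria):
    """Combine per-criterion judgments into an overall low/high rating.

    criteria: list of strings, each "high", "low", or "unclear".
    """
    allowed = {"high", "low", "unclear"}
    unknown = set(criteria) - allowed
    if unknown:
        raise ValueError(f"unexpected judgment(s): {sorted(unknown)}")
    n_high = sum(1 for c in criteria if c == "high")
    n_unclear = sum(1 for c in criteria if c == "unclear")
    # High overall risk: >=2 high-risk criteria or >=4 unclear criteria
    return "high" if n_high >= 2 or n_unclear >= 4 else "low"


# Made-up judgments for three studies assessed on eight criteria each
study_1 = ["low"] * 6 + ["high", "unclear"]
study_2 = ["low"] * 6 + ["high", "high"]
study_3 = ["low"] * 4 + ["unclear"] * 4
```

With these inputs, only the first study is rated as having a low overall risk of bias.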

Step 6: synthesis of data from individual studies

Similar approaches.

The direction of the association was determined for each study as follows:

  • Negative association: for example, higher body fatness significantly linked to lower IGF1R expression, or higher IGF1R expression to lower breast cancer risk;

  • No association/inconsistent evidence: no significant association found between body fatness and IGF1R, or IGF1R and breast cancer; or if results were inconsistent if several experiments were performed;

  • Positive association: for example, higher body fatness linked to higher IGF1R expression, or higher IGF1R expression to higher breast cancer risk.
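The three categories above can be expressed as a small classification routine. This is only a sketch: the significance threshold of 0.05 is an assumption (no threshold is stated in the text), the effect sign is taken as the direction, and the function names are hypothetical.

```python
NO_ASSOCIATION = "no association/inconsistent"


def experiment_direction(effect, p_value, alpha=0.05):
    """Classify one result by the sign of its effect and its P value."""
    if p_value >= alpha or effect == 0:
        return NO_ASSOCIATION
    return "positive" if effect > 0 else "negative"


def study_direction(experiments, alpha=0.05):
    """Classify a study from a list of (effect, p_value) pairs.

    A study is positive or negative only if every experiment agrees;
    any mix of directions or non-significant results is treated as
    no association/inconsistent evidence.
    """
    if not experiments:
        raise ValueError("a study needs at least one experiment")
    directions = {experiment_direction(e, p, alpha) for e, p in experiments}
    if len(directions) == 1:
        return directions.pop()
    return NO_ASSOCIATION
```

For example, `study_direction([(0.4, 0.01), (0.1, 0.6)])` is classified as no association/inconsistent, because one of the two experiments is not significant.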

Different approaches.

Team A generated “Albatross plots” based on the beta coefficients and P values extracted in previous steps from included studies to visualise the heterogeneity across studies (25).

Team B adopted a different strategy due to the large heterogeneity across studies included, both in terms of types of studies as well as other characteristics within study types (e.g., intervention type or body fatness, IGF1R, or breast cancer assessment method, and study designs). Therefore, Team B provided a qualitative overview of their results in Harvest plots. Within these plots, the direction of association, overall risk of bias, and whether the evidence was assigned as direct or indirect was shown for each study. This was performed for each study type and sub-review (i.e., body fatness-IGF1R, and IGF1R-breast cancer). Cell studies and animal xenograft studies were both included in the Harvest plot, but their evidence was assessed separately (Step 9).

Steps 7 and 8: assessing the strength of the overall body of evidence within evidence streams and across evidence streams

Similar approaches.

According to the framework, both teams applied the Grading of Recommendations Assessment, Development and Evaluation (GRADE) assessment (26) to evaluate the quality of the body of evidence per evidence stream (i.e., human and animal studies) within each sub-review (body fatness–IGF1R, and IGF1R–breast cancer). An initial starting rating was assigned according to the number of studies and the study designs included (e.g., cross-sectional, prospective, RCT). The scale was: 4 for high, 3 for moderate, 2 for low, and 1 for very low quality evidence. The initial starting rating was then up- or downgraded based on the overall risk of bias, indirectness, inconsistency, imprecision, and publication bias, as determined in previous steps. Thereafter, the levels of evidence from human and animal studies in each sub-review were combined to provide a summary rating for the strength of overall evidence. Four strength-of-evidence categories were possible: strong, modest, weak, or inconclusive.
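The rating arithmetic can be illustrated with a short sketch. It assumes that each reason for downgrading or upgrading moves the rating by exactly one level and that the result is bounded to the 1 to 4 scale; neither detail is stated explicitly above, and the function name is hypothetical.

```python
GRADE_LABELS = {4: "high", 3: "moderate", 2: "low", 1: "very low"}


def final_grade(start, downgrades=(), upgrades=()):
    """Shift a starting rating by one level per listed reason, bounded to 1..4.

    start: initial rating (4 = high, 3 = moderate, 2 = low, 1 = very low).
    downgrades / upgrades: sequences of reasons, e.g. "indirectness".
    """
    if start not in GRADE_LABELS:
        raise ValueError("start rating must be 1, 2, 3, or 4")
    raw = start - len(downgrades) + len(upgrades)
    bounded = max(1, min(4, raw))  # ratings cannot leave the 1-4 scale
    return bounded, GRADE_LABELS[bounded]


# A body of evidence starting at "low" with two reasons to downgrade
rating = final_grade(2, downgrades=["indirectness", "imprecision"])
```

In this example the two downgrades would take the rating below the scale, so it is held at 1 (very low quality).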

Step 9: synthesis of cell studies and xenograft animal studies

Similar approaches.

Both teams assessed the results and characteristics of included cell studies and animal xenograft studies to evaluate whether the results supported the biological plausibility of the IGF1R being involved as IP in the association of body fatness with breast cancer.

Different approaches.

Team A evaluated full-text cell line studies for body fatness and IGF1R, whereas available review articles were used to evaluate the strength of the evidence for IGF1R and breast cancer (27–29). This decision was based on the fact that cell line studies contributed minimally to the final conclusion, the high volume of cell studies (half of the full-text articles to be reviewed), and availability of excellent reviews.

Stage I

Results both teams.

Venn diagrams with the number of articles retrieved from PubMed for both teams are shown in Fig. 1B and C.

Results Team A.

Sankey plots were created on the basis of the following thematic pathways: “Sex steroids, prolactin, and related factors”; “Cytokines; insulin, glucose, and related factors”; “Other proteins and immune factors;” “Lipids and lipid signaling;” “Oxidative stress and antioxidants;” “Targets related to cells, chromosomes, and genes;” “Targets related to DNA;” and “Mammography and obesity-related comorbidities”. As an example, the Sankey plot for ‘Sex steroids, prolactin, and related factors’ is shown in Fig. 2A. The top 5 mechanisms by pathway, and top 10 mechanisms overall are shown in Supplementary Table S2.

Results Team B.

The Sankey plot with the results of Team B's search is shown in Supplementary Fig. S1. In addition, a Bubble chart was generated showing the top 20 mechanisms based on the adjusted score (Fig. 2B). Team B retrieved information on the types of articles from PubMed (human studies, animal studies, or reviews), and described their distribution (Supplementary Fig. S2). Furthermore, Team B generated a graph for each IP, showing the number of published abstracts per year linking the IP to body fatness and to breast cancer. This allows investigation of the “popularity” of the IP with respect to the exposure and outcome over time (example for IP = IGF1R in Supplementary Fig. S3).

Stage II

Steps 1 & 2: specify the research question and searching for studies

Results Team A.

A total number of 693 records [262 for body fatness–IGF1R (38%) and 431 for IGF1R–breast cancer (62%)] were retrieved from PubMed. In addition, 1,678, 1,549, and 1,290 records were retrieved from Web of Science, BIOSIS, and EMBASE, respectively. In BIOSIS, the majority of records were identified for body fatness to IGF1R (67%), whereas a higher percentage of references were found for IGF1R to breast cancer in the other databases (PubMed: 62%; Web of Science: 78%; EMBASE: 71%). The observed percentage duplicates between PubMed and EMBASE was 25%, PubMed and Web of Science 22%, and PubMed and BIOSIS 15%.

Results Team B.

A total number of 1,615 records [779 for body fatness–IGF1R (48%) and 836 for IGF1R–breast cancer (52%)] were retrieved from MEDLINE. After excluding reviews and duplicates, 703 articles for body fatness and IGF1R and 707 for IGF1R and breast cancer were left for downstream selection.

Step 3: inclusion and exclusion of studies

Results Team A.

Team A reviewed 2,523 records from PubMed, Web of Science, and EMBASE, and selected 15% (n = 379) for full-text review based on title and abstract (Fig. 3A). The reviewers disagreed on inclusion for approximately 15% of records. In case of disagreement, the reviewers reached consensus through discussion. When an abstract did not include enough information to allow a consensus decision, it was retained for full-text review. The remainder of the review process was restricted to the 181 full-text articles identified via PubMed. A total of 115 full-text articles were reviewed, of which 70 were included in the review, with 4 included in both subreviews.

Figure 3.

Flow diagram of record identification and screening phases, eligibility assessment, and number of included articles [according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses: the PRISMA statement (32)]: A, Flow diagram for Team A; B, Flow diagram for Team B (the thickness of the blue bars is proportional to the number of studies present in each step of the screening process). Abbreviations in the overall figure: E → IP, exposure to mechanism (body fatness and IGF1R); IP → O, mechanism to outcome (IGF1R and breast cancer).


Results Team B.

Team B reviewed 1,410 titles, 404 abstracts, and 60 full-text articles (Fig. 3B). Fair to moderate agreement between reviewers was observed at the title (body fatness–IGF1R, kappa: 0.37; IGF1R–breast cancer, kappa: 0.23), abstract (0.44; 0.34), and full-text screening (0.25; 0.25). The majority of disagreement was due to one reviewer assigning either exclude or include, and the other reviewer assigning unclear. When looking at the “true” disagreement (i.e., one reviewer assigned include and the other exclude), the percentage ranged across the screening steps from 0% to 18% of the total number of records screened. A total of 60 full-text articles were reviewed, of which 35 were included in the review; 3 of these were included in both subreviews, and 2 included two study types (e.g., human experiment and animal study).

Steps 4 and 5: data extraction and assessment of risk of bias

Results both teams.

Supplementary Table S3 shows the type of studies included in the review by both teams, by sub-review (body fatness–IGF1R, IGF1R–breast cancer). Overall, 14 articles were included by both teams (Supplementary Table S4). For the articles only included by one of the two teams, 20 were not identified by the initial search algorithm, 17, 28, and 7 were excluded after title, abstract, and full-text review, respectively, and 1 article was identified in PubMed but not read by Team A because it was a cell line study included in the sub-review of IGF1R and breast cancer.

Both teams had similar reasons for assigning a high risk of bias to studies. For human studies, a high risk of bias was generally assigned because of no adjustment for potential confounding, a limited number of participants, and lack of objective measurement of body fatness (i.e., self-report). Most animal studies had a high risk of bias because items were not reported, including sequence generation, baseline characteristics, allocation concealment, and blinding. In addition, none of the cell studies explicitly stated that cells were grown in 3D or that cell lines were authenticated, which always led to a higher assigned risk of bias.

Results Team A.

Supplementary Tables S5A–S5E show the extracted main characteristics and risk of bias assessment of included studies. Out of the 66 articles included, 19% of the human studies and 38% of the animal studies were rated as having a high risk of bias.

Results Team B.

The main characteristics and risk of bias assessment of studies included are shown in Supplementary Table S5F–S5M. Overall, from the list of 35 articles included, 41% of the human studies and all of the animal studies were rated as having a high risk of bias.

Step 6: synthesis of data from individual studies

Results both teams.

Substantial heterogeneity across studies was observed by both teams. Variables contributing to this heterogeneity included age at data collection, age at diagnosis, duration of follow-up, effect modifiers such as exogenous hormone use and anthropometrics, and specific clinical groups (e.g., by hormone receptor status, or in situ vs. invasive disease). Sex was a potential contributor to heterogeneity in the context of studies on body fatness and IGF1R, as both sexes were included (only females in IGF1R–breast cancer studies).

Results Team A.

The Albatross plots (Fig. 4A) indicated heterogeneity in the observed associations across studies. In the plot of IGF1R and breast cancer, only human studies indicated a positive association (right-hand side of the plot), while other study types showed inconsistent results. Some studies could not be included in the plots as their outcome was not quantitative.

Figure 4.

Illustration of heterogeneity across studies: A, Team A: Albatross plots illustrating heterogeneity of the associations observed across included studies. B, Team B: Harvest plots showing an overview of all studies included in the two subreviews, by study type, risk of bias (high: red bar; low: blue bar) and classification as direct (long bar) or indirect (short bar). Abbreviations in the overall figure: E → IP, exposure to mechanism (body fatness and IGF1R); IP → O, mechanism to outcome (IGF1R and breast cancer). * SNP studies were classified as both positive and negative association as the function of the SNP was not reported. Thus the direction of association could not be determined. Xenograft study.


Results Team B.

Figure 4B shows two Harvest plots providing an overview of included studies with observed associations by study type, overall risk of bias (high vs. low), and classification as direct or indirect. These plots show that overall the evidence is stronger for a positive association of IGF1R with breast cancer; in particular, there were 5 human observational studies with a low risk of bias showing a significant positive association and the evidence from the included 8 animal studies was also consistent for this association. There were fewer studies on the association of body fatness with IGF1R and the evidence was less consistent.

Steps 7 and 8: assessing the strength of the overall body of evidence within evidence streams and across evidence streams

Results Team A.

Team A reviewed a total of nine human observational studies for the body fatness–IGF1R association. The starting rating was low quality, as these were all cross-sectional (GRADE assessment tables in Supplementary Table S6A–S6D). The rating was downgraded due to concerns about indirectness and imprecision bringing the final rating to very low quality of evidence from human studies. Team A also assessed a total of eight animal studies. The starting rating was low quality as only one was an RCT and few studies were included overall. The rating was downgraded due to concerns about bias, indirectness, imprecision, and publication bias. Their final rating was that there was a very low quality of evidence from animal studies. On the basis of this and the relatively small number of studies included, Team A concluded that there was inconclusive evidence for a link between body fatness and IGF1R across evidence streams.

Team A reviewed a total of 33 human observational studies for the IGF1R–breast cancer association. The starting rating was low quality; although a higher number of studies were included, these were all cross-sectional. The starting rating was not downgraded or upgraded and therefore the final rating remained a low quality of evidence from human studies. Team A also assessed a total of eight animal studies. The starting rating was low quality as there were no RCTs and studies were predominantly cross-sectional. The rating was downgraded due to concerns about indirectness and publication bias, and upgraded due to strength of association and minimal concerns of bias/confounding. Their final rating was that there was a low quality of evidence from animal studies. There was a relatively high volume of human studies (n = 33) but fewer animal studies, none of which were RCTs or high quality observational studies. The majority of these studies observed a positive association between IGF1R and breast cancer. Therefore, Team A concluded that there was modest evidence linking IGF1R to breast cancer across evidence streams.

When integrating the evidence across these streams, Team A concluded that the overall evidence of the IGF1R being involved as a mechanism underlying the association of body fatness with breast cancer was weak.
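The rating arithmetic described for Team A can be illustrated with a minimal sketch. It assumes, purely for illustration, that each named domain shifts the rating by exactly one level on the four-level GRADE scale; the function name `grade_rating` is hypothetical and is not part of the framework or of any GRADE software.

```python
# Ordered GRADE quality levels, lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


def grade_rating(start, downgrades=(), upgrades=()):
    """Return the final quality level after applying downgrades and upgrades.

    Assumption for illustration: each listed domain moves the rating by
    exactly one level, and the net result is clamped to the GRADE scale.
    """
    index = LEVELS.index(start) - len(downgrades) + len(upgrades)
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]


# Team A, body fatness-IGF1R, human studies (domains as named in the text).
human_e_ip = grade_rating("low", downgrades=["indirectness", "imprecision"])

# Team A, IGF1R-breast cancer, animal studies (domains as named in the text).
animal_ip_o = grade_rating(
    "low",
    downgrades=["indirectness", "publication bias"],
    upgrades=["strength of association", "minimal bias/confounding"],
)

print(human_e_ip)   # very low
print(animal_ip_o)  # low
```

Under this one-level-per-domain assumption, the sketch reproduces the final ratings reported for these two evidence streams; actual GRADE judgments may move a rating by more than one level per domain.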

Results Team B.

For the body fatness–IGF1R subreview, Team B reviewed six human studies, of which one was a nonrandomized experiment and five were observational studies. The starting rating was low because no randomized study was included, and it was further downgraded due to the high risk of bias, inconsistency in the evidence, and indirectness of body fatness measures (GRADE tables in Supplementary Table S6E–S6H). There were six animal studies, which were randomized; therefore the starting rating was moderate. However, the rating was downgraded due to the high risk of bias, inconsistency, and indirectness of the evidence. After assessing the evidence from both human and animal studies, Team B concluded that the quality of evidence linking body fatness to IGF1R was low.

For the IGF1R–breast cancer subreview, Team B reviewed a total of 13 human studies, of which 12 were observational studies and one was a nonrandomized experiment. The starting grading was low due to the lack of randomized studies and this was not upgraded or downgraded. This resulted in a moderate quality of evidence from human studies. On the basis of the six animal studies included (excluding two that were xenograft studies), the initial rating was moderate for animal studies but was further downgraded due to limitations in the study design and high risk of bias. The conclusion from the animal studies was that there was a low quality of evidence linking IGF1R to breast cancer. After assessing the evidence from both human and animal studies, Team B concluded that the quality of evidence linking IGF1R to breast cancer was moderate.

When integrating the evidence across these streams, Team B concluded that the overall evidence of the IGF1R being involved as a mechanism underlying the association of body fatness with breast cancer was weak.

Step 9: synthesis of cell studies and xenograft animal studies

Results Team A.

The included cell studies provided a low quality of evidence for the body fatness–IGF1R association. Therefore, the conclusion of “inconclusive evidence” (Step 8) remained unchanged. Comprehensive review articles were used to judge the evidence linking the IGF1R to breast cancer. These reviews presented sufficient evidence of biologic plausibility to support the conclusion that there was modest evidence linking IGF1R to breast cancer (similar to conclusion in Step 8; refs. 28–30).

Results Team B.

In total, five cell studies were included for the body fatness–IGF1R subreview, and two cell studies for IGF1R and breast cancer (Fig. 4B). In addition, two xenograft studies were included for IGF1R and breast cancer, which were also assessed during this step. Nearly all of the cell studies on body fatness and IGF1R (4 of 5) had an overall low risk of bias, but the results of these studies were inconsistent (three studies observed a positive and two observed a negative association). For IGF1R and breast cancer, the two included cell studies and the two xenograft studies all observed a positive association. Because the number of included cell studies was low and because inconsistent associations were observed for body fatness and IGF1R, Team B concluded that these studies confirmed the conclusion drawn during Step 8 that there is a low quality of evidence for the IGF1R being involved in the body fatness–breast cancer association.

In this manuscript, we describe the results of a comparative study investigating the feasibility and reproducibility of the systematic review framework developed by WCRF/University of Bristol (5). While the two teams followed the same steps outlined in the framework and reached similar conclusions at the end of stage II, the methodology used for stage I and the articles included in the systematic review (stage II) differed. These differences can be attributed to differences in the expertise and research background of team members (stage I), and in the search terms (MeSH and free text) used for the database searches and the inclusion and exclusion criteria applied (stage II). These characteristics may influence results when the methodology is applied to other research questions in the future, and could lead to different conclusions being reached. Below we describe the strengths, feasibility, reproducibility, and utility of the framework and provide recommendations for future users.

Strengths

The framework provides a highly formalized process for a systematic review of the mechanistic literature underlying exposure–cancer associations (stage II), with the aim of an objective appraisal. The systematic, step-wise approach, which integrates mechanistic evidence from human, animal, and cell studies, allows investigators to evaluate the strength of evidence for a mechanism underlying an exposure–cancer association, taking into consideration both epidemiologic and molecular evidence. Previous approaches for systematic literature review (3, 4) primarily focused on exposure–outcome relationships, rather than the underlying mechanistic pathways. Furthermore, stage I offers a unique and useful approach for identification of potential novel mechanistic pathways to be reviewed in stage II; this is an important contribution of the framework. Overall, the framework offers a novel systematic strategy to identify, appraise, and synthesize the literature on biological mechanisms.

Feasibility

Some challenges encountered during the review process may impede the feasibility of the framework for future research questions. The review topic presented here was initially intended to focus on postmenopausal breast cancer only; however, both teams concluded that restricting both stage I and stage II to “postmenopausal” breast cancer was not feasible in the context of the mechanistic literature. This example is specific to this case, but similar hindrances may occur for other research questions. IGF1R as the IP was sufficiently narrow to allow testing of the framework over a four-month period for stage II. However, evaluating the role of IGF1R expression/signaling without fully considering the insulin and IGF pathway was restrictive content-wise. Therefore, the time required should be clearly defined and studies should be carefully planned in advance with sufficient personnel and expertise. Furthermore, this narrow pathway focusing on IGF1R did not allow a review of effect modification or differences across important clinical subgroups (e.g., by hormone receptor status). Restricting this review to PubMed likely resulted in the exclusion of relevant publications. Although integrating results from multiple databases as recommended by the framework would result in a more comprehensive review, the amount of literature retrieved may be prohibitive. In addition, standardized approaches to remove duplicates between databases should be planned in advance, given that standard article identifiers such as PMID are not referenced in all databases. Furthermore, cell line studies represented 50% of the literature retained for full-text review, yet they are considered only at the last step of the framework, and only then as a confirmatory step to evaluate agreement with the conclusions drawn based on human and animal studies (biological plausibility). Given the minimal impact on final conclusions and the large volume of cell studies likely to be found for most research questions, we suggest using high-quality reviews including cell studies and descriptions of molecular mechanisms, if available.
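One possible way to plan duplicate removal across databases is sketched here. It is an illustrative heuristic, not a procedure used by either team: records are matched on PMID when one is present and otherwise on a normalized title plus publication year. The record fields (`pmid`, `title`, `year`) and the example records are hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation/whitespace for matching."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first copy of each record, matching on PMID or title+year.

    A record counts as a duplicate if its PMID was already seen, or if its
    normalized title and year were already seen (fallback for databases
    that do not carry a PMID).
    """
    seen_pmids, seen_titles, unique = set(), set(), []
    for record in records:
        pmid = str(record.get("pmid") or "").strip()
        title = normalize_title(record.get("title", ""))
        title_key = (title, record.get("year")) if title else None
        if (pmid and pmid in seen_pmids) or (title_key in seen_titles):
            continue
        if pmid:
            seen_pmids.add(pmid)
        if title_key:
            seen_titles.add(title_key)
        unique.append(record)
    return unique


# Hypothetical records exported from two databases.
records = [
    {"pmid": "111", "title": "Example Study A: IGF1R expression", "year": 2015},
    {"pmid": None, "title": "Example study A - IGF1R expression.", "year": 2015},
    {"pmid": "222", "title": "Example study B", "year": 2016},
]
print(len(deduplicate(records)))  # 2
```

Title-based matching can miss duplicates with differently worded titles and can merge distinct papers that share a title and year, so matches made through the fallback key would still warrant a manual check.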

Reproducibility

In our experience, some points should be taken into consideration to ensure the reproducibility of reviews conducted using the framework (stage II). Considerable subject-area expertise is necessary to characterize potential pathways, determine inclusion and exclusion criteria, and ensure the comprehensiveness and completeness of these reviews. The substantial differences between the numbers of studies reviewed by the two teams were the result of decisions that depended on the investigators' expertise, and might therefore change with accumulated knowledge or if other investigators are included in the study team. It should be noted that Team A included two reviewers with similar backgrounds (i.e., epidemiology with prior experience in either biological sciences or nutrition), whereas Team B included two reviewers with different types of expertise (one epidemiologist with expertise in nutrition and cancer and one bioinformatician with expertise in animal and cell studies). Accordingly, Team A reported a higher level of agreement, whereas Team B had lower concordance between reviewers. We think that differences based on study team members are inevitable to some extent, but that for reproducibility it is essential to assemble multidisciplinary teams with expertise in the exposure and outcome of interest and across different study types (i.e., cell line, animal, human). In addition, it would be advisable to host a training session on the framework methods before initiation of the process. To further enhance reproducibility, we highly recommend the development of an online tool integrating the different steps of the framework: (i) searching (different databases), (ii) data extraction (for each type of study), (iii) visualization, and (iv) quality assessment. Such a system, with built-in version control for all of its components, would allow the review process to be saved and reported in a standardized manner along with the results and the conclusions of the review.
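Agreement between two reviewers can be quantified in several ways; the text does not state which measure the teams used. As one common option, the sketch below computes Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The screening decisions shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who judged the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the chance agreement implied by each rater's category frequencies.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same, non-empty item list.")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical include/exclude decisions for eight abstracts.
reviewer_1 = ["include", "include", "exclude", "exclude",
              "include", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude",
              "include", "exclude", "include", "exclude"]

print(round(cohens_kappa(reviewer_1, reviewer_2), 2))  # 0.47
```

Reporting such a statistic at each screening step would give future users a comparable measure of reviewer concordance across teams.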

Utility

The largely “hypothesis-free” approach offered in stage I, in which evidence on all possible mechanisms is included as candidate intermediate mechanisms, is an innovative contribution, but it is not entirely hypothesis-free. First, although many generic mechanisms (e.g., inflammation) may be included, additional candidate mechanisms are selected by the research teams based on prior knowledge and literature (e.g., “estrogens” as a potential intermediate in this study). Therefore, the selection of potential intermediate pathways is, at least in part, hypothesis-driven, based on previous studies and the expertise of the research team. Second, the studies identified in the stage I searches would almost exclusively have been conducted in a hypothesis-driven manner. Given these observations, whether it is possible to identify intermediates in stage I in a completely hypothesis-free manner could be questioned. Nevertheless, the visualization tool (TeMMPo) developed for the framework is an asset for future studies. However, future users should be aware that the reliance on MeSH terms may present a limitation, depending on the research question at hand.

In our view, some evaluation criteria provided in the framework may be too stringent. Reviewers are advised to evaluate different streams of evidence using the GRADE assessments, with a large emphasis on the results of human RCTs. Although this is understandable from a methodologic point of view, RCTs in humans are not often conducted to describe mechanistic pathways. Therefore, a low number of RCTs will generally lead to a direct underrating of the evidence in human studies for many (mechanistic) research questions of interest within the GRADE approach. Nevertheless, although we cannot quantitatively compare the quality of evidence ratings for individual studies between the two teams as different articles were included by each team, the use of GRADE assessment was straightforward and provided a standardized and reliable way of assessing the quality of evidence. At the same time, although the SYRCLE risk of bias assessment criteria for animal studies are very relevant to the review of mechanistic studies, in our evaluations none of the studies reached these standards. Therefore, given the strict evaluation of animal studies, currently they are unlikely to contribute significantly to the final assessment. This also illustrates the need for better reporting of methodological issues in animal studies (31).

Associations between many biologic pathways and lifestyle factors, and lifestyle factors and disease etiology, are already characterized and described in comprehensive reviews [e.g., body fatness and postmenopausal breast cancer, with sex steroid hormones as the IP (13)]. Therefore, which mechanisms merit a systematic review via the framework, and why, should be given careful consideration. Moreover, systematic guidelines could be established to indicate the types of hypotheses to which the framework can be applied. In that regard, intermediates narrowed to one component of a pathway, such as IGF1R, may lead to low-confidence conclusions due to the lack of RCTs (as described above), whereas broader intermediates should be carefully considered in terms of feasibility with respect to available time and resources. The challenge is defining, for each research question, the mechanistic review approach that best balances objectivity, comprehensiveness, feasibility, and ability to be documented. When employing the framework, for time efficiency, we propose beginning with the IP-to-outcome review in stage II and evaluating the exposure-to-IP pathway only if there is sufficient evidence linking the IP to the outcome. This avoids spending extra time on simultaneously reviewing the exposure-to-IP pathway when there is insufficient evidence linking the IP to the outcome. We also propose that the results of this process be published (or made publicly available) to allow other researchers in the field to benefit, even when there is not sufficient evidence linking the IP to the outcome.

Recommendations for future users.

On the basis of our joint experience and discussion, we have developed guidelines summarizing our recommendations for future users of the framework (Fig. 5). Below, we provide specific recommendations for stage I and stage II.

Figure 5.

Comparative overview of the steps of stage I and II with differences between teams (boxes) and recommendations for future users of the framework. Abbreviations: E, exposure (i.e., body fatness); IP, intermediate phenotype (i.e., IGF1R); O, outcome (i.e., breast cancer); RCTs, randomized controlled trials. * Extract also: P value and direction of the association (Step 6), imprecision and indirectness (Step 7) data in Step 4; examples provided in Supplementary Table S7A–S7D.

Stage I.

Researchers may consider using the adjusted score and bubble plots alongside the Sankey plots and score provided by the TeMMPo tool. We also recommend carefully interpreting the results of stage I when deciding whether identified IPs could indeed be intermediate mechanisms linking exposure to outcome (e.g., a factor that is influenced by both the exposure and outcome can also appear within the results of stage I). Finally, TeMMPo is reliant on MeSH headings. Investigators must evaluate the quality of the MeSH indexing for their research question (e.g., MeSH term “Pediatric Obesity” was introduced in 2014, with 21 studies indexed with this term prior to 2013, and 3274 articles indexed with this term subsequently from 2013 to January 2017).

Stage II.

We recommend including experts for each study type in the review team, as in the IARC monograph methodology (4). Time and expertise requirements must be carefully considered, and could be based on preliminary database searches. In the experience of the two teams, implementing the framework is feasible for narrowly defined mechanistic components, as considered here, but may be too time-consuming for larger pathways and/or research questions. For efficiency reasons, we recommend combining data extraction and quality assessment (Steps 4 and 5) as well as simultaneously extracting the data needed in Steps 6 and 7 (e.g., P values, direction of associations, etc.). We also recommend that standardized forms be used for data extraction; we have provided the tools used in our review process in Supplementary Table S7A–S7D. During data synthesis (Step 6), we recommend using Harvest plots in addition to the Albatross plots suggested by the framework, so that all study types can be integrated in one graph. While a disadvantage of Harvest plots may be that sample sizes and/or effect sizes cannot be taken into account, these plots enable the inclusion of all studies, which is particularly useful for a heterogeneous body of evidence, and also provide the opportunity to visualize other information such as risk of bias and/or indirectness of individual studies. After the GRADE assessment, depending on the available time and resources, high-quality reviews (if available) could be used to assess the input from cell studies.
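The data preparation behind a Harvest plot can be sketched as follows, using the encoding described for Figure 4B (bar color for risk of bias, bar length for direct versus indirect evidence). The numeric bar heights, the field names, and the example studies are assumptions for illustration; they are not taken from either team's extraction forms.

```python
from collections import defaultdict


def harvest_bars(studies):
    """Group studies into Harvest-plot cells and describe one bar per study.

    Each cell is keyed by (study type, direction of association). Bars are
    long for direct and short for indirect evidence, red for high and blue
    for low risk of bias.
    """
    cells = defaultdict(list)
    for study in studies:
        bar = {
            "id": study["id"],
            "height": 1.0 if study["direct"] else 0.5,  # assumed heights
            "color": "red" if study["risk_of_bias"] == "high" else "blue",
        }
        cells[(study["study_type"], study["association"])].append(bar)
    return dict(cells)


# Hypothetical extracted studies.
studies = [
    {"id": "S1", "study_type": "human", "association": "positive",
     "risk_of_bias": "low", "direct": True},
    {"id": "S2", "study_type": "human", "association": "positive",
     "risk_of_bias": "high", "direct": False},
    {"id": "S3", "study_type": "animal", "association": "none",
     "risk_of_bias": "high", "direct": True},
]

for cell, bars in harvest_bars(studies).items():
    print(cell, [(b["id"], b["height"], b["color"]) for b in bars])
```

The resulting bar descriptions could be passed to any plotting library; because every study contributes exactly one bar, studies that report no usable effect size still appear in the figure.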

In conclusion, our study investigated the feasibility and reproducibility of the WCRF/University of Bristol framework and evaluated its utility. Two teams independently applied both stages of the framework. We conclude that, given appropriate focus in the research question and reviewers with time and expertise in relevant areas, the framework is a good starting point for opening up the discussion on how to integrate evidence from mechanistic studies on exposure–cancer associations. The recommendations for future users provided here aim to improve the efficiency and standardization of the process.

No potential conflicts of interest were disclosed.

The authors would like to thank the developers of the WCRF International/University of Bristol Framework for systematic reviews for the professional communication during the project.

This work was supported by funding from grants RFA2015/1435 and RFA2015/1436 obtained from World Cancer Research Fund (WCRF UK), as part of the WCRF International grant programme. G. Ertaylan and E.H. van Roekel were supported by grant RFA2015/1435. C. Le Cornet, A. Jung, and A. Damms-Machado were supported by grant RFA2015/1436. M.J.L. Bours is supported by a grant from Kankeronderzoekfonds Limburg (part of Health Foundation Limburg; grant no. 00005739). This project/research has been made possible with support of the Dutch Province of Limburg.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Vineis P, Perera F. Molecular epidemiology and biomarkers in etiologic cancer research: the new in light of the old. Cancer Epidemiol Biomarkers Prev 2007;16:1954–65.
2. Larsen PO, von Ins M. The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index. Scientometrics 2010;84:575–603.
3. World Cancer Research Fund. Continuous update project findings & reports. Available from: http://www.wcrf.org/int/research-we-fund/continuous-update-project-findings-reports.
4. International Agency for Research on Cancer. Preamble to the IARC monographs (amended January 2006); 2006. Available from: http://monographs.iarc.fr/ENG/Preamble/.
5. Lewis SJ, Gardner M, Higgins J, Holly JMP, Gaunt TR, Perks CM, et al. Developing the WCRF International/University of Bristol methodology for identifying and carrying out systematic reviews of mechanisms underpinning exposure-cancer associations. Cancer Epidemiol Biomarkers Prev 2017;26:1678–86.
6. Arnold M, Pandeya N, Byrnes G, Renehan AG, Stevens GA, Ezzati M, et al. Global burden of cancer attributable to high body-mass index in 2012: a population-based study. Lancet Oncol 2015;16:36–46.
7. Renehan AG, Zwahlen M, Egger M. Adiposity and cancer risk: new mechanistic insights from epidemiology. Nat Rev Cancer 2015;15:484–98.
8. Byers T, Sedjo RL. Body fatness as a cause of cancer: epidemiologic clues to biologic mechanisms. Endocr Relat Cancer 2015;22:R125–34.
9. Calle EE, Kaaks R. Overweight, obesity and cancer: epidemiological evidence and proposed mechanisms. Nat Rev Cancer 2004;4:579–91.
10. Rose DP, Gracheck PJ, Vona-Davis L. The interactions of obesity, inflammation and insulin resistance in breast cancer. Cancers 2015;7:2147–68.
11. Christopoulos PF, Msaouel P, Koutsilieris M. The role of the insulin-like growth factor-1 system in breast cancer. Mol Cancer 2015;14:43.
12. Boonyaratanakornkit V, Pateetin P. The role of ovarian sex steroids in metabolic homeostasis, obesity, and postmenopausal breast cancer: molecular mechanisms and therapeutic implications. BioMed Res Int 2015;2015:140196.
13. Cleary MP, Grossmann ME. Obesity and breast cancer: the estrogen connection. Endocrinology 2009;150:2537–42.
14. Ford NA, Devlin KL, Lashinger LM, Hursting SD. Deconvoluting the obesity and breast cancer link: secretome, soil and seed interactions. J Mammary Gland Biol Neoplasia 2013;18:267–75.
15. Kruk J. Overweight, obesity, oxidative stress and the risk of breast cancer. Asian Pac J Cancer Prev 2014;15:9579–86.
16. Strong AL, Burow ME, Gimble JM, Bunnell BA. Concise review: the obesity cancer paradigm: exploration of the interactions and crosstalk with adipose stem cells. Stem Cells 2015;33:318–26.
17. Chlebowski RT, Anderson GL. Menopausal hormone therapy and breast cancer mortality: clinical implications. Ther Adv Drug Saf 2015;6:45–56.
18. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell 2011;144:646–74.
19. University of Bristol. TeMMPo: Text mining for mechanism prioritisation; 2016. Available from: https://www.temmpo.org.uk/.
20. KEGG. Insulin-IGF pathway; 2016. Available from: http://www.genome.jp/kegg-bin/show_pathway?hsa04910+3643.
21. Critical Appraisal Skills Programme (CASP); 2013. Available from: http://www.casp-uk.net/casp-tools-checklists.
22. Hooijmans CR, Rovers MM, de Vries RB, Leenaars M, Ritskes-Hoitinga M, Langendam MW. SYRCLE's risk of bias tool for animal studies. BMC Med Res Methodol 2014;14:43.
23. Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 2011;343:d5928.
24. Study Quality Assessment Tools. Quality assessment of observational cohort and cross-sectional studies and quality assessment of case-control studies. Available from: http://www.nhlbi.nih.gov/health-pro/guidelines/in-develop/cardiovascular-risk-reduction/tools.
25. Harrison S, Jones HE, Martin RM, Lewis S, Higgins JPT. The Albatross plot: a novel graphical tool for presenting results of diversely reported studies in a systematic review. Res Synth Methods 2017 Apr 28. [Epub ahead of print].
26. Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ 2008;336:924–6.
27. Pollak MN, Schernhammer ES, Hankinson SE. Insulin-like growth factors and neoplasia. Nat Rev Cancer 2004;4:505–18.
28. Eroles P, Bosch A, Pérez-Fidalgo JA, Lluch A. Molecular biology in breast cancer: intrinsic subtypes and signaling pathways. Cancer Treat Rev 2012;38:698–707.
29. Farabaugh SM, Boone DN, Lee AV. Role of IGF1R in breast cancer subtypes, stemness, and lineage differentiation. Front Endocrinol 2015;6:59.
30. Pollak MN. Insulin-like growth factors and neoplasia. Novartis Found Symp 2004;262:84–107.
31. Morrissey B, Blyth K, Carter P, Chelala C, Jones L, Holen I, et al. The sharing experimental animal resources, coordinating holdings (SEARCH) framework: encouraging reduction, replacement, and refinement in animal research. PLoS Biol 2017;15:e2000719.
32. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 2009;6:e1000097.