Abstract
Background: We report the development of a cutaneous melanoma risk algorithm based upon seven factors: hair color, skin type, family history, freckling, nevus count, number of large nevi, and history of sunburn, intended to form the basis of a self-assessment Web tool for the general public.
Methods: Predicted odds of melanoma were estimated by analyzing a pooled dataset from 16 case–control studies using logistic random coefficients models. Risk categories were defined based on the distribution of the predicted odds in the controls from these studies. Imputation was used to estimate missing data in the pooled datasets. The 30th, 60th, and 90th centiles were used to distribute individuals into four risk groups for their age, sex, and geographic location. Cross-validation was used to test the robustness of the thresholds for each group by leaving out each study one by one. Performance of the model was assessed in an independent UK case–control study dataset.
Results: Cross-validation confirmed the robustness of the threshold estimates. Cases and controls were well discriminated in the independent dataset [area under the curve, 0.75; 95% confidence interval (CI), 0.73–0.78]. Twenty-nine percent of cases were in the highest risk group compared with 7% of controls, and 43% of controls were in the lowest risk group compared with 13% of cases.
Conclusion: We have identified a composite score representing an estimate of relative risk and successfully validated this score in an independent dataset.
Impact: This score may be a useful tool to inform members of the public about their melanoma risk. Cancer Epidemiol Biomarkers Prev; 24(5); 817–24. ©2015 AACR.
Introduction
Cutaneous melanoma continues to increase in incidence in white populations, especially in Europe (1). There is evidence of stabilization in incidence rates in some countries within Europe (in Lithuania, Estonia, Slovakia, Ireland, and Scotland) and notably outside Europe, in Israel, Australia, New Zealand, the United States, and Canada, showing a marked cohort effect (1). The large increase in incidence over the past 50 years and more recent stabilization of rates in several countries suggest that behavioral factors, probably related to sun exposure, underlie the cohort-based trends. These observations reinforce the view that change in sun-related behavior remains a desirable aim for melanoma prevention (2).
Red hair, freckling, and skin reported to burn rather than tan are unequivocally associated with increased risk for melanoma in meta-analyses and pooled data analyses (3, 4). A more potent phenotypic risk factor is the presence of many melanocytic nevi and large (or clinically atypical) nevi, as confirmed by meta-analyses (5) and pooled-data analyses (6). These phenotypes are strongly genetically determined, and genes associated both with nevus phenotype and pigmentation have been shown to be associated with melanoma risk in genome-wide association studies (7). It is therefore not surprising that family history of melanoma, defined as melanoma in a first degree relative, also has been consistently identified as a risk factor (3).
Although these phenotypic risk factors are well described in the medical literature, it is important that members of the general public are able to translate the published evidence into an understanding of their own risk. We constructed a risk tool to allow individuals to assess their lifetime risk of melanoma, intended to benefit both those at average risk and those at increased risk relative to the underlying risk in their population, using variables that can be reliably self-reported by members of the general public. The aim of this analysis was to construct this risk algorithm based on our previous pooled data analyses of melanoma case–control studies performed at different latitudes (4, 6) and then to test the algorithm in an independent UK melanoma case–control study (8, 9).
Materials and Methods
We carried out a pooled data analysis of melanoma case–control datasets from studies conducted in Europe (temperate climate), North America (temperate and warmer climate), Australia, and Hawaii (hotter climate) in the period 1979 to 1999. Previous analyses of these data are presented in two articles on sun-exposure patterns (4) and the nevus phenotype (6) associated with risk. These two articles comprehensively described the approaches taken to pooled data analysis. Because our model is designed to underpin a risk tool for public usage, we considered only variables that were deemed self-reportable by members of the public, even though other variables had been shown to be significantly associated with risk of melanoma (e.g., solar keratoses) and our model may have had better predictive value with their inclusion. Genetic data were excluded for the same reason. A summary is provided here.
Data collection
Eligible studies were identified first from a systematic meta-analysis conducted by Gandini and colleagues, which included analyses conducted before 2002 (3, 5, 10). Second, studies conducted between 2002 and 2007 were identified using a MEDLINE search. Twenty-six studies met the inclusion criteria, and all authors who could be traced were invited to participate. The authors of 16 studies participated. The pooled dataset consisted of eight studies from Europe, five from North America, one from Hawaii, and two from Australia. In each of these studies, data had been collected on some or all of the following variables: nevus phenotype, hair color, sunburn history, Fitzpatrick skin type, freckling, family history of melanoma, and age. Data on eye color were also collected, but this variable was found to be highly correlated with hair color and was dropped from the final model. The variables were grouped into categories where it was clear that this was appropriate (Table 1). These variables were established risk factors for melanoma as described above (3, 5, 10); further details can be found in the Supplementary Information.
Generating estimates for the effect of covariates used in the risk algorithm
The pooled data were analyzed using a logistic random coefficients model to account for heterogeneity between studies. Pooled odds ratios (OR) were estimated for the effect of each categorical variable on melanoma risk, adjusted for the other six variables (Table 1), age, and sex, using WinBUGS (a more detailed explanation can be found in ref. 6). The Western Canada study (Elwood and colleagues, 1985; ref. 11) was omitted from the final model because no nevus count data were available.
Creating the risk score
The estimated ORs from the above model were used to define a “risk score”, formed by multiplicatively combining the estimates in Table 1. This composite estimated OR was considered the estimated relative risk of an individual developing melanoma compared with an individual from the same population who had the lowest possible risk (black hair, fewer nevi than half the population, no history of sunburn, no freckles, no large nevi, Fitzpatrick skin type III or IV, and no family history of melanoma).
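Because the composite score is simply the product of the category-specific ORs, it can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name is hypothetical, reference categories are assumed to contribute a factor of 1, and the only OR values used are the three Table 1 estimates quoted in the worked example in the Results (1.76, 1.66, and 1.58).

```python
from math import prod

def composite_risk_score(category_ors):
    """Hypothetical helper: multiply the pooled ORs of an individual's
    non-reference categories. Reference (lowest-risk) categories
    contribute an OR of 1, so an empty list returns 1."""
    return prod(category_ors)

# Worked example from the Results: three non-reference categories.
score = composite_risk_score([1.76, 1.66, 1.58])
print(f"{score:.2f}")  # 4.62
```

Multiplying ORs in this way assumes that the factors do not interact, an assumption acknowledged in the Discussion.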
Categorization of risk and calculation of thresholds
To provide more stable and interpretable risk estimates, we used controls from the pooled analysis as a sample of the combined population and generated an estimate of the background population distribution of the risk score. Table 2 lists each of the included studies. Some variables were not recorded in all of the studies. In addition, 1.5% of all data were missing at the individual level for recorded variables. Missing data rates are described in Supplementary Table S1. To calculate a risk score for individuals within these studies, we used imputation as described below to fill in the missing values.
Imputation of missing data using Multiple Imputation by Chained Equations
We assumed that the data could be treated as missing at random (MAR) and implemented Multiple Imputation by Chained Equations (MICE) to impute missing values using the “mice” package in R 3.0.2 (12). We ran the imputation analysis in 30 chains over 15 iterations. Each “chain” is a separate run of the analysis with a different random initial assignment of the missing data points, and the imputation in each chain is performed independently of the other chains. Further details of the imputation process can be found in Supplementary Methods. The composite melanoma risk score was computed for each individual in each chain, and the results from the 30 chains were merged into one composite dataset. We used the 30th, 60th, and 90th centiles to distribute individuals into four risk groups: low, relative to peers; medium-low, relative to peers; medium-high, relative to peers; and high, relative to peers. Peers are defined as individuals of the same age and sex drawn from the same population.
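The pooling of imputed chains and the centile cut-points can be illustrated as follows. This sketch assumes the composite scores of the controls have already been computed for each imputed chain; the helper names are hypothetical, and the quantile definition (linear interpolation between order statistics, Python's `method="inclusive"`, equivalent to R's default type 7) is an assumption, since the text does not state which definition was used. Scores equal to a cut-point are assigned to the higher group, also an assumption.

```python
from bisect import bisect_right
from itertools import chain
from statistics import quantiles

GROUPS = ("low", "medium-low", "medium-high", "high")

def centile_thresholds(scores_per_chain):
    """Merge control risk scores from all imputation chains into one
    dataset and return its 30th, 60th, and 90th centiles."""
    merged = list(chain.from_iterable(scores_per_chain))
    deciles = quantiles(merged, n=10, method="inclusive")
    return deciles[2], deciles[5], deciles[8]

def risk_group(score, thresholds):
    """Map a score to one of four groups relative to peers; a score equal
    to a cut-point goes to the higher group."""
    return GROUPS[bisect_right(thresholds, score)]

# Toy example: two "chains" that together hold the scores 1..101.
cuts = centile_thresholds([list(range(1, 51)), list(range(51, 102))])
print(cuts)                   # (31.0, 61.0, 91.0)
print(risk_group(45, cuts))   # medium-low
```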
Attributable risk
We calculated attributable risk using the cases in the Leeds case–control dataset, imputing missing values as described above, calculating risk scores for each individual, and applying the method of Bruzzi and colleagues (13).
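Bruzzi's estimator expresses the attributable risk as AR = 1 − Σ_j ρ_j / RR_j, where ρ_j is the proportion of cases in stratum j and RR_j the relative risk in that stratum. The minimal sketch below assumes each case forms its own stratum (so ρ_j = 1/n) and that its relative risk is its composite risk score relative to the lowest-risk profile. With multiply imputed data one might compute this per chain and average the results; that pooling step is an assumption and is not shown.

```python
def attributable_risk(case_scores):
    """Bruzzi-style attributable risk, AR = 1 - sum_j rho_j / RR_j,
    treating each case as its own stratum (rho_j = 1/n) with RR_j equal
    to that case's composite risk score."""
    if not case_scores:
        raise ValueError("need at least one case")
    n = len(case_scores)
    return 1.0 - sum(1.0 / rr for rr in case_scores) / n

print(attributable_risk([2, 4]))  # 0.625
```

A case with a score of 1 (the lowest-risk profile) adds nothing to the attributable fraction, whereas cases with high scores push it toward 1.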
Robustness of thresholds
We investigated the robustness of the threshold estimates by dropping each study in turn and recalculating the 20th, 30th, 40th, 50th, 60th, 80th, and 90th centiles, using MICE to impute missing values as above. A large deviation from the threshold values computed using all data when a particular study is omitted would indicate that the study has a large influence on the algorithm.
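The leave-one-study-out check can be sketched as below. The sketch assumes imputed control scores are already available for each study (the analysis described here reran MICE after each omission, which is not reproduced), and it summarizes the influence of each study as the largest relative change across the seven centiles; that summary statistic is a hypothetical choice for illustration. Dividing by the full-data centile is safe because risk scores are at least 1.

```python
from statistics import quantiles

CENTILES = (20, 30, 40, 50, 60, 80, 90)

def centiles(scores):
    """Selected centiles by linear interpolation (equivalent to R type 7)."""
    cut_points = quantiles(scores, n=100, method="inclusive")
    return [cut_points[c - 1] for c in CENTILES]

def leave_one_study_out(scores_by_study):
    """For each omitted study, the largest relative change in any centile
    compared with the centiles computed from all studies."""
    everything = [s for scores in scores_by_study.values() for s in scores]
    full = centiles(everything)
    influence = {}
    for omitted in scores_by_study:
        kept = [s for study, scores in scores_by_study.items()
                if study != omitted for s in scores]
        reduced = centiles(kept)
        influence[omitted] = max(abs(r - f) / f for r, f in zip(reduced, full))
    return influence
```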
Validation in independent data taken from the Leeds case–control study
To test the efficacy of the risk algorithm, we investigated its performance in data collected from 960 population-ascertained incident melanoma cases and 513 controls recruited to a case–control study performed in Leeds, United Kingdom (8, 9). Further details can be found in Supplementary Information. Data on the variables used to build the risk tool were extracted from questionnaires and classified in the same manner as for the pooled data described above.
Composite melanoma risk scores were calculated based on the phenotypic and environmental data recorded for these cases and controls. A receiver operating characteristic (ROC) curve was constructed and the area under the curve (AUC) was estimated using the “pROC” package in R (14). 95% confidence intervals (CI) for the AUC were computed using the DeLong method (15). Cases and controls were also grouped into risk categories using the threshold estimates calculated from the pooled data, and the difference between the risk classification of cases and controls was tested using a χ2 test.
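The AUC equals the probability that a randomly chosen case has a higher score than a randomly chosen control (ties counted as one half), and the DeLong method estimates its variance from per-subject "structural components". The sketch below is a plain-Python illustration of both, not the pROC implementation; the function names are hypothetical, and clipping the interval to [0, 1] is an added assumption.

```python
from math import sqrt
from statistics import variance

def psi(case_score, control_score):
    """Mann-Whitney kernel: 1 if the case scores higher, 0.5 for a tie."""
    if case_score > control_score:
        return 1.0
    if case_score == control_score:
        return 0.5
    return 0.0

def auc_with_delong_ci(case_scores, control_scores, z=1.959964):
    """Empirical AUC with a DeLong 95% CI (needs >= 2 cases and controls)."""
    m, n = len(case_scores), len(control_scores)
    # Structural components: each case's placement among the controls,
    # and each control's placement among the cases.
    v10 = [sum(psi(x, y) for y in control_scores) / n for x in case_scores]
    v01 = [sum(psi(x, y) for x in case_scores) / m for y in control_scores]
    auc = sum(v10) / m
    se = sqrt(variance(v10) / m + variance(v01) / n)
    # Clip to the valid AUC range (an added assumption).
    return auc, (max(0.0, auc - z * se), min(1.0, auc + z * se))
```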
Most users of a risk tool aimed at the public will not have access to professional assessment of mole counts and freckling. We therefore assessed how using self-reported mole counts and freckling scores, rather than nurse assessments, affected agreement and risk classification in the Leeds data. Further details of the methods used to do this can be found in the Supplementary Material.
Results
Calculating the risk score
The composite risk score based on the seven variables has a theoretical approximate range of 1 to 233 (Table 1). This score is the combined odds of developing melanoma relative to the combined odds for a person of the same sex and age in the lowest theoretical category of risk. In the pooled data we observed the full theoretical range of the risk score (1–233, Supplementary Fig. S1). In the Leeds melanoma data, we saw a reduced range of scores (1–188, Supplementary Fig. S2).
Risk categories
Thresholds for the predefined risk categories derived from controls in the pooled data are shown in Table 3. Individuals are classified as being low risk (<3.32), medium-low risk (3.32–8.46), medium-high risk (8.46–32.80), or high risk (≥32.80) relative to the background risk in the population. For example, a person with red hair, Fitzpatrick skin type I, and freckling, but with no large nevi, a low nevus count, no history of severe sunburn, and no family history of melanoma in a first-degree relative, would have a risk score of 4.62 (1.76 × 1.66 × 1.58 from estimates in Table 1) and would be categorized as “medium-low risk, relative to peers” (using the thresholds in Table 3).
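The assignment of this worked example to a risk group can be reproduced with the thresholds quoted above. In this illustrative sketch a score exactly equal to a threshold is placed in the higher group; this matches the stated boundaries at 3.32 and 32.80, but the handling of a score of exactly 8.46 is an assumption because the text lists that value in two ranges.

```python
from bisect import bisect_right

# Thresholds from Table 3 as quoted in the text (pooled controls).
THRESHOLDS = (3.32, 8.46, 32.80)
LABELS = ("low", "medium-low", "medium-high", "high")

def categorize(score):
    """Return the risk group; a score equal to a threshold goes to the
    higher group (bisect_right)."""
    return f"{LABELS[bisect_right(THRESHOLDS, score)]} risk, relative to peers"

print(categorize(1.76 * 1.66 * 1.58))  # medium-low risk, relative to peers
```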
Testing the robustness of threshold estimates using “leave-one-study-out” cross-validation
To test the robustness of the thresholds, we removed each study in turn, performed MICE using the same settings on the remaining data, and then recalculated the threshold values. Table 3 shows the effect of omitting each study on the threshold values at seven different points in the distribution. There was no evidence that omitting any of the studies caused gross distortion of the threshold values.
Validation using the Leeds case–control data
The risk score was computed for each individual in the Leeds case–control study as above. Initially, we used only complete cases to reduce the number of assumptions made about the data. ROC curve analysis showed that the raw composite score was capable of distinguishing cases from controls reasonably well (AUC, 0.75; 95% CI, 0.73–0.78; Fig. 1). Cases and controls were also classified into four risk groups using the threshold values generated using the 30th, 60th, and 90th centile values of the controls from the pooled data analysis. The proportion of cases and controls that fall into each of the four risk groups is shown in Table 4. Cases and controls were well separated (χ2 test, P < 2.2 × 10−16); 29% of cases are in the highest risk group compared with 7% of controls, and 43% of controls are in the lowest risk group compared with 13% of cases. However, the Leeds population had a greater proportion of controls at “low risk, relative to peers” compared with pooled controls, as in the latter the controls were by definition approximately distributed in the risk groups as 30%, 30%, 30%, and 10%.
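For a 2 × 4 table of risk-group counts (cases versus controls), the Pearson χ2 statistic has 3 degrees of freedom, for which the survival function has a closed form, P = erfc(√(x/2)) + √(2x/π)·e^(−x/2). The sketch below uses it to avoid external dependencies; the function names are hypothetical, the Table 4 counts are not reproduced, and in practice a library routine such as R's `chisq.test` would normally be used.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 df."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

def chi2_test_2x4(case_counts, control_counts):
    """Pearson chi-square test for a 2 x 4 table of risk-group counts.
    Assumes every risk group contains at least one individual."""
    if len(case_counts) != 4 or len(control_counts) != 4:
        raise ValueError("expected four risk groups per row")
    rows = (case_counts, control_counts)
    total = sum(case_counts) + sum(control_counts)
    col_totals = [a + b for a, b in zip(case_counts, control_counts)]
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat, chi2_sf_df3(stat)
```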
Finally, we imputed missing data in the Leeds cohort using MICE and repeated the above analyses. There was a small improvement in the model (AUC, 0.77) but no difference in the distribution of controls (Supplementary Table S2).
We estimated overall attributable risk from the Leeds dataset to be 87.8%.
Agreement between self-reported and nurses' counts of moles
We assessed the reliability of self-reported versus nurse-assessed mole counts and freckling in the Leeds control group. Supplementary Figure S3 shows a Bland-Altman plot comparing self-reported counts of moles on the back with nurse counts of the back in the Leeds controls. The mean difference between the two counts is three moles, and the 95% limits of agreement are wide (−23.5, 29.4). Larger discrepancies are seen as the average mole count increases; in the majority of these instances the participant has overestimated the number of moles on their back (difference > 0).
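The Bland-Altman summary is the mean difference (bias) between paired measurements, with limits of agreement at bias ± 1.96 standard deviations of the differences. Taking differences as self-reported minus nurse counts, the reported limits (−23.5, 29.4) are centered on about 2.95 and imply a standard deviation of roughly 13.5 moles (52.9 / 3.92). A minimal sketch, with a hypothetical function name:

```python
from statistics import mean, stdev

def bland_altman(self_counts, nurse_counts):
    """Bias and 95% limits of agreement for paired counts, with
    differences taken as self-reported minus nurse count (so a positive
    difference means the participant over-counted)."""
    diffs = [s - n for s, n in zip(self_counts, nurse_counts)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, (bias - half_width, bias + half_width)
```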
Reasoning that laypersons might identify patterns of moles more accurately than individual moles, we also asked Leeds cohort participants which of four diagrams best represented their mole count (Supplementary Fig. S4). Their responses were compared with the nurse-assessed mole counts grouped using the centile thresholds implemented in the risk model (0%–50%, 51%–75%, 76%–90%, >90%). We saw that 37% of individuals classified themselves in the same nevus score rank as the nurses (Supplementary Table S3), even though the measures are not equivalent. Finally, there was agreement about the presence of any freckling in only 63% of controls, although the two variables were highly significantly associated (Supplementary Table S4; χ2 test, P = 3 × 10−5).
With respect to risk classification using the thresholds derived from the pooled data, we compared how individuals are classified when the self-reported counts or the nurse counts are used (Supplementary Table S5), in both cases and controls. Agreement between the two classifications was good: 97% of individuals were classified within one rank of the nurse-count classification, and 57% were classified in exactly the same group. However, there is a net improvement in the classification of cases into higher risk groups and controls into lower risk groups when the nurse counts are used (NRI, 0.29); the majority of the improvement is due to increased classification of cases into higher risk groups [P(ranked higher | case) = 0.35]. We also compared the performance of the model using self-reported measures with that using nurse-reported measures by ROC curve analysis. The discriminatory ability of the model is lower when self-reported measures are used (AUC, 0.70; 95% CI, 0.66–0.73), similar to that of an alternative model in which mole count and freckling were omitted (AUC, 0.69; 95% CI, 0.66–0.73).
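The categorical net reclassification improvement (NRI) compares two risk classifications of the same individuals: NRI = [P(up | case) − P(down | case)] + [P(down | control) − P(up | control)], where "up" means moving to a higher group under the new classification. A minimal sketch, assuming groups are coded as integers (0 = low to 3 = high), the self-reported classification is the "old" one, and the nurse-count classification is the "new" one; the function name is hypothetical. In this coding, P(ranked higher | case) corresponds to the proportion of cases moving up.

```python
def categorical_nri(old_groups, new_groups, is_case):
    """Net reclassification improvement from old to new group codes
    (integers, higher = higher risk); is_case holds True for cases."""
    counts = {True: [0, 0, 0], False: [0, 0, 0]}  # [n, moved up, moved down]
    for old, new, case in zip(old_groups, new_groups, is_case):
        tally = counts[bool(case)]
        tally[0] += 1
        if new > old:
            tally[1] += 1
        elif new < old:
            tally[2] += 1
    n_case, up_case, down_case = counts[True]
    n_ctrl, up_ctrl, down_ctrl = counts[False]
    return (up_case - down_case) / n_case + (down_ctrl - up_ctrl) / n_ctrl
```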
Absolute risk
Although we have presented our model as producing categories of risk relative to the underlying population risk for someone of a similar age and sex, with some small modifications it can also produce absolute risk estimates. Using data taken from the Cancer Research UK and UK Office for National Statistics websites (16–18), we estimate that the absolute risk over the next 5 years for a 30-year-old woman from the United Kingdom with the risk factors discussed earlier would be approximately 0.04%. Further details can be found in Supplementary Data.
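The absolute risk calculation is described in the Supplementary Data and is not reproduced here. One common approach, used in Gail-type models, scales the population incidence by (1 − AR) to obtain the hazard for the lowest-risk profile, multiplies it by the individual's relative risk, and accounts for competing mortality. The sketch below implements a simple yearly version of that approach under these assumptions; it is not necessarily the authors' method, the function name is hypothetical, and incidence and mortality rates must be supplied from local sources (no UK values are included).

```python
from math import exp

def absolute_risk(relative_risk, incidence, mortality, attributable_risk):
    """Approximate absolute melanoma risk over len(incidence) years.

    incidence[t] and mortality[t] are annual rates per person for year t:
    population melanoma incidence and competing (non-melanoma) mortality.
    """
    cumulative_hazard = 0.0
    risk = 0.0
    for inc, mort in zip(incidence, mortality):
        # Hazard for the lowest-risk profile, scaled by the individual's RR.
        hazard = inc * (1.0 - attributable_risk) * relative_risk
        # Approximate probability of melanoma in this year: the hazard times
        # the probability of being alive and melanoma-free at its start.
        risk += hazard * exp(-cumulative_hazard)
        cumulative_hazard += hazard + mort
    return risk
```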
Discussion
The leveling off of melanoma incidence rates in some countries, continued rise in others, and the effects of birth cohort on incidence (1) all suggest modifiable environmental exposures may affect incidence. There is strong evidence of effects of intermittent high exposure to the sun on melanoma risk (10). Therefore the need for melanoma prevention advice directed at sun protection is clear. Our study was designed to construct a risk algorithm based upon large melanoma case–control datasets to enable members of the public to estimate their own risk relative to that of others in their population. Use of this algorithm to motivate change in sun-related behavior is based on the theory that primary prevention advice is more effective when the targeted persons believe themselves to be at relatively high risk; a study of modification of the behavior of adolescents in the sun has provided empirical support for this theory (19). On the other hand, individuals who are told that they have relatively low risk may well decide that they can ignore sensible sun protection measures. Any tailored risk measurements must avoid underplaying the dangers of risky behavior in the sun for all individuals.
Other melanoma risk tools have been implemented previously for public use. The tool provided by the NCI (20, 21) gives an estimate of absolute risk and focuses exclusively on one population (the United States). A recent study has similarly produced a model that predicts the absolute 5-year risk of melanoma for individuals in New Zealand but the authors recommend that external validation is performed before it is used for clinical practice (22). The online risk tool produced by the New South Wales Government (23) produces no final estimate of risk but instead provides short explanations for why each question was asked to inform users of the risk factors. The Harvard School of Public Health Web tool produces an estimate of risk that is relative to peers of the same age and sex for individuals over 40 (24).
Eiser and colleagues have previously suggested that numerical information may be interpreted as more exact than it is (25, 26). The primary approach we have taken is to provide a classification of risk into categories relative to population risk, but we have also shown that our estimates can easily be used to produce an estimate of 10- or 5-year absolute risk by combining them with local data. Classification systems similar to ours have been suggested in the past (27, 28). In one instance categorical groupings were used to assign rough estimates of 10-year risk (29). We propose that our algorithm, which is based on data taken from multiple case–control studies worldwide, may be applicable to more than just one population, although as yet we have only tested it on one (Leeds, United Kingdom). The distribution of controls in the Leeds study differs from that in the pooled case–control studies and is weighted toward more controls being classified in lower risk groups.
Conveying risk effectively is a difficult and complex issue (30) and beyond the scope of this paper to explore fully. In practical applications of our risk tool, ideally both risk relative to the baseline population and estimates of absolute risk would be provided. We have demonstrated that it is easy to adapt our model to output absolute risks, given appropriate local data on melanoma incidence and overall mortality rates, although more sophisticated methods may be required to account for geographical variation in incidence rates in larger countries such as the United States (21).
The beneficial effects of sun exposure include higher vitamin D levels, which are essential for bone health and might be important for many other aspects of health, such as prevention of cancer (31) and of diseases associated with the metabolic syndrome (32), although this has not been proven. It may therefore be suboptimal to recommend very high levels of sun protection for individuals at lower risk of melanoma, especially in temperate climates, where there is less sunshine. A second aim of this project was to help members of the public identify themselves as being at lower risk than their peers, so that advice on sun avoidance could be better tailored to the individual. The datasets were built almost entirely from data from white-skinned individuals, as they are the population most affected by melanoma. The incidence of melanoma in black and Asian populations is much lower, and our algorithm is likely not applicable to these populations.
A weakness of the study is that the risk algorithm was built and tested using case–control data. Consequently, the ORs that the risk score is built upon are potentially subject to the biases inherent to case–control designs, such as recall bias, selection bias, participation bias, and/or confounding. We have also assumed that, because the OR estimates for each factor used to build the risk score were derived from a multivariable joint analysis, they can be treated as independent and therefore combined multiplicatively. We have not accounted for potential interactions between factors in this model. Interactions are notoriously difficult to show, and a model that included all potential interactions between the factors would contain too many variables to be practical. A strength of the study is that the datasets used were very large and detailed. As with all pooled datasets, the data are from disparate studies. Reassuringly, however, in the previously reported analyses the estimates of relative risk of melanoma in relation to sunburn (4) and nevus phenotype (6) were remarkably consistent across all the studies. The point estimates of the odds of melanoma for an individual are highly imprecise, particularly at the extremes of the distribution. Therefore we have taken the approach of categorizing risk into broad groups.
A challenging aspect of this analysis was that several variables were not recorded in all studies; this was addressed by imputation using MICE. Of particular concern is the large mole variable, which is only available for 7 of the 16 studies and is defined differently in different studies (e.g., large moles were defined by a research nurse as >8 mm in Kanetsky 2001 but self-reported ≥5 mm in Le Marchand 2006). However, we did not see much perturbation in the threshold scores when each study was dropped in turn.
The analyses carried out resulted in a composite score representing an estimate of relative risk for individuals compared with those with the lowest level of risk factors. The AUC in the ROC analysis was 0.75, indicating reasonably good discrimination between cases and controls. Recently, Vuong and colleagues identified 28 melanoma prediction models generated from 19 studies published before April 2013, in which discrimination ranged from an AUC of 0.62 to 0.86 (33); our model is therefore competitive in this regard. It is likely that we could have increased the AUC if we had used additional variables such as genetic factors. However, we hope ultimately to provide a tool that will be used by individuals reporting their own risk factors. Therefore, it was practical to use simple measures that can be self-reported.
We generated four different risk groups based upon a distribution of risk estimates in the controls using the 30th, 60th, and 90th centiles as cutpoints. Cases and controls were well differentiated. Approximately 7% of controls in the Leeds data were found in the highest risk group compared with 29% of cases.
We have shown evidence that risk prediction is more accurate when professionally measured freckling and mole count variables are used. This may be a potential weakness for developing a risk tool using this algorithm, as the results may be misleading in the presence of misclassification. Nonetheless, the majority of individuals were classified in the same group irrespective of whether self-reported or professionally derived variables were used. For a risk tool aimed at the public it may be best to leave out these variables as there was evidence that models that omitted the self-reported variables lost no discriminatory power. However, there was a substantial improvement in classification when professionally derived variables were used, particularly for ranking cases in higher risk groups, so ideally these variables should be incorporated in some form. We made a strong assumption that the qualitative groups in the diagrams match well to the equivalent centile groups in the risk tool. However there is substantial variation in mole count distributions between populations, so this assumption may well be violated. Diagrams that better matched users to the four quantile groups would presumably perform better; this argues for the need to tailor self-estimation of nevi to each individual population if diagrams are to be used.
In summary, we have generated an algorithm for use in white populations to predict risk of melanoma. Practical application of this algorithm to general use in the future will require several more steps including validation in other cohorts from other regions to test its generalizability. We hope to continue to refine the algorithm as additional datasets become available in low-latitude and high-latitude regions. Using simple measures, the algorithm can be used to help identify higher and lower risk individuals, relative to others of the same age and sex within a population, for whom the hazards of sun exposure would be different, and to produce estimates of absolute risk when combined with population-specific data.
Disclosure of Potential Conflicts of Interest
B.K. Armstrong is chairman of NSW Skin Cancer Prevention Advisory Committee at Cancer Institute NSW. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: D.T. Bishop, B.K. Armstrong, N.A. Gruis, E.A. Holly, R.M. Mackie, H. Olsson, A. Østerlind, L. Titus, R.P. Gallagher, J.H. Barrett, J. Newton-Bishop
Development of methodology: J.R. Davies, D.T. Bishop, E.A. Holly, L. Titus, R.P. Gallagher, J.H. Barrett, J. Newton-Bishop
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): D.T. Bishop, B.K. Armstrong, V. Bataille, W. Bergman, M. Berwick, J.M. Elwood, M.S. Ernstoff, A. Green, N.A. Gruis, E.A. Holly, C. Ingvar, P.A. Kanetsky, T.K. Lee, L. Le Marchand, H. Olsson, A. Østerlind, T.R. Rebbeck, K. Reich, P. Sasieni, V. Siskind, A.J. Swerdlow, L. Titus, M.S. Zens, A. Ziegler, R.P. Gallagher, J. Newton-Bishop
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J.R. Davies, Y. Chang, D.T. Bishop, J.M. Elwood, M.S. Ernstoff, E.A. Holly, C. Ingvar, M.R. Karagas, K. Reich, A. Ziegler, R.P. Gallagher, J. Newton-Bishop
Writing, review, and/or revision of the manuscript: J.R. Davies, Y. Chang, D.T. Bishop, B.K. Armstrong, V. Bataille, W. Bergman, M. Berwick, P.M. Bracci, J.M. Elwood, M.S. Ernstoff, N.A. Gruis, E.A. Holly, C. Ingvar, P.A. Kanetsky, M.R. Karagas, L. Le Marchand, R.M. Mackie, H. Olsson, T.R. Rebbeck, K. Reich, P. Sasieni, A.J. Swerdlow, L. Titus, A. Ziegler, R.P. Gallagher, J.H. Barrett, J. Newton-Bishop
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): N.A. Gruis, M.R. Karagas, T.R. Rebbeck
Study supervision: E.A. Holly, R.P. Gallagher, J.H. Barrett
Acknowledgments
The authors thank the other investigators of the contributing studies, who are authors of the original publications. Dr. J.N. Bouwes Bavinck is thanked for putting the melanoma database together for Leiden University Medical Center, the Netherlands. Lund Melanoma Study Group is thanked for compiling the Swedish data. Mr. John Taylor is thanked for recoding the New Hampshire study for the pooled analysis. The authors also thank Dr. M.R. Karagas who provided original data from the East Denmark, Scotland, East Midlands, San Francisco, Queensland, and Western Australian studies, which she had compiled for pooled analysis of other variables.
The Editor-in-Chief of Cancer Epidemiology, Biomarkers & Prevention is an author of this article. In keeping with the AACR's Editorial Policy, the paper was peer reviewed and an AACR Journals' Editor not affiliated with Cancer Epidemiology, Biomarkers & Prevention rendered the decision concerning acceptability.
Grant Support
The funds from NCI to E.A. Holly (RO1-CA52345, RO1-CA34382, and RO1-CA66032), L. Titus, and M. Berwick are acknowledged. The authors thank the funders of the contributing studies, who are acknowledged in the original study publications listed in the references to this article. This study was supported by the European Commission, 6th Framework Programme (LSHC-CT-2006-018702) to J. Newton-Bishop, J.H. Barrett, and D.T. Bishop; Cancer Research UK (C588/A4994, C569/A5030) to J. Newton-Bishop, J.H. Barrett, and D.T. Bishop; National Cancer Institute (RO1-CA52345 to E.A. Holly, P0-1 CA42101 to M. Berwick, RO1-CA66032 to L. Titus); NIH (R01-CA92428 to P.A. Kanetsky); and University of Sydney Medical Foundation Program Grant (to B.K. Armstrong).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.