Purpose:

Nodule evaluation is challenging but critical to the diagnosis of multiple pulmonary nodules (MPNs). We aimed to develop and validate a machine learning–based model that estimates the probability of malignancy of MPNs to guide decision-making.

Experimental Design:

A boosted ensemble algorithm (XGBoost) was used to predict malignancy using the clinicoradiologic variables of 1,739 nodules from 520 patients with MPNs at a Chinese center. The model (PKU-M model) was trained using 10-fold cross-validation in which hyperparameters were selected and fine-tuned. The model was validated and compared with solitary pulmonary nodule (SPN) models, clinicians, and a computer-aided diagnosis (CADx) system in an independent transnational cohort and a prospective multicentric cohort.

Results:

The PKU-M model showed excellent discrimination [area under the curve (AUC), 0.909; 95% confidence interval (95% CI), 0.854–0.946] and calibration (Brier score, 0.122) in the development cohort. External validation (583 nodules) revealed that the AUC of the PKU-M model was 0.890 (0.859–0.916), higher than those of the Brock model [0.806 (0.771–0.838)], PKU model [0.780 (0.743–0.817)], Mayo model [0.739 (0.697–0.776)], and VA model [0.682 (0.640–0.722)]. Prospective comparison (200 nodules) showed that the AUC of the PKU-M model [0.871 (0.815–0.915)] was higher than those of three surgeons [0.790 (0.711–0.852), 0.741 (0.662–0.804), and 0.727 (0.650–0.788)], a radiologist [0.748 (0.671–0.814)], and the CADx system [0.757 (0.682–0.818)]. Furthermore, the model outperformed the clinicians, with an increase of 14.3% in sensitivity and 7.8% in specificity.

Conclusions:

After development using machine learning algorithms, validation in transnational multicentric cohorts, and prospective comparison with clinicians and a CADx system, this novel prediction model for MPNs showed solid performance and may serve as a convenient reference to aid decision-making.

Translational Relevance

This study developed the first prediction model for multiple pulmonary nodules (MPNs) using a novel machine learning algorithm. After external validation in independent transnational cohorts, the model showed excellent discrimination and calibration, outperforming existing multivariable risk prediction models (solitary pulmonary nodule or screening models). Furthermore, the model performed better than radiologists, surgeons, and a well-trained artificial intelligence diagnosis system in a prospective comparison. The model has been implemented as a web-based tool with which clinicians can obtain the estimated probability of malignancy by entering a patient's clinical and radiographic features. Therefore, the established model can conveniently guide precision diagnosis of MPNs before surgical treatment, thereby decreasing unnecessary invasive procedures.

Multiple pulmonary nodules (MPNs) are radiographic opacities up to 3 cm in the lung, with no associated atelectasis, hilar enlargement, or pleural effusion (1, 2). Because of the widespread use of thoracic CT, MPNs have become an increasingly recognized phenomenon, with a detection rate ranging from 6.8% to 50.9% in lung cancer screening trials (3–5). Previous studies have reported a benign rate as high as 40% among diagnostic operations after nodule detection (6–8), emphasizing the importance of careful nodule evaluation before invasive procedures to minimize surgical risk and reduce unnecessary pulmonary function loss. Unlike with solitary pulmonary nodules (SPNs), for which PET/CT and percutaneous biopsy can help in the differential diagnosis, the accuracy of PET/CT in MPNs, particularly in multiple ground-glass opacities (GGOs), is unsatisfactory (6, 9). In addition, performing a biopsy for every nodule shown on a CT scan is impossible. Consequently, a tool to pretest the malignancy of MPNs and guide subsequent management is needed.

Existing guidelines recommend several clinical prediction models based on logistic regression (10–13) to help estimate the risk of cancer before decision-making (1, 14, 15). Although these models have been externally validated, almost all focus on SPNs, with no evidence of accuracy in MPNs. The Brock model is the only one that includes an analysis of MPNs; however, it was designed for screening populations with an extremely low prevalence of cancer (3.7%–5.5%), limiting its usefulness for evaluating patients for surgery. In recent years, some artificial intelligence (AI) products have been applied clinically, including computer-aided detection/diagnosis (CADe/CADx) systems, which help identify or diagnose nodules on CT scans (16–18). However, none of these products have been verified in MPNs, leaving a lack of evidence regarding surgical evaluation of such nodules.

In this study, we sought to develop a web-based machine learning model that uses clinical and radiographic characteristics to predict the probability of malignancy exclusively for MPNs, to validate it in a contemporary, transnational, multicentric cohort, and to prospectively compare its performance with that of clinicians and a well-trained CADx system in another multicentric cohort.

Study design

MPNs are defined as more than one nodule in the lung parenchyma detected on thoracic CT scans, with each nodule measuring 4 to 30 mm. Our study comprised three key components (Fig. 1). First, we constructed a prediction model to pretest the nodule-level probability of malignancy, running 11 typical machine learning algorithms and selecting the best-performing one for further optimization. Second, the model was externally validated in an independent transnational MPN cohort to test its performance against the existing mathematical SPN models. Third, we conducted a prospective comparison of predictive accuracy among our model, clinicians, and a well-trained CADx product using another registered multicentric MPN cohort.

Figure 1.

Study design and flowchart of the study.


To complete this study, multiple patient cohorts were acquired as follows.

The development cohort comprised consecutive patients with newly discovered MPNs on thin-slice thoracic CT scans who had been treated at Peking University People's Hospital (Beijing, China) from January 2007 to December 2018. All the nodules found on CT scans were labeled and recorded to develop the prediction model. Notably, the diagnosis of lung cancer was made by pathologic examination after surgical resection. A nodule was considered benign on the basis of pathologic examination after resection or, if it did not undergo a procedural biopsy, radiographic surveillance for at least 2 years. Including nodules without a definite pathology increased the missing data in the development process but minimized both the bias in the spectrum of risk encountered by clinicians and the additional bias arising from nodules not undergoing resection.

The independent validation cohort comprised consecutive patients with MPNs who had undergone surgical treatment at six independent transnational hospitals [Beijing Haidian Hospital (China), Beijing Chuiyangliu Hospital (China), Aerospace 731 Hospital (China), Beijing Aerospace General Hospital (China), Shijiazhuang People's Hospital (China), Seoul National University Hospital (Korea)] from January 2016 to December 2018. Considering that including nodules without resection would increase the bias in the accurate validation process, only nodules with a definite pathology were recorded to validate the model.

The prospective comparison cohort was registered before the initiation of the study and comprised consecutive patients with MPNs who had undergone surgical treatment at four independent hospitals [Beijing Haidian Hospital (China), Beijing Chuiyangliu Hospital (China), Aerospace 731 Hospital (China), Shijiazhuang People's Hospital (China)] from January 2019 to March 2019. Nodules with explicit pathology were labeled and recorded.

All CT scans were obtained at 120 kVp, 40 to 60 mA, and rotation times of up to 1 second. Images were reconstructed at a 0.8-mm, 1.0-mm, 1.25-mm, or 1.5-mm slice width using standard mediastinal (width, 350 HU; level, 40 HU) and lung (width, 1500 HU; level, -600 HU) window-width and window-level settings. Patients with pneumonia, pleural effusion, or a benign calcification pattern (full or popcorn calcification) on thoracic CT scans were excluded, as were patients with a history of malignancy within 5 years, initial adjuvant therapy, or unavailable thoracic CT scans.

The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement (19) was used as the reporting guide for our study. This study was performed in accordance with the Helsinki Declaration and was approved by the Institutional Review Board (IRB) of each center (2019PHB111–01). The prospective comparison cohort was registered at ClinicalTrials.gov (NCT03795181). Informed consent was waived by the IRB because the study was observational and noninvasive.

Variable collection

The sociodemographic variables were extracted through the electronic medical record system and included age, sex, family history of lung cancer, smoking status, smoking quantity (pack-year), time since quitting smoking, and elevation of a subset of tumor markers [carcinoembryonic antigen (normal: <5 ng/mL), cytokeratin 19 fragment (normal: <30 ng/mL), neuron-specific enolase (normal: <12.5 U/mL), cancer antigen 199 (normal: <27 U/mL), cancer antigen 125 (normal: <35 U/mL)].

The radiographic characteristics of the MPNs on CT scans were extracted by two board-certified clinicians (a radiologist with 10 years of experience and a thoracic surgeon with 15 years of experience) who were blinded to the final pathologic diagnoses and our model development process; any conflicts were adjudicated by a third investigator. The radiographic characteristics included the nodule size (maximum transverse size measured on the lung window setting), visual nodule type (pure GGO, part-solid, or solid), nodule location (upper, middle or lower lobe; left or right side), nodule distribution (unilateral or bilateral), and nodule count per scan. The presence of emphysema, spiculation, lobulation, pleural retraction sign, unclear border, and calcification component was also recorded. Perifissural nodules were excluded because they were demonstrated to be benign in previous studies (13–15).

Machine learning

Model selection

A random subset of four fifths of the nodules in the development cohort was selected as the training set, and the remaining nodules comprised the test set. We ran a series of typical and recent machine learning algorithms [adaptive boosting (AdaBoost), decision tree, logistic regression, linear support vector machine (Linear SVM), radial basis function kernel support vector machine (RBF SVM), naive Bayes, nearest neighbors, neural net, quadratic discriminant analysis (QDA), random forest, and extreme gradient boosting (XGBoost)] on the training set, using all the sociodemographic and radiographic variables as features. Each model was trained 50 times at its default settings and evaluated on the test set to compare performance (Fig. 2A). Finally, XGBoost (20), a tree boosting algorithm that has gained wide popularity in the machine learning community for its scalability, running speed, and state-of-the-art accuracy, achieved the best average performance in the test; thus, it was selected for further optimization.
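The selection step can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: synthetic data from `make_classification` stands in for the real clinicoradiologic features, and only a subset of the 11 listed algorithms is shown.

```python
# Sketch of algorithm selection: fit each candidate classifier on a random
# 4/5 training split and compare test-set AUCs at default settings.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the 1,739-nodule development cohort.
X, y = make_classification(n_samples=1739, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "AdaBoost": AdaBoostClassifier(),
    "Decision tree": DecisionTreeClassifier(),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Random forest": RandomForestClassifier(),
}
aucs = {}
for name, clf in candidates.items():
    clf.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
best = max(aucs, key=aucs.get)
```

In the paper's setup, each algorithm was additionally retrained 50 times and the resulting AUC distributions were compared with boxplots (Fig. 2A).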

Figure 2.

Selection of machine learning algorithms and performance metrics of models in the development cohort. A, Selection of the machine learning algorithms based on their performance. All the models were trained and tested 50 times. The AUCs are illustrated by the boxplot. The XGBoost algorithm had the best AUC, so it was selected for further analysis. B, Performance of the PKU-M model by ROC curve analysis. C, Sensitivity and specificity versus cut-off probability plot of the PKU-M model. Decreasing sensitivity and increasing specificity are shown for increasing probability thresholds for malignancy, with a histogram for the distribution of the predicted probabilities. D, Feature importance plot for the PKU-M model. All the features are shown in this figure. The blue and red points in each row represent nodules having low to high values of the specific feature, while the x-axis shows the SHAP value, indicating the impact on the model [i.e., does it tend to drive the predictions towards malignancy (positive value of SHAP) or benignity (negative value of SHAP)?]. Acc, accuracy; AdaBoost, adaptive boosting; FN, false negative; FP, false positive; RBF SVM, radial basis function kernel SVM; QDA, quadratic discriminant analysis; TN, true negative; TP, true positive.


Model construction

The development cohort was randomly divided into 10 equal-sized folds, of which eight constituted the training set, one the tuning set, and one the test set. Optimal model hyperparameters (e.g., the number of trees or the depth of each tree) were selected and fine-tuned by grid search using a 10-fold cross-validation procedure. The early-stopping method was used to control overfitting (21). Finally, the machine learning model (PKU-M model) was developed and evaluated in the test set. Feature ranking was obtained by computing Shapley Additive Explanations (SHAP) values (22), which help explain how a single feature affects the output of the model. All the machine learning procedures were implemented in Python using open-source libraries.
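The tuning loop can be sketched as follows. This is a hedged illustration: scikit-learn's `GradientBoostingClassifier` stands in for XGBoost (the actual model used the xgboost library, and feature ranking used the shap package), the grid values are illustrative, and synthetic data replaces the development cohort.

```python
# Sketch of model construction: grid-search hyperparameters with 10-fold
# cross-validation, with early stopping on a held-out validation fraction.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=15, random_state=1)

# Illustrative grid; the real search would cover more values.
param_grid = {"n_estimators": [50, 150], "max_depth": [2, 3]}

base = GradientBoostingClassifier(
    validation_fraction=0.1,  # internal tuning split for early stopping
    n_iter_no_change=10,      # stop when validation loss stops improving
    random_state=1,
)
search = GridSearchCV(base, param_grid, cv=10, scoring="roc_auc")
search.fit(X, y)
best_model = search.best_estimator_  # final model, evaluated on the test fold
```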

Statistical analysis

Correlations between malignancy and the categorical variables were analyzed using the Fisher exact test or Pearson χ2 test. The Student t test or Mann–Whitney U test was used for continuous variables, which were expressed as means ± SD and medians with ranges. Receiver operating characteristic (ROC) curve analysis was performed to evaluate discrimination, and the areas under the curve (AUCs) were reported with 1,000-resample bootstrap bias-corrected 95% confidence intervals (95% CI; ref. 23). The Youden index, which maximizes the sum of the sensitivity and specificity, was used to define the cut-off probability for calculating the sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV; ref. 24). The Brier score (on a scale from 0 to 1), which measures the difference between the estimated and observed risk of malignancy, with values closer to 0 indicating better calibration, was used to evaluate model calibration.
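The three evaluation ingredients, a bootstrap 95% CI for the AUC, the Youden index cut-off, and the Brier score, can be sketched on synthetic predictions. Note one simplification: the paper used bias-corrected bootstrap intervals, whereas the percentile method below is the plainer variant.

```python
# Sketch of the evaluation metrics on synthetic labels and probabilities.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, brier_score_loss

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_prob = np.clip(y_true * 0.4 + rng.uniform(0, 0.6, size=500), 0, 1)

# Bootstrap 95% CI for the AUC (percentile method, 1,000 resamples).
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))
    if len(np.unique(y_true[idx])) == 2:  # resample must contain both classes
        boot.append(roc_auc_score(y_true[idx], y_prob[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])

# Youden index: threshold maximizing sensitivity + specificity - 1 = tpr - fpr.
fpr, tpr, thresholds = roc_curve(y_true, y_prob)
cutoff = thresholds[np.argmax(tpr - fpr)]

# Brier score: mean squared difference between predicted and observed outcome.
brier = brier_score_loss(y_true, y_prob)
```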

External validation was conducted using the independent validation cohort. The AUC of the PKU-M model was compared with those of four prior mathematical models: the Mayo model (10), Veterans Affairs (VA) model (11), Peking University (PKU) model (12), and Brock model (13). Paired comparisons of AUCs were computed using the same nonparametric bootstrap technique as that used for the 95% CIs (23). Decision curve analyses, which show the net benefit of using a model at different thresholds, were performed to evaluate the clinical value of the models (25, 26).
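Decision curve analysis reduces to a simple formula: at threshold probability pt, net benefit = TP/N - FP/N × pt/(1 - pt). A minimal sketch on synthetic predictions (illustrative data, not the validation cohort):

```python
# Sketch of a decision curve: net benefit of a model across thresholds,
# compared with the "treat all" strategy.
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit = TP/N - FP/N * (pt / (1 - pt)) at threshold pt."""
    n = len(y_true)
    pred = y_prob >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    return tp / n - fp / n * (threshold / (1 - threshold))

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)
p = np.clip(y * 0.5 + rng.uniform(0, 0.5, 300), 0.01, 0.99)

thresholds = np.linspace(0.05, 0.95, 19)
model_nb = [net_benefit(y, p, t) for t in thresholds]
treat_all = [net_benefit(y, np.ones_like(p), t) for t in thresholds]
```

Plotting `model_nb` and `treat_all` against `thresholds` (plus the zero line for "treat none") reproduces the shape of curves like Fig. 3B.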

A comparison among the PKU-M model, four clinicians (a chest radiologist with 5 years of experience and three thoracic surgeons with 3, 5, and 10 years of experience, respectively), and a CADx product (RX) based on a three-dimensional convolutional neural network (CNN) and developed by a Chinese technology company was conducted to compare diagnostic accuracy in the prospective comparison cohort. The clinicians were informed that they were participating in a study to predict malignancy of MPNs. They were asked (i) to estimate the probability of malignancy for each target nodule and (ii) to judge the malignant risk of each nodule in three categories: high, medium, and low risk. The clinicians had access to the sociodemographic data and were allowed to evaluate CT scans by changing the level and width of the window on the monitor without limitation in reading time, but they were blinded to the pathology and the entire model development process. The consistency of risk prediction among the clinicians was assessed by Kendall's coefficient of concordance (Kendall's W; ref. 27). In addition, CT scans were imported from the Picture Archiving and Communication System (PACS) into RX, which simultaneously presented the predicted probability and malignant risk category for each target nodule. A detailed description of RX is given in the Supplementary File as a reference. Finally, the estimations of the PKU-M model, clinicians, and RX were compared. To calculate the sensitivity and specificity, only a judgment of "high risk" was considered "test positive" (positive for malignancy). The sensitivity and specificity were compared using the McNemar χ2 test.
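The two statistics used here can be sketched from their definitions: Kendall's W = 12S / [m²(n³ - n)] for m raters ranking n subjects (no ties), and the continuity-corrected McNemar χ² on discordant-pair counts. Toy ratings stand in for the clinicians' judgments, and the discordant counts below are hypothetical.

```python
# Sketch of Kendall's W (inter-rater concordance) and the McNemar test
# (paired comparison of sensitivity/specificity).
import numpy as np
from scipy.stats import chi2

def kendalls_w(ratings):
    """ratings: (m raters, n subjects) score matrix; returns W in [0, 1]."""
    m, n = ratings.shape
    # Convert each rater's scores to ranks 1..n, then sum ranks per subject.
    ranks = np.argsort(np.argsort(ratings, axis=1), axis=1) + 1.0
    r = ranks.sum(axis=0)
    s = np.sum((r - r.mean()) ** 2)  # sum of squared deviations of rank sums
    return 12 * s / (m ** 2 * (n ** 3 - n))

def mcnemar_p(b, c):
    """b, c: counts of the two discordant pairs; chi-squared test, 1 df."""
    stat = (abs(b - c) - 1) ** 2 / (b + c)  # with continuity correction
    return chi2.sf(stat, df=1)

rng = np.random.default_rng(0)
ratings = rng.uniform(0, 1, size=(4, 30))  # 4 raters scoring 30 nodules
w = kendalls_w(ratings)
p = mcnemar_p(25, 10)  # hypothetical discordant-pair counts
```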

To explore the generalizability of the PKU-M model in patients with SPNs, we collected an exploration cohort comprising consecutive patients with SPNs who had undergone surgical resection at Peking University People's Hospital (China) from October 2018 to December 2018. The PKU-M model was validated using this cohort and compared with the four prior mathematical models mentioned above.

All statistical analyses were performed using STATA/MP (version 16.0) and R software (version 3.6.3).

Five hundred twenty patients were included in the development cohort, with 1,739 nodules found on CT scans, of which 876 nodules (50.4%) were confirmed to be malignant. The independent validation cohort comprised 220 patients with 583 nodules having an explicit pathologic diagnosis, of which 318 nodules (54.5%) were malignant. The prospective comparison cohort included 78 patients with 200 nodules having a definite pathologic diagnosis, of which 126 nodules (63.0%) were malignant. The patient sociodemographic characteristics are shown in Supplementary Table S1, and the nodule variables are mentioned in Table 1.

Table 1.

Distribution of nodule variables according to lung malignancy status (nodule-level data).

Columns within each cohort: Benign, Malignancy, Total, P.
Development cohort: Benign, N = 863; Malignancy, N = 876; Total, N = 1,739.
Independent validation cohort: Benign, N = 265; Malignancy, N = 318; Total, N = 583.
Prospective comparison cohort: Benign, N = 74; Malignancy, N = 126; Total, N = 200.
Nodule location 
 Left upper lobe 199 (48.7%) 210 (51.3%) 409 (23.5%) 0.620 51 (44.0%) 65 (56.0%) 116 (19.9%) 0.143 10 (27.8%) 26 (72.2%) 36 (18.0%) 0.203 
 Left lower lobe 144 (49.7%) 146 (50.3%) 290 (16.7%)  45 (49.5%) 46 (50.5%) 91 (15.6%)  18 (52.9%) 16 (47.1%) 34 (17.0%)  
 Right upper lobe 241 (47.5%) 266 (52.5%) 507 (29.2%)  63 (38.0%) 103 (62.0%) 166 (28.5%)  26 (33.8%) 51 (66.2%) 77 (38.5%)  
 Right middle lobe 96 (52.8%) 86 (47.2%) 182 (10.5%)  32 (53.3%) 28 (46.7%) 60 (10.3%)  7 (31.8%) 15 (68.2%) 22 (11.0%)  
 Right lower lobe 183 (52.1%) 168 (47.9%) 351 (20.2%)  74 (49.3%) 76 (50.7%) 150 (25.7%)  13 (41.9%) 18 (58.1%) 31 (15.5%)  
Nodule distribution 
 Unilateral 288 (40.5%) 424 (59.6%) 712 (40.9%) <0.001 126 (47.2%) 141 (52.8%) 267 (45.8%) 0.439 36 (38.7%) 57 (61.3%) 93 (46.5%) 0.641 
 Bilateral 575 (56.0%) 452 (44.0%) 1027 (59.1%)  139 (44.0%) 177 (56.0%) 316 (54.2%)  38 (35.5%) 69 (64.5%) 107 (53.5%)  
Nodule type 
 Pure GGO 388 (54.3%) 326 (45.7%) 714 (41.1%) <0.001 101 (51.0%) 97 (49.0%) 198 (34.0%) <0.001 39 (42.9%) 52 (57.1%) 91 (45.5%) <0.001 
 Part-solid 37 (10.5%) 315 (89.5%) 352 (20.2%)  9 (5.9%) 144 (94.1%) 153 (26.2%)  5 (7.9%) 58 (92.1%) 63 (31.5%)  
 Solid 438 (65.1%) 235 (34.9%) 673 (38.7%)  155 (66.8%) 77 (33.2%) 232 (39.8%)  30 (65.2%) 16 (34.8%) 46 (23.0%)  
Border 
 Unclear 145 (27.2%) 388 (72.8%) 533 (30.7%) <0.001 34 (27.2%) 91 (72.8%) 125 (21.4%) <0.001 26 (23.9%) 83 (76.1%) 109 (54.5%) <0.001 
 Clear 718 (59.5%) 488 (40.5%) 1,206 (69.3%)  231 (50.4%) 227 (49.6%) 458 (78.6%)  48 (52.8%) 43 (47.2%) 91 (45.5%)  
Spiculation 
 Yes 103 (23.6%) 333 (76.4%) 436 (25.1%) <0.001 23 (20.2%) 91 (79.8%) 114 (19.5%) <0.001 10 (34.5%) 19 (65.5%) 29 (14.5%) 0.761 
 No 760 (58.3%) 543 (41.7%) 1,303 (74.9%)  242 (51.6%) 227 (48.4%) 469 (80.5%)  64 (37.4%) 107 (62.6%) 171 (85.5%)  
Lobulation 
 Yes 34 (14.2%) 206 (85.8%) 240 (13.8%) <0.001 13 (18.6%) 57 (81.4%) 70 (12.0%) <0.001 12 (19.1%) 51 (80.9%) 63 (31.5%) <0.001 
 No 829 (55.3%) 670 (44.7%) 1,499 (86.2%)  252 (49.1%) 261 (50.9%) 513 (88.0%)  62 (45.3%) 75 (54.7%) 137 (68.5%)  
Pleural retraction sign 
 Yes 83 (24.2%) 260 (75.8%) 343 (19.7%) <0.001 19 (20.9%) 72 (79.1%) 91 (15.6%) <0.001 16 (23.9%) 51 (76.1%) 67 (33.5%) 0.006 
 No 780 (55.9%) 616 (44.1%) 1,396 (80.3%)  246 (50.0%) 246 (50.0%) 492 (84.4%)  58 (43.6%) 75 (56.4%) 133 (66.5%)  
Calcification 
 Yes 42 (87.5%) 6 (12.5%) 48 (2.8%) <0.001 29 (96.7%) 1 (3.3%) 30 (5.2%) <0.001 1 (50.0%) 1 (50.0%) 2 (1.0%) 1.00 
 No 821 (48.6%) 870 (51.4%) 1,691 (97.2%)  236 (42.7%) 317 (57.3%) 553 (94.8%)  73 (36.9%) 125 (63.1%) 198 (99.0%)  
Nodule size 
 Mean ± SD 6.7 ± 3.7 12.7 ± 6.8 9.7 ± 6.2 <0.001 6.8 ± 4.1 12.5 ± 7.2 9.9 ± 6.7 <0.001 6.5 ± 4.0 11.7 ± 6.2 9.8 ± 6.0 <0.001 
 Median (range) 6 (4–30) 10 (4–30) 7 (4–30)  5 (4–28) 10 (4–30) 7 (4–30)  5 (4–30) 10 (4–29) 7 (4–30)  
Nodule count 
 Mean ± SD 6.2 ± 6.0 3.7 ± 2.1 5.0 ± 4.7 <0.001 7.1 ± 8.1 5.3 ± 6.6 6.2 ± 7.4 <0.001 5.1 ± 3.7 6.0 ± 5.0 5.6 ± 4.6 0.941 
 Median (range) 4 (2–28) 3 (2–14) 4 (2–28)  5 (2–40) 3 (2–40) 4 (2–40)  4 (2–18) 4 (2–18) 4 (2–18)  

Model performance

The machine learning model achieved excellent performance in predicting the probability of malignancy for MPNs, with an AUC of 0.909 (95% CI, 0.854–0.946; Fig. 2B). The Brier score was 0.122, indicating a minimal difference between the predicted and observed probabilities of malignancy and good model calibration. As illustrated in Fig. 2C, sensitivity decreases and specificity increases with an increasing probability threshold for malignancy, with a histogram showing the distribution of the predicted probabilities. We defined an optimal cut-off probability of 0.447 for the PKU-M model according to the Youden index, at which the sensitivity, specificity, PPV, NPV, and accuracy were 0.807, 0.849, 0.845, 0.811, and 0.828, respectively.

SHAP values revealed the distribution of the impact each feature had on the PKU-M model output (Fig. 2D). Nodule size, nodule type, nodule count, border, age, spiculation, lobulation, emphysema, nodule location, and nodule distribution were the top 10 most predictive features in the model. A larger nodule size, a part-solid or pure GGO type, an unclear border, older age, spiculation, lobulation, and unilateral nodule distribution were associated with malignant nodules, whereas a solid type, a larger nodule count, and emphysema were associated with benign nodules.

External validation

The independent validation cohort was used for external validation. The PKU-M model performed well, with an AUC of 0.890 (95% CI, 0.859–0.916), significantly higher than the AUCs of the Brock model (0.806; 95% CI, 0.771–0.838), PKU model (0.780; 95% CI, 0.743–0.817), Mayo model (0.739; 95% CI, 0.697–0.776), and VA model (0.682; 95% CI, 0.640–0.722; P < 0.001 for all; Fig. 3A). The AUC of the PKU-M model was consistently better than those of the other models at each center (Supplementary Fig. S1). The Brier score was 0.1426, lower than the scores of the Brock model (0.400), PKU model (0.216), Mayo model (0.366), and VA model (0.370). Decision curve analysis (Fig. 3B) showed that the net benefit of the PKU-M model exceeded that of all the other models, indicating better clinical impact across a wide range of probability thresholds.

Figure 3.

External validation. A, Comparison of the performance of the PKU-M model, Brock model, PKU model, Mayo model, and VA model by ROC curve analysis in the independent validation cohort. B, Decision curves for the predictive probabilities. The PKU-M model has the highest net benefits at a wide range of cut-off thresholds among several models. C, Distribution of the probability of malignancy according to different models.


The distribution of the probabilities of malignancy according to the different models is shown in Fig. 3C. The PKU-M model discriminated between malignant and benign nodules at a wide range of probabilities. However, the differences in the probabilities between malignant and benign nodules estimated by other models were small, limiting their use in differentiation.

The performance was subsequently evaluated in subgroups stratified by sex, age, family history of lung cancer, smoking history, tumor marker, nodule distribution, nodule type, and nodule size, and was compared with that of the prior models (Supplementary Fig. S2). The PKU-M model showed excellent discrimination ability with an AUC higher than those of the Brock model, PKU model, Mayo model, and VA model in almost all the subgroups.

Prospective comparison

The prospective comparison cohort, comprising 200 nodules from 78 patients with MPNs, was used for prospective comparison. The AUC of the PKU-M model was 0.871 (95% CI, 0.815–0.915), significantly higher than the AUCs of the three thoracic surgeons [0.790 (95% CI, 0.711–0.852), P = 0.016; 0.741 (95% CI, 0.662–0.804), P < 0.001; and 0.727 (95% CI, 0.650–0.788), P < 0.001], the radiologist [0.748 (95% CI, 0.671–0.814), P < 0.001], and RX [0.757 (95% CI, 0.682–0.818), P < 0.001; Fig. 4A]. The results were consistent at each center and are shown in Supplementary Fig. S3.

Figure 4.

Prospective comparison. A, Comparison of the performance of the PKU-M model, three thoracic surgeons, one radiologist, and RX by ROC curve analysis in the prospective comparison cohort. B, Sensitivity and specificity of the PKU-M model, average clinicians and RX. C, Sensitivity and specificity comparison of the model, clinicians, and RX by McNemar χ2 test.


The sensitivity and specificity of the PKU-M model at the Youden index cut-off, along with those of the clinicians and RX according to the risk categories they assigned, are displayed in Fig. 4B. The risk categories assigned by the four clinicians are shown in Supplementary Fig. S4. The radiologist and the thoracic surgeon with 3 years of experience classified more nodules as medium risk than the other two thoracic surgeons, indicating a substantial number of nodules that seemed challenging to diagnose subjectively. Kendall's W was 0.469, suggesting only moderate consistency of risk judgment among the four clinicians. By the McNemar χ2 test, the PKU-M model outperformed the average clinician with an absolute increase of 0.143 (0.091–0.195; P < 0.001) in sensitivity and 0.078 (0.010–0.145; P = 0.018) in specificity. The PKU-M model had sensitivity and specificity comparable with those of RX. RX had specificity similar to that of the average clinician but outperformed the average clinician with an absolute increase of 0.103 (P < 0.001) in sensitivity (Fig. 4C).

Generalizability exploration

To preliminarily explore the generalizability of the new model, we evaluated it in the exploration cohort, which comprised 195 surgical patients with SPNs, of whom 123 (63.1%) had lung cancer (Supplementary Table S2). Surprisingly, the PKU-M model maintained remarkable discrimination, with an AUC of 0.871 (95% CI, 0.815–0.917; Supplementary Fig. S5), comparable with that of the PKU model (0.812; 95% CI, 0.750–0.871; P = 0.105) but significantly higher than those of the Brock model (0.780; 95% CI, 0.713–0.841; P = 0.013), Mayo model (0.762; 95% CI, 0.689–0.827; P = 0.006), and VA model (0.681; 95% CI, 0.597–0.755; P < 0.001). This result illustrates the excellent generalizability of our model and broadens its scope of application to patients with SPNs.

Web-based model

For user-friendly access, the PKU-M model was also implemented as a web-based model. Figure 5 shows a screenshot of the model, which is available at https://mpn.pkuphmodel.com/. Users can estimate the probability of cancer by filling in the patient and nodule characteristics.

Figure 5.

Screenshot of the web-based model. Screenshot of the PKU-M model, which is available at https://mpn.pkuphmodel.com/. This figure presents an example: A 64-year-old male patient with a family history of lung cancer had a smoking history (50 pack-year) and an elevated CEA value. An incidental thoracic CT scan discovered two nodules scattered bilaterally in the lung. The nodule located in the right lower lobe was 21 mm and presented spiculation, lobulation, the pleural retraction sign, and the solid type. The predicted probability of cancer by the PKU-M model was 87.4%. After surgical resection, pathologic examination confirmed squamous cell carcinoma.


Discussion

Clinicians usually face a dilemma when managing pulmonary nodules: surgical resection poses a risk of morbidity to patients, whereas CT surveillance may delay diagnosis and treatment. The problem becomes more serious with MPNs. It is often impractical to biopsy or resect each nodule detected on CT scans because the distribution is mostly bilateral and scattered. It is also impossible to remove all the nodules, particularly in patients with marginal pulmonary function and preoperative comorbidities, because each additional procedure increases the likelihood of a poor outcome. Current guidelines recommend a risk-based algorithm to guide decision-making: a wait-and-see strategy when the risk is low and resection when the risk is high (1, 14, 15). However, no tools have been designed for risk prediction of MPNs at the point of surgical evaluation, causing confusion in managing such nodules. In this study, we used machine learning as a novel analytic approach to establish the first model that can estimate the probability of malignancy for MPNs. The machine learning model showed consistently excellent discrimination and calibration, even in external transnational validation, helping to automate the process of nodule evaluation, particularly for patients with MPNs.

Several models have been reported to predict malignancy for incidental SPNs. The most widely used is the Mayo model (10), which is recommended by the American College of Chest Physicians (ACCP) guidelines (1). The British Thoracic Society (BTS) guidelines (14) for the management of pulmonary nodules also recommend using composite prediction models based on clinicoradiologic factors to estimate the malignant probability of a pulmonary nodule. However, no model has been established for MPNs. Given the heterogeneous characteristics of SPNs and MPNs, whether models for SPNs can be applied to MPNs is unknown. Therefore, we evaluated the predictive accuracy for MPNs of three representative, extensively externally validated SPN models (10–12). All three displayed low accuracy, indicating that previous SPN models may not be suitable for evaluating MPNs.
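For context, the Mayo model referenced above is a logistic regression over six clinicoradiologic predictors. A sketch using the coefficients published by Swensen and colleagues (ref 10) follows; readers should confirm the coefficients against the original publication before any clinical use:

```python
# Mayo model (Swensen et al., Arch Intern Med 1997): logistic regression
# estimating pretest malignancy probability of a solitary pulmonary nodule.
# age in years; diameter in mm; remaining arguments are binary (1 = present):
# current/former smoker, extrathoracic cancer >= 5 years prior, spiculated
# margin, upper-lobe location.
import math

def mayo_probability(age, diameter_mm, smoker, prior_cancer, spiculation, upper_lobe):
    x = (-6.8272
         + 0.0391 * age
         + 0.7917 * smoker
         + 1.3388 * prior_cancer
         + 0.1274 * diameter_mm
         + 1.0407 * spiculation
         + 0.7838 * upper_lobe)
    return 1 / (1 + math.exp(-x))  # logistic link
```

For a 64-year-old smoker with a 21-mm spiculated lower-lobe nodule, the estimate is roughly 0.55; note how a single extra feature (e.g., spiculation) shifts the odds multiplicatively, which is the behavior such composite models encode.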

Machine learning algorithms are specifically suited to finding associations in data beyond the reach of the one-dimensional statistical approaches currently used (e.g., logistic regression). The increasing availability of computational power and storage allows machine learning algorithms to analyze more complex data and produce output instantaneously (28, 29). XGBoost is a popular recent algorithm that has won numerous machine learning competitions (20). In our study, XGBoost outperformed the other 10 algorithms, including logistic regression, and was therefore selected to construct the model. The PKU-M model was developed after considering all the candidate variables derived from previously published models. Additional variables specific to MPNs, such as nodule count and nodule distribution (bilateral or unilateral), were also analyzed. Furthermore, because pulmonary ground-glass opacities (GGOs) have been increasingly detected owing to the wide adoption of thoracic CT over the past 10 years, we also included nodule type (GGO/part-solid/solid). Another strength of our model is that the development cohort included nodules that underwent resection together with nodules under radiographic surveillance, minimizing bias in the spectrum of risk encountered by clinicians and the additional bias arising from unresected nodules. In contrast, all the nodules in the validation cohorts had an explicit pathology to ensure the accuracy of the validation process. The results of both the development and external validation cohorts indicated that the PKU-M model significantly outperformed the SPN models with high accuracy, even in subgroups stratified by several clinical and radiographic features.

Notably, the intended population for clinical application differs between our model and the Brock model (13), the only multivariable model that included some MPNs. The Brock model was established in a lung cancer screening population in which the majority (>90%) of nodules were benign. Most of these nodules presented typical imaging features that were relatively easy to assess by experienced doctors without assistance from a prediction model. In addition, probabilities estimated by models derived from populations with a lower malignancy prevalence tend to be lower than those estimated by models derived from populations with a high malignancy prevalence (14). In contrast, nodules in our development cohort had a higher prevalence of cancer (50%) because the patients were being evaluated by surgeons. Therefore, the PKU-M model has more clinical relevance than the Brock model at the point of surgical evaluation.

CADe/CADx systems have been reported to assist in detecting and diagnosing pulmonary nodules on CT scans. Although the false-positive rate remains a concern, these systems have decreased the workload of radiologists, particularly for patients with multiple small nodules. However, few devices for nodule malignancy evaluation have been validated with satisfactory results, apart from the report by the Google team (30) and a recent retrospective study by Baldwin and colleagues (31). Although these results are promising, such systems remain far from clinical application. First, the reported deep learning systems were based solely on convolutional neural networks (CNN) and lack clinical characteristics such as smoking history, tumor history, and age, which are also crucial for diagnosis. Second, the final clinical decisions are made by thoracic surgeons or respiratory physicians, not radiologists; thus, comparisons with these clinicians would be more persuasive. Third, Asian patients differ significantly from Western patients regarding pulmonary nodules, with many more GGOs (32). Finally, these two AI products were both developed using the National Lung Screening Trial (NLST) dataset of screening populations; whether they can perform well in patients with incidental pulmonary nodules, the most common scenario in clinical practice, remains unknown. Therefore, we conducted the first prospective comparison among clinicians, our machine learning model, and a CADx system (RX) for diagnosing pulmonary nodules, particularly multiple nodules. Because RX was developed by a Chinese company specializing in lung cancer diagnosis through image analysis and uses a three-dimensional CNN (33, 34) similar to those of the Google and Baldwin systems, we considered RX representative of current AI products for comparison with our model.
The AUC of the PKU-M model was significantly higher than that of each clinician and the CADx system, indicating that the XGBoost model discriminates MPNs better than clinicians and the CADx system. McNemar test showed that our model outperformed the clinicians with an absolute increase of 0.143 in sensitivity and 0.078 in specificity. In addition, the CADx system outperformed the clinicians with an absolute increase of 0.103 in sensitivity but had similar specificity. Considering that the CADx system in this study had been trained on only approximately 11,000 cases, its accuracy is expected to improve as it continues to learn, and its performance may eventually exceed that of prediction models (30, 31). At present, however, it is not the most validated approach to nodule evaluation.

When restricted to SPNs, the discrimination of the prior mathematical models improved, suggesting that they are more reliable in patients with SPNs than in those with MPNs. Encouragingly, we found that the PKU-M model also performs well for solitary nodules. Its AUC was 0.871, comparable with that of our previously established PKU model and higher than those of the Brock, Mayo, and VA models. Compared with prior models, the new model includes more clinical and radiographic features related to lung cancer, such as tumor markers, lobulation, and the pleural retraction sign, and it was developed with a state-of-the-art machine learning algorithm instead of logistic regression; thus, it can assess solitary nodules more comprehensively. Although larger multicentric datasets are needed for validation, this result preliminarily demonstrates the PKU-M model's generalizability to patients with SPNs.

Models are developed to assist in clinical diagnosis and should therefore be practical. Because of the nonlinear relationships between lung cancer and some clinicoradiologic variables, a nomogram is not suitable for this model. Moreover, nomograms require manual calculation according to a scale, which is somewhat complicated. Therefore, we designed a web-calculated version of the model: when diagnosing MPNs, clinicians need only input several clinicoradiologic characteristics, and the software automatically displays the risk of malignancy.

One limitation of this study is that it did not include PET/CT results, which may decrease the accuracy of the model. However, because a large proportion of nodules in this study were small (<1 cm) solid nodules or GGO/part-solid nodules, for which the accuracy of PET/CT is likely unsatisfactory (6, 9, 35, 36), PET/CT is not a routine examination in this setting; the established model can thus be used regardless of whether the patient has undergone PET/CT. A second limitation is that the generalizability of this model in heterogeneous Western populations is unclear. Several thoracic CT datasets of individuals of Western ancestry are publicly available and could in principle be used for validation; however, our model requires both CT images and detailed clinical information as input, and to our knowledge, none of the available public datasets meet this need. A future study is therefore expected to collect Western cohorts that meet the criteria to validate our model. A third limitation is that patients in this study were at high risk of cancer rather than drawn from a screening cohort, so whether the model can be applied to a broader population must be verified. However, in screening cohorts, most nodules are benign with distinctive features on CT scans that are easy to diagnose; the true dilemma that puzzles clinicians is presented by suspicious nodules like those included in this study. Finally, although we included as many clinicoradiologic features as possible, the model remains a generally comprehensive analysis method; because of the nodule heterogeneity among patients, it might not be accurate for an atypical individual. The role of the model is to provide a relatively reliable reference for clinical judgment; it cannot replace clinicians in making the final decision.

In summary, after development using a novel machine learning algorithm, validation with transnational multicentric cohorts, and prospective comparison with clinicians and a deep learning system, this first cancer prediction model specifically for MPNs demonstrated solid performance as a convenient reference to aid decision-making.

Y.T. Kim reported personal fees from Johnson and Johnson outside the submitted work. No disclosures were reported by the other authors.

K. Chen: Conceptualization, resources, formal analysis, funding acquisition, validation, investigation, methodology, writing–original draft, project administration, writing–review and editing. Y. Nie: Conceptualization, data curation, software, formal analysis, investigation, visualization, methodology, writing–original draft, writing–review and editing. S. Park: Conceptualization, resources, data curation, validation, investigation, writing–original draft, project administration. K. Zhang: Data curation, software, formal analysis, visualization, methodology. Y. Zhang: Software, formal analysis, investigation, visualization, methodology. Y. Liu: Software, formal analysis, visualization, methodology. B. Hui: Resources, Data curation, investigation. L. Zhou: Data curation, investigation. X. Wang: Investigation. Q. Qi: Investigation. H. Li: Investigation. G. Kang: Investigation. Y. Huang: Resources, data curation. Y. Chen: Resources, data curation. J. Liu: Resources, data curation. J. Cui: Resources, data curation. M. Li: Resources, data curation. I.K. Park: Resources, data curation. C.H. Kang: Resources, data curation. H. Shen: Data curation. Y. Yang: Resources, data curation. T. Guan: Data curation. Y. Zhang: Resources, data curation. F. Yang: Conceptualization, resources, supervision, validation, project administration, writing–review and editing. Y.T. Kim: Conceptualization, resources, data curation, supervision, validation, methodology, project administration, writing–review and editing. J. Wang: Conceptualization, resources, supervision, project administration, writing–review and editing.

We thank American Journal Experts (http://www.aje.cn) for providing language help during the preparation of this manuscript. This work was supported by the National Natural Science Foundation of China (no. 82072566, to K. Chen) and Peking University People's Hospital Research and Development Funds (RS2019-01, to K. Chen).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

References

1. Gould MK, Donington J, Lynch WR, Mazzone PJ, Midthun DE, Naidich DP, et al. Evaluation of individuals with pulmonary nodules: when is it lung cancer? Diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 2013;143:e93S–e120S.

2. Ost D, Fein AM, Feinsilver SH. Clinical practice. The solitary pulmonary nodule. N Engl J Med 2003;348:2535–42.

3. Field JK, Duffy SW, Baldwin DR, Whynes DK, Devaraj A, Brain KE, et al. UK Lung Cancer RCT Pilot Screening Trial: baseline findings from the screening arm provide evidence for the potential implementation of lung cancer screening. Thorax 2016;71:161–70.

4. Horeweg N, Scholten ET, de Jong PA, van der Aalst CM, Weenink C, Lammers JW, et al. Detection of lung cancer through low-dose CT screening (NELSON): a prespecified analysis of screening test performance and interval cancers. Lancet Oncol 2014;15:1342–50.

5. Pedersen JH, Ashraf H, Dirksen A, Bach K, Hansen H, Toennesen P, et al. The Danish randomized lung cancer CT screening trial—overall design and results of the prevalence round. J Thorac Oncol 2009;4:608–14.

6. Deppen S, Putnam JB Jr, Andrade G, Speroff T, Nesbitt JC, Lambright ES, et al. Accuracy of FDG-PET to diagnose lung cancer in a region of endemic granulomatous disease. Ann Thorac Surg 2011;92:428–32.

7. Kuo E, Bharat A, Bontumasi N, Sanchez C, Zoole JB, Patterson GA, et al. Impact of video-assisted thoracoscopic surgery on benign resections for solitary pulmonary nodules. Ann Thorac Surg 2012;93:266–72.

8. Deppen SA, Blume JD, Aldrich MC, Fletcher SA, Massion PP, Walker RC, et al. Predicting lung cancer prior to surgical resection in patients with lung nodules. J Thorac Oncol 2014;9:1477–84.

9. Cho H, Lee HY, Kim J, Kim HK, Choi JY, Um SW, et al. Pure ground glass nodular adenocarcinomas: are preoperative positron emission tomography/computed tomography and brain magnetic resonance imaging useful or necessary? J Thorac Cardiovasc Surg 2015;150:514–20.

10. Swensen SJ, Silverstein MD, Ilstrup DM, Schleck CD, Edell ES. The probability of malignancy in solitary pulmonary nodules. Application to small radiologically indeterminate nodules. Arch Intern Med 1997;157:849–55.

11. Gould MK, Ananth L, Barnett PG. A clinical model to estimate the pretest probability of lung cancer in patients with solitary pulmonary nodules. Chest 2007;131:383–8.

12. Li Y, Chen KZ, Wang J. Development and validation of a clinical prediction model to estimate the probability of malignancy in solitary pulmonary nodules in Chinese people. Clin Lung Cancer 2011;12:313–9.

13. McWilliams A, Tammemagi MC, Mayo JR, Roberts H, Liu G, Soghrati K, et al. Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med 2013;369:910–9.

14. Callister ME, Baldwin DR, Akram AR, Barnard S, Cane P, Draffan J, et al. British Thoracic Society guidelines for the investigation and management of pulmonary nodules. Thorax 2015;70:ii1–ii54.

15. MacMahon H, Naidich DP, Goo JM, Lee KS, Leung ANC, Mayo JR, et al. Guidelines for management of incidental pulmonary nodules detected on CT images: from the Fleischner Society 2017. Radiology 2017;284:228–43.

16. Ciompi F, Chung K, van Riel SJ, Setio AAA, Gerke PK, Jacobs C, et al. Towards automatic pulmonary nodule management in lung cancer screening with deep learning. Sci Rep 2017;7:46479.

17. Li W, Cao P, Zhao D, Wang J. Pulmonary nodule classification with deep convolutional neural networks on computed tomography images. Comput Math Methods Med 2016;2016:6215085.

18. Setio AA, Ciompi F, Litjens G, Gerke P, Jacobs C, van Riel SJ, et al. Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Trans Med Imaging 2016;35:1160–9.

19. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015;350:g7594.

20. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, California: Association for Computing Machinery; 2016. p. 785–94.

21. Zhang T, Yu B. Boosting with early stopping: convergence and consistency. Ann Stat 2005;33:1538–79.

22. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, California, USA: Curran Associates Inc.; 2017. p. 4768–77.

23. Pepe M, Longton G, Janes H. Estimation and comparison of receiver operating characteristic curves. Stata J 2009;9:1.

24. Fluss R, Faraggi D, Reiser B. Estimation of the Youden index and its associated cutoff point. Biom J 2005;47:458–72.

25. Steyerberg E, Vickers A, Cook N, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models. A framework for traditional and novel measures. Epidemiology 2010;21:128–38.

26. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006;26:565–74.

27. Richardson A. Nonparametric statistics for non-statisticians: a step-by-step approach by Gregory W. Corder, Dale I. Foreman. Int Stat Rev 2010;78:451–2.

28. Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med 2019;380:1347–58.

29. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 2015;13:8–17.

30. Ardila D, Kiraly AP, Bharadwaj S, Choi B, Reicher JJ, Peng L, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med 2019;25:954–61.

31. Baldwin DR, Gustafson J, Pickup L, Arteta C, Novotny P, Declerck J, et al. External validation of a convolutional neural network artificial intelligence tool to predict malignancy in pulmonary nodules. Thorax 2020;75:306–12.

32. Travis WD, Brambilla E, Noguchi M, Nicholson AG, Geisinger KR, Yatabe Y, et al. International Association for the Study of Lung Cancer/American Thoracic Society/European Respiratory Society International Multidisciplinary Classification of Lung Adenocarcinoma. J Thorac Oncol 2011;6:244–85.

33. Yang K, Liu J, Tang W, Zhang H, Zhang R, Gu J, et al. Identification of benign and malignant pulmonary nodules on chest CT using improved 3D U-Net deep learning framework. Eur J Radiol 2020;129:109013.

34. Liu K, Li Q, Ma J, Zhou Z, Sun M, Deng Y, et al. Evaluating a fully automated pulmonary nodule detection approach and its impact on radiologist performance. Radiol Artif Intell 2019;1:e180084.

35. Fu L, Alam MS, Ren Y, Guan W, Wu H, Wang Q, et al. Utility of maximum standard uptake value as a predictor for differentiating the invasiveness of T1 stage pulmonary adenocarcinoma. Clin Lung Cancer 2018;19:221–9.

36. Shao X, Niu R, Jiang Z, Shao X, Wang Y. Role of PET/CT in management of early lung adenocarcinoma. Am J Roentgenol 2020;214:437–45.