Single-cell RNA sequencing (scRNA-seq) technologies have been broadly utilized to reveal molecular mechanisms of respiratory pathology and physiology at single-cell resolution. Here, we established single-cell meta-analysis (scMeta-analysis) by integrating data from eight public datasets, including 104 lung scRNA-seq samples with clinicopathologic information and designated a cigarette-smoking lung atlas. The atlas revealed early carcinogenesis events and defined the alterations of single-cell transcriptomics, cell population, and fundamental properties of biological pathways induced by smoking. In addition, we developed two novel scMeta-analysis methods: VARIED (Visualized Algorithms of Relationships In Expressional Diversity) and AGED (Aging-related Gene Expressional Differences). VARIED analysis revealed expressional diversity associated with smoking carcinogenesis. AGED analysis revealed differences in gene expression related to both aging and smoking status. The scMeta-analysis paves the way to utilize publicly-available scRNA-seq data and provide new insights into the effects of smoking and into cellular diversity in human lungs, at single-cell resolution.

Significance:

The atlas revealed early carcinogenesis events and defined the alterations of single-cell transcriptomics, cell population, and fundamental properties of biological pathways induced by smoking.

Smoking is the leading risk factor for early death, and its negative effects present individual and public health hazards (1, 2). Cigarette smoke is a mixture of thousands of chemical compounds generated from tobacco burning (3) that causes chronic airway inflammation, reactive oxygen species (ROS) production, and DNA damage. Specifically, it has been discovered that smoking injures the respiratory organs and cardiovascular system and causes carcinogenesis, chronic obstructive pulmonary disease, and atherosclerosis (4). In particular, the incidence of lung squamous carcinoma (LUSC) is significantly increased by cigarette smoking (5, 6).

Single-cell RNA sequencing (scRNA-seq) technologies have been broadly utilized to reveal the molecular mechanisms of respiratory diseases and physiology at single-cell resolution. scRNA-seq in human lungs identified novel cell populations and cellular diversity (7–13). However, there are several concerns regarding scRNA-seq analysis. One of these concerns is sample size, that is, that clinical scRNA-seq analyses could be biased because of insufficient sample sizes. A possible solution is meta-analysis of scRNA-seq data. The recently developed single-cell meta-analysis (scMeta-analysis) method has been considered a powerful tool for large-scale analysis of integrated single-cell cohorts. The scMeta-analysis shows robust statistical significance and the capacity to compare the results among different studies at the single-cell level. In fact, integrated scMeta-analysis of a number of cohorts has revealed a previously unappreciated diversity of cell types and gene expression; for example, scMeta-analysis of lung endothelial cells, including human and mouse datasets, revealed novel endothelial cell populations (14–17). In addition, comparative analysis of scRNA-seq cohorts revealed pan-cancer tumor-specific myeloid lineages (18).

In this study, we integrated eight publicly available datasets comprising 104 lung scRNA-seq samples and analyzed a total of 230,890 single cells to construct a cigarette-smoking lung atlas. The scMeta-analysis of the cigarette-smoking lung atlas defined single-cell gene expression according to smoking, age, and gender. In addition, we developed novel scMeta-analysis methods: VARIED (Visualized Algorithms of Relationships In Expressional Diversity) analysis and AGED (Aging-related Gene Expressional Differences) analysis with clinical metadata. VARIED analysis revealed the diversity of gene expression associated with cancer-related events in each cell population, and AGED analysis revealed the expressional differences in relation to both aging and smoking status.

scRNA-seq Data Collection from Public Databases

The scRNA-seq cohorts were downloaded from the public Gene Expression Omnibus (GEO) and European Genome-Phenome Archive (EGA) databases (Supplementary Table S1). We collected scRNA-seq samples of human lungs for which smoking status information was available. From physiologic studies of the lung airway, all 10 never-smoker samples were extracted from the EGA00001004082 dataset (19), and one never-smoker and three smoker samples were extracted from the GSE130148 dataset (13). From idiopathic pulmonary fibrosis studies, five never-smoker and three smoker samples were extracted from a total of 17 samples in the GSE122960 dataset (20), one never-smoker and seven smoker samples were extracted from a total of 34 samples in the GSE135893 dataset (12), and 22 never-smoker and 23 smoker samples were extracted from a total of 78 samples in the GSE136831 dataset (11). From studies of lung disease in smokers, three never-smoker and three smoker samples were extracted from the GSE123405 dataset (21), and three never-smoker and nine smoker samples were extracted from the GSE173896 dataset (22). From lung cancer studies, four never-smoker and seven smoker samples were extracted from a total of 58 samples in the GSE131907 dataset (23). A total of 104 samples (never-smoker: 49, smoker: 56) were collected, and the details of the extracted samples are shown in Supplementary Table S2. These datasets were imported into R software version 4.2.0. (RRID:SCR_001905) and transformed into Seurat objects with the package Seurat version 4.3.0 (ref. 24; RRID:SCR_016341). The Seurat objects from the different datasets were then integrated in R.

Integration of Datasets, Data Quality Control, and Removal of Batch Effects

The integrated atlas of eight pubicly datasets was subjected to normalization, scaling, and principal component analysis with Seurat functions. Removal of low-quality cells was performed against the merged dataset before batch effect removal according to the following criteria (nFeature_RNA > 1,000 and percent.mt < 20). The expression counts of each sample were normalized by “SCTransform” method version 0.3.5. (RRID:SCR_022146; ref. 25). Doublet cells in the integrated atlas were removed by “DoubletFinder” method version 2.0.3. (refs. 26, 27; RRID:SCR_018771). To remove the batch effect between cohort studies, Harmony version 0.1.1. algorithms (RRID:SCR_022206) were applied to the integrated atlas (28, 29) following the instructions in the Quick start vignettes (https://portals.broadinstitute.org/harmony/articles/quickstart.html). cLISI score and iLISI score were calculated by “lisi” package (28) using cell annotation metadata or dataset metadata.

Cell Type Annotation and Cell-cycle Scoring

Clustering of neighboring cells was performed by the functions “FindNeighbors” and “FindClusters” from Seurat using Harmony reduction. First, the clusters were grouped on the basis of the expression of tissue compartment markers (e.g., EPCAM for epithelia, CLDN5 for endothelia, COL1A2 for fibroblasts, and PTPRC for immune cells) and then annotated in detail according to “A molecular cell atlas of the human lung” (7). Cell-cycle analysis was performed with the “CellCycleScoring” function of Seurat.

VARIED Analysis

To evaluate the expressional heterogeneity in the cell populations, we calculated the correlation coefficients for each cell population between smokers and never-smokers. In each cluster, normalized closeness centrality was calculated in R using ggraph version 2.1.0. (RRID:SCR_021239) and igraph version 1.3.5. packages (RRID:SCR_019225), as described previously (22, 30, 31). We calculated the differences of median of normalized closeness centrality between smokers and never-smokers in each cluster. Extraction of differentially expressional genes (DEG) between smoker basal-differentiated (basal-d) cells and never-smoker basal-d cells was performed by “FindAllMarkers” functions in Seurat where r is the absolute value of Pearson correlational coefficient and n is the number of cells in the cluster.

formula

Gene Set Variation Analysis Scoring and Survival Analysis

Gene set variation analysis (GSVA) was performed using RNA-seq dataset of The Cancer Genome Atlas (TCGA) LUSC cohorts by “GSVA” package version 1.44.5. (RRID:SCR_021058) in R (32). RNA-seq dataset and clinical information of the patients with LUSC were downloaded from TCGA Data Portal (ref. 33; RRID:SCR_014514). The signature genes of basal-d smoker clusters showed in Supplementary Tables S3 and S4. Scoring method by “gsva” algorithms was utilized for calculation of enrichment score in the patients with lung squamous cancer. We subjected clinical status and gene expression data to survival analysis using “survminer” version 0.4.9. (RRID:SCR_021094) and “survival” packages version 3.4.0 (RRID:SCR_021137) in R. TCGA dataset is available at https://portal.gdc.cancer.gov/.

Module Analysis

Module analysis was performed by the “AddModuleScore” function in Seurat using the gene lists from MSigDB (https://www.gsea-msigdb.org/gsea/msigdb/, RRID:SCR_016863). The epithelial-to-mesenchymal transition (EMT) module (HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION), heme metabolism module (HALLMARK_HEME_METABOLISM), ROS module (HOUSTIS_ROS), autophagy module (REACTOME_AUTOPHAGY), IFN signaling module (REACTOME_INTERFERON_SIGNALING), senescence module (REACTOME_CELLULAR_SENESCENCE), circadian module (REACTOME_CIRCADIAN_CLOCK), mitophagy module (REACTOME_MITOPHAGY), pyroptosis module (REACTOME_PYROPTOSIS), and ferroptosis module (WP_FERROPTOSIS) were subjected to module analysis in each cell population.

Pathway Enrichment Analyses and Ingenuity Pathway Analysis

We performed enrichment analysis against the marker gene list in each cluster between male and female smokers by the “ClusterProfiler” version 4.4.4. (RRID:SCR_016884; ref. 34) and “ReactomePA” version 1.40.10. (RRID:SCR_019316; ref. 35) packages in R. Gene symbols were converted to ENTREZ IDs using the “org.Hs.eg.db” package version 3.10.0. Pathway datasets were downloaded from the Reactome database (RRID:SCR_003485). Pathway enrichment analysis using the “enrichPathway” function was performed by the Benjamini-Hochberg (BH) method. Marker genes of the basal-d cluster in smokers and never-smokers were calculated by “FindMarkers” with the MAST method (RRID:SCR_016340; ref. 36). Enrichment analysis of basal-d was performed using QIAGEN Ingenuity Pathway Analysis (IPA) software (RRID:SCR_008653).

AGED Analysis

We calculated the average expression of all cells in each cluster in both smokers and never-smokers and performed regression analysis in correlation with gene expression and patient age by R. Next, we calculated the differences in slopes (delta) in smokers and never-smokers via regression analysis. We extracted the genes with the differences in slopes (Δ) value over 0.5 in three or more types of cell cluster. Extracted genes were shown in a heat map.

Data Visualization

The dimensionality-reduced cell clustering is shown as a Uniform Manifold Approximation and Projection (UMAP) plot by the function “runUMAP”. Heat maps were drawn by Morpheus from the Broad Institute (RRID:SCR_014975). A ridge plot was drawn using the “ggridges” version 0.5.4. package (RRID:SCR_024511) in R. Violin plots were drawn using the “ggplot2” version 3.4.0. package (RRID:SCR_014601) in R.

Statistical Analysis

Correlation coefficients were calculated by Spearman correlation in R. Welch t test or Tukey or Dunnett multiple comparison test was used for comparison of the datasets. log-rank test was used for survival analysis in R. Significance was defined as P < 0.05.

Data Availability Statement

scMeta-analysis data were available to the NCBI GEO database and EGA database. Detailed information is shown in Supplementary Table S1. The datasets GSE122960, GSE123405, GSE130148, GSE131907, GSE135893, GSE136831, and GSE173896 are available in the NCBI GEO database (https://www.ncbi.nlm.nih.gov/geo/). The EGAS00001004082 dataset is available in the EGA database (https://ega-archive.org/). The source code of scMeta-analysis is available on GitHub (https://github.com/JunNakayama/scMeta-analysis-of-cigarette-smoking). The integrated atlas is available in FigShare (https://figshare.com/projects/Single-cell_meta-analysis_of_cigarette_smoking_lung_atlas/175497). The rest of the data in the study are available from the corresponding authors.

Establishment of Integrated Single-cell Lung Atlas with Cigarette Smoking

According to scRNA-seq collection criteria, we chose eight publicly available datasets of lung scRNA-seq data to construct a cigarette-smoking lung atlas (Fig. 1A). To this end, we collected data from 374,658 single cells from 104 scRNA-seq samples (smoker: 55 samples, never-smoker: 49 samples; Fig. 1A). Integrated single-cell transcriptome data were linked with clinical metadata such as smoking status, age, gender, and race (Supplementary Tables S1 and S2; Supplementary Fig. S1A). In the process of quality control with Seurat in R, 143,768 low-quality single cells (nFeatures < 103 and mt.percent > 20%) were removed. Doublet cells were computationally removed by “DoubletFinder” algorithm (26). The single cells were normalized by SCTransform method in Seurat (25). Integration of the eight datasets was performed by the Harmony algorithm with the smoking status of scRNA-seq samples (ref. 28; Fig. 1B). To confirm that the integration of the eight datasets reduced bias, we showed the atlas marked with the datasets (Supplementary Fig. S1B). Median of cLISI scores is 1.00, median of iLISI score is 2.21, suggesting that the datasets are well mixed by Harmony (Supplementary Fig. S1C). The normalized dataset was integrated for removal of batch effect by Harmony algorisms (Supplementary Fig. S1B). The cigarette-smoking lung atlas is composed of a total of 230,890 single cells. The density plot showed that the majority of single cells in the atlas were immune cells and epithelial cells (Supplementary Fig. S1D). UMAP plots with cell type–specific markers (PTPRC as an immune marker, EPCAM as an epithelial marker, CLDN5 as an endothelial marker, and COL1A2 as a fibroblast marker) showed an obvious segregation of immune, epithelial, endothelial, and fibroblastic lineages (Fig. 1C and D). There were 118,364 single cells in the smoker group and 112,526 single cells in the never-smoker group (Supplementary Fig. S2A). Comparison of the atlases by smoking status revealed that most of the cell populations in the UMAP plot overlapped; however, parts of epithelial clusters were specific to the never-smoker group (Supplementary Fig. S2A). All major clusters seemed to overlap among the eight datasets (Supplementary Fig. S2B), although the populations of cells were different in each dataset (Fig. 1E). This difference in cell populations could be caused by differences in tissue collection and cell isolation processes.

FIGURE 1

Establishment of the integrated lung atlas with cigarette smoking status from eight scRNA-seq cohorts. A, Overview of the establishment of the integrated lung atlas with cigarette smoking status. The control lung scRNA-seq data from eight publicly available datasets were obtained and integrated with smoking status information. B, Flow diagram of the establishment of the integrated scRNA-seq lung atlas with cigarette smoking. Eight publicly available scRNA-seq datasets were downloaded and combined in Seurat. Doublet cells were removed by DoubletFinder. The datasets were normalized by SCTramsform, integrated by Harmony to adjust for batch effects. C, Representative marker expression patterns for the cell type clusters shown in the UMAP plot. D, A UMAP plot displaying 230,890 single human lung cells of 55 smokers and 49 never-smokers. Each dot represents a single cell, and cell clusters are classified as immune cells, epithelial cells, endothelial cells, and fibroblasts. E, Cell populations of immune cell, epithelial cell, endothelial cell, and fibroblast clusters across the 104 samples in the integrated atlas. Smokers, 55 cases; never-smokers, 49 cases.

FIGURE 1

Establishment of the integrated lung atlas with cigarette smoking status from eight scRNA-seq cohorts. A, Overview of the establishment of the integrated lung atlas with cigarette smoking status. The control lung scRNA-seq data from eight publicly available datasets were obtained and integrated with smoking status information. B, Flow diagram of the establishment of the integrated scRNA-seq lung atlas with cigarette smoking. Eight publicly available scRNA-seq datasets were downloaded and combined in Seurat. Doublet cells were removed by DoubletFinder. The datasets were normalized by SCTramsform, integrated by Harmony to adjust for batch effects. C, Representative marker expression patterns for the cell type clusters shown in the UMAP plot. D, A UMAP plot displaying 230,890 single human lung cells of 55 smokers and 49 never-smokers. Each dot represents a single cell, and cell clusters are classified as immune cells, epithelial cells, endothelial cells, and fibroblasts. E, Cell populations of immune cell, epithelial cell, endothelial cell, and fibroblast clusters across the 104 samples in the integrated atlas. Smokers, 55 cases; never-smokers, 49 cases.

Close modal

For detailed reannotations with all cell types in the integrated atlas, we first identified the cell types present within the atlas according to the lung cell markers in the human lung scRNA-seq atlas (ref. 7; Fig. 2; Supplementary Fig. S3). To investigate the cell types in further detail, we extracted subsets of “epithelia” (Fig. 2A), “fibroblasts” (Fig. 2B), “endothelia” (Fig. 2C), “lymphoids” (Fig. 2D), and “myeloids” (Fig. 2E) repeated the UMAP procedure with each subset, which comprised 39 subpopulations in total. There were 13 epithelial cell types (smoker: 24,084 cells, never-smoker: 53,754 cells; Supplementary Fig. S4), seven fibroblastic cell types (smoker: 3,081 cells, never-smoker: 1,592 cells; Supplementary Fig. S5), five endothelial cell types (smoker: 7,600 cells, never-smoker: 3,783 cells; Supplementary Fig. S6), six lymphoid cell types (smoker: 25,331 cells, never-smoker: 10,929 cells; Supplementary Fig. S7), and myeloid cell types (smoker: 55,398 cells, never-smoker: 40,824 cells; Supplementary Fig. S8).

FIGURE 2

Cell-type classification of the integrated lung atlas. UMAP plots for each cell type cluster. The UMAP plot of the cigarette-smoking lung atlas was divided into five UMAP plots based on the cell-type clusters. Heat maps of selected marker genes in each cell type cluster. Each cluster was defined according to marker expression profiles. A, Epithelia: 13 clusters. B, Fibroblasts: 7 clusters. C, Endothelial cells: 5 clusters. D, Lymphoid cells: 6 clusters. E, Myeloid cells: nine clusters. F, Relative cell number plots between smokers and never-smokers in 39 cell types in the integrated atlas. Welch t test; *, P < 0.05. Blue line: epithelial cell types, pink line: fibroblastic cell types, light blue line: endothelial cell types, light green line: lymphoid cell types, and green line: myeloid cell types.

FIGURE 2

Cell-type classification of the integrated lung atlas. UMAP plots for each cell type cluster. The UMAP plot of the cigarette-smoking lung atlas was divided into five UMAP plots based on the cell-type clusters. Heat maps of selected marker genes in each cell type cluster. Each cluster was defined according to marker expression profiles. A, Epithelia: 13 clusters. B, Fibroblasts: 7 clusters. C, Endothelial cells: 5 clusters. D, Lymphoid cells: 6 clusters. E, Myeloid cells: nine clusters. F, Relative cell number plots between smokers and never-smokers in 39 cell types in the integrated atlas. Welch t test; *, P < 0.05. Blue line: epithelial cell types, pink line: fibroblastic cell types, light blue line: endothelial cell types, light green line: lymphoid cell types, and green line: myeloid cell types.

Close modal

Cigarette smoking is known to induce alterations in cell populations in the lungs. For example, the number of basal-linage cells decreased (37), and the number of basophils increased (38) in smoking lungs. The atlas showed differences in the numbers of 39 cell subpopulations by smoking status (Fig. 2F). Evidently, the cell numbers of basal, basal-d, ionocyte, and proliferating epithelia clusters significantly decreased. Previous bulk studies have reported that the number of bronchial epithelial cells is altered by smoking (9, 37, 39). Consistent with these reports, our data confirmed that smoking had a devastating effect on epithelial cells in the bronchus and bronchiole. The integrated atlas confirmed the increase in basophil cell number with smoking. We also examined the cell cycle in each cell cluster. The cell cycle indices in each subpopulation were not obviously changed between the smoking and never-smoking groups (Supplementary Fig. S9).

VARIED Analysis Visualized Variations in Epithelial Populations and Basophils by Smoking Status

Cigarette smoking is the highest risk factor for carcinogenesis of squamous carcinoma in the bronchia and trachea of the lung (2, 5). To comprehensively understand the effects of smoking in the lung, we developed VARIED analysis to quantify the alteration in gene expressional diversity. VARIED analysis is based on the network centrality of a correlational network with graph theory in each single cell. Previously, we reported that the topology of cell-cell correlational network shows the alteration of cell population (22, 31). In this study, to identify the gene expressional alteration by smoking, we calculated the differences of closeness centrality between smoker and never-smoker. When the random sampling number of the cells is over 100 cells, the medians of VARIED converged to a certain value (Supplementary Fig. S10). Therefore, VARIED analysis needed over 100 cells to calculate the robust centralities. In this study, because all clusters have over 100 cells, we subjected VARIED analysis into the clusters of smoker and never-smoker. The differences in the centrality between smokers and never-smokers represent the alteration of gene expressional diversity in each cell cluster (Fig. 3A). VARIED analysis revealed greater diversity in epithelial clusters, suggesting that cigarette smoking primarily perturbed epithelial populations, particularly in the bronchia and trachea (Fig. 3B and C). These data are consistent with the fact that epithelial cells, located at the bronchia, are considered to be the origin of LUSC (40). Interestingly, the diversity in basophils was also remarkably altered by cigarette smoking. Basophils are known to be activated as a protective immunity against helminths and ticks by expression of cytokines and Immunoglobins (41). Basophils in the smokers expressed JUND, FOSB, IGHA1, IGHG1, IGHG3, FCGR3A, and S100A8 (Fig. 3D), suggesting their activation and IgG production in smoker lungs.

FIGURE 3

VARIED analysis for cellular variations by smoking status. A, Schematic of VARIED analysis for quantifying the alterations in gene expressional diversity between smokers and never-smokers. In each single cell from scRNA-seq, the closeness centrality was calculated in the cell types between smokers and never-smokers. A part of the figure was created with BioRender.com. B, Plot of absolute values of difference in centrality in each cell type cluster. Blue: epithelia, purple: immune cells, red: endothelia, and green: fibroblasts. C, Representative ridge plots for the closeness centrality between smokers and never-smokers. Welch t test. D, Expression profiles of activation marker gene in the basophil clusters. Box plots of JUND, FOSB, IGHA1, IGHG1, IGHG3, FCGR3A, S100A8, and TPSB2 between smokers and never-smokers in the basophil cluster. Welch t test.

FIGURE 3

VARIED analysis for cellular variations by smoking status. A, Schematic of VARIED analysis for quantifying the alterations in gene expressional diversity between smokers and never-smokers. In each single cell from scRNA-seq, the closeness centrality was calculated in the cell types between smokers and never-smokers. A part of the figure was created with BioRender.com. B, Plot of absolute values of difference in centrality in each cell type cluster. Blue: epithelia, purple: immune cells, red: endothelia, and green: fibroblasts. C, Representative ridge plots for the closeness centrality between smokers and never-smokers. Welch t test. D, Expression profiles of activation marker gene in the basophil clusters. Box plots of JUND, FOSB, IGHA1, IGHG1, IGHG3, FCGR3A, S100A8, and TPSB2 between smokers and never-smokers in the basophil cluster. Welch t test.

Close modal

To examine the molecular basis for diversity in gene expression, we extracted differentially expressed genes (DEG) in the basal-d cluster between smokers and never-smokers, focusing on basal-d because this cluster was the most influenced by cigarette smoking (Fig. 4; Supplementary Table S3). Enrichment analysis of the DEGs revealed that protein production–related signal, mitochondrial dysfunction pathways were significantly enriched in the smoker basal-d cluster (Fig. 4A and B; Supplementary Table S5). Our data indicate that smoking adversely affects bronchial epithelial cells and alters gene expressional diversity in carcinogenesis. The basal-d cluster in smoker significantly highly expressed ATF3, FOS, and JUN (Fig. 4C). The VARIED analysis suggested the early oncogenic events in bronchial and tracheal epithelial cells. In addition, the GSVA score using the signature genes of smoker basal-d cluster correlated to poor prognosis in LUSC cohorts in TCGA (Fig. 4D; Supplementary Table S5).

FIGURE 4

Cancer-related alteration in the basal-d cluster. Expression profiles of DEGs in the basal-d clusters between smokers and never-smokers. A, Gene and pathway networks of marker genes for the basal-d cluster. The network plot was generated by IPA. B, Enrichment analysis of marker genes for the basal-d cluster. Significantly enriched pathways are shown based on IPA data. C, Marker expression patterns in the smoker basal-d clusters. Violin plots of KRT5, ATF3, FOS, and JUN between smokers and never-smokers. Welch t test D. Survival analysis of LUSC cohorts in TCGA with basal-d smoker signature genes. GSVA scores using smoking basal-d signature genes were calculated by GAVA algorithm. log-rank test.

FIGURE 4

Cancer-related alteration in the basal-d cluster. Expression profiles of DEGs in the basal-d clusters between smokers and never-smokers. A, Gene and pathway networks of marker genes for the basal-d cluster. The network plot was generated by IPA. B, Enrichment analysis of marker genes for the basal-d cluster. Significantly enriched pathways are shown based on IPA data. C, Marker expression patterns in the smoker basal-d clusters. Violin plots of KRT5, ATF3, FOS, and JUN between smokers and never-smokers. Welch t test D. Survival analysis of LUSC cohorts in TCGA with basal-d smoker signature genes. GSVA scores using smoking basal-d signature genes were calculated by GAVA algorithm. log-rank test.

Close modal

Cigarette smoking affected genome-wide association study–related genes in LUSC

As the cigarette-smoking lung atlas provided high-resolution expression data in 39 cell types, we explored gene expression profiles from a genome-wide association study (GWAS) of LUSC with smoking (42). To identify the expressional patterns and the broad contributions of different lung cell types to squamous carcinoma susceptibility, the expression levels of an average of 92 GWAS genes were examined in all lung cell types (Supplementary Fig. S11A). High expression of squamous carcinoma GWAS genes was observed in the specific clusters, and cigarette smoking affected the expression of GWAS-related genes in some clusters. In particular, the expression of MUC1 was increased in the smoker epithelial clusters, and the expression of HLA-A was increased in the smoker myeloid clusters (Supplementary Fig. S11B). Mutated MUC1 has oncogenic roles in carcinogenesis in the human lung (43, 44). Truncating mutations in HLA-A carry a risk of dysregulation of cancer-related pathways (45).

Cancer-associated alterations induced by smoking

Next, we performed module analysis with cancer-related gene sets, such as senescence, ROS production, IFN signaling, heme metabolism, and EMT genes. The module analysis depicted the alteration of cancer-related events by smoking in each cluster (Fig. 5A). Several modules were drastically altered between the smoker and never-smoker groups, such as IFN signaling in endothelial and myeloid clusters; EMT in epithelial, fibroblastic, and endothelial clusters; and mitophagy in lymphoid and myeloid clusters. Because increased expression of EMT module genes in endothelial clusters was observed, we examined the expression of endothelial-to-mesenchymal transition (EndMT) marker genes (FN1, POSTN, VIM; refs. 17, 46). These EndMT markers were significantly upregulated, suggesting that smoking induced EndMT in some endothelial clusters (Fig. 5B).

FIGURE 5

Module analysis of cancer-related pathways in all cell types between smokers and never-smokers. A, A heat map of module analysis between smokers and never-smokers across the cell types. The median module score was calculated for each cell type. The senescence, ROS, pyroptosis, mitophagy, IFN signaling, heme metabolism, ferroptosis, EMT circadian clock, and autophagy modules are shown in the heat map. B, Violin plots for analysis of selected modules: EMT, IFN signaling, and heme metabolism. Module analysis for selected cell types is shown. “n” represents the cell number in each cluster. Welch t test, P < 0.001.

FIGURE 5

Module analysis of cancer-related pathways in all cell types between smokers and never-smokers. A, A heat map of module analysis between smokers and never-smokers across the cell types. The median module score was calculated for each cell type. The senescence, ROS, pyroptosis, mitophagy, IFN signaling, heme metabolism, ferroptosis, EMT circadian clock, and autophagy modules are shown in the heat map. B, Violin plots for analysis of selected modules: EMT, IFN signaling, and heme metabolism. Module analysis for selected cell types is shown. “n” represents the cell number in each cluster. Welch t test, P < 0.001.

Close modal

The module scores of ROS signaling significantly increased in the endothelial clusters. ROS signaling in endothelia induced inflammatory response and endothelial dysfunction (47, 48). In addition, myeloid cells enhanced IFN signaling in smoking lungs (Fig. 5B, bottom left). Autophagy in immune cells is important for cellular immunity, differentiation, and survival (49). Autophagy modules increased in immune cells and fibroblastic cells. Finally, increased senescence module scores were broadly observed across most cell types (Supplementary Fig. S12), suggesting that smoking induced aging in the lung. The module analysis of the cigarette-smoking lung atlas evidently indicated what cell types were influenced by smoking and how smoking affected these cells in the lung.

Aging-related Gene Expression in the Integrated Atlas with Cigarette Smoking

As the majority of the samples in the atlas had patient age information, we aimed to identify aging-related genes associated with cigarette smoking (Fig. 6A). We developed AGED analysis based on regression analysis with single-cell transcriptome data. Briefly, by using regression analysis with age and gene expression in the smoker and never-smoker groups, we calculated the differences in slopes (Δ) for all genes in 39 cell clusters (Fig. 6B; Supplementary Table S6). For selected genes that were obviously changed with advancing age between the smoker and never-smoker groups, the Δ values were plotted as AGED results in a heat map (Fig. 6C). These data showed that the lung surfactant proteins SFTPC and SFTPB decreased in several epithelial clusters with advancing age in the smoker (Fig. 6C and D, left). These lung surfactant proteins maintain the activation of alveolar macrophages and promote recovery from injuries induced by smoking (50). In addition, secretoglobins (SCGB3A1, SCGB3A2, and SCGB1A1) were also decreased in secretory goblet cells and serous cells with advancing age in the smokers. MALAT1 is a well-known long noncoding RNA in lung cancer, and its expression contributes to malignancy (51, 52). AGED analysis showed that MALAT1 expression increased in most cell types with advancing age in smokers (Fig. 6C and E), suggesting that the oncogenic risk associated with MALAT1 increased with age. From the module analysis, heme metabolism was dysregulated in the myeloid cells of smoker lung (Fig. 5A). High expression of TMSB4X is contributed to poor prognosis and predicts the metastasis in lung carcinoma (52). TMSB4X increased in most cell types of immune clusters in the smoker lung (Fig. 6D, right). Finally, the expression level of FTL was significantly altered with advancing age in the smokers (Fig. 6C). In the cMonocyte clusters, FTL significantly decreased with smoking and aging. Collectively, the AGED analysis revealed changes in aging-related gene expression with smoking in each cell cluster.

FIGURE 6

AGED analysis to identify genes related to advancing age and smoking. A, Age distributions of the smoker and never-smoker groups. B, Schematic of AGED analysis for determining gene alterations with advancing age in smokers. On the basis of regression analysis of single-cell transcriptome data with age, the differences in slopes (Δ) between the smoker and never-smoker groups in 39 cell clusters were calculated for all genes. The Δ values for selected genes were plotted in a heat map. C, Heat map of AGED analysis results. P value < 0.05: * significant in smokers and never-smokers, † significant in smokers, “•” significant in never-smokers. D, Representative correlation plots of SFTPB in basal-d cluster, SFTPC in AT2 cluster, FTL in cMonocyte cluster, and TMSB4X in CD68+Macrophage cluster. E, Representative plots of MALAT1 expression in AT2 and basal-d clusters.

FIGURE 6

AGED analysis to identify genes related to advancing age and smoking. A, Age distributions of the smoker and never-smoker groups. B, Schematic of AGED analysis for determining gene alterations with advancing age in smokers. On the basis of regression analysis of single-cell transcriptome data with age, the differences in slopes (Δ) between the smoker and never-smoker groups in 39 cell clusters were calculated for all genes. The Δ values for selected genes were plotted in a heat map. C, Heat map of AGED analysis results. P value < 0.05: * significant in smokers and never-smokers, † significant in smokers, “•” significant in never-smokers. D, Representative correlation plots of SFTPB in basal-d cluster, SFTPC in AT2 cluster, FTL in cMonocyte cluster, and TMSB4X in CD68+Macrophage cluster. E, Representative plots of MALAT1 expression in AT2 and basal-d clusters.

Close modal

In this study, we presented a human cigarette-smoking lung atlas, generated via the meta-analysis of 104 samples from eight public scRNA-seq datasets. Our integrated smoking atlas confirmed the alteration of gene expression in the lung at single-cell resolution and identified the early oncogenic events induced by cigarette smoking. In addition, the novel VARIED and AGED analyses revealed cell type and gene expressional diversity with smoking and age.

One of the significant contributions of this study is that the scMeta-analysis of integrated atlas identified expressional diversity in the early phase of lung squamous carcinoma at the single-cell level. In fact, expression analysis following VARIED revealed early oncogenic signaling in epithelial cells, expression changes in GWAS-related genes, and gender-dependent alterations in the smoking lung. DEGs analysis of basal-d clusters indicated that survival AKT-mTOR signaling, mitochondrial dysregulation, and sirtuin signaling pathways were altered in bronchial basal-d cells by smoking (Fig. 4B; Supplementary Table S5). In previous studies of the effects of smoking, genetic mutations in oncogenes and tumor suppressor genes were discovered (53–55). Bronchial epithelial cells from smokers have mutations in TP53, NOTCH1, FAT1, CHEK2, PTEN, ARID1A, and other genes (53). Mutations in PTEN contribute to the activation of AKT-mTOR signaling (56). FAT1 controls mitochondrial functions (57), and its mutations induce the dysregulation of mitochondria. In addition, cigarette smoking promotes lung carcinogenesis by IKKβ- and JNK-dependent inflammation (58). DEGs analysis of basal-d clusters indicated that ATF3, JUN, and FOS expression levels were increased in the smoker basal-d cluster (Supplementary Table S3). High expression of ATF3 expression contributes to tumor malignancy in lung cancer (59). Enrichment network analysis of basal-d DEGs showed that angiogenesis signaling, such as VEGFA and FGF2, is upregulated by smoking. Angiogenesis is an important event for tumor growth and remodeling of tumor microenvironment (60, 61). Our results suggested that smoking induces the remodeling of tumor microenvironment in the early phase of oncogenesis. Our module analysis showed enhancement of inflammatory signaling in the myeloid, fibroblastic, endothelia, and epithelial clusters. The integrated atlas confirmed the signaling related to genetic mutations induced by smoking.

AGED analysis showed that MALAT1 expression increased in most cell types with advancing age in smokers. In the epithelial cells, the oncogenic function of MALAT1 is well known (51, 62); however, the function in stromal cells and immune cells remains unclear. Our results suggested that the oncogenic risk associated with MALAT1 is not only in the epithelial cells but also in the immune cells and stromal cells.

The first scMeta-analysis was performed to investigate severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-related genes by The Human Cell Atlas Lung Biological Network (14). Furthermore, scMeta-analyses were reported for endothelial cells in the human and mouse lung (15) and liver-specific immune cells (16), which revealed the alteration of cell populations and expressional heterogeneity with single-cell resolution. In addition, the study of pan-cancer scRNA-seq cohorts revealed heterogeneity in tumor-infiltrating myeloid cell composition and the functions of cancer-specific myeloid cells (18). scMeta-analysis is a powerful tool and strategy to overcome the problem of sample bias in small clinical cohorts. In addition, our integrated atlas enabled us to perform single-cell analysis linked with clinical information in meta-cohorts such as AGED analysis, which identified aging-related gene expression with single-cell resolution. Furthermore, it revealed correlations in the alterations of gene expression associated with smoking and aging. Furthermore, scMeta-analyses incorporating additional clinical information will be helpful for understanding homeostasis and diseases.

Our study has limitations. First, differences in the tissue sampling and single-cell isolation methods generated bias in the cell populations used in this study. Generally, epithelial cells and immune cells tend to dominate in the scRNA-seq study. This bias could not be completely removed by computational normalization. In fact, our integrated atlas showed the differences in cell subpopulations in each dataset (Supplementary Fig. S2B). Therefore, this bias has the potential to influence the scRNA-seq methods, VARIED analysis and AGED analysis. As our results showed in Supplementary Fig. S10, VARIED analysis needs the large number of single cells in each cluster. In addition, there is a poor study for the molecular biology of basal-d cells in the lung. The omics data for bulk study and purification method of basal-d cells are not established. We consider that the further evaluation is needed using other scRNA-seq meta-dataset and bulk study. Next, clinical information such as smoking status, gender, and age depended on the collection in the primary studies. The atlas has only a simple classification: smoker or never-smoker; we could not consider detailed smoking information such as the amount of smoking, years of smoking, and Brinkman index (Supplementary Tables S1 and S2). In addition, patient age was significantly different between the smoker and never-smoker populations (Fig. 6A). Moreover, clinical information such as age and gender were not available for all datasets. In the future, it will be necessary to expand the integrated atlas following the publication of new appropriate datasets for a more robust analysis.

The integrated atlas presented herein contributed to the characterization of the alterations caused by cigarette smoking that are related to carcinogenesis of LUSC. However, lung cancer also develops in never-smokers, in whom lung adenocarcinoma is predominant (5, 6). scMeta-analysis focused on lung adenocarcinoma in different clinical status has the potential to reveal the nature of genetic carcinogenesis. As a future study, the integration of scRNA-seq data from normal lungs (never-smokers) and lung adenocarcinoma could be a feasible approach to discover the mechanism of carcinogenesis and elucidate the cellular diversity in lung adenocarcinoma. In addition, clinical scRNA-seq and scMeta-analysis will be powerful tools in combination with data from pan-cancer multiomics analyses, such as those in TCGA (33, 63, 64). Therefore, the integration of scMeta-analysis data with clinical and omics data paves the way for an in-depth understanding of the nature of cancer.

No disclosures were reported.

J. Nakayama: Conceptualization, resources, data curation, software, formal analysis, supervision, funding acquisition, visualization, methodology, writing-original draft, project administration, writing-review and editing. Y. Yamamoto: Conceptualization, data curation, supervision, visualization.

We are grateful to all members of the lab for stimulating discussions during the preparation of this article. Funding: This work was supported by Project for MEXT KAKENHI (Y. Yamamoto: Grant-in-Aid for Scientific Research, grant number: 21H02721), MEXT KAKENHI (J. Nakayama: Grant-in-Aid for JSPS Fellows: 20J01794, Grant-in-Aid for Early-Carrier Scientists: 21K15562), GSK Japan Research Grant (Y. Yamamoto and J. Nakayama), and Tokyo Biochemical Research Foundation Research Grant (J. Nakayama).

Note: Supplementary data for this article are available at Cancer Research Communications Online (https://aacrjournals.org/cancerrescommun/).

1.
GBD2015Tobacco-Collaborators
.
Smoking prevalence and attributable disease burden in 195 countries and territories, 1990–2015: a systematic analysis from the Global Burden of Disease Study 2015
.
Lancet
2017
;
389
:
1885
906
.
2.
Stämpfli
MR
,
Anderson
GP
.
How cigarette smoke skews immune responses to promote infection, lung disease and cancer
.
Nat Rev Immunol
2009
;
9
:
377
84
.
3.
Lee
J
,
Taneja
V
,
Vassallo
R
.
Cigarette smoking and inflammation: cellular and molecular mechanisms
.
J Dent Res
2012
;
91
:
142
9
.
4.
Yanbaeva
DG
,
Dentener
MA
,
Creutzberg
EC
,
Wesseling
G
,
Wouters
EF
.
Systemic effects of smoking
.
Chest
2007
;
131
:
1557
66
.
5.
Sun
S
,
Schiller
JH
,
Gazdar
AF
.
Lung cancer in never smokers–a different disease
.
Nat Rev Cancer
2007
;
7
:
778
90
.
6.
Pikor
LA
,
Ramnarine
VR
,
Lam
S
,
Lam
WL
.
Genetic alterations defining NSCLC subtypes and their therapeutic implications
.
Lung Cancer
2013
;
82
:
179
89
.
7.
Travaglini
KJ
,
Nabhan
AN
,
Penland
L
,
Sinha
R
,
Gillich
A
,
Sit
RV
, et al
.
A molecular cell atlas of the human lung from single-cell RNA sequencing
.
Nature
2020
;
587
:
619
25
.
8.
Goldfarbmuren
KC
,
Jackson
ND
,
Sajuthi
SP
,
Dyjack
N
,
Li
KS
,
Rios
CL
, et al
.
Dissecting the cellular specificity of smoking effects and reconstructing lineages in the human airway epithelium
.
Nat Commun
2020
;
11
:
2485
.
9.
Duclos
GE
,
Teixeira
VH
,
Autissier
P
,
Gesthalter
YB
,
Reinders-Luinge
MA
,
Terrano
R
, et al
.
Characterizing smoking-induced transcriptional heterogeneity in the human bronchial epithelium at single-cell resolution
.
Sci Adv
2019
;
5
:
eaaw3413
.
10.
Plasschaert
LW
,
Žilionis
R
,
Choo-Wing
R
,
Savova
V
,
Knehr
J
,
Roma
G
, et al
.
A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte
.
Nature
2018
;
560
:
377
81
.
11.
Adams
TS
,
Schupp
JC
,
Poli
S
,
Ayaub
EA
,
Neumark
N
,
Ahangari
F
, et al
.
Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis
.
Sci Adv
2020
;
6
:
eaba1983
.
12.
Habermann
AC
,
Gutierrez
AJ
,
Bui
LT
,
Yahn
SL
,
Winters
NI
,
Calvi
CL
, et al
.
Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis
.
Sci Adv
2020
;
6
:
eaba1972
.
13.
Vieira Braga
FA
,
Kar
G
,
Berg
M
,
Carpaij
OA
,
Polanski
K
,
Simon
LM
, et al
.
A cellular census of human lungs identifies novel cell states in health and in asthma
.
Nat Med
2019
;
25
:
1153
63
.
14.
Muus
C
,
Luecken
MD
,
Eraslan
G
,
Sikkema
L
,
Waghray
A
,
Heimberg
G
, et al
.
Single-cell meta-analysis of SARS-CoV-2 entry genes across tissues and demographics
.
Nat Med
2021
;
27
:
546
59
.
15.
Schupp
JC
,
Adams
TS
,
Cosme
C
Jr
,
Raredon
MSB
,
Yuan
Y
,
Omote
N
, et al
.
Integrated single-cell atlas of endothelial cells of the human lung
.
Circulation
2021
;
144
:
286
302
.
16.
Rocque
B
,
Barbetta
A
,
Singh
P
,
Goldbeck
C
,
Helou
DG
,
Loh
YE
, et al
.
Creation of a single cell RNASeq meta-atlas to define human liver immune homeostasis
.
Front Immunol
2021
;
12
:
679521
.
17.
Goveia
J
,
Rohlenova
K
,
Taverna
F
,
Treps
L
,
Conradi
LC
,
Pircher
A
, et al
.
An integrated gene expression landscape profiling approach to identify lung tumor endothelial cell heterogeneity and angiogenic candidates
.
Cancer Cell
2020
;
37
:
21
36
.
18.
Cheng
S
,
Li
Z
,
Gao
R
,
Xing
B
,
Gao
Y
,
Yang
Y
, et al
.
A pan-cancer single-cell transcriptional atlas of tumor infiltrating myeloid cells
.
Cell
2021
;
184
:
792
809
.
19.
Deprez
M
,
Zaragosi
LE
,
Truchi
M
,
Becavin
C
,
Ruiz García
S
,
Arguel
MJ
, et al
.
A single-cell atlas of the human healthy airways
.
Am J Respir Crit Care Med
2020
;
202
:
1636
45
.
20.
Reyfman
PA
,
Walter
JM
,
Joshi
N
,
Anekalla
KR
,
McQuattie-Pimentel
AC
,
Chiu
S
, et al
.
Single-cell transcriptomic analysis of human lung provides insights into the pathobiology of pulmonary fibrosis
.
Am J Respir Crit Care Med
2019
;
199
:
1517
36
.
21.
Zuo
WL
,
Rostami
MR
,
Shenoy
SA
,
LeBlanc
MG
,
Salit
J
,
Strulovici-Barel
Y
, et al
.
Cell-specific expression of lung disease risk-related genes in the human small airway epithelium
.
Respir Res
2020
;
21
:
200
.
22.
Watanabe
N
,
Fujita
Y
,
Nakayama
J
,
Mori
Y
,
Kadota
T
,
Hayashi
Y
, et al
.
Anomalous epithelial variations and ectopic inflammatory response in chronic obstructive pulmonary disease
.
Am J Respir Cell Mol Biol
2022
;
67
:
708
19
.
23.
Kim
N
,
Kim
HK
,
Lee
K
,
Hong
Y
,
Cho
JH
,
Choi
JW
, et al
.
Single-cell RNA sequencing demonstrates the molecular and cellular reprogramming of metastatic lung adenocarcinoma
.
Nat Commun
2020
;
11
:
2285
.
24.
Stuart
T
,
Butler
A
,
Hoffman
P
,
Hafemeister
C
,
Papalexi
E
,
Mauck
WM
3rd
, et al
.
Comprehensive integration of single-cell data
.
Cell
2019
;
177
:
1888
902
.
25.
Hafemeister
C
,
Satija
R
.
Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression
.
Genome Biol
2019
;
20
:
296
.
26.
McGinnis
CS
,
Murrow
LM
,
Gartner
ZJ
.
DoubletFinder: doublet detection in single-cell RNA sequencing data using artificial nearest neighbors
.
Cell Syst
2019
;
8
:
329
37
.
27.
Xi
NM
,
Li
JJ
.
Benchmarking computational doublet-detection methods for single-cell RNA sequencing data
.
Cell Syst
2021
;
12
:
176
94
.
28.
Korsunsky
I
,
Millard
N
,
Fan
J
,
Slowikowski
K
,
Zhang
F
,
Wei
K
, et al
.
Fast, sensitive and accurate integration of single-cell data with Harmony
.
Nat Methods
2019
;
16
:
1289
96
.
29.
Tran
HTN
,
Ang
KS
,
Chevrier
M
,
Zhang
X
,
Lee
NYS
,
Goh
M
, et al
.
A benchmark of batch-effect correction methods for single-cell RNA sequencing data
.
Genome Biol
2020
;
21
:
12
.
30.
Nakayama
J
,
Ito
E
,
Fujimoto
J
,
Watanabe
S
,
Semba
K
.
Comparative analysis of gene regulatory networks of highly metastatic breast cancer cells established by orthotopic transplantation and intra-circulation injection
.
Int J Oncol
2017
;
50
:
497
504
.
31.
Tokura
M
,
Nakayama
J
,
Prieto-Vila
M
,
Shiino
S
,
Yoshida
M
,
Yamamoto
T
, et al
.
Single-cell transcriptome profiling reveals intratumoral heterogeneity and molecular features of ductal carcinoma in situ
.
Cancer Res
2022
;
82
:
3236
48
.
32.
Hänzelmann
S
,
Castelo
R
,
Guinney
J
.
GSVA: gene set variation analysis for microarray and RNA-seq data
.
BMC Bioinformatics
2013
;
14
:
7
.
33.
Hoadley
KA
,
Yau
C
,
Hinoue
T
,
Wolf
DM
,
Lazar
AJ
,
Drill
E
, et al
.
Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer
.
Cell
2018
;
173
:
291
304
.
34.
Yu
G
,
Wang
LG
,
Han
Y
,
He
QY
.
clusterProfiler: an R package for comparing biological themes among gene clusters
.
OMICS
2012
;
16
:
284
7
.
35.
Yu
G
,
He
QY
.
ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization
.
Mol Biosyst
2016
;
12
:
477
9
.
36.
Finak
G
,
McDavid
A
,
Yajima
M
,
Deng
J
,
Gersuk
V
,
Shalek
AK
, et al
.
MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data
.
Genome Biol
2015
;
16
:
278
.
37.
Lumsden
AB
,
McLean
A
,
Lamb
D
.
Goblet and Clara cells of human distal airways: evidence for smoking induced changes in their numbers
.
Thorax
1984
;
39
:
844
9
.
38.
Shibata
S
,
Miyake
K
,
Tateishi
T
,
Yoshikawa
S
,
Yamanishi
Y
,
Miyazaki
Y
, et al
.
Basophils trigger emphysema development in a murine model of COPD through IL-4-mediated generation of MMP-12-producing macrophages
.
Proc Natl Acad Sci U S A
2018
;
115
:
13057
62
.
39.
Saetta
M
,
Turato
G
,
Baraldo
S
,
Zanin
A
,
Braccioni
F
,
Mapp
CE
, et al
.
Goblet cell hyperplasia and epithelial inflammation in peripheral airways of smokers with both symptoms of chronic bronchitis and chronic airflow limitation
.
Am J Respir Crit Care Med
2000
;
161
:
1016
21
.
40.
Hanna
JM
,
Onaitis
MW
.
Cell of origin of lung cancer
.
J Carcinog
2013
;
12
:
6
.
41.
Voehringer
D
.
Protective and pathological roles of mast cells and basophils
.
Nat Rev Immunol
2013
;
13
:
362
75
.
42.
Bossé
Y
,
Amos
CI
.
A decade of GWAS results in lung cancer
.
Cancer Epidemiol Biomarkers Prev
2018
;
27
:
363
79
.
43.
Nath
S
,
Mukherjee
P
.
MUC1: a multifaceted oncoprotein with a key role in cancer progression
.
Trends Mol Med
2014
;
20
:
332
42
.
44.
Kharbanda
A
,
Rajabi
H
,
Jin
C
,
Tchaicha
J
,
Kikuchi
E
,
Wong
KK
, et al
.
Targeting the oncogenic MUC1-C protein inhibits mutant EGFR-mediated signaling and survival in non-small cell lung cancer cells
.
Clin Cancer Res
2014
;
20
:
5423
34
.
45.
Wang
Y
,
Gorlova
OY
,
Gorlov
IP
,
Zhu
M
,
Dai
J
,
Albanes
D
, et al
.
Association analysis of driver gene-related genetic variants identified novel lung cancer susceptibility loci with 20,871 lung cancer cases and 15,971 controls
.
Cancer Epidemiol Biomarkers Prev
2020
;
29
:
1423
9
.
46.
Zhao
G
,
Lu
H
,
Liu
Y
,
Zhao
Y
,
Zhu
T
,
Garcia-Barrio
MT
, et al
.
Single-cell transcriptomics reveals endothelial plasticity during diabetic atherogenesis
.
Front Cell Dev Biol
2021
;
9
:
689469
.
47.
Csiszar
A
,
Podlutsky
A
,
Wolin
MS
,
Losonczy
G
,
Pacher
P
,
Ungvari
Z
.
Oxidative stress and accelerated vascular aging: implications for cigarette smoking
.
Front Biosci
2009
;
14
:
3128
44
.
48.
Emma
R
,
Caruso
M
,
Campagna
D
,
Pulvirenti
R
,
Li Volti
G
.
The impact of tobacco cigarettes, vaping products and tobacco heating products on oxidative stress
.
Antioxidants
2022
;
11
:
1829
.
49.
Ma
Y
,
Galluzzi
L
,
Zitvogel
L
,
Kroemer
G
.
Autophagy and cellular immune responses
.
Immunity
2013
;
39
:
211
27
.
50.
Zhao
CZ
,
Fang
XC
,
Wang
D
,
Tang
FD
,
Wang
XD
.
Involvement of type II pneumocytes in the pathogenesis of chronic obstructive pulmonary disease
.
Respir Med
2010
;
104
:
1391
5
.
51.
Gutschner
T
,
Hämmerle
M
,
Eissmann
M
,
Hsu
J
,
Kim
Y
,
Hung
G
, et al
.
The noncoding RNA MALAT1 is a critical regulator of the metastasis phenotype of lung cancer cells
.
Cancer Res
2013
;
73
:
1180
9
.
52.
Ji
P
,
Diederichs
S
,
Wang
W
,
Böing
S
,
Metzger
R
,
Schneider
PM
, et al
.
MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer
.
Oncogene
2003
;
22
:
8031
41
.
53.
Yoshida
K
,
Gowers
KHC
,
Lee-Six
H
,
Chandrasekharan
DP
,
Coorens
T
,
Maughan
EF
, et al
.
Tobacco smoking and somatic mutations in human bronchial epithelium
.
Nature
2020
;
578
:
266
72
.
54.
Vaz
M
,
Hwang
SY
,
Kagiampakis
I
,
Phallen
J
,
Patil
A
,
O'Hagan
HM
, et al
.
Chronic cigarette smoke-induced epigenomic changes precede sensitization of bronchial epithelial cells to single-step transformation by KRAS mutations
.
Cancer Cell
2017
;
32
:
360
76
.
55.
Wang
X
,
Ricciuti
B
,
Nguyen
T
,
Li
X
,
Rabin
MS
,
Awad
MM
, et al
.
Association between smoking history and tumor mutation burden in advanced non-small cell lung cancer
.
Cancer Res
2021
;
81
:
2566
73
.
56.
Hollander
MC
,
Blumenthal
GM
,
Dennis
PA
.
PTEN loss in the continuum of common cancers, rare syndromes and mouse models
.
Nat Rev Cancer
2011
;
11
:
289
301
.
57.
Cao
LL
,
Riascos-Bernal
DF
,
Chinnasamy
P
,
Dunaway
CM
,
Hou
R
,
Pujato
MA
, et al
.
Control of mitochondrial function and cell growth by the atypical cadherin Fat1
.
Nature
2016
;
539
:
575
8
.
58.
Takahashi
H
,
Ogata
H
,
Nishigaki
R
,
Broide
DH
,
Karin
M
.
Tobacco smoke promotes lung tumorigenesis by triggering IKKbeta- and JNK1-dependent inflammation
.
Cancer Cell
2010
;
17
:
89
97
.
59.
Li
X
,
Zhou
X
,
Li
Y
,
Zu
L
,
Pan
H
,
Liu
B
, et al
.
Activating transcription factor 3 promotes malignance of lung cancer cells in vitro
.
Thorac Cancer
2017
;
8
:
181
91
.
60.
Hanahan
D
,
Weinberg
RA
.
Hallmarks of cancer: the next generation
.
Cell
2011
;
144
:
646
74
.
61.
Lugano
R
,
Ramachandran
M
,
Dimberg
A
.
Tumor angiogenesis: causes, consequences, challenges and opportunities
.
Cell Mol Life Sci
2020
;
77
:
1745
70
.
62.
Amodio
N
,
Raimondi
L
,
Juli
G
,
Stamato
MA
,
Caracciolo
D
,
Tagliaferri
P
, et al
.
MALAT1: a druggable long non-coding RNA for targeted anti-cancer approaches
.
J Hematol Oncol
2018
;
11
:
63
.
63.
Hutter
C
,
Zenklusen
JC
.
The Cancer Genome Atlas: creating lasting value beyond its data
.
Cell
2018
;
173
:
283
5
.
64.
Cancer Genome Atlas Research Network
;
Weinstein
JN
,
Collisson
EA
,
Mills
GB
,
Shaw
KR
,
Ozenberger
BA
, et al
.
The Cancer Genome Atlas Pan-Cancer analysis project
.
Nat Genet
2013
;
45
:
1113
20
.
This open access article is distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Supplementary data