Abstract
Mitochondrial DNA (mtDNA) haplogroups and SNPs have been associated with the risk of different cancers. However, there is no evidence on whether the same haplogroup or mitochondrial SNP (mtSNP) exhibits a pleiotropic effect on multiple cancers.
We recruited 2,489 participants, including patients with colorectal, hepatocellular, lung, ovarian, bladder, breast, pancreatic, and renal cell carcinoma. In addition, 715 healthy individuals from Northern China served as controls. Next, cross-tumor analysis was performed to determine whether mtDNA variation is associated with multiple cancers.
Our results revealed a significant decrease in the occurrence risk of multiple cancers among individuals belonging to haplogroup A [OR = 0.553, 95% confidence interval (CI) = 0.375–0.815, P = 0.003]. Furthermore, we identified 11 mtSNPs associated with multiple cancers and divided the population into high-risk and low-risk groups. Low-risk groups showed a significantly reduced risk of occurrence compared with high-risk groups (OR = 0.614, 95% CI = 0.507–0.744, P < 0.001). Furthermore, using interaction analysis, we identified a special group of individuals belonging to haplogroup A/M7 and the low-risk population, who exhibit a lower risk of multiple cancers compared with other populations (OR = 0.195, 95% CI = 0.106–0.359, P < 0.001). Finally, gene set enrichment analysis confirmed that haplogroup A/M7 patients had lower expression levels of cancer-related pathway genes compared with haplogroup D patients.
We found that specific mtDNA haplogroups and mtSNPs may play a role in predicting multiple cancer predisposition in Chinese populations.
This may provide a potential tool for early screening in clinical settings for individuals in the Chinese population.
Introduction
Mitochondria are double-membrane organelles involved in many important biological processes such as cellular metabolism, energy generation, and regulation of apoptosis in eukaryotic cells (1, 2). Mitochondria have an independent genome, mitochondrial DNA (mtDNA), which encodes multiple enzymes and proteins related to the mitochondrial respiratory chain. Because of continuous damage during the lifecycle, the accumulation of mtDNA mutations can cause mitochondrial dysfunction, which is involved in the development of various diseases (3–5).
The inheritance pattern of mtDNA is matrilineal, giving human mtDNA strong geographical specificity. In the past, different mtDNA haplotypes and haplogroups were defined using partial SNP sites in mtDNA. Because each haplotype or haplogroup represents a common ancestor, mtDNA haplotypes or haplogroups can be used to trace genetic relationships, evolutionary history, and early human migration among different regions and ethnic groups (6).
In recent years, specific mtDNA haplogroups or mitochondrial SNPs (mtSNP) have been reported to be closely related to occurrences of several tumor types in particular populations. For example, in the Southern Chinese population, mtDNA haplogroup D5 is closely related to an increased risk of breast cancer (7), while haplogroup K has a protective effect against pancreatic cancer in southeastern European populations (8). Our previous works reported that haplogroup M7 in the Northern Chinese population is closely related to the occurrence of liver cancer and colon cancer (9, 10). These findings suggest that in a specific population, certain haplogroups play a pleiotropic role in the occurrence of different tumor types. However, there is currently no cross-tumor study evaluating the pleiotropy of haplogroups in multiple tumor occurrences.
In the current study, we collected mtDNA sequencing data from colorectal cancer, liver cancer, lung cancer, breast cancer, ovarian cancer, and bladder cancer patient cohorts and from healthy individuals in our laboratory, and combined them with publicly available mtSNP data from the Northern Han Chinese population. We systematically investigated the roles of mtDNA haplogroups and mtSNPs in the incidence risk of multiple cancers and established a risk score based on 11 SNPs. Our findings show that individuals who belong to haplogroup A and to the low-risk group have the lowest risk of developing multiple cancers.
Material and Methods
Study design and participants
The study collected tissue and plasma samples from a total of 2,489 patients with cancer between January 1, 2007, and December 31, 2019, at the first and second affiliated hospitals of the Fourth Military Medical University (FMMU) in Xi'an, P.R. China. Of those individuals, 649 had colorectal cancer, 816 had hepatocellular carcinoma, 456 had lung cancer, 304 had ovarian cancer, 139 had bladder cancer, 63 had breast cancer, 43 had pancreatic cancer, and 19 had renal cell carcinoma. In addition, 715 healthy controls (552 individuals sourced from public data and 157 individuals sourced from in-house data) were included in the study. For the validation cohort, external validation was performed using data from Southern China, which included 1,015 patients with colorectal cancer (11) and 1,562 healthy controls (12). In addition, we further added 56 bladder cancer cases, 34 ovarian cancer cases (combined as other types of tumors), and 80 healthy individuals to form an internal validation set.
Written informed consent was obtained from each patient during the surgical procedure, and the Ethics Committee of the FMMU (KY20183331-1, Shaanxi, P.R. China) approved the study.
DNA extraction and mtDNA sequencing
As described previously, genomic DNA was extracted from fresh tissue using the Omega Whole Blood DNA Extraction Kit and quantified using the NanoDrop 2000. Subsequently, we performed capture-based mtDNA sequencing using our own biotinylated probes (13). The capture probes were hybridized with the prepared whole-genome sequencing libraries, and the binding buffer with streptavidin-coated magnetic beads was added to the reaction system. Finally, the libraries were sequenced on the HiSeq X Ten (Illumina) platform using paired-end runs with 2 × 150 cycles (PE 150; as shown in Supplementary Fig. S1).
Data processing and mapping
We systematically evaluated the analysis pipeline for mtDNA deep sequencing data. Briefly, raw mtDNA sequencing data were processed either with or without trimming for quality control. The mtDNA reads were then mapped to the rCRS or to a combined rCRS-hg19 reference using the Burrows-Wheeler Aligner (BWA). After sorting and removing duplicated reads with Picard, the Genome Analysis Toolkit 4 (GATK4) was used for local realignment. Finally, we applied a series of filtering conditions to remove false-positive mutations, detect mtDNA mutations, and analyze heteroplasmy levels. For trimming, the FASTQ preprocessor fastp (version 0.20.0; ref. 14) was used with three parameters. First, all sequencing adaptors were removed. Second, a sliding window (4 bp in length) approach was used to scan reads from front (5′) to tail (3′). When the average base quality in the window was below Q30, the bases in the window and all downstream bases were dropped. Third, reads with a length below 50 bp were discarded to avoid ambiguous mapping of short reads.
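The sliding-window rule described above can be illustrated with a minimal Python sketch. It is not fastp's implementation and it omits adaptor removal; the function name `trim_read` and its parameter names are hypothetical, and base qualities are assumed to be already decoded to integer Phred scores.

```python
def trim_read(seq, quals, window=4, min_mean_q=30, min_len=50):
    """Sliding-window quality trimming, scanning from 5' to 3'.

    Returns the trimmed (seq, quals) pair, or None when the trimmed
    read is shorter than min_len and is therefore discarded.
    """
    cut = len(seq)
    # Slide the window one base at a time from the 5' end.
    for start in range(0, len(seq) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_mean_q:
            # Drop the bases in this window and everything downstream.
            cut = start
            break
    seq, quals = seq[:cut], quals[:cut]
    if len(seq) < min_len:
        return None
    return seq, quals


# Hypothetical read: 60 high-quality bases followed by 10 low-quality bases.
example_seq = "A" * 70
example_quals = [40] * 60 + [10] * 10
trimmed = trim_read(example_seq, example_quals)

# Hypothetical read whose quality drops early, so the remainder is too short.
short_result = trim_read("A" * 70, [40] * 30 + [10] * 40)
```

In this sketch the first window whose mean quality falls below the threshold determines the cut position, so a few good bases just before the quality drop may also be removed.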
Public transcriptome data analysis
Public transcriptome data from colorectal cancer and lung cancer tissues were obtained from the Gene Expression Omnibus (GEO) database (GSE107422 for colorectal cancer and GSE16561 for lung cancer; ref. 15). Batch correction, standardization, and differential gene expression analysis were performed using DESeq2.
Gene set enrichment analysis (GSEA) was performed in haplogroups A, B, and D to explore biological signaling pathways drawn from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and HALLMARK gene sets. Pathways with significant enrichment were reported on the basis of normalized enrichment score (NES), gene ratio, and P value. Gene sets with |NES| > 1, nominal P value (NOM P) < 0.05, and FDR q < 0.25 were considered significantly enriched (16, 17).
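The three significance thresholds can be expressed as a simple filter. The following Python sketch assumes GSEA results have already been exported into records; the `GseaResult` class, its field names, and the example values are hypothetical and are not taken from the study's output.

```python
from dataclasses import dataclass


@dataclass
class GseaResult:
    pathway: str
    nes: float    # normalized enrichment score
    nom_p: float  # nominal P value
    fdr_q: float  # FDR q value


def is_significant(result, min_abs_nes=1.0, max_p=0.05, max_q=0.25):
    """Apply |NES| > 1, NOM P < 0.05, and FDR q < 0.25 together."""
    return (
        abs(result.nes) > min_abs_nes
        and result.nom_p < max_p
        and result.fdr_q < max_q
    )


# Hypothetical records, for illustration only.
results = [
    GseaResult("PATHWAY_X", nes=-1.62, nom_p=0.010, fdr_q=0.12),
    GseaResult("PATHWAY_Y", nes=-0.85, nom_p=0.030, fdr_q=0.20),
    GseaResult("PATHWAY_Z", nes=1.40, nom_p=0.040, fdr_q=0.31),
]
significant = [r.pathway for r in results if is_significant(r)]
```

All three conditions must hold at once, so a gene set with a strong NES but an FDR q of 0.25 or higher is not retained.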
Identification of mtDNA haplogroups and mtSNPs
The identification of mtDNA haplogroups and mtSNPs for each patient was carried out according to previously described methods (18). Briefly, to identify mtDNA variations, FASTA sequences of mtDNA were analyzed using MitoTool (www.mitotool.org; ref. 19). The haplogroup was determined using Phylotree (www.phylotree.org; mtDNA tree Build 16; ref. 20). mtSNPs were identified as mtDNA variations observed in both tumor and plasma. Further analysis excluded all mtSNPs in patients with a minor allele frequency (MAF) less than 5%.
Statistical analysis
Logistic regression analysis was used to evaluate the independent significance of haplogroups and mtSNPs while adjusting for age and gender. The χ2 test was used to determine whether there were any differences in SNPs among samples from different sources. Principal component analysis (PCA) was employed to compare the haplogroup distributions between the two cohorts of healthy individuals (Supplementary Fig. S1). The risk score for each individual was then calculated as the weighted sum of the mtSNP values, using the coefficients obtained from the logistic regression model as weights. The test for interaction between patients’ haplogroup and mtSNP-based risk score was performed by including a cross-product term in the logistic regression model. All statistical analyses were performed using SPSS software (Version 26.0), and P values less than 0.05 were considered statistically significant.
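To make the cross-product term concrete, the sketch below fits logit(p) = b0 + b1·x1 + b2·x2 + b3·x1·x2 for two binary factors, where x1 marks the low-risk score group and x2 marks haplogroup A/M7. With no further covariates this model is saturated, so its maximum-likelihood coefficients equal differences of the observed log odds in the four cells and can be written in closed form. The counts are those reported in Table 4; because this sketch does not adjust for age and gender, its estimates are crude and are not expected to reproduce the adjusted ORs in Table 4. Function and variable names are hypothetical.

```python
import math

# (low_risk, a_or_m7) -> (cases, controls), counts as reported in Table 4.
CELLS = {
    (0, 0): (1236, 263),  # high-risk, non-A/M7 (reference)
    (0, 1): (32, 5),      # high-risk, A/M7
    (1, 0): (906, 323),   # low-risk, non-A/M7
    (1, 1): (315, 124),   # low-risk, A/M7
}


def log_odds(cell):
    """Observed log odds of being a case within one cell."""
    cases, controls = cell
    return math.log(cases / controls)


def saturated_fit(cells):
    """Closed-form coefficients of the saturated two-factor logistic model."""
    l00, l01 = log_odds(cells[(0, 0)]), log_odds(cells[(0, 1)])
    l10, l11 = log_odds(cells[(1, 0)]), log_odds(cells[(1, 1)])
    b0 = l00                    # intercept: reference cell
    b1 = l10 - l00              # low-risk effect among non-A/M7
    b2 = l01 - l00              # A/M7 effect among high-risk
    b3 = l11 - l10 - l01 + l00  # cross-product (interaction) term
    return b0, b1, b2, b3


b0, b1, b2, b3 = saturated_fit(CELLS)
# Crude OR of the low-risk and A/M7 group versus the reference cell.
joint_or = math.exp(b1 + b2 + b3)
# Ratio of odds ratios captured by the cross-product term.
interaction_ratio = math.exp(b3)
print(f"crude joint OR = {joint_or:.3f}, exp(b3) = {interaction_ratio:.3f}")
```

A value of exp(b3) different from 1 indicates that the combined effect of the two factors departs from the product of their separate effects on the odds scale, which is what the test of the cross-product term assesses.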
Data availability
The raw sequencing data underlying this article are available in the BIG Data Center, Beijing Institute of Genomics, with access number PRJCA021120.
Results
Flow chart showing the methods used for analysis
We conducted a retrospective study on eight independent cohorts of patients with cancer (colorectal cancer, hepatocellular carcinoma, lung cancer, ovarian cancer, bladder cancer, breast cancer, pancreatic cancer, and renal cell carcinoma) and two healthy control cohorts. No significant difference between the two healthy cohorts was demonstrated by the χ2 test (Supplementary Table S1) or PCA (Supplementary Fig. S2). Then, logistic regression analysis was employed to screen for haplogroups and SNPs associated with the occurrence risk of multiple cancers, with adjustments made for age and gender to account for potential confounding effects. During SNP analysis, the common SNPs were screened from the three cohorts. Next, the χ2 test was used to identify the SNPs that showed no differences. Finally, based on the 11 SNPs identified, we established a risk score and divided the patients into high-risk and low-risk groups. Interaction analysis was also performed on the screened haplogroups. We found that compared with non-A haplogroup and high-risk populations, the incidence of cancer was significantly reduced in haplogroup A and low-risk populations (as shown in Fig. 1).
Figure 1. Workflow diagram. Flow chart showing the methods used for analysis.
Cross-tumor analysis of the association between mtDNA haplogroups and occurrence risk
We analyzed the association between mtDNA haplogroups and the occurrence risk of multiple cancers in a total of 2,489 patients with cancer and 715 healthy controls from Northern China. The overall distribution of mtDNA haplogroups is presented in Table 1, with haplogroup D being the most prevalent clade [570 cases (22.91%), 157 healthy controls (21.96%)]. Using logistic regression analysis, we found that individuals with haplogroup A had a significantly lower occurrence risk [OR = 0.553, 95% confidence interval (CI) = 0.375–0.815, P = 0.003] when compared with all non-A haplogroups, followed by those with haplogroup M7 (OR = 0.632, 95% CI = 0.424–0.942, P = 0.024) when compared with all non-M7 haplogroups.
Table 1. Analysis of association between mtDNA haplogroups and occurrence risk in multiple types of cancer.
Haplogroup^a,b | Controls (n = 715) | Cases (n = 2,489) | OR (95% CI) | P value |
---|---|---|---|---|
A | 69 (9.65%) | 188 (7.55%) | 0.553 (0.375–0.815) | 0.003 |
B | 114 (15.94%) | 379 (15.23%) | 0.721 (0.512–1.015) | 0.061 |
D | 157 (21.96%) | 570 (22.91%) | 0.806 (0.583–1.113) | 0.189 |
G | 44 (6.15%) | 134 (5.37%) | 0.643 (0.415–0.994) | 0.047 |
M7 | 60 (8.39%) | 181 (7.27%) | 0.632 (0.424–0.942) | 0.024 |
M8 | 78 (10.91%) | 272 (10.91%) | 0.821 (0.568–1.186) | 0.292 |
N9 | 41 (5.73%) | 131 (5.27%) | 0.690 (0.443–1.074) | 0.100 |
R9 | 88 (12.31%) | 338 (13.58%) | 0.867 (0.606–1.239) | 0.433 |
Note: Bold entries indicate statistical significance.
Abbreviations: CI, confidence interval; OR, odds ratio.
^a Other haplogroups were used as reference.
^b Haplogroups with a frequency of less than 5%, such as M9, JT, N*, N1, R*, R0, U, and X, are not shown.
Furthermore, we selected haplogroup D, with the highest proportion in our cohort, as the reference group to evaluate the association between haplogroup and the occurrence risk of multiple cancers. As shown in Supplementary Table S2, individuals with haplogroup A still had a significantly lower occurrence risk (OR = 0.686, 95% CI = 0.493–0.954, P = 0.025).
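For orientation, a crude (unadjusted) odds ratio with a Wald 95% CI can be computed directly from the counts in Table 1. The Python sketch below compares haplogroup A with haplogroup D as the reference. It is only an illustration of the arithmetic: the reported estimates come from logistic regression adjusted for age and gender, so the crude value is not expected to match them. The function name `odds_ratio_wald` is hypothetical.

```python
import math


def odds_ratio_wald(exp_cases, exp_controls, ref_cases, ref_controls, z=1.96):
    """Crude odds ratio with a Wald confidence interval on the log scale."""
    odds_ratio = (exp_cases * ref_controls) / (exp_controls * ref_cases)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(
        1 / exp_cases + 1 / exp_controls + 1 / ref_cases + 1 / ref_controls
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Haplogroup A (188 cases, 69 controls) versus haplogroup D
# (570 cases, 157 controls), counts as reported in Table 1.
or_a_vs_d, ci_low, ci_high = odds_ratio_wald(188, 69, 570, 157)
print(f"crude OR = {or_a_vs_d:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Comparing a crude estimate of this kind with the adjusted one is a quick way to see how much the covariates shift the association.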
In addition, we conducted external validation using public data from Southern China, comprising 1,015 patients with cancer and 1,562 healthy controls. As demonstrated in Supplementary Table S3, our findings indicated that individuals with haplogroup M7 exhibited a significantly reduced occurrence risk (OR = 0.627, 95% CI = 0.443–0.887, P = 0.008) compared with all non-M7 haplogroups. Although the q value was 0.072, this still indicates suggestive significance. The inability of the validation set to replicate the haplogroup A result may be attributed to regional differences, as our in-house data originated from Northern China while the validation set was sourced from Southern China. Furthermore, we utilized our newly acquired in-house data for internal validation, revealing that individuals with haplogroup M7 showed a significantly lower occurrence risk (OR = 0.229, 95% CI = 0.054–0.966, P = 0.045), as presented in Supplementary Table S4.
Analysis of association between mtDNA haplogroups and occurrence risk in single cancer
To further validate our viewpoint on the association between haplogroup A and the risk of multiple cancers, we explored whether this haplogroup was also associated with the occurrence risk of single cancer. In Table 2, we found that individuals with haplogroup A had a significantly lower occurrence risk when compared with other haplogroups in colorectal cancer (OR = 0.474, 95% CI = 0.291–0.773, P = 0.003), lung cancer (OR = 0.456, 95% CI = 0.262–0.794, P = 0.005), and other cancers (OR = 0.548, 95% CI = 0.325–0.924, P = 0.024). Similar results were observed for patients with haplogroup M7, with OR = 0.452 in colorectal cancer (95% CI = 0.270–0.756, P = 0.002), OR = 0.542 in lung cancer (95% CI = 0.310–0.947, P = 0.031) and OR = 0.566 in others (95% CI = 0.330–0.971, P = 0.039). However, no significant association was found for hepatocellular carcinoma.
Table 2. Analysis of the association between mtDNA haplogroups and occurrence risk in single type of cancer.
Haplogroup^a | Colorectal cancer (n = 649) | OR (95% CI) | Hepatocellular carcinoma (n = 816) | OR (95% CI) | Lung cancer (n = 456) | OR (95% CI) | Others^b (n = 568) | OR (95% CI) |
---|---|---|---|---|---|---|---|---|
A | 47 (7.24%) | 0.474 (0.291–0.773)***^c | 58 (7.11%) | 0.737 (0.454–1.196) | 30 (6.58%) | 0.456 (0.262–0.794)*** | 39 (6.99%) | 0.548 (0.325–0.924)* |
B | 106 (16.33%) | 0.647 (0.427–0.979) | 126 (15.44%) | 0.969 (0.636–1.476) | 66 (14.47%) | 0.607 (0.382–0.966)* | 77 (13.80%) | 0.655 (0.418–1.026) |
D | 118 (18.18%) | 0.523 (0.351–0.779)*** | 197 (24.14%) | 1.100 (0.741–1.634) | 113 (24.78%) | 0.755 (0.493–1.156) | 149 (26.70%) | 0.920 (0.610–1.387) |
G | 29 (4.47%) | 0.458 (0.260–0.808)*** | 33 (4.04%) | 0.658 (0.375–1.154) | 24 (5.26%) | 0.572 (0.311–1.052) | 43 (7.71%) | 0.948 (0.551–1.631) |
M7 | 39 (6.01%) | 0.452 (0.270–0.756)*** | 68 (8.33%) | 0.994 (0.613–1.610) | 31 (6.80%) | 0.542 (0.310–0.947)* | 37 (6.63%) | 0.566 (0.330–0.971)* |
M8 | 84 (12.94%) | 0.749 (0.481–1.167) | 97 (11.89%) | 1.090 (0.696–1.708) | 49 (10.75%) | 0.659 (0.400–1.087) | 62 (11.11%) | 0.771 (0.477–1.244) |
N9 | 42 (6.47%) | 0.713 (0.417–1.218) | 40 (4.90%) | 0.855 (0.494–1.482) | 24 (5.26%) | 0.614 (0.332–1.135) | 22 (3.94%) | 0.544 (0.294–1.007) |
R9 | 92 (14.18%) | 0.727 (0.472–1.121) | 125 (15.32%) | 1.245 (0.808–1.919) | 58 (12.72%) | 0.692 (0.427–1.120) | 73 (13.08%) | 0.804 (0.506–1.278) |
Note: Bold entries indicate statistical significance.
Abbreviations: CI, confidence interval; OR, odds ratio.
^a Other haplogroups were used as reference.
^b Cancer types with relatively low frequency (ovarian cancer, bladder cancer, breast cancer, pancreatic cancer, and renal cell carcinoma) were combined and analyzed collectively.
^c *, P < 0.05; ***, P < 0.01.
Furthermore, we selected haplogroup D, with the highest proportion, as the reference group to evaluate the conclusion. As shown in Supplementary Table S5, individuals with haplogroup A still had a significantly lower occurrence risk (haplogroup D as reference, hepatocellular carcinoma, OR = 0.598, 95% CI = 0.363–0.983, P = 0.043; lung cancer, OR = 0.604, 95% CI = 0.369–0.988, P = 0.045; others, OR = 0.596, 95% CI = 0.379–0.936, P = 0.025). While the association between haplogroup A and cancer risk did not reach statistical significance in colorectal cancer, the obtained P value (0.063) still suggested a potential trend between haplogroup A and cancer risk.
Analysis of association between mtSNPs and occurrence risk of multiple cancers
We also assessed the distribution of mtDNA SNPs in both patients and controls. As shown in Table 3, among the 61 mtSNPs with MAF>5%, 11 mtSNPs have a significantly different distribution between multiple cancers and controls. Among them, mtSNP sites 14783, 14318, and 6455 were found to have good reproducibility in single cancers. On the basis of these 11 mtSNPs closely related to the occurrence of multiple cancers, we established a risk score to classify the population into high-risk and low-risk groups. The individual risk score was calculated using the following formula: risk score = − (1.954×m.14783T>C) − (0.837×m.14318T>C) − (0.285×m.6455C>T) + (2.061×m.15043G>A) − (1.814×m.7028C>T) + (0.739×m.13759G>A) + (0.985×m.750A>G) + (0.169×m.152T>C) − (0.928×m.12372G>A) + (1.377×m.11719G>A) + (0.895×m.12358A>G). The resulting risk scores ranged from −1.704 to 5.110. On the basis of the median of the risk score, patients were divided into high-risk (n = 1,536) and low-risk (n = 1,668) groups with a median risk score of 1.342 as the cutoff. The low-risk group had a significantly reduced risk of developing multiple cancers (OR = 0.614, 95% CI = 0.507–0.744, P < 0.001). Furthermore, according to the interaction analysis, it was found that compared with non-haplogroup A or M7 and high-risk populations, the incidence risk of cancer in the haplogroup A or M7 and low-risk populations was significantly reduced (OR = 0.42, 95% CI = 0.303–0.582, P < 0.001). In addition, when analyzing the interaction between different subgroups, we discovered that populations who possess both haplogroup A/M7 and low-risk characteristics have a lower occurrence risk of multiple tumors (Pinteraction = 0.003; Table 4).
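The risk score formula and the median cutoff translate directly into code. The Python sketch below uses the 11 coefficients and the cutoff of 1.342 given above. It assumes each mtSNP is coded 1 when the individual carries the variant and 0 otherwise, and that scores strictly above the cutoff are classed as high-risk; both choices are assumptions, since the coding and the handling of ties are not spelled out in the text. The function names and the example carriers are hypothetical.

```python
# Coefficients from the risk score formula in the text.
COEFFICIENTS = {
    "m.14783T>C": -1.954,
    "m.14318T>C": -0.837,
    "m.6455C>T": -0.285,
    "m.15043G>A": 2.061,
    "m.7028C>T": -1.814,
    "m.13759G>A": 0.739,
    "m.750A>G": 0.985,
    "m.152T>C": 0.169,
    "m.12372G>A": -0.928,
    "m.11719G>A": 1.377,
    "m.12358A>G": 0.895,
}
MEDIAN_CUTOFF = 1.342


def risk_score(variants):
    """Sum the coefficients of the mtSNPs carried by one individual.

    Assumption: each mtSNP contributes its coefficient if present, else 0.
    """
    return sum(coef for snp, coef in COEFFICIENTS.items() if snp in variants)


def risk_group(variants, cutoff=MEDIAN_CUTOFF):
    """Assumption: scores strictly above the cutoff are high-risk."""
    return "high-risk" if risk_score(variants) > cutoff else "low-risk"


# Hypothetical individuals, for illustration only.
carrier_a = {"m.750A>G", "m.11719G>A", "m.7028C>T"}
carrier_b = carrier_a | {"m.15043G>A"}
print(round(risk_score(carrier_a), 3), risk_group(carrier_a))
print(round(risk_score(carrier_b), 3), risk_group(carrier_b))
```

Variants outside the 11-mtSNP panel are ignored by construction, so the same function can be applied to a full variant list per individual.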
Table 3. Analysis of association between mtSNPs and occurrence risk in multiple types of cancer.
mtSNP^a | Cases (n = 2,489) | Controls (n = 715) | OR (95% CI) | Haplogroup A/M7-related SNP | P value | q value |
---|---|---|---|---|---|---|
14783 | 1,321 (53.07%) | 411 (57.48%) | 0.119 (0.065–0.215) | YES | <0.001 | <0.001 |
14318 | 104 (4.18%) | 62 (8.67%) | 0.209 (0.116–0.374) | NO | <0.001 | <0.001 |
6455 | 164 (6.59%) | 56 (7.83%) | 0.363 (0.208–0.632) | YES | <0.001 | 0.007 |
15043 | 1,319 (52.99%) | 367 (51.33%) | 5.238 (2.089–13.136) | NO | <0.001 | 0.006 |
7028 | 2,415 (97.03%) | 707 (98.88%) | 0.132 (0.042–0.410) | NO | <0.001 | 0.005 |
13759 | 162 (6.51%) | 26 (3.64%) | 4.298 (1.843–10.025) | YES | <0.001 | 0.007 |
750 | 2,462 (98.92%) | 703 (98.32%) | 3.887 (1.665–9.074) | NO | 0.002 | 0.015 |
152 | 656 (26.36%) | 165 (22.66%) | 1.478 (1.158–1.886) | YES | 0.002 | 0.013 |
12372 | 128 (5.14%) | 48 (6.71%) | 0.361 (0.186–0.698) | YES | 0.002 | 0.017 |
11719 | 2,453 (98.55%) | 703 (98.32%) | 4.998 (1.604–15.573) | NO | 0.006 | 0.034 |
12358 | 146 (5.87%) | 38 (5.31%) | 3.070 (1.385–6.809) | YES | 0.006 | 0.032 |
Abbreviations: CI, confidence interval; OR, odds ratio.
^a Individuals with the wildtype of mtSNP were used as reference.
Table 4. Stratified and interaction analysis of mtSNPs-based risk score in multiple types of cancer.
Group | Cases (n = 2,489) | Controls (n = 715) | OR (95% CI) | P value |
---|---|---|---|---|
High-risk | 1,268 (50.94%) | 268 (37.48%) | Ref | |
Low-risk | 1,221 (49.06%) | 447 (62.52%) | 0.614 (0.507–0.744) | <0.001 |
High-risk and haplogroup non-A/M7 | 1,236 (49.66%) | 263 (36.78%) | Ref | |
High-risk and haplogroup A/M7 | 32 (1.29%) | 5 (0.70%) | 0.574 (0.394–0.835) | 0.004 |
Low-risk and haplogroup non-A/M7 | 906 (36.40%) | 323 (45.17%) | 0.558 (0.447–0.696) | <0.001 |
Low-risk and haplogroup A/M7 | 315 (12.66%) | 124 (17.34%) | 0.420 (0.303–0.582) | <0.001 |
P interaction | | | | 0.003 |
Note: Bold entries indicate statistical significance.
Abbreviations: CI, confidence interval; OR, odds ratio.
Furthermore, as depicted in Supplementary Table S6, we utilized the previously mentioned public dataset for validation purposes. Our analysis revealed a significantly decreased incidence risk (OR = 0.291, 95% CI = 0.212–0.398, P < 0.001) among individuals with low-risk and haplogroup A/M7. Importantly, the results of the interaction analysis were consistent with previous findings.
GSEA in patients with cancer with haplogroup A/M7
To further investigate the mechanisms underlying the association between haplogroup and the occurrence risk of multiple cancers, transcriptome data from 30 patients with colorectal cancer and 30 patients with lung cancer belonging to haplogroup A/M7, haplogroup B, or haplogroup D were downloaded from GEO and analyzed. We standardized the GSE107422 and GSE165611 matrix data and used recognized oncogenic signaling pathways in the KEGG pathway and HALLMARK gene sets as validation standards. Subsequently, GSEA was employed to identify functionally enriched gene sets in the A/M7 haplogroup population, which exhibits lower susceptibility to cancer. The results revealed that in colorectal cancer, the A/M7 haplogroup population was primarily associated with downregulation of the Hedgehog, Wnt, and NOTCH signaling pathways (Fig. 2A, C, and E). These findings suggest that individuals in the A/M7 haplogroup may have decreased neoplastic proliferation capability, thus providing cellular protection against tumor development. Similar results were observed in lung cancer (Fig. 2B, D, and F).
Figure 2. GSEA for patients with haplogroup A/M7 in colorectal cancer and lung cancer. GSEA: Hedgehog signaling pathway in colorectal cancer (A); Hedgehog signaling pathway in lung cancer (B); Wnt signaling pathway in colorectal cancer (C); Wnt signaling pathway in lung cancer (D); NOTCH signaling pathway in colorectal cancer (E); NOTCH signaling pathway in lung cancer (F).
Discussion
Previous studies have established a correlation between mtDNA genetic mutations and the risk of developing cancer. However, most current research has focused on the relationship between haplogroups or mtSNPs and the incidence risk of a single cancer (21–23). To our knowledge, no study has investigated the association between haplogroups or mtSNPs and the risk of multiple cancers. In light of this, our study explored the pleiotropy of haplogroups or mtSNPs in multiple cancers and aimed to identify important predictive biomarkers for cancer in the population. These findings could serve as a basis for developing more effective screening and prevention strategies for individuals at high risk of developing multiple cancers.
Our previous study revealed that haplogroup M7 is associated with a reduced risk of colorectal cancer (9), which is consistent with its role in reducing the risk of hepatocellular carcinoma (10). In this study, we have shown for the first time that haplogroup A and M7 may also decrease the risk of developing multiple cancers. Our findings add further support to the role of haplogroups in reducing cancer risk and validate our previous results. However, it has also been reported that haplogroup M7 is associated with an increased risk of lung cancer (24), which conflicts with our conclusion. This discrepancy may be due to different environmental conditions, as our population mainly comes from Northern China while their population mainly comes from Southwestern China, with different genetic backgrounds leading to different effects of the same haplogroup on cancer risk. Besides, we speculate that genetic pleiotropy could also contribute to the discordant findings. The tumor microenvironment encountered by the same haplogroup population in different types of cancer may activate different carcinogenic signals or other processes, leading to mutations that interact and restrict one another, ultimately affecting physiologic and biochemical reactions and resulting in different cancer risks. However, the specific mechanisms underlying these observations remain unclear. In addition, another report suggested that haplogroup N9a is negatively correlated with the incidence of hepatocellular carcinoma in Northern China, but we were unable to replicate this conclusion in our study. Although both our study cohort and theirs are from Northern China, there may be certain regional differences as our population mainly comes from Shaanxi province while theirs is from Henan province. By using the public data (12), we found similar proportions of N9a population between Henan province and Shaanxi province (4.63% vs. 4.45%), but different proportions of N9a population between our study cohort and theirs (4.48% vs. 5.1%), which may explain the discrepancies in our data analysis. Therefore, we believe that our results highlight the importance of considering regional differences in genetic studies.
In recent years, researchers have begun to explore the relationship between classifiers composed of a class of related SNPs and cancers (18, 25). On the basis of our results, we found that the selected mtSNPs were not closely linked, which is consistent with the current situation in multiple cancer studies. It is challenging to use a single mtSNP or only haplogroup A/M7-related mtSNPs to indicate the risk of multiple cancers because of the strong heterogeneity between tumors. In addition, we found that most of the selected mtSNPs were distributed across multiple haplogroups, indicating the feasibility of using this widely distributed mtSNP set to indicate the risk of multiple cancers. For example, m.152T>C is present in almost all haplogroups, while m.14783T>C is located on the main trunk of the M branch, encompassing the majority of haplogroups. Furthermore, out of the 11 mtSNPs associated with the occurrence risk of multiple tumors, six are linked to haplogroup A or M7 (Table 3), suggesting that these haplogroups primarily influence tumor risk through the effects of these mtSNPs. The remaining five mtSNPs are not associated with haplogroup A or M7, but interaction analysis indicates that stratifying individuals based on these 11 mtSNPs and haplogroup A can lead to further delineation of different tumor risk subpopulations.
To further explore potential functional mechanisms, we collected public RNA sequencing data to analyze differentially expressed genes (DEG) in colorectal cancer and lung cancer tissues, which represent the tumor types in our study with better reproducibility of results. According to GSEA, DEGs between haplogroup A/M7 and other haplogroups exhibited a strong association with lower expression of carcinogenic signaling pathways in different cancers, which may result in decreased neoplastic proliferation and reduced susceptibility to tumor occurrence. Uncontrolled proliferation is a hallmark of tumor cells, and the pathways related to cancer susceptibility in haplogroup A/M7 populations are all involved in cell proliferation regulation. Previous studies have consistently indicated that the Hedgehog signaling pathway (26, 27), Wnt signaling pathway (28, 29), and NOTCH signaling pathway (30, 31) can promote tumor growth in various cancer types and are key signaling pathways contributing to the growth of human tumors. Consequently, individuals in haplogroup A/M7 may have reduced expression of certain carcinogenic signaling pathways, which could inhibit uncontrolled malignant proliferation and make them less susceptible to developing cancer.
The limitations of our study should be noted. As a retrospective study, the generalizability of our findings is limited as it only included patients from Northern China. Therefore, further validation through prospective studies with larger cohorts and in other populations is necessary. In addition, the SNP-based risk score needs to be validated further through prospective multicenter cohort studies. Moreover, laboratory-based basic research is needed to reveal the mechanisms underlying the association of haplogroups and mtSNPs with the occurrence of multiple cancers.
In this retrospective analysis, we identified a significant association of haplogroups and mtSNPs with the occurrence of multiple cancers in Northern China. In addition, the haplogroup and mtSNP-based risk score supports the prediction of cancer susceptibility and helps to identify a special group of individuals with lower tumor incidence risk. Therefore, our findings suggest that the haplogroup and mtSNP-based risk score may be a practical and reliable predictor of risk for multiple cancers, providing a potential tool for early screening in clinical settings for individuals in the Northern Chinese population.
Authors' Disclosures
No disclosures were reported.
Authors' Contributions
D. Chen: Resources, data curation, formal analysis, writing–original draft. Z. Yan: Resources, data curation, formal analysis, methodology, writing–original draft. Q. Yuan: Resources, data curation, software, formal analysis, methodology. F. Xie: Resources, data curation, software. Y. Liu: Resources. Z. Feng: Resources. Z. Wang: Resources, methodology. F. Zhou: Supervision, funding acquisition, writing–review and editing. J. Xing: Conceptualization, supervision, funding acquisition, project administration, writing–review and editing. Z. Zhang: Supervision, funding acquisition, project administration. F. Wang: Resources, supervision, project administration. X. Guo: Conceptualization, supervision, funding acquisition, methodology, project administration, writing–review and editing.
Acknowledgments
We thank the patients for their participation in the study. X. Guo was funded by the Autonomous Project of State Key Laboratory of Holistic Integrative Management of Gastrointestinal Cancers (CBSKL2022ZZ53) and the Key Research and Development Program of Shaanxi Province (2023-ZDLSF-46); J. Xing was funded by the Key Research and Development Program of Shaanxi Province (2022SF-231); Z. Zhang was funded by the Key Research and Development Program of Xuzhou (KC19172, KC21212).
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).