Background:

Mitochondrial DNA (mtDNA) haplogroups and SNPs have been associated with the risk of different cancers. However, there is no evidence that the same haplogroup or mitochondrial SNP (mtSNP) exhibits a pleiotropic effect on multiple cancers.

Methods:

We recruited 2,489 participants, including patients with colorectal, hepatocellular, lung, ovarian, bladder, breast, pancreatic, and renal cell carcinoma. In addition, 715 healthy individuals from Northern China served as controls. Next, cross-tumor analysis was performed to determine whether mtDNA variation is associated with multiple cancers.

Results:

Our results revealed a significant decrease in the occurrence risk of multiple cancers among individuals belonging to haplogroup A [OR = 0.553, 95% confidence interval (CI) = 0.375–0.815, P = 0.003]. Furthermore, we identified 11 mtSNPs associated with multiple cancers and divided the population into high-risk and low-risk groups. Low-risk groups showed a significantly reduced risk of occurrence compared with high-risk groups (OR = 0.614, 95% CI = 0.507–0.744, P < 0.001). Furthermore, using interaction analysis, we identified a special group of individuals belonging to haplogroup A/M7 and the low-risk population, who exhibit a lower risk of multiple cancers compared with other populations (OR = 0.195, 95% CI = 0.106–0.359, P < 0.001). Finally, gene set enrichment analysis confirmed that haplogroup A/M7 patients had lower expression levels of cancer-related pathway genes compared with haplogroup D patients.

Conclusions:

We found that specific mtDNA haplogroups and mtSNPs may play a role in predicting multiple cancer predisposition in Chinese populations.

Impact:

This may provide a potential tool for early screening in clinical settings for individuals in the Chinese population.

Mitochondria are double-membrane organelles involved in many important biological processes such as cellular metabolism, energy generation, and regulation of apoptosis in eukaryotic cells (1, 2). Mitochondria have an independent genome, mitochondrial DNA (mtDNA), which encodes multiple enzymes and proteins related to the mitochondrial respiratory chain. Because of continuous damage during the lifecycle, the accumulation of mtDNA mutations can cause mitochondrial dysfunction, which is involved in the development of various diseases (3–5).

The inheritance pattern of mtDNA is matrilineal, giving human mtDNA strong geographical specificity. In the past, different mtDNA haplotypes and haplogroups were defined using partial SNP sites in mtDNA. Because each haplotype or haplogroup represents a common ancestor, mtDNA haplotypes or haplogroups can be used to trace genetic relationships, evolutionary history, and early human migration among different regions and ethnic groups (6).

In recent years, specific mtDNA haplogroups or mitochondrial SNPs (mtSNP) have been reported to be closely related to occurrences of several tumor types in particular populations. For example, in the Southern Chinese population, mtDNA haplogroup D5 is closely related to an increased risk of breast cancer (7), while haplogroup K has a protective effect against pancreatic cancer in southeastern European populations (8). Our previous works reported that haplogroup M7 in the Northern Chinese population is closely related to the occurrence of liver cancer and colon cancer (9, 10). These findings suggest that in a specific population, certain haplogroups play a pleiotropic role in the occurrence of different tumor types. However, there is currently no cross-tumor study evaluating the pleiotropy of haplogroups in multiple tumor occurrences.

In the current study, we collected mtDNA sequencing data from colorectal cancer, liver cancer, lung cancer, breast cancer, ovarian cancer, and bladder cancer patient cohorts and from healthy individuals in our laboratory, and combined them with publicly available mtSNP data from the Northern Han Chinese population. We systematically investigated the roles of mtDNA haplogroups and mtSNPs in the incidence risk of multiple cancers and established a risk score based on 11 SNPs. Our findings show that individuals who belong to haplogroup A and to the low-risk population have the lowest risk of developing multiple cancers.

Study design and participants

The study collected tissue and plasma samples from a total of 2,489 patients with cancer between January 1, 2007, and December 31, 2019, at the first and second affiliated hospitals of the Fourth Military Medical University (FMMU) in Xi'an, P.R. China. Of those individuals, 649 had colorectal cancer, 816 had hepatocellular carcinoma, 456 had lung cancer, 304 had ovarian cancer, 139 had bladder cancer, 63 had breast cancer, 43 had pancreatic cancer, and 19 had renal cell carcinoma. In addition, 715 healthy controls (552 individuals sourced from public data and 157 individuals sourced from in-house data) were included in the study. For the validation cohort, external validation was performed using data from Southern China, which included 1,015 patients with colorectal cancer (11) and 1,562 healthy controls (12). In addition, we further added 56 bladder cancer cases, 34 ovarian cancer cases (combined as other types of tumors), and 80 healthy individuals to form an internal validation set.

Written informed consent was obtained from each patient during the surgical procedure, and the Ethics Committee of the FMMU (KY20183331-1, Shaanxi, P.R. China) approved the study.

DNA extraction and mtDNA sequencing

As described previously, genomic DNA was extracted from fresh tissue using the Omega Whole Blood DNA Extraction Kit and measured using the Nanodrop 2000. Subsequently, we performed capture-based mtDNA sequencing using our own biotinylated probes (13). The capture probes were hybridized with the prepared whole-genome sequencing libraries, and the binding buffer with streptavidin-coated magnetic beads was added to the reaction system. Finally, the libraries were sequenced on the HiSeq X Ten (Illumina) platform using paired-end runs with 2 × 150 cycles (PE 150; as shown in Supplementary Fig. S1).

Data processing and mapping

We systematically evaluated the analysis pipeline for mtDNA deep sequencing data. Briefly, raw mtDNA sequencing data were first processed under one of two quality-control options: trimming or no trimming. The mtDNA reads were then mapped to the revised Cambridge Reference Sequence (rCRS) alone or to a combined rCRS-hg19 reference using the Burrows-Wheeler Aligner (BWA) software. After sorting and removing duplicated reads with Picard, the Genome Analysis Toolkit 4 (GATK4) was used for local realignment. Finally, we applied a series of filtering conditions (to remove false-positive mutations) to detect mtDNA mutations and analyze heteroplasmy levels. For the trimming option, the FASTQ preprocessor fastp (version 0.20.0; ref. 14) was used with three parameters. First, all sequencing adaptors were removed. Second, a sliding window (4 bp in length) approach was used to scan reads from front (5′) to tail (3′). When the average base quality in the window was below Q30, the bases in the window and all downstream bases were dropped. Third, reads with a length below 50 bp were discarded to avoid ambiguous mapping of short reads.
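The second and third trimming rules can be illustrated with a minimal Python sketch. It is not fastp's implementation: the function `trim_read` and its parameter names are hypothetical, adaptor removal is omitted, and base qualities are assumed to be already decoded to integer Phred scores.

```python
def trim_read(seq, quals, window=4, min_mean_q=30, min_len=50):
    """Sliding-window trim from the 5' end; returns (seq, quals) or None.

    Illustrative only: `quals` is assumed to be a list of integer Phred
    scores with the same length as `seq`.
    """
    cut = len(seq)
    # Slide a fixed-size window from the 5' end toward the 3' end.
    for start in range(0, len(seq) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < min_mean_q:
            # Drop this window and everything downstream of it.
            cut = start
            break
    trimmed_seq, trimmed_quals = seq[:cut], quals[:cut]
    # Discard reads that end up shorter than the minimum length.
    if len(trimmed_seq) < min_len:
        return None
    return trimmed_seq, trimmed_quals


# Hypothetical 100-bp read whose quality collapses after base 60.
read = "A" * 100
quals = [35] * 60 + [10] * 40
kept = trim_read(read, quals)
```

In this toy read the first window whose mean quality falls below Q30 starts at index 57, so the read is cut there and kept because it is still longer than 50 bp.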

Public transcriptome data analysis

Public transcriptome data from colorectal cancer and lung cancer tissues were obtained from the Gene Expression Omnibus (GEO) database (GSE107422 for colorectal cancer and GSE16561 for lung cancer; ref. 15). Batch correction, standardization, and differential gene expression analysis were performed using DESeq2.

Gene set enrichment analysis (GSEA) was performed in haplogroup A, haplogroup B, and haplogroup D samples to explore biological signaling pathways drawn from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and HALLMARK gene sets. Pathways with significant enrichment were reported on the basis of the normalized enrichment score (NES), gene ratio, and P value. Gene sets with |NES| > 1, nominal P < 0.05, and FDR q < 0.25 were considered significantly enriched (16, 17).
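The three significance thresholds can be expressed as a simple filter. The sketch below is an illustration only: the gene set names and statistics are made-up placeholders, not results from this study, and `is_significantly_enriched` is a hypothetical helper.

```python
def is_significantly_enriched(nes, nom_p, fdr_q):
    """Apply the thresholds stated above: |NES| > 1, nominal P < 0.05, FDR q < 0.25."""
    return abs(nes) > 1 and nom_p < 0.05 and fdr_q < 0.25


# Placeholder GSEA output rows (hypothetical values for illustration).
gsea_rows = [
    {"gene_set": "HYPOTHETICAL_SET_1", "nes": -1.62, "nom_p": 0.012, "fdr_q": 0.18},
    {"gene_set": "HYPOTHETICAL_SET_2", "nes": 0.85, "nom_p": 0.030, "fdr_q": 0.10},
    {"gene_set": "HYPOTHETICAL_SET_3", "nes": -1.40, "nom_p": 0.040, "fdr_q": 0.31},
]

significant = [
    row["gene_set"]
    for row in gsea_rows
    if is_significantly_enriched(row["nes"], row["nom_p"], row["fdr_q"])
]
```

Only the first placeholder set passes: the second fails the |NES| threshold and the third fails the FDR threshold.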

Identification of mtDNA haplogroups and mtSNPs

The identification of mtDNA haplogroups and mtSNPs for each patient was carried out according to previously described methods (18). Briefly, to identify mtDNA variations, FASTA sequences of mtDNA were analyzed using MitoTool (www.mitotool.org; ref. 19). The haplogroup was determined using Phylotree (www.phylotree.org; mtDNA tree Build 16; ref. 20). mtSNPs were identified as mtDNA variations observed in both tumor and plasma. Further analysis excluded all mtSNPs in patients with a minor allele frequency (MAF) less than 5%.

Statistical analysis

Logistic regression analysis was used to evaluate the independent significance of haplogroups and mtSNPs while adjusting for age and gender. The χ2 test was used to determine whether there were any differences in SNPs among samples from different sources. Principal component analysis (PCA) was employed to compare the haplogroup distributions between the two cohorts of healthy individuals (Supplementary Fig. S1). The risk score for each individual was then calculated as the sum of mtSNP values weighted by the coefficients obtained from the logistic regression model. The test for interaction between haplogroup and mtSNP-based risk score was performed by including a cross-product term in the logistic regression model. All statistical analyses were performed using SPSS software (Version 26.0), and P values less than 0.05 were considered statistically significant.
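A reported OR, its 95% CI, and its P value are linked through the logistic regression coefficient. The sketch below shows that relationship under the assumption that the intervals are Wald intervals (symmetric on the log-odds scale); the source does not state how SPSS computed them, and `wald_from_ci` is a hypothetical helper. It is applied to the haplogroup A estimate reported in this article.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the coefficient, standard error, z statistic, and two-sided P.

    Assumes a Wald interval: exp(beta +/- z_crit * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p_value


# Haplogroup A versus all non-A haplogroups: OR = 0.553, 95% CI = 0.375-0.815.
beta, se, z, p_value = wald_from_ci(0.553, 0.375, 0.815)
```

Under this assumption the recovered P value rounds to the reported 0.003, which is a quick way to check that an OR, CI, and P value are mutually consistent.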

Data availability

The raw sequencing data underlying this article are available in the BIG Data Center, Beijing Institute of Genomics, with access number PRJCA021120.

Flow chart showing the methods used for analysis

We conducted a retrospective study on eight independent cohorts of patients with cancer (colorectal cancer, hepatocellular carcinoma, lung cancer, ovarian cancer, bladder cancer, breast cancer, pancreatic cancer, and renal cell carcinoma) and two healthy control cohorts. No significant difference between the two healthy cohorts was demonstrated by the χ2 test (Supplementary Table S1) or PCA (Supplementary Fig. S2). Then, logistic regression analysis was employed to screen for haplogroups and SNPs associated with the occurrence risk of multiple cancers, with adjustments made for age and gender to account for potential confounding effects. During SNP analysis, the common SNPs were screened from the three cohorts. Next, the χ2 test was used to identify the SNPs that showed no differences. Finally, based on the 11 SNPs identified, we established a risk score and divided the participants into high-risk and low-risk groups. Interaction analysis was also performed on the screened haplogroups. We found that compared with non-A haplogroup and high-risk populations, the incidence of cancer was significantly reduced in haplogroup A and low-risk populations (as shown in Fig. 1).

Figure 1.

Workflow diagram. Flow chart showing the methods used for analysis.


Cross-tumor analysis of the association between mtDNA haplogroups and occurrence risk

We analyzed the association between mtDNA haplogroups and the occurrence risk of multiple cancers in a total of 2,489 patients with cancer and 715 healthy controls from Northern China. The overall distribution of mtDNA haplogroups was presented in Table 1, with haplogroup D being the most prevalent clade [734 cases (22.91%), 157 healthy controls (21.96%)]. Using logistic regression analysis, we found that individuals with haplogroup A had a significantly lower occurrence risk [OR = 0.553, 95% confidence interval (CI) = 0.375–0.815, P = 0.003] when compared with all non-A haplogroups, followed by those with haplogroup M7 (OR = 0.632, 95% CI = 0.424–0.942, P = 0.024) when compared with all non-M7 haplogroups.

Table 1.

Analysis of association between mtDNA haplogroups and occurrence risk in multiple types of cancer.

| Haplogroup (a, b) | Controls (n = 715) | Cases (n = 2,489) | OR (95% CI) | P value |
|---|---|---|---|---|
| A | 69 (9.65%) | 188 (7.55%) | 0.553 (0.375–0.815) | 0.003 |
|  | 114 (15.94%) | 379 (15.23%) | 0.721 (0.512–1.015) | 0.061 |
| D | 157 (21.96%) | 570 (22.91%) | 0.806 (0.583–1.113) | 0.189 |
|  | 44 (6.15%) | 134 (5.37%) | 0.643 (0.415–0.994) | 0.047 |
| M7 | 60 (8.39%) | 181 (7.27%) | 0.632 (0.424–0.942) | 0.024 |
| M8 | 78 (10.91%) | 272 (10.91%) | 0.821 (0.568–1.186) | 0.292 |
| N9 | 41 (5.73%) | 131 (5.27%) | 0.690 (0.443–1.074) | 0.100 |
| R9 | 88 (12.31%) | 338 (13.58%) | 0.867 (0.606–1.239) | 0.433 |

Note: Bold entries indicate statistical significance.

Abbreviations: CI, confidence interval; OR, odds ratio.

a Other haplogroups were used as reference.

b Haplogroups with a frequency of less than 5% (such as M9, JT, N*, N1, R*, R0, U, and X) are not shown.

Furthermore, we selected haplogroup D, with the highest proportion in our cohort, as the reference group to evaluate the association between haplogroup and the occurrence risk of multiple cancers. As shown in Supplementary Table S2, individuals with haplogroup A still had a significantly lower occurrence risk (OR = 0.686, 95% CI = 0.493–0.954, P = 0.025).

For validation, we first conducted external validation using public data from Southern China, comprising 1,015 patients with cancer and 1,562 healthy controls. As demonstrated in Supplementary Table S3, our findings indicated that individuals with haplogroup M7 exhibited a significantly reduced occurrence risk (OR = 0.627, 95% CI = 0.443–0.887, P = 0.008) compared with all non-M7 haplogroups. Although the q-value was 0.072, suggestive significance was still observed. The inability of the validation set to replicate the haplogroup A association may be attributed to regional differences, as our in-house data originated from Northern China while the validation set was sourced from Southern China. Furthermore, we utilized our newly acquired in-house data for internal validation, revealing that individuals with haplogroup M7 showed a significantly lower occurrence risk (OR = 0.229, 95% CI = 0.054–0.966, P = 0.045), as presented in Supplementary Table S4.

Analysis of association between mtDNA haplogroups and occurrence risk in single cancer

To further validate our viewpoint on the association between haplogroup A and the risk of multiple cancers, we explored whether this haplogroup was also associated with the occurrence risk of single cancer. In Table 2, we found that individuals with haplogroup A had a significantly lower occurrence risk when compared with other haplogroups in colorectal cancer (OR = 0.474, 95% CI = 0.291–0.773, P = 0.003), lung cancer (OR = 0.456, 95% CI = 0.262–0.794, P = 0.005), and other cancers (OR = 0.548, 95% CI = 0.325–0.924, P = 0.024). Similar results were observed for patients with haplogroup M7, with OR = 0.452 in colorectal cancer (95% CI = 0.270–0.756, P = 0.002), OR = 0.542 in lung cancer (95% CI = 0.310–0.947, P = 0.031) and OR = 0.566 in others (95% CI = 0.330–0.971, P = 0.039). However, no significant association was found for hepatocellular carcinoma.

Table 2.

Analysis of the association between mtDNA haplogroups and occurrence risk in single type of cancer.

| Haplogroup (a) | Colorectal cancer, n = 649 | OR (95% CI) | Hepatocellular carcinoma, n = 816 | OR (95% CI) | Lung cancer, n = 456 | OR (95% CI) | Others (b), n = 568 | OR (95% CI) |
|---|---|---|---|---|---|---|---|---|
| A | 47 (7.24%) | 0.474 (0.291–0.773)*** (c) | 58 (7.11%) | 0.737 (0.454–1.196) | 30 (6.58%) | 0.456 (0.262–0.794)*** | 39 (6.99%) | 0.548 (0.325–0.924)* |
|  | 106 (16.33%) | 0.647 (0.427–0.979) | 126 (15.44%) | 0.969 (0.636–1.476) | 66 (14.47%) | 0.607 (0.382–0.966)* | 77 (13.80%) | 0.655 (0.418–1.026) |
| D | 118 (18.18%) | 0.523 (0.351–0.779)*** | 197 (24.14%) | 1.100 (0.741–1.634) | 113 (24.78%) | 0.755 (0.493–1.156) | 149 (26.70%) | 0.920 (0.610–1.387) |
|  | 29 (4.47%) | 0.458 (0.260–0.808)*** | 33 (4.04%) | 0.658 (0.375–1.154) | 24 (5.26%) | 0.572 (0.311–1.052) | 43 (7.71%) | 0.948 (0.551–1.631) |
| M7 | 39 (6.01%) | 0.452 (0.270–0.756)*** | 68 (8.33%) | 0.994 (0.613–1.610) | 31 (6.80%) | 0.542 (0.310–0.947)* | 37 (6.63%) | 0.566 (0.330–0.971)* |
| M8 | 84 (12.94%) | 0.749 (0.481–1.167) | 97 (11.89%) | 1.090 (0.696–1.708) | 49 (10.75%) | 0.659 (0.400–1.087) | 62 (11.11%) | 0.771 (0.477–1.244) |
| N9 | 42 (6.47%) | 0.713 (0.417–1.218) | 40 (4.90%) | 0.855 (0.494–1.482) | 24 (5.26%) | 0.614 (0.332–1.135) | 22 (3.94%) | 0.544 (0.294–1.007) |
| R9 | 92 (14.18%) | 0.727 (0.472–1.121) | 125 (15.32%) | 1.245 (0.808–1.919) | 58 (12.72%) | 0.692 (0.427–1.120) | 73 (13.08%) | 0.804 (0.506–1.278) |

Note: Bold entries indicate statistical significance.

Abbreviations: CI, confidence interval; OR, odds ratio.

a Other haplogroups were used as reference.

b Cancer types with relatively low frequency (ovarian cancer, bladder cancer, breast cancer, pancreatic cancer, and renal cell carcinoma) were combined and analyzed collectively.

c *, P < 0.05; ***, P < 0.01.

Furthermore, we selected haplogroup D, with the highest proportion, as the reference group to evaluate the conclusion. As shown in Supplementary Table S5, individuals with haplogroup A still had a significantly lower occurrence risk (haplogroup D as reference, hepatocellular carcinoma, OR = 0.598, 95% CI = 0.363–0.983, P = 0.043; lung cancer, OR = 0.604, 95% CI = 0.369–0.988, P = 0.045; others, OR = 0.596, 95% CI = 0.379–0.936, P = 0.025). While the association between haplogroup A and cancer risk did not reach statistical significance in colorectal cancer, the obtained P value (0.063) still suggested a potential trend between haplogroup A and cancer risk.

Analysis of association between mtSNPs and occurrence risk of multiple cancers

We also assessed the distribution of mtDNA SNPs in both patients and controls. As shown in Table 3, among the 61 mtSNPs with MAF > 5%, 11 mtSNPs showed a significantly different distribution between patients with multiple cancers and controls. Among them, mtSNP sites 14783, 14318, and 6455 were found to have good reproducibility in single cancers. On the basis of these 11 mtSNPs closely related to the occurrence of multiple cancers, we established a risk score to classify the population into high-risk and low-risk groups. The individual risk score was calculated using the following formula: risk score = − (1.954 × m.14783T>C) − (0.837 × m.14318T>C) − (0.285 × m.6455C>T) + (2.061 × m.15043G>A) − (1.814 × m.7028C>T) + (0.739 × m.13759G>A) + (0.985 × m.750A>G) + (0.169 × m.152T>C) − (0.928 × m.12372G>A) + (1.377 × m.11719G>A) + (0.895 × m.12358A>G). The resulting risk scores ranged from −1.704 to 5.110. Using the median risk score of 1.342 as the cutoff, participants were divided into high-risk (n = 1,536) and low-risk (n = 1,668) groups. The low-risk group had a significantly reduced risk of developing multiple cancers (OR = 0.614, 95% CI = 0.507–0.744, P < 0.001). Furthermore, compared with the non-haplogroup A/M7 and high-risk population, the incidence risk of cancer in the haplogroup A/M7 and low-risk population was significantly reduced (OR = 0.420, 95% CI = 0.303–0.582, P < 0.001). In addition, when analyzing the interaction between different subgroups, we found that individuals who both belong to haplogroup A/M7 and fall into the low-risk group have a lower occurrence risk of multiple tumors (P for interaction = 0.003; Table 4).
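The risk score formula can be written out as a short Python sketch. The coefficients and the cutoff are taken from the text above; the coding of each mtSNP as 1 when the variant is carried and 0 otherwise, the handling of a score exactly equal to the cutoff, and the names `COEFFICIENTS`, `risk_score`, and `risk_group` are assumptions made for illustration.

```python
# Coefficients as given in the risk score formula.
COEFFICIENTS = {
    "m.14783T>C": -1.954,
    "m.14318T>C": -0.837,
    "m.6455C>T": -0.285,
    "m.15043G>A": 2.061,
    "m.7028C>T": -1.814,
    "m.13759G>A": 0.739,
    "m.750A>G": 0.985,
    "m.152T>C": 0.169,
    "m.12372G>A": -0.928,
    "m.11719G>A": 1.377,
    "m.12358A>G": 0.895,
}
MEDIAN_CUTOFF = 1.342  # median risk score reported in the text


def risk_score(carried_variants):
    """Sum the coefficients of the scored mtSNPs that an individual carries.

    Assumption: each mtSNP contributes 1 x coefficient if carried, else 0.
    """
    return sum(c for snp, c in COEFFICIENTS.items() if snp in carried_variants)


def risk_group(score, cutoff=MEDIAN_CUTOFF):
    # Assumption: scores strictly above the cutoff are labeled high-risk.
    return "high-risk" if score > cutoff else "low-risk"


# Hypothetical individual carrying two of the scored variants.
example_score = risk_score({"m.15043G>A", "m.152T>C"})
example_group = risk_group(example_score)
```

For this hypothetical carrier the score is 2.061 + 0.169 = 2.230, which lies above the cutoff and would place the individual in the high-risk group under the stated assumptions.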

Table 3.

Analysis of association between mtSNPs and occurrence risk in multiple types of cancer.

| mtSNP (a) | Cases (n = 2,489) | Controls (n = 715) | OR (95% CI) | Haplogroup A/M7-related SNP | P value | q value |
|---|---|---|---|---|---|---|
| 14783 | 1,321 (53.07%) | 411 (57.48%) | 0.119 (0.065–0.215) | YES | <0.001 | <0.001 |
| 14318 | 104 (4.18%) | 62 (8.67%) | 0.209 (0.116–0.374) | NO | <0.001 | <0.001 |
| 6455 | 164 (6.59%) | 56 (7.83%) | 0.363 (0.208–0.632) | YES | <0.001 | 0.007 |
| 15043 | 1,319 (52.99%) | 367 (51.33%) | 5.238 (2.089–13.136) | NO | <0.001 | 0.006 |
| 7028 | 2,415 (97.03%) | 707 (98.88%) | 0.132 (0.042–0.410) | NO | <0.001 | 0.005 |
| 13759 | 162 (6.51%) | 26 (3.64%) | 4.298 (1.843–10.025) | YES | <0.001 | 0.007 |
| 750 | 2,462 (98.92%) | 703 (98.32%) | 3.887 (1.665–9.074) | NO | 0.002 | 0.015 |
| 152 | 656 (26.36%) | 165 (22.66%) | 1.478 (1.158–1.886) | YES | 0.002 | 0.013 |
| 12372 | 128 (5.14%) | 48 (6.71%) | 0.361 (0.186–0.698) | YES | 0.002 | 0.017 |
| 11719 | 2,453 (98.55%) | 703 (98.32%) | 4.998 (1.604–15.573) | NO | 0.006 | 0.034 |
| 12358 | 146 (5.87%) | 38 (5.31%) | 3.070 (1.385–6.809) | YES | 0.006 | 0.032 |

Abbreviations: CI, confidence interval; OR, odds ratio.

a Individuals with the wild-type allele of each mtSNP were used as reference.

Table 4.

Stratified and interaction analysis of mtSNPs-based risk score in multiple types of cancer.

| Group | Cases (n = 2,489) | Controls (n = 715) | OR (95% CI) | P value |
|---|---|---|---|---|
| High-risk | 1,268 (50.94%) | 268 (37.48%) | Ref |  |
| Low-risk | 1,221 (49.06%) | 447 (62.52%) | 0.614 (0.507–0.744) | <0.001 |
| High-risk and haplogroup non-A/M7 | 1,236 (49.66%) | 263 (36.78%) | Ref |  |
| High-risk and haplogroup A/M7 | 32 (1.29%) | 5 (0.70%) | 0.574 (0.394–0.835) | 0.004 |
| Low-risk and haplogroup non-A/M7 | 906 (36.40%) | 323 (45.17%) | 0.558 (0.447–0.696) | <0.001 |
| Low-risk and haplogroup A/M7 | 315 (12.66%) | 124 (17.34%) | 0.420 (0.303–0.582) | <0.001 |
| P for interaction |  |  |  | 0.003 |

Note: Bold entries indicate statistical significance.

Abbreviations: CI, confidence interval; OR, odds ratio.

Furthermore, as depicted in Supplementary Table S6, we utilized the previously mentioned public dataset for validation purposes. Our analysis revealed a significantly decreased incidence risk (OR = 0.291, 95% CI = 0.212–0.398, P < 0.001) among individuals with low-risk and haplogroup A/M7. Importantly, the results of the interaction analysis were consistent with previous findings.
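The stratification in Table 4 and the cross-product term described under Statistical analysis can be sketched as follows. This is an illustration of the general technique rather than the SPSS procedure used in the study: the indicator coding and the helpers `design_row` and `stratum` are assumptions, and the adjustment covariates (age and gender) are left out.

```python
def design_row(has_a_or_m7, is_low_risk):
    """Predictor row for a logistic model with an interaction term.

    Columns: intercept, haplogroup A/M7 indicator, low-risk indicator,
    and their cross-product (the interaction term).
    """
    h = int(has_a_or_m7)
    r = int(is_low_risk)
    return [1, h, r, h * r]


def stratum(has_a_or_m7, is_low_risk):
    """Label the four joint strata used in Table 4."""
    risk = "Low-risk" if is_low_risk else "High-risk"
    haplo = "A/M7" if has_a_or_m7 else "non-A/M7"
    return f"{risk} and haplogroup {haplo}"


# Counts per stratum as reported in Table 4: (cases, controls).
TABLE4_COUNTS = {
    stratum(False, False): (1236, 263),
    stratum(True, False): (32, 5),
    stratum(False, True): (906, 323),
    stratum(True, True): (315, 124),
}
total_cases = sum(cases for cases, _ in TABLE4_COUNTS.values())
total_controls = sum(controls for _, controls in TABLE4_COUNTS.values())
```

The cross-product column is nonzero only for individuals who are both haplogroup A/M7 and low-risk, so its coefficient captures any departure from the effect expected from the two main effects alone. The stratum counts add up to the 2,489 cases and 715 controls of the full cohort.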

GSEA in patients with cancer with haplogroup A/M7

To further investigate the mechanisms by which haplogroups influence the risk of multiple cancers, transcriptome data from 30 patients with colorectal cancer and 30 patients with lung cancer belonging to haplogroup A/M7, haplogroup B, or haplogroup D were downloaded from GEO and analyzed. We standardized the GSE107422 and GSE165611 matrix data and used recognized oncogenic signaling pathways in the KEGG pathway and HALLMARK gene sets as validation standards. Subsequently, GSEA was employed to identify functionally enriched gene sets in the A/M7 haplogroup population, which exhibits lower susceptibility to cancer. The results revealed that in colorectal cancer, the A/M7 haplogroup population was primarily associated with downregulation of the Hedgehog, Wnt, and NOTCH signaling pathways (Fig. 2A, C, and E). These findings suggest that individuals in the A/M7 haplogroup may have decreased neoplastic proliferation capability, thus providing cellular protection against tumor development. Similar results were observed in lung cancer (Fig. 2B, D, and F).

Figure 2.

GSEA for patients with haplogroup A/M7 in colorectal cancer. GSEA: Hedgehog signaling pathway in colorectal cancer (A); Hedgehog signaling pathway in lung cancer (B); Wnt signaling pathway in colorectal cancer (C); Wnt signaling pathway in lung cancer (D); NOTCH signaling pathway in colorectal cancer (E); NOTCH signaling pathway in lung cancer (F).


Previous studies have established a correlation between mtDNA genetic mutations and the risk of developing cancer. However, most current research has focused on the relationship between haplogroups or mtSNPs and the incidence risk of a single cancer (21–23). To our knowledge, no study has investigated the association between haplogroups or mtSNPs and the risk of multiple cancers. In light of this, our study explored the pleiotropy of haplogroups or mtSNPs in multiple cancers and aimed to identify important predictive biomarkers for cancer in the population. These findings could serve as a basis for developing more effective screening and prevention strategies for individuals at high risk of developing multiple cancers.

Our previous study revealed that haplogroup M7 is associated with a reduced risk of colorectal cancer (9), which is consistent with its role in reducing the risk of hepatocellular carcinoma (10). In this study, we have shown for the first time that haplogroup A and M7 may also decrease the risk of developing multiple cancers. Our findings add further support to the role of haplogroups in reducing cancer risk and validate our previous results. However, it has also been reported that haplogroup M7 is associated with an increased risk of lung cancer (24), which conflicts with our conclusion. This discrepancy may be due to different environmental conditions, as our population mainly comes from Northern China while their population mainly comes from Southwestern China, with different genetic backgrounds leading to different effects of the same haplogroup on cancer risk. Besides, we speculate that genetic pleiotropy could also contribute to the discordant findings. The tumor microenvironment encountered by the same haplogroup population in different types of cancer may activate different carcinogenic signals or other processes, leading to mutations that interact and restrict one another, ultimately affecting physiologic and biochemical reactions and resulting in different cancer risks. However, the specific mechanisms underlying these observations remain unclear. In addition, another report suggested that haplogroup N9a is negatively correlated with the incidence of hepatocellular carcinoma in Northern China, but we were unable to replicate this conclusion in our study. Although both our study cohort and theirs are from Northern China, there may be certain regional differences as our population mainly comes from Shaanxi province while theirs is from Henan province. By using the public data (12), we found similar proportions of N9a population between Henan province and Shaanxi province (4.63% vs. 4.45%), but different proportions of N9a population between our study cohort and theirs (4.48% vs. 5.1%), which may explain the discrepancies in our data analysis. Therefore, we believe that our results highlight the importance of considering regional differences in genetic studies.

In recent years, researchers have begun to explore the relationship between classifiers composed of a class of related SNPs and cancers (18, 25). On the basis of our results, we found that the selected mtSNPs were not closely linked, which is consistent with the current situation in multiple cancer studies. It is challenging to use a single mtSNP, or only haplogroup A/M7-related mtSNPs, to indicate the risk of multiple cancers because of the strong heterogeneity between tumors. In addition, we found that most of the selected mtSNPs were distributed across multiple haplogroups, indicating the feasibility of using this widely distributed mtSNP set to indicate the risk of multiple cancers. For example, m.152T>C is present in almost all haplogroups, while m.14783T>C is located on the main trunk of the M branch, encompassing the majority of haplogroups. Furthermore, of the 11 mtSNPs associated with the occurrence risk of multiple tumors, six are linked to haplogroup A or M7 (Table 3), suggesting that these haplogroups primarily influence tumor risk through the effects of these mtSNPs. The remaining five mtSNPs are not associated with haplogroup A or M7, but interaction analysis indicates that stratifying individuals based on these 11 mtSNPs and haplogroup A can further delineate subpopulations with different tumor risks.

To further explore potential functional mechanisms, we collected public RNA sequencing data to analyze differentially expressed genes (DEG) in colorectal cancer and lung cancer tissues, which represent the tumor types in this study with better reproducibility of results. According to GSEA, DEGs between haplogroup A/M7 and other haplogroups exhibited a strong association with lower expression of carcinogenic signaling pathways in different cancers, which may result in decreased neoplastic proliferation and reduced susceptibility to tumor occurrence. Uncontrolled proliferation is a hallmark of tumor cells, and the pathways related to cancer susceptibility in haplogroup A/M7 populations are all involved in the regulation of cell proliferation. Previous studies have consistently indicated that the Hedgehog signaling pathway (26, 27), Wnt signaling pathway (28, 29), and NOTCH signaling pathway (30, 31) can promote tumor growth in various cancer types and are key signaling pathways contributing to the growth of human tumors. Consequently, haplogroup A/M7 populations may have reduced expression of certain carcinogenic signaling pathways, thereby inhibiting uncontrolled malignant proliferation of their own cells and making these individuals less susceptible to developing cancer.

The limitations of our study should be noted. This was a retrospective study, and the generalizability of our findings is limited because it only included patients from Northern China. Therefore, further validation through prospective studies with larger cohorts and in other populations is necessary. In addition, the SNP-based risk score needs to be validated further through prospective multicenter cohort studies. Moreover, laboratory-based basic research is needed to reveal the mechanisms underlying the association of haplogroups and mtSNPs with the occurrence of multiple cancers.

In this retrospective analysis, we identified significant associations of haplogroups and mtSNPs with the occurrence of multiple cancers in Northern China. In addition, the haplogroup and mtSNP-based risk score supports prediction of cancer susceptibility and helps to identify a special group of individuals with a lower risk of tumor incidence. Therefore, our findings suggest that the haplogroup and mtSNP-based risk score may be a practical and reliable predictor of risk for multiple cancers, providing a potential tool for early screening in clinical settings for individuals in the Northern Chinese population.

No disclosures were reported.

D. Chen: Resources, data curation, formal analysis, writing–original draft. Z. Yan: Resources, data curation, formal analysis, methodology, writing–original draft. Q. Yuan: Resources, data curation, software, formal analysis, methodology. F. Xie: Resources, data curation, software. Y. Liu: Resources. Z. Feng: Resources. Z. Wang: Resources, methodology. F. Zhou: Supervision, funding acquisition, writing–review and editing. J. Xing: Conceptualization, supervision, funding acquisition, project administration, writing–review and editing. Z. Zhang: Supervision, funding acquisition, project administration. F. Wang: Resources, supervision, project administration. X. Guo: Conceptualization, supervision, funding acquisition, methodology, project administration, writing–review and editing.

We thank the patients for their participation in the study. X. Guo was funded by the Autonomous Project of State Key Laboratory of Holistic Integrative Management of Gastrointestinal Cancers (CBSKL2022ZZ53) and the Key Research and Development Program of Shaanxi Province (2023-ZDLSF-46); J. Xing was funded by the Key Research and Development Program of Shaanxi Province (2022SF-231); Z. Zhang was funded by the Key Research and Development Program of Xuzhou (KC19172, KC21212).

Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).

1. Wallace DC. A mitochondrial paradigm of metabolic and degenerative diseases, aging, and cancer: a dawn for evolutionary medicine. Annu Rev Genet 2005;39:359–407.
2. Bock FJ, Tait SWG. Mitochondria as multifaceted regulators of cell death. Nat Rev Mol Cell Biol 2020;21:85–100.
3. Grady JP, Pickett SJ, Ng YS, Alston CL, Blakely EL, Hardy SA, et al. mtDNA heteroplasmy level and copy number indicate disease burden in m.3243A>G mitochondrial disease. EMBO Mol Med 2018;10:e8262.
4. Taylor RW, Turnbull DM. Mitochondrial DNA mutations in human disease. Nat Rev Genet 2005;6:389–402.
5. Wallace DC. Mitochondrial genetic medicine. Nat Genet 2018;50:1642–9.
6. Goncalves VF. Mitochondrial genetics. Adv Exp Med Biol 2019;1158:247–55.
7. Ma L, Fu Q, Xu B, Zhou H, Gao J, Shao X, et al. Breast cancer-associated mitochondrial DNA haplogroup promotes neoplastic growth via ROS-mediated AKT activation. Int J Cancer 2018;142:1786–96.
8. Cocos R, Schipor S, Badiu C, Raicu F. Mitochondrial DNA haplogroup K as a contributor to protection against thyroid cancer in a population from southeast Europe. Mitochondrion 2018;39:43–50.
9. Yuan Q, Su L, Wang T, Liu Y, Lu Z, Zhou K, et al. Mitochondrial DNA haplogroup M7 confers a reduced risk of colorectal cancer in a Han population from northern China. J Cell Mol Med 2021;25:7538–44.
10. Chen C, Ba Y, Li D, Du X, Lia X, Yang H, et al. Genetic variations of mitochondrial genome modify risk and prognosis of hepatocellular carcinoma patients. Clin Res Hepatol Gastroenterol 2017;41:378–85.
11. Zhao Q, Wang F, Chen YX, Chen S, Yao YC, Zeng ZL, et al. Comprehensive profiling of 1015 patients' exomes reveals genomic-clinical associations in colorectal cancer. Nat Commun 2022;13:2342.
12. Li YC, Ye WJ, Jiang CG, Zeng Z, Tian JY, Yang LQ, et al. River valleys shaped the maternal genetic landscape of Han Chinese. Mol Biol Evol 2019;36:1643–52.
13. Zhou K, Mo Q, Guo S, Liu Y, Yin C, Ji X, et al. A novel next-generation sequencing-based approach for concurrent detection of mitochondrial DNA copy number and mutation. J Mol Diagn 2020;22:1408–18.
14. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018;34:i884–i90.
15. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res 2013;41:D991–5.
16. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.
17. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003;34:267–73.
18. Yan Z, Yuan Q, He Y, Peng F, Liu Y, Zhang H, et al. Mitochondrial DNA haplogroup M7: a predictor of poor prognosis for colorectal cancer patients in Chinese population. Cancer Sci 2023;114:1056–66.
19. Fan L, Yao YG. MitoTool: a web server for the analysis and retrieval of human mitochondrial DNA sequence variations. Mitochondrion 2011;11:351–6.
20. van Oven M, Kayser M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum Mutat 2009;30:E386–94.
21. Blein S, Bardel C, Danjean V, McGuffog L, Healey S, Barrowdale D, et al. An original phylogenetic approach identified mitochondrial haplogroup T1a1 as inversely associated with breast cancer risk in BRCA2 mutation carriers. Breast Cancer Res 2015;17:61.
22. Wang H, Gu D, Yu M, Hu Y, Chen Z, Huo X, et al. Variation rs9929218 and risk of the colorectal cancer and adenomas: a meta-analysis. BMC Cancer 2021;21:190.
23. Samanic CM, Teer JK, Thompson ZJ, Creed JH, Fridley BL, Burt Nabors L, et al. Mitochondrial DNA sequence variation and risk of glioma. Mitochondrion 2022;63:32–6.
24. Zheng S, Qian P, Li F, Qian G, Wang C, Wu G, et al. Association of mitochondrial DNA variations with lung cancer risk in a Han Chinese population from southwestern China. PLoS One 2012;7:e31322.
25. Tian XP, Ma SY, Young KH, Ong CK, Liu YH, Li ZH, et al. A composite single-nucleotide polymorphism prediction signature for extranodal natural killer/T-cell lymphoma. Blood 2021;138:452–63.
26. Zhang B, Zhuang T, Lin Q, Yang B, Xu X, Xin G, et al. Patched1-ArhGAP36-PKA-Inversin axis determines the ciliary translocation of smoothened for Sonic Hedgehog pathway activation. Proc Natl Acad Sci U S A 2019;116:874–9.
27. Guo P, Chen Q, Peng K, Xie J, Liu J, Ren W, et al. Nuclear receptor coactivator SRC-1 promotes colorectal cancer progression through enhancing GLI2-mediated Hedgehog signaling. Oncogene 2022;41:2846–59.
28. Miete C, Solis GP, Koval A, Bruckner M, Katanaev VL, Behrens J, et al. Galphai2-induced conductin/axin2 condensates inhibit Wnt/beta-catenin signaling and suppress cancer growth. Nat Commun 2022;13:674.
29. Liu J, Xiao Q, Xiao J, Niu C, Li Y, Zhang X, et al. Wnt/beta-catenin signalling: function, biological mechanisms, and therapeutic opportunities. Signal Transduct Target Ther 2022;7:3.
30. Meurette O, Mehlen P. Notch signaling in the tumor microenvironment. Cancer Cell 2018;34:536–48.
31. Zhou B, Lin W, Long Y, Yang Y, Zhang H, Wu K, et al. Notch signaling pathway: architecture, disease, and therapeutics. Signal Transduct Target Ther 2022;7:95.