Lung cancer is the leading cause of cancer-related deaths worldwide. Cytologic examination is the current “gold standard” for lung cancer diagnosis; however, it has low sensitivity. Here, using whole-genome DNA methylation analysis, we identified a characteristic methylation signature of histone genes in lung cancer, which was validated in The Cancer Genome Atlas (TCGA) lung cancer cohort (n = 907) and further confirmed in 265 bronchoalveolar lavage fluid samples with a specificity and sensitivity of 96.7% and 87.0%, respectively. More importantly, HIST1H4F was universally hypermethylated in all 17 tumor types in TCGA datasets (n = 7,344), which was further validated in nine different types of cancer (n = 243). These results demonstrate that HIST1H4F can function as a universal-cancer-only methylation (UCOM) marker, which may aid in understanding general tumorigenesis and improve screening for early cancer diagnosis.

Significance:

These findings identify a new biomarker for cancer detection and show that hypermethylation of histone-related genes seems to persist across cancers.

Lung cancer is one of the most common malignant tumors and the leading cause of cancer-related deaths worldwide (1, 2). Early detection and surgery offer the best chance of survival, with a 5-year survival rate as high as 80% (3). However, most patients with lung cancer are diagnosed at an inoperable, advanced stage with metastasis and must undergo chemotherapy, radiotherapy, immunotherapy, or targeted therapy. The 5-year survival rate of patients with advanced-stage disease is below 10% (4, 5). Over the past decade, low-dose CT (LDCT) has been the most commonly used screening method for lung cancer and has been shown to improve early detection and reduce mortality (6). However, because of its low specificity, LDCT is far from satisfactory as a clinical screening tool, as are other currently used cancer biomarkers such as carcinoembryonic antigen (CEA), neuron-specific enolase, and CYFRA 21-1. Therefore, effective biomarkers for the early detection, diagnosis, prognosis, and monitoring of lung cancer are urgently needed (7).

Epigenetic and genetic abnormalities are hallmarks of lung cancer (8–10). Abnormal DNA methylation is the most common epigenetic alteration in lung carcinogenesis. Compared with DNA mutations, DNA methylation changes occur much earlier and are more stable, which makes them attractive for early tumor diagnosis; aberrant DNA methylation patterns can even be used to predict metastasis of liver cancer to the lung (11). Although many DNA methylation biomarkers have been reported, most remain exploratory and are rarely used in clinical practice. The sensitivity and specificity of current methylation markers are insufficient, with a high risk of false positives and false negatives (12, 13). Therefore, translating methylation markers into clinical applications is challenging, and new biomarkers for the early detection of cancer are urgently needed (14).

Histones are essential components of chromatin and are conserved across eukaryotic cells (15). There are five major types of histones: H1, H2A, H2B, H3, and H4. Histones H2A, H2B, H3, and H4 are known as the core histones, whereas histone H1 is known as the linker histone (16). Histones are divided into canonical replication-dependent histones, which are expressed during the S-phase of the cell cycle, and replication-independent histone variants, which are expressed throughout the cell cycle. Genes encoding canonical histones are intronless, lack a polyA tail at the 3′ end (having instead a stem-loop structure), and tend to be clustered in the genome. Genes encoding histone variants are usually not clustered and have introns and polyA tails (17, 18). In the human genome, histone genes mainly form histone cluster 1 (Chr6p21) and histone cluster 2 (Chr1q21; ref. 19); the remaining histone genes are scattered across the genome. Although histone modifications have been extensively studied in chromatin regulation, epigenetic variation in the histone genes themselves is rarely considered. It has been shown that histone gene cluster 1 adopts an abnormal higher-order chromatin organization in breast cancer (20). However, DNA methylation alterations at histone gene loci have not yet been systematically investigated, especially in cancer development.

Here, through genome-wide DNA methylation analysis with a new analysis strategy, we found that many histone gene loci are abnormally hypermethylated in lung cancer, which prompted further investigation. We demonstrate that methylation of histone genes can be used as a biomarker for early detection in bronchoalveolar lavage fluid (BALF) samples. Furthermore, histone gene loci are abnormally hypermethylated not only in lung cancer but also specifically in various other tumors. In particular, the HIST1H4F gene is abnormally hypermethylated in 17 types of cancer and could act as a potential universal-cancer-only methylation (UCOM) marker. We speculate that HIST1H4F methylation may be of great significance for early diagnosis, especially in clinical cancer screening.

Whole genome bisulfite sequencing data analysis

Whole genome bisulfite sequencing (WGBS) datasets were downloaded from the ENCODE database (https://www.encodeproject.org/) and the SRA database (https://www.ncbi.nlm.nih.gov/sra); accession numbers are summarized in Supplementary Table S1. DNA methylation levels were calculated using BSMAP software (21) as described previously (11), with the hg19 human genome assembly and University of California, Santa Cruz (UCSC) reference gene annotations. Specifically, for each CpG site, the reads supporting methylation and the reads supporting unmethylation were counted, and the methylation value was calculated as the number of methylation-supporting reads divided by the total number of reads supporting either methylation or unmethylation. Only CpG sites covered by more than five reads and detected in all seven WGBS datasets were used for subsequent analysis.
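As a minimal sketch of the per-CpG calculation described above (not BSMAP itself), the Python below computes methylated / (methylated + unmethylated) reads and keeps only sites with more than five informative reads in every sample. The input layout, a dictionary keyed by (chromosome, position) holding (methylated, unmethylated) read counts, is a hypothetical format chosen for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical key format: (chromosome, position) identifies one CpG site.
CpG = Tuple[str, int]
# Per-sample read counts: CpG -> (reads supporting methylation, reads supporting unmethylation).
ReadCounts = Dict[CpG, Tuple[int, int]]


def methylation_level(meth_reads: int, unmeth_reads: int) -> float:
    """Methylation-supporting reads divided by all informative reads at one CpG."""
    return meth_reads / (meth_reads + unmeth_reads)


def shared_cpg_levels(samples: List[ReadCounts], min_depth: int = 5) -> Dict[CpG, List[float]]:
    """Per-sample methylation levels for CpGs with depth > min_depth in every sample."""
    shared: Optional[Set[CpG]] = None
    for counts in samples:
        # Sites covered by more than `min_depth` informative reads in this sample.
        covered = {site for site, (m, u) in counts.items() if m + u > min_depth}
        shared = covered if shared is None else shared & covered
    if not shared:
        return {}
    # One methylation level per sample, in the same order as `samples`.
    return {site: [methylation_level(*counts[site]) for counts in samples] for site in sorted(shared)}
```

With seven datasets, `samples` would hold seven such dictionaries; a site that falls to five or fewer reads in any one of them is dropped, which mirrors the "detected in all seven WGBS datasets" requirement.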

Definition of differentially methylated sites, differentially methylated regions, and differentially methylated genes

The methylation levels of four normal lung cell samples (NC) and three lung cancer cell samples (CC) were calculated. For each CpG site, we calculated the methylation difference for all 12 CC – NC pairs (CCi – NCj, where i = 1, 2, or 3 and j = 1, 2, 3, or 4). CpG sites with all 12 (CCi – NCj) ≥ 50% were defined as cancer cell-differentially methylated sites (CC-DMS). Similarly, CpG sites with all 12 (NCj – CCi) ≥ 50% were defined as normal cell-differentially methylated sites (NC-DMS). In addition, CpG sites with all 12 |CCi – NCj| ≤ 20% were defined as non-differentially methylated sites (NO-DMS). A differentially methylated region (DMR) was defined as at least three adjacent DMS within a 100-bp genomic window. Genes overlapping with any DMR were defined as differentially methylated genes (DMG).
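The site and region definitions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: methylation levels are given as fractions (0–1), `call_dmrs` is run separately per chromosome and per DMS type, and "at least three DMS within a 100-bp window" is read as any run of three or more DMS spanning less than 100 bp, with overlapping qualifying runs merged into one region.

```python
from itertools import product
from typing import List, Sequence, Tuple


def classify_site(cc: Sequence[float], nc: Sequence[float]) -> str:
    """Label one CpG from cancer (cc) and normal (nc) methylation fractions."""
    diffs = [c - n for c, n in product(cc, nc)]  # every CCi - NCj pair (3 x 4 = 12 here)
    if all(d >= 0.5 for d in diffs):
        return "CC-DMS"  # cancer higher by >= 50% in every pair
    if all(d <= -0.5 for d in diffs):
        return "NC-DMS"  # normal higher by >= 50% in every pair
    if all(abs(d) <= 0.2 for d in diffs):
        return "NO-DMS"  # no pair differs by more than 20%
    return "unclassified"


def call_dmrs(positions: Sequence[int], window: int = 100, min_sites: int = 3) -> List[Tuple[int, int]]:
    """Merge DMS positions (one chromosome, one DMS type) into (start, end) regions."""
    pos = sorted(positions)
    regions: List[Tuple[int, int]] = []
    j = 0
    for i in range(len(pos)):
        j = max(j, i)
        # Extend j to the last DMS that still fits in a `window`-bp window starting at pos[i].
        while j + 1 < len(pos) and pos[j + 1] - pos[i] < window:
            j += 1
        if j - i + 1 >= min_sites:
            start, end = pos[i], pos[j]
            if regions and start <= regions[-1][1]:
                # Overlapping qualifying windows are merged into a single DMR.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions
```

For example, DMS at positions 10, 40, and 80 form one region, whereas two isolated DMS at 300 and 320 do not meet the three-site minimum and are discarded.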

The Cancer Genome Atlas DNA methylation data analysis

Illumina 450K methylation array level 3 data from The Cancer Genome Atlas (TCGA) were downloaded from the UCSC Xena browser (https://xenabrowser.net/). For each histone gene, only probes within the gene-body region (listed in Supplementary Table S2) were selected, and probes with “NA” values were excluded. Absolute methylation values were calculated from the 450K β values as methylation value = (β value + 0.5) × 100%. For each gene, the final methylation value was the average over all selected CpG sites. The TCGA samples used and their HIST1H4F methylation levels are listed in Supplementary Table S3.
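A minimal Python sketch of the gene-level calculation is shown below. It applies the conversion exactly as written in the text, (value + 0.5) × 100%, which implies (an assumption here) that the downloaded values are β values offset by −0.5. The probe IDs in the usage are hypothetical placeholders, not entries from Supplementary Table S2.

```python
import math
from statistics import mean
from typing import Dict, Iterable, Optional


def gene_methylation(probe_values: Dict[str, Optional[float]],
                     gene_body_probes: Iterable[str]) -> Optional[float]:
    """Average methylation (%) of one gene over its gene-body 450K probes.

    Uses the text's conversion (value + 0.5) x 100%; assumes the downloaded
    values are beta values shifted to be centred on zero.
    """
    converted = []
    for probe in gene_body_probes:
        value = probe_values.get(probe)
        if value is None or math.isnan(value):  # "NA" or absent probes are excluded
            continue
        converted.append((value + 0.5) * 100.0)
    # No usable probe means no gene-level value for this sample.
    return mean(converted) if converted else None


# Usage with hypothetical probe IDs for a single sample:
values = {"cg_a": 0.25, "cg_b": -0.25, "cg_c": None}
print(gene_methylation(values, ["cg_a", "cg_b", "cg_c"]))  # 50.0
```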

Clinical samples

We collected 243 primary tissue samples and 265 BALF samples from Shanghai Chest Hospital and Zhongshan Hospital of Fudan University. Primary tissue samples included 25 lung cancer and 25 paired para-cancer control samples, 12 colorectal cancer and 12 paired para-cancer control samples, 10 esophageal cancer and 12 paired para-cancer control samples, 20 liver cancer and 23 para-cancer control samples, nine pancreatic cancer and nine paired para-cancer control samples, 10 cervical cancer and 10 control samples, 10 gastric cancer and 10 para-cancer control samples, 14 breast cancer and 14 paired para-cancer control samples, and 10 head and neck cancer and 10 paired para-cancer control samples. Clinical characteristics of these samples are summarized in Supplementary Table S4. BALF samples comprised a benign lung disease (BLD) control group and a lung cancer group. The BLD control group contained 59 samples from patients with pneumonia, emphysema, tuberculosis, and other benign lung diseases. The lung cancer group included 92 lung squamous cell carcinoma (LUSC) samples, 70 lung adenocarcinoma samples, and 44 small-cell lung carcinoma (SCLC) samples. BALF samples were randomly assigned to a training set and a validation set. All patients provided written informed consent before their samples were collected. Institutional review board approval for research on human subjects was obtained from the hospital.

DNA extraction and bisulfite-PCR pyrosequencing

Genomic DNA from cultured cell lines and primary tissue samples was extracted with phenol-chloroform. Genomic DNA from BALF samples was extracted with the Qiagen DNA Extraction Kit (Qiagen, catalog no. 51404). Next, 20–200 ng of genomic DNA was bisulfite treated (ZYMO Research, catalog no. D5006), and the recovered bisulfite-treated DNA was used as the template for subsequent PCR. We assayed 11 CpG sites in the HIST1H4F gene (chr6:26,240,743–26,240,800) and eight CpG sites in the HIST1H4I gene (chr6:27,107,185–27,107,239). The genomic sequences and primers designed for the target genes are listed in Supplementary Table S5. Two rounds of seminested PCR were performed to produce single-band, biotin-labeled PCR products. The outer forward primer and the reverse primer were used for the first round of amplification, and the inner forward primer and the same reverse primer were used for the second round. Both rounds used the same program: initial denaturation at 98°C for 30 seconds; 30 cycles of 98°C for 10 seconds, 58°C for 30 seconds, and 72°C for 30 seconds; and a final extension at 72°C for 3 minutes. Pyrosequencing was performed on a PyroMark Q96 ID instrument (Qiagen). For each target gene, the final methylation value was the average methylation of all CpG sites detected by pyrosequencing.

Cell culture

The human lung cancer cell line A549, human lung fibroblast cell line MRC5, and human hepatocarcinoma cell line HepG2 were kindly provided by the Stem Cell Bank, Chinese Academy of Sciences. All cell lines were authenticated using the PowerPlex 16 System (Promega) and tested negative for Mycoplasma by qPCR. A549, MRC5, and HepG2 cells were cultured in DMEM supplemented with 10% v/v FBS and 1% v/v antibiotics at 37°C in a humidified atmosphere of 5% CO₂. For passaging, cells were washed once with PBS, dissociated with 1 mL of 0.25% trypsin, neutralized with 1 mL of DMEM, and split equally into two 10-cm dishes.

Genome-wide WGBS data analysis pipeline and validation of identified DMRs in the TCGA cohort

To screen genome-wide for DNA methylation biomarkers for the early diagnosis of lung cancer, we collected three WGBS datasets of lung cancer cells and four WGBS datasets of cells derived from normal lung tissues as controls (Supplementary Table S1). To identify lung cancer biomarkers efficiently from these WGBS datasets, we developed a new data analysis strategy (Fig. 1A).

  • (i) We performed a genome-wide methylation analysis for each WGBS sample and obtained all CpG sites covered by more than five reads. This yielded at least 30 × 10⁶ CpG sites per sample, covering at least 55.7% of the genome (Supplementary Table S1). To robustly analyze differences between normal and cancer samples at single-nucleotide resolution, only CpG sites detected in all seven samples were selected for further analysis. In total, 19,461,312 CpG sites were selected, covering 34.5% of all possible CpG sites in the human genome. This coverage is much higher than that of reduced representation bisulfite sequencing, estimated at 1%–3%, and that of the Illumina 450K methylation array, which covers 485,455 CpG sites, approximately 2% of all possible sites (22, 23). The average methylation levels showed a trend toward hypomethylation in the cancer samples compared with the normal samples (Fig. 1B; P = 0.057), consistent with previous reports that cancer genomes are globally hypomethylated. Meanwhile, the 19,461,312 CpG sites were distributed throughout the genome, including intergenic, intronic, exonic, and promoter regions (Fig. 1C). These results indicate that our approach is applicable genome-wide with minor sequence bias (Supplementary Fig. S1A).

  • (ii) Based on the 19,461,312 CpG sites, by calculating the methylation differences between CCi and NCj, we found 24,257 CC-DMS, 442,233 NC-DMS, and 4,456,347 NO-DMS, which accounted for 0.12%, 2.27%, and 22.9% of all 19,461,312 CpG sites, respectively. Compared with the background distribution of all 19,461,312 CpG sites (Fig. 1C), CC-DMS were clearly enriched in promoter and exonic regions (hypergeometric test, P < 1e-5; Fig. 1D), NC-DMS were enriched in intergenic regions (hypergeometric test, P < 1e-5; Fig. 1E), and NO-DMS were mostly enriched in intronic regions (hypergeometric test, P < 1e-5; Fig. 1F). In addition, 13,932 of the 24,257 CC-DMS (57.4%) were located in CpG islands, whereas only 3,518 of the 442,233 NC-DMS (0.8%) were, indicating that tumor-associated hypermethylation usually occurs in cis-regulatory elements. Among NO-DMS, hypomethylated CpG sites (methylation level ≤ 20%) were mainly distributed in promoter regions (Supplementary Fig. S1B), whereas hypermethylated CpG sites (methylation level ≥ 80%) were mainly distributed in intronic regions (Supplementary Fig. S1C). These results reveal that cancer cells are globally hypomethylated and locally hypermethylated, with the locally hypermethylated regions located mainly in promoter and exonic regions.

  • (iii) Similar to genetic linkage, DNA methylation within a small genomic region tends to be consistent (24). On the basis of this principle, the regional methylation behavior of adjacent CpG sites is more reliable than that of single CpG sites; for example, DMRs and methylation haplotypes have been widely used in DNA methylation analysis. Therefore, we further defined a DMR as at least three DMS within a 100-bp genomic region. From the 24,257 CC-DMS, we identified 2,408 CC-DMRs; from the 442,233 NC-DMS, 36,393 NC-DMRs; and from the 4,456,347 NO-DMS, 435,249 NO-DMRs. We then analyzed the genes overlapping these DMRs. There were 958 CC-DMR–related genes and 1,925 NC-DMR–related genes, which we called CC-DMG (cancer cell-differentially methylated genes; Supplementary Table S6) and NC-DMG (normal cell-differentially methylated genes; Supplementary Table S7). We calculated the methylation levels of CC-DMG and NC-DMG in WGBS and TCGA data (Fig. 1G and H; Supplementary Tables S6 and S7). Kyoto Encyclopedia of Genes and Genomes pathway analysis showed that CC-DMG were mainly enriched in tumor-associated signaling pathways, such as the Hippo signaling pathway and transcriptional misregulation in cancer. NC-DMG were enriched in olfactory transduction, with less link to tumor-related signaling pathways. NO-DMG were mainly enriched in pathways related to basic cellular functions (Supplementary Fig. S1D). Interestingly, both CC-DMG and NC-DMG were enriched in the neuroactive ligand–receptor interaction signaling pathway. In particular, some adrenergic signaling–related genes, such as ADRA1A, ADRA2A, ADRA2C, and ADRBK1, appeared in the CC-DMG list, whereas some cholinergic signaling–related genes, such as CHRM2, CHRM3, and CHRM5, were found in the NC-DMG list.
The variation in DNA methylation of nerve-related genes suggests that neuroregulation may play an important role in the genesis and development of lung cancer. This is supported by evidence from several groups showing that cancer development in a variety of tissues is controlled by an assortment of nerve-mediated signals, including neurotransmitters and other molecules (25–27), and suggests that epigenetic regulation of neuron-related genes merits further study in cancer development. As expected, many well-known lung cancer methylation biomarkers reported in the literature are on our CC-DMG list, for example, SHOX2, POU4F2, BCAT1, HOXA9, and PTGDR (28–32). These results further support our analysis strategy.

  • (iv) To further confirm the validity of the WGBS analysis, we downloaded the Illumina 450K methylation array data of the TCGA lung cancer cohort. The Illumina 450K methylation array contains 485,455 CpG probes. The TCGA lung cancer cohort contained a total of 907 samples, including 75 para-cancer normal control samples and 832 lung cancer samples (Supplementary Table S8). To verify our WGBS analysis, we selected the CpG sites covered by both the 450K probes and the DMS/DMR/DMG sets (Fig. 1A; Supplementary Fig. S1E). Of the 485,455 450K probes, 845 and 1,662 overlapped with CC-DMS and NC-DMS, respectively. In the TCGA datasets, these CC-DMS and NC-DMS were clearly hypermethylated and hypomethylated, respectively, in cancer compared with normal samples. Likewise, 624 and 840 probes overlapped with CC-DMR and NC-DMR, respectively, and the CC-DMR and NC-DMR obtained from WGBS were also verified by the TCGA datasets (Supplementary Fig. S1F–S1I). For DMG, 401 and 377 probes overlapped with CC-DMG and NC-DMG, respectively, and their DNA methylation status was consistently supported by the TCGA datasets (Fig. 1H). Taken together, our WGBS results were consistent with the TCGA lung cancer 450K methylation array data, further supporting the validity of our analysis approach.
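Step (ii) reports hypergeometric enrichment of DMS in genomic annotation classes. The Python sketch below computes the one-sided upper-tail probability P(X ≥ k) in log space, so genome-scale counts do not overflow. How the counts map onto the test (M = all analyzed CpG sites, n = analyzed sites in the annotation class, N = number of DMS, k = DMS in that class) is our assumption about the setup; the per-class counts behind Fig. 1D–F are not given in the text.

```python
import math


def _log_comb(n: int, k: int) -> float:
    """log C(n, k) via log-gamma, safe for genome-scale n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def enrichment_p_value(k: int, M: int, n: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    M: all analyzed CpG sites; n: analyzed sites in the annotation class
    (e.g. promoters); N: number of DMS; k: DMS falling in that class.
    """
    lo = max(0, N - (M - n))  # smallest possible overlap
    hi = min(n, N)            # largest possible overlap
    if k > hi:
        return 0.0
    start = max(k, lo)
    log_total = _log_comb(M, N)
    logs = [_log_comb(n, x) + _log_comb(M - n, N - x) - log_total
            for x in range(start, hi + 1)]
    peak = max(logs)  # log-sum-exp keeps the tail sum numerically stable
    return min(1.0, math.exp(peak) * sum(math.exp(v - peak) for v in logs))
```

With M = 19,461,312 and N = 24,257 (the CC-DMS count), the loop runs over at most 24,258 terms, which is fast in plain Python.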

Figure 1.

Systematic analysis of WGBS data and validation by TCGA datasets. A, Outline of WGBS data analysis and TCGA data validation. Four NC-WGBS datasets and three CC-WGBS datasets were collected, and CpG sites detected in all seven samples were selected for subsequent analysis. DMS were defined by the methylation difference between CCi and NCj, DMRs were defined as at least three adjacent DMS within a 100-bp region, and DMG were defined as DMR-embedded genes. CCi represents any of the cancer cell samples, and NCj represents any of the normal cell samples. B, Average methylation level of each normal and cancer sample in WGBS data, showing a trend toward global hypomethylation in cancer genomes (Wilcoxon test, P = 0.057). C–F, Genomic distribution of all detected CpG sites, CC-DMS, NC-DMS, and NO-DMS; the promoter region was defined as TSS ± 1 kb. G, Heatmap of CC-DMG and NC-DMG methylation from WGBS data. Each row represents one gene. H, Validation of CC-DMG and NC-DMG methylation by TCGA datasets. Blue dots, CC-DMG; red dots, NC-DMG. The x-axis represents the average methylation of normal samples in TCGA data, and the y-axis represents the average methylation of cancer samples in TCGA data. TSS, transcriptional start site.

Abnormal hypermethylation signature of histone genes in lung cancer

In addition to already established biomarkers such as SHOX2 and POU4F2, our CC-DMG list contained many genes not previously reported. More interestingly, several histone genes, such as HIST1H3C, HIST1H4F, and HIST1H4I, appeared on the CC-DMG list, which called for further investigation.

Histone genes are essential, conserved housekeeping genes that are stably expressed in almost all eukaryotic cells. Because of the important function of histones, each histone protein is encoded by multiple histone genes (19). In total, 85 histone genes have been found in the human genome, including 68 canonical histone genes and 17 histone variant genes. Canonical histone genes include six H1 genes, 17 H2A genes, 18 H2B genes, 13 H3 genes, and 14 H4 genes. Variant histone genes include four H1 variants, seven H2A variants, two H2B variants, and four H3 variants. Histone modifications have been widely investigated in the epigenetics field (33, 34). However, DNA methylation of the histone gene family has not been well described in the literature. The 85 histone genes are summarized in Supplementary Table S9.

We next analyzed DNA methylation of the whole histone gene family in the WGBS data (Fig. 2A). Four histone genes (HIST2H2AA4, HIST2H3C, HIST2H4A, and H2BFS) were excluded from the analysis because they were not detected in all WGBS datasets. According to their DNA methylation signatures in normal and cancer samples, the remaining histone genes can be divided into seven groups. Group 1 histone genes are poorly methylated in both normal and cancer cells, whereas group 2 histone genes are highly methylated in both. Group 3 histone genes show no consistent methylation pattern between normal and cancer samples, and group 4, comprising 14 histone genes (HIST1H4I, HIST1H2BM, HIST1H3C, HIST1H4F, HIST1H2BB, HIST1H2BE, HIST1H1A, HIST1H2BI, HIST1H3G, HIST1H2AD, HIST1H2BE, HIST1H3J, HIST1H2BH, and HIST1H4D), is hypermethylated in all lung cancer samples (Fig. 2B; Supplementary Fig. S2A). To confirm this finding, we reanalyzed the methylation of these hypermethylated histone genes using the Illumina 450K methylation arrays of the TCGA lung cancer cohort (n = 907); nine of the 14 genes (HIST1H4I, HIST1H4F, HIST1H3C, HIST1H2BE, HIST1H2BM, HIST1H3J, HIST1H2BB, HIST1H1A, and HIST1H2BI) were significantly hypermethylated in both lung adenocarcinoma and LUSC (Fig. 2C). In addition, DNA methylation of histone genes may help classify the three main types of lung cancer: four histone genes (group 5: HIST1H2AG, HIST3H2A, HIST3H2BB, and HIST1H3F) were specifically hypermethylated in lung adenocarcinoma (Fig. 2D), four histone genes (group 6: HIST1H4A, HIST1H3A, HIST1H2AL, and HIST1H3I) were methylated only in LUSC samples (Fig. 2E), and another six histone genes (group 7: HIST1H2BL, HIST2H3D, HIST1H2AJ, H2AFJ, HIST1H2AI, and HIST1H1D) were highly methylated in SCLC (Fig. 2F).
More importantly, these subtype-specific hypermethylated genes were also confirmed in the TCGA datasets (Supplementary Fig. S2B). These results suggest that methylation of histone gene loci may be used to distinguish lung cancer subtypes.

Figure 2.

Histone genes are hypermethylated in lung cancer. A, The histone gene family is divided into seven groups according to DNA methylation patterns in normal and cancer cells in WGBS data. B, The 14 histone genes in group 4 were hypermethylated in lung cancer cells in WGBS data. C, Validation of the 14 group 4 histone genes in the TCGA lung cancer cohort; nine of the 14 were hypermethylated in both lung adenocarcinoma (LUAD) and LUSC. Box and whiskers plots: the box represents the upper quartile, lower quartile, and median; whiskers represent the minimum to maximum. NS, not significant; ***, P < 0.001; ****, P < 0.0001. P values were calculated using the two-tailed nonparametric Mann–Whitney test in GraphPad Prism 7.0 software. D, Four histone genes in group 5 were specifically hypermethylated in lung adenocarcinoma samples in WGBS data. E, Four histone genes in group 6 were specifically hypermethylated in LUSC samples in WGBS data. F, Six histone genes in group 7 were specifically hypermethylated in SCLC samples in WGBS data. G, The maximum methylation of HIST1H4F and HIST1H4I (Max-IF) was significantly higher in primary lung cancer tissues. Error bars represent the upper quartile, lower quartile, and median. The P value was calculated using the two-tailed, paired, nonparametric Wilcoxon matched-pairs signed rank test in GraphPad Prism 7.0 software. H, ROC analysis of Max-IF in primary lung cancer tissue: AUC, 0.98 (95% confidence interval, 0.95–1.00; P < 0.0001), with a specificity of 88.0% and a sensitivity of 96.0%.

We further performed ROC analysis of the 14 hypermethylated histone genes using the TCGA datasets. HIST1H4F and HIST1H4I showed much higher specificity and sensitivity than the others: 97.3% and 82.7%, respectively, for HIST1H4F and 96.0% and 87.5%, respectively, for HIST1H4I (Supplementary Table S10). Moreover, both genes performed well in stage I lung cancer, and ROC analysis revealed similar AUCs across stages, suggesting that methylation of HIST1H4F and HIST1H4I could serve as a biomarker for early lung cancer diagnosis (Supplementary Fig. S2C and S2D; Supplementary Table S10). Furthermore, ROC analysis showed that the maximum methylation of HIST1H4F and HIST1H4I (Max-IF), that is, the higher of the two methylation values in each sample, performed better than either gene alone, with an AUC of 0.95, a specificity of 96.0%, and a sensitivity of 92.9% (Supplementary Fig. S2E; Supplementary Table S10).
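To make the Max-IF score and its AUC concrete, the Python sketch below takes the per-sample maximum of the two genes' methylation values and computes the empirical ROC AUC as the Mann–Whitney probability that a cancer sample outscores a control (ties count one half). This is an illustration, not the GraphPad Prism procedure used in the study, and the values in the usage line are made up.

```python
from typing import List, Sequence


def max_if(hist1h4f: Sequence[float], hist1h4i: Sequence[float]) -> List[float]:
    """Per-sample Max-IF: the higher of the HIST1H4F and HIST1H4I methylation values."""
    if len(hist1h4f) != len(hist1h4i):
        raise ValueError("both genes must be measured in the same samples")
    return [max(f, i) for f, i in zip(hist1h4f, hist1h4i)]


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Empirical ROC AUC: P(case score > control score), ties counted as 1/2."""
    score = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(controls))


# Made-up methylation percentages for illustration only.
cancer = max_if([30.0, 4.0, 8.0], [2.0, 12.0, 1.0])  # [30.0, 12.0, 8.0]
normal = max_if([2.0, 5.0, 3.0], [1.0, 0.5, 8.0])    # [2.0, 5.0, 8.0]
print(auc(cancer, normal))
```

Taking the maximum means a sample scores high if either gene is hypermethylated, which is one plausible reason the combined score can outperform each gene alone.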

To further confirm our results, we used primary lung cancer tissue samples for verification: 25 lung cancer tissue samples and paired para-cancer tissue samples as controls (Supplementary Table S11). Methylation of HIST1H4F and HIST1H4I was measured by bisulfite-PCR pyrosequencing. HIST1H4F and HIST1H4I were significantly hypermethylated in lung cancer, and ROC analysis showed very high sensitivity and specificity for each gene (Supplementary Fig. S2F). Max-IF was significantly higher in lung cancer samples, with an AUC of 0.98, a sensitivity of 96%, and a specificity of 88% (Fig. 2G and H).

Histone gene methylation for lung cancer diagnosis in BALF samples

BALF is of great significance in the early diagnosis of lung cancer (35, 36). We therefore evaluated whether histone gene methylation detected in BALF samples could be used to diagnose lung cancer. We collected 265 BALF samples, consisting of 59 BLD control samples and 206 lung cancer samples. The BLD control group included patients with pneumonia, emphysema, tuberculosis, and other benign lung diseases. The lung cancer group included 92 LUSC, 70 lung adenocarcinoma, and 44 SCLC samples. The samples were randomly divided into a training set (n = 133) and a validation set (n = 132; Table 1).

Table 1.

Clinical information of the BALF training set and validation set

| BALF | Training set | | | | | Validation set | | | | |
| Characteristic | BLD (n = 30) | LUAD (n = 38) | LUSC (n = 44) | SCLC (n = 21) | Total lung cancer (n = 103) | BLD (n = 29) | LUAD (n = 32) | LUSC (n = 48) | SCLC (n = 23) | Total lung cancer (n = 103) |
| Age (years), mean ± SEM | 55.8 ± 2.1 | 62.0 ± 1.5 | 64.1 ± 1.4 | 56.6 ± 1.9 | 61.8 ± 0.9 | 53.5 ± 2.3 | 60.2 ± 1.6 | 61.2 ± 1.4 | 59.7 ± 1.7 | 60.7 ± 0.9 |
| Age (years), range | 34–72 | 44–76 | 31–79 | 44–76 | 31–79 | 35–80 | 43–80 | 39–80 | 46–76 | 39–80 |
| Gender, female, n (%) | 12 (40.0) | 12 (31.6) | 3 (6.8) | 3 (14.3) | 18 (17.5) | 12 (41.4) | 10 (31.2) | 4 (8.3) | 5 (21.7) | 19 (18.4) |
| Gender, male, n (%) | 18 (60.0) | 26 (68.4) | 41 (93.2) | 18 (85.7) | 85 (82.5) | 17 (58.6) | 22 (68.8) | 44 (91.7) | 18 (78.3) | 84 (81.6) |
| Stage I, n (%) | — | 10 (26.3) | 13 (29.5) | 8 (38.1) | 31 (30.1) | — | 13 (40.6) | 14 (29.2) | 9 (39.1) | 36 (35.0) |
| Stage II, n (%) | — | 11 (28.9) | 12 (27.3) | 3 (14.3) | 26 (25.2) | — | 7 (21.9) | 16 (33.3) | 4 (17.4) | 27 (26.2) |
| Stage III, n (%) | — | 10 (26.3) | 13 (29.5) | 7 (33.3) | 30 (29.1) | — | 5 (15.6) | 13 (27.1) | 6 (26.1) | 24 (23.3) |
| Stage IV, n (%) | — | 7 (18.4) | 6 (13.6) | 3 (14.3) | 16 (15.5) | — | 7 (21.9) | 5 (10.4) | 4 (17.4) | 16 (15.5) |

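Abbreviations: BLD, benign lung disease; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; SCLC, small-cell lung carcinoma. Total lung cancer comprises LUAD, LUSC, and SCLC.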

A bisulfite-PCR pyrosequencing assay was used to detect HIST1H4F and HIST1H4I methylation. To assess the reproducibility of pyrosequencing, we performed three technical replicates of bisulfite-PCR pyrosequencing on 30 BALF samples, including 10 with low methylation (0% ≤ methylation ≤ 5%), 10 with intermediate methylation (5% < methylation < 20%), and 10 with high methylation (20% ≤ methylation ≤ 100%). Replicates agreed closely in all three groups, with methylation variation within 5% (Supplementary Fig. S3A–S3C). In both the training set and the validation set, HIST1H4F and HIST1H4I were significantly hypermethylated in each type of lung cancer (Supplementary Fig. S4A and S4B), and Max-IF was also significantly higher in lung adenocarcinoma, LUSC, SCLC, and all lung cancer samples (Fig. 3A). To assess the diagnostic potential of HIST1H4I, HIST1H4F, or Max-IF, we first performed ROC analysis in the training set, where the area under the ROC curve (AUC) was calculated and a cutoff value was determined; sensitivity and specificity were then calculated on the basis of this cutoff. To estimate diagnostic accuracy robustly, we then performed an independent evaluation in the validation set, calculating sensitivity and specificity with the same cutoff (Fig. 3B and C; Supplementary Table S10). For LUSC and SCLC, Max-IF achieved AUCs of 0.94 and 0.97, respectively (Fig. 3B). For LUSC, with a methylation cutoff of 6.05%, the specificity and sensitivity of Max-IF were 96.7% and 86.4% in the training set and 96.5% and 85.4% in the validation set. For SCLC, with a methylation cutoff of 7.75%, the specificity and sensitivity of Max-IF were 96.7% and 95.5% in the training set and 96.5% and 95.7% in the validation set (Fig. 3C; Supplementary Table S10).
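The train-then-validate procedure can be sketched as below: a cutoff is fixed on the training set and then reused unchanged on the validation set. The text does not state how the cutoff was chosen, so maximizing Youden's J (sensitivity + specificity − 1) over midpoints between adjacent observed values, with "methylation > cutoff" counted as positive, is an assumption here, as are all numbers in the usage.

```python
from typing import Sequence, Tuple


def sens_spec(cases: Sequence[float], controls: Sequence[float], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'methylation > cutoff' is called positive."""
    sensitivity = sum(value > cutoff for value in cases) / len(cases)
    specificity = sum(value <= cutoff for value in controls) / len(controls)
    return sensitivity, specificity


def youden_cutoff(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Training-set cutoff maximizing Youden's J = sensitivity + specificity - 1 (assumed rule)."""
    values = sorted(set(cases) | set(controls))
    if len(values) < 2:
        raise ValueError("need at least two distinct values to place a cutoff")
    # Candidate cutoffs sit midway between adjacent observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda c: sum(sens_spec(cases, controls, c)) - 1)


# Hypothetical Max-IF values (%): fix the cutoff on training data, then reuse it.
cutoff = youden_cutoff([10.0, 12.0, 7.0, 3.0], [1.0, 2.0, 4.0, 5.0])
print(cutoff, sens_spec([9.0, 6.5, 5.0], [0.5, 6.2, 3.0], cutoff))
```

Because the validation set never influences the cutoff, its sensitivity and specificity give a less optimistic estimate of diagnostic accuracy than the training-set figures.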
Compared with LUSC and SCLC, which tend to be centrally located, lung adenocarcinoma usually arises in the lung periphery (37). BALF samples from patients with LUSC or SCLC are therefore more likely to contain cancer cells than those from patients with lung adenocarcinoma (38), so sensitivity in lung adenocarcinoma BALF samples would be expected to be lower. As expected, in lung adenocarcinoma the specificity and sensitivity of Max-IF were 96.7% and 60.5% in the training set (cutoff = 6.3%; AUC = 0.84) and 96.5% and 65.6% in the validation set. To improve detection sensitivity in lung adenocarcinoma, we combined Max-IF with serum CEA. CEA alone has very low sensitivity for lung cancer diagnosis (39); in our study, the sensitivity of CEA (cutoff = 5 ng/mL) was 27.3% in the training set and 30.7% in the validation set. However, CEA is considerably more sensitive in lung adenocarcinoma than in LUSC or SCLC: in the training set, its sensitivity was 47.1%, 16.2%, and 14.3% for lung adenocarcinoma, LUSC, and SCLC, respectively, and in the validation set it was 50%, 22.2%, and 26.1%, respectively. We therefore combined Max-IF with serum CEA for lung adenocarcinoma diagnosis, calling a sample positive if either test was positive. With this rule, sensitivity increased from 60.5% to 77.8% in the training set and from 65.6% to 81.5% in the validation set (Fig. 3D). For all cancer samples, the specificity and sensitivity were 96.7% and 86.0% in the training set and 96.5% and 87.0% in the validation set, indicating that histone gene methylation is a highly accurate biomarker for lung cancer diagnosis (Fig. 3E).
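The "positive if either test is positive" rule for Max-IF plus CEA can be sketched as below. The function names and toy values are hypothetical; treating a value exactly at the cutoff as positive (`>=`) is an assumption, since the text gives only the cutoff values (6.3% for Max-IF in lung adenocarcinoma, 5 ng/mL for CEA).

```python
import math
from typing import Sequence, Tuple


def or_rule_positive(max_if_pct: float, cea_ng_ml: float,
                     max_if_cutoff: float, cea_cutoff: float = 5.0) -> bool:
    """Positive if either marker reaches its cutoff (>= is an assumed convention)."""
    return max_if_pct >= max_if_cutoff or cea_ng_ml >= cea_cutoff


def sensitivity(cases: Sequence[Tuple[float, float]],
                max_if_cutoff: float, cea_cutoff: float = 5.0) -> float:
    """Fraction of cancer cases called positive under the either-marker rule."""
    hits = sum(or_rule_positive(m, c, max_if_cutoff, cea_cutoff) for m, c in cases)
    return hits / len(cases)


# Hypothetical lung adenocarcinoma cases: (Max-IF %, serum CEA ng/mL).
cases = [(10.2, 2.1), (4.0, 8.3), (3.1, 1.0), (7.5, 12.0)]

# An infinite cutoff switches a marker off, so each test can be scored alone.
print(sensitivity(cases, 6.3, cea_cutoff=math.inf))  # Max-IF alone: 0.5
print(sensitivity(cases, math.inf))                   # CEA alone: 0.5
print(sensitivity(cases, 6.3))                        # combined: 0.75
```

Because the union rule can only add positive calls, combined sensitivity can never fall below that of either marker alone; by the same logic, specificity in the control group can only stay the same or decrease, so it needs to be checked separately.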

Figure 3.

HIST1H4F and HIST1H4I as lung cancer biomarkers in BALF samples. A, The maximum methylation level of HIST1H4F and HIST1H4I (Max-IF) was significantly higher in lung adenocarcinoma (LUAD), LUSC, SCLC, and total lung cancer in the BALF training set (left) and validation set (right). Box-and-whisker plots: boxes show the upper quartile, median, and lower quartile; whiskers show the minimum to maximum. ****, P < 0.0001. All P values were calculated with the two-tailed nonparametric Mann–Whitney test in GraphPad Prism 7.0. B, ROC analysis of Max-IF in the training set: lung adenocarcinoma [AUC, 0.84; 95% confidence interval (CI), 0.74–0.93; P < 0.0001], LUSC (AUC, 0.94; 95% CI, 0.89–1.00; P < 0.0001), SCLC (AUC, 0.97; 95% CI, 0.92–1.00; P < 0.0001), and total lung cancer (AUC, 0.91; 95% CI, 0.86–0.96; P < 0.0001). C, Sensitivity and specificity for lung adenocarcinoma, LUSC, SCLC, and total lung cancer in the training set (left) and validation set (right). D, Sensitivity for lung adenocarcinoma was much higher with Max-IF combined with CEA than with either Max-IF or CEA alone. E, Overall sensitivity and specificity of HIST1H4I and HIST1H4F as a lung cancer diagnostic marker in the training and validation sets. BLD includes pneumonia, emphysema, tuberculosis, etc. Total lung cancer includes lung adenocarcinoma, LUSC, and SCLC.

Methylation of the HIST1H4F gene is a potential UCOM marker

Having shown that many histone genes are abnormally hypermethylated in lung cancer, we asked whether histone genes are also abnormally methylated in other types of cancer. In total, 17 cancer cohorts from TCGA were analyzed: bladder urothelial carcinoma (n = 433), breast-invasive carcinoma (n = 867), cervical squamous cell carcinoma and endocervical adenocarcinoma (n = 310), cholangiocarcinoma (n = 45), colon adenocarcinoma (n = 335), esophageal carcinoma (n = 201), head and neck squamous cell carcinoma (n = 578), kidney renal clear cell carcinoma (n = 479), liver hepatocellular carcinoma (n = 427), lung cancer (n = 907), pancreatic adenocarcinoma (n = 194), prostate adenocarcinoma (n = 548), rectum adenocarcinoma (n = 106), skin cutaneous melanoma (n = 476), stomach adenocarcinoma (n = 398), thyroid carcinoma (n = 563), and uterine corpus endometrioid carcinoma (n = 477; Supplementary Table S12).

For each cancer type, we calculated the average methylation difference between cancer and normal samples (Fig. 4A). Most histone genes showed no methylation difference. However, several histone genes tended to be hypermethylated across cancer types, including HIST1H4F, HIST1H3E, HIST1H2BB, HIST1H1A, HIST1H3C, and HIST1H4I, whereas H2BFM and H2BFWT tended to be hypomethylated. Importantly, HIST1H4F was hypermethylated in all tumor types except thyroid carcinoma (Fig. 4B). In thyroid carcinoma, only a minor methylation difference was observed between normal (median = 6.1%) and cancer (median = 5.4%) samples, and we showed that HIST1H4F was hypermethylated in cancer samples of different stages compared with normal samples (Supplementary Fig. S5). We therefore considered HIST1H4F hypermethylation a conserved feature of almost all cancer types and termed it a “UCOM” marker. Furthermore, we analyzed the relationship between HIST1H4F methylation and tumor stage or patient outcome in eight tumor types with larger sample sizes in TCGA (Supplementary Figs. S6A–S6C and S7A–S7G). HIST1H4F was already hypermethylated at stage I in all eight cancer types, with no significant differences among stages, and ROC analysis showed similar AUCs across stages (Supplementary Table S13). These results suggest that the HIST1H4F locus becomes methylated early in cancer development. In addition, survival analysis in these eight cancer types showed no significant differences in patient outcome among the low-, middle-, and high-methylation groups (Supplementary Table S14). Taken together, our results suggest that HIST1H4F hypermethylation could serve as a useful early diagnostic marker for multiple cancer types.
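The per-gene, per-cancer values behind a heatmap like Fig. 4A can be sketched as a grouped difference of means. This is a hypothetical illustration: the record layout, the function name, and the toy values are assumptions, and the arithmetic mean is used because the text describes an "average" difference. The sign convention follows the figure legend (positive means hypermethylated in tumors).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (cancer_type, gene, group, methylation %),
# where group is either "normal" or "tumor".
Record = Tuple[str, str, str, float]


def methylation_difference(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Mean tumor methylation minus mean normal methylation per (cancer type, gene).

    Positive values indicate hypermethylation in tumors, negative values
    hypomethylation. Pairs lacking tumor or normal samples are skipped.
    """
    groups: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for cancer, gene, group, level in records:
        groups[(cancer, gene, group)].append(level)

    diff: Dict[Tuple[str, str], float] = {}
    for (cancer, gene, group), tumor_levels in groups.items():
        if group != "tumor":
            continue
        # dict.get does not insert into a defaultdict, so iteration stays safe.
        normal_levels = groups.get((cancer, gene, "normal"))
        if normal_levels:
            diff[(cancer, gene)] = mean(tumor_levels) - mean(normal_levels)
    return diff


toy = [
    ("LUNG", "HIST1H4F", "normal", 5.0), ("LUNG", "HIST1H4F", "normal", 7.0),
    ("LUNG", "HIST1H4F", "tumor", 30.0), ("LUNG", "HIST1H4F", "tumor", 40.0),
    ("LUNG", "H2BFWT", "normal", 60.0), ("LUNG", "H2BFWT", "tumor", 40.0),
]
for (cancer, gene), d in sorted(methylation_difference(toy).items()):
    print(cancer, gene, d)
```

Skipping pairs without normal samples mirrors the practical problem noted in the Fig. 4 legend, where some TCGA cohorts had too few controls and para-cancer samples had to be collected separately.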

Figure 4.

HIST1H4F as a UCOM marker. A, Methylation of the histone gene family in 17 types of cancer. For each histone gene in each cancer type, the average methylation difference between cancer and normal samples was calculated. Color indicates the magnitude of the average methylation difference: negative values indicate that the histone gene is hypomethylated, and positive values indicate that it is hypermethylated. B, HIST1H4F is hypermethylated in different types of cancer in the TCGA data. Because TCGA contains too few (n ≤ 3) control samples for cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC), stomach adenocarcinoma (STAD), and skin cutaneous melanoma (SKCM), we collected 10 CESC, 10 STAD, and 6 SKCM para-cancer samples from primary tissues. Box-and-whisker plots: boxes show the upper quartile, median, and lower quartile; whiskers show the minimum to maximum; light-colored boxes, para-cancer control samples; dark-colored boxes, cancer samples. NS, not significant. *, P < 0.1; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001. P values were calculated with the two-tailed nonparametric Mann–Whitney test in GraphPad Prism 7.0. C, Validation of HIST1H4F methylation in eight cancer types other than lung cancer. Error bars, upper quartile, median, and lower quartile. P values for esophageal cancer, colorectal cancer, pancreatic cancer, and head and neck cancer were calculated with the two-tailed, paired, nonparametric Wilcoxon matched-pairs signed rank test, and P values for cervical cancer, gastric cancer, breast cancer, and liver cancer with the two-tailed nonparametric Mann–Whitney test, both in GraphPad Prism 7.0.
HNSC, head and neck squamous cell carcinoma; ESCA, esophageal carcinoma; COAD, colon adenocarcinoma; READ, rectum adenocarcinoma; PAAD, pancreatic adenocarcinoma; KIRC, kidney renal clear cell carcinoma; THCA, thyroid carcinoma; LIHC, liver hepatocellular carcinoma; PRAD, prostate adenocarcinoma; BLCA, bladder urothelial carcinoma; LUNG, lung cancer; BRCA, breast-invasive carcinoma; UCEC, uterine corpus endometrioid carcinoma; CHOL, cholangiocarcinoma.

To further confirm HIST1H4F as a UCOM marker, we selected 243 clinical samples covering nine cancer types: the 50 lung cancer samples described above and 193 samples from eight other cancer types (Supplementary Table S4). HIST1H4F methylation in these samples was measured by bisulfite-PCR pyrosequencing. HIST1H4F was significantly hypermethylated in all nine cancer types (Fig. 4C), and ROC analysis showed AUCs above 0.87 in all nine, supporting HIST1H4F as a strong UCOM marker candidate (Supplementary Table S10). If HIST1H4F is a UCOM marker, its DNA methylation level should also reflect the proportion of cancer cells in clinical samples that contain a mixture of cancer and noncancer cells. To test this, we mixed normal cells (the lung fibroblast cell line MRC5 or normal liver cells) with cancer cells (the lung cancer cell line A549 or the liver cancer cell line HepG2) at proportions of 0%, 25%, 50%, 75%, and 100%, and measured the methylation level of each mixture by bisulfite-PCR pyrosequencing. As expected, the measured methylation level closely tracked the percentage of cancer-cell DNA in the mixture. These results indicate that HIST1H4F is not only a UCOM marker but can also be used to estimate the proportion of cancer cells in clinical samples (Supplementary Fig. S8A and S8B).
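The mixing experiment can be read through a simple two-component linear model, m_obs = f · m_cancer + (1 − f) · m_normal, where f is the cancer-cell DNA fraction. The sketch below assumes that model, equal DNA content per cell, and known methylation levels for the pure populations (here, the 0% and 100% points of a dilution series). The function names and the dilution values are hypothetical, not the study's measurements.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares slope, intercept, and R^2 of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


def cancer_fraction(m_obs: float, m_normal: float, m_cancer: float) -> float:
    """Invert m_obs = f * m_cancer + (1 - f) * m_normal for f, clipped to [0, 1]."""
    if m_cancer == m_normal:
        raise ValueError("pure-population methylation levels must differ")
    f = (m_obs - m_normal) / (m_cancer - m_normal)
    return min(1.0, max(0.0, f))


# Hypothetical dilution series: % cancer-cell DNA vs. measured methylation (%).
design = [0.0, 25.0, 50.0, 75.0, 100.0]
measured = [2.0, 14.5, 27.0, 39.5, 52.0]
slope, intercept, r2 = fit_line(design, measured)
print(slope, intercept, r2)  # 0.5 2.0 1.0 for these idealized values
print(cancer_fraction(27.0, m_normal=measured[0], m_cancer=measured[-1]))  # 0.5
```

Real pyrosequencing data will not be perfectly linear, so R² is a convenient summary of how well the mixture model holds. In clinical samples the pure-tumor methylation level is not directly observable, which is the main practical limitation of this inversion.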

Because DNA methylation is often correlated with gene expression, we asked whether abnormal hypermethylation of HIST1H4F affects its expression. We analyzed HIST1H4F expression in 15 tumor types in TCGA (tumor types without normal controls were excluded); in most tumor types, HIST1H4F showed no (or very low) expression in both normal controls and tumors (Supplementary Fig. S9A). We verified this in the cultured normal lung fibroblast cell line MRC5 and the lung cancer cell line A549 by measuring DNA methylation and expression of HIST1H4F: HIST1H4F was hypermethylated in A549 cells and hypomethylated in MRC5 cells (Supplementary Fig. S9B), but was not expressed in either (Supplementary Fig. S9C). These unexpected results indicate that HIST1H4F expression itself may not be involved in tumorigenesis; instead, the epigenetic status of the HIST1H4F locus may affect chromatin state or structure, which in turn alters cancer-related gene expression during tumor initiation. This interpretation is consistent with the observation that the H4 histone genes differ in genomic DNA sequence yet encode nearly identical amino acid sequences (Supplementary Fig. S9D and S9E).

In summary, we examined nine cancer types. Although many other, rarer cancer types have not yet been tested, we speculate that HIST1H4F is also hypermethylated in many of them. We therefore conclude that HIST1H4F may be a promising UCOM marker for early cancer screening, and its role in tumorigenesis warrants further study.

WGBS is the most comprehensive method for detecting genome-wide DNA methylation (23). However, few studies have directly searched WGBS datasets for methylation biomarkers. Here, we developed a new strategy for analyzing WGBS data to efficiently screen the genome for new methylation markers of lung cancer, and we further verified these markers using TCGA data and clinical cancer samples. Through these analyses, we unexpectedly found that many histone genes are abnormally hypermethylated in lung cancer. The methylation status of HIST1H4F and HIST1H4I in BALF samples provides an effective approach for the early diagnosis of lung cancer, with a specificity of 96.7% and a sensitivity of 87.0%.

The TCGA database provides a wealth of information for tumor research, especially for investigating pan-cancer characteristics, and a series of high-impact studies have been published, including analyses of pan-cancer signaling pathways (40, 41), genetic alterations (42–44), molecular tumor reclassification (45), and pan-cancer DNA methylation (46). These studies have provided informative views of cancer from different perspectives. However, the reported pan-cancer markers combine many genes for cancer diagnosis, and few reports describe a single gene or locus that can be used to screen for all cancer types. This may be because methylation data in TCGA were measured with the 450K methylation array, which covers only about 2% of CpG sites in the genome, leaving most of the methylome unmeasured. Combining WGBS data with TCGA data is therefore an efficient strategy for screening DNA methylation biomarkers across the genome.
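As a rough check of the "about 2%" coverage figure, assuming commonly cited approximate counts that are not taken from this study (roughly 485,000 probes on the 450K array, nearly all targeting CpG sites, and roughly 28 million CpG dinucleotides in the human reference genome):

$$
\frac{N_{\mathrm{450K}}}{N_{\mathrm{CpG}}} \approx \frac{4.85 \times 10^{5}}{2.8 \times 10^{7}} \approx 0.017 \approx 1.7\%
$$

which is consistent with the approximate 2% stated in the text.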

Histone genes are an important family of housekeeping genes found in virtually all eukaryotes. Each histone protein is encoded by multiple histone genes, ensuring stable histone expression, and the spatial and temporal regulation of histone gene expression differs markedly from that of other genes (17, 19). Histone modifications have also been studied extensively. However, to our knowledge, there has been no systematic study of abnormal methylation at the histone loci themselves. Alterations in the chromatin structure of the histone gene cluster 1 region have been found in breast cancer (20). Similarly, the histone gene cluster 1 region has been reported to be abnormally enriched for H3K27me3 in acute myeloid leukemia (47). Interestingly, we found aberrant DNA methylation at many histone loci located in histone gene cluster 1. We further analyzed HIST1H4F expression in 15 tumor types in TCGA and found that HIST1H4F has no (or very low) expression in normal tissues and tumors of different cancer types. We speculate that this aberrant DNA methylation may affect the binding of CCCTC-binding factor (CTCF), which would further alter the chromatin structure of histone gene cluster 1 during cancer development (48, 49); the epigenetic status or higher-order chromatin structure of histone loci, rather than their expression per se, may thus be involved in tumor initiation. Moreover, histone genes in cluster 1 are also methylated in different types of cancer, suggesting that aberrant DNA methylation in the histone gene cluster 1 region may be involved in the development of multiple cancer types, a possibility we plan to explore in future work.

To extend these unexpected findings, we analyzed 17 TCGA cancer cohorts and found that many histone genes are abnormally hypermethylated not only in lung cancer but also in many other tumors. Moreover, we were surprised to find that HIST1H4F is hypermethylated in almost all cancer types and is both highly sensitive and highly specific as a potential UCOM marker, which was further verified in 243 clinical samples covering nine tumor types. Unlike most reported multigene panels for pan-cancer diagnosis (50–52), HIST1H4F is a single-locus candidate UCOM marker; this completely unexpected finding could greatly simplify subsequent clinical applications. Further exploring the mechanism underlying the role of HIST1H4F in cancer development may also help clarify common features of tumorigenesis. As a UCOM marker, the epigenetic status and chromatin structure of the HIST1H4F locus may be important for understanding the general mechanism of cancer development, and reversing DNA methylation at specific histone loci may be a potential common strategy for future cancer treatment.

W. Yu and Shihua Dong report having a pending patent application. No potential conflicts of interest were disclosed by the other authors.

Conception and design: S. Dong, W. Yu

Development of methodology: S. Dong, W. Li, W. Yu

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S. Dong, W. Li, L. Wang, J. Hu, Y. Song, B. Zhang, X. Ren, S. Ji, J. Li, P. Xu, Y. Liang, G. Chen, J.-T. Lou

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S. Dong, W. Li, W. Yu

Writing, review, and/or revision of the manuscript: S. Dong, W. Li, B. Zhang, X. Ren, J. Li, P. Xu, Y. Liang, W. Yu

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S. Dong, J. Hu, B. Zhang, X. Ren, S. Ji, J. Li, P. Xu, Y. Liang, J.-T. Lou, W. Yu

Study supervision: W. Yu

We thank Yan Li, Lina Peng, Huaibing Luo, ZhiCong Chu, Yao Xiao, Min Xiao, Ying Guo, Lu Chen, and Lan Zhang for experimental help. We thank Ruitu Lv and Feizhen Wu for their help in bioinformatic analysis. We thank Yue Yu, Zhicong Yang, Ying Tong, and Zhiqiang Hu for editorial help and useful comments on the article. This work was supported by the National Key R&D Program of China (grant no. 2018YFC1005004), the Science and Technology Innovation Action Plan of Shanghai (grant no. 17411950900), the National Natural Science Foundation of China (grant nos. 31671308, 31872814, and 81272295), Major Special Projects of Basic Research of Shanghai Science and Technology Commission (grant no. 18JC1411101), the Shanghai Science and Technology Committee (grant no. 12ZR1402200), the Ministry of Education of the People's Republic of China (grant no. 2009CB825600), and the Innovation Group Project of Shanghai Municipal Health Commission (grant no. 2019CXJQ03).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Hirsch FR, Scagliotti GV, Mulshine JL, Kwon R, Curran WJ Jr, Wu YL, et al. Lung cancer: current therapies and new targeted treatments. Lancet 2017;389:299–311.
2. Melosky B, Chu Q, Juergens R, Leighl N, McLeod D, Hirsh V. Pointed progress in second-line advanced non-small-cell lung cancer: the rapidly evolving field of checkpoint inhibition. J Clin Oncol 2016;34:1676–88.
3. Sozzi G, Boeri M. Potential biomarkers for lung cancer screening. Transl Lung Cancer Res 2014;3:139–48.
4. National Lung Screening Trial Research Team, Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395–409.
5. Kanodra NM, Silvestri GA, Tanner NT. Screening and early detection efforts in lung cancer. Cancer 2015;121:1347–56.
6. Singhal S, Vachani A, Antin-Ozerkis D, Kaiser LR, Albelda SM. Prognostic implications of cell cycle, apoptosis, and angiogenesis biomarkers in non-small cell lung cancer: a review. Clin Cancer Res 2005;11:3974–86.
7. Kathuria H, Gesthalter Y, Spira A, Brody JS, Steiling K. Updates and controversies in the rapidly evolving field of lung cancer screening, early detection, and chemoprevention. Cancers 2014;6:1157–79.
8. Risch A, Plass C. Lung cancer epigenetics and genetics. Int J Cancer 2008;123:1–7.
9. Mundbjerg K, Chopra S, Alemozaffar M, Duymich C, Lakshminarasimhan R, Nichols PW, et al. Identifying aggressive prostate cancer foci using a DNA methylation classifier. Genome Biol 2017;18:3.
10. Nguyen LV, Pellacani D, Lefort S, Kannan N, Osako T, Makarem M, et al. Barcoding reveals complex clonal dynamics of de novo transformed human mammary cells. Nature 2015;528:267–71.
11. Li J, Li Y, Li W, Luo H, Xi Y, Dong S, et al. Guide positioning sequencing identifies aberrant DNA methylation patterns that alter cell identity and tumor-immune surveillance networks. Genome Res 2019;29:270–80.
12. Dor Y, Cedar H. Principles of DNA methylation and their implications for biology and medicine. Lancet 2018;392:777–86.
13. Koch A, Joosten SC, Feng Z, de Ruijter TC, Draht MX, Melotte V, et al. Analysis of DNA methylation in cancer: location revisited. Nat Rev Clin Oncol 2018;15:459–66.
14. Vargas AJ, Harris CC. Biomarker development in the precision medicine era: lung cancer as a case study. Nat Rev Cancer 2016;16:525–37.
15. Hu Y, Lai Y. Identification and expression analysis of rice histone genes. Plant Physiol Biochem 2015;86:55–65.
16. Bhasin M, Reinherz EL, Reche PA. Recognition and classification of histones using support vector machine. J Comput Biol 2006;13:102–12.
17. Isogai Y, Keles S, Prestel M, Hochheimer A, Tjian R. Transcription of histone gene cluster by differential core-promoter factors. Genes Dev 2007;21:2936–49.
18. Buschbeck M, Hake SB. Variants of core histones and their roles in cell fate decisions, development and cancer. Nat Rev Mol Cell Biol 2017;18:299–314.
19. Braastad CD, Hovhannisyan H, van Wijnen AJ, Stein JL, Stein GS. Functional characterization of a human histone gene cluster duplication. Gene 2004;342:35–40.
20. Fritz AJ, Ghule PN, Boyd JR, Tye CE, Page NA, Hong D, et al. Intranuclear and higher-order chromatin organization of the major histone gene cluster in breast cancer. J Cell Physiol 2018;233:1278–90.
21. Xi Y, Li W. BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 2009;10:232.
22. Yong WS, Hsu FM, Chen PY. Profiling genome-wide DNA methylation. Epigenetics Chromatin 2016;9:26.
23. Chatterjee A, Rodger EJ, Morison IM, Eccles MR, Stockwell PA. Tools and strategies for analysis of genome-wide and gene-specific DNA methylation patterns. Methods Mol Biol 2017;1537:249–77.
24. Guo S, Diep D, Plongthongkum N, Fung HL, Zhang K, Zhang K. Identification of methylation haplotype blocks aids in deconvolution of heterogeneous tissue samples and tumor tissue-of-origin mapping from plasma DNA. Nat Genet 2017;49:635–42.
25. Zhao CM, Hayakawa Y, Kodama Y, Muthupalani S, Westphalen CB, Andersen GT, et al. Denervation suppresses gastric tumorigenesis. Sci Transl Med 2014;6:250ra115.
26. Zahalka AH, Arnal-Estape A, Maryanovich M, Nakahara F, Cruz CD, Finley LWS, et al. Adrenergic nerves activate an angio-metabolic switch in prostate cancer. Science 2017;358:321–6.
27. Magnon C, Hall SJ, Lin J, Xue X, Gerber L, Freedland SJ, et al. Autonomic nerve development contributes to prostate cancer progression. Science 2013;341:1236361.
28. Ilse P, Biesterfeld S, Pomjanski N, Wrobel C, Schramm M. Analysis of SHOX2 methylation as an aid to cytology in lung cancer diagnosis. Cancer Genomics Proteomics 2014;11:251–8.
29. Pradhan MP, Desai A, Palakal MJ. Systems biology approach to stage-wise characterization of epigenetic genes in lung adenocarcinoma. BMC Syst Biol 2013;7:141.
30. Ooki A, Maleki Z, Tsay JJ, Goparaju C, Brait M, Turaga N, et al. A panel of novel detection and prognostic methylated DNA markers in primary non-small cell lung cancer and serum DNA. Clin Cancer Res 2017;23:7141–52.
31. Diaz-Lagares A, Mendez-Gonzalez J, Hervas D, Saigi M, Pajares MJ, Garcia D, et al. A novel epigenetic signature for early diagnosis in lung cancer. Clin Cancer Res 2016;22:3361–71.
32. Su J, Huang YH, Cui X, Wang X, Zhang X, Lei Y, et al. Homeobox oncogene activation by pan-cancer DNA hypermethylation. Genome Biol 2018;19:108.
33. Cedar H, Bergman Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat Rev Genet 2009;10:295–304.
34. Hammond CM, Stromme CB, Huang H, Patel DJ, Groth A. Histone chaperone networks shaping chromatin function. Nat Rev Mol Cell Biol 2017;18:141–58.
35. Wang H, Zhang X, Liu X, Liu K, Li Y, Xu H. Diagnostic value of bronchoalveolar lavage fluid and serum tumor markers for lung cancer. J Cancer Res Ther 2016;12:355–8.
36. Poletti V, Poletti G, Murer B, Saragoni L, Chilosi M. Bronchoalveolar lavage in malignancy. Semin Respir Crit Care Med 2007;28:534–45.
37. Collins LG, Haines C, Perkel R, Enck RE. Lung cancer: diagnosis and management. Am Fam Physician 2007;75:56–63.
38. Sareen R, Pandey CL. Lung malignancy: diagnostic accuracies of bronchoalveolar lavage, bronchial brushing, and fine needle aspiration cytology. Lung India 2016;33:635–41.
39. Holdenrieder S, Wehnl B, Hettwer K, Simon K, Uhlig S, Dayyani F. Carcinoembryonic antigen and cytokeratin-19 fragments for assessment of therapy response in non-small cell lung cancer: a systematic review and meta-analysis. Br J Cancer 2017;116:1037–45.
40. Sanchez-Vega F, Mina M, Armenia J, Chatila WK, Luna A, La KC, et al. Oncogenic signaling pathways in the cancer genome atlas. Cell 2018;173:321–37.
41. Chen H, Li C, Peng X, Zhou Z, Weinstein JN, Cancer Genome Atlas Research Network, et al. A pan-cancer analysis of enhancer expression in nearly 9000 patient samples. Cell 2018;173:386–99.
42. Korkut A, Zaidi S, Kanchi RS, Rao S, Gough NR, Schultz A, et al. A pan-cancer analysis reveals high-frequency genetic alterations in mediators of signaling by the TGF-beta superfamily. Cell Syst 2018;7:422–37.
43. Huang KL, Mashl RJ, Wu Y, Ritter DI, Wang J, Oh C, et al. Pathogenic germline variants in 10,389 adult cancers. Cell 2018;173:355–70.
44. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, et al. Comprehensive characterization of cancer driver genes and mutations. Cell 2018;174:1034–5.
45. Hoadley KA, Yau C, Hinoue T, Wolf DM, Lazar AJ, Drill E, et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell 2018;173:291–304.
46. Saghafinia S, Mina M, Riggi N, Hanahan D, Ciriello G. Pan-cancer landscape of aberrant DNA methylation across human tumors. Cell Rep 2018;25:1066–80.
47. Tiberi G, Pekowska A, Oudin C, Ivey A, Autret A, Prebet T, et al. PcG methylation of the HIST1 cluster defines an epigenetic marker of acute myeloid leukemia. Leukemia 2015;29:1202–6.
48. Bonev B, Cavalli G. Organization and function of the 3D genome. Nat Rev Genet 2016;17:661–78.
49. Dixon JR, Xu J, Dileep V, Zhan Y, Song F, Le VT, et al. Integrative detection and analysis of structural variation in cancer genomes. Nat Genet 2018;50:1388–98.
50. Yang X, Gao L, Zhang S. Comparative pan-cancer DNA methylation analysis reveals cancer common and specific patterns. Brief Bioinform 2017;18:761–73.
51. Hao X, Luo H, Krawczyk M, Wei W, Wang W, Wang J, et al. DNA methylation markers for diagnosis and prognosis of common cancers. Proc Natl Acad Sci U S A 2017;114:7414–9.
52. Brena RM, Plass C, Costello JF. Mining methylation for early detection of common cancers. PLoS Med 2006;3:e479.