A study of more than 454,000 people used exome sequencing to identify rare DNA variants that influence disease risk. Analysis of subjects of European ancestry uncovered variants in 564 genes that affect health-related traits, including variants in 15 genes linked to cancer.

Exome sequencing can uncover rare genetic variants that affect health and can help nail down the genes they alter, according to a massive study of data from the world's largest biobank, which contains information on about 500,000 participants (Nature 2021 Oct 18 [Epub ahead of print]).

Genome-wide association studies (GWAS) have linked thousands of DNA variants to altered disease risk, including more than 400 variants that boost the odds of developing cancer. However, GWAS is better at detecting common variants and may not catch rare ones that can have dramatic effects. Moreover, the variants flagged in GWAS often lie in noncoding sequences—and the genes and gene functions they influence may not be clear. But exome sequencing may reveal rare variants and provide insight into how they modify disease risk.

To assess that approach, Manuel Ferreira, PhD, of the Regeneron Genetics Center in Tarrytown, NY, and colleagues performed exome sequencing on samples from 454,787 people in the UK Biobank, which contains blood and urine samples, brain scans, activity measurements, health records, and other information. The researchers found 12.3 million variants in the sequenced exomes, including more than 7 million missense variants and more than 900,000 loss-of-function alterations. They then asked whether the variants were associated with 3,994 health-related traits, including weight, neutrophil count, and blood glucose level.

When the scientists analyzed the results for the 430,998 participants who were of European descent, they determined that variants in 564 genes were associated with health-related traits. Although the team did not focus on cancer, they detected 141 variants in 15 genes that affected the likelihood of developing cancer. For instance, variants in CDH2 and NFKBIE were associated with B cell–type chronic lymphocytic leukemia. The researchers found most of the same associations between variants and traits when they analyzed exome sequences from the participants who were not of European descent.

Researchers are interested in identifying beneficial variants because they could provide clues for designing new drugs. The team pinpointed novel protective variants in four genes. For example, variants in MAP3K15 protected against diabetes. No beneficial variants associated with cancer were identified.

Ferreira and colleagues also performed a GWAS on the UK Biobank samples, which demonstrated that exome sequencing can help researchers home in on the genes responsible for GWAS signals. One such gene, HAL, codes for an enzyme that helps manufacture a UV-absorbing acid that Ferreira calls “a natural sunblock.”

Variants in HAL correlated with higher levels of vitamin D and increased risk of skin cancer. Vitamin D doesn't cause skin cancer, Ferreira says, but the variants likely lower levels of the UV-absorbing acid, leading to greater vitamin D synthesis and less sun protection. “An association that's relevant for cancer can come out because you are looking at traits that are not obviously connected to cancer,” he explains.

Although the study uncovered variants in 15 genes linked to cancer risk, no new cancer-associated genes were identified, says Ferreira. The UK Biobank contains only about 36,000 participants who have had cancer, he notes, making it difficult to tease out uncommon variants, particularly for individual cancer types. He and his colleagues have started a follow-up study with researchers at other institutions to accumulate more participants and provide greater discrimination.

“This clearly is a very impressive study,” says Li Ding, PhD, of the Washington University School of Medicine in St. Louis, MO, who wasn't connected to the research. The results show, she says, that exome sequencing “is a powerful approach for association studies.” More and more researchers are using whole-genome sequencing to search for disease-linked variants because it can uncover variants outside of coding regions. However, she notes, exome sequencing is less expensive than whole-genome sequencing and allows sequencing to a greater depth. –Mitch Leslie