The haplotype structure of genes is relatively unexplored. We used publicly available data on 8773 polymorphisms from 90 people of varied ethnicity to characterize haplotype structure in 106 DNA repair and cell cycle control genes. Using SNPs with a minor allele frequency greater than 0.1, we constructed a parsimonious set of haplotypes for each gene. On average there were 19 common SNPs and 4.5 haplotypes per gene, and these haplotypes explain, on average, 94% of the genotype variation in the sample. Interestingly, we found that most haplotypes within a gene are orthogonal to one another; that is, they are defined by mutually disjoint sets of co-occurring minor alleles at multiple SNPs. These results have several important implications. First, they suggest that, consistent with recent studies of haplotype blocks across the genome, genes have low haplotype diversity and their haplotypes are shared across ethnic groups. Second, they suggest that intragenic recombination is surprisingly low. Finally, they indicate that genotyping requirements can be reduced 4- to 20-fold. Haplotypes thus provide both an efficient means of surveying most population variation within genes and an appropriate model for data analysis. Because haplotypes are shared across ethnic groups, they are broadly applicable in diverse populations. The search for the genetic determinants of disease may therefore be less complex, and its results more widely applicable across human populations, than we might have expected.
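The orthogonality property is what makes the genotyping savings plausible: if no two haplotypes share a minor allele, the minor allele at any single SNP from a haplotype's defining set occurs only on that haplotype, so one SNP per haplotype suffices to tag it. The sketch below illustrates that reasoning on made-up data. The names `HAPLOTYPES`, `is_orthogonal`, and `pick_tag_snps` are hypothetical, the haplotype sets are invented for illustration, and this is not the authors' method for constructing haplotypes or estimating the reduction.

```python
from itertools import combinations

# Hypothetical toy data (not from the study): each haplotype is represented
# by the set of SNP indices at which it carries the minor allele.
HAPLOTYPES = {
    "H1": {0, 3, 7},
    "H2": {1, 4},
    "H3": {2, 5, 8, 9},
    "H4": {6},
}
N_COMMON_SNPS = 10  # SNPs 0..9 in this toy gene


def is_orthogonal(haplotypes):
    """Return True if no two haplotypes share a minor allele (pairwise disjoint sets)."""
    return all(a.isdisjoint(b) for a, b in combinations(haplotypes.values(), 2))


def pick_tag_snps(haplotypes):
    """Pick one SNP per haplotype as its tag.

    This is only valid when haplotypes are orthogonal, because then each
    chosen minor allele occurs on exactly one haplotype. Assumes every
    haplotype carries at least one minor allele.
    """
    if not is_orthogonal(haplotypes):
        raise ValueError("haplotypes share minor alleles; one SNP each may not suffice")
    return {name: min(snps) for name, snps in haplotypes.items()}


tags = pick_tag_snps(HAPLOTYPES)
reduction = N_COMMON_SNPS / len(tags)  # fold reduction in SNPs to genotype
print(tags)       # {'H1': 0, 'H2': 1, 'H3': 2, 'H4': 6}
print(reduction)  # 2.5
```

Under this one-tag-per-haplotype scheme, the reported averages of 19 common SNPs and 4.5 haplotypes per gene would correspond to roughly 19 / 4.5 ≈ 4.2-fold, although the abstract does not state how its 4- to 20-fold range was derived.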

[Proc Amer Assoc Cancer Res, Volume 45, 2004]