HAPMIXMAP is a program for modelling extended haplotypes in genetic association studies, similar to the FASTPHASE program developed by Scheet and Stephens (2006). The program models unphased genotype data on unrelated individuals, and fits a model in which linkage disequilibrium is generated by K independent Poisson arrival processes corresponding to K modal haplotype states. This corresponds to the observation that typically 2-4 common haplotypes account for most of the allelic diversity in any haplotype block, and that rarer haplotypes are typically slight variants of these modal haplotypes. The block-like structure of haplotypes in the genome, corresponding to ancestral recombination hotspots, is modelled by allowing the arrival rate to vary across the genome.
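One way to read the "K independent Poisson arrival processes" description is as a hidden Markov model along the chromosome, in which the state at a locus is the type of the most recent arrival. The sketch below computes the resulting transition matrix between two adjacent loci. It is an illustration under that assumption and is not taken from the HAPMIXMAP source: the function name `transition_matrix` and its parameters `arrival_rates` and `distance` are hypothetical.

```python
import math


def transition_matrix(arrival_rates, distance):
    """Transition probabilities between K states over one interval.

    arrival_rates: assumed per-state Poisson rates for this interval
                   (they may differ from interval to interval).
    distance:      assumed length of the interval, in the same units.
    """
    total = sum(arrival_rates)
    if total <= 0:
        raise ValueError("at least one arrival rate must be positive")
    # Probability that none of the K processes produces an arrival.
    p_none = math.exp(-total * distance)
    k_states = len(arrival_rates)
    matrix = []
    for j in range(k_states):
        row = []
        for k in range(k_states):
            # With no arrival the state is unchanged; otherwise the new
            # state is k with probability proportional to its rate.
            stay = p_none if j == k else 0.0
            row.append(stay + (1.0 - p_none) * arrival_rates[k] / total)
        matrix.append(row)
    return matrix
```

Under this reading, a high total rate over an interval makes the states on either side nearly independent, as at a recombination hotspot, while a low rate keeps the same modal haplotype across the interval, as within a block.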
The eHap software is designed to analyze multilocus data as haplotypes and to determine whether haplotypes are associated with phenotypes. This version of eHap provides a broad (and ever-broadening) set of tools for haplotype-based inference in association (and linkage) studies using population- and family-based samples.
CLUMPHAP implements a novel method for association testing based on clustering similar haplotypes (Knight et al., submitted). It extends the basic methodology used in CLUMP, a program designed for the analysis of multi-allelic markers (Sham and Curtis 1995). CLUMPHAP calculates chi-squared statistics for binary partitions of haplotypes, where the number of partitions is reduced by allowing only those that are supported by a hierarchical cluster analysis of the haplotypes. CLUMPHAP obtains the empirical significance level of the largest chi-squared statistic by a permutation procedure in which multiple permuted datasets (where the case-control labels have been randomly reassigned) are subjected to exactly the same procedure of haplotype partitioning and calculation of the largest chi-squared statistic. This permutation procedure accounts not only for the inflation of the test statistic due to maximization over the multiple ways of partitioning the haplotypes, but also for the uncertainty in the haplotype phase of the individual subjects (Curtis and Sham 2006). The results are easy to interpret: a significant result suggests that a disease-causing variant is present on haplotypes in the group that has an increased overall frequency in cases. CLUMPHAP reports the cluster pattern that produced the highest chi-squared statistic, along with that statistic and the empirical p-value.
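The maximization-plus-permutation logic described above can be sketched as follows. This is a simplified illustration, not CLUMPHAP's code or interface: all function names are hypothetical, the candidate partitions are supplied by hand instead of being derived from a hierarchical cluster analysis, and each haplotype is treated as a phase-known observation carrying its own case-control label, so phase uncertainty is not modelled.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared for the table [[a, b], [c, d]], no correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_chi_square(haplotypes, labels, partitions):
    """Largest chi-squared over the candidate binary partitions.

    Each partition is a set of haplotype names forming one group;
    every other haplotype falls in the complementary group.
    """
    best_stat, best_group = -1.0, None
    for group in partitions:
        counts = {(True, 1): 0, (True, 0): 0, (False, 1): 0, (False, 0): 0}
        for hap, label in zip(haplotypes, labels):
            counts[(hap in group, label)] += 1
        stat = chi_square_2x2(counts[(True, 1)], counts[(True, 0)],
                              counts[(False, 1)], counts[(False, 0)])
        if stat > best_stat:
            best_stat, best_group = stat, group
    return best_stat, best_group


def permutation_p_value(haplotypes, labels, partitions, n_perm=1000, seed=0):
    """Empirical p-value of the largest chi-squared statistic."""
    rng = random.Random(seed)
    observed, group = max_chi_square(haplotypes, labels, partitions)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reassign case-control labels
        stat, _group = max_chi_square(haplotypes, shuffled, partitions)
        if stat >= observed:
            exceed += 1
    # Add-one form so the estimate is never exactly zero.
    return observed, group, (exceed + 1) / (n_perm + 1)
```

Because every permuted dataset is maximized over the same set of partitions as the observed data, the empirical p-value already reflects the multiple partitions tested, and no separate correction is applied.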
harp: Maximum likelihood estimation of the frequencies of known haplotypes from pooled sequence data. harp implements an EM algorithm to estimate these frequencies.
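A minimal sketch of an EM iteration for mixture proportions, of the general kind such an estimator could use, is given below. It is not harp's implementation: the function name and the `read_likelihoods` input (one row per read, giving the probability of that read under each known haplotype) are assumptions made for illustration, and how those per-read likelihoods would be computed from sequence data is left out.

```python
def em_haplotype_frequencies(read_likelihoods, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies in a pool by EM.

    read_likelihoods[r][k] is the assumed probability of read r given
    that it came from known haplotype k.
    """
    n_hap = len(read_likelihoods[0])
    freqs = [1.0 / n_hap] * n_hap  # start from uniform frequencies
    for _ in range(n_iter):
        totals = [0.0] * n_hap
        for lik in read_likelihoods:
            # E-step: posterior probability that this read comes from
            # each haplotype, given the current frequencies.
            weighted = [f * l for f, l in zip(freqs, lik)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # read incompatible with every haplotype
            for k, w in enumerate(weighted):
                totals[k] += w / norm
        used = sum(totals)
        if used == 0.0:
            raise ValueError("no read is compatible with any haplotype")
        # M-step: new frequency is the average posterior weight.
        new_freqs = [t / used for t in totals]
        change = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if change < tol:
            break
    return freqs
```

For example, with seven reads that match only the first of two haplotypes and three that match only the second, the estimate settles at frequencies of 0.7 and 0.3.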