The program ProtEvol performs two kinds of computation.
It computes mean-field, site-specific amino acid distributions that differ minimally from the background distribution while constraining the average stability of the protein's native state against both unfolding and misfolding. The program also computes an exchangeability matrix, derived from an empirical substitution model or from a mutation model, that can be used together with the site-specific distributions in phylogenetic inference.
It simulates protein evolution subject to selection on the folding stability of the native state against both unfolding and misfolding.
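The first ProtEvol computation has the shape of a constrained minimum-divergence problem. As a generic illustration only (not ProtEvol's actual model, which treats unfolding and misfolding explicitly), the sketch below minimizes the Kullback-Leibler divergence from a background distribution `q` subject to a single constraint on the mean of a per-residue score `E`. The solution is the exponentially tilted distribution p_a ∝ q_a·exp(-λ·E_a), with λ found by bisection. The scores and the target value are hypothetical stand-ins for stability contributions.

```python
import math

def tilted_distribution(background, scores, lam):
    """Exponentially tilt a background distribution: p_a ∝ q_a * exp(-lam * E_a)."""
    exps = [-lam * e for e in scores]
    m = max(exps)  # shift exponents for numerical stability
    weights = [q * math.exp(x - m) for q, x in zip(background, exps)]
    z = sum(weights)
    return [w / z for w in weights]

def mean_score(p, scores):
    """Expected score under distribution p."""
    return sum(pi * e for pi, e in zip(p, scores))

def constrained_min_kl(background, scores, target, lo=-50.0, hi=50.0, iters=200):
    """Find lam so that the tilted distribution has mean score equal to target.

    The mean score is non-increasing in lam (its derivative is minus the
    variance), so bisection applies. Assumes target is reachable with lam
    in [lo, hi], i.e. strictly between min(scores) and max(scores).
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_score(tilted_distribution(background, scores, mid), scores) > target:
            lo = mid  # mean still too high: tilt harder
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, tilted_distribution(background, scores, lam)

# Hypothetical example: four residue classes, uniform background.
q = [0.25, 0.25, 0.25, 0.25]
E = [0.0, 1.0, 2.0, 3.0]
lam, p = constrained_min_kl(q, E, target=1.0)
```

A positive λ shifts probability toward low-score (here, notionally more stabilizing) residues; when the target equals the background mean, λ is zero and p reduces to q.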
CONSEL is a program package consisting of small programs written in C. It calculates probability values (p-values) to assess confidence in a selection problem. Although CONSEL is applicable to any selection problem, it is mainly designed for phylogenetic tree selection. CONSEL does not estimate phylogenetic trees by itself; instead, it reads the output of other phylogenetic packages, such as Molphy, PAML, PAUP*, TREE-PUZZLE, and PhyML. CONSEL calculates p-values using several testing procedures: the bootstrap probability, the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, and the weighted Shimodaira-Hasegawa test. In addition to these conventional tests, CONSEL calculates p-values based on the approximately unbiased test using the multi-scale bootstrap technique.
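To make the simplest of these procedures concrete, here is a minimal sketch of a bootstrap probability computed by resampling site-wise log-likelihoods (the RELL approximation), assuming per-site log-likelihoods for each candidate tree are already available from an external package. This is an illustrative reimplementation of the general idea, not CONSEL's code, and it does not cover the KH, SH, or AU tests; the function name `bootstrap_probability` is hypothetical.

```python
import random

def bootstrap_probability(site_lnl, n_boot=1000, seed=0):
    """Fraction of bootstrap replicates in which each tree has the best total log-likelihood.

    site_lnl[t][i] is the log-likelihood of alignment site i under tree t.
    Sites are resampled with replacement; trees are not re-optimized (RELL).
    Ties go to the lowest tree index.
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_boot):
        idx = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[i] for i in idx) for tree in site_lnl]
        best = max(range(n_trees), key=lambda t: totals[t])
        wins[best] += 1
    return [w / n_boot for w in wins]
```

The resulting values sum to one across trees and are known to be biased as confidence measures, which is part of the motivation for the multi-scale bootstrap behind the AU test.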
SNPPicker is a post-processor that optimizes the selection of tag SNPs from common bin-tagging programs. SNPPicker uses a multi-step search strategy in combination with a statistical model to produce optimal genotyping panels. SNPPicker's algorithm is also designed to optimize tag SNP selection for multi-population panels. It accounts for several assay-specific constraints, such as the predicted assay failure of SNPs and the avoidance of SNPs that are too close to each other. The latter issue affects one third of all SNPs in the 1000 Genomes Project pilot 1 data. SNPPicker automates the design of tag SNP genotyping panels by maximizing the likelihood of successfully genotyping the selected SNPs while minimizing the number of tag SNPs to assay. Genotyping success is a function of two properties: the genotyping probability of a bin (or cluster of bins), statistically derived from the individual genotyping probability of each SNP; and, for some platforms, the proximity distance between SNPs. The genotyping probabilities currently used by SNPPicker are derived from a prospective analysis of genotyping assay performance, and the probability model can be updated or changed for other platforms. SNP proximity is a strictly enforced constraint.
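The interplay between bin success probability and the proximity constraint can be sketched as follows. This is a hypothetical illustration, not SNPPicker's algorithm or probability model: it assumes SNP assay failures are independent (so a bin succeeds if at least one of its selected tags genotypes), and it uses invented parameters `min_distance` and `target` together with a simple greedy rule.

```python
def bin_success_probability(probs):
    """P(at least one selected tag SNP genotypes), assuming independent failures."""
    fail = 1.0
    for p in probs:
        fail *= 1.0 - p
    return 1.0 - fail

def too_close(pos, chosen_positions, min_distance):
    """True if pos lies within min_distance of any already-chosen SNP."""
    return any(abs(pos - c) < min_distance for c in chosen_positions)

def pick_tags_for_bin(snps, chosen_positions, min_distance, target):
    """Greedily add tag SNPs to one bin until its success probability reaches target.

    snps: list of (position, genotyping_probability) for SNPs in the bin.
    chosen_positions: positions already selected panel-wide (updated in place).
    Returns the picked SNPs and the bin's resulting success probability.
    """
    picked = []
    for pos, p in sorted(snps, key=lambda s: s[1], reverse=True):
        if bin_success_probability([q for _, q in picked]) >= target:
            break  # stop early: fewer tags to assay
        if too_close(pos, chosen_positions, min_distance):
            continue  # proximity is a hard constraint: skip, never relax
        picked.append((pos, p))
        chosen_positions.append(pos)
    return picked, bin_success_probability([q for _, q in picked])

# Hypothetical bin: the SNP at 100 is skipped because it is too close to 105.
chosen = []
tags, prob = pick_tags_for_bin([(100, 0.9), (105, 0.95), (300, 0.8)], chosen,
                               min_distance=10, target=0.99)
```

Stopping once the target is met reflects the stated trade-off between genotyping success and the number of tag SNPs assayed.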
mRMRe contains a set of functions to compute mutual information matrices from continuous, categorical, and survival variables. It also contains functions to perform feature selection with minimum Redundancy, Maximum Relevance (mRMR) and a new ensemble mRMR technique.
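The mRMR criterion itself can be shown with a small sketch. It handles only discrete variables with a plug-in mutual information estimate and uses the common "difference" form (relevance minus mean redundancy with already-selected features) in a greedy search. It is not mRMRe's implementation, does not cover continuous or survival variables, and omits the ensemble variant.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mrmr(features, target, k):
    """Greedy mRMR: pick the most relevant feature, then maximize relevance - mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(features)):
        def score(name):
            redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)
        candidates = [f for f in features if f not in selected]
        selected.append(max(candidates, key=score))
    return selected

# Toy data: y encodes two bits; "a_copy" duplicates "a" and adds no new information.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0, 0, 1, 1, 2, 2, 3, 3]
print(mrmr({"a": a, "a_copy": a, "b": b}, y, 2))
```

All three features are equally relevant to `y`, so ranking by relevance alone could return `a` and its duplicate; the redundancy penalty instead selects one of them together with `b`.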
ModelGenerator is a model selection program that selects optimal amino acid and nucleotide substitution models from FASTA or PHYLIP alignments. ModelGenerator supports 56 nucleotide and 96 amino acid substitution models.
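The selection criteria ModelGenerator applies are not described here. As an assumption, the sketch below ranks candidate models with two widely used information criteria, AIC and BIC, taking each model's maximized log-likelihood and number of free parameters as given (fitting the models on a tree is not shown). The model names and numbers in the example are hypothetical.

```python
import math

def aic(lnl, k):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * lnl

def bic(lnl, k, n):
    """Bayesian information criterion: k ln(n) - 2 lnL, with n alignment sites."""
    return k * math.log(n) - 2 * lnl

def rank_models(fits, n_sites, criterion="aic"):
    """Return model names sorted best-first.

    fits: dict mapping model name -> (maximized log-likelihood, free parameters).
    """
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")

    def score(item):
        lnl, k = item[1]
        return aic(lnl, k) if criterion == "aic" else bic(lnl, k, n_sites)

    return [name for name, _ in sorted(fits.items(), key=score)]

# Hypothetical fits for three models on a 1000-site alignment.
fits = {"M1": (-1000.0, 1), "M2": (-995.0, 5), "M3": (-980.0, 10)}
print(rank_models(fits, 1000, "aic"))
print(rank_models(fits, 1000, "bic"))
```

With these invented values the two criteria disagree: AIC favors the parameter-rich `M3`, while BIC's heavier per-parameter penalty favors the simpler `M1`.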