The program ProtEvol performs two kinds of computation.
It computes the mean-field site-specific amino acid distributions that differ minimally from the background distribution while constraining the average stability of the native state of the protein against both unfolding and misfolding. The program also computes an exchangeability matrix, derived from an empirical substitution model or from a mutation model, that can be used together with the site-specific distributions for applications in phylogenetic inference.
It simulates protein evolution subject to the constraint of selection on the folding stability of the native state of the protein against both unfolding and misfolding.
forqs is a forward-in-time population genetics simulation that tracks individual haplotype chunks as they recombine each generation. forqs also models quantitative traits and selection on those traits.
rvsel is an R package for rare variant selection with sequence data. The most outcome-related rare variants are selected within a gene or a genetic region. The selection procedure is based on the power set of the subset of the rare variants.
H-clust is a simple clustering method that can be used to rapidly identify a set of tag SNPs based upon genotype data. This method does not require haplotype estimation. H-clust consists of two stages. The first stage uses hierarchical clustering to determine the clusters. In the second stage, the tag SNP is chosen by finding the SNP most correlated with all the other SNPs in the cluster. Optionally, the quality of each SNP can be included in the analysis. In this case, both quality and correlation affect the determination of tag SNPs. The input for H-clust is a genotype matrix using 0, 1, 2 to denote the number of copies of a particular allele. H-clust then computes the similarity matrix based on Pearson’s correlation between allele counts. The distance between two SNPs is one minus the squared correlation. By default, H-clust uses the “complete linkage” method. Hierarchical clustering can be represented as a dendrogram in which any two SNPs diverge at a height equal to their distance. The clusters are obtained by declaring SNPs to be in the same cluster when they converge before a certain cut-off value. In the H-clust program, this cutoff is 1 - hcbound, where hcbound is determined by the user. (This is slightly different in the stepwise version, see below.) The second stage of H-clust finds a tag SNP to represent the cluster. This is done by scoring each SNP based on squared correlation and quality. If multiple SNPs are scored equally, then the one in the middle is chosen as the tag SNP.
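The two stages described above can be illustrated with a small pure-Python sketch. This is not the H-clust source code. The function names (pearson, hclust_tags) are hypothetical, the quality term is omitted, the score is assumed to be the sum of squared correlations with the other SNPs in the cluster, clusters are assumed to merge when their complete-linkage distance is at most 1 - hcbound, and "the one in the middle" is interpreted as the middle element of the tied SNPs in index order.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length allele-count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a monomorphic SNP is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def hclust_tags(genotypes, hcbound=0.8):
    """Cluster SNPs and pick one tag SNP per cluster.

    genotypes: one list of 0/1/2 allele counts per SNP (same individuals).
    Returns a sorted list of (cluster_indices, tag_index) pairs.
    """
    m = len(genotypes)
    r2 = [[pearson(genotypes[i], genotypes[j]) ** 2 for j in range(m)]
          for i in range(m)]
    cutoff = 1.0 - hcbound
    clusters = [[i] for i in range(m)]

    def complete_link(a, b):
        # Complete linkage: distance between the two most dissimilar members.
        return max(1.0 - r2[i][j] for i in a for j in b)

    # Stage 1: agglomerate until the closest pair of clusters exceeds the cutoff.
    while len(clusters) > 1:
        d, ia, ib = min(
            (complete_link(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > cutoff:
            break
        merged = sorted(clusters[ia] + clusters[ib])
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)

    # Stage 2: score each SNP by its summed squared correlation with the rest.
    result = []
    for c in clusters:
        scores = [sum(r2[i][j] for j in c if j != i) for i in c]
        best = max(scores)
        tied = [i for i, s in zip(c, scores) if abs(s - best) < 1e-12]
        result.append((c, tied[len(tied) // 2]))  # middle of the tied SNPs
    return sorted(result)
```

Called with four SNPs of which the first three are perfectly correlated (or anti-correlated), the sketch groups those three into one cluster and leaves the fourth as a singleton that tags itself. The quadratic search for the closest pair keeps the code short; it is only practical for small marker sets.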
BoNB (Bag of Naïve Bayes) is an algorithm for genetic biomarker selection and subject classification from the simultaneous analysis of genome-wide SNP data. BoNB is based on the Naïve Bayes classification framework, enriched by three main features: bootstrap aggregating of an ensemble of Naïve Bayes classifiers, a novel strategy for ranking and selecting the attributes used by each classifier in the ensemble, and a permutation-based procedure for selecting significant biomarkers, based on their marginal utility in the classification process. BoNB is tested on the Wellcome Trust Case-Control study on Type 1 Diabetes, and its performance is compared with that of both a standard Naïve Bayes algorithm and HyperLASSO, a state-of-the-art penalized logistic regression algorithm for simultaneous genome-wide data analysis.
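The first of the three features, bootstrap aggregating of Naïve Bayes classifiers, can be sketched generically as follows. This is a plain bagging illustration and not the BoNB implementation: it leaves out BoNB's attribute ranking and its permutation-based biomarker selection, and every name in it (train_nb, predict_nb, train_bagged_nb, predict_bagged) as well as the Laplace smoothing and the default of 25 replicates are assumptions made for the example.

```python
import math
import random
from collections import Counter


def train_nb(X, y, n_values=3, alpha=1.0):
    """Fit a categorical Naive Bayes model on genotypes coded 0/1/2.

    Returns {class: (log_prior, log_cond)}, where log_cond[j][v] is the
    Laplace-smoothed log P(SNP j == v | class).
    """
    classes = sorted(set(y))
    model = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior = math.log(
            (len(rows) + alpha) / (len(y) + alpha * len(classes))
        )
        denom = len(rows) + alpha * n_values
        log_cond = []
        for j in range(len(X[0])):
            counts = Counter(row[j] for row in rows)
            log_cond.append(
                [math.log((counts[v] + alpha) / denom) for v in range(n_values)]
            )
        model[c] = (log_prior, log_cond)
    return model


def predict_nb(model, row):
    """Return the class with the highest posterior log-score for one subject."""
    def score(c):
        log_prior, log_cond = model[c]
        return log_prior + sum(log_cond[j][v] for j, v in enumerate(row))
    return max(model, key=score)


def train_bagged_nb(X, y, n_models=25, seed=0):
    """Train one Naive Bayes model per bootstrap resample of the subjects."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_nb([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagged(models, row):
    """Classify one subject by majority vote over the ensemble."""
    votes = Counter(predict_nb(m, row) for m in models)
    return votes.most_common(1)[0][0]
```

A typical call would be `models = train_bagged_nb(X, y)` followed by `predict_bagged(models, new_row)`, where X holds one row of 0/1/2 genotypes per subject and y the case/control labels. An odd number of replicates avoids tied votes in the two-class case.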
RADinitio is a forward simulator for creating population-level RAD data sets, based on a given reference genome. This in silico RADseq library preparation and sequencing process allows for the exploration of parameters including restriction enzyme selection, library insert size, PCR duplicate distribution, and sequencing coverage.
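To make the idea of an in silico restriction digest concrete, the sketch below locates the recognition sites of an enzyme in a reference sequence and extracts a fixed-length locus next to each one. It does not use or reproduce RADinitio's code or interface. The function names and the read_length parameter are hypothetical, only the forward strand is scanned, and the locus is taken from the end of the recognition site rather than from the true cut position inside it.

```python
def find_cut_sites(sequence, recognition_site):
    """Return the 0-based start position of every occurrence of the site."""
    sequence = sequence.upper()
    site = recognition_site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Advance by one base so that overlapping occurrences are also found.
        start = sequence.find(site, start + 1)
    return positions


def flanking_loci(sequence, positions, site_length, read_length):
    """Return (site_position, locus) for each site with a full-length flank.

    The locus is the read_length bases immediately downstream of the
    recognition site; sites too close to the sequence end are skipped.
    """
    loci = []
    for pos in positions:
        begin = pos + site_length
        locus = sequence[begin:begin + read_length]
        if len(locus) == read_length:
            loci.append((pos, locus))
    return loci
```

For example, scanning a reference for the EcoRI recognition sequence GAATTC with `find_cut_sites(reference, "GAATTC")` gives the candidate sites; changing the recognition sequence shows how enzyme choice alters the number of loci available for sequencing.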