The UProC (ultrafast protein classification) toolbox implements a novel algorithm (“Mosaic Matching”) for large-scale sequence analysis and is available as an open-source C library. UProC is up to three orders of magnitude faster than profile-based methods and achieved up to 80% higher sensitivity on unassembled short reads (100 bp) from simulated metagenomes. Because UProC does not depend on a multiple alignment of family-specific sequences, it can in principle also detect KEGG Orthologs, in addition to classifying protein domains according to the Pfam database.
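Mosaic Matching itself is not described here. As a rough illustration of what alignment-free, word-based family classification can look like, the sketch below assigns a read to the family that shares the most exact length-k words with it. This is a simplified, hypothetical scheme (the names `build_index` and `classify`, exact word matching, and majority voting are all assumptions) and is not UProC's algorithm or its C API.

```python
from collections import Counter, defaultdict


def build_index(families, k):
    """Map every length-k word to the set of families it occurs in.

    families: {family_name: [protein sequences]} (hypothetical input format).
    """
    index = defaultdict(set)
    for family, sequences in families.items():
        for seq in sequences:
            for i in range(len(seq) - k + 1):
                index[seq[i:i + k]].add(family)
    return index


def classify(read, index, k):
    """Return the family with the most word hits, or None if no word matches."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for family in index.get(read[i:i + k], ()):
            votes[family] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

For example, with toy families `{"FAM_A": ["MKVLAAGIVG"], "FAM_B": ["WYHPQRSTNE"]}` and `k=3`, the read `"KVLAAG"` is assigned to `FAM_A`. No multiple alignment is needed, only a lookup table of words, which is the property the text credits for UProC's flexibility.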
PISCES (Protein Sequence Culling Server) is a database server for producing lists of sequences from the Protein Data Bank (PDB) using a number of entry- and chain-specific criteria and mutual sequence identity.
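Culling by mutual sequence identity can be pictured as a greedy pass over chains ordered best-first. The sketch below assumes the chains are already sorted by some quality criterion (e.g. resolution) and that pairwise percent identities are supplied by a caller-provided function; the function name `cull` and the chain IDs are hypothetical, and PISCES's actual procedure may differ.

```python
def cull(chains, identity, max_identity):
    """Greedily keep chains whose identity to every kept chain is below max_identity.

    chains: chain IDs ordered best-first (e.g. by resolution); assumed pre-sorted.
    identity: function (a, b) -> percent sequence identity between two chains.
    """
    kept = []
    for chain in chains:
        # A chain survives only if it is sufficiently dissimilar to all survivors.
        if all(identity(chain, other) < max_identity for other in kept):
            kept.append(chain)
    return kept
```

Because higher-quality chains are visited first, each redundant group is represented by its best member. With a 25% cutoff, a chain that is 95% identical to an already kept chain is dropped, while one that is 20% identical is retained.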
C-HMM is a software tool for detecting remote (distant) homologues in protein sequence databases. It is based on hidden Markov models (HMMs) and aims to identify deep evolutionary relationships between protein sequences.
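HMM-based homology search ultimately scores how likely a sequence is under a model. The sketch below is the textbook forward algorithm for a small discrete HMM. It is a generic illustration, not C-HMM's model; the state names, dictionary layout, and `forward_log_likelihood` are assumptions.

```python
import math


def forward_log_likelihood(seq, states, start, trans, emit):
    """Return log P(seq | HMM) using the forward algorithm.

    start[s]: initial probability of state s
    trans[s][t]: probability of moving from state s to state t
    emit[s][x]: probability that state s emits symbol x
    Uses plain probabilities, so long sequences would need scaling or
    log-space arithmetic to avoid underflow; raises ValueError if seq
    has probability zero.
    """
    # alpha[s] = P(symbols seen so far, current state is s)
    alpha = {s: start[s] * emit[s].get(seq[0], 0.0) for s in states}
    for x in seq[1:]:
        alpha = {
            t: sum(alpha[s] * trans[s][t] for s in states) * emit[t].get(x, 0.0)
            for t in states
        }
    return math.log(sum(alpha.values()))
```

In a homology-search setting, such a score would typically be compared against a background (null) model to obtain a log-odds score; profile HMMs used for protein families add match, insert, and delete states on top of this same recursion.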
CS-PSeq-Gen is derived from PSeq-Gen, a program developed by Nick C. Grassly and Andrew Rambaut to simulate the evolution of protein sequences along evolutionary trees. The modifications in CS-PSeq-Gen aim to simulate protein evolution under constraints taken from a particular reconstructed phylogeny: properties such as the “root sequence” that initiates the simulation, or the rate heterogeneity among sites, are specific to each protein family, and CS-PSeq-Gen allows simulations to take this information into account. In addition, exploring the evolution of a protein family and testing hypotheses often requires some control over the variability of the parameters, so CS-PSeq-Gen allows the simulated tree and branch lengths to vary around an average value. Finally, one particular application of such simulations is the search for significant co-evolution between sites. CS-PSeq-Gen offers facilities to generate sequences under such hypotheses and proposes a basic detection scheme that programmers can easily adapt.
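The ingredients named above (a fixed root sequence, per-site rate heterogeneity, and branch lengths varying around an average) can be combined in a short simulation. This sketch assumes a simple Poisson (equal-rates, Jukes-Cantor-like) amino acid model, gamma-distributed site rates with mean 1, and uniform jitter of each branch length; the tree format and all function names are hypothetical, and CS-PSeq-Gen's own models and options may differ.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gamma_rates(length, alpha, rng):
    """Draw per-site rates from a gamma distribution with shape alpha and mean 1."""
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(length)]


def evolve(seq, branch_length, site_rates, rng):
    """Evolve a protein along one branch under a Poisson (equal-rates) model.

    branch_length is in expected substitutions per site and is scaled by
    site_rates[i] at site i. A changed site moves to one of the other 19
    residues uniformly at random.
    """
    n = len(AMINO_ACIDS)
    out = []
    for aa, rate in zip(seq, site_rates):
        d = branch_length * rate
        p_change = (n - 1) / n * (1.0 - math.exp(-n / (n - 1) * d))
        if rng.random() < p_change:
            aa = rng.choice([b for b in AMINO_ACIDS if b != aa])
        out.append(aa)
    return "".join(out)


def simulate(node, seq, site_rates, rng, spread=0.0, leaves=None):
    """Walk a tree of (name, branch_length, children) tuples from a root sequence.

    Each branch length is multiplied by a factor drawn uniformly from
    [1 - spread, 1 + spread] (spread <= 1 keeps lengths non-negative).
    Returns {leaf_name: simulated sequence}.
    """
    if leaves is None:
        leaves = {}
    name, _, children = node
    if not children:
        leaves[name] = seq
    for child in children:
        length = child[1] * rng.uniform(1.0 - spread, 1.0 + spread)
        child_seq = evolve(seq, length, site_rates, rng)
        simulate(child, child_seq, site_rates, rng, spread, leaves)
    return leaves


if __name__ == "__main__":
    rng = random.Random(42)
    tree = ("root", 0.0, [("A", 0.1, []),
                          ("inner", 0.05, [("B", 0.2, []), ("C", 0.2, [])])])
    root_seq = "MKTAYIAKQRQISFVKSHFSRQ"
    rates = gamma_rates(len(root_seq), alpha=0.5, rng=rng)
    print(simulate(tree, root_seq, rates, rng, spread=0.2))
```

Passing a family-specific rate vector instead of `gamma_rates(...)` mirrors the idea of constraining the simulation with information from a reconstructed phylogeny; the root sequence is likewise supplied by the caller rather than drawn at random.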
tantan is a tool for masking simple regions (low-complexity regions and short-period tandem repeats) in DNA, RNA, and protein sequences. Its aim is to prevent false predictions when searching for homologous regions between two sequences: simple repeats often align strongly to each other, causing false homology predictions.
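To show what masking short-period tandem repeats means in practice, the sketch below lowercases (soft-masks) any stretch in which each residue equals the one p positions earlier, for small periods p. tantan itself uses a probabilistic model; this exact-match rule, its thresholds, and the name `mask_tandem_repeats` are assumptions chosen only for illustration.

```python
def mask_tandem_repeats(seq, max_period=3, min_length=8):
    """Lowercase exact tandem repeats with period 1..max_period.

    For each period p, a run of consecutive positions i with
    seq[i] == seq[i - p] means seq[run_start - p : i] is a tandem repeat
    of period p; it is masked when that span is at least min_length long.
    """
    masked = [False] * len(seq)
    for p in range(1, max_period + 1):
        run_start = None
        # i == len(seq) acts as a sentinel that closes a run at the end.
        for i in range(p, len(seq) + 1):
            matches = i < len(seq) and seq[i].upper() == seq[i - p].upper()
            if matches and run_start is None:
                run_start = i
            elif not matches and run_start is not None:
                span_start = run_start - p
                if i - span_start >= min_length:
                    for j in range(span_start, i):
                        masked[j] = True
                run_start = None
    return "".join(c.lower() if m else c for c, m in zip(seq, masked))
```

For example, `mask_tandem_repeats("ACGTCACACACACATGCA")` returns `"ACGTcacacacacaTGCA"`: the `CA` repeat is soft-masked so that a homology search can ignore it, while the flanking sequence stays uppercase.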