mobility is a collection of Perl scripts to calculate gene-level similarity among annotated genomes. The scripts can be executed from the command line and the only dependencies are BioPerl and NCBI BLAST. Separately, we include a MATLAB script to calculate fluidity and its variance directly from matrices of shared and total gene counts.
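As a rough illustration of the fluidity calculation, the sketch below averages, over all genome pairs, the fraction of genes that are unique to either genome of the pair. This follows the commonly used pairwise definition of genomic fluidity and is not taken from the MATLAB script itself. The input layout is an assumption: `shared[k][l]` is the number of genes in genome `k` with a match in genome `l`, and `total[k]` is the gene count of genome `k`. The variance estimate is not covered.

```python
from itertools import combinations


def genomic_fluidity(shared, total):
    """Mean over genome pairs of (unique genes) / (total genes) in the pair.

    shared[k][l]: genes of genome k that have a match in genome l (assumed layout).
    total[k]:     total number of genes in genome k (assumed layout).
    """
    n = len(total)
    if n < 2:
        raise ValueError("at least two genomes are required")
    ratios = []
    for k, l in combinations(range(n), 2):
        unique_k = total[k] - shared[k][l]  # genes of k with no match in l
        unique_l = total[l] - shared[l][k]  # genes of l with no match in k
        ratios.append((unique_k + unique_l) / (total[k] + total[l]))
    return sum(ratios) / len(ratios)


# Hypothetical example: two genomes of 100 genes each, 80 of them shared.
example_shared = [[100, 80], [80, 100]]
example_total = [100, 100]
example_fluidity = genomic_fluidity(example_shared, example_total)
```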
GeneMark, developed in 1993, was the first gene-finding method recognized as an efficient and accurate tool for genome projects. GeneMark was used for annotation of the first completely sequenced bacterium, Haemophilus influenzae, and the first completely sequenced archaeon, Methanococcus jannaschii. The GeneMark algorithm uses species-specific inhomogeneous Markov chain models of protein-coding DNA sequence as well as homogeneous Markov chain models of non-coding DNA. Parameters of the models are estimated from training sets of sequences of known type. The major step of the algorithm computes the a posteriori probability that a sequence fragment carries the genetic code in one of six possible frames (including three frames in the complementary DNA strand) or is “non-coding”.
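The a posteriori step can be sketched with Bayes' rule over seven hypotheses (six coding frames plus non-coding). The sketch below is not GeneMark code: it assumes the log-likelihood of the fragment under each model has already been computed, and the prior probabilities and hypothesis labels are hypothetical inputs chosen for illustration.

```python
import math

# Six reading frames (three per strand) plus the non-coding hypothesis.
HYPOTHESES = ["+1", "+2", "+3", "-1", "-2", "-3", "non-coding"]


def posterior(log_likelihoods, priors):
    """Apply Bayes' rule in log space to avoid underflow on long fragments.

    log_likelihoods[h]: log P(fragment | hypothesis h), from the Markov models.
    priors[h]:          assumed prior probability of hypothesis h.
    """
    log_joint = {h: log_likelihoods[h] + math.log(priors[h]) for h in HYPOTHESES}
    peak = max(log_joint.values())
    # log-sum-exp: log of the total probability of the fragment
    log_total = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    return {h: math.exp(v - log_total) for h, v in log_joint.items()}
```

A fragment would then be assigned to the hypothesis with the largest posterior value, for example `max(result, key=result.get)`.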
GeneMark is documented as the most accurate prokaryotic gene finder.
The GeneMark.hmm-P and GeneMark.hmm-E programs predict genes and intergenic regions in a sequence as a whole. They use hidden Markov models that reflect the “grammar” of gene organization.
The GeneMark.hmm (P and E) programs identify the most likely parse of the whole DNA sequence into protein-coding genes (with possible introns) and intergenic regions.
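The idea of a most likely parse can be illustrated with the standard Viterbi algorithm on a toy hidden Markov model. This is a simplified stand-in, not the GeneMark.hmm model: the two states, the single-nucleotide emissions, and all probabilities below are made-up values for illustration only.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path for a non-empty observation string."""
    # best[t][s]: best log-probability of any path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for symbol in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][symbol]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy parameters (hypothetical): "gene" favours G/C, "intergenic" favours A/T.
STATES = ["gene", "intergenic"]
LOG_START = {s: math.log(0.5) for s in STATES}
LOG_TRANS = {
    "gene": {"gene": math.log(0.9), "intergenic": math.log(0.1)},
    "intergenic": {"gene": math.log(0.1), "intergenic": math.log(0.9)},
}
LOG_EMIT = {
    "gene": {b: math.log(p) for b, p in zip("ACGT", (0.1, 0.4, 0.4, 0.1))},
    "intergenic": {b: math.log(p) for b, p in zip("ACGT", (0.4, 0.1, 0.1, 0.4))},
}

parse = viterbi("AAAAGCGCGCGCAAAA", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
```

With these toy values the G/C-rich stretch in the middle is labelled "gene" and the flanking A runs "intergenic". A realistic gene model would need codon-aware states, both strands, and explicit length distributions.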
AGMIAL is an integrated system for bacterial genome annotation. It is currently used at INRA for the newly sequenced bacterial genomes Lactobacillus bulgaricus, Lactobacillus sakei and Flavobacterium psychrophilum, as well as for the re-annotation of Lactococcus lactis, Enterococcus faecalis and Enterococcus faecium.
BRA (Binary Repeat Align) is software for aligning tandem repeat regions in which the repeats can be treated as marked by mutations. Tandem repeat regions are abundant in many genomes. They are normally not very informative, except for various fingerprinting techniques that use length statistics. In regions where the repeats are marked, that is, where there are slight variations of the basic repeat, remarkable patterns can occur. These patterns can be utilized for evolutionary analysis.
avdist is a simple tool for bootstrap analysis of haplotype differences. It computes the Hamming distance between pairs of sequences sampled from the input sequences and, after a given number of iterations, reports the average difference and the standard deviation of the results. Indels are discarded from the distance calculation.
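The procedure described above can be sketched as follows. This is an illustration rather than the avdist source: it assumes the input sequences are already aligned to equal length, that indels are written as `-`, that each iteration draws two distinct sequences, and that the sample standard deviation is wanted. The function names are hypothetical.

```python
import random
import statistics


def hamming_ignoring_indels(a, b, gap="-"):
    """Count mismatching positions, skipping any column where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != gap and y != gap and x != y)


def bootstrap_pair_distance(seqs, iterations=1000, seed=None):
    """Sample random pairs and return (mean, standard deviation) of their distances.

    Requires at least two sequences and at least two iterations.
    """
    rng = random.Random(seed)
    distances = []
    for _ in range(iterations):
        a, b = rng.sample(seqs, 2)  # two distinct input sequences
        distances.append(hamming_ignoring_indels(a, b))
    return statistics.mean(distances), statistics.stdev(distances)
```

Passing a fixed `seed` makes a run reproducible, which helps when comparing results between data sets.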
MRSfinder is a program to find the Matrix Attachment Region (MAR) Recognition Signature in DNA sequence. The MAR Recognition Signature (MRS), defined by van Drunen et al. (Nucleic Acids Research 1999, 27:2924-30), has been proposed as a motif characteristic of MARs. The signature is composed of two motifs, AATAAYAA and AWWRTAANNWWGNNNC (one mismatch allowed), which lie within 200 bp of each other, on either strand of the DNA duplex. MRSfinder.pl searches a user-defined FASTA sequence file for all instances of the MRS and reports their positions.
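A minimal sketch of such a search is given below. It is not the MRSfinder.pl implementation. It assumes that the single allowed mismatch applies only to the longer motif, that "within 200 bp" is measured between motif start positions, and that the reverse strand is handled by scanning for the reverse complement of each motif. Function names are hypothetical.

```python
# IUPAC codes used by the two motifs, and their complements.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "W": "AT", "N": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "R": "Y", "Y": "R", "W": "W", "N": "N"}


def revcomp(motif):
    """Reverse complement of a motif written in IUPAC codes."""
    return "".join(COMPLEMENT[c] for c in reversed(motif))


def find_motif(seq, motif, max_mismatches=0):
    """Return (start, strand) for each window matching the motif on either strand."""
    hits = []
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        for i in range(len(seq) - len(pattern) + 1):
            window = seq[i:i + len(pattern)]
            mismatches = sum(1 for base, code in zip(window, pattern)
                             if base not in IUPAC[code])
            if mismatches <= max_mismatches:
                hits.append((i, strand))
    return hits


def find_mrs(seq, max_gap=200):
    """Pair every hit of the short motif with every hit of the long motif nearby."""
    seq = seq.upper()
    short_hits = find_motif(seq, "AATAAYAA")
    long_hits = find_motif(seq, "AWWRTAANNWWGNNNC", max_mismatches=1)
    return [(s, l) for s in short_hits for l in long_hits
            if abs(s[0] - l[0]) <= max_gap]
```

Each result is a pair of `(start, strand)` tuples with 0-based start coordinates on the forward strand. The scan is quadratic in the worst case, which is acceptable for a sketch but slow on whole chromosomes.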
::DEVELOPER
The Blaxter Lab at The Institute of Evolutionary Biology, University of Edinburgh
annot8r is a tool for the annotation of protein or nucleotide sequences from non-model organisms with GO terms, EC numbers and KEGG pathways. The annotation is based on BLAST similarity searches against annotated subsets of EMBL UniProt from which sequences with non-informative entries have been removed. GO, EC and KEGG annotations are saved as flat files and in a relational PostgreSQL database to allow for more sophisticated searches within the results.
WebPartiGene is a tool that generates HTML, PHP and CGI scripts that together form a web-based interface to PartiGene relational databases. WebPartiGene allows specific clusters to be retrieved from the PartiGene database by entering either a cluster identifier or an EST identifier. The BLAST annotation text can also be searched and limited by BLAST score. Selected clusters are displayed in graphical format, showing the alignment of constituent sequences. The display includes links to full BLAST reports, phrap assembly quality files and public depository database files. WebPartiGene is fully compatible with multi-species PartiGene databases.
::DEVELOPER
The Blaxter Lab at The Institute of Evolutionary Biology, University of Edinburgh
CLOBB (Cluster on the basis of BLAST similarity) takes a set of DNA sequences and clusters them into groups which putatively derive from the same gene. To run CLOBB, the user must have BLASTALL in their path. The output is a blastable FASTA file named <cluster_id>EST, where cluster_id is given by the user; it contains a list of sequences with identifiers <cluster_id>00001 to <cluster_id>99999.
::DEVELOPER
John Parkinson (john.parkinson@ed.ac.uk) and Mark Blaxter, Institute of Cell, Animal and Population Biology, University of Edinburgh
trace2seq processes raw sequencing chromatogram trace files into quality-checked sequences. trace2seq is a Perl script, designed to be run from the command line.
::DEVELOPER
The Blaxter Lab at The Institute of Evolutionary Biology, University of Edinburgh