mGene is a computational tool for the genome-wide prediction of protein-coding genes from eukaryotic DNA sequences. It is based on recent advances in machine learning and uses discriminative training techniques, such as support vector machines (SVMs) and hidden semi-Markov support vector machines (HSMSVMs). Its performance was demonstrated in an objective competition based on the genome of the nematode Caenorhabditis elegans. The evaluated developmental version of mGene exhibited the best prediction performance (in terms of the average of sensitivity and specificity) for the multiple-genome prediction tasks on all four evaluation levels (nucleotides, exons, transcripts and genes). The ab initio version was best on the nucleotide, exon and transcript levels, and only slightly worse than Augustus on the gene level. The fully developed version shows the best overall performance compared to the predictions of the submitted gene finders, including those of Fgenesh and Augustus.
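The ranking criterion mentioned above, the average of sensitivity and specificity, can be illustrated with a small sketch. The description does not define the two terms; the code assumes the convention common in gene-finder evaluation, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP). The function name and the counts in the usage note are hypothetical and are not taken from the competition.

```python
def average_sn_sp(true_pos, false_pos, false_neg):
    """Return (sensitivity, specificity, average) for one evaluation level.

    Assumption: the gene-prediction convention is used, in which
    specificity is the fraction of predicted features that are correct.
    """
    # Fraction of annotated features (e.g. exons) that were predicted.
    sensitivity = true_pos / (true_pos + false_neg)
    # Fraction of predicted features that match the annotation.
    specificity = true_pos / (true_pos + false_pos)
    return sensitivity, specificity, (sensitivity + specificity) / 2
```

For example, `average_sn_sp(80, 20, 20)` describes a predictor that found 80 of 100 annotated exons and made 100 predictions in total. The same computation would be repeated separately at the nucleotide, exon, transcript and gene levels.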
LOCUS (Length Optimized Characterization of Unknown Spliceforms) is a dynamic-programming algorithm for finding the optimal set of splice sites in a genomic region of interest with estimated prior length information.
::DEVELOPER
Stormo Lab in the Department of Genetics, Washington University
The GotohScan program is a search tool that finds shorter sequences (usually genes) in large database sequences (chromosomes, genomes, etc.) by computing all semi-global alignments. The query sequence is therefore never truncated or split into subsequences, but is always mapped to the database over its complete length. The alignment is computed with the Gotoh alignment algorithm using affine gap costs.
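The idea of a semi-global alignment with affine gap costs can be sketched as follows. This is a minimal illustration of the Gotoh recurrences, not GotohScan's actual code: the function name, the scoring values, and the gap model (a gap of length k costs `gap_open + (k - 1) * gap_extend`) are assumptions chosen for the example. The whole query must be aligned, while unaligned database sequence before and after the hit is free.

```python
NEG_INF = float("-inf")

def semi_global_affine(query, database, match=1, mismatch=-1,
                       gap_open=2, gap_extend=1):
    """Return (best_score, end) of the best semi-global alignment.

    `end` is the number of database characters consumed up to the
    end of the hit. Scoring parameters are illustrative assumptions.
    """
    m, n = len(query), len(database)
    # Row 0: no query consumed yet; starting anywhere in the database is free.
    M_prev = [0] * (n + 1)        # last column is a (mis)match
    X_prev = [NEG_INF] * (n + 1)  # last column: query char against a gap
    Y_prev = [NEG_INF] * (n + 1)  # last column: database char against a gap
    for i in range(1, m + 1):
        M_cur = [NEG_INF] * (n + 1)
        X_cur = [NEG_INF] * (n + 1)
        Y_cur = [NEG_INF] * (n + 1)
        # Query prefix of length i aligned entirely to gaps.
        X_cur[0] = -(gap_open + (i - 1) * gap_extend)
        for j in range(1, n + 1):
            s = match if query[i - 1] == database[j - 1] else mismatch
            M_cur[j] = s + max(M_prev[j - 1], X_prev[j - 1], Y_prev[j - 1])
            # Open a new gap (from M or Y) or extend an existing one (from X).
            X_cur[j] = max(M_prev[j] - gap_open,
                           X_prev[j] - gap_extend,
                           Y_prev[j] - gap_open)
            Y_cur[j] = max(M_cur[j - 1] - gap_open,
                           Y_cur[j - 1] - gap_extend,
                           X_cur[j - 1] - gap_open)
        M_prev, X_prev, Y_prev = M_cur, X_cur, Y_cur
    # The full query is consumed in the last row; ending anywhere is free.
    best_score, best_end = NEG_INF, 0
    for j in range(n + 1):
        score = max(M_prev[j], X_prev[j], Y_prev[j])
        if score > best_score:
            best_score, best_end = score, j
    return best_score, best_end
```

Calling `semi_global_affine("ACGT", "TTACGTTT")` locates the exact occurrence of the query inside the longer sequence. The sketch keeps only two rows of each matrix in memory, so it reports the score and end position but not the alignment itself; recovering the alignment would need a traceback, and a scan that reports all hits would collect every end position above a threshold rather than only the best one.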