sgp2 1.1 – Predict Genes by comparing Anonymous Genomic Sequences from two different Species

sgp2 1.1

:: DESCRIPTION

sgp2 is a program to predict genes by comparing anonymous genomic sequences from two different species. It combines tblastx, a sequence similarity search program, with geneid, an “ab initio” gene prediction program. In “asymmetric” mode, genes are predicted in one sequence from one species (the target sequence), using a set of sequences (possibly only one) from the other species (the reference set). Essentially, geneid is used to predict all potential exons along the target sequence. Exon scores are computed as log-likelihood ratios, as a function of the splice sites defining the exon, the coding bias in the composition of the exon sequence as measured by a Markov Model of order five, and the optimal alignment at the amino acid level between the target exon sequence and its homologous counterpart in the reference set. From the set of predicted exons, the gene structure is assembled (possibly multiple genes on both strands) by maximizing the sum of the scores of the assembled exons.
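The final assembly step, picking a set of exons that maximizes the summed score, can be illustrated with a small dynamic program. This is only a sketch under strong assumptions: exons are plain `(start, end, score)` tuples on a single strand, and reading frame, strand and gene-model constraints are ignored. The function name `best_chain` is hypothetical and is not part of sgp2 or geneid.

```python
from bisect import bisect_left
from typing import List, Tuple

Exon = Tuple[int, int, float]  # (start, end, score), inclusive coordinates


def best_chain(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the highest-scoring set of mutually non-overlapping exons.

    Simplification: frame and strand compatibility are not checked.
    """
    exons = sorted(exons, key=lambda e: e[1])  # order by end coordinate
    ends = [e[1] for e in exons]
    # best[i] holds (score, chain) for the best solution among the first i exons
    best: List[Tuple[float, List[Exon]]] = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # j = number of earlier exons that end strictly before this one starts
        j = bisect_left(ends, start, 0, i)
        take_score = best[j][0] + score
        if take_score > best[i][0]:
            best.append((take_score, best[j][1] + [exons[i]]))
        else:
            best.append(best[i])  # skipping this exon is at least as good
    return best[-1]


candidates = [(1, 100, 2.0), (50, 150, 5.0), (120, 180, -1.0), (160, 200, 1.5)]
total, chain = best_chain(candidates)
print(total, chain)
```

Exons with negative scores are never selected here, because leaving an exon out always scores at least as well as including it.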

:: DEVELOPER

RODERIC GUIGO LAB

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux

:: DOWNLOAD

sgp2

:: MORE INFORMATION

Citation

G. Parra, P. Agarwal, J.F. Abril, T. Wiehe, J.W. Fickett and R. Guigó.
“Comparative gene prediction in human and mouse.”
Genome Research 13(1):108-117 (2003)

geneid 1.4.4 – Predict Genes in Anonymous Genomic Sequences

geneid 1.4.4

:: DESCRIPTION

geneid is a program to predict genes in anonymous genomic sequences, designed with a hierarchical structure. In the first step, splice sites, start codons and stop codons are predicted and scored along the sequence using Position Weight Arrays (PWAs). In the second step, exons are built from the sites. Exons are scored as the sum of the scores of the defining sites, plus the log-likelihood ratio of a Markov Model for coding DNA. Finally, from the set of predicted exons, the gene structure is assembled, maximizing the sum of the scores of the assembled exons. geneid offers some support for integrating predictions from multiple sources via external GFF files, and the general gene structure or model can also be redefined. The accuracy of geneid compares favorably to that of other existing tools, and geneid is likely more efficient in terms of speed and memory usage.
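The first step, scoring candidate sites along the sequence, can be sketched as a log-likelihood ratio scan. The example below is a simplification: it uses a position weight matrix with independent positions, whereas a Position Weight Array also models dependencies between neighbouring positions. The frequencies in `SITE_FREQS` are made-up toy values, not geneid parameters, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy frequencies for a 4-position site (roughly "GTAA").
# Real geneid parameter files are different and species-specific.
SITE_FREQS: List[Dict[str, float]] = [
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.01, "T": 0.97},
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
]
BACKGROUND = 0.25  # assumed uniform background nucleotide frequency


def site_score(window: str) -> float:
    """Log-likelihood ratio of the window under the site model vs. background."""
    return sum(math.log(col[base] / BACKGROUND)
               for col, base in zip(SITE_FREQS, window))


def scan(seq: str) -> List[Tuple[int, float]]:
    """Score every window of the site's width along the sequence."""
    width = len(SITE_FREQS)
    return [(i, site_score(seq[i:i + width]))
            for i in range(len(seq) - width + 1)]


for position, score in scan("CCGTAACC"):
    print(position, round(score, 2))
```

In this toy setting, windows scoring above zero look more like the site model than like background; an exon score would then add the scores of its two defining sites to a coding-potential term.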

geneid Online

:: DEVELOPER

geneid Team

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux

:: DOWNLOAD

geneid

:: MORE INFORMATION

Citation:

E. Blanco, G. Parra and R. Guigó,
“Using geneid to Identify Genes.”
Curr Protoc Bioinformatics. 2007 Jun;Chapter 4:Unit 4.3.

Spines 1.15 – C++ Software Package for Genomic Sequence Alignment and Analysis

Spines 1.15

:: DESCRIPTION

Spines is a C++ software package for genomic sequence alignment and analysis.

:: DEVELOPER

The Broad Institute, Cambridge, MA

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux/Mac OS X

:: DOWNLOAD

Spines

:: MORE INFORMATION

Citation

Grabherr MG, Russell P, Meyer M, Mauceli E, Alfoldi J, Di Palma F, Lindblad-Toh K.
Genome-wide synteny through highly sensitive sequence alignment: Satsuma.
Bioinformatics. 2010 May 1;26(9):1145-51. Epub 2010 Mar 5

REPET 3.0 / PASTEClassifier 2.0 – Detection, Annotation and Analysis of Repeats in Genomic Sequences

REPET 3.0 / PASTEClassifier 2.0

:: DESCRIPTION

REPET is a software package for the detection, annotation and analysis of repeats in genomic sequences, specifically designed for transposable elements (TEs).

PASTEClassifier classifies TEs by searching for structural features and similarities.

:: DEVELOPER

URGI

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux

:: DOWNLOAD

REPET, PASTEC

:: MORE INFORMATION

Citation

PASTEC: An Automatic Transposable Element Classification Tool.
Hoede C, Arnoux S, Moisset M, Chaumier T, Inizan O, Jamilloux V, Quesneville H.
PLoS One. 2014 May 2;9(5):e91929. doi: 10.1371/journal.pone.0091929.

Flutre T, Duprat E, Feuillet C, Quesneville H.
Considering transposable element diversification in de novo annotation approaches.
PLoS One. 2011 Jan 31;6(1):e16526.

fragrep 2 – Efficient Search for Fragmented Patterns in RNA

fragrep 2

:: DESCRIPTION

fragrep implements an efficient algorithm for detecting nucleotide pattern fragments in genomes that occur in a given order. For each pattern a tolerance can be specified separately.
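As a rough illustration of what searching for ordered pattern fragments with per-fragment tolerance means, the sketch below enumerates all placements by brute force. It is not fragrep's algorithm: tolerance is assumed here to be a maximum number of mismatches, distance constraints between fragments are not modeled, and the function names are hypothetical.

```python
from typing import List, Tuple


def hamming_hits(seq: str, pattern: str, max_mismatches: int) -> List[int]:
    """Start positions where the pattern matches with at most max_mismatches."""
    m = len(pattern)
    return [i for i in range(len(seq) - m + 1)
            if sum(a != b for a, b in zip(seq[i:i + m], pattern)) <= max_mismatches]


def ordered_matches(seq: str,
                    fragments: List[Tuple[str, int]]) -> List[Tuple[int, ...]]:
    """All ways to place the fragments in the given order without overlap.

    Each fragment is (pattern, max_mismatches), so tolerance is per fragment.
    """
    hits = [hamming_hits(seq, pattern, k) for pattern, k in fragments]
    results: List[Tuple[int, ...]] = []

    def extend(idx: int, min_start: int, chosen: List[int]) -> None:
        if idx == len(fragments):
            results.append(tuple(chosen))
            return
        for pos in hits[idx]:
            if pos >= min_start:
                # the next fragment must start after this one ends
                extend(idx + 1, pos + len(fragments[idx][0]), chosen + [pos])

    extend(0, 0, [])
    return results


print(ordered_matches("AAGGCTTTTACGAAA", [("GGC", 0), ("ACC", 1)]))
```

The brute-force enumeration is exponential in the worst case; it only shows the matching criterion, not an efficient search.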

:: DEVELOPER

Bioinformatics Leipzig

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux

:: DOWNLOAD

fragrep

:: MORE INFORMATION

Citation

Axel Mosig, Julian Chen, Peter F. Stadler,
Homology Search with Fragmented Nucleic Acid Sequence Patterns,
Proc. Worksh. Alg. Bioinf. (WABI), 2007.

MaM 1.4.2 – Manipulate Multiple Alignments of Genomic Sequences

MaM 1.4.2

:: DESCRIPTION

MaM (Multiple Alignment Manipulator) is a software tool that processes and manipulates multiple alignments of genomic sequences. MaM computes the exact locations of common repeat elements in multiple aligned sequences, as provided by a variety of user-identified programs, databases and tables.

:: DEVELOPER

Lab for Bioinformatics and Computational Genomics

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux
repeatmasker
cross_match
gnuplot
C Compiler

:: DOWNLOAD

MaM

:: MORE INFORMATION

Citation

“Manipulating Multiple Sequence Alignments via MaM and WebMaM”,
Can Alkan, Eray Tuzun, Jerome Buard, Franck Lethiec, Evan E. Eichler, Jeffrey A. Bailey, S. Cenk Sahinalp.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W295-8.

Riboswitch Scanner – pHMM based application for Detecting Riboswitches from Genomic Sequences

Riboswitch Scanner

:: DESCRIPTION

Riboswitch Scanner is a new web application that provides an automated pipeline for rapid pHMM-based detection of riboswitches in partial as well as complete genomic sequences, with high sensitivity and specificity.

:: DEVELOPER

Riboswitch Scanner team

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Web browser

:: DOWNLOAD

:: MORE INFORMATION

Citation:

Riboswitch Scanner: An efficient pHMM-based web-server to detect riboswitches in genomic sequences.
Mukherjee S, Sengupta S.
Bioinformatics. 2015 Oct 30. pii: btv640.

MITE-Hunter 201111 – Discovering miniature inverted-repeat Transposable Elements from Genomic Sequences

MITE-Hunter 201111

:: DESCRIPTION

MITE-Hunter was designed to identify miniature inverted-repeat transposable elements (MITEs) as well as other small (< 2 kb by default) class 2 nonautonomous transposable elements (TEs) from genomic DNA datasets. Class 1 TEs and long TEs cannot be found by MITE-Hunter. TEs with too many mutations and mismatches in the terminal inverted repeats (TIRs) may not be detected.
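To make the role of terminal inverted repeats concrete, the sketch below counts mismatches between the start of a candidate element and the reverse complement of its end. It only illustrates the TIR concept; it is not MITE-Hunter's detection procedure, and the function names and the example element are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def tir_mismatches(element: str, tir_len: int) -> int:
    """Mismatches between the left end and the reverse-complemented right end."""
    left = element[:tir_len]
    right = revcomp(element[-tir_len:])
    return sum(a != b for a, b in zip(left, right))


# A made-up element: an 8 bp TIR, an internal region, and the inverted TIR copy.
tir = "GGCCAGTC"
element = tir + "TTTTTTTT" + revcomp(tir)
print(tir_mismatches(element, 8))
```

A candidate whose mismatch count exceeds some chosen threshold would be rejected, which is one way to picture why elements with heavily mutated TIRs can be missed.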

:: DEVELOPER

MITE-Hunter team

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux
Perl
NCBI BLAST
Muscle
mDust

:: DOWNLOAD

MITE-Hunter

:: MORE INFORMATION

Citation

Nucl. Acids Res. (2010) 38 (22): e199.
MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences
Yujun Han and Susan R. Wessler

UNWORDS – Compute Absent Words in Genomic Sequences

UNWORDS

:: DESCRIPTION

UNWORDS is efficient software for computing the shortest strings that do not occur in a given set of DNA sequences. UNWORDS is more efficient than previous algorithms and easier to use. It uses a bit vector encoding of strings and is therefore called unwords-bits. It directly computes unwords without the need to specify a length estimate. Moreover, it avoids the space requirements of index structures such as suffix trees and suffix arrays.
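The idea of finding shortest absent words with a bit vector can be sketched as follows: for increasing word length k, each k-mer is encoded as an integer with 2 bits per base, its bit is set in a vector of 4^k bits, and the first k with unset bits gives the answer. This is a naive illustration, not the UNWORDS implementation; a Python integer stands in for the bit vector, reverse complements are not considered, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def decode(value: int, k: int) -> str:
    """Turn a 2-bit-per-base integer back into a k-letter word."""
    return "".join("ACGT"[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))


def shortest_absent_words(seqs: Iterable[str]) -> Tuple[int, List[str]]:
    """Return (k, words): the smallest k with absent k-mers, and those k-mers."""
    seqs = list(seqs)
    k = 1
    while True:
        mask = 4 ** k - 1
        seen = 0  # Python int used as a bit vector with 4**k bits
        for seq in seqs:
            value, valid = 0, 0
            for ch in seq.upper():
                if ch not in CODE:  # e.g. N: restart the current window
                    value, valid = 0, 0
                    continue
                value = ((value << 2) | CODE[ch]) & mask
                valid += 1
                if valid >= k:
                    seen |= 1 << value
        absent = [decode(v, k) for v in range(4 ** k) if not (seen >> v) & 1]
        if absent:
            return k, absent
        k += 1


k, words = shortest_absent_words(["ACGT"])
print(k, len(words))
```

The loop always terminates, because the number of distinct k-mers in a finite input is bounded by its length while 4^k keeps growing.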

:: DEVELOPER

Stefan Kurtz

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux

:: DOWNLOAD

UNWORDS

:: MORE INFORMATION

Citation

Herold, J. and Kurtz, S. and Giegerich, R.
Efficient Computation of Absent Words in Genomic Sequences,
BMC Bioinformatics, 2008, 9:167.

RECON 1.05 – Identification of Repeat Families from Genomic Sequences

RECON 1.05

:: DESCRIPTION

The RECON package performs de novo identification and classification of repeat sequence families from genomic sequences. The underlying algorithm is based on extensions to the usual approach of single linkage clustering of local pairwise alignments between genomic sequences. Specifically, our extensions use multiple alignment information to define the boundaries of individual copies of the repeats and to distinguish homologous but distinct repeat element families. RECON should be useful for first-pass automatic classification of repeats in newly sequenced genomes.
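The baseline that RECON extends, single linkage clustering of pairwise alignments, can be sketched with a union-find structure: every pairwise alignment links two elements, and each connected group becomes a candidate family. The sketch covers only this baseline and none of RECON's extensions (boundary definition from multiple alignment information, separation of related families). The element labels and the function name are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def single_linkage(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group elements into families: two elements share a family if a chain
    of pairwise alignments connects them."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families: Dict[str, List[str]] = {}
    for x in list(parent):
        families.setdefault(find(x), []).append(x)
    return sorted(sorted(members) for members in families.values())


alignments = [("e1", "e2"), ("e2", "e3"), ("e4", "e5")]
print(single_linkage(alignments))
```

Because a single alignment is enough to merge two groups, plain single linkage tends to chain distinct but related families together, which is the kind of problem the extensions described above address.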

:: DEVELOPER

Eddy lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux/Mac OS X/Windows
Perl
C Compiler

:: DOWNLOAD

RECON

:: MORE INFORMATION

Citation:

Bao Z. and Eddy S.R. (2002)
Automated de novo Identification of Repeat Sequence Families in Sequenced Genomes.
Genome Research, 12:1269-1276.