BANNER is a named entity recognition system, primarily intended for biomedical text. It is a machine-learning system based on conditional random fields and contains a wide survey of the best features in recent literature on biomedical named entity recognition (NER). BANNER is portable and is designed to maximize domain independence by not employing semantic features or rule-based processing steps. It is therefore useful to developers as an extensible NER implementation, to researchers as a standard for comparing innovative techniques, and to biologists requiring the ability to find novel entities in large amounts of text.
SEGMER is a segmental threading algorithm designed to recognize substructure motifs from the Protein Data Bank (PDB) library. It first splits target sequences into segments that consist of 2-4 consecutive or non-consecutive secondary structure elements (alpha-helices or beta-strands). The sequence segments are then threaded through the PDB to identify conserved substructures. It often identifies better-conserved structure motifs than whole-chain threading methods, especially when no similar global fold exists in the PDB.
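The segment-splitting step can be illustrated with a minimal sketch. It is not SEGMER's own code: the one-letter secondary structure string (H for helix, E for strand, C for coil) and the function names `find_sses` and `enumerate_segments` are assumptions made for illustration only.

```python
from itertools import combinations, groupby


def find_sses(ss):
    """Return (type, start, end) for each run of 'H' or 'E'; end is exclusive.

    `ss` is a hypothetical per-residue secondary structure string.
    """
    sses = []
    pos = 0
    for label, run in groupby(ss):
        length = len(list(run))
        if label in "HE":
            sses.append((label, pos, pos + length))
        pos += length
    return sses


def enumerate_segments(sses, min_size=2, max_size=4, consecutive_only=True):
    """List every group of min_size..max_size secondary structure elements.

    With consecutive_only=False, elements may be skipped, which gives the
    non-consecutive segments as well as the consecutive ones.
    """
    segments = []
    for size in range(min_size, max_size + 1):
        if consecutive_only:
            for i in range(len(sses) - size + 1):
                segments.append(tuple(sses[i:i + size]))
        else:
            segments.extend(combinations(sses, size))
    return segments


# Toy input: helix, strand, helix separated by coil.
example_ss = "CCHHHHCCEEEECCHHHHCC"
example_sses = find_sses(example_ss)
consecutive = enumerate_segments(example_sses)
all_groups = enumerate_segments(example_sses, consecutive_only=False)
```

Each resulting segment would then be threaded against the library independently; that step depends on the threading program and is not sketched here.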
SPRING is a template-based algorithm for protein-protein structure prediction. It first threads one chain of the protein complex through the PDB library, with the binding partners retrieved from the original oligomer entries. The complex models associated with the other chain are deduced from a pre-calculated look-up table, with the best orientation selected by the SPRING-score, which is a combination of threading Z-score, interface contacts, and TM-align match between monomer-to-dimer templates.
SAXSTER is a new algorithm that combines small-angle X-ray scattering (SAXS) data with threading for high-resolution protein structure determination. Given a query sequence, SAXSTER first generates a list of template alignments from the PDB library using the MUSTER threading program. The SAXS data are then used to prioritize the best template alignments based on the SAXS profile match, and these alignments are finally used for full-length atomic protein structure construction.
RW (Random-Walk) is a distance-dependent atomic potential for protein structure modeling and structure decoy recognition. It was derived from 1,383 high-resolution PDB structures using an ideal random-walk chain as the reference state. The RW potential has been extensively optimized and tested on a variety of protein structure decoy sets and demonstrates significant power in protein structure recognition and a strong correlation with the RMSD of decoys to the native structures.
LIBRA is based on a graph theory approach to find the largest subset of similar residues between an input protein and a collection of known functional sites.
LIBRA+ is an upgraded version of LIBRA, a tool that, given a protein’s structural model, predicts the presence and identity of active sites and/or ligand binding sites. The algorithm implemented by LIBRA+ is based on a graph theory approach to find the largest subset of similar residues between an input protein and a collection of known functional sites. For this purpose, the algorithm makes use of two predefined databases for active sites and ligand binding sites, respectively derived from the Catalytic Site Atlas and the Protein Data Bank.
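One common graph-theoretic way to find the largest subset of matching residues is a maximum clique search on a correspondence graph. The sketch below shows that general technique only; whether LIBRA or LIBRA+ uses exactly this construction is not stated here. The input format (lists of residue type and coordinates), the distance tolerance `tol`, and the function names are all assumptions for illustration.

```python
from itertools import combinations
from math import dist


def correspondence_graph(query, site, tol=1.0):
    """Build a correspondence graph between two residue sets.

    query, site: lists of (residue_type, (x, y, z)) tuples (hypothetical format).
    A node (i, j) pairs query residue i with site residue j of the same type.
    Two nodes are connected when their inter-residue distances agree within tol.
    """
    nodes = [(i, j)
             for i, (q_type, _) in enumerate(query)
             for j, (s_type, _) in enumerate(site)
             if q_type == s_type]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a residue may be matched only once
        d_query = dist(query[i][1], query[k][1])
        d_site = dist(site[j][1], site[l][1])
        if abs(d_query - d_site) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Return one maximum clique using Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


# Toy data: the site is a translated copy of the first three query residues.
query = [("HIS", (0, 0, 0)), ("ASP", (3, 0, 0)),
         ("SER", (0, 4, 0)), ("GLY", (10, 10, 10))]
site = [("HIS", (1, 1, 1)), ("ASP", (4, 1, 1)), ("SER", (1, 5, 1))]
match = max_clique(correspondence_graph(query, site))
```

Each pair in `match` maps a query residue index to a site residue index. Maximum clique search is exponential in the worst case, so a sketch like this is only practical for small residue sets.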
LIBRA Web Application (LIBRAWA) is an online portal where users can exploit LIBRA+’s capabilities in recognizing the presence and identity of active sites and/or ligand binding sites given a protein’s structural model. With a free registration, users are given a personal space where they can launch and schedule multiple recognitions, check out the resulting three-dimensional alignments and browse ligand clusters. Results produced in LIBRAWA are backward-compatible with LIBRA+ and can thus be exported in LIBRA+’s format to be accessed offline from the desktop application.
TIPR (Transcription Initiation Pattern Recognizer) is a sequence-based machine learning model that identifies transcription start sites (TSSs) with high accuracy and resolution for multiple spatial distribution patterns along the genome, including broadly distributed TSS patterns that have previously been difficult to characterize.
pGenTHREADER and pDomTHREADER are two improved versions of the GenTHREADER protocol for recognizing and aligning protein sequences, with applications to structure prediction and superfamily discrimination. The two versions use the same core alignment algorithm and in both cases accept features derived from common inputs: protein sequence profiles and structural information. However, the representation and combinations of these features differ between the methods, and scoring and confidence values have been tuned to optimize performance in each application domain.
CROSS predicts the secondary structure propensity profile of an RNA molecule at single-nucleotide resolution. CROSS produces a table with the propensity scores and a graphical representation of the profile.
CROSSalign computes the similarity between RNA secondary structures.
CROSSalive computes the structure of RNA molecules in vivo. Changes in structure upon N6-methyladenosine (m6A) methylation can also be predicted.