RenBio is a program that identifies gene and protein names in text documents using machine learning techniques. RenBio searches for named entities in a document according to a decision tree. The attributes of the tree nodes may be regex matches, dictionary matches, or signal words.
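The three attribute types can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the dictionary, the signal words, the regular expression, and the hand-written rule that stands in for a learned decision tree are hypothetical and are not taken from RenBio.

```python
import re

# Hypothetical resources; RenBio's actual dictionaries and patterns are not given here.
GENE_DICTIONARY = {"brca1", "tp53"}
SIGNAL_WORDS = {"gene", "protein", "kinase", "receptor"}
# Letters followed by digits and an optional trailing letter, e.g. "BRCA1".
GENE_LIKE = re.compile(r"^[A-Za-z]{2,}\d+[A-Za-z]?$")


def token_features(tokens, i):
    """Compute one attribute of each type for the token at position i."""
    token = tokens[i]
    # Immediate left and right neighbours, when they exist.
    neighbours = tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]
    return {
        "regex_match": bool(GENE_LIKE.match(token)),
        "dictionary_match": token.lower() in GENE_DICTIONARY,
        "signal_word_nearby": any(n.lower() in SIGNAL_WORDS for n in neighbours),
    }


def is_gene_name(features):
    """Hand-written stand-in for a decision tree learned from training data."""
    if features["dictionary_match"]:
        return True
    if features["regex_match"]:
        return features["signal_word_nearby"]
    return False
```

In a real system the tree structure and thresholds would be learned from annotated text rather than written by hand.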
IsotopeCalculator is a memory-efficient algorithm for accurately calculating the isotopic fine structures of molecules. Treating individual isotopic species of a molecule as different mass states, we introduce the concept of transitions between mass states and represent all mass states of the molecule in a hierarchical structure. We show that there exists a simple relationship between two different mass states at two different levels of the hierarchical structure. This allows us to efficiently and accurately compute both the mass and the abundance of every mass state of a small to medium-sized molecule, whose gross structures include a small number of fine structures. A truncated calculation of this algorithm can be applied to calculate a majority of isotopic species (99.99% of cumulative abundance) of a large molecule.
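To make "mass states" concrete, the sketch below enumerates the isotopic species of a small molecule by brute force, adding one atom at a time. It is not the hierarchical transition algorithm described above. The function names, the pruning threshold, and the rounded isotope masses and natural abundances in the table are assumptions supplied for illustration.

```python
from collections import defaultdict
from itertools import product

# Approximate isotope masses (Da) and natural abundances, for illustration only.
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.00335484, 0.0107)],
    "H": [(1.00782503, 0.999885), (2.01410178, 0.000115)],
    "O": [(15.99491462, 0.99757), (16.99913176, 0.00038), (17.99915961, 0.00205)],
}


def element_states(element, n, min_abundance):
    """Mass states of n atoms of one element, as (mass, abundance) pairs."""
    isotopes = ISOTOPES[element]
    # Key: how many atoms of each isotope have been placed so far.
    states = {(0,) * len(isotopes): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for counts, abundance in states.items():
            for j, (_, iso_abundance) in enumerate(isotopes):
                bumped = counts[:j] + (counts[j] + 1,) + counts[j + 1:]
                nxt[bumped] += abundance * iso_abundance
        # Heuristic truncation: drop states that are currently very rare.
        states = {c: a for c, a in nxt.items() if a >= min_abundance}
    return [
        (sum(c * m for c, (m, _) in zip(counts, isotopes)), abundance)
        for counts, abundance in states.items()
    ]


def fine_structure(formula, min_abundance=1e-12):
    """All retained mass states of a molecule, sorted by mass."""
    per_element = [element_states(e, n, min_abundance) for e, n in formula.items()]
    result = []
    for combo in product(*per_element):
        mass = sum(m for m, _ in combo)
        abundance = 1.0
        for _, a in combo:
            abundance *= a
        if abundance >= min_abundance:
            result.append((mass, abundance))
    return sorted(result)
```

For example, `fine_structure({"H": 2, "O": 1})` lists the mass states of water. The number of states grows quickly with molecule size, which is why a memory-efficient or truncated calculation matters for large molecules.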
imCellPhen (Interactive mining of cellular phenotypes) is a new computing paradigm that uses intelligent human-computer interfaces to facilitate the application of HCS technology in biomedical research. It combines unsupervised pattern mining techniques, P-VDE interfaces, and CBIR-RF techniques to boost the exploitation capacity of HCS technology.
Hong, P. (2006).
Interactive Analysis of High-Content Cellular Images via Relevant Feedback.
2006 Workshop on Multiscale Biological Imaging, Data Mining and Informatics, Santa Barbara, CA, USA.
The GeneNotes extension is a gene-oriented tool developed to help biologists collect and manage a variety of biological information from the Internet as notes. It supports biologists' decision making in large-scale functional genomics studies.
TFdiff identifies context-dependent interactions between transcription factor binding sites (TFBSs) that may explain why the expression of genes changes in different directions under a particular condition.
RepbaseSubmitter is a Java-based interface for formatting and annotating Repbase entries. It eliminates many common formatting errors and automates actions such as the calculation of sequence lengths and composition, thus facilitating curation of Repbase sequences. In addition, it has several features for predicting protein-coding regions in sequences, searching for PubMed references and including them in Repbase entries, and searching the NCBI taxonomy database for correct inclusion of species information and taxonomic position.
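The kind of bookkeeping that such a tool automates can be sketched in a few lines. This is an illustrative Python sketch, not RepbaseSubmitter's Java code; the function name `sequence_summary` and the shape of the returned dictionary are hypothetical.

```python
from collections import Counter


def sequence_summary(sequence):
    """Length and base composition of a nucleotide sequence.

    Whitespace is ignored and case is folded; anything other than
    A, C, G, or T is counted as "other".
    """
    seq = "".join(sequence.split()).upper()
    counts = Counter(seq)
    composition = {base: counts.get(base, 0) for base in "ACGT"}
    return {
        "length": len(seq),
        "composition": composition,
        "other": len(seq) - sum(composition.values()),
    }
```

Computing these values from the sequence itself, instead of typing them by hand, removes one common source of inconsistency in a database entry.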
DASS-GUI is a stand-alone program written in C++ that calculates all significant closed sets of a given dataset containing the host sets. Some of the algorithms used are taken from Hollunder et al. (2007). DASS-GUI also allows additional analyses of the identified closed sets: filtering, handling of synonymous names, enrichment analyses, calculation of means and standard deviations of different numerical features, extraction of the underlying closed set hierarchy and its export as a GML file, as well as comparison (validation) with pre-defined sets.
Validate GTF is a flexible Perl script that checks a GTF file for correctness. It can detect most common syntactic errors, such as including the stop codon within the CDS annotation. It can also detect semantic errors, such as annotated coding sequences that contain stop codons spanning splice sites.
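A few of the simplest field-level checks can be sketched as follows. This is an illustrative Python sketch, not the Perl script itself, and it covers only a small subset of what a full validator would test. The function name, the error messages, and the exact rules (nine tab-separated fields, strand restricted to `+` or `-`, required `gene_id` and `transcript_id` attributes) reflect one reading of the GTF format and should be treated as assumptions.

```python
import re

# Attribute pairs of the form: key "value";
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)";')
REQUIRED_ATTRIBUTES = ("gene_id", "transcript_id")


def check_gtf_line(line):
    """Return a list of problems found in one GTF record (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return [f"expected 9 tab-separated fields, found {len(fields)}"]
    start, end, strand, frame, attributes = (
        fields[3], fields[4], fields[6], fields[7], fields[8],
    )
    problems = []
    if not (start.isdigit() and end.isdigit()):
        problems.append("start and end must be integers")
    elif int(start) < 1 or int(start) > int(end):
        problems.append("coordinates must satisfy 1 <= start <= end")
    if strand not in ("+", "-"):
        problems.append(f"invalid strand {strand!r}")
    if frame not in ("0", "1", "2", "."):
        problems.append(f"invalid frame {frame!r}")
    keys = {key for key, _ in ATTRIBUTE.findall(attributes)}
    for name in REQUIRED_ATTRIBUTES:
        if name not in keys:
            problems.append(f"missing attribute {name}")
    return problems
```

Semantic checks, such as finding in-frame stop codons, additionally require the genomic sequence and the grouping of records by transcript, which this sketch does not attempt.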
parredHMMlib is a C++ library implementing the parredForward and parredViterbi algorithms for multi-core CPUs, parallelizing analysis of hidden Markov models with small state spaces.
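One general way to expose parallelism in the forward algorithm is to rewrite it as a chain of small matrix products. Because matrix multiplication is associative, blocks of the chain can be multiplied independently and then combined. The sketch below shows this idea serially in pure Python. It is an assumption about the general technique, not a description of the library's C++ implementation; all function names are hypothetical, and the numerical scaling needed for long sequences is omitted.

```python
from functools import reduce


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def step_matrix(transition, emission, symbol):
    """M[i][j] = P(next state j | state i) * P(symbol | state j)."""
    n = len(transition)
    return [[transition[i][j] * emission[j][symbol] for j in range(n)] for i in range(n)]


def forward_serial(initial, transition, emission, observations):
    """Textbook forward algorithm: likelihood of the observation sequence."""
    n = len(initial)
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n)) * emission[j][symbol]
            for j in range(n)
        ]
    return sum(alpha)


def forward_by_reduction(initial, transition, emission, observations, chunks=2):
    """Same likelihood, computed as a blockwise reduction of matrix products."""
    n = len(initial)
    alpha0 = [[initial[i] * emission[i][observations[0]] for i in range(n)]]
    matrices = [step_matrix(transition, emission, s) for s in observations[1:]]
    if not matrices:
        return sum(alpha0[0])
    size = -(-len(matrices) // chunks)  # ceiling division
    blocks = [matrices[k:k + size] for k in range(0, len(matrices), size)]
    # Each block product is independent, so each could run on its own core.
    partials = [reduce(matmul, block) for block in blocks]
    return sum(matmul(alpha0, reduce(matmul, partials))[0])
```

Each matrix product costs on the order of n³ operations for n states, compared with n² per step in the serial recursion, so the reformulation is attractive mainly when the state space is small.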