seqRFLP provides functions for handling DNA sequences, in particular for simulating RFLP and TRFLP patterns from a selected restriction enzyme and a set of DNA sequences.
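As a rough illustration of what an in-silico RFLP/TRFLP simulation involves, the Python sketch below cuts a linear sequence at every exact recognition site and reports fragment lengths. It is not seqRFLP's API (seqRFLP is an R package): the function names `digest` and `terminal_fragment` are hypothetical, degenerate recognition sites and the reverse strand are ignored, and the TRFLP case assumes only the 5'-terminal (labeled) fragment is detected.

```python
import re


def digest(seq: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths of a linear sequence cut at every exact `site`.

    `cut_offset` is the position within the site after which the enzyme cuts,
    e.g. 1 for EcoRI (G^AATTC). Degenerate sites are not supported.
    """
    seq = seq.upper()
    # A zero-width lookahead finds overlapping occurrences of the site.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    # Drop empty fragments (e.g. a cut falling exactly at position 0).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def terminal_fragment(seq: str, site: str, cut_offset: int) -> int:
    """TRFLP view: assume only the 5'-terminal (labeled) fragment is observed."""
    return digest(seq, site, cut_offset)[0]


if __name__ == "__main__":
    s = "AAGAATTCCCGAATTCTT"  # toy sequence with two EcoRI sites
    print(digest(s, "GAATTC", 1))             # RFLP pattern
    print(terminal_fragment(s, "GAATTC", 1))  # TRFLP peak
```

Comparing such fragment-length lists across many sequences and enzymes is the basic idea behind choosing an enzyme that discriminates well between taxa.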
sgp2 is a program to predict genes by comparing anonymous genomic sequences from two different species. It combines tblastx, a sequence similarity search program, with geneid, an “ab initio” gene prediction program. In “asymmetric” mode, genes are predicted in one sequence from one species (the target sequence) using a set of sequences (possibly only one) from the other species (the reference set). Essentially, geneid is used to predict all potential exons along the target sequence. Exon scores are computed as log-likelihood ratios that depend on the splice sites defining the exon, on the coding bias in the composition of the exon sequence as measured by a fifth-order Markov model, and on the optimal amino-acid-level alignment between the target exon sequence and its homologous counterpart in the reference set. From the set of predicted exons, the gene structure (possibly multiple genes on both strands) is assembled by maximizing the sum of the scores of the assembled exons.
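The final assembly step, choosing compatible exons so that their total score is maximal, can be illustrated with a weighted-interval-scheduling dynamic program. This is a minimal sketch under strong simplifying assumptions, not sgp2's or geneid's actual assembler: it only forbids overlapping exons and ignores reading frame, exon type (first/internal/terminal), strand and intron-length constraints. The function name `best_exon_chain` and the `(start, end, score)` tuple layout are hypothetical.

```python
from bisect import bisect_right


def best_exon_chain(exons):
    """Select non-overlapping exons maximizing the summed score.

    exons: list of (start, end, score) with 1-based inclusive coordinates.
    Simplification: reading frame, exon type and strand are ignored.
    """
    exons = sorted(exons, key=lambda e: e[1])  # sort by end coordinate
    ends = [e[1] for e in exons]
    best = [0.0] * (len(exons) + 1)      # best[i]: optimum over the first i exons
    taken = [False] * (len(exons) + 1)   # whether exon i is used in best[i]
    for i, (start, _end, score) in enumerate(exons, 1):
        # Count of earlier exons ending strictly before this exon starts.
        j = bisect_right(ends, start - 1, 0, i - 1)
        if best[j] + score > best[i - 1]:
            best[i], taken[i] = best[j] + score, True
        else:
            best[i] = best[i - 1]
    # Backtrack through the table to recover the chosen exons.
    chain, i = [], len(exons)
    while i > 0:
        if taken[i]:
            start = exons[i - 1][0]
            chain.append(exons[i - 1])
            i = bisect_right(ends, start - 1, 0, i - 1)
        else:
            i -= 1
    return best[-1], chain[::-1]


if __name__ == "__main__":
    candidates = [(1, 100, 5.0), (50, 150, 8.0), (120, 200, 4.0), (210, 300, 3.0)]
    print(best_exon_chain(candidates))
```

Because exon scores are log-likelihood ratios, negative-scoring exons are never selected by this recurrence; a real gene assembler adds the biological compatibility rules on top of the same "maximize the sum" principle.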
geneid is a program, designed with a hierarchical structure, that predicts genes in anonymous genomic sequences. In the first step, splice sites and start and stop codons are predicted and scored along the sequence using Position Weight Arrays (PWAs). In the second step, exons are built from these sites. Each exon is scored as the sum of the scores of its defining sites plus the log-likelihood ratio of a Markov model for coding DNA. Finally, the gene structure is assembled from the set of predicted exons by maximizing the sum of the scores of the assembled exons. geneid offers some support for integrating predictions from multiple sources via external GFF files, and the general gene structure (the gene model) can also be redefined. The accuracy of geneid compares favorably with that of other existing tools, and geneid is likely more efficient in terms of speed and memory usage.
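The first step, scoring candidate sites along the sequence, can be sketched with a simple position-specific log-odds table. This is a deliberate simplification: it treats positions as independent against a uniform background, whereas geneid's PWAs and trained parameter files are more elaborate. The helper names `build_log_odds` and `score_sites`, the pseudocount and the toy donor-like sites are all hypothetical.

```python
import math


def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Position-specific log-odds table from equal-length aligned sites (ACGT only)."""
    table = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in "ACGT"}
        for s in sites:
            counts[s[pos]] += 1
        total = sum(counts.values())
        # log( P(base | site position) / P(base | background) )
        table.append({b: math.log((counts[b] / total) / background) for b in "ACGT"})
    return table


def score_sites(seq, table):
    """Score every window of `seq` (uppercase ACGT assumed); returns (offset, score)."""
    width = len(table)
    return [
        (i, sum(table[k][seq[i + k]] for k in range(width)))
        for i in range(len(seq) - width + 1)
    ]


if __name__ == "__main__":
    donors = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]  # toy training sites
    table = build_log_odds(donors)
    scores = score_sites("CCGTAAGTCC", table)
    print(max(scores, key=lambda x: x[1]))
```

Windows scoring above a threshold would then become candidate sites from which exons are built in the second step.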
RNAProt is a computational framework for predicting RNA-binding protein (RBP) binding sites based on recurrent neural networks (RNNs). Conceived as an end-to-end method, RNAProt includes all necessary functionality, from dataset generation through model training to the evaluation of binding preferences and binding site prediction.
MemPype is a Python-based pipeline that integrates several tools for predicting the topology and subcellular localization of eukaryotic membrane proteins.
Triplet-SVM was developed to predict whether a query sequence with a hairpin structure is a real miRNA precursor. It extracts the triplet elements of the query and classifies it with an SVM classifier, which is trained in advance on the triplet element features of a set of real miRNA precursors and a set of pseudo-miRNA hairpins.
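To make "triplet elements" concrete, the sketch below computes a 32-dimensional feature vector from a sequence and its dot-bracket secondary structure. It assumes the commonly described triplet definition: each position is reduced to paired or unpaired (`)` is treated like `(`), every window of three adjacent structure states (8 patterns) is combined with the identity of the middle nucleotide (4 bases), and counts are normalized to frequencies. The function name `triplet_features` is hypothetical, and details such as which hairpin region is counted may differ from the original tool.

```python
from itertools import product


def triplet_features(seq, structure):
    """Normalized frequencies of the 32 triplet elements of a hairpin.

    seq: RNA sequence over ACGU; structure: dot-bracket string of equal length.
    Feature order: middle base (A, C, G, U) x structure pattern over '(' and '.'.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    # Collapse ')' onto '(' so each position is simply paired or unpaired.
    states = structure.replace(")", "(")
    patterns = ["".join(p) for p in product("(.", repeat=3)]
    keys = [base + pat for base in "ACGU" for pat in patterns]
    counts = dict.fromkeys(keys, 0)
    for i in range(1, len(seq) - 1):
        key = seq[i] + states[i - 1:i + 2]
        if key in counts:  # skips non-ACGU bases
            counts[key] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in keys]


if __name__ == "__main__":
    features = triplet_features("GGGAAACCC", "(((...)))")
    print(len(features), round(sum(features), 6))
```

A vector like this, computed for known precursors and pseudo-hairpins, is the kind of input an SVM (for example scikit-learn's `SVC`) would be trained on.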
nocoRNAc (non-coding RNA characterization) is a Java program for the prediction and characterization of ncRNA transcripts in bacteria. nocoRNAc takes the coordinates of putative ncRNA loci as input and annotates them with transcriptional features to conduct strand-specific transcript predictions. The approach is not limited to intergenic regions; it can also be applied to predict cis-encoded asRNA transcripts. To detect the transcript’s 3′ end, nocoRNAc integrates the program TransTermHP (Kingsford et al., 2007), which predicts Rho-independent terminator signals. The 5′ end is predicted by detecting destabilized regions in the genomic DNA. For this purpose we implemented the so-called SIDD model (Benham and Bi, 2004), which has been shown to be applicable to the detection of promoter regions in microbial genomes. Therefore, nocoRNAc does not have to rely on information about known transcription factor binding sites (TFBSs). The putative transcriptional features are then combined to classify each ncRNA locus as an ncRNA transcript or not. For ncRNAs classified as transcripts, the strand is specified automatically, and their boundaries are derived from the SIDD sites and the Rho-independent transcription termination signal. Loci that are not classified as transcripts may be false-positive predictions or may contain cis-regulatory motifs. For the latter, nocoRNAc offers further functionality for analyzing these loci, such as searching for known RNA motifs from the Rfam database. Furthermore, nocoRNAc provides methods for predicting RNA-RNA interactions between ncRNAs and mRNAs. All results can be examined in detail in nocoRNAc’s integrated interactive R environment.
DriverNet is a package to predict functionally important driver genes in cancer by integrating genome data (mutation and copy number variation data) and transcriptome data (gene expression data). The different kinds of data are combined through an influence graph, a gene-gene interaction network derived from pathway data. A greedy algorithm is used to find candidate driver genes: genes that may be mutated in a larger number of patients, with these mutations pushing the expression values of connected genes to extreme values.
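The greedy step can be pictured as a set-cover problem: each mutated gene "explains" the outlying expression events of its neighbours in the influence graph within the same patient, and the algorithm repeatedly picks the gene that explains the most events not yet explained. The Python sketch below follows that idea under stated assumptions; it is not DriverNet's R implementation, it omits the package's significance testing, and the function name `greedy_drivers`, the dictionary-based inputs and the toy data are hypothetical.

```python
def greedy_drivers(mutations, outliers, network):
    """Greedy set-cover ranking of candidate driver genes.

    mutations: dict patient -> set of mutated genes
    outliers:  dict patient -> set of genes with extreme expression
    network:   dict gene -> set of neighbouring genes (influence graph)
    Returns [(gene, newly explained events), ...] in selection order.
    """
    # Events (patient, outlier gene) each mutated gene could explain.
    covers = {}
    for patient, genes in mutations.items():
        for gene in genes:
            hits = network.get(gene, set()) & outliers.get(patient, set())
            covers.setdefault(gene, set()).update((patient, h) for h in hits)
    uncovered = set().union(*covers.values())
    selected = []
    while uncovered:
        # Pick the gene explaining the most still-unexplained events;
        # iterating in sorted order makes ties resolve alphabetically.
        gene = max(sorted(covers), key=lambda g: len(covers[g] & uncovered))
        gain = covers[gene] & uncovered
        selected.append((gene, len(gain)))
        uncovered -= gain
    return selected


if __name__ == "__main__":
    mutations = {"p1": {"TP53"}, "p2": {"TP53", "KRAS"}, "p3": {"KRAS"}}
    outliers = {"p1": {"MDM2", "CDKN1A"}, "p2": {"MDM2", "MYC"}, "p3": {"MYC"}}
    network = {"TP53": {"MDM2", "CDKN1A"}, "KRAS": {"MYC"}}
    print(greedy_drivers(mutations, outliers, network))
```

Genes selected early explain many patient-specific expression outliers, which matches the intuition that drivers are mutated in many patients and perturb their network neighbours.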