GECKO is a genetic algorithm to classify and extract meaningful sequences from multiple types of sequencing data, including mRNA, microRNA, and DNA methylome data.
Turtle is a novel method that balances time, space, and accuracy requirements to efficiently extract frequent k-mers, even for high-coverage libraries and large genomes such as the human genome.
SKESA is a de novo sequence read assembler for microbial genomes. It uses conservative heuristics and is designed to create breaks at repeat regions in the genome.
::DEVELOPER
NCBI – National Center for Biotechnology Information
BFCounter is a program for counting k-mers from DNA sequencing data. It uses a Bloom filter data structure to filter out unique k-mers, which are likely generated by sequencing errors. Counting k-mers (substrings of length k) is an essential component of many methods in bioinformatics, including genome and transcriptome assembly, metagenomic sequencing, and error correction of sequence reads. Although simple in principle, counting k-mers in large modern sequence data sets can easily overwhelm the memory capacity of standard computers. In current data sets, a large fraction, often more than 50%, of the storage capacity may be spent on storing k-mers that contain sequencing errors and that are typically observed only a single time in the data. These singleton k-mers are uninformative for many algorithms without some kind of error correction.
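The idea of using a Bloom filter to keep singleton k-mers out of the count table can be sketched roughly as follows. This is a minimal illustration, not BFCounter's actual implementation: the `BloomFilter` class, its size parameters, and the function names are assumptions made for the example, reverse complements are ignored, and `reads` is assumed to be a list that can be traversed twice.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; num_bits and num_hashes are illustrative defaults."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several hash positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_non_singleton_kmers(reads, k):
    seen_once = BloomFilter()
    counts = {}
    # Pass 1: a k-mer enters the table only when the Bloom filter
    # reports that it has (probably) been seen before.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in counts:
                continue
            if kmer in seen_once:
                counts[kmer] = 0
            else:
                seen_once.add(kmer)
    # Pass 2: compute exact counts for the k-mers that made it into the table.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in counts:
                counts[kmer] += 1
    # Bloom filter false positives may have admitted true singletons; drop them.
    return {kmer: c for kmer, c in counts.items() if c > 1}
```

In this sketch the table only ever holds k-mers that the filter flagged as repeated, so memory for the exact counts is not spent on the bulk of the singletons.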
PLEK uses an improved computational pipeline based on k-mers and a support vector machine (SVM) to distinguish long non-coding RNAs (lncRNAs) from messenger RNAs (mRNAs).
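As a rough illustration of what a k-mer based feature vector for such a classifier might look like, the sketch below turns a transcript into normalized k-mer frequencies. The function name, the choice of `max_k=3`, and the normalization are assumptions for the example; PLEK's actual feature set and SVM configuration are not described here, and the resulting vector could be passed to any SVM implementation.

```python
from collections import Counter
from itertools import product


def kmer_frequency_features(sequence, max_k=3):
    """Return normalized k-mer frequencies for k = 1..max_k as one flat list.

    max_k is a hypothetical parameter chosen to keep the example small.
    """
    sequence = sequence.upper().replace("U", "T")  # treat RNA input as DNA letters
    features = []
    for k in range(1, max_k + 1):
        windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
        counts = Counter(windows)
        total = max(len(windows), 1)  # avoid division by zero on short input
        # Fixed lexicographic order so every sequence maps to the same layout.
        for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
            features.append(counts[kmer] / total)
    return features
```

With `max_k=3` each sequence maps to 4 + 16 + 64 = 84 values, regardless of its length, which is what allows transcripts of different sizes to be compared by a single classifier.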
DSK is a k-mer counting tool, similar to Jellyfish. Jellyfish is very fast but limited to large-memory servers and k ≤ 32. In contrast, DSK supports large values of k and runs with (almost-)arbitrarily low memory usage and reasonably low temporary disk usage. DSK can count k-mers in large Illumina datasets on laptops and desktop computers.
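One general way to trade memory for temporary disk space when counting k-mers is to hash each k-mer into one of several partition files and then count one partition at a time. The sketch below illustrates that idea only; it is not DSK's implementation, and the function name, the `num_partitions` parameter, and the text file layout are assumptions for the example.

```python
import os
import tempfile
import zlib
from collections import Counter


def count_kmers_partitioned(reads, k, num_partitions=4):
    """Count k-mers by spilling them to partition files, then counting each file."""
    totals = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [os.path.join(tmpdir, f"part_{i}.txt") for i in range(num_partitions)]
        files = [open(path, "w") for path in paths]
        try:
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    # crc32 sends every copy of a k-mer to the same partition.
                    index = zlib.crc32(kmer.encode()) % num_partitions
                    files[index].write(kmer + "\n")
        finally:
            for handle in files:
                handle.close()
        # Only one partition's k-mers are held in memory at a time.
        for path in paths:
            with open(path) as handle:
                partition_counts = Counter(line.strip() for line in handle)
            # Collected into one dict for demonstration only; a low-memory
            # tool would write each partition's counts out instead.
            totals.update(partition_counts)
    return totals
```

Because all copies of a given k-mer land in the same partition, each partition can be counted independently, and raising `num_partitions` lowers the peak memory needed per partition.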
KmerGenie estimates the best k-mer length for de novo genome assembly. Given a set of reads, KmerGenie first computes the k-mer abundance histogram for many values of k. Then, for each value of k, it predicts the number of distinct genomic k-mers in the dataset and returns the k-mer length that maximizes this number. Experiments show that KmerGenie’s choices lead to assemblies that are close to the best possible over all k-mer lengths.
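The selection loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: the histogram is computed by exact counting, and the number of genomic k-mers is approximated with a crude abundance cutoff (`min_abundance`, a hypothetical parameter) instead of the prediction step KmerGenie actually uses. All function names are invented for the example.

```python
from collections import Counter


def abundance_histogram(reads, k):
    """Map each abundance value to the number of distinct k-mers with that abundance."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return Counter(counts.values())


def estimate_genomic_kmers(histogram, min_abundance=2):
    """Crude stand-in for a real estimator: treat low-abundance k-mers as errors."""
    return sum(n for abundance, n in histogram.items() if abundance >= min_abundance)


def choose_k(reads, k_values, min_abundance=2):
    """Return the k with the largest estimate, plus the estimates for every k."""
    estimates = {
        k: estimate_genomic_kmers(abundance_histogram(reads, k), min_abundance)
        for k in k_values
    }
    best_k = max(estimates, key=estimates.get)
    return best_k, estimates
```

A call such as `choose_k(reads, range(15, 72, 2))` would scan odd values of k; the range is arbitrary here and would depend on read length in practice.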