HipMer is a high-performance application that produces high-quality de novo assemblies of very large genomes.
The MetaHipMer extension is a recent addition to HipMer that targets large metagenomes. It iterates over a series of k-mer sizes and uses a specialized scaffolding algorithm to improve the contiguity and accuracy of metagenomic assemblies. It can also reconstruct rRNA elements with a separate algorithm that uses reference SSU and LSU hidden Markov models to help traverse the contig graph around ribosomal RNA regions.
Georganas E, et al. HipMer: an extreme-scale de novo genome assembler. In: SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 2015. p. 1-11. doi: 10.1145/2807591.2807664.
Hofmeyr S, Egan R, Georganas E, Copeland AC, Riley R, Clum A, Eloe-Fadrosh E, Roux S, Goltsman E, Buluç A, Rokhsar D, Oliker L, Yelick K. Terabase-scale metagenome coassembly with MetaHipMer. Sci Rep. 2020 Jul 1;10(1):10689. doi: 10.1038/s41598-020-67416-5. PMID: 32612216; PMCID: PMC7329831.
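To illustrate what iterating over k-mer sizes involves, here is a minimal Python sketch that counts k-mers at several increasing values of k and keeps the "solid" ones seen at least `min_count` times. It is not MetaHipMer's implementation: the function names, the `min_count` filter, and the example k values are assumptions for illustration, and the graph construction, carry-over of contigs between rounds, and scaffolding that a real iterative assembler performs are omitted.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping substring of length k (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def iterative_kmer_passes(reads, k_values, min_count=2):
    """Run one counting pass per k, from the smallest k to the largest.

    Returns {k: set of k-mers seen at least min_count times}. The min_count
    filter (a simple error-removal heuristic) is a hypothetical choice; a real
    multi-k assembler would also build a graph at each k and feed the
    resulting contigs into the next round, which this sketch omits.
    """
    solid = {}
    for k in sorted(k_values):
        counts = count_kmers(reads, k)
        solid[k] = {kmer for kmer, c in counts.items() if c >= min_count}
    return solid


# Hypothetical example: two identical reads, k = 3 and k = 5.
passes = iterative_kmer_passes(["ACGTAC", "ACGTAC"], [3, 5])
print(sorted(passes[5]))  # ['ACGTA', 'CGTAC']
```

The general motivation for multiple k values is a trade-off: small k tolerates low coverage and sequencing errors, while large k spans more repeats.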
FCMM is a pipeline for top-k-based functional characterization of multiple metagenome samples. It infers the major functions in the samples, together with their quantitative scores, for comparative metagenomic analysis.
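The description above does not say how FCMM scores functions, so the following Python sketch only illustrates the general top-k idea under assumed inputs: each sample is a hypothetical mapping from function IDs to precomputed scores, the k best functions are selected per sample, and their union is tabulated across samples so the samples can be compared. The function names, the placeholder IDs, and the zero-fill for absent functions are assumptions, not FCMM's interface.

```python
def top_k_functions(scores, k):
    """Return the k highest-scoring function IDs in one sample.

    Ties are broken alphabetically by ID so the result is deterministic.
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [func for func, _ in ranked[:k]]


def compare_top_k(samples, k):
    """Tabulate the union of the per-sample top-k functions across all samples.

    samples maps sample name -> {function ID: score}. Returns
    {function ID: {sample name: score}}, using 0.0 where a sample lacks a function.
    """
    selected = sorted({f for scores in samples.values() for f in top_k_functions(scores, k)})
    return {f: {name: scores.get(f, 0.0) for name, scores in samples.items()} for f in selected}


# Hypothetical scores for two samples; the function IDs are placeholders.
samples = {
    "sample_A": {"F1": 5.0, "F2": 3.0, "F3": 1.0},
    "sample_B": {"F1": 0.5, "F2": 4.0, "F4": 6.0},
}
table = compare_top_k(samples, k=2)
print(sorted(table))  # ['F1', 'F2', 'F4']
```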
MEGAN (MEta Genome ANalyzer) makes it possible to analyze large metagenomic data sets on a laptop. In a preprocessing step, the set of DNA sequences is compared against databases of known sequences using BLAST or another comparison tool. MEGAN then computes and explores the taxonomic content of the data set, using the NCBI taxonomy to summarize and order the results. A simple lowest common ancestor (LCA) algorithm assigns reads to taxa such that the taxonomic level of the assigned taxon reflects the level of conservation of the sequence: a read whose significant matches all fall within one genus is assigned to that genus, whereas a read that matches equally well across several phyla is assigned to a higher-ranking taxon. The software allows large data sets to be dissected without assembly and without targeting specific phylogenetic markers. It provides graphical and statistical output for comparing different data sets.
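The LCA assignment can be sketched in a few lines of Python. This is a simplified illustration, not MEGAN's code: the toy taxonomy is a hypothetical child-to-parent map, each read's hits are given as (taxon, bit score) pairs, and hits are kept only if they score within `top_percent` of the best hit. The parameter name and its default are assumptions for this sketch.

```python
def lineage(taxon, parent):
    """Return the path from the root down to taxon, following a child -> parent map."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lowest_common_ancestor(taxa, parent):
    """Return the deepest node shared by the root-to-taxon paths of all taxa."""
    paths = [lineage(t, parent) for t in taxa]
    common = None
    for nodes in zip(*paths):  # walk all paths in lockstep from the root
        if any(n != nodes[0] for n in nodes):
            break
        common = nodes[0]
    return common


def assign_read(hits, parent, top_percent=10.0):
    """Assign a read to the LCA of its hits scoring within top_percent of the best hit.

    hits: list of (taxon, bit_score) pairs; returns None for a read with no hits.
    """
    if not hits:
        return None
    best = max(score for _, score in hits)
    threshold = best * (1.0 - top_percent / 100.0)
    kept = [taxon for taxon, score in hits if score >= threshold]
    return lowest_common_ancestor(kept, parent)


# Hypothetical toy taxonomy (child -> parent); "root" has no parent.
parent = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Bacillus": "Firmicutes",
}
print(assign_read([("Escherichia", 100), ("Salmonella", 95)], parent))  # Proteobacteria
print(assign_read([("Escherichia", 100), ("Bacillus", 98)], parent))    # Bacteria
print(assign_read([("Escherichia", 100), ("Bacillus", 50)], parent))    # Escherichia
```

The three calls mirror the behavior described above: the more broadly a sequence's strong matches are spread across the taxonomy, the higher the rank it is assigned to.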
MetaMeta is a pipeline that runs metagenome analysis tools and integrates their results. It provides a simple workflow for running multiple tools on multiple samples and produces a single enhanced profile for each sample.
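The description above does not specify how MetaMeta combines tool outputs, so the following Python sketch shows only one simple, hypothetical way to merge several tools' taxonomic profiles for a single sample: keep taxa reported by at least `min_tools` tools, average their abundances over the tools that report them, and renormalize. The function name, the voting threshold, and the averaging rule are all assumptions.

```python
def integrate_profiles(tool_profiles, min_tools=2):
    """Combine several tools' profiles for one sample into a single profile.

    tool_profiles maps tool name -> {taxon: relative abundance}. A taxon is
    kept if at least min_tools tools report a positive abundance for it; its
    abundance is the mean over those tools, and the result is renormalized
    to sum to 1. Returns {} if no taxon has enough support.
    """
    support = {}
    for profile in tool_profiles.values():
        for taxon, abundance in profile.items():
            if abundance > 0:
                support.setdefault(taxon, []).append(abundance)
    merged = {t: sum(v) / len(v) for t, v in support.items() if len(v) >= min_tools}
    total = sum(merged.values())
    return {t: a / total for t, a in merged.items()} if total > 0 else {}


# Hypothetical profiles from three tools for one sample.
profiles = {
    "tool_A": {"X": 0.6, "Y": 0.4},
    "tool_B": {"X": 0.5, "Z": 0.5},
    "tool_C": {"X": 0.7, "Y": 0.3},
}
merged = integrate_profiles(profiles)
print({t: round(a, 3) for t, a in merged.items()})  # {'X': 0.632, 'Y': 0.368}
```

Requiring agreement between tools trades sensitivity for precision: taxon Z, reported by only one tool, is dropped.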
Taxy is a software tool for taxonomic profiling based on mixture modeling of the overall oligonucleotide distribution of a sample. Inferring the taxonomic composition of a microbial community from a large collection of anonymous DNA sequencing reads is a challenging task in computational biology. Because existing methods for taxonomic profiling of metagenomes all rely on assigning individual sequence fragments to phylogenetic categories, the accuracy of their results depends largely on fragment length. This dependency complicates the comparison of data from different sequencing platforms or preprocessing pipelines. Taxy's developers created a read-length-independent profiling method and provide it as a freely available Matlab/Octave toolbox that includes an ultra-fast implementation. Besides this platform-independent toolbox, they also provide a prototype Windows tool for comparing a large number of preprocessed metagenomes in a graphical environment. According to the developers' tests, Taxy results compare well with taxonomic profiles obtained by other methods; unlike those methods, however, Taxy maintains a nearly constant profiling accuracy across all read lengths and runs at very high speed. Input DNA sequences are supplied as multi-FASTA files of any size, and analyzing a Gbp-scale sequence file typically takes less than a minute, even on a standard notebook computer.
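A mixture model of an oligonucleotide distribution can be illustrated with a small Python sketch. Under the assumptions made here, the sample's pooled k-mer counts are modeled as a mixture of fixed per-taxon oligonucleotide distributions, and the mixing proportions (the taxonomic profile) are estimated by expectation-maximization (EM). This is a generic EM estimator, not necessarily the one Taxy implements in its Matlab/Octave toolbox; the reference distributions and all names are hypothetical. Because counts are pooled over the whole sample before fitting, read boundaries never enter the estimate, which illustrates the intuition behind read-length independence.

```python
from itertools import product


def oligo_counts(seqs, k):
    """Count overlapping k-mers over the ACGT alphabet, pooled across all sequences.

    Returns a list indexed in lexicographic order (A...A, A...C, ...);
    k-mers containing other characters (e.g. N) are skipped.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k])
            if j is not None:
                counts[j] += 1
    return counts


def estimate_mixture(counts, components, iterations=200):
    """EM estimate of mixing proportions for fixed component distributions.

    components[j][w] is the probability of oligo w under reference taxon j
    (each row sums to 1); counts[w] is the observed count of oligo w.
    Returns proportions that sum to 1.
    """
    m = len(components)
    pi = [1.0 / m] * m
    for _ in range(iterations):
        expected = [0.0] * m
        for w, c in enumerate(counts):
            if c == 0:
                continue
            # E-step: split the c observations of oligo w among the taxa.
            weights = [pi[j] * components[j][w] for j in range(m)]
            total = sum(weights)
            if total == 0.0:
                continue  # oligo impossible under every component; ignore it
            for j in range(m):
                expected[j] += c * weights[j] / total
        # M-step: the new proportions are the normalized expected counts.
        norm = sum(expected)
        if norm == 0.0:
            break
        pi = [e / norm for e in expected]
    return pi


# Hypothetical k = 1 references: an AT-rich and a GC-rich taxon (order A, C, G, T).
at_rich = [0.4, 0.1, 0.1, 0.4]
gc_rich = [0.1, 0.4, 0.4, 0.1]
sample_counts = [325, 175, 175, 325]  # exactly a 75% / 25% mixture of the two
print([round(p, 3) for p in estimate_mixture(sample_counts, [at_rich, gc_rich])])  # [0.75, 0.25]
```

In practice the reference distributions would be estimated from reference genomes with a larger k; the two-component, k = 1 example only keeps the arithmetic checkable.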
Unlike the oligonucleotide-based Taxy method, Taxy-Pro applies mixture-model analysis to protein signatures, specifically protein domain frequencies.
DAS Tool is an automated method that integrates the results of any number of binning algorithms to compute an optimized, non-redundant set of bins from a single assembly.
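One way to picture integrating several binners into a non-redundant set is a greedy dereplication over the pooled candidate bins. The Python sketch below assumes a hypothetical scoring scheme based on single-copy marker genes (SCGs): a bin is rewarded for the fraction of SCGs it contains and penalized for duplicated SCG copies. The best bin is kept, its contigs are removed from all other candidates, and the rest are rescored until no bin reaches `min_score`. This is a simplified illustration, not DAS Tool's exact scoring function; the weight `b`, the threshold, and all names are assumptions.

```python
def bin_score(contigs, contig_scgs, total_scgs, b=0.5):
    """Hypothetical bin score: completeness minus b times redundancy.

    completeness = distinct SCGs found / total_scgs
    redundancy   = extra (duplicated) SCG copies / total_scgs
    """
    found = [gene for contig in contigs for gene in contig_scgs.get(contig, ())]
    distinct = set(found)
    completeness = len(distinct) / total_scgs
    redundancy = (len(found) - len(distinct)) / total_scgs
    return completeness - b * redundancy


def select_bins(candidate_bins, contig_scgs, total_scgs, min_score=0.5, b=0.5):
    """Greedily keep the best-scoring bin, strip its contigs from the rest, repeat.

    candidate_bins maps bin name -> iterable of contig IDs, pooled from several
    binners. Every contig ends up in at most one selected bin.
    """
    remaining = {name: set(contigs) for name, contigs in candidate_bins.items()}
    selected = {}
    while remaining:
        # sorted() makes ties resolve to the alphabetically first bin name.
        best = max(sorted(remaining),
                   key=lambda n: bin_score(remaining[n], contig_scgs, total_scgs, b))
        if bin_score(remaining[best], contig_scgs, total_scgs, b) < min_score:
            break
        selected[best] = remaining.pop(best)
        remaining = {n: c - selected[best] for n, c in remaining.items()}
        remaining = {n: c for n, c in remaining.items() if c}  # drop emptied bins
    return selected


# Hypothetical SCG annotations (4 marker genes) and bins from two binners.
contig_scgs = {"c1": ["g1", "g2"], "c2": ["g3", "g4"], "c3": ["g1"],
               "c4": ["g3"], "c5": ["g2", "g4"]}
candidate_bins = {"binnerA_1": ["c1", "c2", "c3"],
                  "binnerB_1": ["c1", "c2"],
                  "binnerB_2": ["c3", "c4", "c5"]}
chosen = select_bins(candidate_bins, contig_scgs, total_scgs=4)
print(sorted(chosen))  # ['binnerB_1', 'binnerB_2']
```

Here `binnerA_1` contains a duplicated marker (g1), so the two cleaner bins from the second binner win, and `binnerA_1` is discarded once its contigs have been claimed.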