Taxy is a software tool for taxonomic profiling based on mixture modeling of the overall oligonucleotide distribution of a sample. Inferring the taxonomic composition of a microbial community from a large collection of anonymous DNA sequencing reads is a challenging task in computational biology. Because existing methods for taxonomic profiling of metagenomes all assign sequence fragments to phylogenetic categories, the accuracy of their results depends largely on fragment length. This dependency complicates comparative analysis of data originating from different sequencing platforms or preprocessing pipelines. We have developed a read-length-independent method for taxonomic profiling, and we provide a freely available Matlab/Octave toolbox that includes an ultra-fast implementation of that method. Besides the platform-independent toolbox, we also provide a prototype implementation for Windows that allows the user to compare a large number of preprocessed metagenomes within a graphical environment. Our tests indicate that Taxy results compare well with taxonomic profiles obtained with other methods. However, in contrast to the existing methods, Taxy provides nearly constant profiling accuracy across read lengths and operates at an unrivaled speed. As input, Taxy accepts DNA sequences in multi-FASTA files of any size for the estimation of metagenomic profiles. The analysis of a large sequence file with a volume on the order of a Gbp typically requires less than a minute of processing time and can even be performed on a standard notebook.
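The mixture-model idea can be illustrated with a minimal sketch. It assumes that the k-mer frequency vector of a sample is approximately a weighted sum of the k-mer frequency vectors of reference genomes, and it estimates the weights with a plain EM iteration. The function names (kmer_profile, estimate_mixture), the default k, and the EM solver are assumptions made for illustration only; the Taxy toolbox itself is written for Matlab/Octave and may solve the problem differently.

```python
from collections import Counter
from itertools import product


def kmer_profile(seqs, k=2):
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Skip words containing ambiguous bases such as N.
            if set(word) <= set("ACGT"):
                counts[word] += 1
    total = sum(counts[w] for w in kmers)
    if total == 0:
        raise ValueError("no valid k-mers found")
    return [counts[w] / total for w in kmers]


def estimate_mixture(sample, references, iterations=500):
    """Estimate weights w (w >= 0, sum(w) == 1) such that
    sample[k] is approximately sum_j w[j] * references[j][k].

    Uses the standard EM update for mixture proportions with fixed
    component distributions (an illustrative choice of solver).
    """
    n = len(references)
    w = [1.0 / n] * n
    for _ in range(iterations):
        new_w = [0.0] * n
        for k, s_k in enumerate(sample):
            if s_k == 0.0:
                continue
            mix = sum(w[j] * references[j][k] for j in range(n))
            if mix == 0.0:
                continue
            for j in range(n):
                new_w[j] += s_k * w[j] * references[j][k] / mix
        total = sum(new_w)
        if total == 0.0:
            raise ValueError("sample is not explained by any reference")
        w = [x / total for x in new_w]
    return w
```

In this sketch the sample profile is computed once over all reads, so the estimate does not involve assigning individual reads to taxa, which is the property the description above attributes to the method.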
In contrast to the oligonucleotide-based Taxy method, Taxy-Pro is based on mixture model analysis of protein signatures in terms of protein domain frequencies.
GWproxy is a software tool for clustering metagenome short reads. It incorporates biological knowledge into the clustering process by means of a list of proteins associated with each read. These proteins are chosen from a reference proteome database according to their similarity to the given read, as evaluated by BLAST.
Gianluigi Folino, Fabio Gori, Mike S. M. Jetten, and Elena Marchiori. Clustering metagenome short reads using weighted proteins. In Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, 7th European Conference, EvoBIO 2009, Tübingen, Germany, April 15-17, 2009, Proceedings, volume 5483 of Lecture Notes in Computer Science, pages 152-163. Springer, 2009.
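To make the GWproxy idea of clustering reads through their associated proteins more concrete, here is a rough sketch. The description above does not spell out the clustering algorithm, so the greedy covering strategy, the function name cluster_reads_by_proteins, and the input layout (a weight per read and protein, higher meaning more similar) are all assumptions for illustration, not GWproxy's actual procedure.

```python
def cluster_reads_by_proteins(hits):
    """Group reads around shared reference proteins.

    hits: dict mapping read id -> dict of protein id -> positive weight
          (hypothetical layout; weights could be derived from BLAST scores).
    Returns a dict mapping protein id -> set of read ids. Clusters may
    overlap, and reads without any protein hit stay unclustered.
    """
    # Invert the mapping: protein -> {read: weight}.
    by_protein = {}
    for read, proteins in hits.items():
        for protein, weight in proteins.items():
            by_protein.setdefault(protein, {})[read] = weight

    uncovered = {read for read, proteins in hits.items() if proteins}
    clusters = {}

    def gain(protein):
        # Total weight this protein contributes over still-uncovered reads.
        return sum(w for r, w in by_protein[protein].items() if r in uncovered)

    while uncovered:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(by_protein), key=gain)
        if gain(best) <= 0:
            break  # remaining reads only have non-positive weights
        clusters[best] = set(by_protein[best])
        uncovered -= clusters[best]
    return clusters
```

Each selected protein acts as the label of one cluster, so the number of clusters follows from the data rather than being fixed in advance.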
MLTreeMap is a tool that employs full maximum likelihood to give insights into phylogenetic and functional properties of metagenomes and the underlying microbial communities. It does so by detecting and phylotyping a series of relevant marker genes on the submitted DNA fragments. Among these genes are protein coding phylogenetic markers, SSU rRNA genes and markers for important functional pathways.
ShotgunFunctionalizeR is an R package for functional comparison of metagenomes. The package contains tools for importing, annotating, and visualising metagenomic data produced by high-throughput shotgun sequencing. It also provides several statistical procedures for assessing functional differences between samples, both for individual genes and for entire pathways.
Erik Kristiansson (email@example.com), Daniel Dalevi (firstname.lastname@example.org)
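As an illustration of what a per-gene comparison between two metagenomes can look like, the following sketch tests whether one gene family makes up a different fraction of the annotated reads in two samples. It is not ShotgunFunctionalizeR's API (the package is written in R); the function name and the choice of a pooled two-proportion z-test are assumptions made for this example.

```python
import math


def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Two-sided z-test for a difference between two proportions.

    count_x: reads assigned to the gene family in sample x
    total_x: all annotated reads in sample x
    Returns (z, p_value) under the normal approximation.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    if se == 0.0:
        return 0.0, 1.0  # no variation at all, nothing to test
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

When many gene families are tested at once, the resulting p-values would need a multiple-testing correction before any of them is interpreted.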
Treephyler is a tool for fast taxonomic profiling of metagenomes. It combines the predictive power of tree-based approaches with the speed of signature-based approaches. Treephyler was evaluated on a real metagenome to assess its performance in comparison to previous approaches for taxonomic profiling.