CAST 1.2.2 – Compression-accelerated BLAST and BLAT

CAST 1.2.2

:: DESCRIPTION

CAST is a set of tools that apply “compressive genomics”: they exploit redundancy in genomic data sets by compressing the data in a way that allows computation directly on the compressed representation. Compression-accelerated BLAST (CaBLAST) and Compression-accelerated BLAT (CaBLAT) are two prototype implementations of sequence alignment and search algorithms built on this idea.
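The toy sketch below illustrates the core idea of computing on compressed data. It assumes exact deduplication of fixed-size chunks, which is a deliberate simplification and not CaBLAST's or CaBLAT's actual algorithm. The names `build_store` and `search` are hypothetical. A query is scanned only against the unique chunks, and each hit is then expanded to every sequence that shares that chunk, so redundant data is searched only once.

```python
from collections import defaultdict

def build_store(seqs, chunk=8):
    """Toy 'compressed' store (assumption: exact, fixed-size chunk dedup).
    Each distinct chunk is kept once; links record where it occurs."""
    unique = {}                 # chunk text -> chunk id
    links = defaultdict(list)   # chunk id -> [(sequence name, offset), ...]
    for name, seq in seqs.items():
        for off in range(0, len(seq), chunk):
            piece = seq[off:off + chunk]
            cid = unique.setdefault(piece, len(unique))
            links[cid].append((name, off))
    return unique, links

def search(query, unique, links):
    """Scan only the unique chunks, then expand each hit through the links."""
    hits = []
    for piece, cid in unique.items():
        pos = piece.find(query)
        while pos != -1:
            hits.extend((name, off + pos) for name, off in links[cid])
            pos = piece.find(query, pos + 1)
    return sorted(hits)

seqs = {"a": "ACGTACGTTTTTGGGG", "b": "ACGTACGTCCCCAAAA"}
unique, links = build_store(seqs)
print(search("TACG", unique, links))
```

Matches that span a chunk boundary are missed in this sketch. Handling those cases, and linking near-identical rather than only identical regions, is where a real tool does most of its work.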

:: DEVELOPER

Berger Lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Mac OS X
  • C++ Compiler
  • NCBI C++ Toolkit
  • BLAST+
  • BLAT

:: DOWNLOAD

  CAST

:: MORE INFORMATION

Citation

Nat Biotechnol. 2012 Jul 10;30(7):627-30. doi: 10.1038/nbt.2241.
Compressive genomics.
Loh PR, Baym M, Berger B.

BARCODE – A Fast Lossless Read Compression Tool Based on Bloom Filters

BARCODE

:: DESCRIPTION

BARCODE is built on cascading Bloom filters. It uses a reference genome to compress reads efficiently but avoids read alignment entirely, which greatly reduces the time needed to compress.
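The sketch below shows the Bloom-filter building block and the alignment-free lookup it enables. It assumes that reads are inserted into the filter and later recovered by querying every read-length window of the reference. Windows that pass the test can include false positives, which a cascade of further filters is meant to resolve; that cascade, and the handling of reads absent from the reference, are omitted here. `BloomFilter` and `candidate_reads` are hypothetical names, and SHA-256 hashing is an arbitrary choice.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item in an m-bit array."""

    def __init__(self, m_bits=1 << 16, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from SHA-256 digests with different salts.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # No false negatives; false positives are possible.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def candidate_reads(reference, bf, read_len):
    """Query every read-length window of the reference against the filter
    (no alignment). The result may contain false-positive windows."""
    return {reference[i:i + read_len]
            for i in range(len(reference) - read_len + 1)
            if reference[i:i + read_len] in bf}

bf = BloomFilter()
for read in ["ACGTTG", "CAACGG", "CCTTAG"]:
    bf.add(read)
print(candidate_reads("ACGTTGCAACGGTACCTTAG", bf, 6))
```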

:: DEVELOPER

Ron Shamir Lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Windows / Linux / Mac OS X
  • Python

:: DOWNLOAD

  BARCODE

:: MORE INFORMATION

Citation

BMC Bioinformatics. 2014;15 Suppl 9:S7. doi: 10.1186/1471-2105-15-S9-S7. Epub 2014 Sep 10.
Fast lossless compression via cascading Bloom filters.
Rozov R, Shamir R, Halperin E.

HapZipper – Compression Scheme for HapMap Phase III Phased Data

HapZipper

:: DESCRIPTION

HapZipper is a lossless compression tool tailored to HapMap data. It compresses such data beyond the benchmarks set by general-purpose tools such as gzip, bzip2 and lzma.
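To reproduce the kind of general-purpose baseline that HapZipper is compared against, the sketch below measures compressed sizes with Python's standard `gzip`, `bz2` and `lzma` modules. It also checks that each codec round-trips the data. The input shown is a hypothetical stand-in (repeated rows of 0/1 alleles), not the actual HapMap file layout, and `baseline_sizes` is a hypothetical name.

```python
import bz2
import gzip
import lzma

# General-purpose codecs: name -> (compress, decompress)
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

def baseline_sizes(data: bytes) -> dict:
    """Return the compressed size of `data` under each codec,
    after verifying that the codec restores the input exactly."""
    sizes = {}
    for name, (comp, decomp) in CODECS.items():
        blob = comp(data)
        if decomp(blob) != data:
            raise ValueError(f"{name} did not round-trip")
        sizes[name] = len(blob)
    return sizes

# Hypothetical toy input: identical rows of phased 0/1 alleles.
print(baseline_sizes(b"0110100110\n" * 1000))
```

A specialized tool is judged by how far it gets below these numbers on the same input.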

:: DEVELOPER

Joel Bader Lab

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Windows
  • JRE

:: DOWNLOAD

  HapZipper

:: MORE INFORMATION

Citation

Nucleic Acids Res. 2012 Nov 1;40(20):e159. doi: 10.1093/nar/gks709.
HapZipper: sharing HapMap populations just got easier.
Chanda P, Elhaik E, Bader JS.

MINCE v0.5.0 – Bucketing-based Reference-free Compression

MINCE v0.5.0

:: DESCRIPTION

MINCE is a technique for encoding collections of short reads, by grouping them into buckets, so that they can be compressed more effectively by a standard compressor such as lzip.
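The sketch below illustrates bucketing-before-compression. Reads that share a k-mer are written next to each other, which gives a generic compressor more nearby redundancy to exploit. It assumes a simple stand-in rule, bucketing by each read's lexicographically smallest k-mer (its minimizer), which is not necessarily MINCE's data-dependent bucketing. Python's `lzma` module (LZMA, the algorithm lzip also uses) stands in for lzip. `minimizer`, `bucket_reads`, `encode` and `decode` are hypothetical names.

```python
import lzma
from collections import defaultdict

def minimizer(read, k):
    """Lexicographically smallest k-mer of the read (read must have length >= k)."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))

def bucket_reads(reads, k=4):
    """Assumed bucketing rule: reads with the same minimizer share a bucket."""
    buckets = defaultdict(list)
    for r in reads:
        buckets[minimizer(r, k)].append(r)
    return buckets

def encode(reads, k=4):
    """Write reads bucket by bucket, then compress with LZMA.
    The original read order is NOT preserved."""
    buckets = bucket_reads(reads, k)
    text = "\n".join(r for key in sorted(buckets) for r in buckets[key])
    return lzma.compress(text.encode())

def decode(blob):
    return lzma.decompress(blob).decode().split("\n")

reads = ["ACGTACGT", "GGGGCCCC", "TTACGTAC", "CCCCGGGG"]
print(dict(bucket_reads(reads)))
```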

:: DEVELOPER

Kingsford Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Mac OS X

:: DOWNLOAD

  MINCE

:: MORE INFORMATION

Citation

Bioinformatics. 2015 Sep 1;31(17):2770-7. doi: 10.1093/bioinformatics/btv248. Epub 2015 Apr 24.
Data-dependent bucketing improves reference-free compression of sequencing reads.
Patro R, Kingsford C.

Referee – Rapid, Separable Compression for Sequence Alignments

Referee

:: DESCRIPTION

Referee is a command-line tool that takes sequence alignment SAM files and compresses them in a lossless manner.
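The sketch below illustrates "separable" compression in general terms. Each mandatory SAM field goes into its own independently compressed stream, so an analysis that needs only positions can decompress just that stream. This is an assumption-laden illustration, not Referee's format: it uses `zlib` as the back end, skips header lines, and ignores optional tag columns, so unlike Referee it is not lossless. `compress_separably` and `read_field` are hypothetical names.

```python
import zlib

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

def compress_separably(sam_lines):
    """Split alignment lines into one stream per mandatory field and
    compress each stream independently (headers and tags are dropped)."""
    columns = {f: [] for f in SAM_FIELDS}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines in this sketch
        parts = line.rstrip("\n").split("\t")
        for field, value in zip(SAM_FIELDS, parts):
            columns[field].append(value)
    return {f: zlib.compress("\n".join(v).encode()) for f, v in columns.items()}

def read_field(streams, field):
    """Decompress a single field without touching the other streams."""
    return zlib.decompress(streams[field]).decode().split("\n")

sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t250\t60\t4M\t*\t0\t0\tTTGA\tHHHH",
]
print(read_field(compress_separably(sam), "POS"))
```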

:: DEVELOPER

Kingsford Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Mac OS X

:: DOWNLOAD

  Referee

:: MORE INFORMATION

Citation

Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics (2015), pages 194-201.
Rapid, separable compression enables fast analyses of sequence alignments.
Filippova D, Kingsford C.

Kpath 0.6.3 – Statistical Reference-based Compression for Short Reads

Kpath 0.6.3

:: DESCRIPTION

Kpath (PathEnc) is reference-based compression software for short-read data sets.
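The sketch below illustrates the general idea of encoding reads as paths guided by a reference, under stated assumptions; it is not Kpath's encoder. It builds a simple model of which base follows each k-mer in the reference. Each read is then stored as its first k-mer plus, for every later base, that base's rank among the successors seen in the reference (0 = most frequent). Reads that follow the reference produce runs of zeros, which a downstream entropy coder (omitted here) would compress to very few bits. Only A, C, G and T are handled. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_model(reference, k):
    """Count which base follows each k-mer in the reference."""
    nxt = defaultdict(Counter)
    for i in range(len(reference) - k):
        nxt[reference[i:i + k]][reference[i + k]] += 1
    return nxt

def _ranked(counter):
    # Most frequent successor first (ties alphabetical), then unseen bases.
    seen = sorted(counter, key=lambda b: (-counter[b], b))
    return seen + [b for b in "ACGT" if b not in counter]

def encode(read, model, k):
    """Return (first k-mer, rank of each following base under the model)."""
    ranks = [_ranked(model[read[i - k:i]]).index(read[i])
             for i in range(k, len(read))]
    return read[:k], ranks

def decode(start, ranks, model, k):
    """Walk the same model, choosing the successor at each stored rank."""
    read = start
    for r in ranks:
        read += _ranked(model[read[-k:]])[r]
    return read

model = build_model("ACGTACGTAC", 3)
print(encode("ACGTACGT", model, 3))  # matches the reference: all ranks 0
print(encode("ACGTTCGT", model, 3))  # a mismatch yields nonzero ranks
```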

:: DEVELOPER

Kingsford Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Mac OS X
  • Go

:: DOWNLOAD

  Kpath

:: MORE INFORMATION

Citation

Bioinformatics. 2015 Feb 2. pii: btv071.
Reference-based compression of short-read sequences using path encoding.
Kingsford C, Patro R.

CODOC 0.0.2 – Analysis and Compression of Depth of Coverage Signals

CODOC 0.0.2

:: DESCRIPTION

CODOC is a compressed data format and API for depth-of-coverage data from sequencing experiments.
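CODOC's own format is not reproduced here. The sketch below is a hedged illustration of why coverage signals compress well: long stretches of identical depth collapse into single (depth, run length) pairs. It uses run-length encoding, a generic technique chosen for illustration, and answers a point query directly on the runs without expanding the signal. `rle_encode`, `rle_decode` and `value_at` are hypothetical names.

```python
def rle_encode(depths):
    """Collapse runs of equal depth into (depth, run_length) pairs."""
    runs = []
    for d in depths:
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1
        else:
            runs.append([d, 1])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (depth, run_length) pairs back into per-position depths."""
    return [d for d, n in runs for _ in range(n)]

def value_at(runs, pos):
    """Depth at 0-based position `pos`, read directly from the runs."""
    start = 0
    for depth, n in runs:
        if pos < start + n:
            return depth
        start += n
    raise IndexError(pos)

depths = [0, 0, 0, 5, 5, 7, 7, 7, 7, 0]
runs = rle_encode(depths)
print(runs)
print(value_at(runs, 6))
```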

:: DEVELOPER

Niko Popitsch

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Windows / Mac OS X
  • Java

:: DOWNLOAD

  CODOC

:: MORE INFORMATION

Citation

Bioinformatics. 2014 May 28. pii: btu362. [Epub ahead of print]
CODOC: Efficient Access, Analysis and Compression of Depth of Coverage Signals.
Popitsch N.

DSRC 2.0 RC2 – DNA Sequence Reads Compression

DSRC 2.0 RC2

:: DESCRIPTION

DSRC is an application for compressing DNA sequencing reads stored in FASTQ format. Such files can be huge, often several to tens of gigabytes, so a robust compression tool is needed. General-purpose compressors such as gzip or bzip2 are commonly used for this purpose, but a specialized tool can exploit the structure of FASTQ data to compress it better.
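Specialized FASTQ compressors commonly treat the header, sequence and quality fields separately because each has very different statistics. The sketch below shows only that stream separation, with `zlib` as a stand-in back end. DSRC's own models are far more elaborate and are not reproduced. It assumes 4-line records whose `+` separator line is bare. `split_fastq`, `compress_streams` and `decompress_streams` are hypothetical names.

```python
import zlib

def split_fastq(text):
    """Split 4-line FASTQ records into header, sequence and quality streams."""
    lines = text.strip("\n").split("\n")
    if len(lines) % 4:
        raise ValueError("expected 4 lines per FASTQ record")
    return lines[0::4], lines[1::4], lines[3::4]

def compress_streams(text):
    """Compress each stream independently so each gets its own statistics."""
    return [zlib.compress("\n".join(s).encode()) for s in split_fastq(text)]

def decompress_streams(blobs):
    """Rebuild the FASTQ text (assumes a bare '+' separator line)."""
    headers, seqs, quals = (zlib.decompress(b).decode().split("\n") for b in blobs)
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in zip(headers, seqs, quals))

fq = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nHHHH\n"
print(split_fastq(fq))
```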

:: DEVELOPER

REFRESH Bioinformatics Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux / Mac OS X / Windows
  • C++ Compiler

:: DOWNLOAD

  DSRC

:: MORE INFORMATION

Citation

Bioinformatics. 2014 Apr 18. pii: btu208.
DSRC 2: Industry-oriented compression of FASTQ files.
Roguski L, Deorowicz S.

Bioinformatics. 2011;27(6):860-862.
Compression of DNA sequence reads in FASTQ format.
Deorowicz S, Grabowski S.

Quip 1.1.8 – Aggressive Compression of FASTQ and SAM/BAM files

Quip 1.1.8

:: DESCRIPTION

Quip compresses next-generation sequencing data in the FASTQ and SAM/BAM formats with extreme prejudice.

:: DEVELOPER

Daniel C. Jones

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux

:: DOWNLOAD

  Quip

:: MORE INFORMATION

Citation

Nucleic Acids Res. 2012 Dec;40(22):e171. doi: 10.1093/nar/gks754. Epub 2012 Aug 16.
Compression of next-generation sequencing reads aided by highly efficient de novo assembly.
Jones DC, Ruzzo WL, Peng X, Katze MG.

GRS 1.0 – Compression Tool for Efficient Storage of Genome Resequencing Data

GRS 1.0

:: DESCRIPTION

GRS is a compression tool for storing and analyzing genome resequencing data. It can process genome sequence data without using reference SNPs or other sequence-variation information, and it automatically rebuilds the individual genome sequence from the reference genome sequence.
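GRS's actual algorithm is not reproduced here. The sketch below illustrates the underlying idea of reference-based storage: keep only the edits that turn the reference into the individual sequence, then rebuild the individual sequence by applying them. It uses Python's `difflib.SequenceMatcher`, which is fine for short toy strings but far too slow for real genomes. `autojunk=False` disables difflib's popularity heuristic, which would otherwise treat frequent characters as junk in sequences of 200 or more items. `diff_against_reference` and `rebuild` are hypothetical names.

```python
from difflib import SequenceMatcher

def diff_against_reference(reference, individual):
    """Keep only the edits (substitutions and indels) as
    (op, ref_start, ref_end, new_bases) tuples."""
    sm = SequenceMatcher(None, reference, individual, autojunk=False)
    return [(tag, i1, i2, individual[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def rebuild(reference, edits):
    """Apply the stored edits to the reference to recover the individual sequence."""
    out, pos = [], 0
    for _tag, i1, i2, new in edits:
        out.append(reference[pos:i1])  # unchanged stretch copied from the reference
        out.append(new)                # replaced/inserted bases ('' for a deletion)
        pos = i2
    out.append(reference[pos:])
    return "".join(out)

ref = "ACGTACGTACGTACGT"
ind = "ACGTTCGTACGTAACGT"  # one substitution and one insertion
edits = diff_against_reference(ref, ind)
print(edits)
```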

:: DEVELOPER

Congmao Wang, Plant Developmental Biology Laboratory, Shanghai Jiao Tong University

:: SCREENSHOTS

N/A

:: REQUIREMENTS

  • Linux

:: DOWNLOAD

  GRS

:: MORE INFORMATION

Citation

Nucleic Acids Res. 2011 Apr;39(7):e45. doi: 10.1093/nar/gkr009. Epub 2011 Jan 25.
A novel compression tool for efficient storage of genome resequencing data.
Wang C, Zhang D.