CAST is a set of tools that apply “compressive genomics”: they exploit redundancy in genomic data sets by compressing the data in a way that allows direct computation on the compressed form. It includes two prototype implementations of alignment and sequence search algorithms, Compression-accelerated BLAST (CaBLAST) and Compression-accelerated BLAT (CaBLAT).
CAST is a novel algorithm for low-complexity region detection and selective masking. The algorithm is based on multiple-pass Smith-Waterman comparison of the query sequence against twenty homopolymers, one per amino acid, with infinite gap penalties. The algorithm outputs two things: the masked query sequence, for use in further analysis such as database searches, and the regions of low complexity it detected.
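The idea can be illustrated with a minimal sketch, which is not the published implementation. With infinite gap penalties, a Smith-Waterman alignment against a homopolymer cannot contain gaps, so it reduces to finding the highest-scoring contiguous segment of the query. The description above does not give a scoring scheme, so the match/mismatch scores, the threshold value, and the function names `best_segment` and `mask_low_complexity` are all assumptions made for illustration; a real tool would more likely score residues with a substitution matrix.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one homopolymer per residue type


def best_segment(seq, residue, match=2, mismatch=-1):
    """Best ungapped local alignment of seq against a homopolymer of `residue`.

    Returns (score, start, end) with `end` exclusive. The match/mismatch
    scores are hypothetical placeholders, not values from the source.
    """
    best = (0, 0, 0)
    running, start = 0, 0
    for i, ch in enumerate(seq):
        if running <= 0:
            # A non-positive prefix never helps a local alignment: restart here.
            running, start = 0, i
        running += match if ch == residue else mismatch
        if running > best[0]:
            best = (running, start, i + 1)
    return best


def mask_low_complexity(seq, threshold=10):
    """Repeatedly mask the strongest homopolymer-like region (multiple passes).

    `threshold` is an assumed cutoff. Only residues of the biased type are
    replaced with 'X' (selective masking); other residues are left intact.
    Returns (masked_sequence, regions), where each region is
    (residue, start, end, score).
    """
    masked = list(seq)
    regions = []
    while True:
        current = "".join(masked)
        # Compare the current sequence against all twenty homopolymers.
        hits = [(best_segment(current, aa), aa) for aa in AMINO_ACIDS]
        (score, start, end), residue = max(hits, key=lambda hit: hit[0][0])
        if score < threshold:
            break
        for i in range(start, end):
            if masked[i] == residue:
                masked[i] = "X"
        regions.append((residue, start, end, score))
    return "".join(masked), regions


if __name__ == "__main__":
    masked, regions = mask_low_complexity("MKTAYQQQQQQQQQQLVDE")
    print(masked)
    print(regions)
```

Each pass masks at least one residue, so the loop terminates. Because only the over-represented residue type is masked, a region such as `QQQAQQQ` keeps its `A`, which is what distinguishes selective masking from masking the whole region.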