VICUNA is a de novo assembly program targeting populations with high mutation rates. It creates a single linear representation of the mixed population on which intra-host variants can be mapped. For clinical samples with heavy contamination (e.g., >95%), VICUNA can leverage existing genomes, if available, to assemble only target-like reads. After initial assembly, it can also use existing genomes to perform guided merging of contigs. For each data set (e.g., Illumina paired reads, 454), VICUNA outputs consensus sequence(s) and the corresponding multiple sequence alignment of constituent reads.
Xiao Yang, Patrick Charlebois, Sante Gnerre, Matthew G Coole, Niall J. Lennon, Joshua Z. Levin, James Qu, Elizabeth M. Ryan, Michael C. Zody, and Matthew R. Henn (2012) De novo assembly of highly diverse viral populations.
BMC Genomics 13:475.
AV454 (AssembleViral454) is an assembler, based on the ARACHNE package, designed for small and non-repetitive genomes sequenced at high depth. It was specifically designed to assemble read data generated from a mixed population of viral genomes. Reads need not be paired, and it is assumed that no sequence repeat in the genome would be large enough to fully contain an average read.
QSRA (Quality-value-guided Short Read Assembler) is a de novo short read assembler that uses base quality values to guide assembly. QSRA generally produced the highest genomic coverage while running faster than VCAKE. It is also highly competitive in longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA is a step closer to the goal of de novo assembly of complex genomes: it improves on the original VCAKE algorithm by drastically reducing runtimes and by making the assembly algorithm more robust through additional error handling.
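The N50/N80 figures mentioned above are summary statistics over contig lengths. As a minimal sketch (the function name `nx_length` and the sample lengths are illustrative assumptions, not part of QSRA), the Nx length can be computed as the length of the contig at which the running total of lengths, sorted longest first, reaches x% of all assembled bases:

```python
def nx_length(contig_lengths, percent=50):
    """Return the Nx contig length (N50 when percent=50, N80 when percent=80).

    Contigs are sorted longest first; the result is the length of the contig
    at which the cumulative length first reaches `percent` % of the total.
    Integer arithmetic is used to avoid floating-point threshold errors.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 100 >= percent * total:
            return length


# Hypothetical contig lengths, for illustration only.
lengths = [100, 80, 60, 40, 20]
n50 = nx_length(lengths, 50)
n80 = nx_length(lengths, 80)
```

A larger N50 means that more of the assembly sits in long contigs, which is why it is commonly reported next to the longest contig length.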
Tag-DB provides a user-friendly, lightweight, open-source graphical user interface for running the de novo sequencing algorithm PepNovo+. It can also search the derived short amino acid sequences (so-called tags) against a protein database to retrieve peptide and protein identifications.
PASQUAL (PArallel SeQUence AssembLer) is designed for shared memory parallelism, using OpenMP for its good tradeoff between performance and programmer productivity. Shared memory parallelism has become mainstream with the widespread production of multicore commodity processors. PASQUAL follows the overlap-layout-consensus (OLC) approach and uses a careful combination of tailored algorithms and data structures to obtain high-quality solutions.
Xing Liu, Pushkar R. Pande, Henning Meyerhenke, and David A. Bader.
PASQUAL: A Parallel de novo Assembler for Next Generation Genome Sequencing.
Submitted for journal publication, 2011.
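To make the overlap and layout steps of the OLC approach mentioned above concrete, here is a toy, single-threaded sketch. It is not PASQUAL's algorithm or data structures: the function names, the exact-match suffix/prefix overlap, the greedy merge order, and the `min_overlap` threshold are all assumptions chosen for brevity, and the consensus step is omitted.

```python
def overlap_length(a, b, min_overlap=3):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no exact overlap of at least `min_overlap` bases exists.
    """
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_layout(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        # Overlap step: all-against-all suffix/prefix comparison.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps to merge
        # Layout step: join the best pair, keeping the overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical error-free reads, for illustration only.
reads = ["ATGCGT", "GCGTAC", "TACGGA"]
contigs = greedy_layout(reads)
```

The all-against-all comparison is quadratic in the number of reads, and each pair can be evaluated independently of the others, which is the kind of work that lends itself to shared memory parallelism.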
GS De Novo Assembler (Newbler) is a software package for de novo DNA sequence assembly. It is designed specifically for assembling sequence data generated by the 454 GS-series of pyrosequencing platforms sold by 454 Life Sciences, a Roche company.
SHORTY is targeted at de novo assembly of microreads with mate pair information and sequencing errors. It introduces novel approaches and features for addressing the short read assembly problem.
Lutefisk is software for the de novo interpretation of peptide CID spectra. High quality tandem mass spectra of peptides are often obtained for which no exact database match can be made. Consequently, we are faced with the question of whether the protein under investigation is novel, or if the non-matching spectra are due to less exciting prospects such as inter-species variation, database sequence errors, or unexpected proteolytic cleavages. To begin addressing this problem we perform a de novo interpretation of the CID spectra using the computer program Lutefisk; however, any such interpretations nearly always yield multiple sequence candidates, where it is often difficult or impossible to distinguish the correct sequence from the incorrect ones. The variations between candidate sequences are often minor and typically involve dipeptide inversions, swapping of dipeptides of the same mass, replacements of dipeptides with single amino acids of the same mass, and replacements of amino acids by dipeptides of the same mass. We use the multiple sequence candidates produced by Lutefisk as query sequences in a second program, CIDentify. CIDentify is a version of Bill Pearson’s FASTA algorithm modified by Alex Taylor to accommodate MS nuances such as multiple query sequences, ambiguous dipeptides and isobaric mass equivalencies.
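The same-mass substitutions described above can be illustrated with a short sketch. It is not code from Lutefisk or CIDentify: the function names and the 0.01 Da tolerance are assumptions, and the table holds standard monoisotopic residue masses rounded to five decimals. The sketch lists the dipeptides whose combined residue mass matches that of a single amino acid.

```python
from itertools import product

# Standard monoisotopic residue masses in Da, rounded to five decimals.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_mass(sequence):
    """Sum of residue masses for a peptide fragment (no terminal groups)."""
    return sum(MONO[aa] for aa in sequence)


def isobaric_dipeptides(amino_acid, tol=0.01):
    """Dipeptides whose mass is within `tol` Da of one amino acid residue.

    `tol` is an assumed tolerance; real instruments differ in mass accuracy.
    """
    target = MONO[amino_acid]
    return sorted(
        a + b
        for a, b in product(MONO, repeat=2)
        if abs(MONO[a] + MONO[b] - target) <= tol
    )


asn_equivalents = isobaric_dipeptides("N")  # Gly-Gly has the mass of Asn
gln_equivalents = isobaric_dipeptides("Q")  # Ala-Gly and Gly-Ala match Gln
```

Because a spectrum constrains masses rather than letters, candidates that differ only by such substitutions (or by an inverted dipeptide such as AG versus GA) cannot be separated on mass alone, which is why several candidate sequences are passed on to the database search.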