ClusterEnG – Interactive Education in Clustering

ClusterEnG

:: DESCRIPTION

ClusterEnG (an acronym for Clustering Engine for Genomics) is an educational web resource for clustering and visualizing high-dimensional datasets. The resource currently offers PCA and t-SNE visualizations of the input dataset for several clustering algorithms. Users can also explore eighteen internal clustering validation measures to compare different clustering results.
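
As an illustration of what an internal clustering validation measure computes, the sketch below implements the silhouette coefficient, one widely used measure of this kind. This listing does not say which eighteen measures ClusterEnG provides, so the choice of the silhouette and the function name `silhouette_score` are assumptions made for the example only.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (illustrative sketch).

    points: list of equal-length numeric tuples
    labels: list of cluster labels, one per point
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            # Convention: a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

A higher score indicates tighter, better separated clusters, so two clusterings of the same data can be compared by scoring both label vectors.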

:: DEVELOPER

Jun S. Song’s Research Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Web browser

:: DOWNLOAD

ClusterEnG

:: MORE INFORMATION

Citation

Manjunath M, Zhang Y, Yeo SH, Sobh O, Russell N, Followell C, Bushell C, Ravaioli U, Song JS.
ClusterEnG: an interactive educational web resource for clustering and visualizing high-dimensional data.
PeerJ Comput Sci. 2018;4:e155. doi: 10.7717/peerj-cs.155. Epub 2018 May 21. PMID: 30906871; PMCID: PMC6429934.

HTSCluster 2.0.8 – Clustering High Throughput Sequencing (HTS) data

HTSCluster 2.0.8

:: DESCRIPTION

HTSCluster implements a Poisson mixture model to cluster observations (e.g., genes) in high-throughput sequencing data. Parameter estimation is performed using either the EM or CEM algorithm, and slope heuristics are used for model selection (i.e., to choose the number of clusters).
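
The idea of fitting a Poisson mixture with EM can be sketched in a few lines. The example below is a deliberately simplified, univariate version written in Python; it is not HTSCluster's R implementation, it ignores experimental conditions and library-size normalization, and it omits the slope-heuristics model selection. The function names and the quantile-based initialization are assumptions made for illustration.

```python
import math


def poisson_log_pmf(x, rate):
    """Log-probability of count x under a Poisson distribution with the given rate."""
    return x * math.log(rate) - rate - math.lgamma(x + 1)


def poisson_mixture_em(counts, n_clusters, n_iter=50):
    """Fit a univariate Poisson mixture by EM; return (weights, rates, labels)."""
    ordered = sorted(counts)
    n = len(ordered)
    # Deterministic start (an assumption): spread initial rates over data quantiles.
    rates = [max(ordered[(2 * k + 1) * n // (2 * n_clusters)], 0.5)
             for k in range(n_clusters)]
    weights = [1.0 / n_clusters] * n_clusters
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each observation.
        resp = []
        for x in counts:
            log_post = [math.log(w) + poisson_log_pmf(x, r)
                        for w, r in zip(weights, rates)]
            top = max(log_post)
            unnorm = [math.exp(v - top) for v in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights and Poisson rates from the posteriors.
        for k in range(n_clusters):
            mass = max(sum(row[k] for row in resp), 1e-12)
            weights[k] = max(mass / n, 1e-12)
            rates[k] = max(sum(row[k] * x for row, x in zip(resp, counts)) / mass, 1e-6)
    # Assign each observation to its most probable cluster.
    labels = [max(range(n_clusters), key=lambda k: row[k]) for row in resp]
    return weights, rates, labels
```

A CEM variant would replace the soft posteriors with hard cluster assignments before each M-step.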

:: DEVELOPER

Andrea Rau <andrea.rau at jouy.inra.fr>

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux / Windows / MacOsX
R

:: DOWNLOAD

HTSCluster

:: MORE INFORMATION

Citation:

Co-expression analysis of high-throughput transcriptome sequencing data with Poisson mixture models.
Rau A, Maugis-Rabusseau C, Martin-Magniette ML, Celeux G.
Bioinformatics. 2015 Jan 5. pii: btu845.

ELaSTIC 1.90 – Rapid Identification and Clustering of Similar Sequences

ELaSTIC 1.90

:: DESCRIPTION

ELaSTIC is a software suite for rapid identification and clustering of similar sequences in large-scale biological sequence collections. At its core is an efficient MinHash-based strategy that detects similar sequence pairs without aligning all sequences against each other.
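
The general MinHash idea can be sketched as follows: each sequence is reduced to a short signature of minimum hash values over its k-mers, and the fraction of matching signature positions estimates the Jaccard similarity of the two k-mer sets. This is a generic textbook sketch, not ELaSTIC's actual implementation; the k-mer length, the number of hash functions, the salted BLAKE2b hash, and all function names are assumptions chosen for illustration.

```python
import hashlib


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, k=4, num_hashes=64):
    """MinHash signature: for each salted hash function, the minimum over all k-mers."""
    shingles = kmers(seq, k)
    if not shingles:
        raise ValueError("sequence is shorter than k")
    signature = []
    for seed in range(num_hashes):
        # Each seed acts as a salt, giving an independent-looking hash function.
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "big")
            for s in shingles
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates k-mer set Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Only pairs whose estimated similarity exceeds a chosen threshold would then need a full alignment, which is what avoids the all-against-all comparison.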

:: DEVELOPER

Jaroslaw Zola

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux

:: DOWNLOAD

ELaSTIC

:: MORE INFORMATION

Citation

J. Zola,
“Constructing Similarity Graphs from Large-scale Biological Sequence Collections”,
In Proc. IEEE International Workshop on High Performance Computational Biology (HiCOMB), 2014.

ClustVis – Visualizing Clustering of Multivariate data

ClustVis

:: DESCRIPTION

ClustVis allows users to upload their own data and easily create Principal Component Analysis (PCA) plots and heatmaps.

:: DEVELOPER

Bioinformatics, Algorithmics and Data Mining Group (BIIT)

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Web browser

:: DOWNLOAD

:: MORE INFORMATION

Citation

Metsalu, Tauno and Vilo, Jaak.
ClustVis: a web tool for visualizing clustering of multivariate data using Principal Component Analysis and heatmap.
Nucleic Acids Research, 43(W1):W566–W570, 2015. doi: 10.1093/nar/gkv468.

BCCA / ACCA / DCCA – Bi-Correlation / Average Correlation / Divisive Correlation Clustering Algorithm

BCCA / ACCA / DCCA

:: DESCRIPTION

Software implementing the Bi-Correlation Clustering Algorithm (BCCA)
Software implementing the Average Correlation Clustering Algorithm (ACCA)
Software implementing the Divisive Correlation Clustering Algorithm (DCCA)

:: DEVELOPER

Dr. Rajat Kumar De

:: SCREENSHOTS

:: REQUIREMENTS

Windows

:: DOWNLOAD

BCCA / ACCA / DCCA

:: MORE INFORMATION

Citation:

Bioinformatics. 2009 Nov 1;25(21):2795-801. doi: 10.1093/bioinformatics/btp526. Epub 2009 Sep 3.
Bi-correlation clustering algorithm for determining a set of co-regulated genes.
Bhattacharya A, De RK.

J Biomed Inform. 2010 Aug;43(4):560-8. doi: 10.1016/j.jbi.2010.02.001. Epub 2010 Feb 6.
Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values.
Bhattacharya A, De RK.

Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.
Bhattacharya A, De RK.
Bioinformatics. 2008 Jun 1;24(11):1359-66. doi: 10.1093/bioinformatics/btn133

Hetero-RP – Enhanced Clustering and Classification in Integrative Genomics

Hetero-RP

:: DESCRIPTION

Hetero-RP (Heterogeneity Rescaling Pursuit) is a scalable and tuning-free preprocessing framework that weights important features more heavily than less important ones, in accordance with implicitly existing auxiliary knowledge.

:: DEVELOPER

Fengzhu Sun

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Windows / MacOsX / Linux
Python

:: DOWNLOAD

Hetero-RP

:: MORE INFORMATION

Citation

Lu YY, Lv J, Fuhrman JA, Sun F.
Towards enhanced and interpretable clustering/classification in integrative genomics.
Nucleic Acids Res. 2017 Nov 16;45(20):e169. doi: 10.1093/nar/gkx767. PMID: 28977511; PMCID: PMC5714251.

C3 – Correlation Clustering method for Cancer Mutation analysis

C3

:: DESCRIPTION

C3 (Cancer Correlation Clustering) identifies cancer mutation patterns from a patient cohort by leveraging mutual exclusivity of mutations, patient coverage, and driver network concentration principles.
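
Two of the quantities named above, patient coverage and mutual exclusivity, can be illustrated with a small sketch. The definitions used here (coverage as the fraction of patients with at least one mutation in a gene set, exclusivity as the fraction of covered patients with exactly one) are common simple choices and are assumptions for this example; they are not taken from C3's objective function, and the function and variable names are hypothetical.

```python
from collections import Counter


def coverage_and_exclusivity(mutations, gene_set, n_patients):
    """Score a candidate gene set on a cohort (illustrative definitions only).

    mutations: dict mapping gene name -> set of patient IDs carrying a mutation
    gene_set: iterable of gene names forming the candidate cluster
    n_patients: total number of patients in the cohort
    """
    # Count, per patient, how many genes of the set are mutated.
    hits = Counter()
    for gene in gene_set:
        for patient in mutations.get(gene, ()):
            hits[patient] += 1
    covered = len(hits)
    exclusive = sum(1 for count in hits.values() if count == 1)
    coverage = covered / n_patients
    exclusivity = exclusive / covered if covered else 0.0
    return coverage, exclusivity


# Hypothetical toy cohort of five patients.
toy_mutations = {"G1": {"p1", "p2"}, "G2": {"p3"}, "G3": {"p2", "p4"}}
toy_scores = coverage_and_exclusivity(toy_mutations, ["G1", "G2", "G3"], 5)
```

In the toy cohort, four of five patients are covered and three of those four carry exactly one mutation in the set, so `toy_scores` is `(0.8, 0.75)`.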

:: DEVELOPER

Ma Laboratory

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux / Windows / MacOsX
Python

:: DOWNLOAD

:: MORE INFORMATION

Citation

A new correlation clustering method for cancer mutation analysis.
Hou JP, Emad A, Puleo GJ, Ma J, Milenkovic O.
Bioinformatics. 2016 Dec 15;32(24):3717-3728.

PFClust – Novel Parameter Free Clustering algorithm

PFClust

:: DESCRIPTION

PFClust is a partitioning-based clustering algorithm that, unlike many widely used clustering algorithms, automatically proposes an optimal number of clusters for the data.

:: DEVELOPER

John Mitchell Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux

:: DOWNLOAD

PFClust

:: MORE INFORMATION

Citation:

PFClust: an optimised implementation of a parameter-free clustering algorithm.
Musayeva K, Henderson T, Mitchell JB, Mavridis L.
Source Code Biol Med. 2014 Feb 4;9(1):5. doi: 10.1186/1751-0473-9-5.

FlowerPower – Clustering algorithm for Identification of Global Homologs

FlowerPower

:: DESCRIPTION

FlowerPower is a clustering algorithm designed for the identification of global homologs. It clusters sequences iteratively, but rather than using a single HMM or profile to expand the cluster, it identifies subfamilies using the SCI-PHY algorithm and then selects and aligns new homologs using subfamily hidden Markov models. FlowerPower has been shown to outperform BLAST, PSI-BLAST and the UCSC SAM-Target 2K methods at discriminating between proteins in the same domain architecture class and those having different overall domain structures.

:: DEVELOPER

The Berkeley Phylogenomics Group

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux
Python

:: DOWNLOAD

FlowerPower

:: MORE INFORMATION

Citation

BMC Evol Biol. 2007 Feb 8;7 Suppl 1:S12.
FlowerPower: clustering proteins into domain architecture classes for phylogenomic inference of protein function.
Krishnamurthy N, Brown D, Sjölander K.

TimeClust 1.3 – Clustering tool for Gene Expression Time Series

TimeClust 1.3

:: DESCRIPTION

TimeClust is a user-friendly software package for clustering genes according to their temporal expression profiles. It can be conveniently used to analyze data obtained from DNA microarray time-course experiments. It implements two original algorithms specifically designed for clustering short time series, together with hierarchical clustering and self-organizing maps.

:: DEVELOPER

laboratorio di Bioinformatica e Biologia Sintetica – Univ. of Pavia

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux / Windows

:: DOWNLOAD

TimeClust

:: MORE INFORMATION

Citation

Bioinformatics. 2008 Feb 1;24(3):430-2. Epub 2007 Dec 6.
TimeClust: a clustering tool for gene expression time series.
Magni P, Ferrazzi F, Sacchi L, Bellazzi R.