Combined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL): A Robust Method for Selection of Cluster Number, K
Open Access
- 19 November 2015
- journal article
- research article
- Published by Springer Nature in Scientific Reports
- Vol. 5 (1), 16971
- https://doi.org/10.1038/srep16971
Abstract
To discover new subsets (clusters) of a dataset, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of ‘dark art’, with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique, COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL), that uses multiple clustering algorithms, multiple validity metrics, and progressively larger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a dataset. COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply it to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R.
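The abstract describes the method only at a high level: cluster the data with several algorithms over a range of K, score each result with validity metrics, and repeat on progressively larger subsets, so that every (subset size, K) pair receives a score and the scores form a surface. The Python sketch below illustrates that general idea under simplifying assumptions. It is not the COMMUNAL R package or its API, and every function name in it (`stability_map`, `make_blobs`, and so on) is hypothetical. It assumes Euclidean data, two stand-in algorithms (k-means with farthest-first seeding, and average-linkage hierarchical clustering), a single validity measure (mean silhouette width), and nested random subsets, and it omits COMMUNAL's step of locally optimizing which algorithms and validity measures to use.

```python
import math
import random
from statistics import fmean


def dist(p, q):
    """Euclidean distance between two points stored as tuples."""
    return math.dist(p, q)


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-first seeding; returns one integer label per point."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(tuple(map(fmean, zip(*members))) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def average_linkage(points, k):
    """Naive O(n^3) agglomerative clustering with average (UPGMA) linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = fmean(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def silhouette(points, labels):
    """Mean silhouette width (Rousseeuw, 1987); singleton clusters score 0 by convention."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        return 0.0  # the silhouette is undefined for a single cluster
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # Mean distance to the other members of p's cluster (p itself contributes 0).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # Mean distance to the nearest other cluster.
        b = min(fmean(dist(p, q) for q in other) for g, other in groups.items() if g != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return fmean(scores)


def stability_map(points, k_values, fractions, algorithms, seed=0):
    """Hypothetical grid: (subset size, K) -> silhouette averaged over the given algorithms."""
    order = random.Random(seed).sample(points, len(points))  # one shuffle, so subsets are nested
    grid = {}
    for frac in fractions:
        subset = order[: int(frac * len(points))]
        for k in k_values:
            grid[(len(subset), k)] = fmean(silhouette(subset, algo(subset, k)) for algo in algorithms)
    return grid


def make_blobs(per_blob=20, seed=1):
    """Toy data with a known K = 3: Gaussian blobs (sd 1) centered at three well-separated points."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)) for cx, cy in centers for _ in range(per_blob)]


if __name__ == "__main__":
    grid = stability_map(make_blobs(), range(2, 6), (0.5, 0.75, 1.0), (kmeans, average_linkage))
    for (n, k), score in sorted(grid.items()):
        print(f"n={n:3d}  K={k}  mean silhouette={score:.3f}")
```

Each printed cell is one point on a surface over subset size and K, a rough analogue of the 3D stability map; a K that keeps the highest score as n grows is a candidate stable choice, and for this toy data (generated with K = 3) the full-data row favors K = 3. Averaging the silhouette across algorithms is a deliberate simplification here; the published method combines algorithms and metrics in its own way.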