
Publication

Denoised Kernel Spectral Data Clustering

Book Contribution - Book Chapter Conference Contribution

© 2016 IEEE. Kernel Spectral Clustering (KSC) solves a weighted kernel principal component analysis problem in a primal-dual optimization framework. It builds an unsupervised model on a small subset of the data using the dual solution of the optimization problem. This gives KSC a powerful out-of-sample extension property, leading to good cluster generalization with respect to unseen data points. However, in the presence of noise that causes the data to overlap, the technique often fails to generalize well. In this paper, we propose a two-step process for clustering noisy data. We first denoise the data using kernel principal component analysis (KPCA) with a recently proposed Model selection criterion based on point-wise Distance Distributions (MDD) to recover the underlying information in the data. We then apply the KSC technique to the denoised data to obtain good-quality clusters. One advantage of model-based techniques is that the same training and validation sets can be used for both denoising and clustering. We found that the kernel bandwidth parameter obtained from MDD for KPCA also works well for KSC and, in combination with the optimal number of clusters k, produces good-quality clusters. We compare the proposed approach with standard KSC and with KSC preceded by KPCA tuned by a heuristic based on reconstruction error, on several synthetic and real-world datasets, to showcase the effectiveness of the proposed approach.
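The two-step process described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the MDD model selection criterion is not implemented, so the bandwidth `sigma` and the number of components are fixed by hand; the pre-image step uses the standard fixed-point iteration for Gaussian kernels, which the abstract does not specify; and the KSC part is restricted to k = 2 clusters with sign-based assignment, using the dual eigenvalue problem as it is commonly stated in the KSC literature. All function names (`rbf_kernel`, `kpca_denoise`, `ksc_train`, `ksc_predict`) are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def kpca_denoise(X, sigma, n_components, n_iter=30):
    """Project each point onto the leading kernel principal components and
    map it back to input space with a fixed-point pre-image iteration."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    _, U = np.linalg.eigh(H @ K @ H)             # eigenvalues in ascending order
    U = U[:, -n_components:]                     # leading eigenvectors
    G = U @ U.T                                  # row i: expansion coefficients of projected point i
    G = G + (1.0 - G.sum(axis=1, keepdims=True)) / n   # undo feature-space centering
    Z = X.copy()
    for _ in range(n_iter):
        W = G * rbf_kernel(Z, X, sigma)
        denom = W.sum(axis=1, keepdims=True)
        ok = np.abs(denom) > 1e-12               # keep the old point if the update is ill-defined
        Z = np.where(ok, (W @ X) / np.where(ok, denom, 1.0), Z)
    return Z

def ksc_train(X, sigma):
    """Two-cluster KSC sketch: leading eigenvector of D^{-1} M_D Omega."""
    n = X.shape[0]
    Omega = rbf_kernel(X, X, sigma)
    d_inv = 1.0 / Omega.sum(axis=1)              # diagonal of D^{-1}
    s = d_inv.sum()
    M = np.eye(n) - np.outer(np.ones(n), d_inv) / s   # weighted centering M_D
    vals, vecs = np.linalg.eig(d_inv[:, None] * (M @ Omega))
    alpha = np.real(vecs[:, np.argmax(np.real(vals))])
    b = -(d_inv @ (Omega @ alpha)) / s           # bias term
    return alpha, b

def ksc_predict(X_new, X_train, alpha, b, sigma):
    """Out-of-sample extension: the sign of the score variable gives the cluster."""
    e = rbf_kernel(X_new, X_train, sigma) @ alpha + b
    return (e > 0).astype(int)

# Synthetic example: two Gaussian blobs with additive noise (illustrative data only).
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 100)
clean = np.where(truth[:, None] == 0, [-3.0, 0.0], [3.0, 0.0])
X = clean + 0.5 * rng.standard_normal(clean.shape)

sigma = 1.5                                      # hand-picked; the paper selects it with MDD
X_den = kpca_denoise(X, sigma, n_components=4)   # step 1: KPCA denoising

train = rng.choice(len(X), size=50, replace=False)   # small training subset
alpha, b = ksc_train(X_den[train], sigma)            # step 2: KSC on denoised data
labels = ksc_predict(X_den, X_den[train], alpha, b, sigma)

# Cluster labels are only defined up to a swap of 0 and 1.
accuracy = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"agreement with ground truth: {accuracy:.2f}")
```

The same `sigma` is deliberately reused for denoising and clustering, mirroring the abstract's observation about sharing the bandwidth; extending the sketch to more than two clusters would require k - 1 eigenvectors and a codebook of sign patterns, which is omitted here.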
Book: Proc. of the International Joint Conference on Neural Networks
Pages: 3709 - 3716
ISBN: 9781509006199
Publication year: 2016
BOF-keylabel: yes
IOF-keylabel: yes
Authors from: Higher Education
Accessibility: Closed