Item


Hierarchical clustering based on the information bottleneck method using a control process

Clustering techniques aim organizing data into groups whose members are similar. A key element of these techniques is the definition of a similarity measure. The information bottleneck method provides us a full solution of the clustering problem with no need to define a similarity measure, since a variable is clustered depending on a control variable by maximizing the mutual information between them. In this paper, we propose a hierarchical clustering algorithm based on the information bottleneck method such that, instead of using a control variable, the different possible values of a Markov process are clustered by maximally preserving the mutual information between two consecutive states of the Markov process. These two states can be seen as the input and the output of an information channel that is used as a control process, similarly to how the variable is used as a control variable in the original information bottleneck algorithm. We present both agglomerative and divisive versions of our hierarchical clustering approach and two different applications. The first one, to quantize an image by grouping intensity bins of the image histograms, is tested on synthetic, photographic and medical images and compared with hand-labelled images, hierarchical clustering using Euclidean distance and non-negative matrix factorization methods. The second one, to cluster brain regions by grouping them depending on their connectivity, is tested on medical data. In all the applications, the obtained results demonstrate the efficacy of the method in getting clusters with high mutual information.

This work was supported by the Spanish Government (Grant No. TIN2013-47276-C6-1-R) and by the Catalan Government (Grant No. 2014-SGR-1232)

Springer Verlag

Manager: Ministerio de Economía y Competitividad (Espanya)
Generalitat de Catalunya. Agència de Gestió d’Ajuts Universitaris i de Recerca
Author: Bonmatí Coll, Ester
Bardera i Reig, Antoni
Boada, Imma
Feixas Feixas, Miquel
Sbert, Mateu
Date: 2015 August 24
Abstract: Clustering techniques aim organizing data into groups whose members are similar. A key element of these techniques is the definition of a similarity measure. The information bottleneck method provides us a full solution of the clustering problem with no need to define a similarity measure, since a variable is clustered depending on a control variable by maximizing the mutual information between them. In this paper, we propose a hierarchical clustering algorithm based on the information bottleneck method such that, instead of using a control variable, the different possible values of a Markov process are clustered by maximally preserving the mutual information between two consecutive states of the Markov process. These two states can be seen as the input and the output of an information channel that is used as a control process, similarly to how the variable is used as a control variable in the original information bottleneck algorithm. We present both agglomerative and divisive versions of our hierarchical clustering approach and two different applications. The first one, to quantize an image by grouping intensity bins of the image histograms, is tested on synthetic, photographic and medical images and compared with hand-labelled images, hierarchical clustering using Euclidean distance and non-negative matrix factorization methods. The second one, to cluster brain regions by grouping them depending on their connectivity, is tested on medical data. In all the applications, the obtained results demonstrate the efficacy of the method in getting clusters with high mutual information.
This work was supported by the Spanish Government (Grant No. TIN2013-47276-C6-1-R) and by the Catalan Government (Grant No. 2014-SGR-1232)
Format: application/pdf
Document access: http://hdl.handle.net/10256/10949
Language: eng
Publisher: Springer Verlag
Collection: info:eu-repo/semantics/altIdentifier/doi/10.1007/s10044-015-0467-1
info:eu-repo/semantics/altIdentifier/issn/1433-7541
info:eu-repo/semantics/altIdentifier/eissn/1433-755X
info:eu-repo/grantAgreement/MINECO//TIN2013-47276-C6-1-R/ES/AVANCES EN CONTENIDOS DIGITALES PARA JUEGOS SERIOS/
AGAUR/2014-2016/2014 SGR-1232
Rights: Tots els drets reservats
Subject: Anàlisi de conglomerats
Cluster analysis
Algorismes computacionals
Imatges -- Processament
Imatges -- Segmentació
Computer algorithms
Image processing
Imaging segmentation
Title: Hierarchical clustering based on the information bottleneck method using a control process
Type: info:eu-repo/semantics/article
Repository: DUGiDocs

Subjects

Authors