Item


Clustering compositional data trajectories

Our essay aims at studying suitable statistical methods for the clustering ofcompositional data in situations where observations are constituted by trajectories ofcompositional data, that is, by sequences of composition measurements along a domain.Observed trajectories are known as “functional data” and several methods have beenproposed for their analysis.In particular, methods for clustering functional data, known as Functional ClusterAnalysis (FCA), have been applied by practitioners and scientists in many fields. To ourknowledge, FCA techniques have not been extended to cope with the problem ofclustering compositional data trajectories. In order to extend FCA techniques to theanalysis of compositional data, FCA clustering techniques have to be adapted by using asuitable compositional algebra.The present work centres on the following question: given a sample of compositionaldata trajectories, how can we formulate a segmentation procedure giving homogeneousclasses? To address this problem we follow the steps described below.First of all we adapt the well-known spline smoothing techniques in order to cope withthe smoothing of compositional data trajectories. In fact, an observed curve can bethought of as the sum of a smooth part plus some noise due to measurement errors.Spline smoothing techniques are used to isolate the smooth part of the trajectory:clustering algorithms are then applied to these smooth curves.The second step consists in building suitable metrics for measuring the dissimilaritybetween trajectories: we propose a metric that accounts for difference in both shape andlevel, and a metric accounting for differences in shape only.A simulation study is performed in order to evaluate the proposed methodologies, usingboth hierarchical and partitional clustering algorithm. The quality of the obtained resultsis assessed by means of several indices

Geologische Vereinigung; Institut d’Estadística de Catalunya; International Association for Mathematical Geology; Càtedra Lluís Santaló d’Aplicacions de la Matemàtica; Generalitat de Catalunya, Departament d’Innovació, Universitats i Recerca; Ministerio de Educación y Ciencia; Ingenio 2010.

Universitat de Girona. Departament d’Informàtica i Matemàtica Aplicada

Manager: Daunis i Estadella, Josep
Martín Fernández, Josep Antoni
Other contributions: Universitat de Girona. Departament d’Informàtica i Matemàtica Aplicada
Author: Bruno, Francesca
Greco, Fedele
Abstract: Our essay aims at studying suitable statistical methods for the clustering ofcompositional data in situations where observations are constituted by trajectories ofcompositional data, that is, by sequences of composition measurements along a domain.Observed trajectories are known as “functional data” and several methods have beenproposed for their analysis.In particular, methods for clustering functional data, known as Functional ClusterAnalysis (FCA), have been applied by practitioners and scientists in many fields. To ourknowledge, FCA techniques have not been extended to cope with the problem ofclustering compositional data trajectories. In order to extend FCA techniques to theanalysis of compositional data, FCA clustering techniques have to be adapted by using asuitable compositional algebra.The present work centres on the following question: given a sample of compositionaldata trajectories, how can we formulate a segmentation procedure giving homogeneousclasses? To address this problem we follow the steps described below.First of all we adapt the well-known spline smoothing techniques in order to cope withthe smoothing of compositional data trajectories. In fact, an observed curve can bethought of as the sum of a smooth part plus some noise due to measurement errors.Spline smoothing techniques are used to isolate the smooth part of the trajectory:clustering algorithms are then applied to these smooth curves.The second step consists in building suitable metrics for measuring the dissimilaritybetween trajectories: we propose a metric that accounts for difference in both shape andlevel, and a metric accounting for differences in shape only.A simulation study is performed in order to evaluate the proposed methodologies, usingboth hierarchical and partitional clustering algorithm. The quality of the obtained resultsis assessed by means of several indices
Geologische Vereinigung; Institut d’Estadística de Catalunya; International Association for Mathematical Geology; Càtedra Lluís Santaló d’Aplicacions de la Matemàtica; Generalitat de Catalunya, Departament d’Innovació, Universitats i Recerca; Ministerio de Educación y Ciencia; Ingenio 2010.
Document access: http://hdl.handle.net/2072/14761
Language: eng
Publisher: Universitat de Girona. Departament d’Informàtica i Matemàtica Aplicada
Rights: Tots els drets reservats
Subject: Anàlisi matemàtica
Simulació, Mètodes de
Anàlisi de conglomerats
Title: Clustering compositional data trajectories
Type: info:eu-repo/semantics/conferenceObject
Repository: Recercat

Subjects

Authors