Item


Robust Factor Analysis for Compositional Data

Factor analysis as frequent technique for multivariate data inspection is widely used also for compositional data analysis. The usual way is to use a centered logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then y = Λf + e (1) with the factors f of dimension k < D, the error term e, and the loadings matrix Λ. Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as Cov(y) = ΛΛT + ψ (2) where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as the loadings matrix Λ are estimated from an estimation of Cov(y). Given observed clr transformed data Y as realizations of the random vector y. Outliers or deviations from the idealized model assumptions of factor analysis can severely effect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), see Pison et al. (2003). Well known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g. Maronna et al., 2006), rely on a full-rank data matrix Y which is not the case for clr transformed data (see, e.g., Aitchison, 1986). The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by C(Y ) = V C(Z)V T where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in the model (2) can be estimated (Basilevsky, 1994) and the results have a direct interpretation since the links to the original variables are still preserved. The above procedure will be applied to data from geochemistry. Our special interest is on comparing the results with those of Reimann et al. (2002) for the Kola project data

Geologische Vereinigung; Institut d’Estadística de Catalunya; International Association for Mathematical Geology; Càtedra Lluís Santaló d’Aplicacions de la Matemàtica; Generalitat de Catalunya, Departament d’Innovació, Universitats i Recerca; Ministerio de Educación y Ciencia; Ingenio 2010.

Universitat de Girona. Departament d’Informàtica i Matemàtica Aplicada

Manager: Daunis i Estadella, Josep
Martín Fernández, Josep Antoni
Other contributions: Universitat de Girona. Departament d’Informàtica i Matemàtica Aplicada
Author: Filzmoser, Peter
Hron, Karel
Reimann, Clemens
Garrett, Robert G.
Date: 2008 May 30
Abstract: Factor analysis as frequent technique for multivariate data inspection is widely used also for compositional data analysis. The usual way is to use a centered logratio (clr) transformation to obtain the random vector y of dimension D. The factor model is then y = Λf + e (1) with the factors f of dimension k < D, the error term e, and the loadings matrix Λ. Using the usual model assumptions (see, e.g., Basilevsky, 1994), the factor analysis model (1) can be written as Cov(y) = ΛΛT + ψ (2) where ψ = Cov(e) has a diagonal form. The diagonal elements of ψ as well as the loadings matrix Λ are estimated from an estimation of Cov(y). Given observed clr transformed data Y as realizations of the random vector y. Outliers or deviations from the idealized model assumptions of factor analysis can severely effect the parameter estimation. As a way out, robust estimation of the covariance matrix of Y will lead to robust estimates of Λ and ψ in (2), see Pison et al. (2003). Well known robust covariance estimators with good statistical properties, like the MCD or the S-estimators (see, e.g. Maronna et al., 2006), rely on a full-rank data matrix Y which is not the case for clr transformed data (see, e.g., Aitchison, 1986). The isometric logratio (ilr) transformation (Egozcue et al., 2003) solves this singularity problem. The data matrix Y is transformed to a matrix Z by using an orthonormal basis of lower dimension. Using the ilr transformed data, a robust covariance matrix C(Z) can be estimated. The result can be back-transformed to the clr space by C(Y ) = V C(Z)V T where the matrix V with orthonormal columns comes from the relation between the clr and the ilr transformation. Now the parameters in the model (2) can be estimated (Basilevsky, 1994) and the results have a direct interpretation since the links to the original variables are still preserved. The above procedure will be applied to data from geochemistry. Our special interest is on comparing the results with those of Reimann et al. (2002) for the Kola project data
Geologische Vereinigung; Institut d’Estadística de Catalunya; International Association for Mathematical Geology; Càtedra Lluís Santaló d’Aplicacions de la Matemàtica; Generalitat de Catalunya, Departament d’Innovació, Universitats i Recerca; Ministerio de Educación y Ciencia; Ingenio 2010.
Format: application/pdf
Citation: Filzmoser, P. et al. ’Robust Factor Analysis for Compositional Data’ CODAWORK’08. Girona: La Universitat, 2008 [consulta: 15 maig 2008]. Necessita Adobe Acrobat. Disponible a Internet a: http://hdl.handle.net/10256/752
Document access: http://hdl.handle.net/10256/752
Language: eng
Publisher: Universitat de Girona. Departament d’Informàtica i Matemàtica Aplicada
Rights: Tots els drets reservats
Subject: Anàlisi factorial
Estadística matemàtica
Title: Robust Factor Analysis for Compositional Data
Type: info:eu-repo/semantics/conferenceObject
Repository: DUGiDocs

Subjects

Authors