Statistical inference for Hardy-Weinberg Equilibrium using Log-ratio Coordinates

Testing markers for Hardy-Weinberg equilibrium (HWE) is an important step in the analysis of large databases used in genetic association studies. Gross deviation from HWE can be indicative of genotyping error. There are many approaches to testing markers for HWE. The classical chi-square test was, until recently, the most widely used approach to HWE testing. Over the last decade, the computationally more demanding exact test has become more popular. Bayesian approaches, where the full posterior distribution of a disequilibrium parameter is obtained, have also been developed. As far as compositional data analysis (CODA) is concerned, Aitchison described how the HWE law can be “discovered” when a set of samples, all genotyped for the same marker, is analyzed by log-ratio principal component analysis. A well-known tool in CODA, the ternary plot, is known in genetics as a de Finetti diagram. The Hardy-Weinberg law defines a parabola in a ternary plot of the three genotype frequencies of a bi-allelic marker. Ternary plots of bi-allelic genetic markers typically show points that “follow” the parabola, though with a degree of scatter that depends on the sample size. When represented in additive, centered or isometric log-ratio coordinates, the HW parabola becomes a straight line. Much of CODA is concerned with data sets where each individual row in the data set (an individual, a sample, an object) constitutes a composition. In data sets comprising genetic markers, individual rows (persons) are not really compositions; rather, it is the total sample of all individuals that constitutes a composition. The CODA approach to genetic data has proved useful in supplying interesting graphics, but to date CODA seems not to have provided formal statistical inference for HWE, probably because the distribution of the log-ratio coordinates is not known. Nevertheless, the log-ratio approach directly suggests some statistics that can be used for measuring disequilibrium: the second clr and the second ilr coordinate of the sample. Similar statistics have been used in the genetics literature. In this contribution, we will use the multivariate delta method to derive the asymptotic distribution of the isometric log-ratio coordinates. This allows hypothesis testing for HWE and the construction of confidence intervals for large samples that contain no zeros. The type 1 error rate of the test is compared with that of the classical chi-square test.
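
The kind of test the abstract describes can be illustrated with a minimal sketch. It is not taken from the paper and rests on stated assumptions: the genotypes are ordered (AA, AB, BB), and the second ilr coordinate is taken as z2 = ln(f_AA f_BB / f_AB^2) / sqrt(6), which is one common choice of ilr basis. Under HWE, f_AB^2 = 4 f_AA f_BB, so z2 equals the constant -ln(4)/sqrt(6); this is why the parabola becomes a straight line in log-ratio coordinates. Applying the delta method to multinomial counts gives the approximate variance (1/n_AA + 4/n_AB + 1/n_BB)/6, used here in a Wald-type statistic. The function name `ilr_hwe_test` and the returned keys are hypothetical.

```python
import math


def ilr_hwe_test(n_aa, n_ab, n_bb):
    """Wald-type HWE test on the second ilr coordinate (illustrative sketch).

    Assumes genotype order (AA, AB, BB) and the ilr coordinate
    z2 = ln(f_AA * f_BB / f_AB**2) / sqrt(6); both are assumptions,
    not details confirmed by the abstract.
    """
    if min(n_aa, n_ab, n_bb) <= 0:
        # The log-ratio approach is undefined when a genotype count is zero.
        raise ValueError("all three genotype counts must be positive")

    # The sample size cancels in the log-ratio, so counts can be used directly.
    z2 = (math.log(n_aa) + math.log(n_bb) - 2.0 * math.log(n_ab)) / math.sqrt(6.0)

    # Value of z2 when the marker is exactly in HWE (f_AB^2 = 4 f_AA f_BB).
    z2_hwe = -math.log(4.0) / math.sqrt(6.0)

    # Delta-method standard error with plug-in (observed) counts.
    se = math.sqrt((1.0 / n_aa + 4.0 / n_ab + 1.0 / n_bb) / 6.0)

    # Asymptotically standard normal under HWE; two-sided p-value.
    z = (z2 - z2_hwe) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    # Approximate 95% confidence interval for z2.
    half_width = 1.959963984540054 * se
    return {
        "z2": z2,
        "se": se,
        "z": z,
        "p_value": p_value,
        "ci95": (z2 - half_width, z2 + half_width),
    }
```

For counts that satisfy the HW proportions exactly, such as `ilr_hwe_test(100, 200, 100)`, the statistic `z` is essentially zero; a sample with a marked heterozygote deficit, such as `ilr_hwe_test(100, 100, 100)`, yields a large positive `z`. The approximation is a large-sample one and fails when any count is zero, in line with the restriction stated in the abstract.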

Universitat de Girona. Departament d’Informàtica i Matemàtica Aplicada

Other contributions: Universitat de Girona. Departament d’Informàtica i Matemàtica Aplicada
Author: Graffelman, Jan
Document access: http://hdl.handle.net/2072/273430
Language: eng
Publisher: Universitat de Girona. Departament d’Informàtica i Matemàtica Aplicada
Rights: All rights reserved
Subject: Mathematical statistics -- Congresses
Title: Statistical inference for Hardy-Weinberg Equilibrium using Log-ratio Coordinates
Type: info:eu-repo/semantics/conferenceObject
Repository: Recercat
