The aim of the paper is to discuss the association between SNP geno- type data and a disease. For genetic association studies, the statistical analyses with multiple markers have been shown to be more powerful, efficient, and biologically meaningful than single marker association tests. As the number of genetic markers considered is typically large, here we cluster them and then study the association between groups of markers and disease. We propose a two step procedure: first a Bayesian nonparametric cluster estimate under normalized generalized gamma process mixture models is introduced, so that we are able to incorporate the information from a large-scale SNP data with a much smaller number of explanatory variables. Then, thanks to the introduction of a genetic score, we study the association between the relevant disease response and groups of markers using a logit model. Inference is obtained via a MCMC truncation method recently introduced in the literature. We also provide a review of the state of art of Bayesian nonparametric cluster models and algorithms for the class of mixtures adopted here. Finally, the model is applied to genome-wide association study of Crohn’s disease in a case-control setting.

Modelling the association between clusters of SNPs and disease responses

GUGLIELMI, ALESSANDRA;
2015-01-01

Abstract

The aim of the paper is to discuss the association between SNP geno- type data and a disease. For genetic association studies, the statistical analyses with multiple markers have been shown to be more powerful, efficient, and biologically meaningful than single marker association tests. As the number of genetic markers considered is typically large, here we cluster them and then study the association between groups of markers and disease. We propose a two step procedure: first a Bayesian nonparametric cluster estimate under normalized generalized gamma process mixture models is introduced, so that we are able to incorporate the information from a large-scale SNP data with a much smaller number of explanatory variables. Then, thanks to the introduction of a genetic score, we study the association between the relevant disease response and groups of markers using a logit model. Inference is obtained via a MCMC truncation method recently introduced in the literature. We also provide a review of the state of art of Bayesian nonparametric cluster models and algorithms for the class of mixtures adopted here. Finally, the model is applied to genome-wide association study of Crohn’s disease in a case-control setting.
2015
Nonparametric Bayesian Methods in Biostatistics
9783319195179
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/959584
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 4
social impact