Poisson mixture distribution analysis for North Carolina SIDS counts using information criteria

Authors

  • Tyler Massaro, Duke University Alzheimer's Disease Metabolomics Consortium (ADMC) and the Duke Clinical Research Institute (DCRI), https://orcid.org/0000-0001-9201-4589

DOI:

https://doi.org/10.2427/12550

Keywords:

Finite mixture model, Poisson distribution, model selection, overdispersion, count data

Abstract

 

Mixture distribution analysis provides a tool for identifying unlabeled clusters that arise naturally in a data set. In this paper, we demonstrate how to use the information criteria AIC and BIC to choose the optimal number of clusters for a given set of univariate Poisson data. Using artificial data, we give an empirical comparison between minimum Hellinger distance (MHD) estimation and EM estimation of the parameters in a mixture of Poisson distributions. In addition, we discuss Bayes error in the context of classification problems with mixtures of 2, 3, 4, and 5 Poisson distributions. Finally, we provide an example with real data, taken from a study of sudden infant death syndrome (SIDS) counts in 100 North Carolina counties (Symons et al., 1983). This gives us an opportunity to demonstrate the advantages of the proposed model framework in comparison with the original analysis.
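
As an illustration of the model-selection idea described in the abstract, the following is a minimal sketch of fitting Poisson mixtures by EM and comparing candidate numbers of components with AIC and BIC. It is not the paper's implementation. The function names, the quantile-based starting values, the fixed iteration count, and the simulated two-component data are all assumptions made for this example, and MHD estimation is not shown. A mixture with k components is taken to have 2k - 1 free parameters (k rates and k - 1 mixing weights).

```python
import math
import random


def poisson_logpmf(x, lam):
    """Log of the Poisson probability mass function at count x with rate lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_loglik(data, weights, rates):
    """Log-likelihood of the data under a finite Poisson mixture."""
    return sum(
        log_sum_exp([math.log(w) + poisson_logpmf(x, lam)
                     for w, lam in zip(weights, rates)])
        for x in data
    )


def fit_poisson_mixture(data, k, n_iter=100):
    """EM for a k-component Poisson mixture; returns (weights, rates, loglik)."""
    n = len(data)
    xs = sorted(data)
    # Assumed initialisation: starting rates taken from sample quantiles.
    rates = [max(float(xs[int((j + 0.5) * n / k)]), 0.5) for j in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from each component.
        resp = []
        for x in data:
            logs = [math.log(w) + poisson_logpmf(x, lam)
                    for w, lam in zip(weights, rates)]
            total = log_sum_exp(logs)
            resp.append([math.exp(v - total) for v in logs])
        # M-step: weighted means give the rates; average responsibilities give the weights.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # floor guards against empty components
            rates[j] = max(sum(r[j] * x for r, x in zip(resp, data)) / nj, 1e-6)
            weights[j] = nj / n
        total_w = sum(weights)
        weights = [w / total_w for w in weights]
    return weights, rates, mixture_loglik(data, weights, rates)


def information_criteria(loglik, k, n):
    """AIC and BIC for a k-component Poisson mixture fitted to n observations."""
    p = 2 * k - 1  # k rates plus k - 1 free mixing weights
    return -2.0 * loglik + 2.0 * p, -2.0 * loglik + p * math.log(n)


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method (small lam only)."""
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


# Artificial data (an assumption for this sketch): equal mix of Poisson(2) and Poisson(15).
rng = random.Random(0)
data = [sample_poisson(2.0 if rng.random() < 0.5 else 15.0, rng) for _ in range(300)]

results = {}
for k in range(1, 5):
    weights, rates, loglik = fit_poisson_mixture(data, k)
    aic, bic = information_criteria(loglik, k, len(data))
    results[k] = {"weights": weights, "rates": rates, "aic": aic, "bic": bic}
    print(k, round(aic, 1), round(bic, 1))

best_k_bic = min(results, key=lambda k: results[k]["bic"])
print("number of components chosen by BIC:", best_k_bic)
```

The candidate with the smallest criterion value is preferred. Because BIC penalises each extra parameter by log(n) rather than 2, it tends to select fewer components than AIC once n exceeds about 7. EM can converge to a local maximum, so in practice several starting values would be tried for each k.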

Author Biography

Tyler Massaro, Duke University Alzheimer's Disease Metabolomics Consortium (ADMC) and the Duke Clinical Research Institute (DCRI)

United States


Published

2022-03-28


Section

Statistical Methods