Improving Bayesian Mixture Models for Multiple Imputation of Missing Data Using Focused Clustering
DOI:
https://doi.org/10.57805/revstat.v16i2.239
Keywords:
incomplete, nonparametric, nonresponse, survey, tensor
Abstract
We present a joint modeling approach for multiple imputation of missing continuous and categorical variables using Bayesian mixture models. The approach extends the idea of focused clustering, in which one separates variables into two sets before estimating the mixture model. Focus variables include variables with high rates of missingness and possibly other variables that could help improve the quality of the imputations. Non-focus variables include the remainder. In this way, one can use a rich sub-model for the focus set and a simpler model for the non-focus set, thereby concentrating fitting power on the variables with the highest rates of missingness. We present a procedure for specifying which variables with low rates of missingness to include in the focus set. We examine the performance of the imputation procedure using simulation studies based on artificial data and on data from the American Community Survey.
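The split into focus and non-focus variables described in the abstract can be illustrated with a minimal sketch. This is not the paper's selection procedure: the fixed missingness threshold, the `extra_focus` argument, the function names, and the toy variables (`income`, `age`, `hours`) are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# A record maps variable names to values; None marks a missing value.
Row = Dict[str, Optional[object]]


def missing_rates(rows: List[Row]) -> Dict[str, float]:
    """Fraction of records in which each variable is missing."""
    names = list(rows[0].keys())
    n = len(rows)
    return {v: sum(r[v] is None for r in rows) / n for v in names}


def split_focus(
    rows: List[Row],
    threshold: float = 0.2,          # hypothetical cutoff, not from the paper
    extra_focus: Iterable[str] = (),  # low-missingness variables added by hand
) -> Tuple[List[str], List[str]]:
    """Partition variables into a focus set and a non-focus set.

    A variable joins the focus set if its missingness rate reaches the
    threshold, or if it is listed in extra_focus because it may help
    improve the imputations. Everything else is non-focus.
    """
    rates = missing_rates(rows)
    extra = set(extra_focus)
    focus = [v for v, rate in rates.items() if rate >= threshold or v in extra]
    non_focus = [v for v in rates if v not in focus]
    return focus, non_focus


# Toy data with hypothetical variables.
rows = [
    {"income": None, "age": 34.0, "hours": 40.0},
    {"income": 52000.0, "age": 51.0, "hours": None},
    {"income": None, "age": 29.0, "hours": 35.0},
    {"income": 48000.0, "age": 45.0, "hours": 38.0},
]
focus, non_focus = split_focus(rows, threshold=0.5, extra_focus=["age"])
```

In this toy setup, `income` enters the focus set through its missingness rate, `age` is added by hand as a potentially helpful predictor, and `hours` is left to the simpler non-focus model.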
Copyright (c) 2018 REVSTAT-Statistical Journal
This work is licensed under a Creative Commons Attribution 4.0 International License.