

Missing value imputation with MERCS: a faster alternative to MissForest

Book Contribution - Book Chapter (Conference Contribution)

Fundamentally, many problems in Machine Learning are understood as some form of function approximation: given a dataset D, learn a function f : X → Y. However, this view overlooks the ubiquitous problem of missing data. For example, if an unseen instance later arrives with missing input variables, we actually need a function f′ : X′ → Y with X′ ⊂ X to predict its label. Strategies for dealing with missing data come in three kinds: naive, probabilistic, and iterative. The naive approach replaces missing values with a fixed value (e.g. the mean) and then applies f : X → Y as if nothing were missing. The probabilistic approach builds a generative model M of D and uses probabilistic inference to find the most likely value of Y, given values for any subset of X. The iterative approach runs a loop: according to some model M, fill in all the missing values based on the given ones, retrain M on the completed data, and redo the predictions, until these converge. MissForest is a well-known realization of this idea using Random Forests. In this work, we establish the connection between MissForest and MERCS, a multi-directional generalization of Random Forests. We go on to show that under certain (realistic) conditions, where the retraining step in MissForest becomes a bottleneck, MERCS (which is trained only once) offers on-par predictive performance at a fraction of the time cost.
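
To make the iterative strategy concrete, the following is a minimal sketch in Python. It is not the authors' MERCS or MissForest code: it approximates the MissForest-style loop with scikit-learn's IterativeImputer wrapped around a random-forest estimator, and the toy data and all parameter values are illustrative assumptions.

    # Sketch of iterative imputation (MissForest-style), assuming scikit-learn.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.ensemble import RandomForestRegressor

    # Toy data: 200 instances, 4 features, ~20% of the entries made missing.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    X[rng.random(X.shape) < 0.2] = np.nan

    # The loop the abstract describes: fill in missing values from the given
    # ones, retrain the forest on the completed data, and repeat the
    # imputations until they converge (or max_iter is reached). Note that the
    # model is retrained on every pass, which is exactly the bottleneck the
    # paper argues MERCS avoids by training its multi-directional model once.
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=0),
        max_iter=10,
        random_state=0,
    )
    X_completed = imputer.fit_transform(X)
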
Book: Lecture Notes in Computer Science
Pages: 502-516
ISBN: 978-3-030-61527-7
Publication year: 2020
Accessibility: Open