Project
Principled Machine Learning for longitudinal biomedical data with repeated measures
Advanced computational methods based on machine learning could become an indispensable tool for medical practitioners in many clinical settings, such as prognostic modeling and patient risk stratification. However, most machine learning techniques were not designed to account for the complexities stemming from the longitudinal and otherwise hierarchical nature of most health datasets. Correlations between repeated measurements within a patient violate the assumption, fundamental to most algorithms, that all observations are independent and identically distributed. In addition, most methods cannot handle the missing values that arise in most longitudinal datasets because of irregular follow-up times or patient dropout. Imputing these values with machine learning without accounting for the mechanism that produced the missingness could lead to biased predictions or inferences. The black-box nature of these methods also makes it challenging both to explain individual patient predictions and to draw rigorous inferences about the impact of specific variables, including whether observed differences are statistically significant or due to chance.
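One practical consequence of within-patient correlation is that a row-wise random train/test split can place visits from the same patient on both sides, which tends to make held-out performance look better than it would be on new patients. The following is a minimal sketch of a patient-level split, not a method proposed by this project. The function name `split_by_patient`, the field names `patient_id`, `visit`, and `value`, and the toy data are all hypothetical and chosen for illustration only.

```python
import random
from collections import defaultdict


def split_by_patient(records, test_fraction=0.25, seed=0):
    """Split repeated-measures records so that each patient lands in exactly one set.

    `records` is a list of dicts with a "patient_id" key (hypothetical schema).
    """
    # Group all visits belonging to the same patient.
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    # Shuffle patients (not rows) with a fixed seed for reproducibility.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    # Hold out a fraction of patients, with at least one patient in the test set.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test = [r for pid in patient_ids[:n_test] for r in by_patient[pid]]
    train = [r for pid in patient_ids[n_test:] for r in by_patient[pid]]
    return train, test


# Hypothetical toy data: 8 patients with 3 visits each.
records = [
    {"patient_id": p, "visit": v, "value": 0.1 * p + 0.01 * v}
    for p in range(8)
    for v in range(3)
]

train, test = split_by_patient(records)
train_ids = {r["patient_id"] for r in train}
test_ids = {r["patient_id"] for r in test}
```

Splitting on the patient identifier rather than on individual rows keeps every patient's visits together, so the held-out set contains only patients the model has not seen. It does not address missing data or the inferential questions raised above.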
The main objectives of this project are to develop machine learning methods for longitudinal datasets that are grounded in sound statistical theory, and to leverage the ability of these methods to model complex non-linear relationships in a principled manner. Special attention will be given to clinical applications in ophthalmology.