
Publication

Multilevel methods for multirelational factorization with features

Book - Dissertation

Machine learning methods are increasingly important in society and industry, and the amount of data available to these applications is growing exponentially. Processing such data efficiently requires robust and scalable algorithms. Matrix factorization of an incompletely observed matrix is one such application and has been used successfully in large-scale recommender systems. It has been generalized to Bayesian multirelational factorization with features, which was developed specifically to handle large-scale data and to incorporate side information that aids the factorization. The resulting algorithm is based on Bayesian Markov chain Monte Carlo, more specifically Gibbs sampling, and the samples in the Markov chain are drawn by solving linear systems with a Krylov subspace method, which preserves the sparsity of the data.

The method still suffers from two bottlenecks: (1) the speed and number of iterations of the burn-in and (2) the number of iterations of the iterative solver. Inspiration for tackling these bottlenecks comes from multilevel methods for partial differential equations, which combine solutions from different levels. Each level approximates the same solution, but with different accuracy: the coarser solutions are less accurate than the finer ones, yet cheaper to compute. By combining the solutions from different levels, the overall computation time can be reduced without losing accuracy.

In this thesis we investigate multilevel methods for a Gibbs sampler developed for Bayesian regression. This Gibbs sampler is closely related to the one in Bayesian multirelational factorization with features and rests on the same computational concepts. A hierarchy of data matrices is created by clustering the features and/or samples of the data set; the coarser levels retain most of the variance in the data while their solutions are cheaper to compute. This hierarchy is exploited to build a two-level preconditioner that speeds up the iterative solver, and it is additionally used to develop multilevel Gibbs samplers that sample on the different levels.
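To make the core computational step concrete, the sketch below shows one conditional draw in a Gibbs sampler for Bayesian linear regression, where the Gaussian conditional is sampled by solving a sparse linear system with a Krylov method (conjugate gradient) via the perturb-and-solve identity. This is a minimal illustrative sketch, not the thesis code: the function name sample_weights, the precisions alpha and lam, and the use of SciPy's cg are assumptions made here for illustration.

# Minimal sketch (not the thesis implementation) of one Gibbs step for
# Bayesian linear regression with weight prior N(0, I/lam) and noise
# precision alpha. The conditional posterior of the weights is
# N(mu, A^{-1}) with A = alpha*X^T X + lam*I; an exact draw is obtained
# by perturbing the right-hand side and solving A w = rhs with conjugate
# gradient, so the sparse X is never densified and A is never formed.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

def sample_weights(X, y, alpha, lam, rng):
    n, d = X.shape
    # Matrix-free operator: v -> alpha*X^T(X v) + lam*v.
    A = LinearOperator((d, d),
                       matvec=lambda v: alpha * (X.T @ (X @ v)) + lam * v)
    # Perturb-and-solve: with e1 ~ N(0, I_n) and e2 ~ N(0, I_d), the
    # solution of A w = rhs is an exact sample from N(mu, A^{-1}).
    e1 = rng.standard_normal(n)
    e2 = rng.standard_normal(d)
    rhs = alpha * (X.T @ y) + np.sqrt(alpha) * (X.T @ e1) + np.sqrt(lam) * e2
    w, info = cg(A, rhs)  # Krylov subspace solve, preserves sparsity
    return w

rng = np.random.default_rng(0)
X = sp.random(1000, 200, density=0.01, format="csr", random_state=0)
y = X @ rng.standard_normal(200) + 0.1 * rng.standard_normal(1000)
w = sample_weights(X, y, alpha=100.0, lam=1.0, rng=rng)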
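The two-level idea can be sketched in the same setting: cluster the features to obtain a piecewise-constant prolongation P, form the coarse operator A_c = P^T A P, and combine a cheap smoother with a coarse-grid correction into a preconditioner for the Krylov solver. This is again a hedged sketch under the same assumed names; the k-means criterion on simple column statistics stands in for whichever clustering the thesis actually uses.

# Hedged sketch of a two-level preconditioner for A = alpha*X^T X + lam*I,
# built from a clustering of the features (columns of X).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg, splu
from scipy.cluster.vq import kmeans2

def two_level_preconditioner(X, alpha, lam, n_clusters, seed=0):
    n, d = X.shape
    # Cluster features by simple per-column statistics (mean and norm);
    # an illustrative stand-in for the thesis clustering.
    col_mean = np.asarray(X.mean(axis=0)).ravel()
    col_sq = np.asarray(X.multiply(X).sum(axis=0)).ravel()
    _, labels = kmeans2(np.column_stack([col_mean, np.sqrt(col_sq)]),
                        n_clusters, minit="++", seed=seed)
    # Piecewise-constant prolongation P: feature j -> its cluster.
    P = sp.csr_matrix((np.ones(d), (np.arange(d), labels)),
                      shape=(d, n_clusters))
    # Coarse operator A_c = P^T A P; small enough to factorize directly.
    Xc = X @ P  # cluster-aggregated data, still sparse
    Ac = (alpha * (Xc.T @ Xc) + lam * (P.T @ P)).tocsc()
    Ac_lu = splu(Ac)
    # Additive two-level preconditioner: Jacobi smoother + coarse correction.
    diag_A = alpha * col_sq + lam
    matvec = lambda r: r / diag_A + P @ Ac_lu.solve(P.T @ r)
    return LinearOperator((d, d), matvec=matvec)

# Used as M in the Krylov solve of the sampler sketched above:
#   M = two_level_preconditioner(X, alpha=100.0, lam=1.0, n_clusters=20)
#   w, info = cg(A, rhs, M=M)

The same hierarchy also suggests the multilevel samplers mentioned in the abstract: a chain run on the aggregated data Xc is cheaper per iteration, and its coarse draws can be prolongated with P to inform the fine-level chain.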
Publication year: 2021
Accessibility: Open