
Project

Efficient and Versatile Methods for Relational Machine Learning

The field of machine learning concerns computer algorithms that automatically improve their performance on a task through experience from data. Most common machine learning approaches expect data in the form of a table. Therefore, they cannot directly learn from more expressive data formats, such as relational databases, knowledge bases or logic programs, which are able to represent relations between complex instances. To learn from these expressive data formats, there are two obvious solutions. The first is to enable the use of the tabular machine learning toolbox by summarizing the relational dataset into a single table. The second is to use algorithms that can explicitly handle the more expressive data formats. Both of these solutions require constructing features by querying the relational data, which is computationally expensive. If this feature construction can be sped up by faster query engines or smarter feature exploration, learning from relational data becomes more practically viable.
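The first solution, summarizing a relational dataset into a single table (often called propositionalization), can be illustrated with a minimal sketch. The tables, column names, and aggregates below are hypothetical, chosen only to show how one-to-many relations are turned into per-instance features that a tabular learner can consume:

```python
import pandas as pd

# Hypothetical relational data: customers with related orders (one-to-many).
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "churned": [0, 1, 0]})
orders = pd.DataFrame({"customer_id": [1, 1, 2, 3, 3, 3],
                       "amount": [10.0, 25.0, 5.0, 8.0, 12.0, 30.0]})

# Each aggregate corresponds to a query over the relational data; computing
# many such candidate features is the expensive step mentioned above.
features = (orders.groupby("customer_id")["amount"]
                  .agg(n_orders="count", total_amount="sum", mean_amount="mean")
                  .reset_index())

# Join the constructed features back onto the target table: the result is a
# single table on which any standard tabular learner can be trained.
table = customers.merge(features, on="customer_id", how="left")
print(table)
```

With many relations, join paths, and aggregate functions, the space of such candidate features grows quickly, which is why faster query engines or smarter feature exploration matter.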

Apart from the learning complexity, knowledge bases (KBs) bring with them two additional challenges to the learning task. First, KBs contain only true facts without any explicit negative information, as any information outside the KB is not necessarily false but simply unknown. Second, KBs are a biased selection of the ground truth due to the way they are constructed. For example, more popular Wikipedia articles have a tendency to be more complete. When learning a model to complete a KB with new facts, one must be careful that such a model does not capture and propagate the observation biases in the KB.
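One standard idea from learning from positive and unlabeled (PU) data, sketched below on synthetic data, is the Elkan–Noto correction: if every true fact is observed with some constant label frequency c, then p(y=1|x) = p(s=1|x) / c, so a classifier trained on observed-vs-unobserved can be rescaled into a classifier for true-vs-false. This is a generic PU technique, not necessarily the exact method developed in the dissertation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic setup: y is the hidden truth, s marks facts actually recorded in
# the KB. Each true fact is observed with probability c (the label
# frequency); everything unobserved is unlabeled, not negative.
n, c = 5000, 0.3
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
s = y * rng.binomial(1, c, size=n)

# Train a "non-traditional" classifier to predict observed-vs-unobserved.
clf = LogisticRegression().fit(X, s)

# Estimate c as the mean predicted score over the known positives, then
# rescale: p(y=1|x) = p(s=1|x) / c.
c_hat = clf.predict_proba(X[s == 1])[:, 1].mean()
p_true = np.clip(clf.predict_proba(X)[:, 1] / c_hat, 0.0, 1.0)
```

Treating the unobserved facts as plain negatives would instead bake the observation process into the model, which is exactly the bias the paragraph above warns against.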

Apart from differences in data format, predictive machine learning approaches also differ in the prediction tasks they support. Typically, tabular machine learning approaches require predefining one or more attributes to be predicted from all others. However, when it is not known in advance which attributes need to be predicted, it can be useful to have one interpretable multi-directional model that can predict any attribute given all others, instead of naively learning for every attribute a model that predicts it.
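The naive alternative mentioned above, training one separate model per attribute, can be sketched as follows. The toy data and model choice are illustrative assumptions; the point is only that every attribute needs its own model, whereas a single multi-directional rule set would cover all prediction directions at once:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical table with three discrete attributes.
data = pd.DataFrame({
    "outlook": ["sunny", "sunny", "rain", "rain", "overcast", "overcast"],
    "windy":   [0, 1, 0, 1, 0, 1],
    "play":    [1, 0, 1, 0, 1, 1],
})

# Naive multi-directional prediction: one model per attribute, each trained
# to predict that attribute from all the others.
models = {}
for target in data.columns:
    X = pd.get_dummies(data.drop(columns=target))
    models[target] = DecisionTreeClassifier(random_state=0).fit(X, data[target])
```

The collection grows linearly with the number of attributes and duplicates much of the learned structure across models, which is the redundancy a single interpretable multi-directional rule set avoids.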

This dissertation has four contributions: three in the area of relational learning, and one in the area of multi-directional learning. The first contribution is an experimental comparison of different relational data representations and their corresponding query engines. The goal is to investigate which representation and query engine is most suitable, in terms of run time, for implementing algorithms that explicitly deal with relational data. The second contribution is the proposal of a new algorithm that guides feature construction using a relational learner, resulting in both a relational prediction model and a feature table, while being faster than constructing features up front. The third contribution provides a way of dealing with observation biases in knowledge base completion by using ideas from the field of learning from positive and unlabeled data. The fourth contribution is the proposal of two new algorithms for the construction of multi-directional rule sets from multi-target rules, which make it possible to learn one rule model with the same predictive performance but a much smaller size when compared to a collection of single-target models.

Date: 20 Sep 2017 → 14 Jun 2022
Keywords: machine learning
Disciplines: Applied mathematics in specific fields, Computer architecture and networks, Distributed computing, Information sciences, Information systems, Programming languages, Scientific computing, Theoretical computer science, Visual computing, Other information and computing sciences
Project type: PhD project