Project

Efficient mining for unexpected patterns in complex biological data.

Over the last decade, the life sciences have become increasingly overwhelmed and driven by large amounts of complex data. Thanks to disruptive new technologies, the speed at which the biomolecules of a living system (such as DNA, metabolites or proteins) can be analyzed has for several years been increasing faster than the capacity of computer processors and hard drives. This trend means that "traditional techniques" for analyzing and interpreting biomolecular data are becoming less suitable. Indeed, extracting relevant knowledge from these data relies on a range of dedicated "big data" techniques that fall under the terms "data mining" and "machine learning". This project addresses "pattern mining", a specific class of techniques that is highly relevant to the life sciences. Pattern mining allows for the discovery of previously unseen, interesting patterns in complex data. Traditionally, frequent pattern mining deals with finding the most frequent sets or "combinations" of items in a dataset. There are, however, major problems with the resulting pattern lists, which we will address in this project. First, these lists are often huge, and a domain expert typically cannot investigate and interpret every pattern in them. Second, many of the patterns in such a list are not interesting to the domain expert, for example because they are trivial. In this project, we develop a generic, formal and statistically sound framework to redefine pattern interestingness for the specific life science context. After defining novel interestingness criteria for pattern mining, we will develop efficient algorithms to mine such patterns. The algorithms will be validated on toy datasets and gold-standard data. Finally, we will apply these methods to extract novel knowledge from large-scale microbial gene expression compendia, a huge set of human genome sequences and drug-compound interaction networks, with the goal of generating fundamentally new biological or biomedical insights.
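
To make the notion of frequent pattern mining concrete, the following is a minimal sketch of the traditional setting described above, not the framework developed in this project. It exhaustively counts every combination of items and keeps those that occur in at least a minimum number of records, which is only feasible for toy data. The gene names, the `samples` data and the `frequent_itemsets` helper are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions.

    Brute-force enumeration of all subsets of each transaction; this is
    exponential in transaction size and meant only as an illustration.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore duplicate items in a record
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Hypothetical toy data: each row lists genes marked "active" in one sample.
samples = [
    ["geneA", "geneB", "geneC"],
    ["geneA", "geneB"],
    ["geneA", "geneC"],
    ["geneB", "geneC"],
    ["geneA", "geneB", "geneC"],
]

result = frequent_itemsets(samples, min_support=3)

# Print the most frequent itemsets first, ties broken alphabetically.
for itemset, count in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

Even in this tiny example, most reported patterns follow directly from the individual items being common, which illustrates why frequency alone is a weak measure of interestingness and why result lists grow quickly on real data.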
Date: 1 Oct 2016 → 30 Sep 2020
Keywords: SYSTEMS BIOLOGY, PATTERN MINING, BIOINFORMATICS, DATA MINING
Disciplines: Scientific computing, Bioinformatics and computational biology, Public health care, Public health services