Developing data mining methods for bioinformatics.

This project is situated at the intersection of data mining and bioinformatics. We want to use our expertise in (relational) data mining to develop learning methods that address open questions in bioinformatics. The three research questions we have identified at this point are (1) categorizing biomedical articles, (2) predicting gene functions for groups of orthologous genes, and (3) predicting gene expression levels. For the first question, we want to investigate the performance of our decision tree induction algorithm for hierarchical multilabel classification (HMC) on textual data. Text classification is an instance of HMC and an important task in bioinformatics. A positive result would indicate that our algorithm is a generally usable HMC system. The other two research questions result from collaborations with bioinformatics research groups that have a concrete interest in applying our expertise in (relational) data mining to biological problems. Each problem poses its own challenges from a data mining point of view. For instance, when predicting the functions of orthologous genes, there is a hierarchy in both the target space and the attribute space. Predicting gene expression levels is difficult because there are many missing values and the data is highly skewed and not independent.
Date: 1 Oct 2009 → 30 Sep 2013
Keywords: Data mining, Bioinformatics, Machine learning, Functional genomics, Text classification
Disciplines: Scientific computing, Bioinformatics and computational biology, Public health care, Public health services, Artificial intelligence, Cognitive science and intelligent systems