Project

Feature Extraction and Machine Learning Techniques for Various Text Mining Tasks

This thesis explores the implementation, application and evaluation of feature extraction and machine learning techniques for several text mining tasks on World Wide Web documents. All techniques require text corpora, which for this thesis are written in English, Dutch and French. Association measures are used to extract domain-specific terminology and to detect visual entities and their visual attributes in texts. In sentiment analysis, the writer's opinion towards a topic is classified as positive, negative or neutral (objective). We improve on the standard machine learning scheme by using features tailored to specific (linguistic) problems and by combining the learners in a cascaded architecture. The problem of insufficient training data is tackled with active learning techniques. Finally, we recognize the rhetorical discourse roles that hold between text segments, which is also treated as a text classification task.
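
As a rough illustration of how an association measure can surface candidate terminology, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). PMI is only one common association measure and is an assumption here; the abstract does not say which measures the thesis uses. The function name `pmi_scores`, the `min_count` threshold and the toy sentence are likewise hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Illustrative only: PMI is an assumed choice of association measure,
    not necessarily the one used in the thesis.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for low counts.
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (hypothetical); a real corpus would be far larger.
tokens = ("machine learning improves text mining and "
          "machine learning needs text corpora").split()
scores = pmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Pairs that co-occur more often than their individual frequencies predict receive a positive score and can be ranked as candidate terms. The count threshold is a simple guard against the known tendency of PMI to overrate rare pairs.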

Date: 29 Sep 2008 → 15 Mar 2010

Keywords: Machine learning, Feature extraction

Disciplines: Applied mathematics in specific fields

Project type: PhD project
