PU learning from relational data
An algorithm, like a human, is said to learn if its performance on a task improves as it gains more experience. For example, consider the task of labeling an abnormality on a mammogram as benign or malignant, where experience consists of a radiologist’s description of an abnormality and its known label. Traditional machine learning approaches to binary classification require access to fully labeled data containing both positive and negative examples. In the mammography example, this would entail having data with both benign and malignant abnormalities. However, in many significant applications, a learner only has access to positive examples and a large set of unlabeled examples. This project tackles this setting, which is known as learning from positive and unlabeled (PU) data.
The goal of the project is to address two substantial shortcomings of existing work on PU learning that prevent these techniques from being applied to a wider range of important applications. Namely, the project (1) focuses on the far less studied problem of learning from PU data in a relational setting, and (2) considers data where the observed positive examples were not selected completely at random.