
Publication

Online suicide risk detection using automatic text classification

Book contribution - Book abstract (conference contribution)

Identifying individuals at risk for suicide is a key prevention objective. With the growing importance of online networks, signals of suicidal ideation and intent are increasingly expressed on the web. Particular advantages of online communication are that it can offer anonymity and a sense of control. There is evidence that introverted individuals are strongly motivated to communicate online, which leads to more self-disclosure. Although suicidal expressions may be recognized and responded to by peers, this does not always happen in an appropriate or timely fashion. It is therefore preferable to also have prevention specialists monitor user-generated content, provided this does not conflict with users' preferences, safety and privacy concerns. The huge volume of online content prohibits manual monitoring, so automatic filtering approaches are required to prevent information overload.

Previous research has focused on solutions based on keyword searches. Their efficacy is limited: search queries cover only a limited range of explicit suicidal expressions, typically return many irrelevant hits, and are not robust to spelling errors. This paper presents a suicide prevention approach based on automatic text classification. In supervised text classification, a system assigns each text to one of a number of predefined categories. In our use case, a text is an online post, and there are three categories: alarming posts that should be reviewed, suicide-related posts that are harmless, and irrelevant posts (i.e. the majority class).

A set of 300,000 forum and blog messages was collected and partly labeled by staff and volunteers at the Belgian Suicide Prevention Centre (CPZ). We used natural language processing to represent each text as a rich set of features that should allow suicidality prediction, including words, characters, significant terms derived from transcripts of the CPZ emergency chat hotline, topic models capturing semantically related words, and the emotional orientation of words (positive, negative or neutral) based on two external sentiment lexicons. A Support Vector Machine (SVM), a supervised machine learning technique, was trained, evaluated and optimized on the corpus to allow classification of unseen material. In practice, the model will classify newly posted content based on the knowledge it induced from the annotated training corpus, and notify a prevention specialist in case of a possibly alarming post.

The experimental results show that both suicide-related and alarming messages can be detected with high precision (80 to 90%). As a result, the amount of noise generated by the system is minimal: only one or two out of ten messages flagged by the system are irrelevant. In terms of recall, which measures how many of the relevant messages are actually found, suicide-related content is almost always detected (90%, i.e. one in ten is missed). Recall for alarming posts is lower, at around 60%, a problem mainly attributable to implicit references to suicide, which often go undetected. The text classification approach outperforms a system based on keyword searching, most notably in terms of precision. In order to evaluate performance in a real-world prevention setting, we also tested the system on datasets of increasing size in which the class skew was increased and the incidence of suicide-related material thus decreased. Interestingly, we observe that the system also scores well on large datasets with high class skew, making it usable for large-scale automated prevention.
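To make the classification setup described above concrete, the sketch below shows one possible way to combine word and character n-gram features with a linear SVM using scikit-learn. This is an illustrative reconstruction under assumptions, not the authors' actual code: the class names, parameter values and data-loading step are hypothetical, and the hotline-derived terms, topic-model and sentiment-lexicon features mentioned in the abstract are omitted for brevity.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# The three categories described in the abstract (names are illustrative).
LABELS = ["alarming", "suicide-related", "irrelevant"]

# Word and character n-gram features; character n-grams add robustness to
# spelling variation, one of the weaknesses of keyword searching noted above.
features = FeatureUnion([
    ("words", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("chars", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
])

# Linear SVM classifier trained on the annotated posts.
classifier = Pipeline([("features", features), ("svm", LinearSVC(C=1.0))])

# Hypothetical usage: fit on the labeled part of the corpus, then flag new posts.
# posts, labels = load_annotated_corpus()           # assumed loader, not shown
# classifier.fit(posts, labels)
# for post in new_posts:
#     if classifier.predict([post])[0] == "alarming":
#         notify_prevention_specialist(post)        # assumed notification hook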
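The precision and recall figures reported above are per-class scores on held-out data. A minimal sketch of how such scores could be computed, and recomputed after artificially increasing the class skew, is given below; the function and variable names are assumptions for illustration only.

from sklearn.metrics import classification_report

def report_per_class(classifier, posts, gold_labels):
    """Print per-class precision and recall on a held-out set of posts."""
    predictions = classifier.predict(posts)
    print(classification_report(gold_labels, predictions,
                                labels=["alarming", "suicide-related", "irrelevant"],
                                zero_division=0))

# To mimic the class-skew experiments, the same report can be recomputed on test
# sets to which additional irrelevant posts are added while the trained model is
# kept fixed, so that suicide-related material becomes increasingly rare.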
The results are a first and promising indication that text classification is a viable approach to online message filtering for suicide prevention purposes. Additional positive training data (i.e. messages that can be considered suicide-related) should allow further performance improvements. The resulting system is currently being validated on the message boards of two Belgian LGBT organizations. Volunteer website moderators, overwhelmed by the above-average amount of suicide-related content, receive training from CPZ professionals on how to screen possibly alarming content and how best to respond. The results of this study will be presented at IASR.
Book: International Summit on Suicide Research
Number of pages: 1
Year of publication: 2015