Publication

De-identification of clinical free text in Dutch with limited training data

Book contribution - Book abstract Conference contribution

Subtitle: a case study
To analyse the information in medical records while maintaining patient privacy, techniques are needed that automatically de-identify the free text in these records. This paper presents a machine learning de-identification system for clinical free text in Dutch, relying on best practices from the state of the art in de-identification of English-language texts. We combine string and pattern matching features with machine learning algorithms and compare the performance of three experimental setups using Support Vector Machines and Random Forests on a limited data set of one hundred manually obfuscated texts provided by Antwerp University Hospital (UZA). The setup with the best balance between precision and recall during development was tested on an unseen set of raw clinical texts and evaluated manually at the hospital site.
Book: Workshop on NLP for Medicine and Biology
Pages: 18 - 23
Year of publication: 2013
Keywords: P3 Proceeding
Accessibility: Closed