Publication

A Deep Generative Approach to Native Language Identification

Book contribution - Book abstract, Conference contribution

Native language identification (NLI), the task of identifying a person's native language (L1) from their writing in a second language (L2), is useful for a variety of purposes, including marketing, security, and educational applications. From a traditional machine learning perspective, NLI is usually framed as a multi-class classification task in which numerous designed features are combined to achieve state-of-the-art results. We introduce a deep generative language modelling (LM) approach to NLI: a GPT-2 model is fine-tuned separately on the texts written by authors who share the same L1, and an unseen text is assigned the label of the fine-tuned GPT-2 model that yields the minimum LM loss on it. Our method outperforms traditional machine learning approaches and currently achieves the best results on the benchmark NLI datasets.
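The decision rule in the abstract (one language model per L1, label chosen by minimum LM loss) can be illustrated with a minimal sketch. This is not the authors' implementation: to stay self-contained, it assumes a tiny add-one-smoothed character bigram model as a stand-in for each fine-tuned GPT-2 model. The names `CharBigramLM`, `train_models`, and `identify_l1`, and the toy sentences, are hypothetical.

```python
import math
from collections import Counter

START = "\x02"  # start-of-text symbol used as the first context


class CharBigramLM:
    """Add-one-smoothed character bigram LM; a toy stand-in for a fine-tuned GPT-2."""

    def __init__(self, texts, vocab_size):
        self.vocab_size = vocab_size
        self.bigrams = Counter()   # counts of (previous char, current char)
        self.contexts = Counter()  # counts of previous char
        for text in texts:
            prev = START
            for ch in text:
                self.bigrams[(prev, ch)] += 1
                self.contexts[prev] += 1
                prev = ch

    def loss(self, text):
        """Mean negative log-likelihood per character (lower means a better fit)."""
        if not text:
            raise ValueError("cannot score an empty text")
        total, prev = 0.0, START
        for ch in text:
            prob = (self.bigrams[(prev, ch)] + 1) / (self.contexts[prev] + self.vocab_size)
            total -= math.log(prob)
            prev = ch
        return total / len(text)


def train_models(corpora):
    """Fit one model per L1 label; corpora maps label -> list of training texts."""
    vocab = {ch for texts in corpora.values() for text in texts for ch in text}
    vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen characters
    return {l1: CharBigramLM(texts, vocab_size) for l1, texts in corpora.items()}


def identify_l1(text, models):
    """Return the label whose model assigns the minimum loss to the text."""
    return min(models, key=lambda l1: models[l1].loss(text))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any NLI benchmark.
    corpora = {
        "L1-A": ["I have seen him yesterday .", "She has many informations ."],
        "L1-B": ["Yesterday I go to school .", "He go to home after work ."],
    }
    models = train_models(corpora)
    unseen = "I have seen many informations ."
    for l1, model in models.items():
        print(l1, round(model.loss(unseen), 3))
    print("predicted:", identify_l1(unseen, models))
```

Only `identify_l1` reflects the rule stated in the abstract. In a setup closer to the described method, `loss` would instead return the cross-entropy that a GPT-2 model fine-tuned on one L1 group assigns to the text. One possible way to obtain that value is the loss returned by Hugging Face's `GPT2LMHeadModel` when `labels` is passed, though the record does not state which toolkit was used.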
Book: Proceedings of the 28th International Conference on Computational Linguistics
Pages: 1778 - 1783
Year of publication: 2020
Keywords: P1 Proceeding
Accessibility: Closed