Publication

Creating a richly annotated corpus of papyrological Greek: the possibilities of Natural Language Processing approaches to a highly inflected historical language

Journal contribution (e-publication)

This article describes a first attempt to automatically annotate the full Greek papyrus corpus with linguistic information. It gives an overview of existing work on Ancient Greek, analyzes the typical problems encountered when applying natural language processing techniques to (1) a historical corpus of (2) a highly inflectional language (as opposed to the more analytic present-day English), and offers solutions to them, testing several different approaches. The focus is on part-of-speech/morphological tagging and lemmatization; some syntactic parsing experiments are also briefly discussed. The conclusion discusses the strengths and shortcomings of the examined techniques and suggests possible ways to further improve tagging and parsing accuracy.

Journal: Digital Scholarship in the Humanities

ISSN: 2055-7671

Issue: 1

Volume: 35

Pages: 1-16

Year of publication: 2020

VABB Id: c:vabb:481757
Institutional Repository URL: https://lirias.kuleuven.be/2369903
DOI: https://doi.org/10.1093/llc/fqz004
WoS Id: 000558978500006

BOF-keylabel: yes

IOF-keylabel: yes

BOF-publication weight: 1

CSS-citation score: 1

Authors from: Higher Education

Accessibility: Closed
