Project
Natiolectal variation in Dutch grammar: A data-driven approach
While Belgians and Dutchmen are well aware that they use different words, and that their pronunciation diverges, they are mostly oblivious to the fact that there are also grammatical discrepancies between Belgian and Netherlandic Dutch. Few Belgians, for instance, will realize that the preposition voor in Jan maakte (voor) haar een boterham 'John made (for) her a sandwich' is optional for them, whereas it is indispensable for almost all the Dutch.
How come there are such outspoken syntactic differences between two varieties (in a comparatively small language area) which did not begin to diverge before the 16th century? And where do these differences come from? In order to answer these questions, we draw on large subtitle and newspaper corpora, and marshal machine translation, machine learning, and automated semantic classification technologies to access the syntactic motor, or motors, of Dutch.