Dataset

Dutch Audio Description Corpus

The Dutch Audio Description (AD) corpus is the first corpus of its kind. It contains the transcribed texts of 39 audio-described Dutch films and TV series, totalling 154,570 words and 3,074 minutes of video. The corpus was used to extract quantitative data on the language of AD: frequency counts of parts of speech, words, lemmas and collocations, as well as other text statistics such as reading speed, word and sentence length, text readability and type-token ratio (a statistical measure of lexical variety). The data registered here comprise the corpus files (XML files) of the transcribed audio descriptions, the multimodal concordancer developed for the project, and the raw data extracted from the corpus as part of the PhD project during which the corpus was developed.
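
Two of the text statistics mentioned above can be illustrated with a minimal Python sketch. It is not the project's own tooling: the function names, the simple letter-based tokenizer and the sample sentence are assumptions made for illustration. The overall rate is computed from the corpus totals and expresses words per minute of video, which need not match the reading speed measured in the project.

```python
import re


def type_token_ratio(text: str) -> float:
    """Number of distinct word forms (types) divided by running words (tokens)."""
    # Assumption: a token is a lowercase run of letters (accented letters included).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def words_per_minute(word_count: int, minutes: float) -> float:
    """Average number of words per minute of video."""
    return word_count / minutes


# Hypothetical sample sentence, not taken from the corpus.
sample = "De man loopt naar de deur. De deur gaat open."
sample_ttr = type_token_ratio(sample)  # 7 types over 10 tokens

# Corpus totals as stated in the description: 154,570 words, 3,074 minutes.
corpus_rate = words_per_minute(154_570, 3_074)
```

Because the type-token ratio falls as a text gets longer, values are only comparable across texts of similar length.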
Publication year: 2017
Accessibility: closed
Publisher: -
License: No license
Format: avi, mpeg, xlsx, xml
Keywords: Linguistics