Publication

Optimizing the recognition lexicon for automatic speech recognition

Book - Dissertation

Subtitle: Optimalisatie van het uitspraakwoordenboek in automatische spraakherkenning
Abstract: A conventional automatic speech recognizer works with a so-called recognition lexicon. This lexicon contains a list of words that can be recognized, along with at least one plausible pronunciation per word. Such pronunciations are modeled as sequences of expected sounds. The recognition lexicon defines the vocabulary of the speech recognizer. Words that are not included in the vocabulary, so-called out-of-vocabulary (OOV) words, can never be recognized and consequently introduce recognition errors. The recognition lexicon also defines the expected pronunciations of words. If words are pronounced differently than expected, recognition errors will occur as well. In this dissertation, methods to minimize the errors caused by OOV words and unexpected word pronunciations were devised and analyzed. The problem of OOV words was tackled in the context of Dutch continuous speech recognition (useful, for example, for the automatic subtitling of broadcast news shows). Two novel methods were conceived. The first targets the recovery of OOV compound words. To that end, the core recognition engine recognizes ordinary words and compound constituents, while a post-processor forms compounds out of these. The second targets the recovery of all OOV words. The core recognition engine generates a pronunciation for each OOV word zone, while a large background lexicon is searched for words that support these pronunciations. Both methods led to a statistically significant gain in recognition accuracy over a state-of-the-art baseline speech recognizer. The problem of unexpected word pronunciations was tackled in the context of multilingual (proper) name recognition (useful, for example, for voice-driven GPS systems), where large amounts of pronunciation variation occur. Here, a novel method based on sound-to-sound conversion was conceived. The idea is to convert pronunciations emerging from a general-purpose letter-to-sound converter into more plausible pronunciations. To that end, the sound-to-sound converter was designed to perform certain domain-specific conversions in particular linguistic situations (contexts). The devised method yields a statistically significant gain in recognition accuracy over a state-of-the-art speech recognizer that already includes multiple pronunciations per name and acoustic models trained to cope with multilingual speech data.
ISBN: 9789085785712
Year of publication: 2013
Accessibility: Closed