Project
NeLF: Next Level Flemish Speech Recognition
NeLF aims to provide automatic speech recognition technology that
does not require building costly corpora containing large amounts of
manually transcribed speech. Instead, we want to exploit low-cost
unlabeled or weakly labeled speech data in a self-training and
unsupervised training setting. Such a setup, which relies on technical
expertise and smart algorithms instead of on large and costly annotation
programs, is expected to be a good solution for the Flemish market, a
market which is diverse (dialects, non-native speakers), is relatively small
(6 million people), and has a multitude of use-cases spread amongst the
various industries. By leveraging the technological know-how available in
Flanders, combined with a joint effort to make speech data available for
the development of speech technology, tailor-made speech solutions
can be provided at reasonable cost, a cost that medium and small
companies can bear. Our research results will be validated on other
languages as well and should be readily applicable to other (European)
countries with similar language variation and market diversity (e.g.
Switzerland, France, Italy, Poland, ...).
The project outcomes include (1) open source tools and publications
describing the underlying technology, (2) a public repository containing
the collected data (speech, annotations, and pseudo-annotations, with a
focus on the more challenging speech data such as spontaneous speech,
dialects, and speech from non-native speakers) that can be made publicly
available, (3) a private repository containing data that is available only
for research by trusted parties, (4) a web-service that allows
citizens and companies to donate additional speech to either the public
or private repository, (5) models for open source speech recognition
toolkits which are made available to (local) industry, and (6) web-services
built on top of those models and toolkits to provide easy access
to a baseline automatic speech transcription setup.