Intel’s first dataset, People’s Speech, targets automatic speech recognition tasks. The second, the Multilingual Spoken Words Corpus (MSWC), is aimed at keyword spotting.
Work on the datasets began in 2018. The goal was to cover more than 50 of the world’s most widely spoken languages in a single collection, which would be useful, for example, for building automatic translation systems.
What sets these datasets apart is that they contain not staged, “theatrical” recordings of speech, but recordings made in natural environments. An algorithm trained on such data will be able to recognize natural speech, dictated directly into a microphone, more accurately.
The first dataset contains tens of thousands of hours of recorded speech, making it currently one of the largest datasets for English speech recognition in the world. MSWC contains over 300,000 keywords in dozens of languages and can be used, for example, in voice assistants.
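To give a rough idea of what keyword spotting looks like once speech has been transcribed, here is a minimal sketch. It is not tied to MSWC or to any real voice-assistant API; the wake words and the sample utterance are invented for illustration.

```python
# Minimal keyword-spotting sketch over a speech transcript.
# The keyword set and utterance below are hypothetical examples,
# not drawn from the MSWC dataset itself.

def spot_keywords(transcript: str, keywords: set) -> list:
    """Return the keywords found in a transcript, in order of first appearance."""
    hits = []
    for word in transcript.lower().split():
        token = word.strip(".,!?")  # drop surrounding punctuation
        if token in keywords and token not in hits:
            hits.append(token)
    return hits

wake_words = {"lights", "music", "weather"}
utterance = "Hey, turn on the lights and play some music."
print(spot_keywords(utterance, wake_words))  # → ['lights', 'music']
```

A real system would match keywords directly in the audio stream rather than in text, which is exactly where a large multilingual corpus of spoken keywords becomes valuable as training data.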