The VKontakte team has announced good news for third-party developers: they can now use VKontakte's speech-to-text technology in their own projects for free.
Alexander Tobol, VKontakte's technical director, announced this at the opening of the Saint HighLoad++ conference. The speech recognition technology (ASR, Automatic Speech Recognition) can be integrated in a few clicks. The neural networks cope well with audio that contains background noise, heavy slang, and abbreviations.
Two recognition models are available. The neutral model suits clear speech, such as a TV show or an interview, while the spontaneous model is meant for more casual speech with slang and profanity. VKontakte's neural networks process files in a few seconds, strip noise and pauses from the transcript, and can make out unintelligible speech and even a single “ъ” sound.
The technology can be tried through the web interface on a dedicated page or integrated via the VKontakte public API. The portal offers a wide range of methods for building VKontakte mini apps or for use in third-party projects. The solution suits startups, indie projects, and personal pet projects for learning and self-development. The version that processes up to 100 minutes of audio per day can be used for any purpose. For unlimited use, you can send a request by e-mail.
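For a project that stays on the free tier, it may help to track how much audio has already been sent on a given day. The sketch below is a purely client-side illustration of the 100-minutes-per-day limit; it does not call the VKontakte API. The class name `DailyAudioBudget` and its methods are hypothetical, and resetting the counter on the local calendar date is an assumption, since the announcement does not say how the service counts a day.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# 100 minutes per day, as stated in the announcement, expressed in seconds.
DAILY_LIMIT_SECONDS = 100 * 60


@dataclass
class DailyAudioBudget:
    """Hypothetical client-side tracker; not part of the VKontakte API."""

    limit_seconds: float = DAILY_LIMIT_SECONDS
    used_seconds: float = 0.0
    day: date = field(default_factory=date.today)

    def _roll_over(self, today: date) -> None:
        # Assumption: the quota resets when the local calendar date changes.
        if today != self.day:
            self.day = today
            self.used_seconds = 0.0

    def remaining(self, today: Optional[date] = None) -> float:
        """Return the seconds of audio still available for the given day."""
        self._roll_over(today or date.today())
        return self.limit_seconds - self.used_seconds

    def try_reserve(self, duration_seconds: float,
                    today: Optional[date] = None) -> bool:
        """Reserve audio time; return False if it would exceed the limit."""
        if duration_seconds < 0:
            raise ValueError("duration_seconds must be non-negative")
        self._roll_over(today or date.today())
        if self.used_seconds + duration_seconds > self.limit_seconds:
            return False
        self.used_seconds += duration_seconds
        return True


if __name__ == "__main__":
    budget = DailyAudioBudget()
    if budget.try_reserve(12 * 60):  # a 12-minute recording
        print("OK to send; seconds left today:", budget.remaining())
    else:
        print("Daily free limit would be exceeded; try again tomorrow.")
```

A caller would check `try_reserve` with the file's duration before uploading it, and skip or postpone the upload when it returns `False`. The service's own accounting remains authoritative; this only avoids sending requests that are likely to be rejected.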