Developers at Meta (formerly Facebook) presented Project CAIRaoke, which has produced an “end-to-end neural model” designed to make interacting with a voice assistant much more natural. The model is already used in Portal smart displays, and Meta plans to bring it to its future VR and AR devices, with the aim of offering a better virtual assistant experience.
The report says that one of the main obstacles to improving voice assistants is the architecture they are built on. Although such systems appear to be a single whole, they consist of four separate components: natural language understanding (NLU), dialogue state tracking (DST), dialogue policy management (DP), and natural language generation (NLG). Because these components depend on one another, optimizing them or adapting them to new tasks is difficult and relies heavily on annotated datasets. This is one reason today’s voice assistants confine users to narrow, rigid patterns of interaction.
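To make the four-stage structure concrete, here is a deliberately tiny, hypothetical sketch of a modular assistant in which each stage (NLU, DST, dialogue policy, NLG) is a separate function feeding the next. It is not Meta's code and not how any production assistant is implemented: the function names, the keyword rules, and the `set_reminder` intent are all illustrative assumptions, standing in for what would normally be separately trained models. Project CAIRaoke's approach, as described in the report, replaces this chain with a single end-to-end neural model.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy pipeline: each stage below stands in for a separately
# trained component in a conventional modular voice assistant.


@dataclass
class DialogueState:
    intent: str = "unknown"
    slots: dict = field(default_factory=dict)


def nlu(utterance: str) -> dict:
    """NLU: map raw text to an intent and slots (toy keyword rules)."""
    if "remind" in utterance.lower().split():
        # Treat everything after the first " to " as the reminder text.
        task = utterance.split(" to ", 1)[1] if " to " in utterance else None
        return {"intent": "set_reminder", "slots": {"task": task}}
    return {"intent": "unknown", "slots": {}}


def dst(state: DialogueState, parsed: dict) -> DialogueState:
    """DST: merge newly filled slots into the running dialogue state."""
    merged = dict(state.slots)
    merged.update({k: v for k, v in parsed["slots"].items() if v is not None})
    return DialogueState(intent=parsed["intent"], slots=merged)


def policy(state: DialogueState) -> str:
    """Dialogue policy: choose the next system action from the tracked state."""
    if state.intent == "set_reminder":
        return "confirm_reminder" if "task" in state.slots else "ask_task"
    return "fallback"


def nlg(action: str, state: DialogueState) -> str:
    """NLG: turn the chosen action into a natural-language reply."""
    templates = {
        "confirm_reminder": "OK, I'll remind you to {task}.",
        "ask_task": "What should I remind you about?",
        "fallback": "Sorry, I didn't understand that.",
    }
    return templates[action].format(**state.slots)


def respond(utterance: str, state: DialogueState) -> tuple[str, DialogueState]:
    """Run one turn through all four stages in order."""
    parsed = nlu(utterance)        # stage 1: natural language understanding
    state = dst(state, parsed)     # stage 2: dialogue state tracking
    action = policy(state)         # stage 3: dialogue policy
    return nlg(action, state), state  # stage 4: natural language generation


if __name__ == "__main__":
    reply, _ = respond("Remind me to call Anna", DialogueState())
    print(reply)
```

Because each stage consumes the previous stage's output format, renaming the `task` slot in `nlu` would force matching changes in `dst`, `policy`, and `nlg`. In a real system each of those stages is a model with its own annotated training data, which is a small-scale picture of why the modular design is hard to optimize and extend.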
Project CAIRaoke has already produced neural models that let people communicate with voice assistants more naturally and freely: for example, returning to an earlier topic of conversation or switching to a new one entirely, or referring to things whose meaning depends on subtle context. Users will also be able to interact with voice assistants in new ways, such as through gestures. Although the model running in Portal smart displays is still in early testing, the developers are already confident that it outperforms existing approaches to building voice assistants.
The developers believe that the progress made in Project CAIRaoke will make communication between AI systems and people more natural and will become an important tool for building the metaverse. They expect that a voice assistant built into an AR headset will become more useful and will be able to understand what the user means when speaking in natural language. Eventually, such assistants may appear in a range of applications, allowing people around the world to interact with them.
source: Meta AI