At the international AI conference Artificial Intelligence Journey, Sber presented Kandinsky 2.0, an improved version of the Kandinsky neural network, which debuted in June of this year. According to its creators, it is the first Russian multilingual diffusion model with 2 billion parameters for generating images from text descriptions. Unlike its predecessor, Kandinsky 2.0 can process requests in 101 languages, and the developers say it does so equally quickly and efficiently regardless of whether the language is a common one, such as Russian or English, or a rare one, such as Mongolian.
Kandinsky 2.0 uses the increasingly popular diffusion approach, which gives good results in almost all tasks that involve generating multimedia content from a text description (image, video, 3D and audio synthesis). According to Sber, Kandinsky 2.0 differs from its predecessor in producing richer, deeper and more realistic pictures and in offering more advanced features. On the Fusion Brain site, images can be generated in 20 different styles, including Renaissance, Classicism, Cartoon, New Year and even Khokhloma. The model also implements inpainting (replacing any part of an image, or any object in it, with content generated by the neural network) and outpainting (extending a finished image and the background around it). Interestingly, Kandinsky 2.0 draws the same concept differently depending on the language of the request: for example, if you enter the query “national dish” in Russian, the result will most likely be shchi, while in Japanese it will be miso soup and sushi.
The neural network was developed and trained by Sber AI researchers with the support of scientists from the AIRI Institute of Artificial Intelligence. You can see how it draws on the FusionBrain website, or by using the “Launch Artist” command on Sber smart devices and in the Salyut mobile app. The creators of Kandinsky 2.0 note that in a few seconds it produces a unique image for a specific task, which can then be freely distributed without a license, something that is very important for business. In their opinion, the neural network draws realistic images that are often indistinguishable from those created by people.