Kyutai unveils Moshi, the very first voice-enabled AI openly accessible to all

Kyutai unveils Moshi, the very first voice-enabled AI openly accessible to all

In just 6 months, with a team of 8, the Kyutai research lab developed from scratch an Artificial Intelligence (AI) model with unprecedented vocal capabilities called Moshi.

The team publicly unveiled its experimental prototype in Paris. At the end of the presentation, the participants – researchers, developers, entrepreneurs, investors and journalists – were themselves able to interact with Moshi. The interactive demo of the AI will be accessible from the Kyutai website at the end of the day. It can therefore be freely tested online as from today, which constitutes a world first for a generative voice AI.

This new type of technology makes it possible for the first time to communicate in a smooth, natural and expressive way with an AI. During the presentation, the Kyutai team interacted with Moshi to illustrate its potential as a coach or companion for example, and its creativity through the incarnation of characters in roleplays.

More broadly, Moshi has the potential to revolutionize the use of speech in the digital world. For instance, its text-to-speech capabilities are exceptional in terms of emotion and interaction between multiple voices.

Compact, Moshi can also be installed locally and therefore run safely on an unconnected device.

With Moshi, Kyutai intends to contribute to open research in AI and to the development of the entire ecosystem. The code and weights of the models will soon be freely shared, which is also unprecedented for such technology. They will be useful both to researchers in the field and to developers working on voice-based products and services. This technology can therefore be studied in depth, modified, extended or specialized according to needs. The community will in particular be able to extend Moshi’s knowledge base and factuality, which are currently deliberately limited in such a lightweight model, while exploiting its unparalleled voice interaction capabilities.