NVIDIA Teases Fugatto “World’s Most Flexible Sound Machine”


Today, NVIDIA, the semiconductor manufacturer that is currently the world's most valuable company, shared a preview of Fugatto, an AI-powered audio tool that it describes as "the World's Most Flexible Sound Machine".

Fugatto is intended to be a sort of Swiss Army Knife for audio, letting you generate or transform any mix of music, voices and sounds using just text prompts.

“Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale,” says composer & NVIDIA researcher Rafael Valle.

Official teaser video:

Like earlier generative audio demos, many of the audio examples in the promo sound primitive. On the other hand, this is the first generative AI demo we've seen that also showcases the tool being used in interesting creative ways.

For example, the video demonstrates how you can use text prompts with Fugatto to extract vocals from a mix, morph one sound into another, generate realistic speech, remix existing audio, and convert MIDI melodies into realistic vocal samples. These are features that could actually complement and extend the capabilities of the current generation of digital audio workstations.

Here's what NVIDIA has to say about the technology behind Fugatto:

“Fugatto is a foundational generative transformer model that builds on the team’s prior work in areas such as speech modeling, audio vocoding and audio understanding.

The full version uses 2.5 billion parameters and was trained on a bank of NVIDIA DGX systems packing 32 NVIDIA H100 Tensor Core GPUs.

Fugatto was made by a diverse group of people from countries around the world, including India, Brazil, China, Jordan and South Korea. Their collaboration made Fugatto's multi-accent and multilingual capabilities stronger.

One of the hardest parts of the effort was generating a blended dataset that contains millions of audio samples used for training. The team employed a multifaceted strategy to generate data and instructions that considerably expanded the range of tasks the model could perform, while achieving more accurate performance and enabling new tasks without requiring additional data.

They also scrutinized existing datasets to reveal new relationships among the data. The overall work spanned more than a year.”
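To put the 2.5-billion-parameter figure above in perspective, here's a quick back-of-envelope calculation of the raw weight storage such a model would need at several common numeric precisions. Note that NVIDIA has not published Fugatto's actual precision or memory footprint, so the byte sizes here are illustrative assumptions, not details of the model itself:

```python
# Rough weight-storage footprint of a 2.5B-parameter model.
# The parameter count comes from NVIDIA's announcement; the
# precisions (fp32/fp16/int8) are generic assumptions for
# illustration, not published details of Fugatto.

PARAMS = 2.5e9  # 2.5 billion parameters

def footprint_gb(bytes_per_param: float) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return PARAMS * bytes_per_param / 1e9

for label, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{label}: {footprint_gb(nbytes):.1f} GB")  # fp32: 10.0 GB
```

At 16-bit precision the weights alone come to roughly 5 GB, which is small next to the 80 GB of memory on a single H100, so the bank of 32 GPUs NVIDIA mentions was presumably needed for training throughput and the millions-of-samples dataset rather than for simply fitting the model.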

We’ve got questions about Fugatto, ranging from “When will Fugatto become a real thing?” to “Will the data centers needed to power Fugatto generate enough heat to bring ocean-front views to the Midwest?”

But, with this demo, we can see a paradigm shift coming in how musicians work with audio, one where text-based and spoken commands become an important part of musicians' toolkits.

We may be optimistic about the timetable. But it's clear that, at least for younger musicians, we're heading for an era where the 'virtual studio' paradigm of current DAWs may no longer be relevant. For someone new to music production, being able to remix audio and arrange music using voice commands will make it much easier to get started.

And, for those who have invested years in developing skills with audio software, it's clear that new audio tools are coming quickly that promise to let us work with audio in new ways. It seems inevitable that some of the capabilities demonstrated in this video will be integrated into the next generation of digital audio workstations.

Is generative audio about to get interesting for creative musicians? Or are you sick of hearing about how AI is going to get awesome? Share your thoughts in the comments!

 

