Nvidia Unveils AI Model for Audio and Voice Manipulation

Nvidia introduced a groundbreaking artificial intelligence model designed to generate music and audio, modify voices, and create unique sounds.

Kapish Khajuria

26 Nov 2024 17:16 IST

Nvidia has introduced an artificial intelligence model that can generate music and audio, modify voices, and create new sounds. The technology is aimed at professionals in music production, filmmaking, and video game development.

Nvidia, the world's leading provider of AI-focused chips and software, said it has no immediate plans to make the technology publicly available. The model is called Fugatto, short for Foundational Generative Audio Transformer Opus 1. It enters a space already populated by similar systems from startups such as Runway and from major companies such as Meta Platforms, which have built models that generate audio or video from text prompts.

How does Nvidia's AI model modify voices?

Fugatto, developed by Santa Clara, California-based Nvidia, can produce sound effects and music from text descriptions. Beyond generating audio from scratch, it can create imaginative soundscapes; for instance, it can make a trumpet mimic the sound of a barking dog. What sets it apart from other generative AI tools is its ability to modify existing audio. For example, it can turn a piano melody into a vocal line sung by a human-like voice, or alter a spoken-word recording so that it carries a different accent or emotional tone.

Bryan Catanzaro, Nvidia's vice president of applied deep learning research, highlighted the transformative potential of the technology:

"If we think about synthetic audio over the past 50 years, music sounds different now because of computers and synthesizers. Generative AI is poised to bring entirely new capabilities to music, video games, and everyday creators who want to make something unique."

What is OpenAI's role in the entertainment industry?

However, the broader adoption of AI in entertainment has sparked debate. For instance, OpenAI's discussions with Hollywood studios about AI's role in content creation have been overshadowed by controversies, including actress Scarlett Johansson's accusation that OpenAI's AI imitated her voice without her consent.

Fugatto was trained on open-source data, and Nvidia is weighing the implications of releasing it to the public. Catanzaro described the risks associated with generative AI:

"Generative technology carries inherent risks, as it can be misused to create content we'd rather not see. This is why we’re cautious and have no immediate plans for public release."

Developers of generative AI models, including Nvidia, OpenAI, and Meta, are grappling with how to prevent misuse, avoid spreading misinformation, and limit copyright violations such as the unauthorized replication of copyrighted characters. For now, these companies remain cautious, with no clear timelines for public access to their most advanced systems.