NVIDIA has launched Fugatto, an audio generation and processing AI model with 2.5 billion parameters, designed to bring greater flexibility and creativity to music and sound creation. Fugatto combines text prompts with advanced audio synthesis, accepts both text and audio input, and lets users create and modify audio in real time and generate a variety of novel sound effects. Its "Composable Audio Representation Transformation" (ComposableART) technique gives users precise control over sound.
In music and sound creation, combining technology with creativity has always been difficult. Existing AI models are often good at only specific tasks and lack broad adaptability, which limits how much AI can help in music production. For AI to better serve music and audio production, a general-purpose model that can respond flexibly to varied creative needs is required. Fugatto is NVIDIA's answer to that need.
Fugatto is designed to provide a highly flexible space for audio input and creative experimentation by combining text prompts with advanced audio synthesis capabilities. For example, it can transform a piano melody into a sung vocal, or make a trumpet produce an unexpected sound.
Fugatto not only supports text input, but also supports optional audio input, breaking the limitations of traditional audio generation models, allowing artists and developers to create and modify in real time, and smoothly generate new types of sounds.
On the technical side, Fugatto uses an innovative approach to data generation that goes beyond traditional supervised learning. Its training relies not only on regular datasets but also on specially generated datasets, which yield a rich variety of audio generation and transformation tasks. In addition, Fugatto uses large language models (LLMs) to improve instruction generation, helping it better capture the relationship between audio and text prompts.
An important innovation is Composable Audio Representation Transformation (ComposableART), a technique used at inference time to flexibly combine, interpolate, or negate different audio generation instructions. ComposableART gives users greater control over the audio synthesis process, allowing them to precisely navigate Fugatto's sonic palette and create unique sounds.
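To make the idea of combining, interpolating, and negating instructions concrete, here is a minimal sketch in the style of classifier-free guidance. It is an illustration under assumptions, not NVIDIA's implementation: the function name compose_guidance, the toy vectors, and the weighting rule are all hypothetical, and the paper's exact formulation may differ. Each instruction contributes the difference between its conditional prediction and the unconditional one, scaled by a weight; a negative weight pushes the result away from that instruction.

```python
def compose_guidance(uncond, conds, weights):
    """Blend per-instruction predictions around an unconditional baseline.

    uncond  : prediction with no instruction (list of floats)
    conds   : one prediction per instruction (list of lists of floats)
    weights : one weight per instruction; negative values negate it
    """
    if len(conds) != len(weights):
        raise ValueError("need exactly one weight per instruction")
    composed = list(uncond)
    for cond, w in zip(conds, weights):
        if len(cond) != len(uncond):
            raise ValueError("all predictions must have the same length")
        for i, (c, u) in enumerate(zip(cond, uncond)):
            # Move along the direction this instruction adds to the baseline.
            composed[i] += w * (c - u)
    return composed


# Toy stand-ins for model outputs (hypothetical values).
uncond = [0.0, 0.0]
piano = [1.0, 0.0]   # prediction for an instruction such as "piano"
rain = [0.0, 2.0]    # prediction for an instruction such as "rain"

blended = compose_guidance(uncond, [piano, rain], [0.5, 0.5])   # interpolate
negated = compose_guidance(uncond, [piano, rain], [1.0, -1.0])  # piano, not rain
```

In this toy setup, equal weights interpolate between the two instructions, while a weight of -1.0 steers the output away from the second one.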
Fugatto's architecture is based on an enhanced Transformer with specific modifications, such as adaptive layer normalization, that keep its behavior consistent across multiple input conditions and support complex combined instructions. Preliminary tests show that Fugatto performs well on common benchmarks, particularly in sound synthesis and transformation, where it shows greater capabilities than other specialist models.
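As a rough illustration of adaptive layer normalization, the sketch below normalizes a feature vector and then scales and shifts it using values computed from a conditioning vector. This is the generic form of the technique, written under assumptions: the function name, the weight matrices w_scale and w_shift, and the toy numbers are hypothetical and are not taken from Fugatto's code or paper.

```python
import math


def adaptive_layer_norm(x, cond, w_scale, w_shift, eps=1e-5):
    """Layer-normalize x, then modulate it with a conditioning vector.

    x       : feature vector (list of floats)
    cond    : conditioning vector (list of floats)
    w_scale : len(x) rows by len(cond) columns, maps cond to a per-feature scale
    w_shift : same shape, maps cond to a per-feature shift
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    # The condition, not a fixed learned constant, decides scale and shift.
    scale = [sum(w * c for w, c in zip(row, cond)) for row in w_scale]
    shift = [sum(w * c for w, c in zip(row, cond)) for row in w_shift]
    return [n * (1.0 + s) + b for n, s, b in zip(normed, scale, shift)]


x = [1.0, 2.0, 3.0, 4.0]
cond = [1.0, 0.5]
zeros = [[0.0, 0.0] for _ in x]
ones_on_first = [[1.0, 0.0] for _ in x]

plain = adaptive_layer_norm(x, cond, zeros, zeros)            # no modulation
shifted = adaptive_layer_norm(x, cond, zeros, ones_on_first)  # shift by cond[0]
```

With all-zero weights the function reduces to ordinary layer normalization; with non-zero weights the same features are rescaled and shifted differently for each condition, which is how one network can behave consistently under several kinds of input.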
The launch of Fugatto marks an important advancement in audio generation AI, breaking through traditional limitations and providing a powerful and flexible tool for creative audio production. Its potential applications in multiple fields such as music, games, entertainment, and education mean that AI technology will continue to play an important role in assisting human creativity.
Official blog: https://blogs.nvidia.com/blog/fugatto-gen-ai-sound-model/
Paper: https://d1qx31qr3h6wln.cloudfront.net/publications/FUGATTO.pdf
Highlights:
Fugatto is an audio AI model launched by NVIDIA. It has 2.5 billion parameters, supports text and audio input, and assists music and sound creation.
With its innovative data generation methods and Composable Audio Representation Transformation (ComposableART), it lets users flexibly generate and modify sounds.
Preliminary tests show that Fugatto outperforms several specialist models in audio synthesis and transformation, demonstrating its strong creative potential.
All in all, Fugatto's capabilities and flexibility open new possibilities for music creation and sound design, and suggest that AI will be applied more widely and deeply across the creative industries. We look forward to seeing what Fugatto brings next.