Adobe Research and Northwestern University have developed Sketch2Sound, an AI system that turns a vocal imitation and a text description into a sound effect, which could make sound design work considerably faster. It analyzes the loudness, timbre, and pitch of the vocal input and combines them with the text description to generate the sound the user wants. The system also infers context: given the prompt "forest atmosphere" and a vocal imitation of bird calls, it generates realistic bird calls without further instructions. Sketch2Sound supports music creation as well. A user hums a rhythm and enters the instrument names, and the system follows the pitch and timing of the humming to generate a matching drum pattern.
The system analyzes three properties of the vocal input: loudness, timbre (here, how bright the sound is), and pitch. It then combines these features with a text description to generate the desired sound.
Video: García et al., Adobe Research
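To make the three features concrete, here is a minimal Python sketch that computes a per-frame loudness, brightness, and pitch curve from a mono recording. It is an illustration only, not the researchers' code: the function names, the frame sizes, the use of spectral centroid as a stand-in for brightness, and the simple autocorrelation pitch estimate are all assumptions made for this example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def loudness_db(frames):
    """Per-frame loudness as RMS level in decibels."""
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20 * np.log10(rms + 1e-9)

def spectral_centroid(frames, sr):
    """Per-frame brightness proxy: magnitude-weighted mean frequency in Hz."""
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    return (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-9)

def pitch_autocorr(frames, sr, fmin=80.0, fmax=1000.0):
    """Per-frame pitch estimate in Hz from the strongest autocorrelation lag."""
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), frames.shape[1] - 1)
    pitches = []
    for f in frames:
        f = f - f.mean()
        # Keep only non-negative lags of the autocorrelation.
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
        pitches.append(sr / lag)
    return np.array(pitches)
```

Calling `frame_signal(recording, 1024, 256)` and passing the result to the three functions yields three time-aligned curves, one value per frame, which is the kind of control signal the article describes being combined with the text prompt.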
The interesting thing about Sketch2Sound is its ability to understand context. For example, if someone types in "forest vibe" and makes short sounds, the system automatically recognizes that those sounds should be bird calls - without the need for specific instructions.
The same intelligence applies to music. When creating a drum pattern, users can enter "bass drum, snare drum" and then hum the rhythm using low and high pitches. The system automatically assigns the bass drum to the low notes and the snare drum to the high notes.
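The article does not explain how the system makes this assignment, so the following toy rule only illustrates the input-output behavior described above. The function `assign_drums` and its midpoint threshold are hypothetical and are not part of Sketch2Sound.

```python
def assign_drums(hum_pitches_hz, low_label="bass drum", high_label="snare drum"):
    """Label each hummed note by whether it falls in the lower or upper
    half of the pitch range used in this performance (toy rule)."""
    if not hum_pitches_hz:
        return []
    # Split halfway between the lowest and highest hummed pitch.
    split = (min(hum_pitches_hz) + max(hum_pitches_hz)) / 2
    return [low_label if p < split else high_label for p in hum_pitches_hz]

# Example: three low hums and two high hums.
pattern = assign_drums([90, 95, 300, 92, 310])
```

Here the split falls at 200 Hz, so the three hums near 90 Hz map to the bass drum and the two hums near 300 Hz map to the snare drum.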
Provides professionals with granular control
The research team built in a filtering technique that lets users adjust how closely the generated sound follows the vocal input. Sound designers can choose precise, detailed control or a more relaxed, approximate approach, depending on their needs.
This flexibility makes Sketch2Sound especially valuable for Foley artists (professionals who create sound effects for movies and TV shows). Instead of manipulating physical objects to make sounds, they can create effects faster with their voice and a text description.
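One way to picture this trade-off is to smooth a control curve, such as loudness over time, before it guides generation. The article does not specify the filter, so the sketch below assumes a plain median filter; the function name `median_smooth` and the window sizes are illustrative only.

```python
import numpy as np

def median_smooth(curve, window):
    """Median-filter a 1-D control curve with an odd window size.
    window=1 keeps every detail; larger windows keep only the broad shape."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    curve = np.asarray(curve, dtype=float)
    pad = window // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(curve, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)

# A loudness-like curve with one brief spike and one sustained rise.
curve = [0, 0, 10, 0, 0, 5, 5, 5]
detailed = median_smooth(curve, 1)  # follows the input exactly
relaxed = median_smooth(curve, 3)   # drops the one-frame spike
```

With a window of 1 the curve passes through unchanged, which corresponds to the precise setting. With a window of 3 the single-frame spike disappears while the sustained rise survives, which corresponds to the looser setting.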
The researchers note that the spatial audio characteristics of the input recording can sometimes affect the resulting sound in undesirable ways, but they are working to address this issue. Adobe has not announced when or if Sketch2Sound will become a commercial product.
Sketch2Sound could make sound design faster and more convenient and open up new creative possibilities for film, television, games, and other industries. It is still at the research stage, but its potential is considerable and its further development is worth following.