Meta has quietly released six major AI research results spanning multimodal models, text-to-music generation, audio watermarking, and datasets, demonstrating its continued innovation and technical strength in artificial intelligence. These results open new possibilities for AI applications and offer a valuable reference for future technical directions. The specific results are described in detail below.
Meta Chameleon ("Chameleon" model)
First is the multimodal model "Chameleon", which can process text and images together, supports mixed-modal input with text output, and offers a new approach to handling multimodal data.
While most current late-fusion models use diffusion-based learning, Meta Chameleon uses tokenization for both text and images. This enables a more unified approach and makes the model easier to design, maintain, and scale.
Video examples: generating creative captions from images, or using a mix of text prompts and images to create an entirely new scene.
Meta is now publicly releasing key components of the Chameleon 7B and 34B models under a research license. The currently released models are safety-tuned, support mixed-modal input with text-only output, and are intended for research use. Meta emphasized that the Chameleon image generation model will not be released.
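To make the tokenization approach described above concrete, here is a minimal conceptual sketch (not Meta's code; the vocabulary size, stub tokenizers, and placeholder values are illustrative assumptions) of how text and images can be mapped into one shared token stream that a single model consumes:

```python
from typing import List

TEXT_VOCAB_SIZE = 65_536        # assumption: size of the text sub-vocabulary

def tokenize_text(text: str) -> List[int]:
    """Stand-in for a BPE tokenizer mapping words into the shared vocabulary."""
    return [hash(word) % TEXT_VOCAB_SIZE for word in text.split()]

def tokenize_image(image_path: str) -> List[int]:
    """Stand-in for a discrete image tokenizer (e.g. a VQ encoder) that maps
    image patches to codebook indices, offset past the text token range."""
    dummy_patch_codes = [3, 1594, 722, 4088]            # placeholder codes only
    return [TEXT_VOCAB_SIZE + code for code in dummy_patch_codes]

def build_mixed_sequence(segments: List[tuple]) -> List[int]:
    """Interleave text and image tokens into one flat sequence, so a single
    autoregressive model can be trained on mixed-modal documents."""
    tokens: List[int] = []
    for kind, payload in segments:
        tokens += tokenize_text(payload) if kind == "text" else tokenize_image(payload)
    return tokens

sequence = build_mixed_sequence([("text", "Describe this photo:"), ("image", "photo.jpg")])
print(sequence)   # one token stream covering both modalities
```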
Product link: https://top.aibase.com/tool/meta-chameleon
Multi-Token Prediction
The new language-model training method "multi-token prediction" trains the model to predict several future words at once, rather than one word at a time as in standard training. This improves model capability and training efficiency while also increasing speed. In the spirit of responsible open science, pre-trained models for code completion are being released under a non-commercial, research-only license.
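As a rough illustration of the training setup (an assumed toy architecture, not Meta's released code), the sketch below attaches several output heads to one shared trunk, with head k supervised on the token k+1 positions ahead:

```python
import torch
import torch.nn as nn

VOCAB, DIM, SEQ, N_FUTURE = 1000, 64, 16, 4   # toy sizes; 4 heads = 4 future tokens

class MultiTokenLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.trunk = nn.GRU(DIM, DIM, batch_first=True)     # stand-in for a transformer trunk
        self.heads = nn.ModuleList(nn.Linear(DIM, VOCAB) for _ in range(N_FUTURE))

    def forward(self, tokens):                               # tokens: (batch, seq)
        hidden, _ = self.trunk(self.embed(tokens))
        return [head(hidden) for head in self.heads]         # one logits tensor per offset

model = MultiTokenLM()
tokens = torch.randint(0, VOCAB, (2, SEQ))
logits_per_head = model(tokens)

# Head k is supervised with the tokens k+1 steps ahead; the per-head losses are summed.
loss = torch.tensor(0.0)
for k, logits in enumerate(logits_per_head):
    valid = SEQ - (k + 1)                                    # positions that have a target
    loss = loss + nn.functional.cross_entropy(
        logits[:, :valid].reshape(-1, VOCAB), tokens[:, k + 1:].reshape(-1)
    )
print(loss.item())
```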
Product link: https://top.aibase.com/tool/multi-token-prediction
Text-to-music generation model "JASCO"
While existing text-to-music models such as MusicGen rely mainly on text input to generate music, Meta's new model, Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation (JASCO), can accept a variety of conditioning inputs, such as specific chords or beats, to give finer control over the generated music. Specifically, an information bottleneck layer is used in conjunction with temporal blurring to extract the information relevant to each specific control. This makes it possible to combine symbolic and audio-based conditions simultaneously in the same text-to-music generative model.
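A hedged sketch of that conditioning idea (purely illustrative; the layer sizes, blur kernel, and feature shapes are assumptions, not JASCO's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FRAMES, COND_DIM, BOTTLENECK = 200, 32, 4        # toy sizes chosen for illustration

def temporal_blur(x: torch.Tensor, kernel: int = 9) -> torch.Tensor:
    """Average-pool along time so fine-grained detail is smoothed away."""
    # x: (batch, time, channels) -> blur over the time axis, keeping the same length
    return F.avg_pool1d(
        x.transpose(1, 2), kernel_size=kernel, stride=1, padding=kernel // 2
    ).transpose(1, 2)

bottleneck = nn.Linear(COND_DIM, BOTTLENECK)     # narrow "information bottleneck" projection

chord_features = torch.randn(1, FRAMES, COND_DIM)    # symbolic condition (e.g. chord sequence)
melody_features = torch.randn(1, FRAMES, COND_DIM)   # audio-derived condition (e.g. melody)

# Each condition is compressed and blurred, then the streams are concatenated
# frame by frame into one conditioning sequence for the generative model.
conditioning = torch.cat(
    [temporal_blur(bottleneck(chord_features)), temporal_blur(bottleneck(melody_features))],
    dim=-1,
)
print(conditioning.shape)    # torch.Size([1, 200, 8])
```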
JASCO is comparable to the evaluated baselines in generation quality while allowing better and more flexible control over the generated music. Meta is publishing the research paper and a sample page; later this month the inference code will be released as part of the AudioCraft repository under the MIT license, and the pre-trained models will be released under a CC-BY-NC license.
Code link: https://top.aibase.com/tool/audiocraft
Audio watermarking technology "AudioSeal"
This is the first audio watermarking technology specifically designed for the local detection of AI-generated speech, enabling precise localization of AI-generated segments within longer audio clips. AudioSeal improves on traditional audio watermarks by focusing on detecting AI-generated content rather than steganography.
Unlike traditional methods that rely on complex decoding algorithms, AudioSeal's localized detection approach enables faster and more efficient detection, up to 485 times faster than previous methods, making it well suited to large-scale and real-time applications. Meta reports state-of-the-art performance in both the robustness and the imperceptibility of its audio watermarks.
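For reference, a usage sketch following the pattern documented in the AudioSeal repository (checkpoint names and exact call signatures should be verified against that repository):

```python
import torch
from audioseal import AudioSeal

sample_rate = 16_000
wav = torch.randn(1, 1, sample_rate * 5)      # 5 seconds of (batch, channels, samples) audio

# Embed an (ideally imperceptible) watermark into generated audio.
generator = AudioSeal.load_generator("audioseal_wm_16bits")
watermarked = wav + generator.get_watermark(wav, sample_rate)

# Localized detection: the detector scores the clip (and, at a lower level, individual
# frames), which is what allows AI-generated segments to be pinpointed in longer audio.
detector = AudioSeal.load_detector("audioseal_detector_16bits")
probability, message = detector.detect_watermark(watermarked, sample_rate)
print(probability)    # likelihood that the clip carries the watermark
```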
AudioSeal is released under a commercial license.
Product link: https://top.aibase.com/tool/audioseal
PRISM dataset
Meta, in cooperation with external partners, also released the PRISM dataset, which contains the conversation data and preferences of 1,500 participants from around the world. It is intended to improve large language models by increasing their dialogue diversity, preference diversity, and social benefit.
This dataset maps each person’s preferences and fine-grained feedback onto 8,011 real-time conversations with 21 different LLMs.
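A minimal loading sketch using the Hugging Face datasets library (the configuration and split names used here are assumptions; check the dataset card linked below for the exact ones):

```python
from datasets import load_dataset

# Configuration/split names are assumptions -- consult the dataset card for the real ones.
prism = load_dataset("HannahRoseKirk/prism-alignment", "conversations")
print(prism)              # available splits, features, and row counts
print(prism["train"][0])  # one conversation record with per-turn preference feedback
```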
Dataset link: https://huggingface.co/datasets/HannahRoseKirk/prism-alignment
“DIG In” evaluation metrics
These metrics are used to evaluate geographic disparities in text-to-image models, providing additional reference data for model improvement. To understand how people in different regions perceive geographic representation differently, Meta conducted a large-scale annotation study, collecting over 65,000 annotations and more than 20 survey responses per example, covering attractiveness, similarity, consistency, and shared recommendations for improving automatic and human evaluation of text-to-image models.
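To illustrate the kind of disparity measurement such annotations enable (a generic aggregation sketch, not the DIG In code; the region names and scores are placeholders):

```python
from collections import defaultdict
from statistics import mean

# (region, attribute, score) annotations -- placeholder values purely for illustration
annotations = [
    ("West Africa", "consistency", 0.62),
    ("West Africa", "consistency", 0.58),
    ("Western Europe", "consistency", 0.91),
    ("Western Europe", "consistency", 0.87),
]

scores_by_region = defaultdict(list)
for region, attribute, score in annotations:
    if attribute == "consistency":
        scores_by_region[region].append(score)

region_means = {region: mean(scores) for region, scores in scores_by_region.items()}
print(region_means)
print("largest regional gap:", round(max(region_means.values()) - min(region_means.values()), 2))
```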
Code link: https://top.aibase.com/tool/dig-in
All in all, the six AI research results Meta has released this time demonstrate its leading technology and forward-looking strategy in multimodality, text generation, audio processing, and dataset construction. These advances bring new breakthroughs and application prospects to the AI field, and they will help push the field further and open up more possibilities for future applications.