Salesforce launches xGen-MM, an open source multi-modal AI model for visual understanding

Author: Eve Cole | Update Time: 2024-12-22 13:16:01

Salesforce has open-sourced xGen-MM, its multi-modal AI model that can understand and generate multiple data types such as text and images. The largest model has 4 billion parameters and has performed well on multiple benchmarks. The open release contrasts with the current industry trend of keeping advanced models closed and reflects Salesforce's commitment to open research in AI. xGen-MM stands out for its ability to process "interleaved data," which lets it handle more complex tasks such as answering questions about several images at once, with potential applications in areas such as medical diagnostics and autonomous driving.

Salesforce has launched a set of open source multi-modal AI models called xGen-MM. The models can understand and generate multiple data types such as text and images, and could change how multi-modal AI is researched and applied.

The Salesforce AI research team published a paper on arXiv detailing the xGen-MM framework. The framework includes not only pre-trained models but also datasets and fine-tuning code. The largest model has 4 billion parameters and performs well on multiple benchmarks, holding its own against comparable open source models.

This open source move runs counter to the current trend of many technology giants keeping their advanced AI models secret. Salesforce says it hopes to promote broader research and development by opening up its models and datasets, giving more researchers and developers the opportunity to take part in advancing multi-modal AI.

One of the innovations of xGen-MM is its ability to handle "interleaved data," that is, input in which multiple images and passages of text are mixed together in a single sequence. This lets the model perform more complex tasks, such as answering a question that refers to several images at once. Such capabilities may prove useful in fields such as medical diagnosis and autonomous driving.
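To make the idea of interleaved data more concrete, the following is a minimal Python sketch of how a mixed sequence of text and images might be represented and flattened into a single prompt. It is an illustration only and does not show xGen-MM's actual input format or API: the `Text` and `Image` classes, the `<image>` placeholder, the `flatten` helper, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Text:
    content: str


@dataclass
class Image:
    path: str  # hypothetical file path, used here only as an identifier


Segment = Union[Text, Image]

# Hypothetical placeholder marking where an image sits in the text stream.
IMAGE_TOKEN = "<image>"


def flatten(segments: List[Segment]) -> Tuple[str, List[str]]:
    """Turn an interleaved sequence into (prompt, ordered image paths).

    Each image is replaced by IMAGE_TOKEN in the prompt, and its path is
    recorded so that the n-th placeholder corresponds to the n-th image.
    """
    parts: List[str] = []
    images: List[str] = []
    for seg in segments:
        if isinstance(seg, Image):
            parts.append(IMAGE_TOKEN)
            images.append(seg.path)
        else:
            parts.append(seg.content)
    return " ".join(parts), images


# A question that refers to two images within one sequence.
example: List[Segment] = [
    Text("Here is the first scan:"),
    Image("scan_a.png"),
    Text("and the second:"),
    Image("scan_b.png"),
    Text("Which one shows the larger lesion?"),
]

prompt, image_paths = flatten(example)
print(prompt)
print(image_paths)
```

The point of the sketch is that the order of text and images is preserved, so a question can refer to "the first" and "the second" image unambiguously. A real multi-modal model would replace each placeholder with learned visual features rather than a literal string.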

The release also includes several variants of the model: a base pre-trained model, a model tuned to follow instructions, and a "safety-tuned" model designed to reduce harmful output. This range reflects the AI community’s growing emphasis on balancing capability with safety and ethics.

However, the release of powerful models has also prompted discussion about the potential risks and social impact of more advanced AI systems. Although Salesforce has applied safety tuning to reduce risk, how to balance innovation and safety remains an open question.

This open source release from Salesforce gives researchers valuable tools to better understand and improve these technologies. It also sets a new benchmark for transparency in AI, which may push other technology giants to be more open about their research.

Model collection: https://huggingface.co/collections/Salesforce/xgen-mm-1-models-662971d6cecbf3a7f80ecc2e

Highlights:

xGen-MM is a set of open source multi-modal AI models launched by Salesforce that supports comprehensive understanding and generation of text and images.

The model has the ability to process interleaved data and can answer questions about multiple images at the same time, so it has broad application prospects.

This release includes several variants of the model, pays attention to safety and ethical issues, and provides rich resources for researchers.

All in all, Salesforce's open source xGen-MM is a major step forward in the field of AI. It not only provides powerful tools but also sets an example for more open and responsible AI research and development. Its future applications across various fields are worth watching.