Google has released Gemini 2.0, the latest generation of its artificial intelligence models and a major step toward a general-purpose AI assistant. Gemini 2.0 brings significant improvements in multimodal processing and tool use, enabling a deeper understanding of the world and the ability to act on users' instructions. The model builds on Gemini 1.0 and 1.5 and has already been applied across multiple Google products serving millions of users worldwide. This article covers Gemini 2.0's capabilities and its impact on Google's products and developer ecosystem.
Sundar Pichai, CEO of Google and its parent company Alphabet, announced the launch of the company's latest artificial intelligence model, Gemini 2.0, which marks an important step in Google's effort to build a universal AI assistant. Gemini 2.0 shows significant advances in multimodal input processing and native tool use, enabling AI agents to understand the world around them more deeply and take action on a user's behalf, under their supervision.
Gemini 2.0 builds on its predecessors, Gemini 1.0 and 1.5, which were the first models with native multimodal processing, able to understand information types including text, video, images, audio, and code. Millions of developers now build with Gemini, which has driven Google to reimagine its products, including seven that serve 2 billion users, and to create new ones. NotebookLM, an example of what multimodal and long-context capabilities enable, has been widely embraced.
The launch of Gemini 2.0 ushers in a new agentic era for Google. The model has native image and audio output as well as native tool use. Google has begun making Gemini 2.0 available to developers and trusted testers and plans to integrate it into products quickly, starting with Gemini and Search. As of today, the experimental Gemini 2.0 Flash model is open to all Gemini users. Google also launched a new feature called Deep Research, which uses advanced reasoning and long-context capabilities to act as a research assistant, exploring complex topics and compiling reports on the user's behalf. This feature is available now in Gemini Advanced.
Search is one of the products most transformed by AI. Google's AI Overviews now reach 1 billion people, letting them ask entirely new kinds of questions, and have quickly become one of Search's most popular features. As a next step, Google will bring Gemini 2.0's advanced reasoning to AI Overviews to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding. Limited testing began this week, with a broader rollout planned for early next year, and Google will continue to bring AI Overviews to more countries and languages over the coming year.
Google also showcased its latest agent research, built on Gemini 2.0's native multimodal capabilities. Gemini 2.0 Flash improves on 1.5 Flash, the most popular model among developers to date, with similarly fast response times; notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks while running twice as fast. 2.0 Flash also brings new capabilities. In addition to multimodal inputs such as images, video, and audio, it now supports multimodal output, including natively generated images mixed with text and steerable multilingual text-to-speech (TTS) audio. It can also natively call tools such as Google Search, code execution, and third-party user-defined functions.
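To give a sense of what native tool use looks like from a developer's perspective, here is a minimal sketch using the google-generativeai Python SDK's automatic function calling. The get_exchange_rate helper and its stub return value are hypothetical, and SDK details may differ from this sketch.

```python
# Minimal sketch of native tool use with the google-generativeai Python SDK.
# The get_exchange_rate function is a hypothetical stand-in for a real tool.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # API key from Google AI Studio

def get_exchange_rate(currency_from: str, currency_to: str) -> float:
    """Return the exchange rate from currency_from to currency_to."""
    return 0.92  # stub value for illustration only

model = genai.GenerativeModel(
    "gemini-2.0-flash-exp",     # experimental Gemini 2.0 Flash model
    tools=[get_exchange_rate],  # expose the function as a callable tool
)

# With automatic function calling enabled, the SDK executes the tool when the
# model requests it and feeds the result back before the final answer.
chat = model.start_chat(enable_automatic_function_calling=True)
reply = chat.send_message("How many euros would I get for 100 US dollars?")
print(reply.text)
```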
Gemini 2.0 Flash is now available to developers as an experimental model, with multimodal input and text output open to all developers through the Gemini API in Google AI Studio and Vertex AI, while text-to-speech and native image generation are limited to early-access partners. General availability will follow in January, along with additional model sizes.
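For developers getting started through Google AI Studio, a basic multimodal request (image plus text in, text out) might look like the sketch below; the file name is illustrative and the exact SDK surface may evolve.

```python
# Minimal sketch of a multimodal (image + text) request with text output,
# using the google-generativeai Python SDK; the image file is illustrative.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # API key from Google AI Studio

model = genai.GenerativeModel("gemini-2.0-flash-exp")

image = Image.open("sales_chart.png")  # any local image
response = model.generate_content(
    [image, "Summarize the main trend shown in this chart in two sentences."]
)
print(response.text)
```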
To help developers build dynamic, interactive applications, Google also released a new Multimodal Live API with real-time audio and video streaming input and the ability to use multiple, combined tools.
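A rough sketch of opening such a real-time session with the google-genai Python SDK's Live API follows; the method names and configuration keys reflect the launch-time documentation and should be treated as assumptions, since this surface may have changed.

```python
# Rough sketch of a bidirectional streaming session with the Multimodal Live
# API via the google-genai Python SDK; method names follow launch-era docs
# and are assumptions, not a definitive reference.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY",
                      http_options={"api_version": "v1alpha"})

async def main():
    config = {"response_modalities": ["TEXT"]}  # AUDIO can also be requested
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Send one user turn; audio/video chunks could be streamed similarly.
        await session.send(input="Give me a one-line weather joke.",
                           end_of_turn=True)
        # Print the model's reply as it streams back.
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```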
Starting today, Gemini users worldwide can access a chat-optimized version of the experimental 2.0 Flash model by selecting it in the model drop-down on desktop and mobile web, and it will soon come to the Gemini mobile app. Early next year, Google will expand Gemini 2.0 to more of its products.
All in all, the release of Gemini 2.0 marks another step forward for Google in AI. Its multimodal capabilities and tool integration will bring richer experiences to developers and users and drive the adoption of AI across more fields. Going forward, Gemini 2.0 will be integrated more deeply into Google's product ecosystem, delivering smarter and more convenient services to users.