Recent reports indicate that Google is using Anthropic's Claude model to help improve its Gemini artificial intelligence project. Internal documents show that Google contractors are systematically comparing the outputs of Gemini and Claude to evaluate and improve Gemini's performance. The approach has attracted industry attention and raises questions about norms for AI model evaluation and the ethics of cross-company technical cooperation. This article examines the details of the incident and analyzes its potential impact.
According to internal communications obtained by TechCrunch, contractors working to improve Google's Gemini are systematically evaluating its answers against those of Anthropic's Claude model.
In the AI industry, model performance is usually evaluated against industry benchmarks, not by having contractors compare different models' answers one by one. Contractors working on Gemini score the model's output against several criteria, including truthfulness and level of detail, and are given up to 30 minutes per prompt to decide whether Gemini's or Claude's answer is better.
Recently, these contractors noticed references to Claude appearing frequently on the internal platforms they use. At least one of the outputs shown to contractors stated explicitly: "I am Claude, created by Anthropic." In an internal chat, contractors also observed that Claude's answers placed greater emphasis on safety. Some contractors noted that Claude's safety settings are the strictest of any AI model they have worked with. In some cases, Claude declined to respond to prompts it deemed unsafe, such as role-playing a different AI assistant. In another case, Claude avoided answering a prompt while Gemini's answer was flagged as a "major safety violation" for containing "nudity and bondage."
Notably, Anthropic's commercial terms of service prohibit customers from using Claude to "build competing products or services" or "train competing AI models" without Anthropic's approval. Google is a major investor in Anthropic.
Speaking to TechCrunch, Google DeepMind spokesperson Shira McNamara would not say whether Google has obtained Anthropic's approval to use Claude. McNamara said that DeepMind does compare model outputs for evaluation purposes but does not train Gemini on Claude. "Of course, as is standard industry practice, we compare model outputs in some cases," she said. "However, any suggestion that we used Anthropic models to train Gemini is inaccurate."
Last week, TechCrunch also exclusively reported that Google contractors were being asked to rate Gemini's AI responses on topics outside their areas of expertise. Some contractors expressed concern in internal communications that Gemini could generate inaccurate information on sensitive topics such as health care.
Highlights:
Google is comparing Gemini's outputs with Claude's to improve Gemini's performance.
Contractors score and compare the two models' responses against multiple criteria, including truthfulness and safety.
Anthropic's terms prohibit the unauthorized use of Claude to train competing AI models.
Google's use of Claude outputs to improve Gemini has sparked discussion about AI model evaluation methods, data-use ethics, and competitive relationships. Whether such cross-company model comparisons will become an industry norm, and how they should be regulated, are questions worth watching closely, as the answers will have a lasting impact on the development and regulation of the AI industry.