This article examines the intensity of internal competition at Meta during the development of Llama3 and the copyright issues that may have resulted. Internal communications released by the court show that Meta went to great lengths to surpass OpenAI and Anthropic, even taking the risk of using training data with potential copyright problems. This article analyzes Meta's internal competitive dynamics, its attitude toward competitors, and the resulting legal risks and future prospects.
In Kadrey v. Meta, one of Meta's ongoing AI copyright cases, internal communications released by the court reveal the company's fierce competitive drive and the potential copyright issues that arose while developing Llama3. Meta's senior leaders and researchers went all out to surpass companies like OpenAI and Anthropic in AI model development, treating GPT-4 and Claude as the benchmarks for their efforts.
Fierce Competitive Mentality: Meta Vows to Defeat OpenAI
According to Ahmad Al-Dahle, Vice President of Generative AI at Meta, the company's goal when developing Llama3 was aimed squarely at GPT-4, and it was determined to gain an edge in the AI race through massive hardware resources, such as a cluster of 64k GPUs. Al-Dahle wrote in an internal message: "We will have 64k GPUs! We need to learn how to build the cutting edge and win this game."
However, even though Meta releases open source AI models, its executives were far more focused on beating competitors like OpenAI and Anthropic, which typically do not disclose their models' weights and instead gate their models behind APIs, making them the primary targets of Meta's competitive focus.
Contempt for Mistral and Internal Anxiety
French artificial intelligence startup Mistral is one of Meta's most prominent public competitors, but Meta executives were clearly dismissive of it. In one message, Al-Dahle said: "Mistral is insignificant to us; we should be able to do better." The remark also exposed both the intense anxiety inside Meta about the AI race and its strong ambition for its place in the industry.
At the same time, Meta's AI leaders frequently discussed in internal communications how they were actively acquiring data to train Llama3, and some messages showed that executives had high expectations for the model. One executive even wrote: "Llama3 is all I really care about."
Copyright Issues and Legal Challenges
As Meta faced fierce competition during the development of Llama3, the training data it used also began to draw legal disputes. The plaintiffs allege that Meta executives may have cut corners and used copyrighted books as training data as they raced to catch up in AI development.
In one message, researcher Hugo Touvron wrote that the mix of datasets used to train Llama2 was "terrible" and proposed improving the datasets for Llama3. The team also discussed clearing the way to use the LibGen dataset, which contains copyrighted works from publishers such as Cengage Learning, Macmillan Learning, McGraw Hill, and Pearson Education.
Despite the copyright issues, Meta CEO Mark Zuckerberg emphasized that Meta would continue to advance the Llama models and close the gap with closed models from companies such as OpenAI and Google.
Meta's Future Prospects and Llama3's Market Position
In July 2024, Zuckerberg said that Llama3 was already comparable in performance to the most advanced AI models and was leading in some areas. He predicted that starting in 2025, Meta's Llama series would become the most advanced models in the industry. However, Llama3 still has to contend with mounting copyright litigation, particularly legal scrutiny over its training data.
In summary, Meta demonstrated its ambition and competitiveness in the AI field during the development of Llama3, but it also exposed problems in data acquisition and copyright compliance. The future market position of Llama3 and Meta's AI strategy will depend in large part on how effectively the company resolves these legal challenges.