Meta CEO Mark Zuckerberg recently defended Meta's use of copyrighted data to train AI in testimony that has attracted widespread attention. The plaintiffs in the case, who include well-known authors, accuse Meta of using a large amount of pirated book data to train its AI models. Zuckerberg's defense is controversial: he compared Meta's conduct to the presence of pirated content on YouTube in an attempt to show that Meta did not intentionally infringe. Whether the court will accept this argument remains to be seen. This article examines Zuckerberg's testimony and the latest developments in the case, and considers what they could mean for the artificial intelligence industry.
In recent legal proceedings, Meta CEO Mark Zuckerberg pointed to YouTube's efforts to combat pirated content to defend the company's use of copyrighted data in AI training. The case, "Kadrey v. Meta," is one of many copyright lawsuits against AI companies in U.S. courts. The plaintiffs include well-known writers Sarah Silverman and Ta-Nehisi Coates.
According to recently released excerpts of Zuckerberg's testimony, he noted that while some pirated content may exist on YouTube, YouTube is still working to remove it. "Most of the content on YouTube should be legal and they have relevant licenses," Zuckerberg said. The remarks hint at his stance on Meta's use of LibGen, a data set of copyrighted e-books, for AI training.
LibGen is a link aggregation website that provides copyrighted works from multiple publishers, including Cengage Learning, McGraw-Hill, and Pearson Education. The site has been sued multiple times for copyright infringement and ordered to pay tens of millions of dollars. Court documents show that Zuckerberg approved LibGen's use as a training data set even though Meta's AI team had raised concerns about the legal risks.
During questioning, Zuckerberg claimed he was unfamiliar with LibGen but said banning a platform like YouTube would be unreasonable. "No, I wouldn't want to set policies on people using YouTube because some content may be copyrighted," he said. He also acknowledged that Meta needs to be cautious when using copyrighted material for training.
According to the latest allegations from the plaintiffs' attorneys, Meta cross-referenced certain pirated books on LibGen with copyrighted books available for licensing to decide whether it should pursue licensing agreements with the publishers. The plaintiffs also accuse Meta of using the LibGen data set to train its latest Llama models and of downloading e-books from another pirated source, Z-Library, for training.
Z-Library has also been the subject of multiple legal actions over copyright issues, and its alleged operators were charged with copyright infringement, wire fraud, and money laundering in 2022.
Highlights:
Zuckerberg cited YouTube's handling of pirated content to defend Meta's use of copyrighted material in AI training.
The plaintiffs accuse Meta of using the pirated book data set LibGen to train its Llama models and of hiding relevant information.
Meta has faced multiple copyright lawsuits, and related legal risks have triggered internal discussions and concerns.
The final judgment in this case could have a profound impact on the artificial intelligence industry. It is likely to serve as an important legal reference for how AI companies may use copyrighted data to train models, and it may prompt the industry to adopt stricter copyright protection measures.