OpenAI's latest video generation model, Sora, has attracted widespread attention for its capabilities, but it also carries legal risks that cannot be ignored. Sora can generate high-quality videos from text or images, and some of the clips it produces closely resemble well-known games such as "Super Mario Bros." and "Call of Duty". However, the sources of Sora's training data have not been fully disclosed and may include a large amount of copyrighted game content, which exposes OpenAI to potential copyright litigation.
OpenAI recently launched Sora, and the possibility that its training data contains copyrighted game content has raised legal concerns. Sora can generate videos up to 20 seconds long from a user's text prompts or images, and supports a variety of formats and resolutions. At launch, OpenAI mentioned that Sora's training data included Minecraft videos, which prompted questions about what other gaming content might have been used.
In real-world testing, Sora was able to generate videos that resembled several well-known games, including what looked like a Super Mario Bros. clone, a first-person shooter inspired by Call of Duty and Counter-Strike, and a clip in the style of a 1990s Teenage Mutant Ninja Turtles arcade fighting game. Sora also seems to understand what a Twitch live stream looks like, and it can generate characters resembling the popular streamers Auronplay and Pokimane.
However, OpenAI has not detailed the data sources used to train Sora. OpenAI says it uses "publicly available" data along with data licensed from stock media libraries such as Shutterstock, but that does not eliminate the legal risk. Intellectual property lawyer Joshua Weigensberg pointed out that if Sora's training data does contain gameplay footage, then training the model likely involved copying copyrighted material.
Generative AI models such as Sora are probabilistic: they learn patterns from large amounts of data. As a result, they can produce output that closely approximates their training data. This has angered creators, and a growing number of them are turning to the courts.
Game content is particularly complex because a gameplay video involves not only the game developer's copyright but potentially also the rights of the player who produced that particular video. If courts find that training an AI model on such material infringes copyright, the developers of those models will face greater legal risk.
Even if AI companies prevail in some of these legal disputes, that does not mean users of their models are free of liability. Generated content may implicate several areas of law, including copyright, trademark, and likeness rights. Developers therefore need to exercise extreme caution when training AI models.
Sora's arrival demonstrates how quickly AI video generation is advancing, but its copyright issues are also a warning for the field. Going forward, both the training and the use of AI models will need to pay closer attention to intellectual property protection and to balancing innovation with legal compliance.