Media reports recently revealed that Nvidia secretly scraped YouTube video data on a large scale to train its AI models, triggering widespread legal and ethical controversy. The scraping fed several of Nvidia's AI products, including the Cosmos deep learning model and its autonomous driving algorithms. The data was collected covertly and without authorization from the video creators or Google. Internal Nvidia emails show that senior executives were confident about the practice and regarded it as "fully approved." That position contradicts Google's official statement, which says the practice violates the platform's terms of service. The sheer volume of data involved, the covert methods, and the sharply different responses from the parties have drawn wide attention to the incident.
A covert data-collection operation by technology giant Nvidia was recently exposed. According to a report by 404 Media, Nvidia trained its artificial intelligence models by scraping massive amounts of YouTube video data, a practice that sits in a legal and ethical gray area.
The report says Nvidia is using the video data to train several of its AI models and products, including the Cosmos deep learning model, autonomous driving algorithms, digital human AI avatar products, and Omniverse, its tool for building 3D worlds.
Nvidia reportedly took a number of measures to conceal the scraping, using multiple virtual machines and constantly rotating IP addresses to avoid detection by YouTube. Neither the video creators nor YouTube's parent company, Google, authorized the scraping. Nvidia’s internal communications show how bold the strategy was: one executive wrote in an email that the company was building a “video data factory” able to generate, every day, visual experience data equivalent to a human lifetime.
Notably, when employees raised concerns about the legality and ethics of collecting data this way, management appeared confident and treated the matter as a decision already made at a high level. "We have a blanket approval of all data," one email read.
Even more troubling, Nvidia had known for some time that it was using HD-VG-130M, a dataset of 130 million YouTube videos that was originally created for academic research. Many experts have objected strongly, arguing that it is inappropriate to put data collected for research to commercial use.
As a core player in the AI industry, Nvidia holds a prominent position in the market, and its graphics processing units (GPUs) underpin many computationally intensive AI systems. Companies that work with Nvidia, such as OpenAI, Microsoft and Google, have expressed concern about the practice. A Google spokesperson said that using YouTube data without permission is a clear violation of the platform's terms of service.
In response to media inquiries, Nvidia said its AI training practices are “fully compliant with the spirit and letter of copyright law.” But what do the creators whose content was used think of that statement?
Highlights:
Nvidia secretly scraped large amounts of YouTube video data for AI training, raising legal and ethical concerns.
Internal emails show that Nvidia executives believed the practice was fully approved and took a bold stance on it.
Google said that using YouTube data without permission clearly violates the platform's terms of service, and Nvidia's response has stirred controversy.
Nvidia's data scraping has triggered widespread discussion about the ethics and legality of AI data acquisition, and the company's response has failed to quell the controversy. The incident highlights the challenges large technology companies face in how they use data, and the urgent need to improve the relevant laws and regulations. Similar incidents are likely to keep drawing attention and may push the industry to strengthen self-regulation and standardize its data practices.