According to 404 Media, Automattic plans to share Tumblr post data from 2014 to 2023 with third parties, which may include artificial intelligence companies such as Midjourney and OpenAI. The data reportedly covers public posts as well as some content that is not publicly visible. The move has sparked discussion about data privacy and users’ right to know, and it highlights the tension between AI companies’ demand for massive training datasets and platform companies’ data-sharing strategies. The "initial data dump" reportedly covers all public posts on Tumblr from that period, a volume that could significantly affect the training of AI models.
According to a report from 404 Media, Automattic plans to share data with third parties, including training data derived from user posts. The company compiled an "initial data dump" containing the content of all public posts on Tumblr between 2014 and 2023, including content that would not be publicly visible on the blogs. It is unclear how much of this data has been sent to Midjourney and OpenAI. The report suggests that Automattic is in talks with the artificial intelligence companies and that a deal is close to completion.
Automattic's move has raised concerns about data privacy and user consent. Balancing the development of artificial intelligence against the protection of user data is likely to become an important issue. The incident is also a reminder to be cautious about disclosing personal information on social media platforms.