Tumblr's parent company, Automattic, is planning to sell user post data to OpenAI and Midjourney for AI model training, a move that raises concerns about user privacy and data security. It follows earlier deals that Reddit and Shutterstock struck with AI companies, which likewise put platform content to commercial use. Automattic has promised a setting that lets users opt out of data sharing, but the scope of the data collection and the way the data will be processed remain unclear, particularly the handling of non-public posts that were collected by mistake.
Tumblr's parent company, Automattic, is in talks with OpenAI and Midjourney to sell user posts for training AI models. Automattic is preparing to release a setting that will allow users to opt out of data sharing with third parties. The company has collected all public posts on Tumblr from 2014 to 2023. The collection also mistakenly included some non-public posts, and it is not clear how that data will be handled or which data will actually be used for training. Previously, Reddit signed an agreement with Google, renewed on a yearly basis, that allows its user data to be used to train Google's AI models, and Shutterstock signed an agreement with OpenAI that allows its photo library to be used for model training.
The move once again highlights that training large AI models depends on massive amounts of data, and it renews concerns about data privacy and ethics. To maintain user trust and keep the platform sustainable, Automattic needs to explain its data handling transparently and give users a genuine choice. Similar data-sharing practices are likely to face stricter regulation and wider public scrutiny in the future.