Meta to begin using public Facebook, Instagram data to train AI in Europe

Company paused plans last year after a request from Data Protection Commissioner

Meta is to begin using EU user data to train its AI. Photograph: Jason Henry/The New York Times

Meta is to begin training its artificial intelligence (AI) models using the public posts of European Union (EU) users.

From tomorrow, users of Facebook, Threads and Instagram will begin seeing notifications in Meta’s apps about the plans, with details also provided on how to opt out.

The company said it would only use public data, and would not include private messages of users, nor the data of people under the age of 18. WhatsApp data will also be excluded.

“This is how we have been training our generative AI models for other regions since launch and we’re following the example set by others including Google and OpenAI, both of which have already used data from European users to train their AI models,” Meta said.

The move comes after the launch of Meta AI in Europe, with the company maintaining that using local data from European users is vital to making the service relevant.

“We believe we have a responsibility to build AI that’s not just available to Europeans, but is actually built for them,” Meta said. “That’s why it’s so important for our generative AI models to be trained on a variety of data so they can understand the incredible and diverse nuances and complexities that make up European communities.”

Meta previously paused plans to use content that people in the European Union have chosen to share publicly on Meta’s products and services to train its large language models that power AI features, following a request from the Data Protection Commissioner in Ireland. Advocacy group NOYB had complained about the plans, and called on privacy watchdogs across Europe to intervene.

“We welcome the opinion provided by the EDPB (European Data Protection Board) in December, which affirmed that our original approach met our legal obligations,” Meta said.

“Since then, we have engaged constructively with the IDPC and look forward to continuing to bring the full benefits of generative AI to people in Europe.”

The company said the notifications pushed out to users would also include a link to a form where people could object to their data being used, and that it had made the objection form easy to find, read, and use.

Meta said it would honour any objections it had already received to the plans, along with any new forms that are submitted.

The company came under fire earlier this year for allegedly using pirated versions of copyrighted books to train its artificial intelligence systems.

The company is being sued by a group of authors, including Ta-Nehisi Coates and comedian Sarah Silverman, who claimed in court filings that there was new evidence to show Meta used the AI training data set LibGen.

That data set allegedly includes millions of pirated works, with books by several prominent Irish authors said to be among them, including President Michael D Higgins, Anne Enright and Joseph O’Connor. – Additional reporting: Reuters

Ciara O'Brien

Ciara O'Brien is an Irish Times business and technology journalist