
Publishers sue Meta, allege Zuckerberg approved piracy for AI training

Publishers say Meta fed Llama on millions of copyrighted works, with Zuckerberg accused of personally approving the piracy.

Lisa Park · 2 min read

Five major publishers and novelist-lawyer Scott Turow have accused Meta of building its Llama AI models on millions of copyrighted books, journal articles and other texts taken without permission. The lawsuit says the company did not just scrape ordinary web pages, but drew from pirate libraries and torrent networks while removing copyright-management information from the works.

The putative class action, filed in federal court in the Southern District of New York, names Mark Zuckerberg as a defendant and alleges he “personally authorized and actively encouraged” the infringement. The publisher plaintiffs are Elsevier, Cengage Learning, Hachette Book Group, Macmillan Publishing Group and McGraw Hill. Together with Turow, they say Meta used material from LibGen, Anna’s Archive and Sci-Hub to train its generative AI systems, turning years of editorial investment into raw material for machine learning.

At the center of the case is a legal fight that could reshape how AI companies buy, borrow or copy the material used to train large language models. The publishers say Meta’s conduct was deliberate and that Llama functions as an “infinite substitution machine,” capable of producing books and other competing text at industrial scale. If a court accepts that theory, it could strengthen claims from authors and publishers that AI training is not a harmless technical process, but a commercial use that undercuts licensing markets and the labor behind them.


Hachette’s public statement says the plaintiffs seek to represent a class of similarly situated copyright owners, a move that could widen the case far beyond the five named companies and one author. For publishers, the outcome could affect whether they can demand payment when their catalogs are used to train AI models. For writers, it could determine whether their books are treated as protected creative work or as input data that can be taken first and contested later.

The lawsuit lands after a 2025 ruling in Meta’s earlier authors’ case, where Judge Vince Chhabria said Meta’s use of books for training qualified as fair use in that instance, while also warning that the decision did not establish that Meta’s conduct was lawful in general. Anthropic’s 2025 agreement to pay authors $1.5 billion to settle a major AI copyright case has already raised the stakes across the industry, showing that the price of training data can be measured in litigation as well as code.


Meta has said it plans to fight the new case aggressively and has argued that courts have found AI training on copyrighted material can qualify as fair use. The new suit is poised to test that defense at a much larger scale, with consequences that could reach publishers, authors and every company building generative AI on vast text corpora.
