Atlantic exposes 21 million tracks in AI training datasets
A searchable database now lets musicians check 21 million tracks in AI training datasets, turning secrecy into a public test of consent and copyright.

A new searchable database has pulled four music datasets into public view, giving artists, labels and publishers a way to check whether their work appears in collections assembled for training AI systems. Together, the four datasets contain more than 21 million tracks: two massive collections of roughly 12 million and 9 million tracks, plus two others with more than 100,000 songs each.
The database was built from reporting by Alex Reisner and is part of The Atlantic’s AI Watchdog project, which has already exposed datasets built from books, YouTube videos, movie and TV dialogue, and paywalled articles. The music records include songs by Taylor Swift, Bad Bunny, Nirvana, Billie Eilish, Pearl Jam, the Beatles and Willie Nelson, along with jazz and classical recordings. One follow-up report described the largest collection as equal to about 91 years of music.

The point is not just scale. The Atlantic says a work’s presence in a dataset is not definitive proof that it was used to train a model, and a missing work is not proof that it was excluded. Still, the searchable index changes the terms of the debate. For musicians and rights holders, AI training has often been a black box. The new database turns a broad complaint about hidden scraping into a track-by-track inquiry that can be tied to specific artists and songs.
That matters because the legal fight over AI music is already underway. Major record labels sued Suno and Udio in June 2024, accusing the companies of copyright infringement tied to both training and output. In 2026, both companies sought to keep the size of their training datasets out of the public record in separate cases, underscoring how aggressively developers have tried to limit disclosure even as they defend the legality of their systems.
The Atlantic’s database does not resolve those disputes on its own, but it raises the pressure on the industry to account for what went into its models. If musicians can now search for evidence of their own catalogues in datasets used by AI developers, the secrecy that has long shielded training practices becomes harder to defend. The broader question is whether public exposure will force clearer disclosure, firmer licensing norms and a more enforceable standard of consent in the AI economy.
This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under.


