Biohub launches $500M virtual biology push for AI cell modeling

Biohub is backing a five-year, $500 million drive to build the data backbone for virtual cells, betting biology needs better training sets more than bigger models.

By Jamie Taylor
Source: biohub.org

Chan Zuckerberg Biohub has committed $500 million to what it is calling a Virtual Biology Initiative, a five-year push aimed at building the data foundation for AI models that can predict how proteins behave inside living cells. The size of the bet underscores what has become the field’s central argument: the bottleneck is not only model design but also the lack of standardized, high-volume training data needed to make virtual-cell claims reliable.

Biohub said $100 million will help nucleate a coordinated worldwide data-generation effort that no single institution could assemble alone. The remaining $400 million will go toward large-scale data generation and new technologies for measuring, imaging and engineering biology. The organization said the resulting data will be open and freely available to the global scientific community, a move meant to make the effort less an internal platform play than a shared research backbone.

The list of collaborators shows how broad Biohub wants the network to be. The Allen Institute, Arc Institute, Broad Institute and Wellcome Sanger Institute are named partners, alongside the Human Cell Atlas and Human Protein Atlas consortia. NVIDIA is listed as a technology partner, while Renaissance Philanthropy is helping catalyze additional funding. Biohub’s science lead, Alex Rives, has argued that the field needs orders of magnitude more data than exists today and new technologies that can observe cells from the molecular to tissue level in health and disease.

That focus on data quality is what makes the initiative feel like a race to build the ImageNet for cell biology. Biohub and its partners have already said the field has been slowed by fragmented, non-reproducible benchmarks and one-off evaluation pipelines, problems that leave AI cell models hard to compare and harder to trust. In that sense, the real prize is not just another model architecture, but a common benchmark and interoperable dataset standard that can support credible virtual-cell science.

The new push builds on a January 12, 2026, partnership among Biohub, Arc Institute and Tahoe Therapeutics to generate more than 120 million single-cell data points across 225,000 perturbation interactions for virtual cell models. Biohub said that dataset would be more than four times as perturbation-rich as Tahoe-100M, a telling measure of where the field thinks value now lies: not merely in observing cells, but in systematically perturbing them to see how they respond.

Biohub first framed its frontier AI and frontier biology strategy on November 6, 2025, when Mark Zuckerberg said the group’s goal was to help scientists cure or prevent all diseases this century and that AI advances might make that possible sooner. The new initiative turns that vision into an expensive, highly coordinated data effort, and it will now be judged on whether it can produce the shared biological infrastructure the field has lacked for years.
