Folddisco searches the protein universe for functional structural motifs
Folddisco turns 53 million predicted structures into a 1.45-terabyte motif index, letting researchers find functional 3D patterns in seconds.

The hard part in protein science is no longer just predicting structure. The new bottleneck is finding the small 3D motifs that actually explain what a protein does, and Folddisco was built to hunt those patterns at a scale that finally matches the size of modern protein databases.
The tool, from Hyunbin Kim, Rachel Seongeun Kim, Milot Mirdita and Martin Steinegger, searches for similar protein structural motifs, the short geometric arrangements that can be functionally crucial even when overall sequences have drifted far apart. The paper says Folddisco uses position-independent geometric features, including side-chain orientation, plus a rarity-based scoring system to make those searches practical across huge structure collections that were previously too expensive to scan.
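To make that idea concrete, here is a toy sketch in Python of how position-independent geometric features and rarity-based scoring could fit together. It is not Folddisco's code. The function names (`pair_features`, `build_index`, `search`), the bin widths, the use of the Cα-to-Cβ direction as a stand-in for side-chain orientation, and the logarithmic rarity weight are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical toy code: names, bin widths and the scoring formula are
# illustrative assumptions, not Folddisco's implementation.

def pair_features(residues, dist_bin=1.0, angle_bin=30.0):
    """Describe every residue pair by geometry only, never by sequence position.

    residues: list of (amino_acid, ca_xyz, cb_xyz) tuples with distinct CA and CB.
    """
    feats = set()
    for (aa1, ca1, cb1), (aa2, ca2, cb2) in combinations(residues, 2):
        dist = math.dist(ca1, ca2)
        # CA->CB vectors stand in for side-chain orientation.
        v1 = [b - a for a, b in zip(ca1, cb1)]
        v2 = [b - a for a, b in zip(ca2, cb2)]
        cos = sum(x * y for x, y in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
        # Sorting the residue types makes the feature independent of listing order.
        pair = tuple(sorted((aa1, aa2)))
        feats.add((pair, int(dist // dist_bin), int(angle // angle_bin)))
    return feats

def build_index(structures):
    """Inverted index: feature -> set of structure names containing it."""
    index = defaultdict(set)
    for name, residues in structures.items():
        for feat in pair_features(residues):
            index[feat].add(name)
    return index

def search(index, n_structures, query):
    """Score structures by shared features, weighting rare features more."""
    scores = defaultdict(float)
    for feat in pair_features(query):
        hits = index.get(feat, ())
        if not hits:
            continue
        # A feature present in every structure gets weight log(1) = 0.
        weight = math.log(n_structures / len(hits))
        for name in hits:
            scores[name] += weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

def shift(residues, dx, dy, dz):
    """Rigidly translate a list of residues."""
    move = lambda p: (p[0] + dx, p[1] + dy, p[2] + dz)
    return [(aa, move(ca), move(cb)) for aa, ca, cb in residues]

# A made-up three-residue motif (coordinates are invented).
motif = [
    ("H", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ("D", (4.2, 0.0, 0.0), (5.2, 1.0, 0.0)),
    ("S", (0.0, 5.5, 0.0), (1.0, 7.5, 0.0)),
]

structures = {
    # The motif plus one unrelated residue.
    "prot1": motif + [("K", (20.0, 20.0, 20.0), (21.0, 20.0, 20.0))],
    # The same motif, moved in space and listed in a different order.
    "prot2": list(reversed(shift(motif, 10.0, 20.0, 30.0)))
             + [("L", (50.0, 50.0, 50.0), (51.0, 50.0, 50.0))],
    # Shares only the H-D pair; the serine sits somewhere else.
    "prot3": motif[:2] + [("S", (0.0, 11.5, 0.0), (1.0, 13.5, 0.0))],
}

index = build_index(structures)
ranked = search(index, len(structures), motif)
for name, score in ranked:
    print(name, round(score, 3))
```

In this toy data set, `prot1` and `prot2` tie at the top even though the motif sits at different coordinates and in a different residue order, while `prot3` scores zero because the only feature it shares with the query occurs in every structure and so carries no weight. A real system would need far more robust features and tolerance handling than this sketch has.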

The scale is the eye-opener. The authors report that Folddisco indexes 53 million AFDB50 structures into a 1.45-terabyte database in 24 hours, then lets users query that database within seconds. In the study, queries ran up to 20-fold faster than with existing approaches, the index took about a quarter of the storage, and accuracy improved as well. That combination matters because speed alone is not enough if a search engine cannot keep up with the growing pile of predicted structures.
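A quick back-of-envelope calculation in Python puts those reported figures in per-structure terms. It assumes decimal terabytes (10^12 bytes) and treats the index size and build time as spread evenly across all structures, which the paper may not intend. The variable names are illustrative.

```python
# Figures as reported in the article; the even-spread assumption is ours.
n_structures = 53_000_000
index_bytes = 1.45e12            # 1.45 TB, assuming decimal terabytes
build_seconds = 24 * 60 * 60     # 24 hours

bytes_per_structure = index_bytes / n_structures
structures_per_second = n_structures / build_seconds

print(f"{bytes_per_structure / 1000:.1f} kB of index per structure")
print(f"{structures_per_second:.0f} structures indexed per second")
```

Under those assumptions the index works out to roughly 27 kB per structure, built at a little over 600 structures per second.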
That pile got very large very fast after AlphaFold and related prediction systems changed the field. The AlphaFold Protein Structure Database made predicted structures for nearly all catalogued proteins freely available in 2022, and Foldseek Cluster later analyzed about 200 million predicted structures, finding more than 2 million structural clusters, with roughly one third lacking prior annotations. In that environment, the question is no longer whether a structure exists. It is whether anyone can find the functional site, motif or interface hidden inside it before the signal gets buried in the noise.
Folddisco’s downstream value is easy to see. The paper highlights three use cases: functional annotation of divergent sequences, searching for protein-state-defining motifs and interface detection. Those are exactly the kinds of tasks that can speed drug discovery when a binding pocket or interface motif is conserved across distant proteins, help enzyme engineers spot catalytic features worth transplanting into a new scaffold, and give basic biologists a faster way to assign function to proteins that still look mysterious on paper.
The project is available through the Foldseek Server and is described as free and open-source software. Research Square shows the preprint circulated in 2025 before the Nature Biotechnology publication on June 5, 2026, which fits the shape of the result: a method that had to mature as the protein universe itself became searchable.
This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under.


