
floDl 0.5.3 adds bit-exact HuggingFace export, universal trainer support

floDl 0.5.3 closed the HuggingFace loop: 30 model-head combinations now round-trip bit-exactly, with one Trainer spanning CPU and multi-GPU runs.

Sam Ortega

floDl 0.5.3 made the jump Rust ML projects usually promise and rarely finish: it exported HuggingFace checkpoints back into the Python world with bit-exact parity. In practical terms, that means a Rust-native workflow no longer stops at loading or running a model. It can now emit a checkpoint, stage it in HuggingFace’s expected layout, and prove that nothing changed on the way out.

The April 28 release covered 30 model-and-head combinations across six families (BERT, RoBERTa, DistilBERT, ALBERT, XLM-RoBERTa, and DeBERTa-v2/v3), each exercised as the base model plus four task heads. Twelve of those cells were already covered in 0.5.2, which shipped six days earlier and introduced the first HuggingFace integration through the sibling crate flodl-hf. The other 18 landed in 0.5.3, and every supported cell passed with max abs diff = 0. That is the number that matters here. In ML infrastructure, “close enough” is how bugs get shipped; bit-exact is where developers start trusting the toolchain.

The new export path does not just dump files and hope for the best. floDl’s export command re-emits flodl-hf checkpoints as an HF-canonical staged directory containing model.safetensors, config.json, and tokenizer.json. The verify-export command then loads that staged directory back into Hugging Face Python AutoModelFor* classes and checks bit-exact agreement, along with zero missing_keys and unexpected_keys. A heavier verify-matrix gate runs export and verify-export across the full 6-family by 5-head-shape matrix, turning reproducibility into a testable feature instead of a claim in a README.
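The gate verify-export enforces can be sketched in plain Python. This is an illustrative stand-in, not floDl's code: two state dicts (plain dicts of NumPy arrays standing in for the reference checkpoint and the staged safetensors file) are checked for key parity and bit-exact agreement, the same three conditions the article describes.

```python
import numpy as np

def verify_bit_exact(reference: dict, exported: dict) -> dict:
    """Compare two checkpoints the way a bit-exact gate would:
    no missing keys, no unexpected keys, and max abs diff == 0
    for every shared tensor."""
    missing_keys = sorted(set(reference) - set(exported))
    unexpected_keys = sorted(set(exported) - set(reference))
    max_abs_diff = 0.0
    for name in set(reference) & set(exported):
        diff = np.abs(reference[name].astype(np.float64)
                      - exported[name].astype(np.float64)).max()
        max_abs_diff = max(max_abs_diff, float(diff))
    return {
        "missing_keys": missing_keys,
        "unexpected_keys": unexpected_keys,
        "max_abs_diff": max_abs_diff,
        "bit_exact": (not missing_keys and not unexpected_keys
                      and max_abs_diff == 0.0),
    }

# Round-trip a toy "checkpoint" through raw bytes, as an export does,
# and confirm nothing changed on the way out.
rng = np.random.default_rng(0)
reference = {"encoder.weight": rng.standard_normal((4, 4)).astype(np.float32)}
exported = {k: np.frombuffer(v.tobytes(), dtype=v.dtype).reshape(v.shape)
            for k, v in reference.items()}
report = verify_bit_exact(reference, exported)
assert report["bit_exact"] and report["max_abs_diff"] == 0.0
```

The point of the `max_abs_diff == 0.0` comparison, rather than a tolerance, is exactly the distinction the release draws: a tolerance hides the class of serialization bugs a bit-exact gate catches.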


That is the real line this release crossed. v0.5.2 showed that Rust could load HuggingFace models with pinned checkpoints and parity under 1e-5 across BERT, RoBERTa, and DistilBERT. v0.5.3 closed the loop. For anyone building local AI tooling in Rust, that is the difference between a neat experiment and something that can sit inside an actual training and exchange pipeline without becoming an island.

The release also added a universal Trainer that fine-tunes through the same code path on CPU, a single GPU, or heterogeneous multi-GPU setups. The GitHub release notes also list LoopBody and TraceEmit for multi-output per-iteration traces, which makes the package feel less like a thin wrapper around libtorch and more like a real workflow layer. That matches the broader pitch on floDl’s site: a Rust deep learning framework on libtorch with heterogeneous multi-GPU DDP and a benchmark claim of up to 31% faster than PyTorch. Its benchmark page said ten models run over ten interleaved rounds on an RTX 5060 Ti produced eight wins, two ties, and zero regressions against PyTorch 2.10.0+cu128. The CLI story is equally pointed: fdl is a pure Rust binary with zero native dependencies, and fdl install check compares an installed copy against the latest GitHub release.
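A wins/ties/regressions scoreboard implies a paired comparison with some tie threshold. Here is a minimal sketch under that assumption; the per-model millisecond timings and the 2% tie band are invented for illustration and are not floDl's published numbers or methodology.

```python
# Sketch of a wins/ties/regressions tally over paired benchmark timings.
# Timings and the 2% tie band are illustrative assumptions.
def tally(pairs, tie_band=0.02):
    """pairs: (ours_ms, baseline_ms) per model; lower is better."""
    wins = ties = regressions = 0
    for ours, baseline in pairs:
        if abs(ours - baseline) <= tie_band * baseline:
            ties += 1          # within the band: call it a tie
        elif ours < baseline:
            wins += 1          # meaningfully faster
        else:
            regressions += 1   # meaningfully slower
    return wins, ties, regressions

timings = [(7.0, 10.0), (8.0, 11.0), (6.5, 9.0), (12.0, 15.0),
           (5.0, 7.3), (9.1, 11.8), (14.0, 17.5), (3.2, 4.5),
           (10.0, 10.1), (20.0, 19.8)]
print(tally(timings))  # eight wins, two ties, zero regressions on this data
```

The interleaved-rounds detail in the claim matters for a tally like this: alternating frameworks round by round spreads thermal and clock-state noise across both sides instead of letting it bias whichever ran second.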
