
Cloudflare open-sources ecdysis, Rust library for zero-downtime service restarts

Cloudflare open-sourced ecdysis after five years in production, giving Rust services a cleaner path to restart without dropping live traffic.

Nina Kowalski

Cloudflare has turned one of production infrastructure’s most awkward moments into a Rust library. Ecdysis, open-sourced in February 2026 after five years of internal use, gives Cloudflare’s services a way to restart without dropping live connections or resorting to manual socket handoffs. That sounds like a small improvement, but the stakes are high at Cloudflare’s scale.

The company said the library already powers zero-downtime upgrades across critical Rust infrastructure and saves millions of requests on every restart across its global network. That matters most in services that sit on the hot path for traffic routing, TLS lifecycle management, and firewall rule enforcement. In those systems, Cloudflare said, even a 100-millisecond gap can translate into hundreds of dropped connections when traffic is heavy.

That is the problem ecdysis is built to erase. The traditional restart pattern creates a dangerous handoff: stop the old process, bring up the new one, and hope the window in between stays invisible to users. Cloudflare’s approach flips the sequence. The new process gets time to initialize first, the old one keeps serving until the replacement is ready, and only then does the switchover happen. The name fits the idea well. Ecdysis is the biological process of shedding old skin.

On GitHub, Cloudflare describes ecdysis as a Rust library for graceful restarts, based on its Go library tableflip. The crate’s goals are blunt and production-minded: no old code keeps running after a successful upgrade, the new process gets a grace period to come up, crashing during initialization is acceptable, and only one upgrade is ever in progress at a time. That design reads less like a convenience feature and more like a reliability contract.
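Those four goals can be read as a small state machine. The sketch below is a toy model written for this article, not ecdysis’s API: `Supervisor`, `Outcome`, `begin_upgrade`, and `finish_upgrade` are hypothetical names, and the `became_ready` flag stands in for whatever readiness signal a real child process would send before its grace period runs out.

```rust
/// Result of one upgrade attempt in this toy model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    /// The new generation became ready and replaced the old one.
    Promoted,
    /// The new generation failed during startup; the old one keeps serving.
    KeptOld,
}

/// Hypothetical supervisor that tracks which generation serves traffic.
#[derive(Debug)]
pub struct Supervisor {
    serving: u32,
    pending: Option<u32>,
}

impl Supervisor {
    pub fn new() -> Self {
        Supervisor { serving: 1, pending: None }
    }

    /// Generation currently serving traffic.
    pub fn serving(&self) -> u32 {
        self.serving
    }

    /// Start an upgrade. Rejected while another one is in flight, which
    /// models "only one upgrade at a time".
    pub fn begin_upgrade(&mut self) -> Result<u32, &'static str> {
        if self.pending.is_some() {
            return Err("an upgrade is already in progress");
        }
        let next = self.serving + 1;
        self.pending = Some(next);
        Ok(next)
    }

    /// Finish the in-flight upgrade. `became_ready` is an assumption of this
    /// sketch: true if the new generation signaled readiness within its
    /// grace period, false if it crashed or timed out during initialization.
    pub fn finish_upgrade(&mut self, became_ready: bool) -> Result<Outcome, &'static str> {
        let next = self.pending.take().ok_or("no upgrade in progress")?;
        if became_ready {
            // Success: only the new generation serves from here on.
            self.serving = next;
            Ok(Outcome::Promoted)
        } else {
            // Failure during startup is harmless: nothing changes.
            Ok(Outcome::KeptOld)
        }
    }
}
```

In this model a failed upgrade leaves `serving()` unchanged, and a second `begin_upgrade` call returns an error until the first attempt has finished.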

The library also ships with practical integrations that fit the Rust server stack. Ecdysis supports Tokio, plus systemd-notify and systemd named sockets. The documentation on docs.rs notes that the systemd-notify integration requires systemd v253 or later and a service unit using Type=notify-reload. For teams running Rust services on Linux, that makes the restart story much easier to wire into real deployments.
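For readers unfamiliar with the notify protocol behind Type=notify-reload, here is a minimal standard-library sketch of the messages a service sends to systemd. It is not ecdysis’s implementation or API: `reloading_message`, `send_state`, and `notify_from_env` are hypothetical helpers, the caller is assumed to supply the CLOCK_MONOTONIC timestamp in microseconds, and abstract-namespace sockets (a NOTIFY_SOCKET value starting with `@`) are deliberately not handled.

```rust
use std::env;
use std::io;
use std::os::unix::net::UnixDatagram;
use std::path::Path;

/// State string a service sends once it is ready to serve.
pub const READY: &str = "READY=1";

/// State string announcing that a reload has started. With
/// Type=notify-reload, systemd expects the CLOCK_MONOTONIC time in
/// microseconds alongside RELOADING=1; the caller supplies it here.
pub fn reloading_message(monotonic_usec: u64) -> String {
    format!("RELOADING=1\nMONOTONIC_USEC={monotonic_usec}")
}

/// Send one state message as a datagram to a notify socket at `path`.
pub fn send_state(path: &Path, message: &str) -> io::Result<()> {
    let socket = UnixDatagram::unbound()?;
    socket.send_to(message.as_bytes(), path)?;
    Ok(())
}

/// Send `message` to the socket named by NOTIFY_SOCKET, if any.
/// Returns Ok(false) when the variable is unset (not running under
/// systemd) or names an abstract socket, which this sketch skips.
pub fn notify_from_env(message: &str) -> io::Result<bool> {
    let Some(raw) = env::var_os("NOTIFY_SOCKET") else {
        return Ok(false);
    };
    if raw.to_string_lossy().starts_with('@') {
        return Ok(false);
    }
    send_state(Path::new(&raw), message)?;
    Ok(true)
}
```

A service following this protocol would send the reloading message when a reload begins and `READY` once the replacement is serving.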

Cloudflare’s earlier work on graceful restarts in Oxy, its Rust-based proxy framework, followed the same pattern: start the new version, wait until it is ready, then let the old version stop accepting new connections and drain what is already in flight. Ecdysis packages that operating model as a reusable Rust library. That matters at a company whose 2025 Radar year-in-review said its network reached 330 cities in more than 125 countries and handled more than 81 million HTTP requests per second on average, peaking above 129 million.

For Rust infrastructure, that is the real story. Ecdysis is not just another crate. It is Cloudflare treating graceful restarts as a first-class systems problem and showing that Rust can own the boring, critical machinery that keeps production alive.
