
Tokio Introduces Dial9 Flight Recorder to Debug Async Runtime Latency

Russell Cohen's dial9 caught 10ms+ kernel scheduling delays on a live AWS service that no staging environment could replicate.

Sam Ortega
Source: techcrunch.com

A production mystery that no staging environment could reproduce gave Russell Cohen the reason to build dial9, a new runtime telemetry tool for Tokio that landed on crates.io this month. Carl, the Tokio blog's editor, was direct about his reaction when Cohen first showed it to him: "When Russell showed me dial9, I knew the Tokio community needed to see it. I asked him to write this post and invited him to demo it at TokioConf."

The problem dial9 was built to solve is one many Rust async developers will recognize: a service connecting concurrently to thousands of hosts hit a severe performance cliff once CPU utilization crossed 90 percent, even though meaningful headroom remained. Aggregate metrics gave no useful signal. dial9 traced the cause to kernel scheduling delays of more than 10ms on AWS, which are invisible to anything that only tracks p99 poll duration or task counts.

That distinction is the core design philosophy. As the Tokio blog post puts it, dial9 goes "beyond aggregate metrics like 'how many tasks are running?' or 'what is my p99 poll duration?'" by capturing "the underlying runtime events like individual polls, parks, and wakes as a log rather than as a pool of counters." It also pulls in Linux kernel events and your own application spans and logs, so you get the full chain: what your code did, what Tokio did with it, and what the operating system did to Tokio.
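To make the log-versus-counters distinction concrete, here is a minimal sketch using only the Rust standard library. The `PollEvent` record, its fields, and the synthetic data are all hypothetical and are not dial9's trace format; the wake-to-poll gap is used here as a simplified stand-in for the scheduling delay described above. The point is only that a p99 over poll durations can look healthy while a single event in the log shows a task waiting 12ms before it ran.

```rust
// Hypothetical event record for illustration; NOT dial9's actual trace format.
#[derive(Debug, Clone, Copy)]
struct PollEvent {
    task_id: u32,
    woken_at_us: u64,   // when the task became runnable
    poll_start_us: u64, // when a worker actually began polling it
    poll_end_us: u64,   // when the poll returned
}

impl PollEvent {
    fn poll_duration_us(&self) -> u64 {
        self.poll_end_us - self.poll_start_us
    }

    fn wake_to_poll_delay_us(&self) -> u64 {
        self.poll_start_us - self.woken_at_us
    }
}

/// Nearest-rank percentile over a non-empty slice (p in 1..=100).
fn percentile(samples: &[u64], p: usize) -> u64 {
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let rank = (p * sorted.len() + 99) / 100; // ceil(p * n / 100)
    sorted[rank.max(1) - 1]
}

/// Synthetic log: 200 polls of 50us each; task 137 waits 12ms before it is polled.
fn synthetic_log() -> Vec<PollEvent> {
    (0..200u64)
        .map(|i| {
            let woken = i * 1_000;
            let delay = if i == 137 { 12_000 } else { 20 };
            PollEvent {
                task_id: i as u32,
                woken_at_us: woken,
                poll_start_us: woken + delay,
                poll_end_us: woken + delay + 50,
            }
        })
        .collect()
}

/// Scan the log for polls that waited longer than `threshold_us` to start.
fn delayed_polls(log: &[PollEvent], threshold_us: u64) -> Vec<PollEvent> {
    log.iter()
        .copied()
        .filter(|e| e.wake_to_poll_delay_us() > threshold_us)
        .collect()
}

fn main() {
    let log = synthetic_log();

    // The aggregate view: every poll took 50us, so p99 looks fine.
    let durations: Vec<u64> = log.iter().map(PollEvent::poll_duration_us).collect();
    println!("p99 poll duration: {} us", percentile(&durations, 99));

    // The log view: the one delayed poll is visible as an individual event.
    for e in delayed_polls(&log, 10_000) {
        println!(
            "task {} waited {} us before being polled",
            e.task_id,
            e.wake_to_poll_delay_us()
        );
    }
}
```

In this synthetic data the p99 poll duration stays at 50us, while the scan over individual events surfaces task 137 and its 12,000us wait.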

Integration requires wrapping your existing Tokio runtime with TracedRuntime. The blog post shows exactly how little ceremony that involves:

use dial9_tokio_telemetry::telemetry::{RotatingWriter, TracedRuntime};


The RotatingWriter takes a file path, a rotation threshold (20 MiB in the example), and a maximum retention size (100 MiB); TracedRuntime::build_and_start then takes the standard Builder and the writer. Your existing runtime.block_on call stays unchanged. Traces land in /tmp/my_traces/ as binary .bin files, and the trace viewer accepts them by drag-and-drop. For production deployments where local disk is impractical, dial9 also supports writing traces directly to S3.
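For a rough sense of what a 20 MiB rotation threshold and a 100 MiB retention cap mean on disk, the sketch below models a generic size-rotated directory using only the standard library. `TraceDir` and its eviction rule (drop the oldest file once the total exceeds the cap) are assumptions made for illustration; this does not call dial9 and may not match how RotatingWriter actually rotates or evicts files.

```rust
use std::collections::VecDeque;

const MIB: u64 = 1024 * 1024;

/// Hypothetical model of a size-rotated trace directory.
/// Illustrative only; this is not dial9's implementation.
struct TraceDir {
    rotate_at_bytes: u64,    // start a new file once the active one would exceed this
    max_total_bytes: u64,    // evict oldest files once the directory exceeds this
    segments: VecDeque<u64>, // file sizes in bytes, oldest first; the last is active
}

impl TraceDir {
    fn new(rotate_at_bytes: u64, max_total_bytes: u64) -> Self {
        TraceDir {
            rotate_at_bytes,
            max_total_bytes,
            segments: VecDeque::from([0]),
        }
    }

    fn total_bytes(&self) -> u64 {
        self.segments.iter().sum()
    }

    /// Record a write of `len` bytes, rotating and evicting as needed.
    fn write(&mut self, len: u64) {
        let active = *self.segments.back().expect("always one active segment");
        // Rotate when this write would push the active file past the threshold.
        if active > 0 && active + len > self.rotate_at_bytes {
            self.segments.push_back(0);
        }
        *self.segments.back_mut().expect("always one active segment") += len;

        // Assumed eviction rule: drop oldest files until back under the cap,
        // but never drop the active file.
        while self.total_bytes() > self.max_total_bytes && self.segments.len() > 1 {
            self.segments.pop_front();
        }
    }
}

fn main() {
    let mut dir = TraceDir::new(20 * MIB, 100 * MIB);
    // Simulate 1000 MiB of trace data arriving in 1 MiB chunks.
    for _ in 0..1000 {
        dir.write(MIB);
    }
    println!(
        "{} files, {} MiB retained",
        dir.segments.len(),
        dir.total_bytes() / MIB
    );
}
```

Under these assumptions, the directory settles at five files and never holds more than 100 MiB, regardless of how much trace data has been written overall.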

Cohen will be demoing dial9 at TokioConf, where the project also has a lightning talk slot. The crate is published now at dial9_tokio_telemetry on crates.io, with source and API documentation available through the linked GitHub and docs.rs pages. A prebuilt demo trace is provided so you can load the viewer and get a feel for the event timeline before instrumenting anything yourself.

The fact that Carl specifically arranged the TokioConf slot and wrote the editorial introduction signals this is not just a community crate getting a routine blog mention. For anyone who has burned hours adding metrics dashboards to an async service only to watch a production regression stay completely unexplained, dial9 is worth pulling into your next deployment.
