Releases

NVlabs launches cuda-oxide, compiling Rust directly to CUDA kernels

NVlabs’ cuda-oxide turns Rust into PTX, so GPU kernels, host code, and Cargo live in one workflow instead of a separate CUDA detour.

Nina Kowalski · 2 min read

NVlabs has put Rust squarely on CUDA turf with cuda-oxide, an experimental compiler that lowers idiomatic Rust straight to PTX and lets SIMT GPU kernels live in the same source file as the host code. The practical change is bigger than a new backend: it removes the usual jump to a separate DSL, a foreign-language binding layer, or a second mental model for the device side. Instead of treating Rust as the code around CUDA, cuda-oxide treats Rust as the language that emits CUDA itself.

The project is built around a custom rustc codegen backend called rustc-codegen-cuda. Its pipeline runs from Rust to Rust MIR, then to Pliron IR, LLVM IR, and finally PTX. That Pliron layer matters because it is described as an extensible compiler IR framework written in pure Rust and inspired by LLVM’s MLIR, which keeps the toolchain closer to Rust’s own ecosystem than a bolt-on GPU adapter would. NVlabs also packaged the workflow as cargo oxide, a proper Cargo subcommand meant to replace the familiar xtask pattern and work both inside and outside the repository.
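For readers unfamiliar with custom codegen backends: nightly rustc already exposes a `-Zcodegen-backend` flag that out-of-tree backends such as rustc_codegen_cranelift hook into, and a backend like rustc-codegen-cuda would presumably be wired in the same way. The path below is purely illustrative, not cuda-oxide's actual setup; the `cargo oxide` subcommand likely hides this plumbing entirely.

```toml
# .cargo/config.toml — hypothetical wiring for a custom codegen backend.
# `-Zcodegen-backend` is the real nightly-rustc mechanism; the .so path
# here is an assumption, not cuda-oxide's documented configuration.
[build]
rustflags = ["-Zcodegen-backend=/path/to/librustc_codegen_cuda.so"]
```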


cuda-oxide is not just a compiler skeleton. The workspace includes device-side abstractions for type-safe indexing, shared memory, scoped atomics, barriers, TMA, and warp and cluster operations. On the host side, it ships runtime pieces called cuda-core and cuda-async for memory management and kernel launching. The documentation already includes a hello GPU walkthrough, generic kernel examples, and atomic examples, which gives the project real shape even while it remains early. One atomic example also flags a dependency on LLVM 22 or newer for correct syncscope generation, a clear sign that backend details still matter here.
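To make "type-safe indexing" concrete, here is a minimal sketch of the SIMT indexing pattern such an abstraction wraps, emulated serially on the CPU so it runs anywhere. The names `ThreadCtx`, `global_id`, and `saxpy_kernel` are illustrative assumptions, not cuda-oxide's API.

```rust
// Hypothetical sketch: the blockIdx/blockDim/threadIdx indexing a SIMT
// kernel relies on, simulated with plain loops instead of a GPU launch.

/// Per-thread coordinates a kernel would receive on the device.
struct ThreadCtx {
    block_idx: usize,
    block_dim: usize,
    thread_idx: usize,
}

impl ThreadCtx {
    /// Global linear index: the classic blockIdx * blockDim + threadIdx.
    fn global_id(&self) -> usize {
        self.block_idx * self.block_dim + self.thread_idx
    }
}

/// "Kernel" body: y[i] += a * x[i]. The bounds check mirrors the guard a
/// real CUDA kernel needs when the grid overshoots the data length.
fn saxpy_kernel(ctx: &ThreadCtx, a: f32, x: &[f32], y: &mut [f32]) {
    let i = ctx.global_id();
    if i < y.len() {
        y[i] += a * x[i];
    }
}

fn main() {
    let x = vec![1.0f32; 10];
    let mut y = vec![2.0f32; 10];
    let (grid_dim, block_dim) = (3, 4); // 12 simulated threads cover 10 elements

    // Serial stand-in for the GPU launch loop.
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            let ctx = ThreadCtx { block_idx, block_dim, thread_idx };
            saxpy_kernel(&ctx, 3.0, &x, &mut y);
        }
    }
    assert!(y.iter().all(|&v| (v - 5.0).abs() < 1e-6)); // 2 + 3*1 = 5
}
```

The appeal of a device-side abstraction layer is exactly that this index arithmetic and bounds guard become typed, reusable pieces rather than a convention each kernel reimplements by hand.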

That early-stage reality is visible in the repository itself. The release reads like an inaugural drop rather than a finished compiler toolchain, and the public footprint is still small, with single-digit stars and a nascent release history. Even so, the direction is unmistakable. NVIDIA’s CUDA stack still centers on C++ tooling and a runtime library, so cuda-oxide is a notable alternative front end into the same GPU execution model. For Rust teams that want safety-minded kernel code without leaving Cargo or splitting host and device logic across different languages, cuda-oxide is the first serious proof that Rust can reach all the way down to the accelerator path.
