Hugging Face Candle Adds Native Rust Support for Google's Gemma 4
Hugging Face's Candle framework added native Rust inference for Google's Gemma 4, just two days after the Apache 2.0 model family dropped on April 2, 2026.

Hugging Face's Candle framework landed a native Rust implementation of Google's Gemma 4 in commit #3443, merging into the `huggingface/candle` repository around April 4, roughly 48 hours after Google DeepMind published Gemma 4 under an Apache 2.0 license. The turnaround signals something more than routine maintenance: the Candle maintainers are actively tracking the open-model release cadence and wiring up first-class Rust inference paths before the community has had time to file its first bug reports.
Gemma 4 is a multimodal family built on the same research foundation as Gemini 3. It ships in four configurations covering a wide span of hardware: the edge-optimized E2B and E4B variants, which activate roughly 2 and 4 billion effective parameters respectively during inference, plus a 26-billion-parameter mixture-of-experts variant and a 31-billion-parameter dense model. All four accept text, image, and audio inputs. That breadth makes native runtime support non-trivial, and the Candle commit builds it directly into `candle-examples` as a `gemma4` example rather than relying on Python bindings or a thin wrapper.
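The four configurations span an order of magnitude in active parameters. A minimal sketch of how a Rust caller might model that spread (the enum, method, and figures below are illustrative, drawn only from the sizes listed above; they are not Candle's actual configuration API):

```rust
// Illustrative only: models the four Gemma 4 variants described above.
// Names and numbers follow the article, not Candle's real config types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Gemma4Variant {
    E2B,      // edge-optimized, ~2B effective parameters
    E4B,      // edge-optimized, ~4B effective parameters
    MoE26B,   // 26B mixture-of-experts
    Dense31B, // 31B dense
}

impl Gemma4Variant {
    /// Approximate parameters active during inference, in billions.
    fn active_params_b(self) -> f32 {
        match self {
            Gemma4Variant::E2B => 2.0,
            Gemma4Variant::E4B => 4.0,
            // The MoE variant routes tokens through a subset of experts;
            // only the headline size is recorded here.
            Gemma4Variant::MoE26B => 26.0,
            Gemma4Variant::Dense31B => 31.0,
        }
    }
}

fn main() {
    let all = [
        Gemma4Variant::E2B,
        Gemma4Variant::E4B,
        Gemma4Variant::MoE26B,
        Gemma4Variant::Dense31B,
    ];
    for v in all {
        println!("{:?}: ~{}B active", v, v.active_params_b());
    }
}
```

A runtime that supports all four from one code path has to handle both the dense and mixture-of-experts execution styles, which is part of why native support is non-trivial.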
The addition arrived alongside broader activity across the codebase. The `candle-flash-attn` and `candle-kernels` subcrates received version bumps in the same early-April window, and a separate work-in-progress pull request, PR #3424, is pushing an initial ROCm backend forward. Candle already carries CUDA and Metal acceleration paths inside its multi-crate workspace (`candle-core`, `candle-nn`, `candle-transformers`, `candle-examples`), and ROCm support would extend that hardware coverage to AMD GPUs running the open compute stack.

For teams running inference without a Python runtime, particularly in edge deployments, serverless functions, or minimal container images, the Gemma 4 addition materially expands what Candle can serve. The `cargo run --example gemma4 --features metal` invocation pattern that users have already been testing with the `google/gemma-4-E2B-it` model ID illustrates the kind of zero-overhead, dependency-light deployment story that makes Rust inference backends attractive in the first place. Where a Python stack pulls in transformers, accelerate, and a half-dozen CUDA libraries, a compiled Candle binary carries only what it links.
Gemma 4 also introduces configurable visual token budgets (options at 70, 140, 280, 560, and 1,120 tokens) to let callers trade image resolution for compute, an architecture detail that Rust implementors need to surface correctly or risk silent quality degradation. That Candle's maintainers shipped the implementation within days of the model's public release suggests the project has the upstream access and contributor bandwidth to keep pace as Google DeepMind and others push new open-weight releases in 2026.
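Because those budgets form a discrete ladder, a runtime accepting an arbitrary requested budget has to snap it to a supported value. A hypothetical helper (the function name and rounding policy below are illustrative, not from the Candle commit) might round down to the largest supported budget so image preprocessing never silently exceeds the compute the caller asked for:

```rust
/// Supported visual token budgets for Gemma 4, per the release notes.
const VISUAL_TOKEN_BUDGETS: [u32; 5] = [70, 140, 280, 560, 1120];

/// Hypothetical helper: snap a requested budget to the largest supported
/// value that does not exceed it, falling back to the minimum (70) when
/// the request is below every supported option.
fn snap_visual_budget(requested: u32) -> u32 {
    VISUAL_TOKEN_BUDGETS
        .iter()
        .copied()
        .filter(|&b| b <= requested)
        .max()
        .unwrap_or(VISUAL_TOKEN_BUDGETS[0])
}

fn main() {
    // A caller asking for 300 visual tokens gets the 280-token budget.
    println!("{}", snap_visual_budget(300));
}
```

Rounding down is the conservative choice for a compute budget; an implementation that rounded up instead would be the "silent quality degradation" risk in reverse, quietly spending more compute than requested.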