Analysis

ClozeMaster uses LLM infilling to generate valid Rust fuzzing programs

ClozeMaster turns LLMs into a better fuzzing partner for Rust compilers, and the payoff is real: 27 confirmed bugs across rustc and mrustc, 10 of them already fixed.

Nina Kowalski

Why this matters to Rust users

Rust’s promise depends on trust in the compiler. If rustc accepts bad programs, rejects good ones, or trips over internal compiler errors, that breaks the quiet contract that makes Rust attractive for critical systems in the first place. That is why a paper like ClozeMaster lands with real weight: it is not trying to have an AI write Rust for you; it is trying to break the compiler more effectively than today’s fuzzers often can.

That stakes out a very practical corner of the Rust reliability story. The Rust Foundation now has a Safety-Critical Rust Consortium focused on responsible use of Rust in software where failures could hurt people, property, or the environment, and that makes compiler quality more than an academic concern. For everyday Rust users, the difference between a compiler bug caught early and one that slips into a release can mean wasted time, broken builds, or worse, a false sense of safety in code that is supposed to be dependable.

What fuzzing is supposed to do

Rust’s own compiler development guide describes fuzzing as a way to compile many programs in an effort to uncover bugs in rustc, especially internal compiler errors, or ICEs. The appeal is straightforward: when fuzzing works well, it finds failure cases before users do, and those failures can often be reduced to small, self-contained reproducers that make debugging tractable.
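To make the goal concrete, here is a minimal sketch of how a fuzzing harness might sort compiler runs into "accepted", "rejected", and "crashed". It is an illustration, not code from the paper or from rustc's own tooling. The `Outcome` type and `classify` function are hypothetical names, and the crash markers are assumptions about what ICE output typically looks like; adjust them for the compiler build under test.

```rust
/// Outcome of compiling one fuzzer-generated program (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The compiler accepted the program.
    Accepted,
    /// The compiler rejected the program with ordinary diagnostics.
    Rejected,
    /// The compiler itself crashed: the case a fuzzer is hunting for.
    Crash,
}

/// Classify one compiler run from its exit code and captured stderr.
/// `exit_code` is `None` when the process was terminated by a signal.
pub fn classify(exit_code: Option<i32>, stderr: &str) -> Outcome {
    // Assumed crash markers, not an exhaustive or official list.
    const ICE_MARKERS: [&str; 2] = [
        "internal compiler error",
        "the compiler unexpectedly panicked",
    ];
    if ICE_MARKERS.iter().any(|marker| stderr.contains(marker)) {
        return Outcome::Crash;
    }
    match exit_code {
        Some(0) => Outcome::Accepted,
        Some(_) => Outcome::Rejected,
        // Killed by a signal (for example a segfault): count it as a crash.
        None => Outcome::Crash,
    }
}
```

In a real harness the two arguments would come from running the compiler with `std::process::Command` and reading `output.status.code()` and `output.stderr`. Keeping the classifier a pure function makes it easy to test without a compiler installed.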

That traditional model is still important, but it has a stubborn weakness. Rust syntax is strict, and a lot of machine-generated programs never make it far enough to exercise the compiler paths researchers actually care about. If the generated code is invalid too often, the fuzzer spends too much energy producing noise instead of pressure-testing the compiler.

Where LLM code generation falls short

The ClozeMaster paper is reacting to a problem the ICSE 2025 research track page states plainly: directly using LLMs to generate Rust programs often produces a large number of invalid test cases. That is a real mismatch between the promise of large language models and the needs of compiler testing. A model that can write fluent-looking text is not automatically good at preserving the exact syntax and structure Rust demands.

ClozeMaster’s key shift is that it does not ask the model to invent a whole program from scratch. Instead, it starts from real programs and masks structured snippets so the model fills in the missing pieces, which keeps more of the surrounding syntax intact and makes the resulting tests look closer to code the compiler would actually see in the wild.
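The mask-and-fill idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's implementation: it treats the first brace-delimited block as the "structured snippet", uses a made-up `<MASK>` placeholder, and ignores braces that appear inside string literals or comments. The function names are hypothetical, and the model call itself is left out; `fill` simply splices in whatever candidate text a model returns.

```rust
/// Placeholder the model is asked to fill in (a hypothetical token).
pub const MASK: &str = "<MASK>";

/// Find the first brace-delimited block and return the byte range of its
/// interior (exclusive of the braces), or `None` if no balanced block exists.
/// Simplification: braces inside strings or comments are not special-cased.
pub fn first_block_interior(src: &str) -> Option<(usize, usize)> {
    let open = src.find('{')?;
    let mut depth = 0usize;
    for (offset, ch) in src[open..].char_indices() {
        match ch {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    return Some((open + 1, open + offset));
                }
            }
            _ => {}
        }
    }
    None
}

/// Replace the interior of the first block with `MASK`.
/// Returns the masked program and the text that was removed.
pub fn mask_first_block(src: &str) -> Option<(String, String)> {
    let (start, end) = first_block_interior(src)?;
    let masked = format!("{}{}{}", &src[..start], MASK, &src[end..]);
    Some((masked, src[start..end].to_string()))
}

/// Splice a candidate infill (for example, model output) into the masked program.
pub fn fill(masked: &str, candidate: &str) -> String {
    masked.replacen(MASK, candidate, 1)
}
```

Because everything outside the masked span is copied verbatim from a real program, the braces, signature, and surrounding items stay well-formed no matter what the model proposes. Only the infilled region can introduce a syntax error.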

How ClozeMaster changes the game

The authors, Hongyan Gao, Yibiao Yang, Maolin Sun, Jiangchang Wu, Yuming Zhou, and Baowen Xu, built the system around historical issue reports. Rather than relying only on freshly synthesized programs, they extract test code from past bugs, mask selected regions, and use LLM infilling to reconstruct valid Rust programs. That matters because the model's job is no longer a free-form guess, but a constrained fill-in-the-blank task grounded in real failure history.
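As a rough illustration of the first step, the sketch below pulls Rust snippets out of an issue report. It assumes the report is Markdown and that reproducers sit in fenced code blocks tagged `rust`; that is a common convention on GitHub, not a description of the authors' actual extraction pipeline. The function name `extract_rust_snippets` is hypothetical.

```rust
/// Extract the bodies of fenced code blocks tagged `rust` from a
/// Markdown issue body. Unterminated blocks are dropped.
pub fn extract_rust_snippets(issue_body: &str) -> Vec<String> {
    // Build the fence marker (three backticks) programmatically.
    let fence = "`".repeat(3);
    let rust_fence = format!("{}rust", fence);

    let mut snippets = Vec::new();
    let mut in_block = false;
    let mut buf: Vec<&str> = Vec::new();

    for line in issue_body.lines() {
        let trimmed = line.trim();
        if !in_block {
            // An opening fence tagged `rust` starts a new snippet.
            if trimmed.starts_with(rust_fence.as_str()) {
                in_block = true;
                buf.clear();
            }
        } else if trimmed == fence {
            // A bare closing fence ends the snippet.
            snippets.push(buf.join("\n"));
            in_block = false;
        } else {
            buf.push(line);
        }
    }
    snippets
}
```

Each extracted snippet can then serve as a seed program: a piece of code that has already triggered a compiler bug once, and is therefore a plausible starting point for masking and infilling.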

That design is the real innovation here. ClozeMaster is not replacing fuzzing infrastructure; it is feeding it better seeds and more realistic inputs. In practice, that makes the LLM a helper for compiler testing, not a substitute for the fuzzer itself, which is exactly the kind of disciplined use the Rust ecosystem tends to reward.

The results are bigger than one paper

The published abstract reports that ClozeMaster identified 27 confirmed bugs in rustc and mrustc, and 10 of those have already been fixed by developers. It also says the system outperformed existing fuzzers in code coverage and effectiveness. That combination is what makes the work feel consequential: it is not just generating elegant examples; it is finding defects that matter enough for maintainers to patch.

Those numbers should get Rust users’ attention, because compiler bugs are not abstract. A bug in rustc can affect build reliability across crates, platforms, and release pipelines, and the whole point of fuzzing is to surface those failures before they become everyone’s problem. ClozeMaster’s results suggest that the next wave of compiler testing may come from a hybrid workflow, where LLMs improve the quality of fuzzing inputs instead of trying to generate standalone code.

Why this fits the Rust ecosystem now

ClozeMaster was presented at ICSE 2025, the 47th IEEE/ACM International Conference on Software Engineering, held in Ottawa, Canada, from April 27 to May 3, 2025, with core conference days from April 30 to May 2. That venue matters. ICSE is one of the field’s main stages for software engineering research, so this is not a side project or a novelty demo tucked into the margins of the ecosystem.

The paper also leans on a broader point that Rust maintainers already know well: compiler bugs are a recurring maintenance problem, not a rare embarrassment. A 2025 empirical study reviewed 301 valid rustc issues from 2022 to 2024, underscoring how active that surface remains. In that context, a tool that helps turn historical bugs into stronger new tests feels less like an experiment and more like a practical upgrade to Rust’s reliability toolbox.

What to watch next

The most interesting implication of ClozeMaster is not that LLMs can write Rust better than before, because that is not the story. The real development is that LLMs can help produce valid, structured, compiler-stressing Rust inputs when they are anchored to real programs and real bug history. That approach respects the strictness of the language instead of fighting it.

For Rust users, that is the kind of AI story worth caring about. It strengthens the same compiler guarantees they already depend on, and it does so by helping fuzzers do what they do best: find the bugs before the rest of the ecosystem has to.
