Analysis

Google cuts Gemma 4 memory use with quantization update

Google’s quantization-aware training (QAT) update cut Gemma 4 E2B’s memory use to about 1 GB, making on-device AI cheaper and faster to ship inside products like monday.com.

Marcus Chen·6/6/2026·2 min read

Published 02:06 AM



Google just made the AI deployment problem harder to ignore: its new Gemma 4 checkpoints use quantization-aware training (QAT) to shrink memory use and improve performance on smaller devices. The mobile-focused format cuts Gemma 4 E2B’s memory footprint to about 1 GB, a change that makes it practical to run more of the model’s work on phones, Raspberry Pi boards, Jetson Nano systems, and consumer GPUs instead of larger cloud stacks.
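To see why bit width dominates the footprint, a rough back-of-the-envelope sketch helps. It assumes, purely for illustration, a model with 2 billion stored weights and ignores activations, caches, and runtime overhead. Neither the parameter count nor the bit widths below come from Google's announcement; they are hypothetical inputs.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical parameter count, chosen only to make the arithmetic concrete.
ASSUMED_PARAMS = 2e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(ASSUMED_PARAMS, bits):.1f} GB")
```

Under those assumptions, halving the bits per weight halves the storage needed, which is the lever quantization pulls.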

That matters because Google has been positioning Gemma 4 as an open model family for on-device AI since its April 2 launch under the Apache 2.0 license. The company said the models support more than 140 languages and that the smaller E2B and E4B versions can run completely offline with near-zero latency. In the June 5 update, Google said the new QAT checkpoints are meant to preserve quality better than standard post-training quantization, which often trims memory at the cost of accuracy. Google also said it had already added Multi-Token Prediction to accelerate inference before this release.
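The accuracy cost of quantization comes from rounding weights onto a coarse grid. The sketch below shows a generic symmetric, uniform "fake quantization" step of the kind QAT schemes typically apply during training so that the model learns weights that tolerate the rounding. It is an illustrative assumption, not Google's published method, and the function names and the 4-bit setting are hypothetical.

```python
def fake_quantize(weights, bits=4):
    """Round weights to a signed integer grid, then map them back to floats."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit symmetric quantization
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = []
    for w in weights:
        q = round(w / scale)            # snap to the nearest grid point
        q = max(-qmax, min(qmax, q))    # clamp to the representable range
        quantized.append(q * scale)     # dequantize back to a float
    return quantized


def mean_abs_error(original, approximated):
    """Average absolute rounding error introduced by quantization."""
    return sum(abs(a - b) for a, b in zip(original, approximated)) / len(original)


weights = [0.7, -0.35, 0.1, 0.0]  # toy values, not real model weights
approx = fake_quantize(weights, bits=4)
print(approx, mean_abs_error(weights, approx))
```

Post-training quantization applies this rounding once to an already trained model, whereas QAT exposes the model to it while training, which is why it tends to preserve quality better.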

For monday.com, the lesson lands squarely in product and engineering. AI features are no longer judged only by whether they work in a demo. They now have to clear the more practical test of whether they can run fast enough, cheaply enough, and reliably enough to live inside a workflow product used every day by sales teams, customer support reps, and operations managers. A model that needs heavy cloud inference can raise costs and add latency. A model that can run locally can improve responsiveness, reduce infrastructure pressure, and make AI feel like part of the product instead of an add-on.

That tradeoff sits at the center of monday.com’s own AI push. On May 6, the company said it was becoming an AI Work Platform with native agents that can draft campaigns, qualify leads, close support tickets, onboard new hires, and process purchase requests under human supervision. monday.com said it had 250,000 customers running businesses on the platform, while its first-quarter 2026 revenue reached $351.3 million, up 24% year over year. Paid customers with more than 10 users rose to 65,016, and net dollar retention held at 110%.

monday.com has already moved from experimenting with AI to packaging it into the core product. In 2025, it introduced monday magic, monday vibe, and monday sidekick, then said the tools were fully available at Elevate 2025. Its first agents focused on sales development tasks such as lead identification, enrichment, qualification, and calling leads while they were still warm. Google’s latest memory reduction makes the competitive direction clearer: the next wave of AI products will be won by teams that can turn model efficiency into a better user experience and a healthier cost structure at the same time.

This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under. Read our full AI policy.
