🤯 Gemma 4: Freedom & Power Unleashed! 🚀



Summary

Google has released Gemma 4, the first major update to its open-weight AI models in over a year. The new model comes in four sizes, optimized for local use and designed to improve reasoning and math capabilities. Developers can now use the 26B Mixture of Experts and 31B Dense variants, which are intended to run on powerful Nvidia H100 GPUs, while the Effective 2B (E2B) and Effective 4B (E4B) models are geared toward mobile devices. Google has also dropped the restrictive custom Gemma license in favor of the more permissive Apache 2.0 standard. This change reflects a focus on developer control and expands the potential of the “Gemmaverse,” with native function calling and support for structured JSON output. The E2B and E4B models further underline Google’s commitment to smartphone AI integration, building on the existing Gemini Nano foundation.

INSIGHTS


Gemma 4: A Significant Update to Google’s Open Models
Google’s Gemini AI models have improved by leaps and bounds over the past year, but you can only use Gemini on Google’s terms. The company’s Gemma open-weight models have provided more freedom, but Gemma 3, which launched over a year ago, is getting a bit long in the tooth. Starting today, developers can work with Gemma 4, which comes in four sizes optimized for local usage. Google has also acknowledged developer frustrations with AI licensing, so it’s dropping the custom Gemma license.

Optimized Model Sizes for Diverse Hardware
Like past versions of its open-weight models, Google has designed Gemma 4 to be usable on local machines. That can mean plenty of things, of course. The two large Gemma variants, 26B Mixture of Experts and 31B Dense, are designed to run unquantized in bfloat16 format on a single 80GB Nvidia H100 GPU. Granted, that’s a $20,000 AI accelerator, but it’s still local hardware. Quantized to lower precision, these big models will fit on consumer GPUs. Google also claims it has focused on reducing latency to take full advantage of Gemma’s local processing. The 26B Mixture of Experts model activates only 3.8 billion of its 26 billion parameters during inference, giving it much higher tokens-per-second throughput than similarly sized dense models. Meanwhile, 31B Dense prioritizes quality over speed, and Google expects developers to fine-tune it for specific uses.
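As a rough illustration of the quantization point, here is a minimal sketch of loading a large variant in 4-bit precision with Hugging Face Transformers and bitsandbytes. The model ID below is a placeholder, not a confirmed repository name, and the actual checkpoint names may differ at release.

```python
# Minimal sketch: loading a large Gemma variant in 4-bit on a consumer GPU.
# The model ID "google/gemma-4-26b" is a placeholder, not a confirmed name.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-4-26b"  # hypothetical ID for illustration

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # ~4 bits per weight instead of 16 (bfloat16)
    bnb_4bit_compute_dtype="bfloat16",  # matmuls still run in bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPU/CPU memory
)

prompt = "Explain mixture-of-experts routing in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

At 4 bits per weight, a 26-billion-parameter model needs roughly 13GB for weights alone, which is why a 24GB consumer GPU becomes plausible once you quantize.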

Efficient Models for Mobile and Embedded Devices
The other two Gemma 4 models, Effective 2B (E2B) and Effective 4B (E4B), are aimed at mobile devices. These options were designed to maintain low memory usage during inference, running at an effective 2 billion or 4 billion parameters. Google says the Pixel team worked closely with Qualcomm and MediaTek to optimize these models for devices like smartphones, Raspberry Pi, and Jetson Nano. Not only do they use less memory and battery than Gemma 3, but Google also touts “near-zero latency” this time around.
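To make the effective-parameter claim concrete, here is a quick back-of-the-envelope calculation of weight memory at different precisions. The numbers are approximations covering weights only, not the KV cache or runtime overhead.

```python
# Back-of-the-envelope weight-memory estimate for the edge models.
# Weights only; ignores the KV cache, activations, and runtime overhead.
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for name, params in [("E2B", 2.0), ("E4B", 4.0)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB of weights")
```

At 4-bit, E4B works out to roughly 2GB of weights, which is why running it on recent smartphones and boards like the Jetson Nano is plausible.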

Enhanced Capabilities and Workflow Support
Google claims these are the most capable models you can run on your local hardware. Google says Gemma 31B will debut at number three on the Arena list of top open AI models, behind GLM-5 and Kimi 2.5. However, even the biggest Gemma 4 variant is a fraction of the size of those models, making it theoretically much cheaper to run. Based on the same underlying technology as Google’s Gemini 3 closed models, Gemma 4 offers improved reasoning, math, and instruction-following. AI has also shifted toward agentic workflow management in the past year, and Gemma 4 is ready for that change with support for native function calling, structured JSON output, and native instructions for common tools and APIs.
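As a sketch of what native function calling looks like in practice, the snippet below assumes Gemma 4 is served behind an OpenAI-compatible local endpoint (vLLM, llama.cpp, and Ollama all expose one). The endpoint URL, model name, and get_weather tool are placeholders for illustration, not confirmed details.

```python
# Sketch: native function calling against a locally served Gemma 4 model.
# Assumes an OpenAI-compatible local server; the URL, model name, and the
# get_weather tool below are all hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gemma-4-26b",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Lyon?"}],
    tools=tools,
)

# When the model decides a tool is needed, it returns structured JSON
# arguments instead of prose, which your code can parse and execute.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```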

Code Generation and Visual Processing Improvements
Code generation is also emerging as a core application of generative AI, and Google says Gemma 4 is optimized for that, too. You can generate competent code with any number of AI systems, but strong performers like Gemini Pro and Claude Code are cloud services. Google says that Gemma 4 can give you similarly high-quality code in an offline environment, provided you have the hardware to run the larger variants. Likewise, Google says Gemma 4 is better at processing visual input, making tasks like OCR and chart understanding more reliable on local systems.
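For the visual side, a local OCR-style call might look like the sketch below, using the image-text-to-text pipeline from recent Transformers releases. The model ID and image URL are placeholders, and the exact call signature may vary by library version.

```python
# Sketch: local OCR-style extraction with a multimodal Gemma 4 variant.
# The model ID and image URL are placeholders for illustration.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="google/gemma-4-31b")  # hypothetical ID

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/scanned_receipt.png"},
        {"type": "text", "text": "Transcribe all text in this image."},
    ],
}]

out = pipe(text=messages, max_new_tokens=256, return_full_text=False)
print(out[0]["generated_text"])
```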

Expanded Context Windows and Language Support
The efficient E2B and E4B models also have native support for speech recognition. The Gemma 3 family offered that as well, but Google implies Gemma 4 handles it more accurately. This all works in more than 140 languages, and whichever one you use, Gemma 4 can handle a whole lot of words. The context window for the edge models is now 128k tokens, while the 26B and 31B models get 256k. That’s good for a local model, though the cloud-based Gemini models are much more generous, with 1 million tokens of context.
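Context limits are easy to overrun with local models, so a quick pre-flight token count helps. The sketch below uses a standard Transformers tokenizer; the model ID is a placeholder, and the limits reflect the figures reported above.

```python
# Pre-flight token count against the reported context limits.
# The model ID is a placeholder; limits reflect the figures above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-26b")  # hypothetical ID

CONTEXT_LIMIT = 256_000       # 26B and 31B models; the edge models get 128_000
RESERVED_FOR_OUTPUT = 4_096   # leave headroom for the model's reply

def fits_in_context(text: str) -> bool:
    n_tokens = len(tokenizer.encode(text))
    return n_tokens + RESERVED_FOR_OUTPUT <= CONTEXT_LIMIT

with open("long_report.txt") as f:
    print("fits:", fits_in_context(f.read()))
```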

Addressing Licensing and Developer Flexibility
All the alleged performance gains are nice, but the licensing shake-up may be the most important change for Gemma. Previous versions of Google’s open models came with a custom Google license, which many developers found too restrictive. The Gemma 3 license had a strict prohibited-use policy that Google could update unilaterally, and it required developers to enforce Google’s rules across all Gemma-based projects. It could even be read as extending those license terms to other AI models trained on synthetic data produced by Gemma. This made many devs apprehensive about building with Google’s open models. Apache 2.0, by comparison, is much more permissive, with no overbearing terms of use or commercial restrictions. Developers are familiar and comfortable with Apache, and Google can’t simply decide the license works differently one day in the future. Google believes that giving developers more control over their data and deployment plans will encourage them to use Gemma for more projects and expand what the company insists on calling the “Gemmaverse.”

Focus on Mobile AI and Agentic Workflows
The release of E2B and E4B also shows where Google is heading with its smartphone AI efforts. Google Pixels and a few other phones run local AI models known as Gemini Nano. That’s how these Android phones can detect phone and text scams, summarize notes, or create phone call summaries without sending your data to the cloud. A Google representative notes that Gemini Nano has always been derived from Gemma models, and that is especially true of the next-gen Gemini Nano 4. This is the first time Google has confirmed that there will be an updated version of its minimal smartphone-based AI model. The current Gemini Nano 3 running on Pixel phones is based on Gemma 3n, but Google confirmed to Ars Technica that the next-gen Nano 4 will have 2B and 4B variants based on Gemma 4 E2B and E4B. The company invites developers to begin prototyping agentic workflows with Gemma E2B and E4B in the latest AI Core Developer Preview. Systems designed with these new models will be forward-compatible with Gemini Nano 4 when it launches. We may hear more about that at I/O in a few weeks.

Immediate Access and Future Development
You can check out the new Gemma models immediately in AI Studio.

This article is AI-synthesized from public sources and may not reflect original reporting.