Dev Tools

Google Releases Gemma 4 Under Apache 2.0 — and the License Change May Matter More Than the Benchmarks

Google DeepMind released Gemma 4 on April 2 under the Apache 2.0 open-source license, a first for the Gemma family. The 31B dense model ranks #3 globally across all models on the Arena AI leaderboard, outperforming many closed commercial systems, while the Apache licensing removes the usage restrictions that limited enterprise and commercial adoption of previous Gemma versions.

April 6, 2026 5 min read

On April 2, Google DeepMind dropped Gemma 4 — and buried in the release blog, almost as an afterthought, was the sentence that the developer community immediately seized upon: this release ships under the Apache 2.0 license.

It is hard to overstate how significant that single line is for anyone building production systems on top of open-weight AI models. Previous Gemma releases shipped under Google’s custom Gemma Terms of Use, a bespoke license that imposed acceptable-use restrictions and required attribution in ways that created friction for commercial applications. Apache 2.0 removes that friction entirely. You can use Gemma 4 commercially, modify it, redistribute it, and build closed products on top of it — all without royalties, without attribution requirements, and without navigating Google’s specific acceptable-use clauses.

The benchmark numbers are impressive on their own terms. But the license change is what makes Gemma 4 a different kind of artifact than its predecessors.

Four Models, One Release

Gemma 4 ships as a family of four distinct models, each targeting a different deployment context:

E2B (2.3 billion effective parameters): Designed for on-device and embedded applications, including mobile deployment. This is the model you reach for when you need inference on a smartphone or edge hardware.
E4B (4.5 billion effective parameters): A step up in capability while remaining practical for consumer hardware with modest GPU memory.
26B A4B MoE (26 billion total parameters, 3.8 billion active): A Mixture-of-Experts architecture that delivers strong performance at a fraction of the compute cost of a dense model of equivalent size. The 3.8B active parameters mean inference costs are closer to a small model than a large one.
31B Dense: The flagship. A fully dense 31-billion-parameter model built for maximum capability, designed to run on high-end workstations or single-server GPU setups.

The architecture choices reflect a sophisticated understanding of the deployment landscape. Not everyone needs the 31B. Many production use cases are better served by the E2B or MoE variant, and Google has done the work to make all four feel like first-class releases rather than scaled-down afterthoughts.

Benchmark Performance: #3 Globally, Not Just Among Open Models

The headline number is the 31B Dense model’s position on the Arena AI text leaderboard: #3 globally, across all models, including closed commercial systems from OpenAI, Anthropic, and Google’s own Gemini family.

That ranking is not a measurement of academic benchmarks; the Arena AI leaderboard is based on blind human preference votes — real people comparing outputs side by side without knowing which model produced which response. It is arguably the most human-aligned performance measurement the field has, and Gemma 4’s 31B sitting at #3 puts it ahead of systems that cost substantially more to run.

On structured benchmarks, the results are similarly strong. The 31B scores 89.2% on AIME 2026, a mathematics competition benchmark, compared to just 20.8% for Gemma 3’s 27B model — a 4x improvement in mathematical reasoning in a single generation. On MMLU Pro (broad knowledge), the 31B scores 85.2%. On LiveCodeBench v6 (competitive programming), it achieves a Codeforces ELO of 2,150, a figure that puts it comfortably in the competitive programmer tier.

The 26B MoE, notably, reaches #6 globally on the Arena leaderboard — outperforming models twenty times its total parameter count. The MoE architecture’s efficiency advantage compounds here: you get #6-level performance at the inference cost of a ~4B active-parameter model.

Why Apache 2.0 Changes the Calculus

To understand why the license matters, consider the practical reality of the previous Gemma Terms of Use. Developers building commercial products faced legal uncertainty about what constituted acceptable use. Enterprises with legal teams were required to review the custom license before deployment. Distillation — the process of training smaller models using outputs from a larger model — was restricted in ways that made some fine-tuning workflows legally ambiguous.

Apache 2.0 eliminates all of these concerns with a license that every developer, legal team, and enterprise procurement department already understands. It is the license of Linux, Kubernetes, TensorFlow, and hundreds of other foundational infrastructure components. Choosing it for Gemma 4 signals that Google is serious about Gemma becoming infrastructure, not just a showcase model.

The practical effects will compound over time. Hugging Face repositories, LangChain integrations, fine-tuning pipelines, and enterprise deployments that previously excluded Gemma on licensing grounds can now adopt it without restriction. Distillation from Gemma 4 into smaller custom models is now legally unambiguous. Companies that want to ship a product without disclosing that it uses a specific model can do so.

What This Means for the Open-Weights Landscape

For the past two years, the open-weights model race has been characterized by three competing dynamics: performance, safety restrictions, and licensing. Meta’s Llama family has dominated the open ecosystem largely because its licensing, while not perfectly permissive, was more commercial-friendly than alternatives. Mistral has competed aggressively on efficiency and European data sovereignty grounds. Google, with Gemma, had a strong performance story but a license that created friction.

Gemma 4 with Apache 2.0 removes that friction completely. Combined with performance that puts it at #3 globally, Google is now making a credible play for the center of the open-weights ecosystem — not just for researchers, but for the infrastructure builders, enterprise developers, and startup founders who need a capable model they can ship without legal uncertainty.

The timing is also notable. The Apache 2.0 switch comes as the open-weight model ecosystem is maturing rapidly, with Mixtral, DeepSeek, Llama, and Qwen all competing aggressively for developer mindshare. Google has evidently decided that restrictive licensing is a competitive liability it can no longer afford.

Developer Experience and Deployment

The Gemma 4 family is available on Google AI Studio, Vertex AI, Kaggle, and Hugging Face from day one. Google has published quantized versions of the E2B and E4B models suitable for CPU-only inference, expanding the accessible deployment surface to hardware that cannot run GPU workloads. The MoE and 31B dense models are available in BF16 and INT4/INT8 quantized formats for local deployment.

For most developers, the practical on-ramp is through Hugging Face’s transformers library, where Gemma 4 integrates without special dependencies. Fine-tuning documentation and LoRA adapter examples were published alongside the release.

The combination of top-tier benchmark performance, an unrestricted open-source license, and strong deployment tooling makes Gemma 4 a serious contender for the default open-weights model in production stacks. The benchmark will be revised and the rankings will shift, but Apache 2.0 is permanent — and that may be Gemma 4’s most durable advantage.

Sources

google gemma open source LLM Apache 2.0 benchmarks