AI & ML

China's GLM-5.2 Beats GPT-5.5 on Coding Benchmarks at One-Sixth the Cost

Z.ai's open-weight GLM-5.2, a 753-billion-parameter mixture-of-experts model released June 17, outperforms OpenAI's GPT-5.5 on SWE-bench Pro (62.1 vs 58.6) and long-horizon coding tasks, while charging just $4.40 per million output tokens—roughly one-sixth of GPT-5.5's price. With an MIT license and weights freely downloadable from Hugging Face, GLM-5.2 reshapes the calculus for developers locked out of premium closed-source models.

4d ago 5 min read

When Anthropic’s Fable 5 went offline on June 12 due to US export controls, it left a significant gap at the frontier of AI performance. Z.ai, the Beijing-based company formerly known as Zhipu AI, has just demonstrated how quickly that gap can be filled—and from an unexpected direction.

Released June 17, 2026, GLM-5.2 is the latest entry in Z.ai’s General Language Model series, and it arrives with credentials that would have seemed implausible even six months ago: on several key benchmarks, it outperforms OpenAI’s GPT-5.5 while costing developers approximately one-sixth as much per token. The weights are MIT-licensed and downloadable from Hugging Face, with no usage restrictions.

Architecture: Scale With Efficiency

GLM-5.2 is a 753-billion-parameter mixture-of-experts (MoE) model, but only 40 billion parameters are active during any single forward pass. This sparse activation architecture—similar in principle to Mistral’s Mixtral family and more recently Google’s Gemini Ultra—means GLM-5.2 achieves the raw capacity of a dense 753B model while running inference at the speed and cost of a much smaller one.

The model extends its predecessor GLM-5 with two architectural innovations Z.ai calls IndexShare and multi-token prediction. IndexShare optimizes how the model routes inputs to expert sub-networks; multi-token prediction allows the model to generate multiple tokens per inference step rather than one, significantly improving throughput during long-context tasks. The full context window stretches to one million tokens, making it competitive with the largest context offerings from OpenAI and Anthropic.

The Benchmark Case

The headline number is SWE-bench Pro, the gold standard for real-world software engineering tasks drawn from verified GitHub pull requests. GLM-5.2 scores 62.1—up from its predecessor GLM-5.1’s 58.4 and decisively ahead of GPT-5.5’s 58.6. For context, Claude Opus 4.8 sits at approximately 75.1 on the same metric, making GLM-5.2 the closest any open-weight model has come to the closed-source frontier.

On FrontierSWE Dominance—a benchmark designed to measure long-horizon engineering tasks that take hours rather than minutes—GLM-5.2 scored 74.4%, edging past GPT-5.5 at 72.6% and sitting just below Claude Opus 4.8’s 75.1%. The pattern repeats on PostTrainBench (34.3% for GLM-5.2 vs 25.0% for GPT-5.5) and SWE-Marathon (13.0% vs 12.0%)—both specifically testing multi-hour, real-world coding marathon performance.

Perhaps most striking is GLM-5.2’s first-place finish on Design Arena, the crowdsourced HTML and web design leaderboard. With an Elo score of 1,360, it places above Claude Fable 5, Gemini 3.5 Pro, and GPT-5.5. Design Arena measures outputs that human raters prefer rather than automated pass/fail criteria, suggesting GLM-5.2’s quality improvements extend beyond pure programming correctness to aesthetic and structural judgment in code generation.

On the Artificial Analysis Intelligence Index v4.1—a composite benchmark aggregating performance across reasoning, coding, and knowledge tasks—GLM-5.2 scores 51, placing it ahead of MiniMax-M3 (44), DeepSeek V4 Pro (44), and Kimi K2.6 (43) among open-weight models.

The Cost Equation

For many developers, the benchmark story is secondary to the pricing story. GLM-5.2’s API endpoint, served through Z.ai’s Anthropic-compatible API, charges $1.40 per million input tokens and $4.40 per million output tokens. GPT-5.5’s output tokens cost approximately $30 per million—roughly 6.8 times more expensive.

The practical implication is substantial. A coding agent running at 10 million output tokens per day would cost roughly $44 on GLM-5.2’s API versus $300 on GPT-5.5. For startups or individual developers whose workloads previously required premium models, GLM-5.2 changes the economics of AI-assisted development meaningfully.

The model also supports a dual inference regime: a High setting optimizes for speed at modest quality trade-offs, while Max applies full extended thinking for maximum accuracy on complex tasks. This mirrors the tiered inference approach adopted by OpenAI’s o-series and Anthropic’s extended thinking mode.

Open Weights, Open Questions

The MIT license on GLM-5.2’s weights is, in the context of 2026’s increasingly restricted AI landscape, notable. With Anthropic’s Fable 5 and Mythos 5 still offline due to US Commerce Department export controls, and OpenAI restricting access in several jurisdictions, GLM-5.2 represents one of the few frontier-adjacent models that organizations can download, self-host, and deploy without vendor dependency.

Self-hosting at full capability requires a minimum of eight H100 GPUs running at FP8 quantization, which costs $25–$35 per hour on major cloud platforms. That places fully sovereign deployment out of reach for small teams, but within budget for mid-sized enterprises and research institutions.

The data sovereignty question is more complex for teams using Z.ai’s hosted API. As a Chinese company operating under PRC jurisdiction, Z.ai’s data handling practices are subject to Chinese law, including potential government access provisions. Security-conscious organizations—particularly those in defense, healthcare, or finance—have flagged this as a material concern for API usage, even as the open weights enable on-premises alternatives.

Competitive Context: China’s Open-Source Offensive

GLM-5.2 arrives as part of a sustained pattern from Chinese AI labs that began in earnest with DeepSeek-R1 in early 2025. The strategy has been remarkably consistent: match or exceed US closed-source frontier performance with open-weight models, price aggressively via API, and publish weights under permissive licenses. Kimi K2.6, MiniMax-M3, and now GLM-5.2 have each taken turns occupying the “best open-weight” position on key benchmarks within weeks of each other.

The result is a competitive dynamic that US AI companies didn’t fully anticipate: open-source Chinese models are effectively setting a price ceiling on what closed-source US models can charge for comparable coding performance. When GLM-5.2 solves software engineering tasks at 1/6th the cost of GPT-5.5 and licenses the underlying weights for free, it makes premium pricing harder to justify for any developer task that doesn’t explicitly require frontier-only capabilities.

For the broader AI industry, the more significant question is what GLM-5.2 signals about the pace of Chinese AI development. Z.ai’s model reaches near-Opus-4.8 performance on several coding dimensions—a model Anthropic released only two months ago and positioned as its most capable reasoning system. The gap between US closed-source frontier and Chinese open-weight is narrowing faster than most competitive forecasts anticipated.

What Comes Next

Z.ai has indicated through public communications that Anthropic’s publicly available research on chain-of-thought and tool-use scaling informed some of GLM-5.2’s training approach. Whether deliberate or coincidental, the alignment between GLM-5.2’s capabilities and the techniques Anthropic publicly documented suggests that open publication of AI research—a longstanding norm in the field—may increasingly benefit competitors as much as collaborators.

For developers making model choices in the second half of 2026, GLM-5.2 has established a new baseline expectation: open-weight, MIT-licensed, frontier-competitive, and dramatically cheaper than the closed-source alternatives. The next question is whether GPT-5.6—rumored for a late-June launch—can re-establish a meaningful performance gap that justifies its price premium.

Sources

GLM-5.2 Zhipu AI Z.ai open-source AI China AI coding benchmarks GPT-5.5 open weights