Every few months an open-weights model from China resets the conversation. DeepSeek did it. Qwen did it. In mid-June 2026, Zhipu AI (the company behind the international brand z.ai) did it again with GLM-5.2: a Mixture-of-Experts model, released under the permissive MIT license, with a genuine one-million-token context, that the independent benchmarker Artificial Analysis crowned the best open-weights model in the world, fourth overall behind only the closed frontier, at roughly one-sixth the price of GPT-5.5.
For anyone building products with AI, that combination (frontier-adjacent quality, open weights you can self-host, and a price that undercuts the US labs by a wide margin) is the most important development of the quarter. But the headline hides real caveats: self-reported benchmarks, a verbose model that is cheap per token yet expensive per task, and a vendor on the US Entity List whose hosted API routes your data through China. We pulled the primary sources (the Hugging Face model card, Artificial Analysis, vLLM, z.ai's own docs, the US Federal Register) to separate what is verified from what is marketing, and to answer the only question that matters for a business: when should you actually use this?
What GLM-5.2 is, in one table
GLM-5.2 is a sparse Mixture-of-Experts (MoE) model. Most of its parameters sit idle on any given token, which is how a model this large stays affordable to run. Here are the verified specs.
| Spec | GLM-5.2 (verified) |
|---|---|
| Architecture | Sparse Mixture-of-Experts, DeepSeek-style sparse attention |
| Parameters | ~744 to 753B total, ~40B active per token |
| Context window | 1,048,576 tokens (a real 1M, 5x GLM-5.1's 200K) |
| Max output | 128K tokens |
| Modality | Text only (no vision) |
| License | MIT (commercial use, modify, redistribute, self-host) |
| Weights | BF16 (~1.51 TB) and native FP8 (~744 GB) on Hugging Face (zai-org) |
| Features | Reasoning modes, tool calling, JSON output, prompt caching, streaming, MCP |
| Released | Mid-June 2026 (Artificial Analysis lists June 16) |
The keywords that matter for search and for strategy are all here: an open-weights LLM, a Mixture-of-Experts design, a usable 1M-token context, and a model engineered for agentic coding. The next three sections put numbers on each.
The benchmarks: number one open model, number four overall
The most credible signal is independent, not from z.ai. Artificial Analysis, which runs its own evaluation suite, places GLM-5.2 at 51 on its Intelligence Index v4.1, the highest of any open-weights model (it tests 92 of them, where the class average is around 24). It sits fourth overall, behind three closed models. That is the "Chinese open-source AI is catching the frontier" story, told with a third party's numbers.
Artificial Analysis Intelligence Index v4.1 (higher is better)
Source: Artificial Analysis Intelligence Index v4.1, June 2026 (independent). GLM-5.2 is first among open-weights, fourth overall.
On individual tests, watch the difference between what z.ai reports and what third parties measure. The company's model card cites strong coding and reasoning numbers; Artificial Analysis confirms big jumps over GLM-5.1 but with slightly lower absolute figures. We label each below.
| Benchmark | Score | Source |
|---|---|---|
| SWE-bench Pro (agentic coding) | 62.1 (up from GLM-5.1's 58.4) | z.ai (company-reported) |
| Terminal-Bench 2.1 | 81.0 claimed vs 78 measured (Opus 4.8: 85) | z.ai claim vs Artificial Analysis |
| GPQA Diamond (science reasoning) | 91.2 claimed, ~89 measured | z.ai vs Artificial Analysis |
| Humanity's Last Exam | 40.5 (54.7 with tools) | z.ai (company-reported) |
| FrontierSWE | "trailing Opus 4.8 by 1%" | z.ai (marketing claim) |
The honest read: GLM-5.2 is genuinely frontier-adjacent on coding and reasoning, the independent ranking proves it, but the splashiest single numbers ("trailing Opus by 1%", Terminal-Bench 81) are z.ai's own and run a touch hot versus neutral measurement. For a buying decision, trust the Artificial Analysis aggregate (number one open model) and treat the rest as directional.
The real story is price, with one catch
This is where GLM-5.2 reorders the market. The official z.ai API charges $1.40 per million input tokens and $4.40 per million output tokens, with cached input at just $0.26 (an 81% cache discount). VentureBeat measured the blended cost at roughly one-sixth of GPT-5.5. Third-party routers go lower still (OpenRouter lists $1.20 / $4.10). For an open, near-frontier model, that is a structural price cut, not a promotion.
The catch is token consumption. GLM-5.2 is a heavy reasoner: on Artificial Analysis's suite it burns around 43,000 output tokens per task (about 37,000 of them reasoning), so the cost per completed task lands higher than several rivals despite the low per-token price. Cheap per token does not automatically mean cheap per job.
Cost per task on the Artificial Analysis suite (lower is better)
Source: Artificial Analysis, June 2026. GLM-5.2 is the smartest open model but also the most token-hungry, so budget for output, not just the per-token rate.
| GLM-5.2 official pricing (z.ai) | Per 1M tokens |
|---|---|
| Input | $1.40 |
| Cached input | $0.26 (81% off, storage free for now) |
| Output | $4.40 |
| Blended vs GPT-5.5 | roughly one-sixth the cost (VentureBeat) |
Open weights mean sovereignty, not just savings
The pricing matters, but the license matters more. GLM-5.2 ships under a standard, unmodified MIT license with no acceptable-use addendum and no regional limits on the weights. You can download the full BF16 or FP8 checkpoints from Hugging Face, run them on your own hardware, fine-tune them, and ship them commercially. For a business, that is the difference between renting intelligence and owning your stack.
Self-hosting is real but not trivial. The FP8 checkpoint fits on a single node of 8x H200 or 8x H20 GPUs; serving the full 1M-token context needs 8x B200. It runs on vLLM, SGLang and Transformers, and AMD has shipped an MXFP4 build for its Instinct MI350/MI355 accelerators. In practice, most teams will start on the API and reserve self-hosting for the cases where it pays off: strict data sovereignty, predictable high-volume costs, or fine-tuning on proprietary data. The point is that the option exists, which is something no amount of GPT-5.5 or Claude budget can buy you.
The catch: governance, trust, and the Entity List
Here is what the launch posts will not lead with. Zhipu AI was added to the US Entity List on January 16, 2025 (Federal Register rule 2025-00704), the first Chinese LLM company on it, with the stated rationale that it helps "advance the People's Republic of China's military modernization." That does not stop you from downloading MIT-licensed weights, but it is a real signal for any organization weighing vendor risk.
More concretely for day-to-day use: the convenient hosted z.ai API runs through a China-based company subject to China's data laws. For a European or French business handling client or personal data, that is a governance question you must answer before piping sensitive prompts to it. The clean resolution is exactly the one the MIT license enables: self-host the weights inside your own infrastructure, and the data never leaves. Use the cheap API for non-sensitive workloads, self-host for the rest. Add the verbose cost-per-task profile and the gap between self-reported and independently-measured benchmarks, and you have the full, honest picture.
The GLM lineage, in dates
GLM-5.2 did not appear from nowhere. It is the latest step of a fast, public cadence that has steadily closed the gap with the US labs.
- GLM-4.5 to GLM-4.6 Zhipu establishes itself as a serious open-weights contender.
- GLM-5 The first to trade real blows with the frontier on coding.
- GLM-5.1 744B/40B MoE, 200K context, the workhorse predecessor.
- GLM-5.2 (mid-June 2026) Same size as GLM-5.1, but quintuples context to 1M, posts the largest one-version benchmark jump of the line, and takes the number-one open-weights ranking.
Our take: when to actually use GLM-5.2
What follows is our analysis.
The hype is mostly earned, and the right response for a business is neither to dismiss it nor to migrate everything overnight. It is to match the model to the job. From how we build with AI for clients, here is the practical grid.
- Use it for agentic coding and high-volume automation. As a Claude Code alternative or the engine behind internal agents, GLM-5.2's price and openness are hard to beat. Wire it behind an abstraction so you can switch models in a config change, and budget for its token appetite.
- Self-host it when sovereignty or scale demands it. Sensitive data, regulated sectors, or predictable heavy volume are the cases where owning the MIT weights on your own GPUs beats any rented API.
- Keep it off your most sensitive data on the hosted API. Until you self-host, do not route confidential or personal data through the China-based endpoint. This is a governance line, not a quality one.
- Do not single-vendor anything. The lesson of the past month, from tools getting acquired to models being suspended, is that the model under your product should be a swappable component. GLM-5.2 is a superb addition to a multi-model stack, not a reason to bet the company on one provider.
This is exactly how we architect AI features for clients: the model as an interchangeable part behind your own interfaces, chosen per task on price, performance and governance, on infrastructure you control (see our work). If you want help deciding where GLM-5.2, Claude or GPT actually fit in your product, and how to keep your data and your options open, tell us about your project (or contact us) and we will come back within 48 hours. For more on the fast-moving AI stack, see our pieces on SpaceX buying Cursor and the government suspension of Fable 5.
Key numbers (as of June 2026)
This is a fast-moving space; every figure is date-stamped to mid-June 2026 and will shift as rivals respond.
- 51 Artificial Analysis Intelligence Index, number one open-weights model, number four overall.
- 1,048,576 tokens of context, with 128K max output.
- ~744 to 753B total parameters, ~40B active (Mixture-of-Experts).
- $1.40 / $4.40 per million input/output tokens, about one-sixth of GPT-5.5.
- MIT license, fully self-hostable on 8x H200 (FP8).
- January 16, 2025 the date Zhipu was added to the US Entity List.



