GLM-5.2: the open-weights LLM that just became the world's best open model, at one-sixth the cost of GPT-5.5

In mid-June 2026, Zhipu AI shipped GLM-5.2: an MIT-licensed, open-weights model with a real 1M-token context that independent benchmarker Artificial Analysis ranks the number-one open model on Earth, fourth overall, at roughly one-sixth the price of GPT-5.5. We break down the benchmarks, the pricing, the self-hosting reality and the governance catch, and what it actually means for businesses building with AI.

By Robin Monteiro · June 20, 2026 · 8 min · 1,686 words
Tags: GLM-5.2 · Open-weights LLM · Zhipu AI · Agentic coding · AI sovereignty

Every few months an open-weights model from China resets the conversation. DeepSeek did it. Qwen did it. In mid-June 2026, Zhipu AI (the company behind the international brand z.ai) did it again with GLM-5.2: a Mixture-of-Experts model, released under the permissive MIT license, with a genuine one-million-token context, that the independent benchmarker Artificial Analysis crowned the best open-weights model in the world, fourth overall behind only the closed frontier, at roughly one-sixth the price of GPT-5.5.

For anyone building products with AI, that combination (frontier-adjacent quality, open weights you can self-host, and a price that undercuts the US labs by a wide margin) is the most important development of the quarter. But the headline hides real caveats: self-reported benchmarks, a verbose model that is cheap per token yet expensive per task, and a vendor on the US Entity List whose hosted API routes your data through China. We pulled the primary sources (the Hugging Face model card, Artificial Analysis, vLLM, z.ai's own docs, the US Federal Register) to separate what is verified from what is marketing, and to answer the only question that matters for a business: when should you actually use this?

What GLM-5.2 is, in one table

GLM-5.2 is a sparse Mixture-of-Experts (MoE) model. Most of its parameters sit idle on any given token, which is how a model this large stays affordable to run. Here are the verified specs.

Spec | GLM-5.2 (verified)
Architecture | Sparse Mixture-of-Experts, DeepSeek-style sparse attention
Parameters | ~744 to 753B total, ~40B active per token
Context window | 1,048,576 tokens (a real 1M, 5x GLM-5.1's 200K)
Max output | 128K tokens
Modality | Text only (no vision)
License | MIT (commercial use, modify, redistribute, self-host)
Weights | BF16 (~1.51 TB) and native FP8 (~744 GB) on Hugging Face (zai-org)
Features | Reasoning modes, tool calling, JSON output, prompt caching, streaming, MCP
Released | Mid-June 2026 (Artificial Analysis lists June 16)
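
As a quick sanity check on the weights row, checkpoint size is roughly parameter count times bytes per parameter. The sketch below assumes 2 bytes per parameter for BF16 and 1 byte for FP8, and ignores any tensors kept at a different precision, so the results are approximations rather than published figures. The function name is ours, for illustration only.

```python
# Rough checkpoint size: parameters x bytes per parameter.
# Assumption: BF16 stores 2 bytes per parameter, FP8 stores 1 byte.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def checkpoint_size_gb(total_params_billions: float, dtype: str) -> float:
    """Approximate checkpoint size in GB (1 GB = 1e9 bytes)."""
    return total_params_billions * BYTES_PER_PARAM[dtype]


# Parameter range quoted in the spec table: ~744B to ~753B total.
for params in (744, 753):
    bf16 = checkpoint_size_gb(params, "bf16")
    fp8 = checkpoint_size_gb(params, "fp8")
    print(f"{params}B params: BF16 ~{bf16 / 1000:.2f} TB, FP8 ~{fp8:.0f} GB")
```

Under these assumptions the range works out to roughly 1.49 to 1.51 TB in BF16 and 744 to 753 GB in FP8, which is consistent with the sizes listed in the table.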

The keywords that matter for search and for strategy are all here: an open-weights LLM, a Mixture-of-Experts design, a usable 1M-token context, and a model engineered for agentic coding. The next three sections put numbers on each.

The benchmarks: number one open model, number four overall

The most credible signal is independent, not from z.ai. Artificial Analysis, which runs its own evaluation suite, scores GLM-5.2 at 51 on its Intelligence Index v4.1, the highest of the 92 open-weights models it tests (the class average is around 24). That puts it fourth overall, behind three closed models. That is the "Chinese open-source AI is catching the frontier" story, told with a third party's numbers.

Artificial Analysis Intelligence Index v4.1 (higher is better)

Claude Fable 5 (closed): 60
Claude Opus 4.8 (closed): 56
GPT-5.5 xhigh (closed): 55
GLM-5.2 (open, MIT): 51
MiniMax-M3 (open): 44
DeepSeek V4 Pro (open): 44
Kimi K2.6 (open): 43

Source: Artificial Analysis Intelligence Index v4.1, June 2026 (independent). GLM-5.2 is first among open-weights, fourth overall.

On individual tests, watch the difference between what z.ai reports and what third parties measure. The company's model card cites strong coding and reasoning numbers; Artificial Analysis confirms big jumps over GLM-5.1 but with slightly lower absolute figures. We label each below.

Benchmark | Score | Source
SWE-bench Pro (agentic coding) | 62.1 (up from GLM-5.1's 58.4) | z.ai (company-reported)
Terminal-Bench 2.1 | 81.0 claimed vs 78 measured (Opus 4.8: 85) | z.ai claim vs Artificial Analysis
GPQA Diamond (science reasoning) | 91.2 claimed, ~89 measured | z.ai vs Artificial Analysis
Humanity's Last Exam | 40.5 (54.7 with tools) | z.ai (company-reported)
FrontierSWE | "trailing Opus 4.8 by 1%" | z.ai (marketing claim)

The honest read: GLM-5.2 is genuinely frontier-adjacent on coding and reasoning, and the independent ranking supports that. But the splashiest single numbers ("trailing Opus by 1%", Terminal-Bench 81) are z.ai's own and run a touch hot versus neutral measurement. For a buying decision, trust the Artificial Analysis aggregate (number one open model) and treat the rest as directional.
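
To put a number on "a touch hot", the sketch below compares the two benchmarks in the table that have both a claimed and an independently measured score. The GPQA Diamond measurement is quoted only as "~89", so that row is approximate; the function name is ours.

```python
# Claimed (z.ai) vs measured (Artificial Analysis) scores from the table above.
# GPQA Diamond's measured value is quoted as "~89", so it is approximate.
scores = {
    "Terminal-Bench 2.1": (81.0, 78.0),
    "GPQA Diamond": (91.2, 89.0),
}


def claim_gap(claimed: float, measured: float) -> tuple[float, float]:
    """Return the gap in points and as a percentage of the measured score."""
    points = claimed - measured
    return points, 100 * points / measured


for name, (claimed, measured) in scores.items():
    points, pct = claim_gap(claimed, measured)
    print(f"{name}: +{points:.1f} points ({pct:.1f}% above measured)")
```

On these two rows the company figures sit about 2 to 3 points above the third-party ones, which is a modest gap but one that always points in the same direction.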

The real story is price, with one catch

This is where GLM-5.2 reorders the market. The official z.ai API charges $1.40 per million input tokens and $4.40 per million output tokens, with cached input at just $0.26 (an 81% cache discount). VentureBeat measured the blended cost at roughly one-sixth of GPT-5.5. Third-party routers go lower still (OpenRouter lists $1.20 / $4.10). For an open, near-frontier model, that is a structural price cut, not a promotion.

The catch is token consumption. GLM-5.2 is a heavy reasoner: on Artificial Analysis's suite it burns around 43,000 output tokens per task (about 37,000 of them reasoning), so the cost per completed task lands higher than several rivals despite the low per-token price. Cheap per token does not automatically mean cheap per job.
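
The arithmetic behind that warning is simple: cost per task is tokens consumed times the per-token rate. The sketch below uses the z.ai list prices and the roughly 43,000 output tokens per task cited above. Input tokens per task are not given in the sources, so `input_tokens` is left as a parameter to fill in from your own measurements, and caching is ignored.

```python
# z.ai list prices quoted above, in USD per 1M tokens.
INPUT_PRICE = 1.40
OUTPUT_PRICE = 4.40


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at the list prices, with no caching."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


# Output side only, using the ~43,000 output tokens per task cited above.
output_only = task_cost(input_tokens=0, output_tokens=43_000)
print(f"output tokens alone: ${output_only:.3f} per task")

# Share of that output bill spent on reasoning (~37,000 of ~43,000 tokens).
print(f"reasoning share of output: {37_000 / 43_000:.0%}")
```

Output tokens alone come to roughly $0.19 per task at list price, and about 86% of those tokens are reasoning, which is why a low per-token rate can still produce a high per-task bill.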

Cost per task on the Artificial Analysis suite (lower is better)

GLM-5.2: $0.46
Kimi K2.6: $0.31
GLM-5.1: $0.25
MiniMax-M3: $0.18
DeepSeek V4 Pro: $0.05

Source: Artificial Analysis, June 2026. GLM-5.2 is the smartest open model but also the most token-hungry, so budget for output, not just the per-token rate.

GLM-5.2 official pricing (z.ai) | Per 1M tokens
Input | $1.40
Cached input | $0.26 (81% off, storage free for now)
Output | $4.40
Blended vs GPT-5.5 | roughly one-sixth the cost (VentureBeat)
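
The cached-input price matters most for agents that resend long, repeated prompts. The sketch below checks the quoted discount and estimates an average input price for a given cache hit rate. The hit rates in the loop are hypothetical placeholders, not measured values, and the function names are ours.

```python
INPUT_PRICE = 1.40   # USD per 1M uncached input tokens (z.ai list price)
CACHED_PRICE = 0.26  # USD per 1M cached input tokens (z.ai list price)


def cache_discount() -> float:
    """Fractional discount of cached input versus uncached input."""
    return 1 - CACHED_PRICE / INPUT_PRICE


def effective_input_price(cache_hit_rate: float) -> float:
    """Average USD per 1M input tokens for a given cache hit rate (0 to 1)."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * CACHED_PRICE + (1 - cache_hit_rate) * INPUT_PRICE


print(f"cache discount: {cache_discount():.0%}")
for hit_rate in (0.0, 0.5, 0.9):  # hypothetical hit rates, not measured values
    price = effective_input_price(hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${price:.2f} per 1M input tokens")
```

Caching only lowers the input side of the bill. For a model that spends most of its budget on output tokens, it does not change the cost-per-task picture much on its own.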

Open weights mean sovereignty, not just savings

The pricing matters, but the license matters more. GLM-5.2 ships under a standard, unmodified MIT license with no acceptable-use addendum and no regional limits on the weights. You can download the full BF16 or FP8 checkpoints from Hugging Face, run them on your own hardware, fine-tune them, and ship them commercially. For a business, that is the difference between renting intelligence and owning your stack.

Self-hosting is real but not trivial. The FP8 checkpoint fits on a single node of 8x H200 or 8x H20 GPUs; serving the full 1M-token context needs 8x B200. It runs on vLLM, SGLang and Transformers, and AMD has shipped an MXFP4 build for its Instinct MI350/MI355 accelerators. In practice, most teams will start on the API and reserve self-hosting for the cases where it pays off: strict data sovereignty, predictable high-volume costs, or fine-tuning on proprietary data. The point is that the option exists, which is something no amount of GPT-5.5 or Claude budget can buy you.
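
A first-pass sizing check is to subtract the checkpoint size from the node's total GPU memory. The sketch below uses the checkpoint sizes from the spec table and assumes 141 GB of memory per H200 (taken from NVIDIA's public spec sheet, not from the sources cited here). It counts weights only, so whatever is left still has to cover the KV cache, activations and runtime overhead.

```python
# Checkpoint sizes from the spec table (GB, decimal).
CHECKPOINT_GB = {"fp8": 744, "bf16": 1510}

# Assumption: 141 GB of HBM per H200, per NVIDIA's public spec sheet.
H200_MEMORY_GB = 141


def node_headroom_gb(checkpoint_gb: float, gpus: int, gpu_memory_gb: float) -> float:
    """Memory left on the node after loading weights (negative means no fit)."""
    return gpus * gpu_memory_gb - checkpoint_gb


for dtype, size in CHECKPOINT_GB.items():
    headroom = node_headroom_gb(size, gpus=8, gpu_memory_gb=H200_MEMORY_GB)
    verdict = "fits" if headroom > 0 else "does not fit"
    print(f"{dtype} on 8x H200: {verdict}, headroom {headroom:+.0f} GB")
```

Under these assumptions the FP8 weights leave a few hundred GB free on an 8x H200 node, while the BF16 weights do not fit on one at all. Long contexts consume that headroom quickly, which is in line with the larger hardware quoted for full 1M-token serving.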

The catch: governance, trust, and the Entity List

Here is what the launch posts will not lead with. Zhipu AI was added to the US Entity List on January 16, 2025 (Federal Register rule 2025-00704), the first Chinese LLM company on it, with the stated rationale that it helps "advance the People's Republic of China's military modernization." That does not stop you from downloading MIT-licensed weights, but it is a real signal for any organization weighing vendor risk.

More concretely for day-to-day use: the convenient hosted z.ai API runs through a China-based company subject to China's data laws. For a European or French business handling client or personal data, that is a governance question you must answer before piping sensitive prompts to it. The clean resolution is exactly the one the MIT license enables: self-host the weights inside your own infrastructure, and the data never leaves. Use the cheap API for non-sensitive workloads, self-host for the rest. Add the verbose cost-per-task profile and the gap between self-reported and independently-measured benchmarks, and you have the full, honest picture.

The GLM lineage, in dates

GLM-5.2 did not appear from nowhere. It is the latest step of a fast, public cadence that has steadily closed the gap with the US labs.

  • GLM-4.5 to GLM-4.6: Zhipu establishes itself as a serious open-weights contender.
  • GLM-5: The first to trade real blows with the frontier on coding.
  • GLM-5.1: 744B/40B MoE, 200K context, the workhorse predecessor.
  • GLM-5.2 (mid-June 2026): Same size as GLM-5.1, but quintuples context to 1M, posts the largest one-version benchmark jump of the line, and takes the number-one open-weights ranking.

Our take: when to actually use GLM-5.2

What follows is our analysis.

The hype is mostly earned, and the right response for a business is neither to dismiss it nor to migrate everything overnight. It is to match the model to the job. From how we build with AI for clients, here is the practical grid.

  • Use it for agentic coding and high-volume automation. As a Claude Code alternative or the engine behind internal agents, GLM-5.2's price and openness are hard to beat. Wire it behind an abstraction so you can switch models in a config change, and budget for its token appetite.
  • Self-host it when sovereignty or scale demands it. Sensitive data, regulated sectors, or predictable heavy volume are the cases where owning the MIT weights on your own GPUs beats any rented API.
  • Keep it off your most sensitive data on the hosted API. Until you self-host, do not route confidential or personal data through the China-based endpoint. This is a governance line, not a quality one.
  • Do not single-vendor anything. The lesson of the past month, from tools getting acquired to models being suspended, is that the model under your product should be a swappable component. GLM-5.2 is a superb addition to a multi-model stack, not a reason to bet the company on one provider.
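
The grid above can be sketched as a small routing layer: model endpoints live in configuration, and a single function decides which one a request may use. Everything here is hypothetical and for illustration only, including the class name, the route keys and the example URLs. It is a minimal sketch of the pattern, not a description of any particular SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRoute:
    """One model endpoint the application is allowed to call."""
    name: str
    base_url: str
    self_hosted: bool


# Hypothetical config: in practice this would live in a settings file.
ROUTES = {
    "hosted": ModelRoute("glm-5.2", "https://hosted-api.example/v1", self_hosted=False),
    "private": ModelRoute("glm-5.2", "http://inference.internal.example:8000/v1", self_hosted=True),
}


def pick_route(sensitive: bool, default: str = "hosted") -> ModelRoute:
    """Choose an endpoint; sensitive data may only go to self-hosted routes."""
    route = ROUTES[default]
    if sensitive and not route.self_hosted:
        route = ROUTES["private"]
    return route


print(pick_route(sensitive=False).base_url)
print(pick_route(sensitive=True).base_url)
```

Because callers only ever see a `ModelRoute`, swapping GLM-5.2 for another model, or moving from the hosted API to your own GPUs, becomes an edit to `ROUTES` rather than a code change.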

This is exactly how we architect AI features for clients: the model as an interchangeable part behind your own interfaces, chosen per task on price, performance and governance, on infrastructure you control (see our work). If you want help deciding where GLM-5.2, Claude or GPT actually fit in your product, and how to keep your data and your options open, tell us about your project (or contact us) and we will come back within 48 hours. For more on the fast-moving AI stack, see our pieces on SpaceX buying Cursor and the government suspension of Fable 5.

Key numbers (as of June 2026)

This is a fast-moving space; every figure is date-stamped to mid-June 2026 and will shift as rivals respond.

  • 51: Artificial Analysis Intelligence Index score, number one open-weights model, number four overall.
  • 1,048,576: tokens of context, with 128K max output.
  • ~744 to 753B: total parameters, ~40B active (Mixture-of-Experts).
  • $1.40 / $4.40: per million input/output tokens, about one-sixth of GPT-5.5.
  • MIT: license, fully self-hostable on 8x H200 (FP8).
  • January 16, 2025: the date Zhipu was added to the US Entity List.

About the author

Robin Monteiro

Co-founder of Go To Agency

Full-stack developer and co-founder of Go To Agency, Robin builds high-performance web solutions with Next.js, React and the latest technologies.


Frequently asked questions

What is GLM-5.2?

GLM-5.2 is the flagship large language model released in mid-June 2026 by Zhipu AI, the Chinese lab behind the z.ai brand. It is a sparse Mixture-of-Experts model (roughly 744 to 753 billion total parameters, about 40 billion active per token) with a genuine one-million-token context window, 128K max output, text only. It is released as open weights under the permissive MIT license, so businesses can download, self-host, fine-tune and commercialize it.

Is GLM-5.2 really the best open-source model?

By the most credible independent measure, yes. Artificial Analysis scores it 51 on its Intelligence Index v4.1, the highest of any open-weights model, which places it fourth overall behind three closed frontier models (Claude Fable 5, Claude Opus 4.8 and GPT-5.5). Note that some of z.ai's own single-benchmark numbers run slightly higher than independent measurements, so trust the third-party aggregate over the marketing claims.

How much does GLM-5.2 cost?

On the official z.ai API it is $1.40 per million input tokens and $4.40 per million output tokens, with cached input at $0.26 (an 81% discount). That is roughly one-sixth the blended cost of GPT-5.5. The important caveat: GLM-5.2 is a heavy reasoner that produces a lot of output tokens, so the cost per completed task (around $0.46 on a standard suite) can be higher than cheaper rivals despite the low per-token price. Budget for token consumption, not just the rate.

Can a business self-host GLM-5.2?

Yes. The MIT license permits commercial use, modification and redistribution, and the weights are on Hugging Face in BF16 and native FP8. The FP8 checkpoint runs on a single node of 8x H200 or 8x H20 GPUs (serving the full 1M context needs 8x B200), via vLLM, SGLang or Transformers. Self-hosting is the clean answer to data-governance concerns: the data never leaves your infrastructure.

Is it safe to send sensitive data to GLM-5.2?

Distinguish the weights from the hosted API. The MIT weights you self-host carry no regional restriction and keep data in-house. The convenient z.ai hosted API, however, routes data through a China-based company subject to China's data laws, and Zhipu has been on the US Entity List since January 2025. For confidential or personal data, self-host the weights or use a non-China provider; reserve the hosted API for non-sensitive workloads.

Should we switch from Claude or GPT to GLM-5.2?

Not wholesale. The smart move is a multi-model stack: use GLM-5.2 where its price, openness and self-hosting win (agentic coding, high-volume automation, sovereignty-sensitive workloads), while keeping Claude or GPT for tasks where they lead. Put every model behind your own abstraction so switching is a config change, and never bet your product on a single provider.
