On June 23, 2026, at the 2026 Volcano Engine FORCE Conference, ByteDance announced Seedance 2.5, the next generation of its AI video model, and one number traveled around the internet faster than the rest: 30 seconds. Not 30 seconds stitched from clips. Thirty seconds of native, continuous, single-pass generation, scene changes and tempo shifts included. For a field where most models still hand you 5 to 10 seconds at a time, that is the kind of jump that resets expectations.
But a launch-day headline is not a benchmark. So we did what we always do before recommending a tool to a client: we went to the primary sources (ByteDance Seed's own tech reports, the arXiv papers, the Volcano Engine announcement, and the live Artificial Analysis Video Arena) and split the claims into three buckets: verified, company-claimed, and not yet known. The short version: the Seedance line currently ranks #1 on the independent, blind-vote Artificial Analysis leaderboards (with-audio views), the 30-second claim is genuine as an announcement but remains company-stated for a model still in beta, and most of the hard 2.5 specs (resolution, fps, pricing) simply do not exist publicly yet. This is the data, with sources, that creators and businesses can actually build on.
What ByteDance actually announced (the verified part)
Strip away the demos and only four things about Seedance 2.5 are on the record right now: the announcement itself, the rollout timeline, and two capability claims. They come from the conference itself, reported consistently by multiple outlets close to the event (BigGo Finance, The Decoder, AIBase).
| Seedance 2.5, what is actually confirmed | Detail | Confidence |
|---|---|---|
| Announced | June 23, 2026, at the 2026 Volcano Engine FORCE Conference (by Volcano Engine president Tan Dai) | Verified (event) |
| Availability | Global enterprise beta now, general availability targeted for early July 2026 | Company-announced timeline |
| Headline capability | Direct, single-pass output of a 30-second native clip, one continuous generation with scene and tempo changes, not post-stitched | Company-claimed |
| Reference inputs | Accepts up to 50 multimodal reference materials in one generation (up from 12 in Seedance 2.0) | Company-claimed |
| Resolution / fps / audio for 2.5 | Not officially disclosed at announcement | Unknown (do not assume) |
| API pricing for 2.5 | Not announced during beta | Unknown |
Those last two rows matter more than any spec. Anyone publishing a confident "Seedance 2.5 does 4K at 60fps for $X per second" right now is guessing. ByteDance disclosed the duration and the reference count, and almost nothing else. We will treat everything beyond those as unconfirmed until the early-July general-availability tech report lands.
The headline, in one line
A 30-second clip in one continuous generation, from up to 50 reference inputs. That is the Seedance 2.5 pitch. Everything else is still beta.
The only numbers that are independently verified: the leaderboard
Here is the honest hook, and it is more impressive than any unverifiable spec. Seedance 2.5 has no benchmark score anywhere. It is not on the Artificial Analysis Video Arena, not on llm-stats, nowhere. It is days old and in beta, so any "Seedance 2.5 Elo" you see circulating is invented. We checked the live boards directly.
What is real, and verified against the primary leaderboard, is that the previous model, Seedance 2.0, already sits at number one in the world. On the Artificial Analysis Text-to-Video Arena (blind human preference, with-audio view, June 2026), "Dreamina Seedance 2.0 720p" leads with an Elo of 1,219, ahead of Alibaba's HappyHorse-1.0, Kuaishou's Kling 3.0 Pro, and Google's Veo 3.1, which sits all the way down at #8. That is the factual basis for the "Chinese AI video models are leading the global leaderboards" story, and it is the floor Seedance 2.5 is launching from.
Figure: Artificial Analysis Text-to-Video Arena, Elo (with audio, June 2026). Elo axis starts at 1,050 to show the spread. Source: Artificial Analysis Text-to-Video Arena, with-audio view, June 2026 (independent, blind human preference). This is Seedance 2.0, the model before 2.5. Seedance 2.5 is not yet ranked.
The pattern repeats on the image-to-video board. On the Artificial Analysis Image-to-Video Arena (with-audio, June 2026), Seedance 2.0 720p again holds #1 at Elo 1,195, with Alibaba and Google trailing. Chinese labs (ByteDance, Alibaba, Kuaishou) occupy the entire top tier of both boards. One caveat to keep you honest: these are the with-audio sub-leaderboards, and the no-audio views shuffle slightly (Alibaba's HappyHorse edges ahead on text-to-video without audio). Always read the view label. The takeaway holds either way: the Seedance family leads the with-audio views and stays at or near the top without audio, by independent blind vote, and 2.5 is its successor.
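To make the Elo gaps concrete, here is a minimal sketch that converts a rating difference into an expected head-to-head preference rate. It assumes the classic logistic Elo curve with a 400-point scale; Artificial Analysis may fit its ratings differently, so read the result as an intuition aid, not as the arena's own statistic. The function name `elo_expected_preference` is ours, and the ratings are the with-audio text-to-video figures quoted in this article.

```python
def elo_expected_preference(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of blind head-to-head votes for A over B.

    Assumption: classic logistic Elo curve on a 400-point scale.
    The arena's actual rating model may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings as quoted in this article (T2V, with audio, June 2026).
ratings = {
    "Seedance 2.0 720p": 1219,
    "Kling 3.0 Pro": 1106,
    "Veo 3.1": 1094,
    "Wan 2.7": 1089,
}

leader = "Seedance 2.0 720p"
for name, rating in ratings.items():
    if name == leader:
        continue
    share = elo_expected_preference(ratings[leader], rating)
    print(f"{leader} vs {name}: about {share:.0%} of votes expected for the leader")
```

Under that assumption, a gap of roughly 125 points (Seedance 2.0 versus Veo 3.1) corresponds to about two blind votes in three going to the leader: a clear lead, not a shutout.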
The Seedance lineage, in dates
Seedance 2.5 did not appear from nowhere. It is the latest step in a fast, public cadence that has steadily closed the gap on, and now passed, the Western labs on the leaderboards.
- Seedance 1.0 (June 2025): The foundation. Tech report on arXiv (2506.09113), integrated into Doubao and Jimeng. ByteDance claimed #1 on both Artificial Analysis boards at launch. The Pro tier generated a 5-second 1080p clip in 41.4 seconds on an NVIDIA L20.
- Seedance 1.5 pro (December 2025): The audio milestone: native, joint audio-video generation in a single pass, with lip-sync across languages and dialects. This is when sound stopped being a bolt-on (ByteDance Seed paper).
- Seedance 2.0 (February 2026): The current leaderboard champion. Up to 12 reference inputs, and the model now sitting at #1 on both Artificial Analysis arenas.
- Seedance 2.5 (announced June 23, 2026, GA early July 2026): The 30-second single-pass leap, up to 50 reference inputs, in enterprise beta as of this writing.
The stack: what is under the hood
The Seedance family is a diffusion-transformer (DiT) lineage, and this part rests on strong primary sources rather than launch hype. Seedance 1.0's tech report describes an MMDiT backbone with decoupled spatial and temporal layers, multimodal rotary position embeddings (MM-RoPE) and a temporally-causal VAE, trained so that a single model natively handles multi-shot generation and learns text-to-video and image-to-video jointly. There is no separate "image model" and "video model"; it is one unified architecture.
Seedance 1.5 pro extended that into a dual-branch Diffusion Transformer with a cross-modal joint module, generating the video frames and the audio waveform simultaneously in one pass, rather than dubbing sound on afterward. That is why the synchronization (lip-sync, action-linked sound effects) holds up. ByteDance has not published the architectural details specific to 2.5, but the family's direction is clear: longer context (now 30 seconds), more reference conditioning (now 50 inputs), and audio-visual generation treated as a single problem.
How it stacks up against Sora 2, Veo 3.1, Kling and Runway
Here is the competitive picture. Read it with one rule in mind: the only independently verified numbers in this table are the Artificial Analysis Elo scores. Maximum-duration and audio columns are taken from each vendor's documentation as of June 2026 and change constantly, so treat them as directional, not gospel, and verify before you build a production pipeline on them.
| Model (vendor) | Max single-pass clip | Native audio | AA Video Arena (T2V, with audio, Jun 2026) |
|---|---|---|---|
| Seedance 2.5 (ByteDance) | 30s (announced) | Family yes; 2.5 not detailed | Not benchmarked yet (just announced) |
| Seedance 2.0 (ByteDance) | Short clips, multi-shot | Yes (since 1.5 pro) | #1, Elo 1,219 |
| Kling 3.0 Pro (Kuaishou) | ~10s, extendable | Yes | #3, Elo 1,106 |
| Google Veo 3.1 | ~8s typical | Yes | #8, Elo 1,094 |
| Alibaba Wan 2.7 | Short clips | Yes | #9, Elo 1,089 |
| OpenAI Sora 2 | Longer clips, varies by tier | Yes | Not in this dataset |
| Runway Gen-4 | ~10s | Limited | Not in this dataset |
| MiniMax Hailuo 02 | ~6 to 10s | Varies | Not in this dataset |
Why Sora 2, Runway and Hailuo show "not in this dataset": they did not surface with confirmed Elo figures on the boards we verified. We would rather leave a cell empty than print a number we cannot source. That discipline is the whole point of this article.
The catches creators and businesses must price in
The technology is genuinely ahead. The caveats are real, and a launch post will not lead with them.
- It is beta, and benchmarks are pending. The 30-second and 50-reference claims are ByteDance's own, for a model the public cannot fully test yet. Until 2.5 appears on an independent arena, treat the quality as "the Seedance line, probably better," not as a measured fact.
- Pricing is unknown, and the figures circulating are unreliable. The per-second numbers floating around the web are for Seedance 2.0, and even those did not survive our verification. Budget nothing on 2.5 pricing until ByteDance publishes it for the Volcano Engine and BytePlus APIs.
- The API is China-hosted. Volcano Engine (domestic) and BytePlus (international) route generations through infrastructure subject to Chinese data law. For a European or French business handling client or personal data, that is a governance question to answer before sending anything sensitive, not a detail.
- Deepfake risk is concrete. ByteDance reportedly suspended a "voice from a single photo" feature after the 1.5 launch over misuse concerns. Watermarking and C2PA provenance behavior for 2.5 has not been confirmed. If you generate likenesses, that is on you to manage.
Our read: what Seedance 2.5 means for your video workflow
What follows is our analysis.
The 30-second single-pass clip is not a gimmick. Most real-world video, a product explainer, a social ad, an opening sequence, lives in the 15-to-30-second range, and stitching short AI clips together is exactly where consistency breaks: the character's face drifts, the lighting jumps, the motion stutters at the cut. A model that holds one continuous generation for 30 seconds, with up to 50 reference inputs to lock character and style, attacks the single biggest production headache in AI video head-on. If ByteDance ships what it announced, this is a workflow change, not a spec bump.
For creators and businesses, the practical advice is the same as it is for every AI model we evaluate. Treat the model as an interchangeable component behind your own process, not as the process. Use the Seedance line where it is strongest (it currently tops the independent, blind-vote with-audio boards), keep a second vendor like Veo or Kling wired up so you are never hostage to one API, and resolve the China-hosting governance question before any sensitive footage goes near it. The teams that win with generative video are not the ones chasing every launch; they are the ones with a pipeline that can swap the best model in with a config change.
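One way to picture "swap the model with a config change" is the sketch below. It is illustrative only: `Provider`, `render`, `REGISTRY`, `CONFIG` and the stub generators are hypothetical names of ours, not any vendor's SDK, and the clip limits and `sensitive_ok` flags are placeholders you would replace with values you have verified and with the outcome of your own governance review.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A generator takes (prompt, seconds) and returns a reference to the finished clip.
GenerateFn = Callable[[str, int], str]


@dataclass(frozen=True)
class Provider:
    name: str
    max_seconds: int    # longest single-pass clip you have confirmed for this vendor
    sensitive_ok: bool  # outcome of YOUR governance review, not a vendor property
    generate: GenerateFn


class NoProviderAvailable(RuntimeError):
    """Raised when every configured provider was skipped or failed."""


def render(prompt: str, seconds: int, order: List[str],
           registry: Dict[str, Provider], sensitive: bool = False) -> Tuple[str, str]:
    """Try providers in configured order; return (provider name, clip reference)."""
    skipped = []
    for name in order:
        provider = registry[name]
        if seconds > provider.max_seconds:
            skipped.append(f"{name}: {seconds}s exceeds {provider.max_seconds}s limit")
            continue
        if sensitive and not provider.sensitive_ok:
            skipped.append(f"{name}: not cleared for sensitive footage")
            continue
        try:
            return name, provider.generate(prompt, seconds)
        except Exception as exc:  # a real pipeline would catch vendor-specific errors
            skipped.append(f"{name}: {exc}")
    raise NoProviderAvailable("; ".join(skipped))


def _stub(vendor: str) -> GenerateFn:
    """Stand-in for a real SDK call; returns a fake clip reference."""
    return lambda prompt, seconds: f"{vendor}-stub:{seconds}s:{prompt}"


# Placeholder limits and clearances; replace with values you have verified yourself.
REGISTRY = {
    "seedance": Provider("seedance", max_seconds=30, sensitive_ok=False, generate=_stub("seedance")),
    "veo": Provider("veo", max_seconds=8, sensitive_ok=True, generate=_stub("veo")),
}

# Swapping or reordering vendors is a change to this list, not to calling code.
CONFIG = {"provider_order": ["seedance", "veo"]}

if __name__ == "__main__":
    print(render("15-second product teaser", 15, CONFIG["provider_order"], REGISTRY))
    print(render("client interview cutdown", 8, CONFIG["provider_order"], REGISTRY, sensitive=True))
```

The design choice is that calling code only ever sees `render`; vendor order, duration limits and the sensitive-footage gate live in data, so adding a new model or demoting an old one does not touch the production pipeline.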
That is exactly how we build AI features for clients: the model as a swappable part behind interfaces and infrastructure you control, chosen per task on quality, cost and governance (see our work). If you are a brand, an agency or a creator trying to fold AI video into real production, and you want it done with the data discipline this article is built on rather than launch-day hype, tell us about your project (or get in touch) and we will come back to you within 48 hours.
Key numbers (as of June 23, 2026)
This is a launch-window snapshot; every figure is dated and will move as the model ships and rivals respond.
- 30 seconds: the single-pass native clip length, the headline Seedance 2.5 capability (company-claimed).
- 50 multimodal reference inputs accepted in one generation, up from 12 in Seedance 2.0 (company-claimed).
- June 23, 2026 announcement date; general availability targeted for early July 2026.
- 1,219 Elo for Seedance 2.0 on the Artificial Analysis text-to-video arena, #1 in the world (with audio). 2.5 is not ranked yet.
- 1,195 Elo for Seedance 2.0 on the image-to-video arena, also #1.
- June 2025: the start of the lineage (Seedance 1.0), which reached 2.5 in about a year.



