HappyHorse-1.0: The Anonymous Model That Just Topped Every AI Video Leaderboard

HappyHorse-1.0 just claimed #1 on Artificial Analysis for both text-to-video and image-to-video.

No team. No API. No downloadable weights. But blind human voters on Artificial Analysis just ranked HappyHorse-1.0 above Seedance 2.0, Kling 3.0, and every other video model in existence. Here’s what that means for your stack — and what it doesn’t.

A model nobody recognizes just won the most credible video benchmark

The Artificial Analysis Video Arena is the closest thing to a trusted, independent ranking for AI video models. It works like chess Elo: users see two videos generated from the same prompt, pick the better one without knowing which model made which, and those votes accumulate into a rating. No lab gets to cherry-pick demos. No self-reported FID scores. Just blind human preference at scale.
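The vote-to-rating mechanic is the standard Elo update. A minimal sketch, assuming a logistic Elo curve with the conventional base-10/400 scaling; the K-factor and starting ratings below are illustrative, since the arena doesn't publish its exact parameters:

```python
def elo_update(r_winner, r_loser, k=32):
    """Apply one blind-vote result as a standard Elo update.
    k=32 is an assumed K-factor, not the arena's actual value."""
    # Expected score for the winner under the logistic Elo model.
    expected_w = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    # The less expected the win, the larger the rating swing.
    delta = k * (1 - expected_w)
    return r_winner + delta, r_loser - delta

# Two equally rated models: the winner gains exactly k/2 points.
# elo_update(1200, 1200) → (1216.0, 1184.0)
```

Thousands of such updates, one per blind vote, converge each model toward a rating that reflects how often it actually wins.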

Last week a model called HappyHorse-1.0 appeared at the top of both the text-to-video and image-to-video rankings. It wasn’t a close race.

| Category | Elo | Rank | Gap to #2 |
|---|---|---|---|
| Text-to-Video (no audio) | 1,333 | #1 | +60 pts |
| Image-to-Video (no audio) | 1,392 | #1 | +37 pts |
| Text-to-Video (with audio) | 1,205 | #2 | −14 pts |
| Image-to-Video (with audio) | 1,161 | #2 | −1 pt |

For context: a 60-point Elo gap means HappyHorse wins roughly 58–59% of head-to-head blind matchups against the previous #1, Seedance 2.0. A 5-point gap is statistical noise. Sixty points is not.
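That win-rate figure falls straight out of the standard Elo expected-score formula, which you can check in two lines:

```python
def elo_win_prob(gap):
    """Expected head-to-head win rate for the higher-rated model,
    given an Elo rating gap, under the standard logistic Elo model."""
    return 1 / (1 + 10 ** (-gap / 400))

print(round(elo_win_prob(60), 3))  # 0.585 → ≈58.5% of blind matchups
print(round(elo_win_prob(5), 3))   # 0.507 → barely better than a coin flip
```

The 5-point spread among the #3–#5 models later in this piece corresponds to that second number, which is why it reads as a tie.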

The picture flips when audio enters the equation: Seedance 2.0 edges ahead in both audio categories, though by much thinner margins (14 points and 1 point). This suggests HappyHorse's audio synthesis is competitive, but not its strongest suit.

A caveat worth repeating: new models have more volatile Elo scores. Seedance 2.0 has 7,500+ vote samples. HappyHorse’s sample count isn’t public. These numbers will move. The direction is unknown.

What HappyHorse claims about itself

Everything below comes from the model’s own website. None of it has been independently verified. I’m treating these as claims, not facts.

Architecture

A single unified Transformer with 40 layers. Text tokens, reference image latents, and noisy video/audio tokens are jointly denoised within one sequence. The first and last 4 layers use modality-specific projections; the middle 32 share parameters across all modalities. No cross-attention. A secondary site claims 15 billion parameters — the primary domain doesn’t confirm that number.
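For readers who think in code, here is one plausible reading of that layout. This is a toy sketch, not HappyHorse's implementation: every dimension, name, and design choice below is an assumption, and the "modality-specific" early/late regions are simplified to per-modality input/output projections around a shared trunk.

```python
import torch
import torch.nn as nn

class UnifiedVideoTransformer(nn.Module):
    """Toy sketch of the claimed layout: 40 self-attention layers, a
    parameter-shared middle trunk, and per-modality projections at the
    edges. All hyperparameters are illustrative, not from HappyHorse."""

    def __init__(self, d_model=512, n_heads=8, modalities=("text", "image", "av")):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        # Per-modality projections wrap the otherwise shared stack.
        self.in_proj = nn.ModuleDict({m: nn.Linear(d_model, d_model) for m in modalities})
        self.out_proj = nn.ModuleDict({m: nn.Linear(d_model, d_model) for m in modalities})
        self.early = nn.ModuleList([layer() for _ in range(4)])    # modality-aware region
        self.shared = nn.ModuleList([layer() for _ in range(32)])  # shared across modalities
        self.late = nn.ModuleList([layer() for _ in range(4)])

    def forward(self, tokens_by_modality):
        # Project each modality, then process everything as ONE sequence:
        # joint self-attention only, no cross-attention, per the site's claim.
        parts = [self.in_proj[m](x) for m, x in tokens_by_modality.items()]
        seq = torch.cat(parts, dim=1)
        for blk in [*self.early, *self.shared, *self.late]:
            seq = blk(seq)
        # Split the sequence back apart and apply per-modality output heads.
        out, i = {}, 0
        for m, x in tokens_by_modality.items():
            out[m] = self.out_proj[m](seq[:, i:i + x.shape[1]])
            i += x.shape[1]
        return out
```

The appeal of this design, if the claim holds, is that I2V is just T2V with extra reference-image latents in the same sequence, which would explain one model name appearing in both arena categories.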

Multilingual audio-video

Six natively supported languages for joint audio-video generation: English, Chinese, Japanese, Korean, German, and French. One secondary page adds Cantonese and mentions “ultra-low WER lip-sync.” None of this is testable without access to the model.

Unified pipeline

Both T2V and I2V run through a single pipeline — consistent with its presence in both arena categories under one model name. The site also claims joint Foley/dialogue/ambient sound synthesis in one forward pass.

The architecture description is specific enough that it will be falsifiable the moment weights are public. Until then, it’s marketing.

Everything we can’t verify

This is the part that matters for anyone making decisions.

| Claim | Status |
|---|---|
| Team identity | Unknown |
| Open-source weights | "Coming soon" |
| Public API | None |
| 15B parameter count | Unconfirmed |
| Inference speed (2s @ 256p, 38s @ 1080p on H100) | Self-reported |
| Architecture details | Plausible, unverified |
| Elo rankings on Artificial Analysis | Verified |

Artificial Analysis themselves described the submission as “pseudonymous.” The website says weights are “released” and “everything is open” — but the GitHub and HuggingFace links both point nowhere. That’s a contradiction, not a release.

The WAN 2.7 speculation

Some in the community suspect HappyHorse is actually WAN 2.7 — the next version from Alibaba’s WAN video family — running anonymously before an official launch. The reasoning: anonymous pre-release drops have become a pattern in the Chinese AI ecosystem. The Pony Alpha / GLM-5 situation in February 2026 is the clearest precedent, where a mystery model on OpenRouter turned out to be Z.ai stress-testing GLM-5 under a pseudonym.

It’s a plausible theory. It’s also unconfirmed. WAN 2.6 currently sits at Elo 1,189 — well below HappyHorse. The architecture claims don’t obviously align with known WAN designs. No leaked weights or API fingerprinting has connected the two.

The practical leaderboard: what you can actually use today

If you’re a builder evaluating video generation for a product, pipeline, or creative workflow, here’s the reality as of April 2026:

| Rank | Model | Elo (T2V) | API Access | Cost |
|---|---|---|---|---|
| #1 | HappyHorse-1.0 | 1,333 | No | — |
| #2 | Seedance 2.0 720p | 1,273 | No public API | — |
| #3 | SkyReels V4 | 1,245 | Yes | $7.20/min |
| #4 | Kling 3.0 1080p Pro | 1,241 | Yes | $13.44/min |
| #5 | PixVerse V6 | 1,240 | Yes | $5.40/min |

The two highest-rated models by blind comparison — HappyHorse and Seedance 2.0 — are both inaccessible. Positions 3 through 5 are separated by just 5 Elo points, which is effectively a three-way tie.

So the decision matrix, right now, comes down to:

  • Best quality-to-price: SkyReels V4 at $7.20/min with an Elo of 1,245
  • Native 1080p: Kling 3.0 Pro — higher cost, but no upscaling step needed
  • Lowest cost in the top tier: PixVerse V6 at $5.40/min

The question isn’t “which model is best?” — it’s “which model is best that I can actually ship with?” Right now, the answer starts at #3 on the leaderboard.

What to watch for

Three signals would move HappyHorse from “interesting leaderboard entry” to “serious contender for your stack”:

  1. A real GitHub release — actual weights, inference code, and a license. Not a “coming soon” link.
  2. A HuggingFace model card — verifiable architecture details, benchmark reproductions, community testing.
  3. An API with documented pricing — something you can hit with a POST request and get a video back.

None of these exist as of publication. But the stealth-drop-then-release pattern has played out multiple times this year. It’s reasonable to keep HappyHorse on your radar while building with what’s available.
