Built by the team behind Kling AI. 15B parameter Transformer with native audio, 1080p HD, multilingual lip-sync, and the highest blind-test scores ever recorded. Now available on Aura AI.
The model that dethroned every competitor on blind-test leaderboards
HappyHorse 1.0 topped both Text-to-Video and Image-to-Video leaderboards on April 8, 2026, beating Seedance 2.0, Veo 3.1, and Sora 2 in blind user preference tests.
Generates synchronized audio and video in a single pass — ambient sounds, dialogue, lip-sync, and music. No post-processing or separate audio tools needed.
Native support for English, Mandarin, Cantonese, Japanese, Korean, German, and French with an industry-leading low word error rate (WER). Characters speak naturally in any of these languages.
Single-stream 40-layer Transformer with approximately 15 billion parameters. Purpose-built for video generation with unprecedented motion quality and temporal coherence.
Every video is generated at full 1080p resolution natively — not upscaled. Clean detail, sharp edges, and professional-grade output ready for any platform.
One of the fastest AI video models available. Generation takes roughly 10 seconds on average, making rapid iteration and creative experimentation practical.
Generate coherent multi-scene narratives with consistent characters, environments, and visual style across shots. Built for creators who need more than single clips.
Built by Future Life Lab (Taotian/Alibaba), led by Zhang Di — former VP of Kuaishou and architect of Kling AI. Open-source release includes base model, distilled variant, super-resolution module, and inference code.
How the #1 ranked model compares to leading competitors
| Feature | HappyHorse 1.0 | Seedance 2.0 | Veo 3.1 | Sora 2 |
|---|---|---|---|---|
| Arena Ranking | #1 | #2 | #3 | #5 |
| Parameters | ~15B | Unknown | Unknown | Unknown |
| Native Audio | Yes | Yes | Yes | Yes |
| Resolution | 1080p | 720p | 1080p / 4K | 1080p |
| Max Duration | 15s | 15s | 8s | 12s |
| Lip-Sync Languages | 7 | Limited | English | English |
| Generation Speed | ~10s | ~30s | ~60s | ~45s |
| Open Source | Yes | No | No | No |
| On Aura AI | Exclusive | Yes | Yes | Yes |
Available exclusively on Aura AI — the first platform to offer this model.
Available to Starter ($14/mo), Pro ($59/mo), and Premium ($99/mo) subscribers.
Generate videos from text prompts or images. Both standard and fast variants available.
No geographic restrictions. Access HappyHorse 1.0 from any country, on any device.
No waitlist, no approval. Subscribe and select HappyHorse from the model dropdown.
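For programmatic use, a generation request would typically be a small JSON payload. The sketch below is purely illustrative: the model identifiers, field names, and values are assumptions based on the specs above, not Aura AI's actual API; consult the platform's documentation for the real schema.

```python
import json

# Hypothetical text-to-video request payload. Every field name here is an
# illustrative assumption, not Aura AI's documented API.
payload = {
    "model": "happyhorse-1.0",   # a "-fast" suffix might select the fast variant
    "mode": "text-to-video",     # "image-to-video" would add an image input
    "prompt": "A horse galloping along a misty beach at sunrise, ambient surf audio",
    "resolution": "1080p",       # native output resolution per the spec above
    "duration_seconds": 10,      # the model supports up to 15s per clip
}
body = json.dumps(payload)
print(body)
```

Whatever the real field names turn out to be, the spec table above constrains the values: 1080p output and clip durations up to 15 seconds.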
From the architects of Kling AI to the #1 video model in the world
HappyHorse 1.0 was built by Future Life Lab, a division of Taotian Group (Alibaba), led by Zhang Di — former VP of Kuaishou and the technical architect behind Kling AI, one of the most successful AI video models of 2025.
The model uses a single-stream 40-layer Transformer architecture with approximately 15 billion parameters, trained specifically for joint audio-video generation. It appeared on the Artificial Analysis Video Arena on April 8, 2026 as an anonymous entry and rapidly climbed to #1 in both text-to-video and image-to-video categories before being identified.
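As a rough sanity check on the stated size, the standard 12·L·d² rule of thumb for Transformer blocks (4d² for the attention projections plus 8d² for a 4×-expansion MLP, ignoring embeddings and norms) lets us back out the hidden dimension a 40-layer, ~15B-parameter model implies. The hidden size is our inference, not a published figure:

```python
import math

def transformer_params(n_layers: int, d_model: int) -> int:
    """Rough parameter count for a standard Transformer stack:
    ~4*d^2 per layer for Q/K/V/output projections plus ~8*d^2 for a
    4x-expansion MLP, ignoring embeddings, norms, and biases."""
    return n_layers * 12 * d_model ** 2

# Solve 12 * L * d^2 = 15e9 for d at L = 40.
target_params = 15e9
n_layers = 40
d_model = math.sqrt(target_params / (12 * n_layers))
print(round(d_model))                       # ~5590, so a hidden size near 5600
print(transformer_params(40, 5600) / 1e9)   # ~15.05 (billions of parameters)
```

A hidden dimension in the mid-5000s is in line with other dense models of this scale, so the 40-layer / ~15B figures are internally consistent.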
The open-source release includes the base model, a distilled variant for faster inference, a super-resolution module, and full inference code — supporting self-hosting, fine-tuning, and commercial use.
Everything you need to know about HappyHorse 1.0 on Aura AI
HappyHorse 1.0 is the #1 ranked AI video model: native audio, 1080p HD, 7-language lip-sync, ~10-second generation. Aura AI is the first and only platform to offer it.