Our Recent Tests with AI Text-to-Video Models

We’ve recently been testing several AI text-to-video models, focusing on how they handle character movement and interaction. Each model has its own strengths, so there’s no single “best” option: the right choice depends on what you’re trying to create.

Here’s what we discovered during our testing:

Kling 2.1 Master: Great for detailed single actions

Kling 2.1 Master performs especially well with subtle, focused movements such as facial expressions and eye contact.

We tested it with a prompt like “a woman looks up from reading and makes direct eye contact with the camera,” and the results were clean and natural. The transition from focused reading to a gentle smile felt authentic, and it even captured small details like the reflection on her glasses.

Hailuo: Smooth social interactions

For scenes involving multiple people and emotional exchanges, Hailuo delivered consistently good results.

With prompts like “two people shake hands and have a brief chat,” we got videos with warm color tones and natural-feeling interactions. The flow between movements and dialogue felt realistic and engaging.

Seedance: Authentic everyday moments

When we wanted to capture realistic daily life scenarios, Seedance stood out.

For example, we tried “A person cooking while a child tugs at their clothes for attention, showing gentle but firm responses.” The generated scene felt organic, complete with steam rising from the pot and natural multitasking between cooking and parenting.

Veo 3: Complex scene management

For busy, multi-layered scenes with lots of moving parts, Veo 3 handled the complexity well.

We tested something challenging: “A person having an emotional breakdown in public, with different bystanders showing varied reactions – some helping, others avoiding, some recording.” Veo 3 balanced the background activity, blended the characters naturally into the environment, and kept the emotional depth believable across the different reactions.

Wan 2.2: Professional production quality

For polished entertainment content, Wan 2.2 delivered the most professional results.

For prompts like “Kpop Idol group performing synchronized dance routine with stylish stage outfits, coordinated movements, and charismatic stage presence,” it produced videos with smooth choreography, detailed costumes, and confident performers, which is exactly what high-quality entertainment content calls for.

What we learned

Each model seems to have its own specialty. Some excel at individual character work, others handle group dynamics better, and some are better suited to complex scenes or high production values.

If you’re working with AI video generation, we’d recommend testing a few different models for your specific use case. We’re continuing to experiment and will share more findings as we go.

Have your own findings about these models to share? Drop us a line at support@haimeta.com.