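Tongyi Wanxiang 2.1 Text-to-Video Model: Evaluation and Deployment Guide

Tongyi Wanxiang 2.1 Text-to-Video: Technical Overview

What Is Text-to-Video?

Text-to-video is an AI technique that generates video content from a text description. Much like text-to-image generation, the user enters a short description and the model turns it into a moving video. The technique is used in creative work, advertising, education, and other fields, and gives content creators a new way to work and a new source of ideas.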

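Tongyi Wanxiang 2.1 Text-to-Video

Alibaba's Tongyi Wanxiang has announced its 2.1 model upgrade, with significant improvements in both video generation and image generation.

For video generation, Tongyi Wanxiang 2.1 uses a self-developed, efficient VAE (variational autoencoder) and a DiT (diffusion transformer) architecture to strengthen spatiotemporal context modeling. According to the announcement, it supports efficient encoding and decoding of 1080P video of unlimited length, is the first model to generate Chinese text inside video, and ranks first on the VBench leaderboard.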

Figure: Tongyi Wanxiang 2.1 feature showcase

Open-Source Repository

Developers can download the model directly from GitHub (https://github.com/Wan-Video/Wan2.1) and HuggingFace (https://huggingface.co/Wan-AI) and try it out.
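For readers who prefer to script the HuggingFace download, the following is a minimal sketch rather than an official procedure. It assumes the `huggingface_hub` package is installed and that the text-to-video checkpoints are published under the repository ids listed in `ASSUMED_REPOS`; those ids, the `./models` folder, and the helper names are illustrative assumptions, so check the Wan-AI organization page for the exact names first.

```python
from pathlib import Path

# Assumed repository ids on HuggingFace; verify them on the Wan-AI
# organization page before downloading.
ASSUMED_REPOS = {
    "1.3B": "Wan-AI/Wan2.1-T2V-1.3B",
    "14B": "Wan-AI/Wan2.1-T2V-14B",
}


def plan_download(variant: str, root: str = "./models") -> tuple[str, Path]:
    """Return (repo_id, local_dir) for a model variant (hypothetical helper)."""
    if variant not in ASSUMED_REPOS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(ASSUMED_REPOS)}"
        )
    repo_id = ASSUMED_REPOS[variant]
    # Use the last path component of the repo id as the folder name.
    return repo_id, Path(root) / repo_id.split("/")[-1]


def download_weights(variant: str = "1.3B", root: str = "./models") -> Path:
    """Download the checkpoint files and return the local directory."""
    # Imported lazily so plan_download() works without the package installed.
    from huggingface_hub import snapshot_download

    repo_id, local_dir = plan_download(variant, root)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir


if __name__ == "__main__":
    # Performs a large network download when run as a script.
    print(download_weights("1.3B"))
```

The planning step is kept separate from the download so the target path can be inspected, or redirected to a larger disk, before any data is fetched.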

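Figure: open-source repository information

Users who cannot reach these sites, or who would rather not download and configure the model themselves, can use a cloud service platform for one-click deployment.

Figure: deployment details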

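Deployment Environment and Performance Testing

Hardware Configuration Comparison

Tests were run on an RTX 3090 and an RTX 4090, with all parameters left at their defaults.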
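A comparison like this is easier to reproduce with a small timing harness. The sketch below is an illustration, not the procedure used for these tests: `time_runs` and `speedup` are hypothetical helpers, and `generate` stands for whatever zero-argument callable performs one complete video generation on the machine under test.

```python
import statistics
import time
from typing import Callable


def time_runs(generate: Callable[[], object], runs: int = 3, warmup: int = 1) -> dict:
    """Call `generate` several times and summarize wall-clock seconds."""
    for _ in range(warmup):
        generate()  # warm-up runs are not timed (model loading, caches)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "mean_s": statistics.mean(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s
```

Running `time_runs` on each card and passing the two `mean_s` values to `speedup` gives a single ratio to report. Because GPU work is launched asynchronously, `generate` should finish by writing the video to disk (or by synchronizing the device) so that the measured time covers the whole job.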

RTX 3090 test:

Prompt: Create a short video of a peaceful park scene during the golden hour. The sun is setting behind large, lush trees. The camera slowly pans through the park, capturing people walking, jogging, and sitting on benches. Birds are chirping, and there's a gentle breeze rustling through the leaves. The atmosphere is calm, serene, and warm, with soft golden light filtering through the branches.

Negative Prompt: Avoid any dark or eerie elements, such as stormy weather, gloomy skies, or ominous shadows. Do not include any loud or chaotic activities, like running or aggressive movements. The scene should remain calm and pleasant without any distractions, such as animals or people involved in unsettling behavior.
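For readers running the model from a script instead of a web interface, the prompt and negative prompt above map onto two arguments of a generation call. The following is a hedged sketch using the `diffusers` library's `WanPipeline`, which is not necessarily what the tested deployment uses. The model id, resolution, frame count, and frame rate are assumptions chosen for illustration (the tests here only state "default parameters"), and `T2VRequest` and `generate_video` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class T2VRequest:
    """One text-to-video job (hypothetical container for illustration)."""

    prompt: str
    negative_prompt: str = ""
    # Assumed values; the article does not list the defaults it used.
    height: int = 480
    width: int = 832
    num_frames: int = 81

    def to_kwargs(self) -> dict:
        """Keyword arguments for a diffusers-style pipeline call."""
        kwargs = {
            "prompt": self.prompt,
            "height": self.height,
            "width": self.width,
            "num_frames": self.num_frames,
        }
        if self.negative_prompt:
            # Only pass a negative prompt when one was actually given.
            kwargs["negative_prompt"] = self.negative_prompt
        return kwargs


def generate_video(
    request: T2VRequest,
    out_path: str = "output.mp4",
    model_id: str = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",  # assumed repo id
) -> str:
    """Generate one clip on a CUDA GPU and save it as an MP4 file."""
    # Heavy dependencies are imported lazily; they require a GPU setup.
    import torch
    from diffusers import AutoencoderKLWan, WanPipeline
    from diffusers.utils import export_to_video

    vae = AutoencoderKLWan.from_pretrained(
        model_id, subfolder="vae", torch_dtype=torch.float32
    )
    pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
    pipe.to("cuda")
    frames = pipe(**request.to_kwargs()).frames[0]
    export_to_video(frames, out_path, fps=16)  # assumed frame rate
    return out_path
```

A call would look like `generate_video(T2VRequest(prompt=..., negative_prompt=...))`, with the two strings taken from the test above.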

Figure: RTX 3090 test results