Shengshu Technology and Tsinghua University have unveiled Vidu, a text-to-video generator that can create 16-second clips at 1080p resolution in a single click.[1][6] Vidu is positioned as a competitor to OpenAI's Sora, with comparable video generation capabilities.[1][6]
Vidu is built on the Universal Vision Transformer (U-ViT) architecture, which predates the diffusion transformer (DiT) architecture used by Sora.[1][6] It can generate videos with intricate scenes, realistic lighting and shadows, and detailed facial expressions.[1][6] It also offers multi-camera capabilities for dynamic shots.[1][6]
While Vidu showcases China's advances in AI research, a direct comparison with Sora shows that Vidu's visual output lags behind in fidelity and realism.[1][6] However, Vidu's temporal consistency is noteworthy, and the technology has significant room for further refinement.[1][6]