China's first Sora-level video model released, with performance fully benchmarked against the international top tier

Date: 04-27

Phoenix New Media Technology News (by Yu Lei), April 27: At the Future Artificial Intelligence Pioneer Forum of the Zhongguancun Forum today, Shengshu Technology and Tsinghua University officially released Vidu, China's first long-duration, high-consistency, high-dynamics large video model. Vidu is the first large video model worldwide to achieve a major breakthrough since the release of Sora. Its overall performance is benchmarked against the international top tier, and it continues to improve through rapid iteration.

The model uses U-ViT, the team's original architecture that fuses Diffusion and Transformer, and supports one-click generation of high-definition video up to 16 seconds long at resolutions up to 1080p. Vidu can simulate the real physical world, and it also offers rich imagination, multi-shot generation, and high spatiotemporal consistency.

Vidu's rapid breakthrough stems from the team's long-term work on Bayesian machine learning and multimodal large models, along with a number of original results. Its core technology, the U-ViT architecture, was proposed by the team in September 2022, earlier than the DiT architecture adopted by Sora, making it the world's first architecture to fuse Diffusion and Transformer. In March 2023, the team open-sourced UniDiffuser, the world's first multimodal diffusion model based on the U-ViT fusion architecture, and was the first to complete large-scale scalability verification of the U-ViT architecture.

Building on its in-depth understanding of the U-ViT architecture and its long-accumulated engineering and data experience, the team broke through a number of key technologies for long-video representation and processing in just two months and developed the Vidu large video model, significantly improving the consistency and dynamics of the generated video.

The arrival of Vidu is not only another successful validation of the U-ViT fusion architecture on large-scale visual tasks, but also reflects Shengshu Technology's continued innovation and leadership in natively multimodal large models. As a general-purpose visual model, Vidu can support the generation of more diverse and longer video content. Looking ahead, its flexible architecture will be compatible with a wider range of modalities, further expanding the boundaries of general multimodal capability.
