From the team behind Stable Diffusion XL comes Stable Video Diffusion, which essentially uses two generative AI-based models to create multi-view videos from images. These video models can be quickly adapted to various downstream tasks, such as multi-view synthesis from a single image after fine-tuning on multi-view datasets.
Stable Video Diffusion is a latent video diffusion model for high-resolution, cutting-edge text-to-video and image-to-video generation. Following recent work that turns latent diffusion models trained for 2D image synthesis into generative video models by inserting temporal layers, it can generate videos 14 to 25 frames long at frame rates between 3 and 30 frames per second at 576 × 1024 resolution. Get the code here.
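For readers who want to try image-to-video generation themselves, a minimal sketch using Hugging Face's diffusers library might look like the following; the checkpoint name, frame rate, and file paths here are illustrative assumptions, so consult the official repository for exact usage.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the image-to-video pipeline (checkpoint name assumed; the "xt"
# variant generates 25-frame clips).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Condition on a single still image, resized to the model's 1024x576 resolution.
image = load_image("input.jpg").resize((1024, 576))

# Generate a short clip; the fps conditioning can be set roughly between 3 and 30.
frames = pipe(image, decode_chunk_size=8, fps=7).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```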
“However, training methods in the literature vary widely, and the field has yet to agree on a unified strategy for curating video data. In this paper, we identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video fine-tuning,” said the team.
[Source]