Continuous Video Process: Modeling Videos as Continuous Multi-Dimensional Processes for Video Prediction

Gaurav Shrivastava
Abhinav Shrivastava

Abstract

Diffusion models have made significant strides in image generation, mastering tasks such as unconditional image synthesis, text-image translation, and image-to-image conversion. However, their capability falls short in video prediction, mainly because they treat videos as a collection of independent images and rely on external constraints, such as temporal attention mechanisms, to enforce temporal coherence. In our paper, we introduce a novel model class that treats video as a continuous multi-dimensional process rather than a series of discrete frames. We also report a 75% reduction in the number of sampling steps required to sample a new frame, which makes our framework more efficient at inference time. Through extensive experimentation, we establish state-of-the-art performance in video prediction, validated on benchmark datasets including KTH, BAIR, Human3.6M, and UCF101.

Version published to 10.32388/dm98uz
Dec 20, 2024

A Latent Space Diffusion Transformer for High-Quality Video Frame Interpolation

This article has 2 authors:
1. Wei Chen
2. Jiing Fang
This article has no evaluations. Latest version Dec 17, 2025

Novel Nesting of Deep Learning Domain Transfer and Hybrid Video Coding for Video Compression

This article has 4 authors:
1. Shaohua Jia
2. Wan-Chi Siu
3. Pengyu Liu
4. Kebin Jia
This article has no evaluations. Latest version Jan 14, 2026

Multimodal Supervisory Graphs for Persistent World Modeling in Generative AI

This article has 2 authors:
1. Marcus Elvain
2. Howard Pellorin
This article has no evaluations. Latest version Dec 31, 2025