Temporal Modeling with Reversible Transformers
Abstract
Memory efficiency is a critical bottleneck in deep learning models for sequence processing, particularly for long-range dependencies and continuous data streams. To address this, we introduce a new architecture, the Reversible Temporal Transformer (TempVerseFormer). TempVerseFormer combines reversible transformer blocks with a time-agnostic backpropagation strategy that decouples the memory footprint from the temporal depth, enabling efficient training over long prediction horizons. We evaluate the model on a procedurally generated dataset of rotating 2D shapes and show that TempVerseFormer's predictive accuracy is competitive with the tested baselines, while its memory consumption remains practically independent of the prediction horizon. This substantial gain in memory efficiency, demonstrated in a controlled synthetic environment without sacrificing performance on our dataset, makes TempVerseFormer a promising candidate for scalable temporal sequence modeling. It enables applications such as real-time adaptation and video analysis on edge devices, paving the way for more resource-efficient AI systems capable of operating in changing and evolving environments.
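The memory saving rests on the standard reversible-block identity: because each block's inputs can be recomputed exactly from its outputs, activations need not be stored during the forward pass. The following is a minimal NumPy sketch of this coupling, not the paper's actual architecture; the sub-functions `F` and `G` are hypothetical stand-ins for the attention and feed-forward sub-blocks.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # feature dimension (illustrative)

# Hypothetical stand-ins for the attention / feed-forward sub-blocks.
W_f = rng.standard_normal((D, D)) * 0.1
W_g = rng.standard_normal((D, D)) * 0.1

def F(x):
    return np.tanh(x @ W_f)

def G(x):
    return np.tanh(x @ W_g)

def reversible_forward(x1, x2):
    """Additive coupling: outputs determine inputs exactly."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def reversible_inverse(y1, y2):
    """Recompute the inputs from the outputs -- no stored activations."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1 = rng.standard_normal(D)
x2 = rng.standard_normal(D)
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
print(np.allclose(r1, x1), np.allclose(r2, x2))  # exact reconstruction
```

Stacking such blocks keeps training memory constant in the number of blocks (and, in the temporal setting described above, in the prediction horizon), since backpropagation can regenerate each layer's activations on the fly.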