Information theory for data-driven model reduction in physics and biology

Abstract

Model reduction is the construction of simple yet predictive descriptions of the dynamics of many-body systems in terms of a few relevant variables. A prerequisite to model reduction is the identification of these relevant variables, a task for which no general method exists. Here, we develop a systematic approach based on the information bottleneck to identify the relevant variables, defined as those most predictive of the future. We elucidate analytically the relation between these relevant variables and the eigenfunctions of the transfer operator describing the dynamics. We further show that, in the limit of high compression, the relevant variables are directly determined by the slowest-decaying eigenfunctions. Our information-based approach indicates when to optimally stop increasing the complexity of the reduced model. It also provides a firm foundation for constructing interpretable deep learning tools that perform model reduction. We illustrate how these tools work in practice by considering uncurated videos of atmospheric flows, from which our algorithms automatically extract the dominant slow collective variables, as well as experimental videos of cyanobacteria colonies, in which we discover an emergent synchronization order parameter.
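To make the link between prediction-based compression and transfer-operator eigenfunctions concrete, here is a minimal sketch in Python (NumPy). It assumes a small discrete Markov chain stands in for the many-body dynamics, with X the present state and Y the state one step later. The toy two-block chain, the helper names (`metastable_chain`, `stationary`, `slow_eigenfunctions`, `information_bottleneck`), the one-step lag, the choice of two clusters and beta = 20 are all illustrative assumptions rather than the authors' setup. The encoder update is the standard self-consistent information-bottleneck iteration, and comparing its output with the slowest nontrivial eigenfunction illustrates, but does not prove, the relation described above.

```python
import numpy as np


def metastable_chain(n_per_block=3, eps=0.05):
    """Hypothetical toy chain, P[i, j] = p(x_{t+1} = j | x_t = i): two blocks of
    states, with total probability eps of jumping to the other block per step."""
    block = np.repeat([0, 1], n_per_block)
    same = block[:, None] == block[None, :]
    P = np.where(same, (1 - eps) / n_per_block, eps / n_per_block)
    return P, block


def stationary(P):
    """Stationary distribution: left eigenvector of P with eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.abs(vecs[:, np.argmax(vals.real)].real)
    return pi / pi.sum()


def slow_eigenfunctions(P, k=2):
    """Right eigenvectors of P (the transfer operator acting on observables),
    ordered from slowest- to fastest-decaying by |eigenvalue|; keep the first k."""
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-np.abs(vals))[:k]
    return vals[order].real, vecs[:, order].real


def kl_rows(p, q):
    """Matrix of KL(p[i] || q[t]) for every row i of p and row t of q."""
    p, q = np.clip(p, 1e-300, None), np.clip(q, 1e-300, None)
    return np.sum(p[:, None, :] * (np.log(p[:, None, :]) - np.log(q[None, :, :])), axis=-1)


def information_bottleneck(px, py_x, n_clusters=2, beta=20.0, n_iter=200, seed=0):
    """Iterative IB: compress X (present state) into T while keeping information
    about Y (future state). Returns the soft encoder p(t|x)."""
    rng = np.random.default_rng(seed)
    pt_x = rng.dirichlet(np.ones(n_clusters), size=len(px))  # random soft start
    for _ in range(n_iter):
        pt = np.clip(px @ pt_x, 1e-300, None)                 # p(t) = sum_x p(x) p(t|x)
        py_t = (pt_x * px[:, None]).T @ py_x / pt[:, None]    # p(y|t)
        logits = np.log(pt)[None, :] - beta * kl_rows(py_x, py_t)
        logits -= logits.max(axis=1, keepdims=True)            # numerical stability
        pt_x = np.exp(logits)
        pt_x /= pt_x.sum(axis=1, keepdims=True)               # p(t|x) ~ p(t) exp(-beta * KL)
    return pt_x


if __name__ == "__main__":
    P, block = metastable_chain()
    vals, vecs = slow_eigenfunctions(P)
    encoder = information_bottleneck(stationary(P), P)
    print("eigenvalues:", vals)  # analytically 1 (stationary) and 1 - 2*eps
    print("sign of slowest nontrivial eigenfunction:", np.sign(vecs[:, 1]))
    print("IB cluster per state:", encoder.argmax(axis=1))
    print("true block per state:", block)
```

In this toy chain every state in a block has the same one-step future distribution, so within-block detail carries no predictive information, and both the slowest nontrivial eigenfunction and the two-cluster compressed variable reduce to block membership. With real data, p(y|x) would have to be estimated from time-lagged samples, which this sketch does not attempt.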

The first step toward understanding natural phenomena is to intuit which variables best describe them. An ambitious goal of artificial intelligence is to automate this process. Here, we develop a framework to identify these relevant variables directly from complex datasets. Much as MP3 compression retains the information that matters most to the human ear, our approach keeps the information that matters most for predicting the future. We formalize this insight mathematically and systematically answer the question of when to stop increasing the complexity of minimal models. We illustrate how interpretable deep learning tools built on these ideas reveal emergent collective variables in settings ranging from satellite recordings of atmospheric fluid flows to experimental videos of cyanobacteria colonies.