From Centralized to Composable: Advances in Distributed and Multimodal Language Modeling
Abstract
The advent of large language models (LLMs) has ushered in a new era of general-purpose artificial intelligence systems capable of language understanding, generation, and reasoning. Recent progress has further extended these capabilities to the multimodal domain, where models integrate textual, visual, auditory, and other sensory information. As the scale and complexity of both unimodal and multimodal LLMs grow, centralized training and inference become increasingly infeasible due to computational, memory, energy, and privacy constraints. Distributed architectures, spanning data parallelism, model parallelism, pipeline sharding, federated learning, and expert specialization, have emerged as necessary frameworks for scalable deployment and collaboration across modalities and domains. This survey provides a comprehensive review of the advances, methodologies, and challenges in distributed LLMs and multimodal large language models (MLLMs). We begin with a mathematical characterization of distributed LLM architectures, followed by a taxonomy of communication-efficient fusion mechanisms, parallel training strategies, and alignment techniques across modalities. We discuss the theoretical underpinnings of multimodal representation learning, including universal approximation, modality-aware attention, and conditional computation. We then examine training paradigms across decentralized and federated infrastructures, emphasizing optimization under partial supervision, low-bandwidth constraints, and heterogeneity in hardware and data distribution. The survey further explores real-world applications, deployment architectures, and constraints on inference latency, energy efficiency, and privacy. We highlight deployment patterns ranging from cloud-centric to fully edge-based systems and discuss model compression techniques including quantization, pruning, and mixture-of-experts routing.
The final sections identify open research challenges in scalability, cross-modal generalization, robustness to missing or adversarial modalities, and the interpretability and safety of distributed MLLMs. We conclude with future directions that emphasize sustainability, democratization, and theoretical convergence in multimodal distributed intelligence. This survey aims to serve as a foundational resource for researchers and practitioners building the next generation of distributed, multimodal, and human-aligned AI systems.