A comprehensive evaluation of spatial co-execution on GPUs using MPS and MIG technologies

Jorge Villarrubia
Luis Costero
Francisco D. Igual
Katzalin Olcoz

Abstract

To mitigate the increasingly common underutilization of computational resources in modern GPUs, spatial sharing methods enable multiple applications to use them simultaneously. This work presents a comprehensive evaluation of NVIDIA's primary technologies for achieving that goal: Multi-Process Service (MPS) and Multi-Instance GPU (MIG). Our findings reveal a crucial trade-off between MPS's flexibility and MIG's isolation, and yield several key insights for improving the co-execution strategy according to job profiles. In the most favorable scenarios, MPS improves performance by up to 30% and reduces energy consumption by about 20%, using its provisioning option to avoid resource monopolization. However, under memory contention, it suffers severe degradation, worsening performance by around 30%. Conversely, MIG's full hardware isolation resolves memory contention, leading to more consistent improvements, but these gains are tempered by higher overhead, and its rigid partitioning scheme can degrade performance in certain cases.

Version published to 10.21203/rs.3.rs-7849499/v1 on Research Square
Nov 11, 2025

Related articles

Implementation and Performance Optimization of a DPDK Packet Gateway on Manycore CPUs

This article has 1 author:
1. Daisuke Sugisawa
This article has no evaluations. Latest version: Jan 19, 2026

Implementation and Evaluation of MemGuard in the Bao Hypervisor

This article has 2 authors:
1. Everaldo Gomes
2. Giovani Gracioli
This article has no evaluations. Latest version: Jan 19, 2026

Parallel Architectures for Large-Scale Document Processing: Integrating OCR and RAG Pipelines

This article has 4 authors:
1. Alejandro Jaime
2. Veronica Gil-Costa
3. Marcelo Errecalde
4. Leticia Cagnina
This article has no evaluations. Latest version: Jan 19, 2026
