CAHT: A Constraint-Aware Heterogeneous Transformer for Real-Time Multi-Robot Task Allocation in Warehouse Environments

Abstract

The NP-hard coordination of heterogeneous robots for time-windowed warehouse tasks remains challenging: metaheuristics are precise but slow, whereas neural methods cannot handle heterogeneous constraints, leading to infeasible allocations. This paper presents the Constraint-Aware Heterogeneous Transformer (CAHT), a lightweight encoder–decoder architecture that performs end-to-end task assignment and sequencing in a single forward pass. The central innovation is a dynamic feasibility masking mechanism that enforces capacity and energy constraints directly within the softmax computation, eliminating infeasible allocations at the architectural level. This is complemented by a spatial-bias Transformer encoder and a two-stage supervised–reinforcement learning training paradigm using ALNS-generated labels. Experiments across four problem scales (5–20 robots, 50–200 tasks) demonstrate that CAHT achieves objective values within 7–13% of the ALNS reference while being 29–91× faster (23–104 ms vs. 2–3 s). Constraint violation rates remain below 6% with time-window satisfaction above 94%. Ablation analysis identifies dynamic masking as the dominant contribution (+213% degradation upon removal), and cross-scale generalization reveals that the optimality gap decreases from 13.0% to 10.7% as problem scale grows. With only 0.82M parameters, CAHT occupies a previously vacant region on the speed–quality Pareto frontier, offering a practical path toward real-time autonomous warehouse coordination.
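The dynamic feasibility masking described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states only that capacity and energy constraints are enforced inside the softmax. The function names, the per-task load and energy inputs, and the simple "fits within remaining budget" rule below are all assumptions made for illustration.

```python
import math


def feasibility_mask(task_loads, task_energy, remaining_capacity, remaining_energy):
    """Hypothetical rule: a task is feasible for a robot if it fits within
    both the robot's remaining payload capacity and remaining energy."""
    return [
        load <= remaining_capacity and energy <= remaining_energy
        for load, energy in zip(task_loads, task_energy)
    ]


def masked_softmax(scores, feasible):
    """Softmax over scores in which infeasible entries receive probability 0.

    Equivalent to setting infeasible logits to -inf before the softmax,
    so an infeasible task can never be sampled or selected.
    """
    if not any(feasible):
        raise ValueError("no feasible task for this robot")
    # Subtract the largest feasible score for numerical stability.
    top = max(s for s, ok in zip(scores, feasible) if ok)
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, feasible)]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative numbers only (not taken from the paper).
scores = [2.0, 1.0, 0.5]  # decoder compatibility scores for three tasks
mask = feasibility_mask(
    task_loads=[4.0, 9.0, 3.0],
    task_energy=[1.0, 1.0, 1.5],
    remaining_capacity=5.0,
    remaining_energy=2.0,
)
probs = masked_softmax(scores, mask)
print(mask, probs)
```

In this example the second task exceeds the assumed remaining capacity, so its probability is exactly zero and the remaining mass is renormalized over the two feasible tasks. Because the mask is recomputed from the robot's remaining budgets at every decoding step, it is "dynamic" in the sense the abstract uses.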
