Hybrid Deep Learning Framework for Eye-in-Hand Visual Control Systems

Abstract

This work proposes a hybrid deep learning framework for visual feedback control of an eye-in-hand robotic system. The framework uses an early fusion approach in which real and synthetic images together form the training data. The first layer of a ResNet-18 backbone is augmented to fuse interest-point maps with the RGB channels, enabling the network to better capture scene geometry. A manipulator robot with an eye-in-hand configuration provides a reference image, while subsequent poses and images are generated synthetically, removing the need for extensive real data collection. The experimental results show that this enriched input representation significantly improves convergence accuracy and velocity smoothness compared to a baseline that processes real images alone. Specifically, including the feature-point maps allows the network to discriminate the crucial elements of the scene, resulting in more precise velocity commands and more stable end-effector trajectories. Integrating such synthetically generated map data into convolutional architectures can thus enhance the robustness and performance of a visual servoing system, particularly when real-world data gathering is challenging.
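The article itself does not include code, but the early-fusion step described above maps onto a small architectural change. The following PyTorch sketch shows one plausible way to widen the first convolution of a ResNet-18 so that it accepts RGB plus interest-point map channels and regresses a camera velocity command. The function name, the single-channel map, the weight initialization, and the 6-DoF output are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def make_fused_resnet18(num_map_channels: int = 1, num_outputs: int = 6) -> nn.Module:
    """ResNet-18 whose first conv fuses RGB with interest-point map
    channels (early fusion) and whose head regresses a velocity command."""
    model = resnet18(weights="IMAGENET1K_V1")

    old_conv = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
    new_conv = nn.Conv2d(3 + num_map_channels, 64,
                         kernel_size=7, stride=2, padding=3, bias=False)
    with torch.no_grad():
        new_conv.weight[:, :3] = old_conv.weight                        # keep pretrained RGB filters
        new_conv.weight[:, 3:] = old_conv.weight.mean(1, keepdim=True)  # seed the map channels
    model.conv1 = new_conv

    # Replace the ImageNet classification layer with a regression head for
    # a 6-DoF camera velocity twist (vx, vy, vz, wx, wy, wz).
    model.fc = nn.Linear(model.fc.in_features, num_outputs)
    return model

# Hypothetical usage: concatenate an RGB frame with its interest-point map.
model = make_fused_resnet18()
rgb = torch.rand(1, 3, 224, 224)
point_map = torch.rand(1, 1, 224, 224)                 # e.g. a keypoint heat map
velocity = model(torch.cat([rgb, point_map], dim=1))   # -> shape (1, 6)
```

Copying the pretrained RGB filters and seeding the extra channels with their mean is a common heuristic for widening a pretrained input layer; the abstract does not specify the paper's actual initialization scheme.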
