3D Instance Segmentation using Deep Learning

Muhammad Yasir Siddiqui


Abstract

Endowing machines with the ability to perceive the real world in three dimensions, as humans do, is a fundamental and longstanding topic in Artificial Intelligence. One important goal is to understand the geometric structure and semantics of a 3D environment from various types of visual input, such as images or point clouds acquired by 2D/3D sensors. Traditional approaches usually rely on handcrafted features to estimate the shape and semantics of objects or scenes. However, they struggle with visual occlusions and generalize poorly to novel objects and scenarios. In contrast, deep neural networks trained on large-scale real-world 3D data aim to learn general and robust representations for understanding scenes and the objects within them. To achieve these aims, from object-level 3D shape estimation from single or multiple views to scene-level semantic understanding, this research made three key contributions. In Chapter 3, we start by estimating the full 2D shape of a small detailed defect as an object from a single image. To recover a dense 2D detection with geometric details, a powerful architecture with a bounding box is proposed to learn feasible geometric priors from small-scale 2D defect repositories. In Chapter 4, we extend our study to 3D instance segmentation, which is used to detect multiple objects in an indoor environment with an RGB-D sensor. First, Mask R-CNN is applied to the RGB images captured by the sensor to perform 2D instance segmentation. The segmented object regions are then combined with the sensor's depth image to produce a segmented depth region for each individual object. Finally, the depth points are transformed into 3D coordinates, yielding the 3D instance segmentation. Experimental results show that the proposed algorithm performs well in tests with zoomed-in and zoomed-out views of the scene. As a result, the proposed 3D instance segmentation algorithm can be applied to an intelligent robot to enhance its cognitive capability in the real world.
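
The step in which masked depth points are converted to 3D coordinates can be illustrated with a minimal sketch. It assumes a standard pinhole camera model, which the abstract does not specify; the function name `backproject_mask`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `depth_scale` parameter are all hypothetical, and the 2D instance mask is taken as given (for example, from Mask R-CNN) rather than computed here.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_mask(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    depth_scale: float = 1.0,
) -> List[Point3D]:
    """Lift the depth pixels of one instance mask to 3D camera coordinates.

    Assumption: pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy); `depth_scale` converts raw depth values to metres.
    """
    points: List[Point3D] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (d, inside) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the instance and invalid depth readings.
            if not inside or d <= 0:
                continue
            z = d * depth_scale
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 example: one instance mask over a tiny depth image.
depth_image = [[1.0, 0.0],
               [2.0, 2.0]]
instance_mask = [[1, 1],
                 [0, 1]]
cloud = backproject_mask(depth_image, instance_mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Running the function once per instance mask gives one point set per detected object, which is one plausible reading of the "3D instance segmentation" output described above.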

Version published to 10.20944/preprints202411.2177.v1
Nov 28, 2024

Related articles

Visual Localisation Using Deep Learning and Graph Neural Networks: Approaches and Evaluation

This article has 1 author:
1. Dinesh Kumar Koilada
This article has no evaluations
Latest version Aug 20, 2025

Spectral Pyramid Pooling and Fused Keypoint Generation in ResNet-50 for Robust 3D Object Detection

This article has 3 authors:
1. R. Ramana
2. V. Vasudevan
3. B. S. Murugan
This article has no evaluations
Latest version Aug 21, 2025

Deep-DSO: Improving Mapping of Direct Sparse Odometry Using CNN-based Single Image Depth Estimation

This article has 7 authors:
1. Erick P. Herrera-Granda
2. Juan C. Torres-Cantero
3. Israel D. Herrera-Granda
4. José F. Lucio-Naranjo
5. Andrés Rosales
6. Javier Revelo-Fuelagán
7. Diego H. Peluffo-Ordóñez
This article has no evaluations
Latest version Oct 8, 2025
