Improvement of YOLOv8 algorithm through integration of Pyramid Vision Transformer architecture

Zhiqiang Dong
Shu Yang
Yang Xiao

Abstract

To address the poor detection accuracy of the YOLOv8s model on targets in complex backgrounds, this paper proposes an improved YOLOv8s model that incorporates the Pyramid Vision Transformer (PVT). Specifically, to strengthen feature extraction, PVT replaces the basic convolutional feature extraction blocks in the Backbone stage of YOLOv8s. The resulting pyramid structure lets the model process images at multiple resolution levels, so it captures both fine detail and contextual information more effectively.
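
To make the "different resolution levels" concrete, the sketch below computes the feature-map sizes a four-stage pyramid backbone would produce, and how many key/value tokens remain after spatial-reduction attention in each stage. It is only an illustration under stated assumptions: the strides (4, 8, 16, 32) and spatial-reduction ratios (8, 4, 2, 1) follow the original PVT design and are not confirmed by this abstract, the 640x640 input is a common YOLOv8 default rather than a setting reported here, and the names `StageShape` and `pyramid_shapes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageShape:
    """Shape summary for one pyramid stage (hypothetical helper type)."""
    stride: int        # downsampling factor relative to the input image
    height: int        # feature-map height at this stage
    width: int         # feature-map width at this stage
    query_tokens: int  # one query token per feature-map position
    key_tokens: int    # key/value tokens left after spatial reduction


def pyramid_shapes(height, width,
                   strides=(4, 8, 16, 32),   # assumed: original PVT strides
                   sr_ratios=(8, 4, 2, 1)):  # assumed: original PVT SR ratios
    """Return the per-stage feature-map and token counts for an input image."""
    shapes = []
    for stride, sr in zip(strides, sr_ratios):
        h, w = height // stride, width // stride
        # Spatial reduction shrinks the key/value grid by `sr` per side.
        kh, kw = max(h // sr, 1), max(w // sr, 1)
        shapes.append(StageShape(stride, h, w, h * w, kh * kw))
    return shapes


if __name__ == "__main__":
    for s in pyramid_shapes(640, 640):
        print(f"stride {s.stride:>2}: {s.height}x{s.width} map, "
              f"{s.query_tokens} queries, {s.key_tokens} keys")
```

Under these assumptions, the high-resolution early stages keep fine detail while the spatial reduction holds the attention cost down, and the strides 8, 16 and 32 line up with the three scales that a YOLOv8 neck normally consumes. Whether the paper wires the stages this way is not stated in the abstract.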

Version published to 10.21203/rs.3.rs-4987159/v1 on Research Square
Oct 22, 2024

Related articles

A Hybrid YOLOv5s-Faster R-CNN Architecture for Object Detection in Complex Road Scenes

This article has 3 authors:
1. Lenard Nkalubo Byenkya
2. Rose Nakibuule
3. Danison Taremwa
This article has no evaluations
Latest version Jan 21, 2026

MeMVSNet: Monocular Depth Enhanced Multi-view Reconstruction

This article has 4 authors:
1. Cui Haohao
2. Di Yanqiang
3. Meng Xianguo
4. Feng Shaochong
This article has no evaluations
Latest version Jan 23, 2026

TriORU2-Net++: Attention-Guided Three-Stage U2-Net++ for Light Field Occlusion Removal

This article has 5 authors:
1. Mostafa Farouk Senussi
2. Mahmoud Abdalla
3. Mahmoud SalahEldin Kasem
4. Mohamed Mahmoud
5. Hyun-Soo Kang
This article has no evaluations
Latest version Jan 19, 2026
