A Transformer-Driven Clustering Framework for Image-Based Document Segregation of OCR-Extracted Data

Sahaya Beni Prathiba
Dhanalakshmi Ranganayakulu
Vijay Arunachalam
Uodit Vishvaa
Veera Karthick
Suriya Priya R Asaithambi

Abstract

The rapid growth of image-based documents in industries such as healthcare, law, and government underscores the need for efficient techniques to organize unstructured datasets and extract meaningful insights from them. Traditional methods, including manual sorting and rule-based clustering, do not handle large-scale, noisy, and heterogeneous datasets effectively, which leaves a significant research gap. To address this, we propose the Enhancing Document Segregation (EDS) model, a framework that clusters image-based datasets by combining Optical Character Recognition (OCR), semantic analysis, and clustering algorithms. The EDS pipeline extracts text from images via OCR, preprocesses the text to remove noise, and generates embeddings with transformer-based models to capture semantic relationships. These embeddings are then clustered with K-means, DBSCAN, Gaussian Mixture Models, and agglomerative clustering so that the behavior of each method can be compared on data of varying characteristics. Empirical analysis indicates that the EDS model improves clustering accuracy and efficiency, particularly on noisy and complex datasets. By integrating theoretical foundations with practical clustering methodologies, the EDS model offers a scalable approach to real-world challenges, enhancing document organization and retrieval in critical domains.
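The pipeline stages named in the abstract (OCR text, noise removal, embedding, clustering) can be illustrated with a minimal sketch. The abstract does not specify the OCR engine, the transformer encoder, or any hyperparameters, so everything below is an assumption made for illustration: the OCR output is taken as already available strings, a normalised term-count vector stands in for the transformer embedding, and only K-means (one of the four algorithms mentioned) is shown. The function names `clean_ocr_text`, `embed`, and `kmeans` are hypothetical and are not taken from the EDS model.

```python
import math
import re
from collections import Counter


def clean_ocr_text(raw):
    """Lowercase the text, replace non-letter noise with spaces, return tokens."""
    text = re.sub(r"[^a-z\s]", " ", raw.lower())
    return text.split()


def embed(token_lists):
    """Stand-in embedding (assumption): L2-normalised term counts.

    A transformer encoder would replace this function in a real pipeline.
    """
    vocab = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vec = [float(counts[t]) for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Lloyd's K-means with deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the point farthest from every centroid chosen so far.
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for each vector.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


# Hypothetical OCR outputs with typical noise (symbols, digits, mixed case).
ocr_outputs = [
    "INVOICE #1042 -- total amount due: $350.00 payment",
    "Invoice 1043: amount due $99.10, payment terms 30 days",
    "Patient diagnosis: hypertension; prescribed medication dosage",
    "PATIENT record ~ diagnosis and medication history",
]

labels = kmeans(embed([clean_ocr_text(t) for t in ocr_outputs]), k=2)
print(labels)  # the two invoices share one label, the two patient notes the other
```

The same embedding vectors could be passed to DBSCAN, a Gaussian Mixture Model, or agglomerative clustering in place of `kmeans`, which is how a side-by-side comparison of the four methods would be set up under these assumptions.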

Version published to 10.21203/rs.3.rs-7372555/v1 on Research Square
Sep 10, 2025

Enhancing Medical Anomaly Detection via Text-Adapted Few-Shot Learning with Visual-Language Models

This article has 5 authors:
1. Keming Mao
2. Shengbin Hou
3. Haoming Fang
4. Jianzhe Zhao
5. Xinlu Xiao
This article has no evaluations. Latest version Jan 12, 2026
A Multimodal Adaptive Graph-based Intelligent Classification Model for Fake News

This article has 1 author:
1. Junhao Xu
This article has no evaluations. Latest version Jan 20, 2026
Towards Scalable Monitoring: A Robust Few-Shot Multimodal Framework for Migration Detection on TikTok

This article has 3 authors:
1. Dimitrios Taranis
2. Gerasimos Razis
3. Ioannis Anagnostopoulos
This article has no evaluations. Latest version Jan 19, 2026
