VI-OCR: "Visually Impaired" Optical Character Recognition Pipeline for Text Accessibility Assessment

Abstract

Low vision adversely impacts daily activities, particularly reading. However, quantifying text accessibility for different levels of low vision is challenging, leading to product designs that often overlook the vision status of low vision readers. In this paper, we bridge the gap between the computer vision and low vision fields by introducing a text accessibility assessment pipeline called VI-OCR (Visually Impaired Optical Character Recognition), based on state-of-the-art OCR models. VI-OCR mimics human text recognition ability under specified levels of visual acuity and contrast sensitivity loss to estimate whether text of a given size would be recognizable to a low vision reader. We benchmarked specialized OCR models and vision-language models on replicating human text recognition performance under visual acuity and contrast sensitivity deficits across three reading tasks: letter acuity using ETDRS charts, word acuity using MNREAD charts, and scene text recognition using complex real-life images. Comparing model performance to that of normal vision participants viewing degraded text revealed major issues in some models, including limited generalizability across reading tasks, difficulty handling severe contrast reduction, and outperforming rather than mimicking human observers. However, the robust, human-like performance of the best models, such as Qwen and Gemini, supports the feasibility of VI-OCR for assessing text accessibility.
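
To make the pipeline concrete, the sketch below illustrates one plausible way to simulate a specified level of visual acuity and contrast sensitivity loss on a text image before passing it to an OCR or vision-language backend. The function names, the blur-based acuity model, and the linear contrast compression are illustrative assumptions for this example, not the authors' exact implementation.

```python
from PIL import Image, ImageFilter, ImageEnhance


def degrade_for_low_vision(img: Image.Image,
                           logmar_acuity: float,
                           contrast_loss: float,
                           pixels_per_degree: float = 60.0) -> Image.Image:
    """Return a copy of `img` degraded to approximate a given acuity and
    contrast sensitivity loss.

    Assumptions (illustrative only):
    - Acuity loss is approximated by Gaussian blur whose radius scales with
      the minimum angle of resolution (MAR = 10**logMAR arcminutes).
    - Contrast sensitivity loss is approximated by compressing pixel values
      toward mid-gray by a factor of (1 - contrast_loss).
    """
    # Convert MAR (arcminutes) to degrees, then to a blur radius in pixels.
    mar_arcmin = 10.0 ** logmar_acuity
    sigma_px = (mar_arcmin / 60.0) * pixels_per_degree
    blurred = img.filter(ImageFilter.GaussianBlur(radius=sigma_px))

    # Factor < 1 reduces contrast; 0 collapses the image to uniform gray.
    factor = max(0.0, 1.0 - contrast_loss)
    return ImageEnhance.Contrast(blurred).enhance(factor)


def text_is_accessible(img: Image.Image, expected_text: str,
                       run_ocr, **vision_params) -> bool:
    """Judge accessibility by whether an OCR/VLM backend (`run_ocr`, a
    user-supplied callable mapping an image to a string) still recovers
    the expected text after simulated degradation."""
    degraded = degrade_for_low_vision(img, **vision_params)
    return run_ocr(degraded).strip().lower() == expected_text.strip().lower()
```

In this sketch, a word on an MNREAD-style chart would be judged accessible at a given print size if the backend model still reads it correctly after degradation, mirroring how the paper compares model outputs against human performance on degraded text.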
