Bridging Developer–QA Gaps Using Large Language Models and Automation: A Pilot Evaluation of AutoVisQA

Abstract

This paper presents AutoVisQA, a fully autonomous QA framework that addresses the persistent Developer–QA communication gap in modern CI/CD environments. Visual correctness remains largely manual and subjective: two professionals examining the same UI change routinely perceive it differently, triggering emotionally draining disputes that slow releases and erode team trust. AutoVisQA resolves this by replacing human perception with deterministic automated evidence. The framework crawls web applications autonomously, captures pixel-perfect screenshots under controlled conditions, computes hybrid pixel-plus-perceptual diffs augmented by MSE and PSNR metrics, correlates visual changes with performance data, and generates plain-language explanations via a GPT-4-class LLM. A pilot evaluation across two publicly accessible applications—a stable healthcare demo and a highly dynamic live news site—over four crawl sessions (10–11 November 2025) demonstrated correct zero-regression detection on the stable application and accurate identification of five distinct visual changes on the dynamic site (diff%: 0.00–6.09%, MSE: 0.03–1128.60, PSNR: 17.61–63.75 dB). Average page load time was 31.45 s across 48 captured screenshots, with load stability confirmed across all four sessions. The framework correctly distinguished genuine regressions from expected dynamic-content updates without manual configuration—directly addressing the false-positive problem that undermines trust in automated QA tools. The full implementation is publicly available at https://github.com/Tanveerrifu/AutoVisQA.
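The MSE, PSNR, and diff% figures the abstract reports are standard image-comparison metrics. The sketch below shows how such values can be computed with NumPy; the function names and the change threshold are illustrative assumptions, not code from the AutoVisQA repository.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two equally sized screenshots."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10((max_val ** 2) / err))

def diff_percent(a: np.ndarray, b: np.ndarray, threshold: int = 0) -> float:
    """Percentage of pixels whose absolute difference exceeds a threshold."""
    changed = np.abs(a.astype(np.int16) - b.astype(np.int16)) > threshold
    return float(100.0 * np.mean(changed))
```

A zero diff% with infinite PSNR corresponds to the abstract's zero-regression case on the stable application, while higher MSE and lower PSNR values flag the dynamic site's visual changes for LLM explanation.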
