Bridging Developer–QA Gaps Using Large Language Models and Automation: A Pilot Evaluation of AutoVisQA

Abstract

This paper presents AutoVisQA, a fully autonomous QA framework that addresses the persistent Developer–QA communication gap in modern CI/CD environments. Visual correctness remains largely manual and subjective: two professionals examining the same UI change routinely perceive it differently, triggering emotionally draining disputes that slow releases and erode team trust. AutoVisQA resolves this by replacing human perception with deterministic automated evidence. The framework crawls web applications autonomously, captures pixel-perfect screenshots under controlled conditions, computes hybrid pixel-plus-perceptual diffs augmented by MSE and PSNR metrics, correlates visual changes with performance data, and generates plain-language explanations via a GPT-4-class LLM. A pilot evaluation across two publicly accessible applications—a stable healthcare demo and a highly dynamic live news site—over four crawl sessions (10–11 November 2025) demonstrated correct zero-regression detection on the stable application and accurate identification of five distinct visual changes on the dynamic site (diff%: 0.00–6.09%, MSE: 0.03–1128.60, PSNR: 17.61–63.75 dB). Average page load time was 31.45 s across 48 captured screenshots, with load stability confirmed across all four sessions. The framework correctly distinguished genuine regressions from expected dynamic-content updates without manual configuration—directly addressing the false-positive problem that undermines trust in automated QA tools. The full implementation is publicly available at https://github.com/Tanveerrifu/AutoVisQA.
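The MSE, PSNR, and diff% figures the abstract reports are standard image-comparison metrics. The sketch below shows how such values can be computed with NumPy; the function names and the change threshold are illustrative assumptions, not code from the AutoVisQA repository.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two equally sized screenshots."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10((max_val ** 2) / err))

def diff_percent(a: np.ndarray, b: np.ndarray, threshold: int = 0) -> float:
    """Percentage of pixels whose absolute difference exceeds a threshold."""
    changed = np.abs(a.astype(np.int16) - b.astype(np.int16)) > threshold
    return float(100.0 * np.mean(changed))
```

A zero diff% with infinite PSNR corresponds to the abstract's zero-regression case on the stable application, while higher MSE and lower PSNR values flag the dynamic site's visual changes for LLM explanation.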
