"Make it Pop, but not Like That": A Taxonomy of Iterative Prompting Strategies for Refining AI-Generated Web Interfaces

Zhenjiang Song

Abstract

The rapid proliferation of Large Language Models (LLMs) and generative tools (e.g., GPT-4, Tongyi Lingma, Trae) has fundamentally democratized the landscape of web development, shifting the paradigm from manual syntax construction to natural language intent specification. However, while the barrier to "drafting" initial code has been lowered, a significant "Refinement Crisis" has emerged. As task complexity scales from static landing pages to dynamic, stateful applications, novice users encounter a profound "Gulf of Evaluation" when attempting to repair AI-generated errors. Unlike text generation, where errors are semantic and visible, web interface generation involves a complex interplay between visual presentation (CSS) and invisible state management (JavaScript). In this paper, we present a large-scale observational study with 200 novice participants tasked with utilizing IDE-integrated AI assistants to build a fully functional CRUD (Create, Read, Update, Delete) note-taking application. Through a rigorous analysis of interaction logs and source code snapshots, we reveal that while 90% of users could generate a baseline prototype, 80% encountered severe "invisible state" breakdowns (e.g., data persistence failure), and 50% suffered from persistent layout regressions. We contribute a detailed taxonomy of four repair strategies: Perceptual Refinement, Behavioral Correction, Diagnostic Proxy, and Global Reset. Furthermore, we characterize the "Whack-a-Mole" effect, a phenomenon where repairing visual elements inadvertently corrupts functional logic due to the AI's lack of holistic state awareness. Our findings provide empirical evidence for the limitations of current chat-based coding interfaces and offer critical design implications for future "State-Aware" AI IDEs that reify invisible data flows to bridge the gap between user intent and execution.

Version published to 10.21203/rs.3.rs-8994174/v1 on Research Square
Mar 8, 2026

Bridging Developer–QA Gaps Using Large Language Models and Automation: A Pilot Evaluation of AutoVisQA

This article has 1 author:
1. Tanvir Hasan
This article has no evaluations. Latest version: Apr 17, 2026

Same Prompt, Different Answer: Exposing the Reproducibility Illusion in Large Language Model APIs

This article has 5 authors:
1. Lucas Rover
2. Hugo Siqueira
3. Anibal Azevedo
4. Eduardo Bacalhau
5. Yara Tadano
This article has no evaluations. Latest version: Mar 13, 2026

A Taylorizable Process for Textual Detector Development

This article has 5 authors:
1. Ryan Shaun Baker
2. Caitlin Mills
3. Caitlin Mills
4. Andrew Lan
5. Amanda Barany
This article has no evaluations. Latest version: Apr 8, 2026
