When Bots Browse: Measuring Advertising Visibility in an AI-Agent Internet
Abstract
AI assistants increasingly read webpages on a user's behalf. When that happens, the assistant becomes an intermediary between the publisher and the user, and it is the assistant, not the page layout, that decides what commercial content is seen, repeated, or acted on. This paper reports the first round of an empirical study of commercial signal visibility in AI-mediated web reading. We deployed controlled product references and outbound links across three live WordPress articles, using fifteen embedding formats spanning structured data, metadata, hidden markup, and visible editorial text. We then tested twelve AI agents under standardized prompts designed to mimic realistic "read and report" behavior, and we recorded whether agents (a) surfaced the products, (b) surfaced the URLs, and (c) followed links to a tracked landing page. Across the ten-agent core matrix (170 format × agent combinations), 87.6% produced no surfacing event. Twelve of fifteen embedding formats surfaced in zero agents; the other three (Microdata, RDFa, and Microformats2) each surfaced in 5 of 10 agents (50%), exclusively in the leading consumer AI assistants (all tested Claude variants and both ChatGPT configurations). The controls establish the practical floor: a visible editorial text recommendation was missed by 5 of 10 agents; a browser-rendered sponsored paragraph was missed by 9 of 10. In one cross-page comparison, an agent surfaced hidden head-injected markup on one page while missing a visible product recommendation in the article body of a different page; document position within the raw HTML fetch, not human readability, appears to have determined extraction. The three surfacing formats co-occurred perfectly across all runs, suggesting they constitute a single extraction pathway rather than independent format types.
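The headline share can be cross-checked from the counts reported above. The following is a minimal sketch, assuming the 170 cells decompose as 17 rows (the fifteen embedding formats plus the two controls) by 10 agents; the abstract does not state this decomposition, and the row names are hypothetical labels used only for illustration.

```python
# Cross-check of the reported no-surfacing share.
# ASSUMPTION: 170 cells = (15 embedding formats + 2 controls) x 10 agents.
# Row names are hypothetical; counts are taken from the abstract.
AGENTS = 10

# Number of agents (out of 10) in which each row surfaced.
surfaced_per_row = {
    "microdata": 5,
    "rdfa": 5,
    "microformats2": 5,
    "control_visible_editorial": 5,    # missed by 5 of 10
    "control_sponsored_paragraph": 1,  # missed by 9 of 10
}

# The twelve remaining embedding formats surfaced in zero agents.
for i in range(12):
    surfaced_per_row[f"silent_format_{i + 1}"] = 0

total_cells = len(surfaced_per_row) * AGENTS
surfaced_cells = sum(surfaced_per_row.values())
no_surface_share = (total_cells - surfaced_cells) / total_cells

print(total_cells, surfaced_cells, f"{no_surface_share:.1%}")
```

Under that assumption the tally gives 170 cells, 21 surfacing events, and a no-surfacing share of 87.6%, which matches the reported figure.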
Follow-through to the tracked landing page was initiated by most agents (92% of attempts) but failed at the redirect layer in all but two; click willingness was near-universal, while redirect completion was confounded by the test URL pattern. Prompt framing produced a first-order effect: one agent surfaced zero formats under an unprimed prompt and four distinct formats under a commercially framed prompt on identical pages. Taken together, these results suggest that 80–90% of web commercial inventory is structurally invisible in agent-mediated browsing under current delivery conventions, and that the mechanisms of this invisibility (rendering-layer bypass, script-block extraction failure, and prompt dependence) are separable and independently measurable. Two additional Google-stack agents tested under the same protocol produced a within-vendor divergence: Google Gemini surfaced all injected shortlinks on all three pages and followed them to the test landing page, while Google AI Mode surfaced none on identical pages under identical prompts. This shows that commercial visibility varies not only across vendors but also across AI surfaces within a single vendor.
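One possible reading of "script-block extraction failure" can be illustrated with a toy pipeline. The sketch below is an assumption for illustration, not the study's method or any agent's actual implementation: it fetches no page, runs no JavaScript, and simply strips tags from raw HTML while discarding `<script>` and `<style>` contents. The sample markup, product names, and URL are hypothetical.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Collects text from raw HTML, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: one product in a JSON-LD script block, one in inline Microdata.
SAMPLE = """
<html><head>
<script type="application/ld+json">{"@type": "Product", "name": "JsonLdWidget"}</script>
</head><body>
<div itemscope itemtype="https://schema.org/Product">
  <span itemprop="name">MicrodataWidget</span>
  <a itemprop="url" href="https://example.com/r/abc">details</a>
</div>
</body></html>
"""

text = extract_text(SAMPLE)
print(text)
```

In this toy setting the product carried in inline, attribute-annotated markup survives extraction, while the one carried in a script block does not. The sketch keeps element text only, so the `href` value is also dropped; a pipeline that preserved link targets would behave differently.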