Quantifying the relative impact of embedding models and system architecture on first-stage dense retrieval performance
Abstract
Retrieval-augmented generation (RAG) systems increasingly depend on dense retrieval as a critical subsystem that constrains end-to-end performance, reliability, and cost. In practice, retrieval quality is often attributed primarily to embedding model choice, while the impact of system-level design decisions remains less well quantified under operational constraints. This paper presents a controlled empirical study of first-stage dense retrieval, comparing the effect of embedding model selection with architectural choices, specifically index structure and retrieval-unit granularity. Holding the corpus, evaluation protocol, and similarity function constant, we evaluate a lightweight baseline encoder and a modern retrieval-optimized encoder across exact and approximate vector search configurations and multiple chunk sizes. Performance is measured using standard retrieval metrics alongside tail-latency statistics representative of production workloads. The results show that embedding model upgrades produce limited and inconsistent improvements in first-stage retrieval quality while substantially increasing computational cost, whereas system-level design choices induce large and predictable shifts in the quality-latency trade-off. These findings indicate that architectural decisions often dominate embedding choice in determining first-stage retrieval behavior, and they provide practical guidance for prioritizing engineering effort in retrieval-backed systems.
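The exact-versus-approximate index comparison at the heart of the study can be illustrated with a minimal sketch. This is not the paper's code: it uses synthetic random vectors, a brute-force scan as the exact index, and a simple random-hyperplane LSH bucketing as a stand-in for a production ANN index (e.g., HNSW), then measures recall@k of the approximate results against the exact top-k, the kind of quality metric the abstract pairs with latency.

```python
# Hypothetical sketch of exact vs. approximate first-stage retrieval.
# Synthetic data and a toy LSH index are assumptions; the paper's actual
# encoders, corpus, and index structures are not reproduced here.
import math
import random

random.seed(0)
DIM, N_DOCS, N_QUERIES, K = 32, 2000, 20, 10

def rand_unit_vec(dim):
    """Random unit vector, so dot product equals cosine similarity."""
    v = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

docs = [rand_unit_vec(DIM) for _ in range(N_DOCS)]
queries = [rand_unit_vec(DIM) for _ in range(N_QUERIES)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def exact_topk(q, k=K):
    """Exact index: full scan over all documents (highest quality, slowest)."""
    ranked = sorted(range(N_DOCS), key=lambda i: -dot(q, docs[i]))
    return set(ranked[:k])

# Approximate index: hash each vector by the signs of its projections onto
# random hyperplanes; only the query's bucket is scanned at search time.
N_PLANES = 8
planes = [rand_unit_vec(DIM) for _ in range(N_PLANES)]

def signature(v):
    return tuple(dot(v, p) > 0 for p in planes)

buckets = {}
for i, d in enumerate(docs):
    buckets.setdefault(signature(d), []).append(i)

def approx_topk(q, k=K):
    """Approximate index: rank only candidates sharing the query's bucket."""
    cand = buckets.get(signature(q), [])
    ranked = sorted(cand, key=lambda i: -dot(q, docs[i]))
    return set(ranked[:k])

# Recall@K of the approximate index against exact ground truth.
recalls = []
for q in queries:
    truth = exact_topk(q)
    recalls.append(len(truth & approx_topk(q)) / K)
mean_recall = sum(recalls) / len(recalls)
print(f"mean recall@{K}, LSH vs. exact scan: {mean_recall:.2f}")
```

The sketch makes the architectural lever concrete: tightening or loosening the approximate index (here, the number of hyperplanes) shifts recall and candidate-set size in a predictable way, independent of which encoder produced the vectors.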