Best Practices for Using Large Language Models at Scale
Abstract
The proliferation of Large Language Models (LLMs) has transformed numerous domains in natural language processing (NLP) and a wide range of artificial intelligence (AI)-driven applications. However, scaling these models efficiently introduces challenges related to latency, cost, and system complexity. This paper presents a comprehensive set of best practices structured around key areas: direct access to vector databases, direct invocation of OpenAI LLM APIs, optimal scaling of computational resources, reranking of AI search results, dynamic adjustment of context chunk counts, and dynamic model selection to balance cost and quality. It also examines how understanding usage modes supports cost optimization, how vector caching reduces embedding expenses, and how networking overhead affects latency in large-scale generative AI API calls. Together, these guidelines enable scalable, high-performance, and cost-effective LLM deployments in enterprise environments.