Efficient Deployment of a 685B-Parameter Open-Source LLM on the Brazilian Santos Dumont Supercomputer

Abstract

This brief communication presents the deployment of open-source large language models (LLMs) on the Santos Dumont Brazilian supercomputer to support the Laboratório Nacional de Computação Científica (LNCC) academic community. The solution, named Carcará, leverages key optimizations such as dynamic quantization to enable an accessible, scalable, and cost-effective implementation. As a result, a single, independent instance of a state-of-the-art model was deployed on a single node equipped with 4 × NVIDIA H100 GPUs. Each quantized model fits entirely within a node's VRAM, enabling horizontal scaling without the need for inter-node synchronization. In this way, we could provide access to these models for the entire LNCC academic community using only four computational nodes, demonstrating the efficiency and scalability of the approach. Crucially, our approach ensures data sovereignty: LNCC researchers and postgraduate students can use AI for sensitive research topics with full control over their data, free from the privacy risks associated with proprietary solutions. This initiative strengthens national scientific autonomy while providing secure and efficient AI tools for academic and research advancement. The code to deploy this solution is openly available, and we encourage other institutions to adapt it to support their own communities. Discussions with the Brazilian Ministry of Science, Technology and Innovation are underway to expand this strategic solution to other research centers and universities in Brazil.
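To illustrate the deployment pattern described above (a quantized model sharded across the four GPUs of one node, with each node serving an independent replica), the following is a minimal sketch using vLLM's Python API. It is not the Carcará deployment code, which is published separately; the model path, quantization choice, and sampling settings are illustrative placeholders.

    # Sketch: serve a quantized LLM on one node, sharded across its 4 GPUs.
    # Horizontal scaling is achieved by running one such independent
    # instance per node, with no inter-node synchronization required.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="/models/quantized-llm",   # hypothetical path to a quantized checkpoint
        tensor_parallel_size=4,          # shard weights across the node's 4 H100 GPUs
        gpu_memory_utilization=0.95,     # keep weights and KV cache within node VRAM
    )

    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate(["Explain data sovereignty in one paragraph."], params)
    print(outputs[0].outputs[0].text)

Because each replica is self-contained within a node's VRAM, adding capacity is a matter of launching the same configuration on additional nodes behind a load balancer.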
