Towards HydroLLM: Building a Domain-Specific Language Model for Hydrology

Abstract

As large language models (LLMs) continue to expand, their effective adaptation to specialized fields remains a critical challenge. This work presents an initial step toward the development of HydroLLM, a domain-specific LLM for hydrology. We construct a dataset of approximately 8,800 hydrology-focused question–answer pairs, each with a supporting context passage drawn from textbooks and scientific articles. The dataset covers four instructional formats: multiple choice, true/false, fill-in-the-blank, and open-ended. Using this corpus, we fine-tune several LLMs of varying architecture and scale, from compact (1.5B parameters) to large (32B parameters), with parameter-efficient LoRA (Low-Rank Adaptation). Our methodology compares the fine-tuned models and evaluates their performance using accuracy and cosine-similarity metrics across task types. Results show that larger model size is not always advantageous: among the fine-tuned models, the 8B DeepSeek Llama variant achieved the strongest overall performance, while the 32B model overfit and the 1.5B model underperformed, underscoring the need to match model capacity to dataset size. These findings demonstrate that effective domain adaptation requires careful consideration of model architecture, parameter count, and task complexity, with fill-in-the-blank tasks proving particularly challenging for all models. By establishing baseline performance and identifying the limits of current fine-tuning approaches, we take a concrete step toward building HydroLLM as a robust, domain-specific language model for hydrological analysis and decision support.
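
As an illustration of the parameter-efficient approach the abstract describes, the sketch below shows how a causal LLM can be wrapped with LoRA adapters using the Hugging Face peft library. The checkpoint name, rank, and target modules are illustrative assumptions, not the authors' reported configuration.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder for the 8B DeepSeek Llama variant; the paper's exact
# checkpoint is not specified in the abstract.
base_model = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA freezes the base weights and trains small low-rank adapter
# matrices injected into selected layers.
config = LoraConfig(
    r=16,                                 # adapter rank (assumed)
    lora_alpha=32,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Training would then proceed with a standard causal-language-modeling loop (e.g., transformers' Trainer) over the instruction-formatted question–answer pairs, updating only the adapter weights.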

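For open-ended answers, the abstract reports cosine similarity as an evaluation metric. A minimal sketch of such scoring, assuming a sentence-transformers embedding model (the specific model here is a placeholder, not taken from the paper):

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def answer_similarity(generated: str, reference: str) -> float:
    """Cosine similarity between a generated answer and the reference, in [-1, 1]."""
    embeddings = embedder.encode([generated, reference], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

# Example: semantically close answers score near 1.0.
print(answer_similarity(
    "Infiltration is the process by which surface water enters the soil.",
    "Infiltration refers to water moving from the ground surface into the soil.",
))
```
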