OmniExtract: An automatic data extraction tool based on Large Language Model and Prompt Engineering

Yibo Wang
Bixia Tang
Sicheng Wu
Yuyan Meng
Demian Kong
Wenming Zhao

Abstract

Extracting structured information from documents and scientific papers is crucial for data sharing and retrieval. Recently, Large Language Models (LLMs) have shown impressive text-understanding abilities, and several LLM-based tools have been developed. However, it is still difficult to find a universal, user-friendly tool that covers the wide range of practical extraction tasks. To address this challenge, we propose OmniExtract, an automatic data extraction tool whose user-friendly configuration files let it adapt to various data extraction tasks. OmniExtract uses prompt-optimization engineering to refine its prompts and obtain high performance, and it supports comprehensive data extraction from both text and tables. Evaluation results show that OmniExtract achieves an accuracy above 80% on three datasets. Furthermore, we provide two additional data extraction applications built with OmniExtract, which achieve an accuracy of 92.21% and an average F1 score of 0.83, respectively. These data reliability results show that OmniExtract is a valuable tool for database updating.

Version published on bioRxiv (doi: 10.1101/2025.09.11.675332), Sep 13, 2025

Related articles

topSEARCH: a Comprehensive Tool for the Retrieval and Analysis of Multi-Type Online Resources

This article has 6 authors:
1. Ander Cejudo
2. Yone Tellechea
3. Teresa García-Navarro
4. Amaia Calvo
5. Garazi Artola
6. Nekane Larburu
This article has no evaluations. Latest version Jan 20, 2026

ReviewAid: An Open-Source Tool for Efficient PICO-Based Screening and Data Extraction in Systematic Reviews

This article has 2 authors:
1. Vihaan Sahu
2. Mohith Balakrishnan
This article has no evaluations. Latest version Jan 5, 2026

APPLICATION OF PYTHON LANGUAGE IN SEARCH ENGINE OPTIMIZATION: EXPLORING ITS CONTRIBUTION TO DATA ANALYSIS

This article has 4 authors:
1. Felipe Ivo da Silva
2. Gustavo Camossi
3. Marilde Terezinha Prado Santos
4. Cecílio Merlotti Rodas
This article has no evaluations. Latest version Dec 10, 2025
