Harnessing Large Language Models for Structured Extraction of CYP–Substance Interactions from Biomedical Texts

Mariam Alkarmouty
Junya Ooka
Fumiyoshi Yamashita

Abstract

Building on our previous work in biomedical text mining, we revisit the extraction of interactions between cytochrome P450 (CYP) enzymes and substances using recent advances in large language models (LLMs). We present a scalable, high-accuracy framework built on the ChatGPT o3-mini model that combines optimized prompting with batch processing and does not rely on dictionaries or domain-specific ontologies. The system achieves a recall of 0.950 and a precision of 0.978 across all CYP targets, and a recall of 0.923 and a precision of 0.993 for CYP3A4 specifically, a substantial improvement over our earlier rule-based method. The resulting large-scale analysis is consistent with existing knowledge and, unlike earlier fragmented efforts to catalog these interactions, enables a more systematic and comprehensive integration of CYP isoform–substance interaction data. The scale, precision, and degree of automation demonstrated here represent a significant step forward and underscore the potential of LLM-driven pipelines to accelerate biomedical text mining and to support research in drug metabolism and related fields.
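The abstract reports recall and precision but does not say how an extracted interaction is matched against the reference set. The following is a minimal sketch, assuming each extracted interaction is reduced to an (isoform, substance) pair and scored by exact set matching after simple string normalization. The helper names (`normalize`, `for_isoform`, `precision_recall`, `f1`) and the toy data are hypothetical; this is not the authors' evaluation code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: one extracted interaction is an
# (isoform, substance) pair such as ("CYP3A4", "ketoconazole").
Pair = Tuple[str, str]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Canonicalize spelling so trivially different strings still match.

    Assumed convention: isoform names upper-cased, substance names
    lower-cased, surrounding whitespace removed.
    """
    return {(iso.strip().upper(), sub.strip().lower()) for iso, sub in pairs}


def for_isoform(pairs: Set[Pair], isoform: str) -> Set[Pair]:
    """Keep only the pairs for one isoform (e.g. for a CYP3A4-only score)."""
    return {p for p in pairs if p[0] == isoform}


def precision_recall(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Set-based precision and recall of predicted pairs against gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    gold = normalize([("CYP3A4", "Ketoconazole"), ("CYP2D6", "quinidine"), ("CYP1A2", "fluvoxamine")])
    predicted = normalize([("cyp3a4", "ketoconazole "), ("CYP2D6", "Quinidine"), ("CYP2C9", "warfarin")])

    p, r = precision_recall(predicted, gold)
    print(f"all isoforms: precision={p:.3f} recall={r:.3f}")
    p4, r4 = precision_recall(for_isoform(predicted, "CYP3A4"), for_isoform(gold, "CYP3A4"))
    print(f"CYP3A4 only: precision={p4:.3f} recall={r4:.3f}")
```

Restricting both sets to a single isoform before scoring mirrors the separate CYP3A4 figures reported above. For reference, applying `f1` to the reported overall precision (0.978) and recall (0.950) gives roughly 0.964; this F1 value is derived here and is not stated in the abstract.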

Version published to bioRxiv (doi: 10.1101/2025.06.24.661414) on Jun 27, 2025
