Reading papers: Extraction of molecular interaction networks with large language models

Enio Gjerga
Philipp Wiesenbach
Christoph Dieterich

Abstract

Motivation

Signalling occurs within and across cells and orchestrates essential cellular processes in complex tissues. Cell signalling involves several different components, including protein-protein interactions (PPIs) and the binding of transcription factors (TFs) to promoters in gene regulatory networks (GRNs). Dynamically changing conditions often rewire cellular communication networks. Computational modelling approaches typically rely on databases of possible molecular interactions. Manual curation of these databases is time-consuming, and automatic relation extraction from the scientific literature would greatly support efforts to understand molecular mechanisms. To ease this process, we reason that prompt-based data mining with Large Language Models (LLMs) could be used to extract information from relevant scientific publications.

Approach

In our work, we use open-source LLMs to mine an annotated corpus of molecular interactions. We focus on the extraction of entity relations between proteins, as exemplified in protein-protein interaction networks, and transcription factor to target gene relations, as exemplified in gene regulatory networks.
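As a rough illustration of what prompt-based relation extraction can look like, the sketch below builds a prompt for a short text and parses a line-oriented answer into PPI pairs. It is not the authors' pipeline: the prompt wording, the `A | B` answer format, and the names `PROMPT_TEMPLATE`, `build_prompt` and `parse_relations` are all assumptions made for this example, and the call to an actual LLM is deliberately left out.

```python
# Hypothetical prompt template; the wording and answer format are assumptions.
PROMPT_TEMPLATE = (
    "Extract all protein-protein interactions from the text below.\n"
    "Answer with one interaction per line in the form: PROTEIN_A | PROTEIN_B\n"
    "If there are none, answer: NONE\n\n"
    "Text: {text}\n"
)


def build_prompt(text: str) -> str:
    """Fill the template with one (short) scientific text."""
    return PROMPT_TEMPLATE.format(text=text.strip())


def parse_relations(answer: str) -> set[tuple[str, str]]:
    """Parse 'A | B' lines into undirected, case-normalised protein pairs.

    Lines that do not contain exactly two non-empty fields are skipped,
    so a bare 'NONE' or free-text chatter yields no relation.
    """
    relations: set[tuple[str, str]] = set()
    for line in answer.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 2 or not all(parts):
            continue
        a, b = sorted(p.upper() for p in parts)
        # PPIs are treated as symmetric here, so (A, B) and (B, A) collapse.
        relations.add((a, b))
    return relations
```

For TF-to-target-gene relations the pair would be directed, so the `sorted` call would be dropped and the order of the two fields kept.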

Results

We obtain promising evaluation results over a large corpus of short scientific texts (331 tokens on average). For the extraction of PPI relations, precision, recall and F1-score are 87%, 70% and 71%, respectively; for GRN relation extraction they are 77%, 57% and 62%.
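For readers less familiar with these metrics, the following minimal sketch computes precision, recall and F1 for a single text by comparing a set of predicted relations against a set of annotated ones. The function name `precision_recall_f1` and the toy relation sets are hypothetical. The abstract does not state how scores are aggregated across the corpus; the reported F1 values are not the harmonic mean of the reported precision and recall, which would be consistent with averaging per-text scores, but that is an assumption.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Score one set of predicted relations against the annotated (gold) set."""
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with made-up relation pairs.
gold = {("A", "B"), ("C", "D"), ("E", "F")}
predicted = {("A", "B"), ("X", "Y")}
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```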

Availability

Code, scripts and results are available at: https://github.com/dieterich-lab/LLM_Relations

Version published to 10.1101/2025.07.21.665999 on bioRxiv
Jul 25, 2025

Related articles

textToKnowledgeGraph: Generation of Molecular Interaction Knowledge Graphs Using Large Language Models for Exploration in Cytoscape

This article has 4 authors:
1. Favour James
2. Christopher Churas
3. Dexter Pratt
4. Augustin Luna
This article has no evaluations
Latest version Jul 21, 2025

PPIKB: A Comprehensive Knowledge Base and Analysis Platform for Protein–Peptide Interactions Based on Literature and Patents

This article has 7 authors:
1. Ning Zhu
2. Yanyu Ming
3. Chengyun Zhang
4. Cao Sen
5. Chongyang Li
6. Jingjing Guo
7. Hongliang Duan
This article has no evaluations
Latest version Jun 12, 2025

GeneInsight: Condensing Gene Set Knowledge via Language Models

This article has 3 authors:
1. Wee Loong Chin
2. Kevin Chen
3. Timo Lassmann
This article has no evaluations
Latest version Jul 10, 2025
