Weakly Supervised Active Learning for Abstract Screening Leveraging LLM-Based Pseudo-Labeling
Abstract
Abstract screening is a notoriously labour-intensive step in systematic reviews. AI-aided abstract screening faces several grand challenges, such as the strict requirement of near-total recall of relevant studies, the lack of initial annotations, and extreme data imbalance. Active learning is the predominant solution for this challenging task, but it remains remarkably time-consuming and tedious. To address these challenges, this paper introduces a weakly supervised learning framework leveraging large language models (LLMs). The proposed approach employs LLMs to score and rank candidate studies based on their adherence to the inclusion criteria for relevant studies specified in the review protocol. Pseudo-labels are generated by assuming the top T% and bottom B% of ranked studies to be positive and negative samples, respectively, for training an initial classifier without manual annotation. Experimental results on 28 systematic reviews from a well-established benchmark demonstrate a breakthrough in automated abstract screening: manual annotation can be eliminated while safely reducing 42-43% of the screening workload on average and maintaining near-perfect recall, making this the first approach to satisfy this strict requirement for abstract screening. Additionally, LLM-based pseudo-labelling significantly improves the efficiency and utility of the active learning regime for abstract screening.
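The top-T%/bottom-B% pseudo-labelling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the score inputs, and the default thresholds are all assumptions for the example.

```python
def pseudo_label(scores, top_pct=0.1, bottom_pct=0.4):
    """Assign pseudo-labels from LLM adherence scores.

    The top `top_pct` fraction of studies (by score) is pseudo-labelled
    positive (1), the bottom `bottom_pct` fraction negative (0), and the
    remainder is left unlabelled (None) for the downstream classifier.
    """
    n = len(scores)
    # Indices sorted by descending LLM adherence score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    n_pos = max(1, int(n * top_pct))
    n_neg = max(1, int(n * bottom_pct))
    labels = [None] * n
    for i in order[:n_pos]:    # highest-ranked studies -> positive
        labels[i] = 1
    for i in order[-n_neg:]:   # lowest-ranked studies -> negative
        labels[i] = 0
    return labels


# Example: with 5 studies and T = B = 20%, only the best- and
# worst-scored studies receive pseudo-labels.
labels = pseudo_label([0.9, 0.1, 0.5, 0.8, 0.2], top_pct=0.2, bottom_pct=0.2)
```

The resulting labelled subset would then serve as training data for the initial classifier, removing the need for manual seed annotation.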