Hypothesis-Driven Semantic Retrieval for Discovering Unseen Knowledge Structures

Oliver Bennett
Amelia Wright
James Holloway

Abstract

This work proposes a hypothesis-driven retrieval paradigm for discovering unseen knowledge schemata in text corpora without supervision. The framework generates hypothetical semantic signals from linguistic regularities and validates them by retrieving supporting evidence across the corpus. A signal verification module filters out inconsistent hypotheses, and a semantic clustering component consolidates the validated structures. On the OpenSchema-ZS, CorpusGraph-ZS, and Narrative-ZS datasets, the system improves structural retrieval accuracy by 19.5%, 21.3%, and 17.8%, respectively. It also reduces schema hallucination errors by 28.4%, and human judges rate the extracted structures 24.9% higher on interpretability metrics.

Version published to 10.20944/preprints202512.0301.v1
Dec 3, 2025

Related articles

Knowledge and Context Compression via Question Generation

This article has 6 authors:
1. Alex Anvi Eponon
2. Moein Shahiki-Tash
3. Abdullah -
4. Luis Ramos
5. Christian Maldonado-Sifuentes
6. Ildar Batyrshin
This article has no evaluations. Latest version: Jan 27, 2026
Substitute-Space Embeddings for Label-Free Syntax: Unsupervised AI for POS Discovery

This article has 1 author:
1. Vipul Razdan
This article has no evaluations. Latest version: Jan 8, 2026
