A Topic-Based Framework for Interpretable and Sparse Semantic Representations
Abstract
In computational approaches to language, the semantic representations produced by modern deep learning systems are typically encoded as latent variables that resist interpretation. By contrast, cognitive science places strong emphasis on interpretable semantic structures. Although researchers have explored concept representations through corpus-driven distributed embeddings and various feature-based paradigms, a scalable, interpretable semantic representation framework for diverse linguistic materials remains an open methodological challenge. Recent advances in pretrained language models have produced topic modeling methods such as BERTopic, which clusters sentence-level embeddings into human-interpretable topics. Building on this line of work, the present study introduces SISTR, a framework for representing Semantics as Interpretable and Sparse Topic-based Representations. SISTR extracts interpretable topics from a corpus and models the semantics of linguistic materials via their relations to the derived topics; it can be applied across multiple linguistic levels and languages. We validated the interpretability of the resulting dimensions through behavioral experiments, and we evaluated the representation on word association and semantic judgment tasks by comparing model predictions with human behavioral data. The results suggest that sparse topic-based representations may capture human semantic intuitions better than dense embeddings. Overall, SISTR provides a promising tool for cognitive science and neuroscience, offering a more interpretable bridge between linguistic data, conceptual structure, and human cognition.
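The pipeline the abstract describes, clustering sentence embeddings into topics and then representing each item by its sparsified relations to those topics, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for pretrained sentence embeddings, scikit-learn's KMeans stands in for BERTopic-style clustering, and the names `topic_representation` and `top_k` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for sentence embeddings from a pretrained encoder;
# a real pipeline would embed corpus sentences with a model such
# as one from sentence-transformers. Shapes are illustrative.
corpus_embeddings = rng.normal(size=(1000, 64))

# Derive "topics" by clustering the corpus embeddings; each
# cluster centroid then acts as one interpretable topic dimension.
n_topics = 20
kmeans = KMeans(n_clusters=n_topics, n_init=10, random_state=0)
kmeans.fit(corpus_embeddings)
centroids = kmeans.cluster_centers_  # shape: (n_topics, 64)

def topic_representation(embedding, centroids, top_k=5):
    """Represent an item by its cosine similarity to each topic
    centroid, keeping only the top_k strongest topics (sparsification)."""
    sims = centroids @ embedding
    sims = sims / (np.linalg.norm(centroids, axis=1) * np.linalg.norm(embedding))
    sparse = np.zeros_like(sims)
    top = np.argsort(sims)[-top_k:]
    sparse[top] = sims[top]
    return sparse

# Any new linguistic item is mapped into the same topic space.
item = rng.normal(size=64)
rep = topic_representation(item, centroids)
```

Each nonzero dimension of `rep` is tied to a concrete cluster of corpus sentences, which is what makes the representation inspectable in a way a dense latent vector is not.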