Emergent Spatio-Semantic Structure in Large Language Model Embedding Spaces


Abstract

Large Language Models (LLMs) are increasingly used in geospatial applications, typically as generators of geographic text or as natural-language interfaces to spatial data. Here, we explore whether LLM embedding spaces can instead serve as geospatial representations that can be exploited directly. Using embeddings extracted from Airbnb property descriptions in London, we show that off-the-shelf LLM embeddings exhibit emergent spatial structure. We further demonstrate that a lightweight residual geo-adapter substantially sharpens this spatial signal, enabling approximate localisation even when explicit geographic references are removed, while preserving the semantic relationships learned during LLM pre-training. These results suggest a path toward spatially explicit foundation models that operate over the spatio-semantic embedding space rather than over generated text.
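The abstract does not specify the geo-adapter's architecture. As a rough illustration only, a "residual" adapter is commonly a small bottleneck network whose output is added back to the frozen input embedding, so the adapted vector stays close to the original; the sketch below uses hypothetical toy dimensions and untrained random weights to show that form, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_geo_adapter(emb, W1, b1, W2, b2):
    """Residual adapter: frozen embedding plus a small learned correction."""
    h = np.tanh(emb @ W1 + b1)   # low-dimensional bottleneck
    return emb + h @ W2 + b2     # residual connection keeps output near input

d, k = 16, 4                     # toy embedding and bottleneck sizes (hypothetical)
W1 = rng.normal(0, 0.1, (d, k)); b1 = np.zeros(k)
W2 = rng.normal(0, 0.1, (k, d)); b2 = np.zeros(d)

emb = rng.normal(size=(3, d))    # stand-in for frozen LLM embeddings
adapted = residual_geo_adapter(emb, W1, b1, W2, b2)

# Because the correction is additive and small, the adapted vectors drift
# only slightly from the originals, consistent with preserving semantics.
drift = np.linalg.norm(adapted - emb, axis=1) / np.linalg.norm(emb, axis=1)
print(adapted.shape, float(drift.max()))
```

In a real pipeline the adapter weights would be trained against geographic coordinates (e.g. regressing latitude/longitude of the listings) while the LLM embeddings stay frozen; the residual form is what lets the spatial signal sharpen without overwriting pre-trained semantic structure.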