Computational text analysis

Marko Bachl
Michael Scharkow

Abstract

Computational text analysis (CTA) comprises techniques for measuring the content of texts with the help of computer algorithms. The methods are discussed under various labels, such as text-as-data, automated content analysis, natural language processing, or text mining. The defining characteristic of a CTA technique is that once it is initially configured, the computational system performs the measurements independently without requiring any manual intervention or effort. The strength of CTA lies in its scalability, enabling the measurement of characteristics across vast amounts of text. As a result, CTA has seen widespread application in communication, related social sciences, and the digital humanities, with the increasing availability of digital or digitized, machine-readable texts. We start this chapter with an overview of the historical development of CTA. We then systematize CTA along two dimensions: the representations of texts for the computational analysis and the supervision of the measurement process. While doing so, we provide some examples of popular techniques. The chapter ends with an outlook into the near future.

Version published to 10.31219/osf.io/3yhu8 on OSF Preprints
Oct 2, 2024

Related articles

Unsupervised text clustering with large language models

This article has 6 authors:
1. Leonid Kuligin
2. Jacqueline Lammert
3. Florence Heinkelein
4. Keno Bressem
5. Martin Boeker
6. Maximilian Tschochohei
This article has no evaluations. Latest version: Feb 23, 2026

Exploration of Large Language Models for Geotagging of Social Media Posts

This article has 2 authors:
1. Riwaz Udas
2. Richard Sinnott
This article has no evaluations. Latest version: Feb 3, 2026

Testing Content Analysis through Different Large Language Models: Towards a Gold Standard Protocol

This article has 8 authors:
1. Fabio Torreggiani
2. Giuliano Bobba
3. Federico Vegetti
4. Antonella Seddone
5. Moreno Mancosu
6. Elisa Iannone
7. Alessandra Malorgio
8. Costanza Massidda
This article has no evaluations. Latest version: Mar 6, 2026
