Are Natural Language Data “Nature-Identical” and What Is Elicitation After All?

Kristina Balykova

Abstract

Language documentation discourse commonly divides language data into two broad types: natural(istic) vs. elicited. The goal of this paper is to put this dichotomy under critical scrutiny. By examining key publications on linguistic fieldwork, I show that the two terms seldom receive a clear definition and are often used inconsistently, giving rise to evident contradictions. The analysis reveals that the terms are typically distinguished by two parameters, linguistic unit (texts vs. not texts) and context of language production (controlled vs. uncontrolled), but the distinction is virtually never maintained consistently. I argue that the natural(istic) vs. elicited dichotomy is insufficient to capture the complexity of the scenarios and forms in which language is produced. Building upon previous literature, I propose a more detailed classification of language data, which abandons the notions of ‘natural(istic)’ and ‘elicited’ altogether. The paper concludes by discussing the gains of a more careful reflection on language data types.

Version published to 10.20944/preprints202510.1141.v1
Oct 14, 2025

