AMPS-JuST: Dataset of Annotated Judgements from the Small Claims Tribunal

Abstract

AMPS-JuST is a bilingual (Maltese–English) corpus of 16,670 judgements delivered by the Maltese Small Claims Tribunal. We automatically scraped the original judgements, applied a domain-specific PDF cleaning and segmentation pipeline, and split each decision into a reasoning section and a conclusion. Using GPT-4o and open-weight LLM baselines, we then (i) generated sentence-level summaries in Maltese and English, (ii) assigned one of eight rhetorical role labels to every summary sentence, (iii) extracted case verdicts through a hybrid rule-based and LLM procedure, and (iv) tagged each case with a two-level thematic taxonomy. The resulting JSON corpus links raw text, rich metadata, bilingual summaries, rhetorical structure, thematic labels, and outcome fields in a machine-readable format. Expert review of 30 randomly sampled cases (5,000+ sentences) on a five-factor Likert scale confirms high structural coherence (mean 4.7/5), faithful preservation of legal reasoning (4.6/5), and negligible hallucination or bias (≤4% of items). By pairing high-quality English representations with the original Maltese texts, AMPS-JuST lowers the entry barrier for legal NLP in a severely under-resourced language and provides a benchmark for cross-lingual retrieval, classification, summarisation, and judgement-prediction research.
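
To make the linked structure described above concrete, the following is a minimal sketch of how one corpus record might be modelled and round-tripped through JSON. It is an illustration only: the class names and every field name (`case_id`, `reasoning`, `conclusion`, `summary`, `theme`, `verdict`, `text_mt`, `text_en`, `role`) are assumptions, not the published schema, and the eight rhetorical role labels are left as free strings because the abstract does not name them.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Tuple


@dataclass
class SummarySentence:
    """One summary sentence in both languages (hypothetical field names)."""
    text_mt: str  # Maltese summary sentence
    text_en: str  # English summary sentence
    role: str     # rhetorical role label; the label set is not given in the abstract


@dataclass
class CaseRecord:
    """One judgement with its annotations (hypothetical field names)."""
    case_id: str
    reasoning: str   # reasoning section of the decision
    conclusion: str  # concluding section of the decision
    summary: List[SummarySentence] = field(default_factory=list)
    theme: Tuple[str, str] = ("", "")  # (top-level theme, sub-theme)
    verdict: str = ""                  # extracted outcome


def to_json(record: CaseRecord) -> str:
    # ensure_ascii=False keeps Maltese characters such as ħ and ż readable.
    return json.dumps(asdict(record), ensure_ascii=False)


def from_json(payload: str) -> CaseRecord:
    raw = json.loads(payload)
    # JSON has no tuple or dataclass types, so rebuild them explicitly.
    raw["summary"] = [SummarySentence(**s) for s in raw["summary"]]
    raw["theme"] = tuple(raw["theme"])
    return CaseRecord(**raw)
```

Keeping the bilingual sentence pair and its role label in one object mirrors the sentence-level alignment the abstract describes; a flat layout with parallel arrays would work equally well.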
