AMPS-JuST: Dataset of Annotated Judgements from the Small Claims Tribunal
Abstract
AMPS-JuST is a bilingual (Maltese–English) corpus of 16,670 judgements delivered by the Maltese Small Claims Tribunal. We automatically scraped the original judgements, applied a domain-specific PDF cleaning and segmentation pipeline, and split each decision into a reasoning section and a conclusion. Using GPT-4o and open-weight LLM baselines, we then (i) generated sentence-level summaries in Maltese and English, (ii) assigned one of eight rhetorical role labels to every summary sentence, (iii) extracted case verdicts through a hybrid rule-based + LLM procedure, and (iv) tagged each case with a two-level thematic taxonomy. The resulting JSON corpus links raw text, rich metadata, bilingual summaries, rhetorical structure, thematic labels, and outcome fields in a machine-readable format. Expert review of 30 randomly sampled cases (5,000+ sentences) on a five-factor Likert scale confirms high structural coherence (mean 4.7/5), faithful preservation of legal reasoning (mean 4.6/5), and negligible hallucination or bias (≤4% of items). By pairing high-quality English representations with the original Maltese texts, AMPS-JuST lowers the entry barrier for legal NLP in a severely under-resourced language and provides a benchmark for cross-lingual retrieval, classification, summarisation, and judgement-prediction research.
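To make the linked structure described above more concrete, the following is a minimal Python sketch of how one case record might be represented and round-tripped through JSON. It is an illustration only: the abstract does not specify the corpus schema, so every class and field name here (`CaseRecord`, `SummarySentence`, `case_id`, `theme`, `subtheme`, and so on) is a hypothetical assumption, and the eight rhetorical role labels are treated as opaque strings because their names are not given.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SummarySentence:
    # Hypothetical field names; the real corpus keys may differ.
    text_mt: str  # summary sentence in Maltese
    text_en: str  # the same sentence in English
    role: str     # one of the eight rhetorical role labels (names not given in the abstract)


@dataclass
class CaseRecord:
    # Hypothetical layout for one judgement in the JSON corpus.
    case_id: str
    reasoning: str    # reasoning segment of the decision
    conclusion: str   # conclusion segment of the decision
    verdict: str      # outcome field from the hybrid rule-based + LLM step
    theme: str        # first level of the two-level thematic taxonomy
    subtheme: str     # second level of the taxonomy
    summary: list[SummarySentence] = field(default_factory=list)


def to_json(record: CaseRecord) -> str:
    """Serialise a record; ensure_ascii=False keeps Maltese characters readable."""
    return json.dumps(asdict(record), ensure_ascii=False)


def from_json(payload: str) -> CaseRecord:
    """Rebuild a record, restoring the nested summary sentences."""
    data = json.loads(payload)
    sentences = [SummarySentence(**s) for s in data.pop("summary")]
    return CaseRecord(summary=sentences, **data)
```

Under these assumptions, a downstream task such as cross-lingual retrieval or outcome prediction would read `text_mt` and `text_en` pairs, the `role` labels, and the `verdict` field from the same record, which is the kind of linkage the abstract describes.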