GPT vs Human Legal Texts Annotations: A Comparative Study with Privacy Policies

Abstract

High-quality corpora of annotated privacy policies are scarce, yet essential for training, testing, and evaluating accurate machine learning models. However, building new corpora remains an error-prone and resource-intensive task, heavily reliant on highly specialized and hard-to-find human annotators. Recent advances in Generative Pre-trained Transformers (GPTs) open the possibility of using them to annotate privacy policies with performance comparable to that of human annotators, thereby streamlining the process while reducing human resource demands. This paper presents a novel method for annotating privacy policies based on a codebook, a well-designed prompt, and the analysis of the log probabilities (logprobs) of a GPT's output tokens during the annotation process. We validated our method using the GPT-4o model and the well-known, open, multi-class, multi-label OPP-115 corpus, achieving performance comparable to 80% of human annotators in segment-level annotation and matching 90% of human annotators in full-text-level annotation. Furthermore, incorporating logprobs analysis allowed the method to match the performance of all human annotators in full-text-level annotation, suggesting that this added confidence signal enhances the task. These findings demonstrate the potential of our method to automate annotation with performance similar to that of human annotators while significantly reducing resource demands.
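To illustrate the kind of logprobs analysis the abstract refers to, the sketch below converts per-label token log probabilities into probabilities and keeps only labels above a confidence threshold. The label names follow OPP-115 categories, but the logprob values, the threshold, and the filtering rule are illustrative assumptions for this sketch, not the paper's actual parameters or pipeline.

```python
import math

# Hypothetical logprobs for the label tokens a GPT emitted when
# annotating one policy segment (values are illustrative only).
token_logprobs = {
    "First Party Collection/Use": -0.05,
    "Third Party Sharing/Collection": -0.9,
    "Data Retention": -3.2,
}

def confident_labels(logprobs, threshold=0.5):
    """Keep labels whose token probability exp(logprob) meets the threshold.

    The 0.5 threshold is an assumed value for illustration.
    """
    return {
        label: math.exp(lp)
        for label, lp in logprobs.items()
        if math.exp(lp) >= threshold
    }

labels = confident_labels(token_logprobs)
print(sorted(labels))  # only the high-confidence label survives
```

Here exp(-0.05) ≈ 0.95 passes the threshold, while exp(-0.9) ≈ 0.41 and exp(-3.2) ≈ 0.04 are filtered out, so only "First Party Collection/Use" is retained.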
