Unmasking Users in IoC: LLMs, Text Profiling, and Privacy Implications

Asimina Tsouplaki
Christos Kalloniatis
George Mikros
Apostolis Siatras

Abstract

The integration of Large Language Models (LLMs) with Internet of Cloud (IoC) ecosystems creates opportunities across diverse domains such as healthcare, finance, and smart cities. This study explores the combination of these technologies, focusing on their ability to profile authors and on the associated privacy challenges. We conducted two experiments using real data from the Blog Authorship Corpus and the Reddit Self-Reported Depression Diagnosis (RSDD) dataset, and evaluated the capabilities of two well-known LLMs, ChatGPT-4o and Llama 3-70B, in identifying sensitive user attributes such as gender, age, profession, and psychological conditions. Our findings highlight both the strengths and the limitations of these LLMs: ChatGPT achieved higher reliability across tasks but exhibited biases and raised ethical concerns. Key insights indicate that while LLMs improve precision and adaptability in textual data analysis, they also increase the risk of user profiling in sensitive contexts. This research highlights the urgency of implementing robust privacy-preserving strategies to mitigate ethical risks and social impacts. Finally, the paper presents the findings from both experiments and provides a detailed comparative analysis of the two LLMs.

Version published to 10.21203/rs.3.rs-6412412/v1 on Research Square
May 16, 2025
