Identifying Suicide-Related Language in Smartphone Keyboard Entries Among High-Risk Adolescents

Paul Alexander Bloom
Isaac N Treves
David Pagliaccio
Isabella Nadel
Emma Wool
Hayley Quinones
Julia Greenblatt
Natalia Parjane
Katherine Durham
Samantha Salem
Esha Trivedi
Hanga Galfalvy
Nicholas B. Allen
Deanna Barch
Ashley Blanchard
David Brent
Lauren Chernick
Peter Dayan
Caroline Paige Hoyniak
Karla Joyce
Jaclyn Schwartz Kirshenbaum
Lilian Y. Li
Joan Luby
Simryn Molina
Giovanna Porta
Zoe Price
Eva Purvin
Alex Rosenberg
Koustuv Saha
Stewart Shankman
Adela Schwartz
Soorya Ram Shimgekar
Jamie Zelazny
Randy Patrick Auerbach


Abstract

Adolescent suicide rates have risen over the past two decades, underscoring the need for improved risk detection strategies. Although natural language processing (NLP) tools are increasingly used to flag suicide-related content, little is known about how such approaches perform on adolescents’ smartphone communications. Addressing this gap, this study leverages passively-collected smartphone data to identify suicide-related language in adolescents’ keyboard usage via NLP. We developed a lexicon of suicide-related adolescent language and validated it with labeled data (N=171,468 text entries; e.g., messages, web searches), demonstrating higher performance in identifying suicide-related text than few-shot prediction with large language models (LLMs) and lexicons not designed for youth. Across two independent cohorts at elevated suicide risk (Ns=208 & 257; >6 million text entries), lifetime suicidal thoughts and behaviors (STB) and current suicidal ideation were associated with increased frequency of smartphone suicide-related language. Human coding indicated varied language, including authentic first-person current suicidal ideation (14.5%) and jokes or hyperbole (20.2%). Compared with the lexicon alone, human coding of suicide-related entries with first-person language showed stronger associations with STB history. However, an LLM showed limited performance in identifying whether suicide-related text indicated authentic, first-person, and current STB (F1=.45). These findings highlight that effective NLP-based tools for suicide prevention will require more nuanced and context-specific approaches to better distinguish suicidal intent.
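As a rough illustration of the kind of lexicon validation the abstract describes, the sketch below flags text entries that contain any lexicon term and scores the flags against human labels with F1. It is a minimal sketch under stated assumptions, not the authors' pipeline: the lexicon terms, the example entries, the labels, and the function names (`build_pattern`, `flag_entries`, `f1_score`) are all hypothetical placeholders, and the study's actual lexicon and matching rules are not reproduced here.

```python
import re

# Hypothetical placeholder terms; the study's real lexicon is NOT reproduced here.
LEXICON = ["placeholder phrase", "example term"]


def build_pattern(terms):
    """Compile one case-insensitive regex that matches any term on word boundaries."""
    # Longer terms first so multi-word phrases are tried before shorter ones.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_entries(entries, pattern):
    """Return one boolean per entry: True if the entry contains a lexicon term."""
    return [bool(pattern.search(entry)) for entry in entries]


def f1_score(predicted, labels):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical entries and human labels (True = coded as relevant).
entries = [
    "this has an Example Term in it",               # flagged, labeled relevant
    "nothing relevant here",                        # not flagged, not relevant
    "a placeholder phrase appears",                 # flagged, but labeled not relevant
    "relevant content phrased outside the lexicon", # missed by the lexicon
]
labels = [True, False, False, True]

pattern = build_pattern(LEXICON)
predicted = flag_entries(entries, pattern)
score = f1_score(predicted, labels)
print(predicted, score)
```

In this toy setup the lexicon produces one true positive, one false positive, and one false negative. The false positive loosely mirrors the abstract's point that a term match alone does not establish authentic, first-person, current meaning.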

Version published to 10.31234/osf.io/gfa7h_v3 on OSF Preprints
Apr 9, 2026
Version published to 10.31234/osf.io/gfa7h_v2 on OSF Preprints
Oct 4, 2025
Version published to 10.31234/osf.io/gfa7h_v1 on OSF Preprints
Sep 3, 2025
