Can LLMs Get High? A Dual-Metric Framework for Evaluating Psychedelic Simulation and Safety in Large Language Models
Abstract
Large language models (LLMs) are increasingly consulted by individuals for support during psychedelic experiences ("trip sitting"), yet no framework exists to evaluate whether these models can accurately simulate or safely respond to altered states of consciousness. We aimed to determine whether LLMs can be induced to generate narratives resembling human psychedelic experiences, and to quantify this behavior using psychometric and linguistic metrics. We developed a dual-metric evaluation framework comparing 3,000 LLM-generated narratives (from Gemini 2.5, Claude Sonnet 3.5, ChatGPT-5, Llama-2 70B, and Falcon 40B) against 1,085 human trip reports sourced from Erowid.org. Models were prompted under neutral and psychedelic-induction conditions across five substances (psilocybin, LSD, DMT, ayahuasca, and mescaline). Outcomes were assessed using semantic similarity to human reports (Sentence-BERT embeddings) and the Mystical Experience Questionnaire-30 (MEQ-30). Psychedelic-induction prompts produced a significant shift in model outputs relative to neutral conditions: mean semantic similarity to human reports increased from 0.156 (neutral) to 0.548 (psychedelic), and mystical-experience scores rose from 0.046 to 0.748. While models demonstrated substance-specific linguistic styles (e.g., distinct semantic profiles for LSD versus ayahuasca), they exhibited uniformly high mystical intensity across all substances. Contemporary LLMs can thus be "dosed" via text prompts to generate convincingly realistic psychedelic narratives. However, the dissociation between high linguistic mimicry and the absence of genuine phenomenology suggests that these models simulate the form of altered states without their experiential content. This capability raises significant safety concerns regarding anthropomorphism and the potential for AI to inadvertently amplify distress or delusional ideation in vulnerable users.
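As a rough illustration of the semantic-similarity metric described above, the sketch below (Python, using the sentence-transformers library) embeds narratives with a Sentence-BERT model and scores them by cosine similarity against a set of human reports. The checkpoint name, the centroid aggregation, and the example texts are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: score LLM-generated narratives against human trip reports
# using Sentence-BERT embeddings and cosine similarity.
# Assumptions (not from the paper): the "all-MiniLM-L6-v2" checkpoint and
# the mean-similarity-to-centroid aggregation.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed SBERT checkpoint

def mean_similarity(generated: list[str], human_reports: list[str]) -> float:
    """Average cosine similarity between each generated narrative and the
    centroid of the human-report embeddings."""
    gen_emb = encoder.encode(generated, convert_to_tensor=True, normalize_embeddings=True)
    ref_emb = encoder.encode(human_reports, convert_to_tensor=True, normalize_embeddings=True)
    ref_centroid = ref_emb.mean(dim=0, keepdim=True)  # 1 x d reference vector
    sims = util.cos_sim(gen_emb, ref_centroid)        # n x 1 similarity scores
    return sims.mean().item()

# Toy comparison: a neutral output vs. a "psychedelic-induction" output,
# each scored against a (hypothetical) human reference report.
neutral = ["I completed the task and summarized the document as requested."]
induced = ["The walls breathed in slow geometric waves and time folded into itself."]
references = ["About an hour in, the patterns on the ceiling began to breathe and ripple."]
print(mean_similarity(neutral, references), mean_similarity(induced, references))
```

Under this kind of scoring, the reported shift from roughly 0.16 (neutral) to 0.55 (psychedelic) would correspond to the induced narratives landing much closer to the human-report embedding region than neutral outputs do.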