Exploring ChatGPT’s capabilities, stability, potential and risks in conducting psychological counseling through simulations in school counseling
Abstract
This study examines ChatGPT-4’s potential and stability when simulating school-counseling dialogues, offering an exploratory snapshot of its ability to convey warmth, empathy and acceptance. Drawing on 80 real student questions, it assesses response consistency and identifies risk markers such as randomness and hallucination. The goal is to inform future research, guide human–AI (artificial intelligence) collaboration and support policy development on deploying large language model (LLM) chatbots for accessible mental health interventions.
Design/methodology/approach
The authors prompted ChatGPT-4 with 80 authentic college-student counseling questions and collected three nondeterministic replies per query. Automated analysis used three natural language processing (NLP) models – EmoRoBERTa for emotion detection, a neural network for empathy classification and VADER for sentiment analysis – to quantify warmth, empathy and acceptance. Stability was evaluated via Fleiss’ κ for categorical empathy labels and ICC(2,1) for continuous sentiment scores. Additional chi-square and one-way ANOVA tests examined categorical shifts and mean-score drift, and Pearson correlation assessed the relation between question and response length.
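The agreement step described above – comparing categorical labels across the three replies collected per query – can be sketched with a plain implementation of Fleiss’ κ. This is a minimal illustrative sketch, not the study’s actual pipeline; the function name and the toy labels are assumptions:

```python
from collections import Counter

def fleiss_kappa(label_matrix):
    """Fleiss' kappa for N items, each rated n times.

    label_matrix: one inner list per item (e.g. per query), holding the
    categorical label assigned to each of its n replies.
    """
    n = len(label_matrix[0])   # ratings per item (here: 3 replies per query)
    N = len(label_matrix)      # number of items (here: queries)
    totals = Counter()         # marginal counts per category
    per_item_agreement = []
    for row in label_matrix:
        counts = Counter(row)
        totals.update(counts)
        # proportion of agreeing rating pairs within this item
        per_item_agreement.append(
            (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
        )
    p_bar = sum(per_item_agreement) / N
    # chance agreement from the marginal category proportions
    p_e = sum((c / (N * n)) ** 2 for c in totals.values())
    return (p_bar - p_e) / (1 - p_e)
```

With perfectly consistent replies the function returns 1.0, and disagreement across the three replies pulls the value down, which is how a κ of 0.59 would signal only moderate reply-to-reply stability.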
Findings
ChatGPT-4 achieved 97.5% warm responses, 94.2% empathic classifications and a mean compound sentiment score of 0.93 ± 0.19. Stability metrics indicated moderate reliability (κ = 0.59; ICC = 0.62), while occasional confusion or realization emotion labels (2.5% of outputs) and minor sentiment drift underscored randomness as a risk. A positive correlation (r = 0.60, p < 0.001) showed that longer queries elicit longer replies. These results highlight both the promise and the limits of LLM chatbots in school-counseling simulations.
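The reported length correlation follows the standard Pearson formula over paired question and response lengths. The helper below is a minimal sketch under that assumption (variable names are illustrative, not taken from the study):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences,
    e.g. question word counts vs. response word counts."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative usage: longer queries paired with longer replies
# yield a strongly positive r, as in the reported r = 0.60.
```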
Research limitations/implications
As an offline simulation using a single GPT-4 model and automated proxies rather than clinician ratings or clinical outcomes, findings remain exploratory. The de-identified public data set may not capture live user dynamics. Future work should involve multi-model comparisons, mixed-methods validation with human raters and end-users, live pilot deployments and clinical trials to assess safety, usability and therapeutic impact in real-world educational settings.
Practical implications
High warmth and empathy rates suggest ChatGPT-4 could augment low-intensity support – drafting psycho-educational messages or after-hours coping tips under human oversight. Stability metrics can inform prompt-engineering benchmarks and guardrail triggers. Schools and self-help apps may pilot AI-assisted chat interfaces with escalation protocols, bias and privacy audits and human-in-the-loop triage to optimize counselor workflows, extend reach and mitigate risks through transparent policy and workflow design.
Social implications
Deployment of LLM chatbots can democratize mental health resources for youth, lowering barriers of cost, stigma and provider shortages. However, risks of misinformation, bias and overreliance necessitate digital-literacy education, equitable governance and community engagement frameworks. Policymakers and practitioners must balance innovation with safeguards – such as mandatory audit trails and accountability measures – to ensure vulnerable populations benefit safely from AI-mediated counseling.
Originality/value
This paper is among the first to apply quantitative stability metrics (κ, ICC) and NLP-based emotion analysis to ChatGPT-4 in a school-counseling simulation, enriched by a practitioner’s deployment insights. It integrates parasocial-interaction and computational social-science theories to link technical capabilities with design patterns, policy recommendations and a roadmap for future mixed-methods research on AI in mental-health interventions.