Inter-rater Reliability of an LLM in Predicting Depression Among Indian Adults
Abstract
In recent years, developments in artificial intelligence (AI) have reshaped several industries and professions. Given the rising prevalence of depression and the inability of existing health infrastructure to address it, the current study was undertaken to investigate the potential of AI in clinical settings. The first step is to assess how language can provide insights into the psychological states of individuals. To this end, RoBERTa, a transformer-based deep learning model, was fine-tuned on the DAIC-WOZ dataset to predict depression among Indian adults. Additionally, interviews with Indian adults were conducted and analyzed by both the model and a clinical psychologist to predict indicators of depression. The model achieved a macro F1-score of 0.82 on the test split of DAIC-WOZ, indicating robust performance despite class imbalance. A Cohen’s kappa of 0.628 indicated substantial agreement between the model and the clinical psychologist on the interviews. However, thematic analysis and attribution scores for the interviews on which the two disagreed showed that the model tends to generate false positives, which highlights the need for enhanced contextual analysis. These findings reveal that the language of depression is universal in its essence, while emphasizing the necessity for culturally tailored datasets and multimodal approaches to improve predictions in resource-constrained settings.
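The two headline metrics, macro F1 and Cohen's kappa, can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names and the ten binary labels are hypothetical and chosen only to show the arithmetic, not to reproduce the reported 0.82 or 0.628. Kappa compares the observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. The "substantial" reading of 0.628 matches the commonly used Landis and Koch band of 0.61 to 0.80.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items that both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1.0 - p_e)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so a minority class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels (1 = depression indicated, 0 = not indicated).
# The model flags one interview that the psychologist does not: a false positive.
psychologist = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

kappa = cohen_kappa(psychologist, model)
f1 = macro_f1(psychologist, model)
print(f"kappa={kappa:.3f} macro_f1={f1:.3f}")
```

Because macro F1 averages the per-class scores without weighting by class size, it does not reward a model for simply predicting the majority class, which is why it is a reasonable summary under class imbalance.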