Big Five Personality Traits Prediction Based on User Comments
Abstract
The study of personality is a major component of human psychology, and an understanding of personality traits enables practical applications in various domains, such as mental health care, predicting job performance, and optimising marketing strategies. This study explores the prediction of Big Five personality trait scores from online comments with transformer-based language models, focusing on improving model performance with a larger dataset and investigating the role of intercorrelations among traits. Using the PANDORA dataset from Reddit, RoBERTa and BERT models, including both base and large variants, were fine-tuned and evaluated to determine their effectiveness in personality trait prediction. Compared to previous work, our study utilises a significantly larger dataset to enhance model generalisation and robustness. The results indicate that RoBERTa outperforms BERT across most metrics, with RoBERTa large achieving the best overall performance. In addition to evaluating overall predictive accuracy, this study investigates the impact of intercorrelations among personality traits. A comparative analysis is conducted between a single-model approach, which predicts all five traits simultaneously, and a multiple-models approach, in which separate models are fine-tuned independently, each predicting a single trait. The findings reveal that the single-model approach achieves lower RMSE and higher R² values, highlighting the importance of incorporating trait intercorrelations to improve prediction accuracy. Furthermore, RoBERTa large demonstrates a stronger ability to capture these intercorrelations than models in previous studies. These findings emphasise the potential of transformer-based models in personality computing and underscore the importance of leveraging both larger datasets and trait intercorrelations to enhance predictive performance.
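To make the single-model approach concrete, the sketch below shows one plausible way to fine-tune RoBERTa large as a multi-output regressor that predicts all five trait scores from a single shared representation, which is what allows trait intercorrelations to inform the predictions. This is a minimal illustration under stated assumptions, not the authors' exact pipeline: the trait ordering, column layout, hyperparameters, and example scores are hypothetical, and the PANDORA data loading is omitted.

```python
# Hedged sketch: single-model multi-output regression over the five Big Five traits.
# Assumes comments and their trait scores are already available; values shown are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

TRAITS = ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-large",
    num_labels=len(TRAITS),       # one regression output per trait
    problem_type="regression",    # Hugging Face applies MSE loss for regression labels
)

def encode(comments, scores):
    """Tokenise a batch of comments and attach the five trait scores as float labels."""
    batch = tokenizer(comments, truncation=True, padding=True,
                      max_length=512, return_tensors="pt")
    batch["labels"] = torch.tensor(scores, dtype=torch.float)  # shape: (batch, 5)
    return batch

# Minimal training step; in practice this would run inside a full training loop or Trainer.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
example = encode(["I love meeting new people at conferences."],
                 [[0.7, 0.5, 0.9, 0.6, 0.3]])  # hypothetical trait scores
outputs = model(**example)       # outputs.loss is the MSE over all five trait outputs
outputs.loss.backward()
optimizer.step()
```

The multiple-models baseline described above would instead fine-tune five such models, each with a single regression output (num_labels=1), so that no trait shares a prediction head with the others.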