Predicting Suicide Outcomes: An Analysis of Key Factors and Machine Learning Models
Abstract
Background: Suicide attempts are potentially lethal, self-destructive behaviors that, in some cases, lead to death or irreversible physical harm and are associated with various factors. This study aimed to identify risk factors for suicide-related death.
Materials and Methods: This study used data recorded in the suicide registry system of hospitals in Ilam Province. The data were analyzed with the Chi-square test in SPSS. After the factors influencing suicide-related death were identified, their significance was evaluated and compared using logistic regression in Python and SPSS. Models for predicting suicide outcomes were then developed using support vector machine (SVM), logistic regression, K-nearest neighbors (KNN), decision tree (DT), and random forest (RF) methods. These models were compared on accuracy, recall, F1 score, and area under the receiver operating characteristic (ROC) curve.
Results: Among 3,833 suicide cases recorded in hospitals across Ilam Province, the method of suicide (P<0.001), reason for attempting suicide (P<0.001), age group (P<0.001), education level (P<0.001), marital status (P=0.008), and employment status (P=0.002) were significantly associated with suicide-related death. Variables such as the season of the suicide attempt, sex, father's education level, and mother's education level were not significantly associated with suicide-related death. Among the models tested, the random forest model achieved the highest area under the ROC curve (0.79) and the highest classification accuracy and F1 score on both the training data (0.85 and 0.2, respectively) and the test data (0.87 and 0.22, respectively).
Conclusion: This study found that older age, lower education level, divorce or widowhood, retirement, the use of physical methods and tools for suicide, and socioeconomic problems were significantly associated with suicide-related death. Additionally, the random forest model performed best among the models tested in predicting suicide outcomes when these factors were used as predictors.
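The abstract does not give implementation details, so the following is only a minimal sketch of the workflow it describes: a Chi-square test of association for each factor, followed by a comparison of the five classifier families on accuracy, recall, F1 score, and area under the ROC curve. It assumes pandas, SciPy, and scikit-learn. The data are synthetic, and the column names, effect sizes, default hyperparameters, and the 75/25 stratified split are all illustrative assumptions, not the study's registry data or settings.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_registry(n=2000, seed=0):
    """Build a SYNTHETIC stand-in table (hypothetical columns, not the study's data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "method": rng.choice(["drug", "physical", "other"], size=n, p=[0.7, 0.2, 0.1]),
        "age_group": rng.choice(["<25", "25-44", "45+"], size=n),
        "marital_status": rng.choice(["single", "married", "divorced_widowed"], size=n),
        "season": rng.choice(["spring", "summer", "autumn", "winter"], size=n),
    })
    # Assumed outcome model: death is rare and likelier for two of the categories.
    logit = -3.0 + 1.5 * (df["method"] == "physical") + 0.8 * (df["age_group"] == "45+")
    df["died"] = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return df


def chi_square_pvalues(df, outcome="died"):
    """Chi-square test of independence between each categorical factor and the outcome."""
    pvalues = {}
    for col in df.columns.drop(outcome):
        table = pd.crosstab(df[col], df[outcome])
        _, p, _, _ = chi2_contingency(table)
        pvalues[col] = p
    return pvalues


def compare_models(df, outcome="died", seed=0):
    """Fit five classifiers and report test-set accuracy, recall, F1, and ROC AUC."""
    X, y = df.drop(columns=outcome), df[outcome]
    # Stratify so the rare positive class appears in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "SVM": SVC(random_state=seed),
        "Logistic regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
        "Decision tree": DecisionTreeClassifier(random_state=seed),
        "Random forest": RandomForestClassifier(random_state=seed),
    }
    rows = []
    for name, model in models.items():
        # One-hot encode the categorical factors before fitting each model.
        pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"), model)
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        # ROC AUC needs a continuous score: class probability when available,
        # otherwise the decision function (the case for the default SVC).
        if hasattr(pipe, "predict_proba"):
            score = pipe.predict_proba(X_test)[:, 1]
        else:
            score = pipe.decision_function(X_test)
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_test, pred),
            "recall": recall_score(y_test, pred),
            "f1": f1_score(y_test, pred, zero_division=0),
            "roc_auc": roc_auc_score(y_test, score),
        })
    return pd.DataFrame(rows).set_index("model")


if __name__ == "__main__":
    data = make_synthetic_registry()
    print(chi_square_pvalues(data))
    print(compare_models(data).round(2))
```

Reporting recall, F1, and ROC AUC alongside accuracy matters in a setting like this one: if the positive outcome is rare, as it is in the synthetic data above, a model can reach high accuracy while identifying few positive cases, which would be consistent with the gap between the reported accuracy and F1 values.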