Evaluation of Bayesian Network Scoring Functions in Polychotomous Data Analysis
Abstract
Bayesian networks (BNs) are probabilistic graphical models that represent dependencies and independencies among variables. They have been applied widely in areas such as biology and medicine to untangle complex interrelationships, and are now finding wider use in areas such as social science, where data features differ, for example highly polychotomous (multi-category) data. To construct BNs, scoring functions guide selection of the most appropriate model. Among these, the BDe scoring function requires specifying hyperparameters that influence the priors on the network parameters. This study evaluates the performance of four scoring functions (AIC, BIC, BDe, and log-likelihood), particularly with highly polychotomous data. We assessed the overall performance of each scoring function, and for BDe, we varied its hyperparameter to evaluate its impact. Performance of the scoring functions was significantly influenced by the number of nodes, network complexity, and sample size. BIC and BDe (with default hyperparameters) generally offered higher precision, especially with larger sample sizes, while log-likelihood tended to overfit, showing high recall but low precision. AIC and BDe required careful tuning based on the number of discrete levels and the sample size. Optimizing the hyperparameters in BDe was crucial for balancing model complexity and fit. We propose a simulation method for identifying the optimum hyperparameters for using the BDe scoring function in real-world data applications. The study provides insights to enhance the robustness and accuracy of BN models, emphasizing the importance of considering sample size and the number of discrete levels when selecting and tuning scoring functions for BN structure learning.
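To make the four scoring functions concrete, the following is a minimal sketch of how each could be computed for a single node and a candidate parent set from discrete data. It is not the study's implementation. Several things are assumptions: the function name `family_scores`, the data layout (a list of dicts), the "larger is better" form of AIC and BIC (log-likelihood minus a penalty), and the use of the uniform BDeu variant of BDe, in which a single hyperparameter `alpha` (the equivalent sample size) is spread evenly over all cells of the conditional probability table.

```python
import math
from collections import Counter


def family_scores(data, child, parents, levels, alpha=1.0):
    """Score one node given a candidate parent set (illustrative sketch).

    data    : list of dicts mapping variable name -> observed category
    levels  : dict mapping variable name -> number of categories
    alpha   : assumed BDeu equivalent sample size (hypothetical default)
    """
    n = len(data)
    r = levels[child]                           # categories of the child
    q = math.prod(levels[p] for p in parents)   # parent configurations (1 if none)

    # Counts per (parent configuration, child state) and per parent configuration.
    n_jk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_j = Counter(tuple(row[p] for p in parents) for row in data)

    # Maximised log-likelihood; empty cells contribute nothing.
    loglik = sum(c * math.log(c / n_j[j]) for (j, _), c in n_jk.items())

    # Free parameters grow with both q and r, which is why highly
    # polychotomous variables are penalised heavily.
    k = q * (r - 1)
    aic = loglik - k
    bic = loglik - 0.5 * k * math.log(n)

    # BDeu log marginal likelihood; unobserved cells contribute zero.
    a_j = alpha / q
    a_jk = alpha / (q * r)
    bde = sum(math.lgamma(a_j) - math.lgamma(a_j + c) for c in n_j.values())
    bde += sum(math.lgamma(a_jk + c) - math.lgamma(a_jk) for c in n_jk.values())

    return {"loglik": loglik, "aic": aic, "bic": bic, "bde": bde}


# Toy data in which Y copies X exactly.
toy = [{"X": 0, "Y": 0}, {"X": 0, "Y": 0}, {"X": 1, "Y": 1}, {"X": 1, "Y": 1}]
toy_levels = {"X": 2, "Y": 2}
without_parent = family_scores(toy, "Y", [], toy_levels)
with_parent = family_scores(toy, "Y", ["X"], toy_levels)
```

Under these assumptions, the log-likelihood can never decrease when a parent is added, which is consistent with the overfitting behaviour described above, whereas AIC, BIC, and BDeu trade fit against the number of parameters `q * (r - 1)`. Calling `family_scores` with several values of `alpha` shows how the BDeu hyperparameter shifts that trade-off.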