Disparities and Predictive Modeling of Foundational Learning in Somaliland: A Gender-, Location-, and School-Type-Based Analysis Using Machine Learning and Regression Approaches

Mukhtaar Axmed Cumar
Mustafe Khadar Abdi
Abdisalam Hassan Muse
Jibril Abdikadir Ali


Abstract

This study aimed to develop predictive models that identify the key factors driving foundational learning outcomes and to explore gender and contextual disparities among Grade 2–3 students in Somaliland. Using data from the 2022 Somaliland National Learning Assessment (N = 47,269 students from 1,112 schools), the research integrated student-level Early Grade Reading Assessment (EGRA) and Early Grade Mathematics Assessment (EGMA) scores with school-level details. A cross-sectional, quantitative approach was employed, analyzing the data with descriptive statistics, two-way ANOVA, binary logistic regression, and supervised machine learning classifiers (Logistic Regression, Decision Tree, Random Forest, XGBoost) to predict low performance, defined as the bottom 25th percentile. A significant learning crisis was evident, with 25.6% of students (12,102) identified as low performers in literacy and 25.0% (11,838) in numeracy; 8.8% (4,144 students) were low performers in both. Gender disparities varied by subject: males had slightly higher mean EGRA scores (M = 398.08 vs. M = 392.79 for females), while females achieved higher mean EGMA scores (M = 694.60 vs. M = 684.39 for males). Logistic regression confirmed that males had lower odds of low literacy performance (OR = 0.894, p < .001) but higher odds of low numeracy performance (OR = 1.132, p < .001). Although private school students had higher mean scores, public school attendance was associated with lower odds of low literacy (OR = 0.740, p < .001) and low numeracy (OR = 0.940, p = .040). School location was the most potent predictor: urban students consistently outperformed their rural counterparts (e.g., EGRA M = 414.45 urban vs. M = 380.69 rural) and had substantially lower odds of low performance in literacy (OR = 0.494, p < .001) and numeracy (OR = 0.500, p < .001). Random Forest feature importance analysis underscored location's dominance, with location accounting for 87.4% (Low_EGRA) and 84.1% (Low_EGMA) of predictive power. Tree-based ML models (Decision Tree, Random Forest, XGBoost) achieved marginally higher, though still modest, F1-scores (≈ 0.412) in identifying low performers than standard logistic regression (F1-score ≈ 0.396 for Low_EGRA). The findings call for urgent policy attention to equitable resource distribution and support for rural schools. Gender-responsive pedagogical strategies are needed to address subject-specific learning needs. The nuanced performance of public versus private schools suggests focusing on quality improvement and on identifying effective practices in public schools that support struggling learners. The modest accuracy of the ML models indicates that they should complement, rather than replace, teacher assessments in student evaluation frameworks. Future research should prioritize longitudinal studies to establish causality, incorporate more granular data (e.g., teacher quality, household factors), employ qualitative methods to understand contextual nuances, and advance the development of fair, transparent, and more accurate ML models for identifying at-risk students in resource-constrained settings like Somaliland.
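The percentages and odds ratios quoted in the abstract can be sanity-checked with a little arithmetic. The Python sketch below is only an illustration, not the authors' analysis code: the counts and odds ratios are copied from the abstract, while the helper names (`share_pct`, `odds_change_pct`) and the "low in at least one subject" figure are assumptions introduced here and are not reported in the source.

```python
# Counts copied from the abstract (2022 Somaliland National Learning Assessment).
N_STUDENTS = 47_269
LOW_EGRA = 12_102   # low performers in literacy
LOW_EGMA = 11_838   # low performers in numeracy
LOW_BOTH = 4_144    # low performers in both subjects

# Odds ratios copied from the abstract; the dictionary keys are illustrative labels.
ODDS_RATIOS = {
    "male, low literacy": 0.894,
    "male, low numeracy": 1.132,
    "public school, low literacy": 0.740,
    "public school, low numeracy": 0.940,
    "urban, low literacy": 0.494,
    "urban, low numeracy": 0.500,
}


def share_pct(count, total=N_STUDENTS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def odds_change_pct(odds_ratio):
    """Percent change in the odds implied by an odds ratio (negative = lower odds)."""
    return round((odds_ratio - 1) * 100, 1)


# Inclusion-exclusion: students low in at least one subject.
# This figure is derived here from the reported counts; the abstract does not state it.
LOW_EITHER = LOW_EGRA + LOW_EGMA - LOW_BOTH

print("Low literacy share (%):", share_pct(LOW_EGRA))
print("Low numeracy share (%):", share_pct(LOW_EGMA))
print("Low in both (%):", share_pct(LOW_BOTH))
print("Low in at least one subject (derived):", LOW_EITHER)
for label, odds_ratio in ODDS_RATIOS.items():
    print(f"{label}: OR = {odds_ratio}, change in odds = {odds_change_pct(odds_ratio)}%")
```

Read this way, the urban odds ratio of 0.494 for low literacy corresponds to roughly 50.6% lower odds than for rural students. Note that this is a change in odds, not in probability, so it should not be read as "half as likely".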

Version published to 10.21203/rs.3.rs-6744889/v1 on Research Square
Jun 13, 2025

Related articles

Influencing Factors and Risk Prediction Modeling of Co-Occurring Anxiety and Depressive Symptoms in Middle School Students

This article has 11 authors:
1. Guofeng Li
2. Xiuhong Zhang
3. Tian Yang
4. Shujuan Tian
5. Jing Zhao
6. Jufang Zhao
7. Rong Zhang
8. Jiangqiong Gao
9. Haotian Pei
10. Dong Yu
11. Caixia Ma
This article has no evaluations. Latest version: Jun 13, 2025

Interpretable Machine Learning for Life Expectancy Prediction: A Comparative Study of Linear Regression, Decision Tree, and Random Forest

This article has 3 authors:
1. Roman Dolgopolyi
2. Ioanna Amaslidou
3. Agrippina Margaritou
This article has no evaluations. Latest version: Jun 26, 2025

Using machine learning to identify subgroups with the highest expected benefit in a population-based water, sanitation, handwashing, and nutrition intervention

This article has 10 authors:
1. Caitlin Hemlock
2. Laura H. Kwong
3. Lia C.H. Fernald
4. Alan E. Hubbard
5. John M. Colford
6. Fahmida Tofail
7. Md. Mahbubur Rahman
8. Sarker Parvez
9. Stephen P. Luby
10. Andrew N. Mertens
This article has no evaluations. Latest version: Jun 18, 2025
