Machine Learning Prediction of Goat Reproductive Success in Smallholder Farming Systems


Abstract

Predicting whether a goat doe will conceive following a mating event is a computationally tractable binary classification problem with direct implications for smallholder farm management in resource-constrained settings. This study developed, evaluated, and compared five machine learning classification algorithms, Logistic Regression (LR), Random Forest (RF), XGBoost, Support Vector Machine (SVM), and Artificial Neural Network (ANN), for predicting conception success using 900 indigenous doe-mating records from 240 smallholder farms across four agro-ecological zones of Uganda. The study employed a rigorous evaluation framework comprising stratified 80/20 train-test splitting, 10-fold cross-validation, within-fold SMOTE resampling, and 11 performance metrics covering discrimination, calibration, and class-specific detection. Logistic Regression achieved the highest test-set ROC-AUC (0.6908) and Average Precision (0.8237) and the best probability calibration (Brier Score = 0.2207), with a near-zero overfitting gap (0.010) attributable to L2 regularisation. Random Forest achieved the highest Sensitivity (0.9098) and pregnant-class F1-Score (0.8014) but showed severe overfitting (training-test AUC gap = 0.335), reflecting a mismatch between model complexity and sample size. SVM and ANN both showed near-random performance (AUC = 0.540 and 0.519, respectively) because their parameter counts far exceeded what the training data could constrain. Learning curve analysis confirmed that all algorithms were still improving at n = 720 training records, indicating that the target performance threshold of AUC = 0.750 is achievable with larger samples. Based on its superior calibration, generalisation, and interpretability, Logistic Regression is selected as the primary model for a three-tier risk classification decision-support tool (Low, Moderate, and High Risk) deployable as a smartphone application or paper-based scoring card in resource-constrained field settings across Uganda.
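To make the calibration metric and the three-tier output concrete, the following is a minimal Python sketch, not the authors' implementation. It computes a Brier score from predicted conception probabilities and maps each probability to a Low, Moderate, or High Risk tier. The abstract does not state the tier cut-offs, so the thresholds below (0.70 and 0.40) and the function names are illustrative assumptions only.

```python
from typing import Sequence

# ASSUMPTION: the abstract gives no tier thresholds; these values are
# hypothetical and chosen only to illustrate the mapping.
LOW_RISK_MIN_PROB = 0.70   # P(conception) >= 0.70 -> "Low Risk"
HIGH_RISK_MAX_PROB = 0.40  # P(conception) <  0.40 -> "High Risk"


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome.

    Lower is better; 0.0 is a perfect probabilistic forecast.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equal in length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def risk_tier(p_conception: float) -> str:
    """Map a predicted conception probability to a (hypothetical) risk tier."""
    if not 0.0 <= p_conception <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_conception >= LOW_RISK_MIN_PROB:
        return "Low Risk"
    if p_conception < HIGH_RISK_MAX_PROB:
        return "High Risk"
    return "Moderate Risk"


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    outcomes = [1, 0, 1, 1]
    probabilities = [0.9, 0.2, 0.6, 0.3]
    print(brier_score(outcomes, probabilities))
    print([risk_tier(p) for p in probabilities])
```

For reference, a model that always predicts a probability of 0.5 scores a Brier score of 0.25 regardless of the outcomes, which gives a simple baseline against which a reported value such as 0.2207 can be read.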
