A Fusion of Statistical and Machine Learning Methods: GARCH-XGBoost for Improved Volatility Modelling of the JSE Top40 Index

Israel Maingo
Thakhani Ravele
Caston Sigauke


Abstract

Volatility modelling is a key component of financial risk management, portfolio optimisation, and forecasting, particularly for market indices such as the JSE Top40 Index, which serves as a benchmark for the South African stock market. This study investigates volatility modelling of JSE Top40 Index log-returns from 2011 to 2025 using a two-step hybrid approach that integrates statistical and machine learning techniques. An ARMA(3,2) model was chosen as the optimal mean model using the auto.arima() function from the forecast package in R (version 4.4.0). Several GARCH variants, including sGARCH(1,1), GJR-GARCH(1,1), and EGARCH(1,1), were fitted under various conditional error distributions (i.e., STD, SSTD, GED, SGED, and GHD). Model selection was based on the AIC, BIC, HQIC, and LL evaluation criteria, and ARMA(3,2)-EGARCH(1,1) was selected as the best model on these criteria. Residual diagnostics indicated that the model adequately captured autocorrelation, conditional heteroskedasticity, and asymmetry in JSE Top40 log-returns. Volatility persistence was also detected, consistent with the well-documented persistence of financial volatility. Thereafter, the ARMA(3,2)-EGARCH(1,1) model was coupled with XGBoost, using standardised residuals extracted from the fitted ARMA(3,2)-EGARCH(1,1) model as lagged features. The data were split into training (60%), testing (20%), and calibration (20%) sets. Based on the lowest values of the forecast accuracy measures (i.e., MASE, RMSE, MAE, MAPE, and sMAPE), along with prediction intervals and their evaluation metrics (i.e., PICP, PINAW, PIWA, and PINAD), the hybrid model captured residual nonlinearities left by the standalone ARMA(3,2)-EGARCH(1,1) model and demonstrated improved forecasting accuracy. This highlights the robustness and suitability of the hybrid ARMA(3,2)-EGARCH(1,1)-XGBoost model for financial risk management in emerging markets and underscores the strengths of integrating statistical and machine learning methods in financial time series modelling.

Keywords: ARMA(3,2), EGARCH(1,1), Forecasting, Hybrid model, JSE Top40 Index, Machine Learning, Risk Management, Time Series, Volatility Modelling, XGBoost.
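The second step described in the abstract (lagged features from standardised residuals, a 60/20/20 split, and prediction-interval evaluation) can be illustrated with a minimal Python sketch. It is not the authors' R implementation. The function names, the number of lags, the choice of target series, and the chronological ordering of the three sets are assumptions made here for illustration; the abstract does not specify them. PICP and PINAW follow their usual definitions: the share of observations falling inside the interval, and the mean interval width divided by the range of the observations.

```python
from typing import List, Sequence, Tuple


def make_lagged_features(
    z: Sequence[float], target: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Rows [z[t-1], ..., z[t-n_lags]] paired with target[t].

    `z` stands for the standardised residuals of the fitted volatility
    model; `target` is whatever series is being forecast (an assumption).
    """
    X, y = [], []
    for t in range(n_lags, len(z)):
        X.append([z[t - k] for k in range(1, n_lags + 1)])
        y.append(target[t])
    return X, y


def chronological_split(rows: Sequence, train_pct: int = 60, test_pct: int = 20):
    """Split in time order, no shuffling; the remainder is the calibration set.

    The order train -> test -> calibration is assumed, not taken from the source.
    """
    n = len(rows)
    n_train = n * train_pct // 100
    n_test = n * test_pct // 100
    return rows[:n_train], rows[n_train:n_train + n_test], rows[n_train + n_test:]


def picp(y_true, lower, upper) -> float:
    """Prediction interval coverage probability."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)


def pinaw(y_true, lower, upper) -> float:
    """Prediction interval normalised average width."""
    widths = [hi - lo for lo, hi in zip(lower, upper)]
    return (sum(widths) / len(widths)) / (max(y_true) - min(y_true))


# Toy residuals (made-up numbers, for illustration only).
z = [0.1, -0.4, 0.3, 1.2, -0.7, 0.0, 0.5, -0.2, 0.9, -1.1, 0.6, 0.2]
X, y = make_lagged_features(z, target=z, n_lags=2)
X_train, X_test, X_cal = chronological_split(X)
print(len(X_train), len(X_test), len(X_cal))
```

The training rows would then be passed to a gradient-boosting regressor, with the calibration set reserved for constructing prediction intervals. Splitting in time order rather than at random avoids leaking future observations into the training set.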

Version published to 10.20944/preprints202508.0247.v1
Aug 4, 2025

Related articles

Machining, the Better – A Look at Machine Learning-Based Volatility Using the SVR-GARCH for Frontier Market Equities

This article has 1 author:
1. Carl Hope Korkpoe
This article has no evaluations. Latest version Jul 21, 2025

Applying XGBoost for Time Series Prediction in Financial Market Data

This article has 5 authors:
1. Xia Xiao
2. Fang Wang
3. Hongmei Xu
4. Dandan Wang
5. Yefeng Zhang
This article has no evaluations. Latest version Jul 24, 2025

Multi-Model Approach for Stock Price Prediction and Trading Recommendations

This article has 6 authors:
1. Zhenrui Chen
2. Zhibo Dai
3. Huiyan Xing
4. Junyu Chen
5. Menghao Huo
6. Kuan Lu
This article has no evaluations. Latest version Jul 16, 2025
