Benchmarking Machine Learning Models for ESG Prediction in South Korea Using News-Derived Time Series
Abstract
Existing ESG ratings have limitations such as disclosure delays, inconsistencies, and uneven coverage, particularly in non-English markets. This paper addresses these issues by establishing the first machine learning benchmark for ESG prediction in the Korean market using news-derived time-series features. A standardized dataset of 278 Korean firms was constructed, and monthly sentiment and ESG-relevance features were generated from news using Korean-specific language models. A mask-aware CNN explicitly handles missing data by distinguishing observed months from imputed ones. The model achieved a Mean Absolute Error (MAE) of 17.9, a Root Mean Squared Error (RMSE) of 22.0, an R² of 0.12, and a Spearman's ρ of 0.38, demonstrating that temporal modeling and explicit handling of missing data are crucial for improving predictive accuracy.
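The following is a minimal sketch, not the authors' implementation, of two ideas the abstract names: building a mask-aware input that keeps observed months distinct from imputed ones, and computing the four reported metrics. It assumes missing months are encoded as NaN and filled with zeros, and that the mask is appended as extra feature columns. The function names (`build_mask_aware_input`, `average_ranks`, `regression_metrics`) are hypothetical and do not come from the paper.

```python
import numpy as np

def build_mask_aware_input(series):
    """Pair each monthly feature with a binary 'observed' indicator.

    series: array of shape (n_months, n_features), NaN where no news exists.
    Returns an array of shape (n_months, 2 * n_features).
    """
    series = np.asarray(series, dtype=float)
    observed = ~np.isnan(series)              # True where the month has data
    filled = np.where(observed, series, 0.0)  # zero imputation (an assumption)
    return np.concatenate([filled, observed.astype(float)], axis=1)

def average_ranks(x):
    """Rank values from 1..n, giving tied values their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tie = x == value
        ranks[tie] = ranks[tie].mean()
    return ranks

def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, R^2, and Spearman's rho for paired scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    # Spearman's rho is the Pearson correlation of the rank-transformed values.
    rho = np.corrcoef(average_ranks(y_true), average_ranks(y_pred))[0, 1]
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "r2": float(1.0 - ss_res / ss_tot),
        "spearman": float(rho),
    }

# Toy usage with made-up numbers (not data from the paper).
toy_series = [[0.5, np.nan], [np.nan, 0.2]]
toy_input = build_mask_aware_input(toy_series)
toy_metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
```

With the mask columns present, a convolutional model can in principle learn to weight a zero that stands for "no news this month" differently from a genuinely neutral observed value. R² and Spearman's ρ measure different things: the first reflects how much variance in the scores is explained, the second only how well the predicted ordering of firms matches the true ordering.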