Machine Learning for Sentiment-Based Corporate Disclosure Analytics: A Systematic Review of Data, Sentiment Representations, and Predictive Models

Ramon Abilio
Guilherme Palermo Coelho
Ana Estela Antunes da Silva

Abstract

Machine learning methods have been widely used to predict stock prices using technical indicators and sentiment features, mostly extracted from social media and news. However, less attention has been given to how sentiment-based textual features obtained from corporate reports are integrated into machine learning pipelines to predict firms' financial outcomes. To examine this issue, we conducted a systematic review of 42 studies published between 2014 and 2025. The review examines how datasets are constructed, how sentiment representations are defined, and how predictive models combine textual features with financial variables. Most studies focus on the U.S. stock market and rely on feature-engineered sentiment indices derived from lexicons or sentence-level classification. Regression-based and other supervised learning approaches remain dominant, while embedding-based representations and end-to-end deep learning architectures appear only sporadically. The literature also reveals constraints, including challenges in processing long financial documents, limited availability of labeled datasets, and strong geographic and linguistic concentration. In addition, the review identifies highly heterogeneous modeling approaches with limited convergence toward shared benchmark tasks. These findings highlight research opportunities for machine learning applications in finance and for the development of sentiment-based corporate disclosure analytics.

Version published to 10.21203/rs.3.rs-9053199/v1 on Research Square
Mar 12, 2026

Related articles

Deep Learning for Stock Market Prediction: A Systematic Review

This article has 2 authors:
1. Dean Rimmer
2. Martin Wonders
This article has no evaluations. Latest version: Mar 31, 2026

The Power of Words: Leveraging Deep Learning Techniques to Predict Hotel Ratings from User Reviews

This article has 3 authors:
1. Milena Nikolić
2. Miloš Stojanović
3. Marina Marjanović
This article has no evaluations. Latest version: Apr 14, 2026

Machine Learning in Stock Market Forecasting: A Comprehensive Review

This article has 4 authors:
1. Kamal Haddad
2. Moksh Khemka
3. Tanzim Redwan
4. Adib Ahmed
This article has no evaluations. Latest version: Apr 7, 2026
