Predicting Daily Stock Price Movements Using Data Mining Techniques: A Comparative Analysis of Logistic Regression, Decision Tree, Random Forest, and XGBoost on Yahoo Finance Time-Series Data

Soobia Saeed

Abstract

This study assesses how well several supervised machine learning methods predict short-term stock market movements from historical financial time-series data. The analysis uses a Yahoo Finance dataset covering 2018-2023, with more than 1,200 daily trading records, and asks whether the next day's closing price will be higher or lower. To ensure data quality and avoid temporal leakage, a thorough pre-processing procedure was applied: missing-value checks, outlier smoothing, feature extraction with technical indicators such as moving averages, normalization, and chronological splitting. Four data mining models (Logistic Regression, Decision Tree, Random Forest, and XGBoost) were built and evaluated on accuracy, precision, recall, and F1-score, using time-aware validation through Time Series Split. Logistic Regression achieved the highest recall (1.0) and F1-score (0.67), identifying all upward price movements, while Random Forest and XGBoost achieved better precision (0.5248) and overall accuracy (0.5163), indicating a more balanced trade-off between false positives and false negatives. The Decision Tree model was easy to interpret but was the least effective in a highly volatile market setting because it generalized poorly. Overall, the findings highlight the difficulty of predicting inherently volatile stock markets; nevertheless, well-designed technical features combined with supervised learning can still uncover economically meaningful patterns. The study recommends periodic model retraining, adaptation to market regimes, extension to multi-stock trading, and testing frameworks that integrate backtesting to support real-world applicability.
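The labelling, chronological validation, and metrics described in the abstract can be illustrated with a minimal sketch. This is not the author's code: it is written in plain Python under stated assumptions. The helper names (`next_day_labels`, `trailing_mean`, `expanding_splits`, `scores`) are hypothetical, the prices are a synthetic random walk rather than Yahoo Finance data, and the expanding-window split is a hand-rolled stand-in in the spirit of a time-series split, not the library routine the study used.

```python
import random

def next_day_labels(closes):
    # 1 if tomorrow's close is above today's, else 0; the last day has no label.
    return [1 if closes[i + 1] > closes[i] else 0 for i in range(len(closes) - 1)]

def trailing_mean(closes, window):
    # Simple moving average built only from today's and earlier closes (no look-ahead).
    return [
        None if i + 1 < window else sum(closes[i + 1 - window:i + 1]) / window
        for i in range(len(closes))
    ]

def expanding_splits(n_samples, n_splits):
    # Expanding-window folds: each test block lies strictly after its training data.
    fold = n_samples // (n_splits + 1)
    for k in range(n_splits):
        train_end = n_samples - (n_splits - k) * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))

def scores(y_true, y_pred):
    # Accuracy, precision, recall, and F1 for the positive ("up") class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Synthetic random-walk closes stand in for real market data (assumption).
rng = random.Random(0)
closes = [100.0]
for _ in range(1200):
    closes.append(closes[-1] * (1 + rng.gauss(0, 0.01)))

labels = next_day_labels(closes)
sma5 = trailing_mean(closes, 5)  # example technical feature; unused by the baseline

for train_idx, test_idx in expanding_splits(len(labels), n_splits=5):
    y_test = [labels[i] for i in test_idx]
    always_up = [1] * len(y_test)  # naive baseline: predict "up" every day
    print(len(train_idx), len(test_idx), scores(y_test, always_up))
```

The always-up baseline is included because it helps in reading the reported numbers. When recall is 1.0, F1 reduces to 2P/(1+P), so an F1-score of 0.67 corresponds to a precision of roughly 0.50. That is the pattern a classifier produces when it predicts "up" on nearly every day, although the abstract does not state that this is what the Logistic Regression model did.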

Version published to 10.20944/preprints202511.2007.v1
Nov 26, 2025

Related articles

Directional Forecasting of WTI and Brent Crude Oil Prices: A Machine Learning Approach with Technical Indicators at Daily, Weekly, and Monthly Frequencies

This article has 3 authors:
1. Badr Alnssyan
2. Muhammad Ali
3. Muhammad Ahmad
This article has no evaluations. Latest version: Dec 16, 2025

Applying Multiple Linear Regression to Enhance Short-Term Stock Forecasting Accuracy

This article has 2 authors:
1. TOUSIF AL RASHID
2. Raj Kumar
This article has no evaluations. Latest version: Dec 15, 2025

Construction and analysis of data model for financial market volatility prediction based on support vector machine

This article has 1 author:
1. XiaoMeng Su
This article has no evaluations. Latest version: Jan 21, 2026
