A Unified Framework for Stock Price Prediction: Integrating NLP-Based Sentiment, Dimensionality Reduction and Regularization
Abstract
This study examines the effect of news articles on stock price prediction and evaluates the role of dimensionality reduction and regularization techniques in improving forecasting performance. Four natural language processing (NLP) variables (Sentiment Score, Sentiment Polarity, VADER Compound, and Lexicon Score) were extracted from news texts and combined with traditional time series indicators. Variable selection and dimensionality reduction were performed using Elastic Net, LASSO, PCA, PCA + Elastic Net, and PCA + LASSO. The resulting datasets, which combine time series and NLP-based variables, were used to fit ARIMAX, ANN, LSTM, and GRU models. The analyses, carried out through both simulation studies and applications to eight stock data series, revealed that incorporating NLP variables alongside technical indicators significantly improves prediction accuracy. Furthermore, hybrid approaches such as PCA combined with Elastic Net or LASSO proved effective in reducing the dimensionality of the feature space while preserving predictive power. Overall, the findings demonstrate that integrating dimensionality reduction, regularization, and sentiment-based news analysis into traditional time series forecasting provides a comprehensive and robust framework for more accurate stock price prediction.

MSC Classification: 68T07, 68T50, 62M10, 62H25, 62J99
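The PCA + LASSO and PCA + Elastic Net hybrids mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`standardize`, `pca_scores`, `elastic_net`, `pca_then_penalized`), the penalty parameterization (borrowed from the common scikit-learn convention), the choice of `k` components, the penalty strength `alpha`, and the synthetic stand-in data are all assumptions made for illustration. Features are standardized and projected onto the leading principal components, and a coordinate-descent Elastic Net (LASSO when `l1_ratio=1`) then shrinks the component coefficients or sets them to zero.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    return (X - mu) / sd


def pca_scores(X, k):
    """Project standardized X onto its first k principal components via SVD."""
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # explained-variance ratio, sorted descending
    return X @ Vt[:k].T, explained[:k]


def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net(Z, y, alpha, l1_ratio=1.0, n_iter=500, tol=1e-8):
    """Coordinate descent for the (assumed, scikit-learn-style) objective
    (1/2n)||y - Zw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    l1_ratio=1.0 gives LASSO. Assumes y is centered and Z is column-centered."""
    n, p = Z.shape
    w = np.zeros(p)
    col_sq = (Z**2).sum(axis=0) / n
    r = y.copy()  # residual y - Zw, starting from w = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial correlation of column j with the residual that excludes j.
            rho = Z[:, j] @ r / n + col_sq[j] * w[j]
            new = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new - w[j]
            if delta != 0.0:
                r -= Z[:, j] * delta
                w[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w


def pca_then_penalized(X, y, k, alpha, l1_ratio=1.0):
    """Hypothetical hybrid: PCA reduction followed by a penalized regression."""
    Z, explained = pca_scores(standardize(X), k)
    y_mean = y.mean()  # intercept, since Z and the centered target have zero mean
    w = elastic_net(Z, y - y_mean, alpha, l1_ratio)
    return w, y_mean, explained


# Synthetic stand-in data (hypothetical; not from the study).
rng = np.random.default_rng(0)
n = 250
technical = rng.normal(size=(n, 4))  # stand-ins for lagged technical indicators
sentiment = rng.normal(size=(n, 4))  # stand-ins for the four NLP sentiment variables
X = np.hstack([technical, sentiment])
y = 0.8 * technical[:, 0] + 0.5 * sentiment[:, 2] + 0.1 * rng.normal(size=n)

w, intercept, explained = pca_then_penalized(X, y, k=6, alpha=0.05, l1_ratio=0.5)
print(np.round(w, 3), "nonzero:", np.count_nonzero(w))
```

Principal component scores are mutually orthogonal, so each coordinate update is essentially exact and the solver converges in very few passes. The L1 part of the penalty then zeroes weak components, which trims the feature space beyond what PCA alone does. In practice `k` and `alpha` would be chosen by time-series-aware validation rather than fixed as here.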