Optimizing Retail Decision-Making Through Big Data and Machine Learning Integration

Abstract

This study explores how modern big data technologies can transform operations and customer experience in the retail industry. With the rapid growth of structured and unstructured data from online and in-store transactions, businesses face increasing pressure to process information quickly and accurately. The project demonstrates how Apache Hive, Pig, and Impala can be applied to large-scale retail data analytics. A retail dataset of more than forty-nine thousand sales transactions was analyzed to uncover sales trends, top-performing products, and revenue patterns. The tools were compared in terms of execution time, scalability, and memory utilization. Impala achieved the fastest processing, making it suitable for real-time analytics, while Hive and Pig proved effective for batch processing and data transformation. The paper also discusses how machine learning algorithms, such as regression, clustering, and decision trees, can be integrated to improve predictive accuracy and provide personalized customer insights. The findings suggest that combining these technologies yields a scalable, cost-effective framework for optimizing marketing, forecasting demand, and improving business decision-making in the retail sector.
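The abstract describes aggregations such as total revenue per product, top-performing products, and revenue over time. The following is a minimal, in-memory Python sketch of those aggregations. It assumes a hypothetical record layout (`sale_date`, `product`, `quantity`, `unit_price`), and the function names are illustrative only. It is not the study's Hive, Pig, or Impala code. It simply mirrors the GROUP BY style of query those engines would run over the full dataset at scale.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction record: (sale_date, product, quantity, unit_price).
# The field layout is an assumption for illustration, not the study's schema.
Transaction = tuple[date, str, int, float]


def revenue_by_product(transactions: list[Transaction]) -> dict[str, float]:
    """Total revenue per product, analogous to GROUP BY product in HiveQL."""
    totals: dict[str, float] = defaultdict(float)
    for _, product, qty, price in transactions:
        totals[product] += qty * price
    return dict(totals)


def top_products(transactions: list[Transaction], n: int = 3) -> list[tuple[str, float]]:
    """The n products with the highest total revenue, highest first."""
    totals = revenue_by_product(transactions)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def monthly_revenue(transactions: list[Transaction]) -> dict[tuple[int, int], float]:
    """Total revenue per (year, month), returned in chronological order."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for sale_date, _, qty, price in transactions:
        totals[(sale_date.year, sale_date.month)] += qty * price
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Made-up sample rows, used only to exercise the functions.
    sample = [
        (date(2023, 1, 5), "mug", 4, 2.50),
        (date(2023, 1, 9), "lamp", 1, 30.00),
        (date(2023, 2, 2), "mug", 10, 2.50),
        (date(2023, 2, 14), "candle", 6, 4.00),
    ]
    print(top_products(sample, n=2))
    print(monthly_revenue(sample))
```

On a cluster, the same logic would be expressed as a `GROUP BY` query in Hive or Impala, or as a `GROUP ... FOREACH` pipeline in Pig, so that the engine distributes the aggregation across nodes.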