Real-Time Big Data Technologies in Retail: Enhancing Personalization and Operational Efficiency

Abstract

This study explores the integration of Big Data Technologies (BDT) in the retail industry, emphasizing their role in enabling real-time data processing and personalized customer experiences. It examines how technologies such as Apache Hive, Impala, and Spark can process large-scale retail datasets to identify purchasing patterns, refine customer segmentation, and support predictive analytics. The paper introduces four types of analytics (descriptive, diagnostic, predictive, and prescriptive) alongside machine learning approaches, including supervised, unsupervised, and reinforcement learning, as well as Natural Language Processing (NLP). Together, these tools enable retailers to deliver dynamic, personalized marketing, improve operational efficiency, and increase revenue. A practical experiment on a Kaggle-based retail dataset compares the performance of Hive, Impala, and Spark using SQL-like operations and MapReduce batch processing. Results show that while Impala excels in speed, Spark provides flexibility for complex data science tasks. The study concludes with an analysis of key considerations for effective big data deployment in retail environments: data storage, privacy, integration, and processing speed.
