Real-Time Big Data Technologies in Retail: Enhancing Personalization and Operational Efficiency

Addy Arif Bin Mahathir
Charan Teja Nagisettygari
Noor UL Amin
Sai Rama Mahalingam
Sivamuganathan A/L Mohana Dass

Abstract

This study explores the integration of Big Data Technologies (BDT) in the retail industry, emphasizing their role in enabling real-time data processing and personalized customer experiences. It examines how technologies such as Apache Hive, Impala, and Spark can process large-scale retail datasets to identify purchasing patterns, refine customer segmentation, and facilitate predictive analytics. The paper introduces four types of analytics (descriptive, predictive, prescriptive, and diagnostic) alongside machine learning algorithms, including supervised, unsupervised, and reinforcement learning, as well as Natural Language Processing (NLP). Together, these tools enable retailers to deliver dynamic, personalized marketing, enhance operational efficiency, and increase revenue. A practical experiment on a Kaggle-based retail dataset compares the performance of Hive, Impala, and Spark using SQL-like operations and MapReduce batch processing. Results show that while Impala excels in speed, Spark provides flexibility for complex data science tasks. The study concludes with an analysis of key considerations for effective big data deployment in retail environments: data storage, privacy, integration, and processing speed.
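As a rough illustration of the kind of SQL-like operation the abstract refers to, the sketch below times a simple revenue-per-category aggregation. It is not the study's code: the `transactions` table, its columns, the sample rows, and the `revenue_by_category` helper are all hypothetical, and Python's built-in `sqlite3` stands in for Hive, Impala, or Spark SQL so that the example runs without a cluster. The query uses only a plain `GROUP BY` aggregation, which those engines are also expected to accept.

```python
import sqlite3
import time

# Hypothetical sample rows: (customer_id, category, amount).
# These are made up for illustration and are not from the study's dataset.
SAMPLE_ROWS = [
    (1, "grocery", 20.0),
    (1, "electronics", 150.0),
    (2, "grocery", 35.5),
    (3, "apparel", 60.0),
    (2, "apparel", 40.0),
]

# Order count and revenue per category, highest revenue first.
QUERY = """
    SELECT category, COUNT(*) AS n_orders, SUM(amount) AS revenue
    FROM transactions
    GROUP BY category
    ORDER BY revenue DESC, category ASC
"""


def revenue_by_category(rows):
    """Load rows into an in-memory table, run QUERY, and time the query only."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE transactions "
            "(customer_id INTEGER, category TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
        start = time.perf_counter()
        result = conn.execute(QUERY).fetchall()
        elapsed = time.perf_counter() - start
        return result, elapsed
    finally:
        conn.close()


if __name__ == "__main__":
    result, elapsed = revenue_by_category(SAMPLE_ROWS)
    for category, n_orders, revenue in result:
        print(f"{category}: {n_orders} orders, revenue {revenue:.2f}")
    print(f"query time: {elapsed:.6f}s")
```

Timing only the query, and not the table load, loosely mirrors how an engine comparison would isolate execution speed from ingestion. A real comparison would run the same statement against each engine on the full dataset and repeat the measurement several times.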

Version published to 10.20944/preprints202509.0257.v1
Sep 2, 2025

Related articles

Customer Purchase Behavior Analysis and Visualization Using Big Data Analytics: A PySpark-Based Apache Spark Framework

This article has 2 authors:
1. Pritam Chaudhari
2. Poonam Sawant
This article has no evaluations. Latest version Jan 20, 2026

A Scalable Big Data Framework for Real-Time Predictive Maintenance in Industrial IoT

This article has 3 authors:
1. Muhammad Aasad
2. Yaser Alhasawi
3. Abderahman Rejeb
This article has no evaluations. Latest version Jan 20, 2026

AIM² Framework for Smart Marketing Innovation: AI-Driven Consumer Analytics Using SOR, Neural Networks, and XGBoost in Saudi Retail

This article has 4 authors:
1. Fawaz Khaled Alarfaj
2. Mohamed Badouch
3. Hikmat Ullah Khan
4. Mehdi Boutaounte
This article has no evaluations. Latest version Jan 9, 2026
