Real-Time Customer Behavior Analysis Using Big Data Analytics and Machine Learning

Authors

Smit Patel

School of Computer Science and Applications, REVA University, Bangalore (India)

Dr. Lakshmi K.

School of Computer Science and Applications, REVA University, Bangalore (India)

Article Information

DOI: 10.51244/IJRSI.2026.1304000240

Subject Category: Computer Science

Volume/Issue: 13/4 | Page No: 2801-2813

Publication Timeline

Submitted: 2026-04-24

Accepted: 2026-04-30

Published: 2026-05-19

Abstract

This paper presents a real-time customer behavior analysis system using Big Data Analytics and machine learning techniques. The rapid growth of digital platforms, mobile applications, and transactional systems has led to the generation of large volumes of customer data, which traditional processing methods struggle to handle efficiently. The proposed framework integrates Apache Kafka for real-time data ingestion, Hadoop Distributed File System (HDFS) for scalable storage, and Apache Spark for high-speed data processing. Machine learning models including K-Means Clustering, Logistic Regression, and Association Rule Mining are applied for customer segmentation, purchase prediction, and product recommendation. The study addresses a key research gap by developing a unified, end-to-end pipeline that combines real-time processing with multiple analytical models and clearly defined evaluation metrics. Experimental results on a dataset of 500,000 customer records show that Logistic Regression achieves the highest accuracy of 90%, outperforming Decision Trees (88%) and K-Means (85%). The results demonstrate that the proposed approach enables improved customer targeting, enhanced retention strategies, and more effective data-driven decision-making.
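To illustrate the association rule mining step used for product recommendation, the sketch below computes support and confidence for item pairs over a handful of baskets. It is a minimal illustration under stated assumptions, not the authors' implementation: the baskets, the function name `pair_rules`, and both thresholds are hypothetical, and the exhaustive pair counting shown here stands in for whatever algorithm the pipeline runs on Spark, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets for illustration; not data from the paper.
BASKETS = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"laptop", "bag"},
    {"mouse", "keyboard"},
    {"laptop", "mouse", "keyboard"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return sorted (antecedent, consequent, support, confidence) tuples.

    support(A, B)  = baskets containing both A and B / all baskets
    confidence(A -> B) = baskets containing both A and B / baskets containing A
    The thresholds are assumed values chosen for this example.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(basket)  # sorted so each pair has one canonical order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


RULES = pair_rules(BASKETS)
for antecedent, consequent, support, confidence in RULES:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as laptop -> mouse would then drive a recommendation of the consequent item to customers whose basket contains the antecedent. At the scale the abstract describes (500,000 records), pairwise counting in a single process would likely be replaced by a distributed frequent-itemset algorithm.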

Keywords

Big Data Analytics, Customer Behavior Analysis
