Adfraud System: Real-Time Ad Click Fraud Detection Using Stacking Ensemble, Deep Learning, and an Agentic AI Chatbot
Authors
Prof. Ramya Prabhakaran (guide)
Department of Computer Engineering, Rizvi College of Engineering, University of Mumbai, Maharashtra, India
Article Information
DOI: 10.51244/IJRSI.2026.1304000174
Subject Category: Computer Science and Engineering
Volume/Issue: 13/4 | Page No: 2047-2052
Publication Timeline
Submitted: 2026-04-10
Accepted: 2026-04-15
Published: 2026-05-12
Abstract
Ad click fraud drains billions from advertiser budgets every year through bots and click farms that generate fake clicks with no genuine engagement. This paper presents Adfraud System, a production-ready fraud detection platform that combines a novel 18-signal real-time feature engineering engine with nine ML/DL algorithms and an agentic AI chatbot. Operating on the public TalkingData AdTracking benchmark (100,000 records; 0.227% positive class), the system engineers fraud signals from raw click telemetry, including click burst velocity, device–OS consistency, impossible geolocation, subnet botnet flags, and user-agent entropy. These signals feed a Stacking Classifier (LR, RF, XGBoost, and LightGBM base learners with a logistic-regression meta-learner) that achieves 97.4% accuracy, 96.8% F1, and an AUC of 0.98, a statistically significant improvement over all eight baselines (Friedman χ²=47.3, p<0.0001). SHAP attribution identifies impossible geolocation and device–OS mismatch as the strongest discriminators. The deployed Flask platform exposes 20 REST endpoints, SSE live monitoring, batch processing, model drift detection, multi-website API-key tracking, and an agentic AI chatbot with six specialised fraud-analysis tools. The system is fully containerised with Docker.
Keywords
Ad click fraud; stacking ensemble; LightGBM; XGBoost; LSTM; SHAP; feature engineering; agentic AI; real-time monitoring; Flask; Docker
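Two of the fraud signals named in the abstract, click burst velocity and user-agent entropy, can be illustrated with a minimal sketch. The abstract does not give the exact definitions used by the system, so everything below is an assumption made for illustration: the function names, the 10-second trailing window, and the choice of character-level Shannon entropy are hypothetical and may differ from the paper's feature engine.

```python
import math
from bisect import bisect_left
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of a string, in bits per character.

    Assumed definition: the paper does not state how user-agent entropy
    is computed.
    """
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # Sum of p * log2(1/p) over the distinct characters.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def click_burst_velocity(timestamps: list[float], window_s: float = 10.0) -> list[int]:
    """For each click, count clicks from the same source in the trailing window.

    `timestamps` must be sorted ascending (seconds). The count includes the
    click itself, so an isolated click scores 1. The 10-second default is a
    hypothetical choice, not a value taken from the paper.
    """
    counts = []
    for i, t in enumerate(timestamps):
        # Index of the first click at or after (t - window_s).
        start = bisect_left(timestamps, t - window_s, 0, i + 1)
        counts.append(i - start + 1)
    return counts


# Hypothetical click times (seconds) for one IP: a burst, then a lone click.
clicks = [0.0, 1.0, 2.0, 3.0, 60.0]
print(click_burst_velocity(clicks))  # [1, 2, 3, 4, 1]
print(shannon_entropy("aaaa"), shannon_entropy("abcd"))  # 0.0 2.0
```

Under these assumptions, a rising burst count flags rapid repeated clicks from one source, while an unusually low or high entropy value may indicate a templated or randomised user-agent string. In a pipeline like the one described, such values would be two columns among the 18 engineered signals passed to the classifiers.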