Adfraud System: Real-Time Ad Click Fraud Detection Using Stacking Ensemble, Deep Learning, and an Agentic AI Chatbot

Authors

Prof. Ramya Prabhakaran (guide)

Department of Computer Engineering, Rizvi College of Engineering, University of Mumbai, Maharashtra, India

Omkar Sawant

Department of Computer Engineering, Rizvi College of Engineering, University of Mumbai, Maharashtra, India

Shrikar Gujjeti

Department of Computer Engineering, Rizvi College of Engineering, University of Mumbai, Maharashtra, India

Nikhil Jain

Department of Computer Engineering, Rizvi College of Engineering, University of Mumbai, Maharashtra, India

Article Information

DOI: 10.51244/IJRSI.2026.1304000174

Subject Category: Computer Science and Engineering

Volume/Issue: 13/4 | Page No: 2047-2052

Publication Timeline

Submitted: 2026-04-10

Accepted: 2026-04-15

Published: 2026-05-12

Abstract

Ad click fraud drains billions from advertiser budgets annually through bots and click farms that generate fake clicks with no genuine engagement. This paper presents Adfraud System, a production-ready fraud detection system that combines a novel 18-signal real-time feature engineering engine with nine ML/DL algorithms and an agentic AI chatbot. Operating on the public TalkingData AdTracking benchmark (100,000 records; 0.227% positive class), the system engineers fraud signals from raw click telemetry, including click burst velocity, device–OS consistency, impossible geolocation, subnet botnet flags, and user-agent entropy. These signals feed a Stacking Classifier (LR, RF, XGBoost, and LightGBM base learners with a logistic-regression meta-learner) that achieves 97.4% accuracy, 96.8% F1, and an AUC of 0.98, a statistically significant improvement over all eight baselines (Friedman χ²=47.3, p<0.0001). SHAP attribution identifies impossible geolocation and device–OS mismatch as the strongest discriminators. The deployed Flask platform exposes 20 REST endpoints, SSE live monitoring, batch processing, model drift detection, multi-website API-key tracking, and an agentic AI chatbot with six specialised fraud-analysis tools. The system is fully containerised via Docker.
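To make two of the fraud signals named in the abstract more concrete, the following is a minimal Python sketch of how click burst velocity and user-agent entropy could be computed from raw click telemetry. The abstract does not give the paper's definitions, so everything here is an assumption for illustration: the names `BurstVelocityTracker` and `user_agent_entropy`, the 10-second window, keying by IP address, and the use of character-level Shannon entropy are all hypothetical choices, not the authors' implementation.

```python
import math
from collections import Counter, deque


def user_agent_entropy(user_agent: str) -> float:
    """Character-level Shannon entropy (bits) of a user-agent string.

    Assumption: the paper's "user-agent entropy" may be defined differently.
    """
    if not user_agent:
        return 0.0
    counts = Counter(user_agent)
    total = len(user_agent)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


class BurstVelocityTracker:
    """Counts clicks per IP inside a sliding time window.

    Assumptions: window length and IP-based keying are illustrative, and
    timestamps for a given IP arrive in non-decreasing order.
    """

    def __init__(self, window_seconds: float = 10.0):
        self.window_seconds = window_seconds
        self._clicks = {}  # ip -> deque of recent click timestamps

    def observe(self, ip: str, timestamp: float) -> int:
        """Record one click and return the click count within the window."""
        recent = self._clicks.setdefault(ip, deque())
        recent.append(timestamp)
        # Evict clicks older than the window relative to the newest click.
        while timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent)


# Usage: three rapid clicks build up the count; a late click resets it.
tracker = BurstVelocityTracker(window_seconds=10.0)
burst_counts = [tracker.observe("10.0.0.1", t) for t in (0.0, 1.0, 2.0, 15.0)]
print(burst_counts)                # [1, 2, 3, 1]
print(user_agent_entropy("abcd"))  # 2.0
```

A deque per IP keeps each update cheap, since old timestamps are evicted from the left as new ones arrive. In a real-time setting the resulting counts and entropy values would be joined with the remaining signals before being passed to the classifier.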

Keywords

Ad click fraud; stacking ensemble; LightGBM; XGBoost; LSTM; SHAP; feature engineering; agentic AI; real-time monitoring; Flask; Docker
