Effective Credit Card Fraud Detection Using Data Mining Techniques
Authors
Lee Kong Chian Faculty of Engineering and Science (LKCFES), Universiti Tunku Abdul Rahman (UTAR), Rawang, Selangor (Malaysia)
Article Information
DOI: 10.51244/IJRSI.2025.1213CS008
Subject Category: Computer Science
Volume/Issue: 12/13 | Page No: 79-114
Publication Timeline
Submitted: 2025-10-24
Accepted: 2025-10-30
Published: 2025-11-15
Abstract
Credit card fraud causes financial and security problems for businesses and consumers around the world. As fraudulent activities become more sophisticated, effective and efficient fraud detection systems have become essential. This study examines how machine learning techniques can be applied to detect credit card fraud and how to overcome challenges such as class imbalance, high dimensionality, and the complexity of real-world datasets. The IEEE-CIS Fraud Detection dataset, a publicly available and highly complex dataset, was used to evaluate the performance of the models. The study compares five machine learning models, namely Logistic Regression, Random Forest, XGBoost, LightGBM, and Deep Neural Networks (DNN), to establish a performance baseline on the full dataset using stratified k-fold cross-validation. Feature engineering was then performed on the best-performing model (LightGBM), using gain-based importance and cumulative feature importance to identify and retain the most relevant features. The model was retrained on the reduced dataset, and its performance was compared with that on the full dataset to assess the effectiveness of the feature engineering process. An important finding is that feature engineering reduced dataset dimensionality and improved the model's predictive performance, especially for detecting fraudulent transactions. The results therefore point to ensemble methods and advanced feature selection techniques as a promising basis for constructing robust fraud detection systems. This research adds to the literature on machine learning applications in fraud detection and advances our understanding of how to balance computational efficiency, interpretability, and accuracy.
This study addresses the limitations of traditional approaches and uses state-of-the-art machine learning methodologies to provide practical and theoretical contributions to the fight against credit card fraud, to future research, and to real-world implementations.
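The two procedures named in the abstract, stratified k-fold splitting and selection by cumulative gain-based importance, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the feature names, the gain values, and the 0.95 threshold are all assumptions chosen for illustration, since the abstract does not state them. In practice the gain values would come from a trained model, for example from a LightGBM booster's `feature_importance(importance_type="gain")`.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold receives a similar share of the class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def select_by_cumulative_gain(gain_by_feature, threshold=0.95):
    """Keep top-ranked features until their normalised gain reaches `threshold`."""
    total = sum(gain_by_feature.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    ranked = sorted(gain_by_feature.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for name, gain in ranked:
        kept.append(name)
        cumulative += gain / total
        if cumulative >= threshold:
            break
    return kept


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 10 fraudulent (1) and 90 legitimate (0).
    labels = [1] * 10 + [0] * 90
    folds = stratified_folds(labels, k=5)
    print([sum(labels[i] for i in fold) for fold in folds])

    # Hypothetical gain values; the feature names are placeholders.
    gains = {"amount": 50.0, "card_id": 30.0, "device": 15.0, "addr": 4.0, "noise": 1.0}
    print(select_by_cumulative_gain(gains, threshold=0.9))
```

Stratification matters here because fraud is the minority class: an unstratified split could leave some folds with almost no fraudulent samples, which would make per-fold metrics unstable. The cumulative threshold trades dimensionality against retained signal, so a lower value keeps fewer features.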
Keywords
Machine Learning, Data Mining