Machine Learning-Driven Anomaly Detection in a Large-Scale Database Systems: A Systematic Literature Review

Large-scale database systems form the backbone of core processes in financial services, healthcare, e-commerce, and government infrastructure. As these systems grow in size, speed, and complexity, organisations must detect anomalies such as security breaches, fraudulent transactions, performance degradation, and data corruption. Financial institutions incur annual fraud losses exceeding $1.2 trillion, and performance anomalies can trigger cascading system failures. Traditional rule-based and purely statistical approaches struggle with the complexity and dynamism of modern databases, often resulting in brittle detection rules, high false-positive rates, and alert fatigue among operations teams. This systematic literature review (SLR) provides an overview of machine learning (ML) and deep learning (DL) techniques for detecting anomalies in large-scale database systems and transactions. Following the PRISMA 2020 guidelines, evidence from 43 studies published between 2015 and 2025 was synthesised. Initially, 1,247 articles were identified across IEEE Xplore, ACM Digital Library, Scopus, ProQuest, and ResearchGate; these were then systematically screened and evaluated using rigorous inclusion criteria and a validated 10-criterion quality assessment framework. The reviewed studies achieved a mean quality score of 7.8 out of 10, with 74% rated as high quality.
The review addresses four research questions, covering the types of anomalies, the ML methods used to detect them, implementation issues, and practical implications. The main findings show that unsupervised and semi-supervised paradigms predominate (75% of reviewed approaches) because labelled anomaly data are scarce in production settings. LSTM-based autoencoders (29 of 43 studies), Isolation Forest models (34 of 43 studies), and Graph Neural Networks deliver the strongest detection performance, with reported F1-scores above 0.90 and inference latency below 50 ms. Best practice has shifted to hybrid multi-tier approaches that combine Isolation Forest for rapid screening with LSTM autoencoders for more detailed analysis, achieving a 30-50% reduction in false positives over single-model baselines with sub-100 ms response times for real-time fraud detection.
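The control flow of such a two-tier cascade can be sketched as follows. This is a minimal illustration, not the method of any reviewed study: a z-score screen stands in for the Isolation Forest tier, a windowed mean deviation stands in for the LSTM autoencoder's reconstruction error, and all function names, thresholds, and data values are hypothetical.

```python
from statistics import mean, pstdev


def screen_score(value, baseline_mean, baseline_std):
    """Tier 1 stand-in (cheap): absolute z-score against a baseline.

    A real system would use an Isolation Forest score here.
    """
    if baseline_std == 0:
        return 0.0
    return abs(value - baseline_mean) / baseline_std


def detailed_score(window, baseline_mean):
    """Tier 2 stand-in (costlier): mean absolute deviation over a window.

    A real system would use an LSTM autoencoder's reconstruction error here.
    """
    return mean([abs(v - baseline_mean) for v in window])


def two_tier_detect(values, baseline, screen_threshold=3.0,
                    detail_threshold=10.0, window=3):
    """Return (alert_indices, escalated_count) for a sequence of values.

    Only points that exceed the tier 1 threshold are passed to tier 2,
    and only points that also exceed the tier 2 threshold raise an alert.
    """
    mu, sigma = mean(baseline), pstdev(baseline)
    alerts = []
    escalated = 0
    for i, v in enumerate(values):
        if screen_score(v, mu, sigma) < screen_threshold:
            continue  # cleared by the fast screen, no further work
        escalated += 1
        context = values[max(0, i - window + 1): i + 1]
        if detailed_score(context, mu) >= detail_threshold:
            alerts.append(i)
    return alerts, escalated


# Hypothetical metric values: one isolated spike, then a sustained shift.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
observed = [101, 99, 107, 100, 130, 135, 128, 100]
alerts, escalated = two_tier_detect(observed, baseline)
print("escalated to tier 2:", escalated, "alerts at indices:", alerts)
```

In this toy run the isolated spike is escalated by the screen but rejected by the second tier, which is the mechanism by which a cascade can lower false positives while keeping the expensive model off the common path.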
Persistent research gaps include extreme class imbalance (anomaly rates below 0.1%), hard real-time processing, insufficient model explainability for operational and regulatory settings, and a lack of standardised, database-specific benchmarks. This review offers researchers and practitioners evidence-based guidance for designing effective, interpretable, and production-ready anomaly detection systems, and it makes specific recommendations for overcoming these challenges and for future research directions.
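To illustrate why anomaly rates this low are difficult, the following sketch computes precision and F1 for a hypothetical detector at a 0.1% anomaly rate. The recall and false-positive-rate values are assumptions chosen for illustration, not results from the reviewed studies.

```python
def confusion_counts(n_total, anomaly_rate, recall, false_positive_rate):
    """Expected true positives, false positives, and false negatives."""
    positives = round(n_total * anomaly_rate)
    negatives = n_total - positives
    tp = positives * recall
    fn = positives - tp
    fp = negatives * false_positive_rate
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical detector: 95% recall on 1,000,000 transactions at a
# 0.1% anomaly rate, evaluated at two assumed false-positive rates.
for fpr in (0.01, 0.0001):
    tp, fp, fn = confusion_counts(1_000_000, 0.001, 0.95, fpr)
    precision, recall, f1 = precision_recall_f1(tp, fp, fn)
    print(f"FPR={fpr:.4%}  precision={precision:.3f}  "
          f"recall={recall:.3f}  F1={f1:.3f}")
```

Under these assumptions, a false-positive rate of 1% leaves precision below 10% despite 95% recall, while lowering it to 0.01% brings F1 above 0.90. This is why accuracy alone is uninformative at such base rates.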

Keywords

anomaly detection, machine learning, database transactions, transaction logs, autoencoder, systematic literature review.
