Certified Adversarial Robustness in Deep Learning Via Differential Privacy and Ensemble Training

Authors

Charles Roland Haruna

Department of Computer Science and Information Technology, University of Cape Coast, Cape Coast (Ghana)

Edmund Ofei Ayeh

Department of Computer Science and Information Technology, University of Cape Coast, Cape Coast; Computer Science Department, Lancaster University Ghana Campus, Accra (Ghana)

Maame Gyamfua Asante-Mensah

Department of Computer Science and Information Technology, University of Cape Coast, Cape Coast (Ghana)

Obed Tettey Nartey

Chengdu University of Technology Sino-British Collaborative Education Programme, Chengdu (China)

Kwame Opuni-Boachie Obour Agyekum

Department of Telecommunication Engineering, Kwame Nkrumah University of Science and Technology, Kumasi (Ghana)

Pius Kwao Gadosey

Computer Science Department, Lancaster University Ghana Campus, Accra (Ghana)

Article Information

DOI: 10.51244/IJRSI.2026.1305000026

Subject Category: Computer Science

Volume/Issue: 13/5 | Page No: 283-297

Publication Timeline

Submitted: 2026-04-22

Accepted: 2026-04-27

Published: 2026-05-22

Abstract

Deep learning models remain susceptible to adversarial attacks, posing serious risks in safety-critical applications such as autonomous driving and medical diagnosis. This study introduces the Certified Robustness Differential Privacy (CRDP) framework, which integrates differential privacy (DP) with ensemble adversarial training to enhance robustness while preserving accuracy. CRDP employs DP noise mechanisms (Laplace and Gaussian) and dynamic adversarial mixing, optimizing the robustness-accuracy trade-off through principled noise calibration. Experiments on CIFAR-10 and MNIST demonstrate that the ensemble model achieves 99.12% accuracy under adversarial attack at ε = 0.5, surpassing single-model baselines by 1.84 percentage points. CRDP further attains a certified accuracy of 80% using Laplace noise (ε = 0.5), outperforming Gaussian noise alternatives under equivalent privacy budgets. Projected Gradient Descent (PGD)-based adversarial training additionally enhances resilience against iterative attacks. These findings confirm the advantage of Laplace noise in strengthening certified security guarantees while maintaining competitive model performance. This work unifies theoretical privacy guarantees with empirical validation, providing actionable strategies for deploying robust deep learning models in adversarial environments.
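The noise calibration mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the textbook Laplace and classical Gaussian mechanisms, and the function names, the sensitivity of 1.0, and the value of delta are assumptions made for illustration. Only the privacy budget ε = 0.5 comes from the abstract.

```python
import math
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of Laplace noise giving epsilon-DP for a given L1 sensitivity."""
    return sensitivity / epsilon


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Std. dev. of the classical Gaussian mechanism for (epsilon, delta)-DP.

    Uses sigma = sqrt(2 ln(1.25/delta)) * sensitivity / epsilon, the textbook
    bound, which is stated for 0 < epsilon < 1 and an L2 sensitivity.
    """
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as a difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def add_noise(values, mechanism, sensitivity, epsilon, delta=1e-5, seed=0):
    """Return a noisy copy of `values` (hypothetical helper, illustration only)."""
    rng = random.Random(seed)
    if mechanism == "laplace":
        b = laplace_scale(sensitivity, epsilon)
        return [v + sample_laplace(b, rng) for v in values]
    if mechanism == "gaussian":
        sigma = gaussian_sigma(sensitivity, epsilon, delta)
        return [v + rng.gauss(0.0, sigma) for v in values]
    raise ValueError(f"unknown mechanism: {mechanism}")


if __name__ == "__main__":
    eps = 0.5          # privacy budget quoted in the abstract
    sens = 1.0         # assumed sensitivity, not taken from the paper
    print("Laplace scale b :", laplace_scale(sens, eps))
    print("Gaussian sigma  :", gaussian_sigma(sens, eps, delta=1e-5))
    print("Noisy (Laplace) :", add_noise([0.2, 0.7, 0.1], "laplace", sens, eps))
```

Under these assumed settings the Gaussian mechanism needs a noticeably larger noise scale than the Laplace mechanism for the same ε. That is in line with the abstract's comparison, though the paper's own calibration may differ.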

Keywords

Adversarial Robustness, Certified Robustness, Differential Privacy
