Explainable Deep Learning for Age and Gender Prediction from Facial Images: A Comparative Study of VGG16, ResNet50, and EfficientNet with Grad-CAM and SHAP

Yassir Elhaj

doi:10.47772/IJRISS.2026.100400063

Authors

Yassir Elhaj

School of Computer Science, Nanjing University of Information Science and Technology, Nanjing, China

Article Information

Subject Category: Deep Learning

Volume/Issue: 10/4 | Page No: 868-886

Publication Timeline

Submitted: 2026-03-30

Accepted: 2026-04-06

Published: 2026-04-27

Abstract

Automatic age estimation and gender classification from facial images are two of the most intensively studied problems in computer vision, with applications in human-computer interaction, biometric surveillance, targeted marketing, healthcare monitoring, and forensic analysis. Despite a decade of advances in convolutional neural network architectures, the black-box nature of deep learning models still limits their interpretability, trustworthiness, and accountability, particularly in sensitive deployment contexts. This paper presents a comparative study of three state-of-the-art deep learning architectures (VGG16, ResNet50, and EfficientNet-B3) for simultaneous age and gender prediction from facial images, with a strong emphasis on model explainability. Our framework uses the UTKFace dataset, which comprises over 20,000 face images spanning ages 1 to 116 across multiple ethnicities. We describe a preprocessing pipeline that applies Multitask Cascaded Convolutional Networks (MTCNN) for face detection and alignment, followed by standardized normalization and extensive data augmentation. Both Gradient-weighted Class Activation Mapping (Grad-CAM) and SHapley Additive exPlanations (SHAP) are integrated into the evaluation workflow to provide visual and quantitative insight into the regions and features that drive model decisions. Experimental results show that EfficientNet-B3 performs best, with a Mean Absolute Error (MAE) of 4.37 years for age estimation and a gender classification accuracy of 96.8%, while requiring a significantly smaller computational footprint than the other two architectures. ResNet50 offers a strong middle ground between accuracy and training efficiency, whereas VGG16, though interpretable, trails in both predictive performance and computational efficiency.
Our explainability analysis reveals that all three models attend predominantly to periocular regions, nasolabial folds, and frontal skull geometry for age estimation, while gender classification relies more heavily on jaw contour, brow ridge prominence, and lip morphology. These findings underscore the importance of integrating explainability tools into the facial analysis pipeline and offer practical guidance for practitioners deploying deep learning systems in real-world, ethically sensitive environments.
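The two headline metrics and the core Grad-CAM computation named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, array shapes, and toy values are assumptions made for illustration, and the activations and gradients are taken as given arrays rather than extracted from a trained network. For the age head, the Grad-CAM target score is assumed to be the scalar age output.

```python
import numpy as np


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute gap, in years, between predicted and true ages."""
    true_ages = np.asarray(true_ages, dtype=float)
    predicted_ages = np.asarray(predicted_ages, dtype=float)
    return float(np.mean(np.abs(predicted_ages - true_ages)))


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label matches the true label."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(true_labels == predicted_labels))


def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one convolutional layer.

    activations: (K, H, W) feature maps of the chosen layer.
    gradients:   (K, H, W) gradients of the target score (a class logit,
                 or the age output in the regression case) with respect
                 to those feature maps.
    """
    # Channel weights: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))                 # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)      # shape (H, W)
    # ReLU keeps only regions that push the target score up.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] for display; leave an all-zero map untouched.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy values (hypothetical, not taken from the paper).
age_mae = mean_absolute_error([25, 40, 63], [28, 38, 70])
gender_acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])

toy_activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                            [[0.0, 0.0], [0.0, 2.0]]])
toy_gradients = np.array([np.ones((2, 2)), -np.ones((2, 2))])
toy_cam = grad_cam(toy_activations, toy_gradients)
```

In the toy case the second channel receives a negative weight, so its activation is suppressed by the ReLU and only the top-left cell of the heatmap remains active. In practice the resulting map would be upsampled to the input resolution and overlaid on the face image.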

Keywords

Age Estimation, Gender Classification, Deep Learning, Explainable AI, Grad-CAM, SHAP, VGG16, ResNet50
