Facial Expression and Gesture Recognition System for Stress Detection with Deep Learning

Authors

P.G. Dilini Kanchana Kumarihamy

Information Technology, Sri Lanka Institute of Advanced Technological Education, Matale, Central (Sri Lanka)

Article Information

DOI: 10.51244/IJRSI.2026.130200165

Subject Category: Artificial Intelligence

Volume/Issue: 13/2 | Page No: 1779-1795

Publication Timeline

Submitted: 2025-12-29

Accepted: 2026-02-26

Published: 2026-03-16

Abstract

Stress is a significant contributor to declining mental and physical health, necessitating reliable and non-intrusive methods for early detection and continuous monitoring. This study proposes a deep learning–based framework for automated stress detection using facial expression and gesture recognition. Unlike traditional stress assessment methods that rely on self-reported surveys or physiological sensors, the proposed approach leverages visual behavioral cues to enable real-time, contactless monitoring.
The system integrates a Convolutional Neural Network (CNN) for spatial feature extraction from facial images and a Long Short-Term Memory (LSTM) network for modeling temporal dependencies in gesture sequences. Benchmark facial expression and gesture datasets were utilized for training and validation. Data preprocessing included normalization, augmentation, and structured dataset splitting to enhance model generalization. Performance evaluation was conducted using accuracy, precision, recall, F1-score, and root mean squared error (RMSE).
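The preprocessing steps named above (normalization, augmentation, and structured dataset splitting) can be sketched as follows. This is a minimal illustrative example in NumPy, not the authors' actual pipeline; the image size, augmentation choice (horizontal flip), and 70/15/15 split ratios are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a batch of 48x48 grayscale face crops (illustrative only).
images = rng.integers(0, 256, size=(100, 48, 48), dtype=np.uint8)
labels = rng.integers(0, 2, size=100)  # 0 = not stressed, 1 = stressed

# 1. Normalization: scale pixel intensities to [0, 1].
x = images.astype(np.float32) / 255.0

# 2. Augmentation: horizontal flips double the training pool.
x_aug = np.concatenate([x, x[:, :, ::-1]], axis=0)
y_aug = np.concatenate([labels, labels], axis=0)

# 3. Structured split: 70% train, 15% validation, 15% test (assumed ratios).
n = len(x_aug)
idx = rng.permutation(n)
n_train, n_val = int(0.7 * n), int(0.15 * n)
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]
print(len(train_idx), len(val_idx), len(test_idx))  # 140 30 30
```

In a full CNN–LSTM pipeline, the normalized frames would then be grouped into fixed-length sequences before being fed to the network, with the CNN extracting per-frame spatial features and the LSTM modeling their temporal order.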
Experimental results indicate that the proposed CNN–LSTM architecture effectively captures subtle stress-related patterns in visual data, demonstrating strong classification performance. The findings support the feasibility of visual-based stress detection as a scalable and non-invasive alternative to physiological monitoring systems. While limitations remain regarding dataset diversity and real-world variability, the study establishes a foundation for future multimodal and real-time stress detection systems applicable in healthcare, workplace monitoring, and human–computer interaction contexts.
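The evaluation metrics used in the study (accuracy, precision, recall, F1-score, and RMSE) can be computed from raw predictions as in the following sketch. The function names, the binary labeling convention (1 = stressed), and the toy inputs are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary stress labels (1 = stressed)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

def rmse(y_true, y_score):
    """Root mean squared error between predicted stress scores and targets."""
    y_true, y_score = np.asarray(y_true, float), np.asarray(y_score, float)
    return float(np.sqrt(np.mean((y_score - y_true) ** 2)))

# Toy example: six frames, binary stress labels vs. model predictions.
acc, prec, rec, f1 = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
err = rmse([1, 0, 1], [0.8, 0.2, 0.6])
```

Reporting precision and recall alongside accuracy matters here because stress classes are often imbalanced in real recordings, and RMSE applies when the model outputs a continuous stress score rather than a hard label.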

Keywords

Stress detection; facial expression recognition; gesture recognition; deep learning; CNN; LSTM

References

1. Al-Shargie, F., Tariq, U., & Mir, H. (2017). A multimodal approach to stress detection using EEG and physiological data. Biomedical Signal Processing and Control, 34, 50-64. https://doi.org/10.1016/j.bspc.2017.01.010 [Google Scholar] [Crossref]

2. Boucsein, W. (2012). Electrodermal activity. Springer Science & Business Media. [Google Scholar] [Crossref]

3. Cohen, S., Kamarck, T., & Mermelstein, R. (1983). A global measure of perceived stress. Journal of Health and Social Behavior, 24(4), 385-396. https://doi.org/10.2307/2136404 [Google Scholar] [Crossref]

4. Corneanu, C. A., Simón, M. O., Cohn, J. F., & Guerrero, S. E. (2016). Survey on RGB, 3D, thermal, and multimodal approaches for facial expression recognition: History, trends, and affect-related applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(8), 1548-1568. https://doi.org/10.1109/TPAMI.2016.2515606 [Google Scholar] [Crossref]

5. Kim, H. G., & Kim, K. (2018). Review of the applicability of real-time EEG-based stress detection in workers. Safety and Health at Work, 9(1), 10-14. https://doi.org/10.1016/j.shaw.2017.07.002 [Google Scholar] [Crossref]

6. Kim, J., & Andre, E. (2018). A review of machine learning-based physiological signal analysis for emotion recognition and classification. IEEE Transactions on Affective Computing, 11(1), 2-12. https://doi.org/10.1109/TAFFC.2017.2779832 [Google Scholar] [Crossref]

7. Laborde, S., Mosley, E., & Thayer, J. F. (2017). Heart rate variability and cardiac vagal tone in psychophysiological research–Recommendations for experiment planning, data analysis, and data reporting. Frontiers in Psychology, 8, 213. https://doi.org/10.3389/fpsyg.2017.00213 [Google Scholar] [Crossref]

8. Lazarus, R. S., & Folkman, S. (1984). Stress, appraisal, and coping. Springer Publishing Company. [Google Scholar] [Crossref]

9. Schmidt, P., Reiss, A., Duerichen, R., Marberger, C., & Van Laerhoven, K. (2018). Introducing WESAD, a multimodal dataset for wearable stress and affect detection. Proceedings of the 20th ACM International Conference on Multimodal Interaction, 400-408. https://doi.org/10.1145/3242969.3242985 [Google Scholar] [Crossref]

10. Shaffer, F., & Ginsberg, J. P. (2017). An overview of heart rate variability metrics and norms. Frontiers in Public Health, 5, 258. https://doi.org/10.3389/fpubh.2017.00258 [Google Scholar] [Crossref]

11. Spielberger, C. D., Gorsuch, R. L., Lushene, R., Vagg, P. R., & Jacobs, G. A. (1983). Manual for the State-Trait Anxiety Inventory (Form Y). Consulting Psychologists Press. [Google Scholar] [Crossref]

12. Stalder, T., & Kirschbaum, C. (2012). Analysis of cortisol in hair–State of the art and future directions. Brain, Behavior, and Immunity, 26(7), 1019-1029. https://doi.org/10.1016/j.bbi.2012.03.002 [Google Scholar] [Crossref]

13. Wiemeyer, J., Schnaubert, L., Hein, F., & Blank, C. (2020). Gesture recognition for affective computing: A review. Journal of Multimodal User Interfaces, 14, 1-19. https://doi.org/10.1007/s12193-019-00308-2 [Google Scholar] [Crossref]

14. Yao, Y., Li, Y., & Li, W. (2018). Speech emotion recognition using deep neural network with dynamic temporal pooling. IEEE Access, 6, 65037-65045. https://doi.org/10.1109/ACCESS.2018.2878253 [Google Scholar] [Crossref]

15. Ekman, P., & Friesen, W. V. (1978). Facial action coding system: A technique for the measurement of facial movement. Consulting Psychologists Press. [Google Scholar] [Crossref]

16. Gideon, J., McDuff, D., & Cohn, J. (2020). Cross-domain learning for facial expression recognition: A review. IEEE Transactions on Affective Computing, 12(3), 652-672. https://doi.org/10.1109/TAFFC.2020.2991490 [Google Scholar] [Crossref]

17. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84-90. https://doi.org/10.1145/3065386 [Google Scholar] [Crossref]

18. Kumar, P., Patra, S., & Mahapatra, P. (2012). Facial expression recognition using Gabor filter based feature extraction with artificial neural network. Proceedings of the 2012 International Conference on Computing, Communication, and Applications, 1-5. https://doi.org/10.1109/ICCCA.2012.6179181 [Google Scholar] [Crossref]

19. Li, S., & Deng, W. (2020). Deep facial expression recognition: A survey. IEEE Transactions on Affective Computing, 13(1), 119-136. https://doi.org/10.1109/TAFFC.2020.2979471 [Google Scholar] [Crossref]

20. Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2017). AffectNet: A database for facial expression, valence, and arousal computing in the wild. IEEE Transactions on Affective Computing, 10(1), 18-31. https://doi.org/10.1109/TAFFC.2017.2740923 [Google Scholar] [Crossref]

21. Shan, C., Gong, S., & McOwan, P. W. (2009). Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing, 27(6), 803-816. https://doi.org/10.1016/j.imavis.2008.08.005 [Google Scholar] [Crossref]

22. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. https://doi.org/10.48550/arXiv.1409.1556 [Google Scholar] [Crossref]

23. Soleymani, M., Pantic, M., & Pun, T. (2012). Multimodal emotion recognition in response to videos. IEEE Transactions on Affective Computing, 3(2), 211-223. https://doi.org/10.1109/T-AFFC.2011.37 [Google Scholar] [Crossref]

24. Zeng, Z., Pantic, M., Roisman, G. I., & Huang, T. S. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1), 39-58. https://doi.org/10.1109/TPAMI.2008.52 [Google Scholar] [Crossref]
