AI-Powered Optical Character Recognition for Automated Timesheet Data Extraction: A Multimodal Approach for Handling Document Degradation

Authors

Mellanie S. Gambe

Student, Northwestern Mindanao State College of Science and Technology; Instructor, St. Peter’s College (Philippines)

Florence Jean B. Talirongan

Professor, Northwestern Mindanao State College of Science and Technology (Philippines)

Article Information

DOI: 10.47772/IJRISS.2026.100300188

Subject Category: Computer Science

Volume/Issue: 10/3 | Page No: 2598-2609

Publication Timeline

Submitted: 2026-03-12

Accepted: 2026-03-17

Published: 2026-03-31

Abstract

This study explores the use of optical character recognition (OCR) for extracting employee information from physical timesheets. The system integrates the Google Gemini 2.5 Flash multimodal model with a React frontend. Four degradation states were tested, namely original, folded, crumpled, and wet, with 20 samples per category for a total of 80 samples. The system achieved high accuracy: 100% for original documents, 90% for folded documents, 70% for crumpled documents, and 91.66% for wet timesheets, for a final accuracy of 87.92%. These results indicate that context-aware multimodal reasoning is a powerful framework that can substantially reduce reliance on standard binarization and template matching in real-world document digitization, achieving 12–47 percentage points higher accuracy than the baseline OCR. This work serves as a baseline for handling document degradation in manual-to-digital data extraction.
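The final accuracy figure is the unweighted mean of the four per-category accuracies. A minimal sketch of that arithmetic, using the values reported above (the interpretation of each value as a per-category rate is taken directly from the abstract):

```python
# Per-category accuracies (percent) reported for the four
# degradation states, 20 timesheet samples per category.
category_accuracy = {
    "original": 100.0,
    "folded": 90.0,
    "crumpled": 70.0,
    "wet": 91.66,
}

# The overall figure is the unweighted mean across the four categories,
# which matches the reported final accuracy of 87.92% (87.915 before rounding).
final_accuracy = sum(category_accuracy.values()) / len(category_accuracy)
print(f"{final_accuracy:.3f}")
```

Because each category contributes the same number of samples (20), the unweighted mean across categories coincides with the sample-weighted mean.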

Keywords

multimodal LLM, Gemini 2.5 Flash, optical character recognition
