Automated Financial Data Extraction Using Large Language Models: An Application of OpenAI Apis

Authors

Mohd Muhaimin Chuweni

Finance Division, International Islamic University Malaysia (IIUM), Kuala Lumpur (Malaysia)

Sharifalillah Nordin

Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam, Selangor (Malaysia)

Jasrul Nizam Ghazali

Pusat Asasi, Universiti Teknologi MARA (UiTM) Cawangan Selangor, Kampus Dengkil, (Malaysia)

Mohamad Norzamani Sahroni

Pusat Asasi, Universiti Teknologi MARA (UiTM) Cawangan Selangor, Kampus Dengkil, (Malaysia)

Mohd Azry Abdul Malik

Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM) (Malaysia)

Article Information

DOI: 10.47772/IJRISS.2026.10200120

Subject Category: Artificial Intelligence

Volume/Issue: 10/2 | Page No: 1589-1598

Publication Timeline

Submitted: 2026-02-12

Accepted: 2026-02-18

Published: 2026-02-26

Abstract

Financial data extraction, traditionally a manual and labour-intensive process, is being revolutionized by artificial intelligence (AI) and machine learning (ML). However, understanding financial documents remains a significant challenge for individuals without specialized financial knowledge due to complex terminology and concepts. This study addresses this gap by designing, developing, and evaluating an AI-powered financial data extraction system tailored for non-financial individuals. The system integrates Optical Character Recognition (OCR) for text extraction from document images statements, invoices, receipts) and leverages the OpenAI platform's advanced Natural Language Processing (NLP) capabilities to organize, interpret, and explain financial information in a user-friendly manner. A Waterfall development methodology was employed, encompassing requirements gathering via questionnaires with target users, system architecture design, implementation using Python libraries and OpenAI API, and rigorous testing, including functionality tests and user evaluations. Results from functionality testing confirmed the system's ability to accurately process various document types. User evaluation, involving finance staff assessing the system's potential for non-expert users, yielded overwhelmingly positive feedback, with high ratings for accuracy, usability, efficiency, and the significant impact of AI/ML integration in enhancing the depth and speed of analysis. The findings demonstrate the system's potential to improve financial literacy and empower individuals in managing personal finances by making complex financial data more accessible and understandable.

Keywords

Financial Data Extraction, OpenAI, Artificial Intelligence, Natural Language Processing, Optical Character Recognition

Downloads

References

1. Akanksha, E., Sharma, N., & Gulati, K. (2021, April). Review on reinforcement learning, research evolution and scope of application. In 2021 5th International Conference on Computing Methodologies and Communication (ICCMC) (pp. 1416–1423). IEEE. [Google Scholar] [Crossref]

2. https://doi.org/10.1109/ICCMC51019.2021.9418386 [Google Scholar] [Crossref]

3. Boubaker, S., Gounopoulos, D., & Rjiba, H. (2018). Annual report readability and stock liquidity. CGN: Disclosure & Accounting Decisions (Topic). [Google Scholar] [Crossref]

4. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540. https://doi.org/10.48550/arXiv.1606.01540 [Google Scholar] [Crossref]

5. Chakraborty, I., Leone, A. J., Minutti-Meza, M., & Phillips, M. K. (2019). Financial statement complexity and bank lending. S&P Global Market Intelligence Research Paper Series. [Google Scholar] [Crossref]

6. Chew, P. A., & Robinson, D. G. (2012). Automated account reconciliation using probabilistic and statistical techniques. International Journal of Accounting & Information Management, 20(4), 322–334. https://doi.org/10.1108/18347641211287763 [Google Scholar] [Crossref]

7. Clementina, K., & Idume, G. (2015). Bank reconciliation statements, accountability and profitability of small business organisation. Research Journal of Finance and Accounting, 6(21), 21–30. [Google Scholar] [Crossref]

8. Costales, S. B. (1979). The guide to understanding financial statements. [Google Scholar] [Crossref]

9. Cunningham, P., Cord, M., & Delany, S. J. (2008). Supervised learning. In Machine learning techniques for multimedia: Case studies on organisation and retrieval (pp. 21–49). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-75171-7_2 [Google Scholar] [Crossref]

10. Fahad Mon, B., Wasfi, A., Hayajneh, M., Slim, A., & Abu Ali, N. (2023). Reinforcement learning in education: A literature review. Informatics, 10(3), 74. https://doi.org/10.3390/informatics10030074 [Google Scholar] [Crossref]

11. Gibson, C. H., & Frishkoff, P. A. (1983). Financial statement analysis: Using financial accounting information: Test bank to accompany. [Google Scholar] [Crossref]

12. Gruetzemacher, R. (2022, April 19). The power of natural language processing. Harvard Business Review. https://hbr.org/2022/04/the-power-of-natural-language-processing [Google Scholar] [Crossref]

13. Guay, W. R., Samuels, D., & Taylor, D. (2016). Guiding through the fog: Financial statement complexity and voluntary disclosure. Research Methods & Methodology in Accounting eJournal. [Google Scholar] [Crossref]

14. Gupta, A., Dengre, V., Kheruwala, H. A., Raut, R. D., & Kamble, S. S. (2020). A comprehensive review of text-mining applications in finance. Finance Innovation, 6(39). https://doi.org/10.1186/s40854-020-00205-1 [Google Scholar] [Crossref]

15. Higson, C. (2006). Financial statements: Economic analysis and interpretation. [Google Scholar] [Crossref]

16. Jensen, K. T. (2023). An introduction to reinforcement learning for neuroscience. arXiv preprint arXiv:2311.07315. https://doi.org/10.48550/arXiv.2311.07315 [Google Scholar] [Crossref]

17. Khurana, D., Koli, A., Khatter, K., & Singh, S. (2023). Natural language processing: State of the art, current trends and challenges. Multimedia Tools and Applications, 82(3), 3713–3744. [Google Scholar] [Crossref]

18. https://doi.org/10.1007/s11042-022-13428-4 [Google Scholar] [Crossref]

19. Madhavi, A., & Sreedivya, B. (2017). A novel bank statements reconciliation using message transfer parser. In 2017 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC) (pp. 1–6). IEEE. https://doi.org/10.1109/ICCIC.2017.8528707 [Google Scholar] [Crossref]

20. Memon, J., Sami, M., Khan, R. A., & Uddin, M. (2020). Handwritten optical character recognition (OCR): A comprehensive systematic literature review (SLR). IEEE Access, 8, 142642–142668. https://doi.org/10.1109/ACCESS.2020.3012542 [Google Scholar] [Crossref]

21. Mourik, C. V., & Walton, P. (2013). The Routledge companion to accounting, reporting, and regulation. Routledge. [Google Scholar] [Crossref]

22. Rakshit, A., Mehta, S., & Dasgupta, A. (2023, June). A novel pipeline for improving optical character recognition through post-processing using natural language processing. In 2023 IEEE Guwahati Subsection Conference (GCON) (pp. 1–6). IEEE. https://doi.org/10.1109/GCON57805.2023.10236198 [Google Scholar] [Crossref]

23. Salloum, S. A., Al-Emran, M., Monem, A. A., & Shaalan, K. (2018). Using text mining techniques for extracting information from research articles. In K. Shaalan, A. Hassanien, & F. Tolba (Eds.), Intelligent natural language processing: Trends and applications (Vol. 740, pp. 373–397). Springer. https://doi.org/10.1007/978-3-319-67056-0_18 [Google Scholar] [Crossref]

24. Sherman, H. D., & Young, S. D. (2016). Where financial reporting still falls short. Harvard Business Review, 94, 17. [Google Scholar] [Crossref]

25. Tshehla, M. (2022). An investigation of the impact of material misstatements on the quality of financial reporting for a public sector. In Proceedings of the 5th International Conference on Business, Management and Finance. [Google Scholar] [Crossref]

26. Veal, L. (2005). Tax knowledge for undergraduate accounting majors: Conceptual v. technical. [Google Scholar] [Crossref]

27. Xing, X. (2021). Financial big data reconciliation method. In 2021 International Symposium on Advances in Informatics, Electronics and Education (ISAIEE) (pp. 260–263). IEEE. [Google Scholar] [Crossref]

28. https://doi.org/10.1109/ISAIEE53056.2021.00065 [Google Scholar] [Crossref]

29. Yenduri, G., Srivastava, G., Maddikunta, P. K. R., Jhaveri, R. H., Wang, W., Vasilakos, A. V., & Gadekallu, T. R. (2023). Generative pre-trained transformer: A comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions. arXiv preprint arXiv:2305.10435. https://doi.org/10.48550/arXiv.2305.10435 [Google Scholar] [Crossref]

Metrics

Views & Downloads

Similar Articles