Comparative Analysis of the Risk of Hadith Errors in Question-Answering Systems Based on Large Language Models
Authors
Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Indonesia
Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Indonesia
Article Information
DOI: 10.51244/IJRSI.2025.12120025
Subject Category: Electrical Engineering and Informatics
Volume/Issue: 12/12 | Page No: 262-272
Publication Timeline
Submitted: 2025-12-09
Accepted: 2025-12-17
Published: 2025-12-30
Abstract
Large Language Models (LLMs) are increasingly used in Question-Answering (QA) systems because they can generate contextual natural-language responses. However, applying LLM-based QA systems in high-risk domains such as religious information services raises serious challenges of accuracy, reliability, and ethical responsibility. For hadith information in particular, errors produced by AI-based systems can spread misinformation and undermine user trust. This study presents a comparative analysis of the risk of errors produced by LLM-based QA systems when answering hadith-related questions. Four LLMs, namely GPT, Gemini, Claude, and LLaMA, were evaluated using a controlled set of designed prompts. The model outputs were analyzed and classified into four error categories: fabricated hadith, incorrect attribution, hybrid hadith, and distortion of meaning. The frequency of each error type was then compared across models to identify error patterns and risk levels for each model. The results show significant differences in the number and types of errors among the models, reflecting different risk profiles for deploying hadith QA systems. These findings indicate that LLM-based QA systems remain limited in their ability to ensure the accuracy and reliability of religious information. This research is expected to contribute to the development of a risk evaluation framework and to support the implementation of ethical principles and responsible AI governance in AI-based information systems.
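The comparative step described in the abstract (labeling each erroneous answer with one of the four error categories, then comparing frequencies per model) can be illustrated with a minimal sketch. This is not the authors' implementation: the input format, the function names `tally_errors` and `frequency_table`, and the model labels `model_a` and `model_b` are assumptions made for illustration, and the sample labels are placeholders, not data from the study.

```python
from collections import Counter

# The four error categories named in the abstract.
CATEGORIES = (
    "fabricated hadith",
    "incorrect attribution",
    "hybrid hadith",
    "distortion of meaning",
)


def tally_errors(annotations):
    """Count error labels per model.

    `annotations` is an iterable of (model, category) pairs, one pair per
    erroneous answer (a hypothetical input format, assumed for this sketch).
    """
    counts = {}
    for model, category in annotations:
        if category not in CATEGORIES:
            raise ValueError(f"unknown error category: {category!r}")
        counts.setdefault(model, Counter())[category] += 1
    return counts


def frequency_table(counts):
    """Return rows of (model, count per category..., total), sorted by model."""
    rows = []
    for model in sorted(counts):
        per_category = [counts[model][c] for c in CATEGORIES]
        rows.append((model, *per_category, sum(per_category)))
    return rows


# Placeholder labels for illustration only; these are NOT the study's data.
example = [
    ("model_a", "fabricated hadith"),
    ("model_a", "incorrect attribution"),
    ("model_b", "distortion of meaning"),
]
for row in frequency_table(tally_errors(example)):
    print(row)
```

Each row lists a model, its count in each of the four categories in the order given in `CATEGORIES`, and its total, so risk profiles can be compared side by side.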
Keywords
Large Language Models, Question-Answering, Hadith Errors, Risk Analysis, AI Ethics