When Do Large Language Models Need Retrieval? A Comparative Study of RAG, Fine-Tuning, and Hybrid Adaptation Strategies
Authors
Department of Computer Science and Technology Nanjing University of Information Science and Technology (China)
Article Information
DOI: 10.47772/IJRISS.2026.10200546
Subject Category: Artificial Intelligence
Volume/Issue: 10/2 | Page No: 7609-7624
Publication Timeline
Submitted: 2026-03-01
Accepted: 2026-03-06
Published: 2026-03-19
Abstract
Large language models (LLMs) have achieved strong performance across a broad range of natural language processing tasks and are increasingly deployed in domain-specific settings such as biomedical question answering and open-domain information access. However, adapting LLMs to specialized domains remains challenging due to domain knowledge gaps, evolving information, and computational constraints. Two primary adaptation strategies are commonly used: fine-tuning, which internalizes domain knowledge within model parameters, and retrieval-augmented generation (RAG), which incorporates external evidence at inference time. Hybrid approaches that combine fine-tuning with retrieval have also been proposed, yet their relative trade-offs remain insufficiently characterized under controlled conditions.
In this work, we present a systematic empirical comparison of fine-tuning, RAG, and hybrid adaptation strategies using a unified evaluation framework. We analyze these approaches across multiple dimensions, including answer quality, grounding reliability, inference latency, and computational cost. Our study highlights practical trade-offs between internalized and external knowledge integration and provides decision-oriented guidelines for selecting adaptation strategies in real-world deployments. Rather than assuming a universally optimal approach, our results emphasize that the need for retrieval depends on domain characteristics, data availability, and system constraints.
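The core contrast the abstract draws can be illustrated in code: a fine-tuned model answers directly from its parameters, while a RAG system retrieves evidence at inference time and conditions generation on it. The sketch below shows the RAG side of that pipeline; the term-overlap retriever, toy corpus, and prompt template are illustrative assumptions, not the pipeline evaluated in this study.

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank passages by term overlap with the query.
    A real system would use dense or hybrid retrieval (e.g., DPR, BM25)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(q_terms & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, evidence: list[str]) -> str:
    """Prepend retrieved evidence so generation is grounded in it,
    rather than relying solely on knowledge internalized in the weights."""
    context = "\n".join(f"- {p}" for p in evidence)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical domain corpus standing in for an external knowledge base.
corpus = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "LoRA adapts LLMs by training low-rank weight updates.",
    "Dense passage retrieval embeds queries and passages separately.",
]

query = "What is a first-line treatment for type 2 diabetes?"
prompt = build_prompt(query, retrieve(query, corpus))
# The prompt now carries the retrieved evidence; an LLM call on it
# (omitted here) would produce the grounded answer.
```

The trade-offs the study measures follow directly from this structure: the retrieval step adds inference latency and infrastructure cost, but lets the system track evolving information without retraining, whereas fine-tuning pays its cost once at training time.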
Keywords
Large Language Models, Retrieval-Augmented Generation, Fine-Tuning