When Do Large Language Models Need Retrieval? A Comparative Study of RAG, Fine-Tuning, and Hybrid Adaptation Strategies

Authors

Ait El Abbas Ilias

Department of Computer Science and Technology, Nanjing University of Information Science and Technology, China

Article Information

DOI: 10.47772/IJRISS.2026.10200546

Subject Category: Artificial Intelligence

Volume/Issue: 10/2 | Page No: 7609-7624

Publication Timeline

Submitted: 2026-03-01

Accepted: 2026-03-06

Published: 2026-03-19

Abstract

Large language models (LLMs) have achieved strong performance across a broad range of natural language processing tasks and are increasingly deployed in domain-specific settings such as biomedical question answering and open-domain information access. However, adapting LLMs to specialized domains remains challenging due to domain knowledge gaps, evolving information, and computational constraints. Two primary adaptation strategies are commonly used: fine-tuning, which internalizes domain knowledge within model parameters, and retrieval-augmented generation (RAG), which incorporates external evidence at inference time. Hybrid approaches that combine fine-tuning with retrieval have also been proposed, yet their relative trade-offs remain insufficiently characterized under controlled conditions.
In this work, we present a systematic empirical comparison of fine-tuning, RAG, and hybrid adaptation strategies using a unified evaluation framework. We analyze these approaches across multiple dimensions, including answer quality, grounding reliability, inference latency, and computational cost. Our study highlights practical trade-offs between internalized and external knowledge integration and provides decision-oriented guidelines for selecting adaptation strategies in real-world deployments. Rather than assuming a universally optimal approach, our results emphasize that the need for retrieval depends on domain characteristics, data availability, and system constraints.

Keywords

Large Language Models, Retrieval-Augmented Generation, Fine-Tuning


