A Scalable Retrieval-Augmented Generation Pipeline for Domain-Specific Knowledge Applications
Authors
Nil (USA)
Article Information
DOI: 10.51584/IJRIAS.2025.1010000014
Subject Category: Machine Learning
Volume/Issue: 10/10 | Page No: 193-201
Publication Timeline
Submitted: 2025-09-26
Accepted: 2025-10-02
Published: 2025-10-28
Abstract
This work presents a Retrieval-Augmented Generation (RAG) pipeline that integrates document preprocessing, embedding-based retrieval, and large language model (LLM) generation into a unified framework. The pipeline begins with the ingestion of PDF documents, followed by text cleaning, sentence segmentation, and chunking to ensure compatibility with embedding model constraints. High-dimensional vector representations are generated using transformer-based embedding models and stored for downstream use. Semantic similarity search, implemented via dot product and cosine similarity, enables efficient retrieval of contextually relevant text. For scalability, the framework is designed to accommodate vector indexing methods such as Faiss. On the generation side, a locally hosted LLM (Gemma-7B) is employed with optional quantization for reduced resource consumption. Retrieved context is integrated with user queries to enhance the accuracy and relevance of generated responses. This pipeline demonstrates a practical approach for building domain-specific, retrieval-augmented applications that balance efficiency, scalability, and adaptability to local compute environments.
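As a minimal illustration of the retrieval stage described above, the Python sketch below groups sentences into chunks, embeds them with the sentence-transformers all-mpnet-base-v2 model, retrieves the most similar chunks by cosine similarity, and assembles an augmented prompt. The chunk size, top-k value, helper names, and prompt template are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of the chunk -> embed -> retrieve -> prompt flow (assumed settings).
from sentence_transformers import SentenceTransformer, util

def chunk_sentences(sentences, sentences_per_chunk=10):
    """Group consecutive sentences into chunks that respect the embedding model's input limit."""
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]

def retrieve(query, chunks, model, top_k=5):
    """Embed the query and all chunks, then return the top_k chunks by cosine similarity."""
    chunk_emb = model.encode(chunks, convert_to_tensor=True, normalize_embeddings=True)
    query_emb = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)
    # With normalized embeddings, the dot product equals cosine similarity.
    scores = util.dot_score(query_emb, chunk_emb)[0]
    top = scores.topk(k=min(top_k, len(chunks)))
    return [chunks[i] for i in top.indices.tolist()]

def build_prompt(query, retrieved_chunks):
    """Concatenate retrieved context with the user query for the generator LLM."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

if __name__ == "__main__":
    model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
    sentences = [
        "Retrieval-augmented generation combines search with a language model.",
        "Faiss builds an approximate nearest-neighbour index over dense vectors.",
        "Quantization reduces the memory footprint of a locally hosted LLM.",
    ]
    chunks = chunk_sentences(sentences, sentences_per_chunk=2)
    prompt = build_prompt("What does Faiss do?", retrieve("What does Faiss do?", chunks, model))
    print(prompt)
```

For larger corpora, the same normalized chunk embeddings can be placed in a Faiss index instead of scoring every chunk in memory. On the generation side, a locally hosted Gemma-7B model can be loaded with optional 4-bit quantization and fed the augmented prompt. The snippet below is a sketch assuming the Hugging Face transformers and bitsandbytes packages and the google/gemma-7b-it checkpoint; it is not the paper's exact serving setup.

```python
# Hedged sketch: loading a quantized local generator and answering from the augmented prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
llm = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it", quantization_config=quant_config, device_map="auto"
)

inputs = tokenizer(prompt, return_tensors="pt").to(llm.device)  # prompt from the retrieval sketch
output = llm.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```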
Keywords
Scalable, Retrieval, Augmented, Generation Pipeline
Similar Articles
- A Machine Learning Model for Predicting the Risk of Developing Diabetes - T2DM Using Real-World Data from Kilifi, Kenya
- AI-Powered Facial Recognition Attendance System Using Deep Learning and Computer Vision
- A Comprehensive Review on Brain Tumour Segmentation Using Deep Learning Approach
- Predictive Maintenance in Semiconductor Manufacturing Using Machine Learning on Imbalanced Dataset
- Multi-sensor remote sensing and machine learning for aboveground biomass mapping in Vietnam’s Melaleuca wetlands: A review