Querywise Prompt Routing for Large Language Models
Authors
APEX, North Carolina (United States of America)
Article Information
DOI: 10.47772/IJRISS.2026.10190054
Subject Category: Education
Volume/Issue: 10/19 | Page No: 605-611
Publication Timeline
Submitted: 2025-12-27
Accepted: 2026-01-19
Published: 2026-02-16
Abstract
This paper treats prompt choice as a per-query decision problem for large language models, learning an offline proxy reward that scores query-prompt pairs without additional model calls or access to gold answers at inference time. Using prior prompt-response logs as demonstrations, the method trains a preference model over prompts and then selects a best-of-N instruction per query to boost arithmetic reasoning accuracy under strict zero-shot conditions. The pipeline reduces interaction cost by shifting evaluation and optimization offline, while preserving the natural-language prompt space, so the approach remains model-agnostic and immediately deployable across chat-oriented LLMs. Experiments on standard reasoning benchmarks show consistent gains over distribution-level, query-agnostic prompting and over confidence-based selectors, with improvements holding across multiple LLM scales. Ablations confirm that the learned reward generalizes to unseen prompts and queries, enabling robust prompt routing at inference time without additional gradient updates or tool-specific supervision.
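The routing recipe the abstract describes — train a proxy reward offline on logged prompt outcomes, then score each candidate prompt for an incoming query and keep the best of N, with no extra model calls — can be sketched as below. The candidate prompts, the hand-crafted features, and the simple logistic reward model are all illustrative assumptions for this sketch, not the paper's exact design:

```python
# Minimal sketch of per-query best-of-N prompt routing with an
# offline-trained proxy reward. Feature set and model are assumptions.
import math

# Hypothetical candidate instruction pool (N = 3).
PROMPTS = [
    "Let's think step by step.",
    "Answer directly with the final number.",
    "Break the problem into smaller sub-problems, then solve each.",
]

def features(query, prompt):
    """Cheap features for a (query, prompt) pair: bias, lengths, token overlap."""
    q, p = query.lower().split(), prompt.lower().split()
    overlap = len(set(q) & set(p))
    return [1.0, len(q) / 50.0, len(p) / 20.0, overlap / 10.0]

def train_reward(logs, epochs=200, lr=0.5):
    """Fit a logistic proxy reward on logged (query, prompt, success) triples.

    This stands in for the paper's preference model; any scorer trained
    offline on prompt-response logs would play the same role.
    """
    w = [0.0] * len(features("x", "y"))
    for _ in range(epochs):
        for query, prompt, success in logs:
            x = features(query, prompt)
            z = sum(wi * xi for wi, xi in zip(w, x))
            pred = 1.0 / (1.0 + math.exp(-z))
            g = pred - success  # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w

def route(query, w, prompts=PROMPTS):
    """Best-of-N selection: score every candidate offline, no LLM call."""
    def score(p):
        return sum(wi * xi for wi, xi in zip(w, features(query, p)))
    return max(prompts, key=score)
```

A caller would train `train_reward` once on historical logs and then invoke `route(query, w)` per query; because scoring is a dot product over candidate prompts, routing adds negligible inference cost.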
Keywords
Prompt selection, large language models