Agentic AI and Autonomous Decision-Making: A Review of Human-in-the-Loop Frameworks, Oversight Mechanisms, and Trust Calibration

Authors

Simeon Ayoade Adedokun

Department of Computer Science, Ladoke Akintola University of Technology, Ogbomoso (Nigeria)

Dorcas Atinuke Adedokun

Department of Computer Science, Ladoke Akintola University of Technology, Ogbomoso (Nigeria)

Bosede Olajoke Ishola

Department of Software Engineering, Westland University, Iwo (Nigeria)

Rachel Ihunanya Adeniran

Department of Computer Science, Ladoke Akintola University of Technology, Ogbomoso (Nigeria)

Catherine Olatorera Olaleye

Department of Computer Science, Ladoke Akintola University of Technology, Ogbomoso (Nigeria)

Article Information

DOI: 10.51584/IJRIAS.2026.11030104

Subject Category: Computer Science

Volume/Issue: 11/3 | Page No: 1350-1374

Publication Timeline

Submitted: 2026-04-02

Accepted: 2026-04-08

Published: 2026-04-18

Abstract

The rapid proliferation of agentic artificial intelligence (AI) systems, autonomous agents capable of perceiving, reasoning, planning, and executing multi-step tasks with minimal human intervention, poses foundational challenges for the design of effective oversight architectures. Although developers report using AI assistance in approximately 60% of their work, empirical estimates suggest that full delegation remains feasible for only 0–20% of tasks, establishing a persistent and consequential human-AI collaboration boundary that current frameworks struggle to characterize with sufficient precision. This study presents a systematic review synthesizing peer-reviewed studies published between 2020 and 2026 to map the state of the art in human-in-the-loop (HITL) frameworks, oversight mechanisms, and trust calibration strategies across eight high-stakes sectors: healthcare, criminal justice, financial services, autonomous transportation, education, manufacturing, content moderation, and human resources. Following a PRISMA-aligned protocol, the review analyzed sources drawn from the Association for Computing Machinery (ACM), the Institute of Electrical and Electronics Engineers (IEEE), NeurIPS, the Association for the Advancement of Artificial Intelligence (AAAI), and major journal databases. The analysis revealed four recurring tensions in the literature: the explainability–performance tradeoff, the autonomy–accountability gap, the over-trust/under-trust duality, and the participation–effectiveness paradox. Building on these tensions and the synthesized evidence, the study introduces the Adaptive Oversight Calibration Model (AOCM), a sector-agnostic framework comprising six formal propositions that relate task criticality, AI competency boundaries, human cognitive capacity, institutional constraints, trust dynamics, and feedback loops to optimal oversight configurations. The AOCM advances prior work by operationalizing meaningful oversight as a continuous, context-sensitive function rather than a binary or static design choice, and by providing testable propositions amenable to empirical validation. Implications for system designers, policymakers, and AI practitioners are discussed, with particular attention to the European Union AI Act (2024) and the NIST AI Risk Management Framework (2023) as regulatory anchors.
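To make the abstract's central design claim concrete, the following minimal Python sketch shows what oversight as a continuous, context-sensitive function could look like. It is an illustrative rendering, not the AOCM itself: the paper states its six propositions qualitatively, so the variable names (task_criticality, ai_competence, human_capacity, institutional_floor, calibrated_trust), the weights, and the functional form below are all assumptions introduced here for exposition.

from dataclasses import dataclass


@dataclass
class OversightContext:
    task_criticality: float     # 0 (trivial) .. 1 (life-critical)
    ai_competence: float        # estimated AI reliability on this task, 0..1
    human_capacity: float       # reviewer attention available, 0..1
    institutional_floor: float  # minimum oversight mandated by regulation, 0..1
    calibrated_trust: float     # trust aligned with observed performance, 0..1


def oversight_level(ctx: OversightContext) -> float:
    """Return a continuous oversight intensity in [0, 1].

    Higher criticality and lower AI competence raise oversight; calibrated
    trust backed by available human capacity lowers it; the institutional
    floor is never undercut. All weights are arbitrary placeholders.
    """
    demand = ctx.task_criticality * (1.0 - ctx.ai_competence)
    relief = ctx.calibrated_trust * ctx.human_capacity
    raw = demand * (1.0 - 0.5 * relief)
    return max(ctx.institutional_floor, min(1.0, raw))


def update_trust(trust: float, ai_was_correct: bool, rate: float = 0.1) -> float:
    """Feedback loop: nudge trust toward observed AI performance."""
    target = 1.0 if ai_was_correct else 0.0
    return trust + rate * (target - trust)


if __name__ == "__main__":
    # Hypothetical clinical-triage setting: high criticality, regulated floor.
    ctx = OversightContext(0.9, 0.7, 0.6, 0.3, 0.5)
    print(f"oversight level: {oversight_level(ctx):.2f}")  # 0.30 (the floor binds)
    ctx.calibrated_trust = update_trust(ctx.calibrated_trust, ai_was_correct=True)
    print(f"trust after a correct outcome: {ctx.calibrated_trust:.2f}")  # 0.55

The point of the sketch is structural rather than numerical: oversight intensity is recomputed per decision from task context and an evolving trust state, instead of being fixed at design time, which is the shift from a binary or static design choice that the abstract describes.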

Keywords

agentic AI, human-in-the-loop, oversight mechanisms

References

1. Adedokun, S. A., Adeyemo, I. A., Adedokun, D. A., Ogunkan, S. K., & Ogunniyi, O. K. (2026). Artificial Intelligence and the Essence of Humanity: Strategic Frameworks for Utilizing Technology and Preserving Values in an Automated Era. Journal of Science Innovation and Technology Research, 10(9), 196–213. https://doi.org/10.70382/ajsitr.v10i9.069

2. Agwunobi, Z. (2026). A Legal Technology and Digital Trust Infrastructure Framework for Bridging Digital Trust Between Artificial Intelligence and Human Intelligence. International Journal of Scientific Research in Science, Engineering and Technology, 13(2), 55–79. https://doi.org/10.32628/IJSRSET261358

3. Alfredo, R., Echeverria, V., Jin, Y., Yan, L., Swiecki, Z., Gašević, D., & Martinez-Maldonado, R. (2024). Human-centred learning analytics and AI in education: A systematic literature review. Computers and Education: Artificial Intelligence, 6, 100215. https://doi.org/10.1016/j.caeai.2024.100215

4. Amodei, D., & Hernandez, D. (2022). AI and compute: Revisiting the exponential. Anthropic Technical Report.

5. Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete Problems in AI Safety (arXiv:1606.06565). arXiv. https://doi.org/10.48550/arXiv.1606.06565

6. Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012

7. Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., … Kaplan, J. (2022). Constitutional AI: Harmlessness from AI Feedback (arXiv:2212.08073). arXiv. https://doi.org/10.48550/ARXIV.2212.08073

8. Bansal, G., Wu, T., Zhou, J., Fok, R., Nushi, B., Kamar, E., Ribeiro, M. T., & Weld, D. (2021). Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1–16. https://doi.org/10.1145/3411764.3445717

9. Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., Arx, S. von, Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J. Q., Demszky, D., … Liang, P. (2022). On the Opportunities and Risks of Foundation Models (arXiv:2108.07258). arXiv. https://doi.org/10.48550/arXiv.2108.07258

10. Cabitza, F., Campagner, A., Ferrari, D., & Ciucci, D. (2021). The need to rehumanize clinical AI. Artificial Intelligence in Medicine, 118, 102121. https://doi.org/10.1016/j.artmed.2021.102121

11. Cabrera, Á. A., Perer, A., & Hong, J. I. (2023). Improving Human-AI Collaboration With Descriptions of AI Behavior. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1), 1–21. https://doi.org/10.1145/3579612

12. Cai, C. J., Winter, S., Steiner, D., Wilcox, L., & Terry, M. (2021). Onboarding Materials as Cross-functional Boundary Objects for Developing AI Assistants. Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 1–11.

13. Choudhury, S. (2023). Speeding up to fall behind: A critical review of human-machine teaming. Frontiers in Neuroergonomics, 4, 1093982. https://doi.org/10.3389/fnrgo.2023.1093982

14. Daigle, K., & GitHub Staff. (2023, November 8). Octoverse: The state of open source and rise of AI in 2023. GitHub Blog. https://github.blog/news-insights/research/the-state-of-open-source-and-ai/

15. Dietvorst, B. J., & Bartels, D. M. (2022). Consumers Object to Algorithms Making Morally Relevant Tradeoffs Because of Algorithms’ Consequentialist Decision Strategies. Journal of Consumer Psychology, 32(3), 406–424. https://doi.org/10.1002/jcpy.1266

16. Dietvorst, B. J., & Bharti, S. (2020). People Reject Algorithms in Uncertain Decision Domains Because They Have Diminishing Sensitivity to Forecasting Error. Psychological Science, 31(10), 1302–1314. https://doi.org/10.1177/0956797620948841

17. Dietvorst, B. J., Simmons, J. P., & Massey, C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114–126. https://doi.org/10.1037/xge0000033

18. Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), eaao5580. https://doi.org/10.1126/sciadv.aao5580

19. Ehsan, U., Liao, Q. V., Muller, M., Riedl, M. O., & Weisz, J. D. (2021). Expanding Explainability: Towards Social Transparency in AI systems. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1–19. https://doi.org/10.1145/3411764.3445188

20. Endsley, M. R. (2023). Situation awareness in future autonomous systems. Human Factors, 65(1), 99–111. https://doi.org/10.1177/00187208211059252

21. European Union. (2024). The AI Act Explorer: EU Artificial Intelligence Act. Future of Life Institute. https://artificialintelligenceact.eu/ai-act-explorer/

22. Fogliato, R., Chappidi, S., Lungren, M., Fisher, P., Wilson, D., Fitzke, M., Parkinson, M., Horvitz, E., Inkpen, K., & Nushi, B. (2022). Who Goes First? Influences of Human-AI Workflow on Decision Making in Clinical Imaging. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 1362–1374. https://doi.org/10.1145/3531146.3533193

23. Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (arXiv:1506.02142). arXiv. https://doi.org/10.48550/ARXIV.1506.02142

24. Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. Aldine.

25. Gomes, D. (2024). A Comprehensive Study of Advancements in Intelligent Tutoring Systems Through Artificial Intelligent Education Platforms. In F. T. Moreira & R. O. Teles (Eds.), Advances in Educational Technologies and Instructional Design (pp. 213–244). IGI Global. https://doi.org/10.4018/979-8-3693-6170-2.ch008

26. Hadfield-Menell, D., & Hadfield, G. K. (2019). Incomplete Contracting and AI Alignment. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 417–422. https://doi.org/10.1145/3306618.3314250

27. Hemmer, P., Schemmer, M., Vössing, M., & Kühl, N. (2021). Human-AI Complementarity in Hybrid Intelligence Systems: A Structured Literature Review. PACIS 2021 Proceedings, 78. https://aisel.aisnet.org/pacis2021/78

28. Jansen, D. (2021). The International Spillovers of the 2010 U.S. Flash Crash. Journal of Money, Credit and Banking, 53(6), 1573–1586. https://doi.org/10.1111/jmcb.12790

29. Jerry, B., Moreno, L., & Martínez, P. (2026). Human Oversight-by-Design for Accessible Generative IUIs. arXiv. https://doi.org/10.48550/ARXIV.2602.13745

30. Karinshak, E., Liu, S. X., Park, J. S., & Hancock, J. T. (2023). Working With AI to Persuade: Examining a Large Language Model’s Ability to Generate Pro-Vaccination Messages. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1), 1–29. https://doi.org/10.1145/3579592

31. Kazim, T., & Tomlinson, J. (2023). Automation Bias and the Principles of Judicial Review. Judicial Review, 28(1), 9–16. https://doi.org/10.1080/10854681.2023.2189405

32. Lai, V., Chen, C., Liao, Q. V., Smith-Renner, A., & Tan, C. (2021). Towards a Science of Human-AI Decision Making: A Survey of Empirical Studies (arXiv:2112.11471). arXiv. https://doi.org/10.48550/arXiv.2112.11471

33. Lee, J. D., & See, K. A. (2004). Trust in Automation: Designing for Appropriate Reliance. Human Factors: The Journal of the Human Factors and Ergonomics Society, 46(1), 50–80. https://doi.org/10.1518/hfes.46.1.50.30392

34. Li, Q., & McAdams, D. (2025). Interactive machine learning framework enabling affordable and accurate prototyping for supporting decision-making. Proceedings of the Design Society, 5, 2131–2140. https://doi.org/10.1017/pds.2025.10227

35. Logg, J. M., & Minson, J. A. (2022). Algorithm appreciation and aversion: Why people should use AI more and how to help them do it. Current Directions in Psychological Science, 31(6), 518–524. https://doi.org/10.1177/09637214221117483

36. Logg, J. M., Minson, J. A., & Moore, D. A. (2019). Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes, 151, 90–103. https://doi.org/10.1016/j.obhdp.2018.12.005

37. Madaio, M. A., Stark, L., Wortman Vaughan, J., & Wallach, H. (2020). Co-Designing Checklists to Understand Organizational Challenges and Opportunities around Fairness in AI. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 1–14. https://doi.org/10.1145/3313831.3376445

38. Mosier, K. L., Skitka, L. J., Heers, S., & Burdick, M. (1998). Automation Bias: Decision Making and Performance in High-Tech Cockpits. The International Journal of Aviation Psychology, 8(1), 47–63. https://doi.org/10.1207/s15327108ijap0801_3

39. Munro, R. (2021). Human-in-the-Loop Machine Learning (1st ed.). Manning Publications.

40. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback (arXiv:2203.02155). arXiv. https://doi.org/10.48550/ARXIV.2203.02155

41. Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., … Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. https://doi.org/10.1136/bmj.n71

42. Parasuraman, R., & Manzey, D. H. (2010). Complacency and Bias in Human Use of Automation: An Attentional Integration. Human Factors: The Journal of the Human Factors and Ergonomics Society, 52(3), 381–410. https://doi.org/10.1177/0018720810376055

43. Parasuraman, R., Sheridan, T. B., & Wickens, C. D. (2000). A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 30(3), 286–297. https://doi.org/10.1109/3468.844354

44. Pawson, R. (2006). Evidence-based policy: A realist perspective. Sage.

45. Rahwan, I. (2018). Society-in-the-loop: Programming the algorithmic social contract. Ethics and Information Technology, 20(1), 5–14. https://doi.org/10.1007/s10676-017-9430-8

46. Reich, T., Kaju, A., & Maglio, S. J. (2023). How to overcome algorithm aversion: Learning from mistakes. Journal of Consumer Psychology, 33(2), 285–302. https://doi.org/10.1002/jcpy.1313

47. Richardson, L. S., Fidock, J., & Gunawan, I. (2025). Systematic Literature Review of Levels of Automation (Autonomy) Taxonomy: Critiques and Recommendations. International Journal of Human–Computer Interaction, 41(24), 15824–15843. https://doi.org/10.1080/10447318.2025.2502978

48. Romeo, G., & Conti, D. (2026). Exploring automation bias in human–AI collaboration: A review and implications for explainable AI. AI & SOCIETY, 41(1), 259–278. https://doi.org/10.1007/s00146-025-02422-7

49. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x

50. Russell, S. J., & Norvig, P. (2021). Artificial Intelligence: A Modern Approach (4th ed.). Pearson.

51. Shani, I., & GitHub Staff. (2023, June 13). Survey reveals AI’s impact on the developer experience. GitHub Blog. https://github.blog/news-insights/research/survey-reveals-ais-impact-on-the-developer-experience/

52. Shneiderman, B. (2020). Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy. International Journal of Human–Computer Interaction, 36(6), 495–504. https://doi.org/10.1080/10447318.2020.1741118

53. Shneiderman, B. (2022). Human-Centered AI (1st ed.). Oxford University Press. https://doi.org/10.1093/oso/9780192845290.001.0001

54. Shortliffe, E. H., & Sepúlveda, M. J. (2018). Clinical Decision Support in the Era of Artificial Intelligence. JAMA, 320(21), 2199–2200. https://doi.org/10.1001/jama.2018.17163

55. Sloane, M., Moss, E., Awomolo, O., & Forlano, L. (2020). Participation is not a Design Fix for Machine Learning (arXiv:2007.02423). arXiv. https://doi.org/10.48550/ARXIV.2007.02423

56. Tabassi, E. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0) (NIST AI 100-1). National Institute of Standards and Technology (U.S.). https://doi.org/10.6028/NIST.AI.100-1

57. Thomas, J., & Harden, A. (2008). Methods for the thematic synthesis of qualitative research in systematic reviews. BMC Medical Research Methodology, 8(1), 45. https://doi.org/10.1186/1471-2288-8-45

58. Topol, E. J. (2019). High-performance medicine: The convergence of human and artificial intelligence. Nature Medicine, 25(1), 44–56. https://doi.org/10.1038/s41591-018-0300-7

59. Trist, E., & Bamforth, K. (1951). Some social and psychological consequences of the longwall method of coal-getting. Human Relations, 4(1), 3–38.

60. Vogl, R. (Ed.). (2021). Research handbook on big data law. Edward Elgar Publishing Limited.

61. Wachter, S., Mittelstadt, B., & Russell, C. (2021). Why fairness cannot be automated: Bridging the gap between EU non-discrimination law and AI. Computer Law & Security Review, 41, 105567. https://doi.org/10.1016/j.clsr.2021.105567

62. Wang, D., Yang, Q., Abdul, A., & Lim, B. Y. (2019). Designing Theory-Driven User-Centric Explainable AI. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–15. https://doi.org/10.1145/3290605.3300831

63. Wang, S., Wang, F., Zhu, Z., Wang, J., Tran, T., & Du, Z. (2024). Artificial intelligence in education: A systematic literature review. Expert Systems with Applications, 252, 124167. https://doi.org/10.1016/j.eswa.2024.124167

64. World Health Organization. (2021). Ethics and Governance of Artificial Intelligence for Health: WHO Guidance (1st ed.). World Health Organization. https://www.who.int/publications/i/item/9789240029200

65. Zhang, Y., Liao, Q. V., & Bellamy, R. K. E. (2020). Effect of confidence and explanation on accuracy and trust calibration in AI-assisted decision making. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 295–305. https://doi.org/10.1145/3351095.3372852

66. Ziegler, D. M., Stiennon, N., Wu, J., Brown, T. B., Radford, A., Amodei, D., Christiano, P., & Irving, G. (2020). Fine-Tuning Language Models from Human Preferences (arXiv:1909.08593). arXiv. https://doi.org/10.48550/arXiv.1909.08593
