Intelligent Multi-Agent Reinforcement Learning Architectures for Coordinated Autonomous Logistics and Real-Time Network Optimization

Authors

Kannan Avalurpet Loganathan

Independent Researcher, California (USA)

Arunraju Chinnaraju

Doctorate in Business Administration, Westcliff University (USA)

Article Information

DOI: 10.47772/IJRISS.2026.10100211

Subject Category: Supply Chain Management

Volume/Issue: 10/1 | Page No: 2666-2742

Publication Timeline

Submitted: 2026-01-15

Accepted: 2026-01-21

Published: 2026-01-31

Abstract

The complexity and variability of large-scale global logistics networks expose the inherent limits of centralized optimization and static rule-based automation. Logistics systems today operate in decentralized, stochastic and partially observable environments composed of autonomous yet interdependent entities such as trucks, warehouses and transportation hubs. This paper provides a comprehensive theoretical and architectural foundation for applying Intelligent Multi-Agent Reinforcement Learning (MARL) to autonomous logistics and the real-time optimization of logistics networks. Logistics operations are formulated as decentralized decision-making processes and stochastic games, allowing agents to learn adaptive coordination policies through centralized training with decentralized execution. A layered MARL architecture is then described that separates perception, coordination, decision-making and optimization, ensuring scalable, modular and stable optimization of logistics networks. Graph-based communication, message-passing mechanisms and bandwidth-efficient policy sharing coordinate the actions of agents, while learning stability is addressed through value decomposition, structured credit assignment and reward shaping. Advanced learning strategies, including actor-critic methods, proximal policy optimization, meta-learning and continual learning, are analyzed for multi-objective optimization of logistics networks under time, cost, energy and carbon-footprint constraints. In addition, the paper demonstrates how the proposed framework can be integrated with high-fidelity simulation and multi-agent digital twins to safely train and validate policies under realistic disruptions, and with cloud-edge infrastructure and distributed data pipelines to deploy these policies in real time.
Additionally, the paper addresses interoperability between the proposed MARL framework and enterprise supply chain systems, as well as governance issues related to transparency, accountability and regulatory compliance. Finally, it outlines future research directions that combine MARL with graph neural networks, generative models and predictive digital twins to enable scalable, resilient and self-optimizing logistics ecosystems.
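The centralized-training, decentralized-execution pattern with value decomposition described above can be sketched minimally. The agents, action names and Q-values below are hypothetical illustrations (not the paper's architecture), and the sum-based team value follows the general VDN-style decomposition idea:

```python
# Illustrative sketch of value decomposition under centralized training
# with decentralized execution (CTDE). All Q-values and action names are
# hypothetical; a real system would learn these values from experience.

# Each agent holds its own local Q-values for the current shared state,
# so execution needs no inter-agent communication: every agent simply
# acts greedily on its own table.
q_truck = {"wait": 0.2, "dispatch": 0.9}   # truck agent's local Q-values
q_depot = {"hold": 0.1, "release": 0.6}    # depot agent's local Q-values

def greedy(q_values):
    """Decentralized execution: pick the agent's own best local action."""
    return max(q_values, key=q_values.get)

joint_action = {"truck": greedy(q_truck), "depot": greedy(q_depot)}

# Centralized training treats the team value as the sum of per-agent
# values (VDN-style decomposition), so a single shared team reward can
# drive one joint TD error that assigns credit to each agent's action.
q_team = q_truck[joint_action["truck"]] + q_depot[joint_action["depot"]]
print(joint_action, round(q_team, 2))
```

Because the team value factorizes as a sum, maximizing each local Q-table independently also maximizes the team value, which is what makes decentralized execution consistent with the centralized training objective.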

Keywords

Multi-Agent Reinforcement Learning, Autonomous Logistics Systems

References

1. A multi agent deep reinforcement learning approach for traffic signal control. (2024). IET Intelligent Transport Systems. https://doi.org/10.1049/itr2.12521 [Google Scholar] [Crossref]

2. Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 308–318. https://doi.org/10.1145/2976749.2978318 [Google Scholar] [Crossref]

3. Abbas, N., Zhang, Y., Taherkordi, A., & Skeie, T. (2018). Mobile edge computing: A survey. IEEE Internet of Things Journal, 5(1), 450–465. https://doi.org/10.1109/JIOT.2017.2750180 [Google Scholar] [Crossref]

4. Abideen, A. Z., Sundram, V. P. K., Pyeman, J., Othman, A. K., & Sorooshian, S. (2021). Digital twin integrated reinforced learning in supply chain and logistics. Logistics, 5(4), 84. [Google Scholar] [Crossref]

5. https://doi.org/10.3390/logistics5040084 [Google Scholar] [Crossref]

6. Achiam, J., Held, D., Tamar, A., & Abbeel, P. (2017). Constrained policy optimization. [Google Scholar] [Crossref]

7. https://doi.org/10.48550/arXiv.1705.10528 [Google Scholar] [Crossref]

8. Adjei, P. K., & others. (2025). A graph attention network-based multi-agent reinforcement learning approach for complex interaction modeling. Scientific Reports, 15, 14032. [Google Scholar] [Crossref]

9. https://doi.org/10.1038/s41598-025-14032-w [Google Scholar] [Crossref]

10. Agarwal, R., Schwarzer, M., Castro, P. S., Courville, A., & Bellemare, M. G. (2021). Deep reinforcement learning at the edge of the statistical precipice. arXiv. [Google Scholar] [Crossref]

11. https://doi.org/10.48550/arXiv.2108.13264 [Google Scholar] [Crossref]

12. Akidau, T., Balikov, A., Bekiroğlu, K., Chernyak, S., Haberman, J., Lax, R., McVeety, S., Mills, D., Nordstrom, P., & Whittle, S. (2013). MillWheel: Fault tolerant stream processing at internet scale. Proceedings of the VLDB Endowment, 6(11), 1033–1044. https://doi.org/10.14778/2536222.2536229 [Google Scholar] [Crossref]

13. Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., Fernández Moisés, R. J., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., & Whittle, S. (2015). The Dataflow model: A practical approach to balancing correctness, latency, and cost in massive scale, unbounded, out of order data processing. Proceedings of the VLDB Endowment, 8(12), 1792–1803. https://doi.org/10.14778/2824032.2824076 [Google Scholar] [Crossref]

14. Aledhari, M., Razzak, R., Parizi, R. M., & Saeed, F. (2020). Federated learning: A survey on enabling technologies, protocols, and applications. IEEE Access, 8, 140699–140725. [Google Scholar] [Crossref]

15. https://doi.org/10.1109/ACCESS.2020.3013541 [Google Scholar] [Crossref]

16. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., & Ayyash, M. (2015). Internet of Things: A survey on enabling technologies, protocols, and applications. IEEE Communications Surveys & Tutorials, 17(4), 2347–2376. https://doi.org/10.1109/COMST.2015.2444095 [Google Scholar] [Crossref]

17. Alyahya, S., Qian, W., & Bennett, N. (2016). Application and integration of an RFID-enabled warehousing management system: A feasibility study. Journal of Industrial Information Integration, 4, 15–25. https://doi.org/10.1016/j.jii.2016.08.001 [Google Scholar] [Crossref]

18. Amato, C. (2024). An introduction to centralized training for decentralized execution in cooperative multi agent reinforcement learning. arXiv. https://doi.org/10.48550/arXiv.2409.03052 [Google Scholar] [Crossref]

19. Ammari, K., Bel Mufti, G., & Markou, M. S. (2025). Multi agent reinforcement learning for traffic signal control. IFAC PapersOnLine, 58(10), 65–70. https://doi.org/10.1016/j.ifacol.2025.10.011 [Google Scholar] [Crossref]

20. Atzori, L., Iera, A., & Morabito, G. (2010). The Internet of Things: A survey. Computer Networks, 54(15), 2787–2805. https://doi.org/10.1016/j.comnet.2010.05.010 [Google Scholar] [Crossref]

21. Auer, P., Cesa Bianchi, N., & Fischer, P. (2002). Finite time analysis of the multiarmed bandit problem. Machine Learning, 47(2–3), 235–256. https://doi.org/10.1023/A:1013689704352 [Google Scholar] [Crossref]

22. Balcik, B., Beamon, B. M., Krejci, C. C., Muramatsu, K. M., & Ramirez, M. (2010). Coordination in humanitarian relief chains: Practices, challenges, and opportunities. International Journal of Production Economics, 126(1), 22–34. https://doi.org/10.1016/j.ijpe.2009.09.008 [Google Scholar] [Crossref]

23. Barricelli, B. R., Casiraghi, E., & Fogli, D. (2019). A survey on digital twin: Definitions, characteristics, applications, and design implications. IEEE Access, 7, 167653–167671. [Google Scholar] [Crossref]

24. https://doi.org/10.1109/ACCESS.2019.2953499 [Google Scholar] [Crossref]

25. Beamon, B. M. (1999). Measuring supply chain performance. International Journal of Operations & Production Management, 19(3), 275–292. https://doi.org/10.1108/01443579910249714 [Google Scholar] [Crossref]

26. Behzadi, G., O’Sullivan, M. J., Olsen, T. L., & Zhang, A. (2018). Agribusiness supply chain risk management: A review of quantitative decision models. Omega, 79, 21–42. [Google Scholar] [Crossref]

27. https://doi.org/10.1016/j.omega.2017.07.005 [Google Scholar] [Crossref]

28. Bellemare, M. G., Dabney, W., & Munos, R. (2017). A distributional perspective on reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning (pp. 449–458). https://doi.org/10.48550/arXiv.1707.06887 [Google Scholar] [Crossref]

29. Bellemare, M. G., Dabney, W., & Munos, R. (2017). A distributional perspective on reinforcement learning. International Conference on Machine Learning. https://doi.org/10.48550/arXiv.1707.06887 [Google Scholar] [Crossref]

30. Bellemare, M. G., Naddaf, Y., Veness, J., & Bowling, M. (2013). The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47, 253–279. https://doi.org/10.1613/jair.3912 [Google Scholar] [Crossref]

31. Ben-Daya, M., Hassini, E., & Bahroun, Z. (2019). Internet of things and supply chain management: A literature review. International Journal of Production Research, 57(15–16), 4719–4742. https://doi.org/10.1080/00207543.2017.1402140 [Google Scholar] [Crossref]

32. Bernstein, D. S., Givan, R., Immerman, N., & Zilberstein, S. (2002). The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27(4), 819–840. https://doi.org/10.1287/moor.27.4.819.297 [Google Scholar] [Crossref]

33. Beynier, A. (2013). DEC MDP/POMDP. In Markov decision processes in artificial intelligence (Chapter 9). Wiley. https://doi.org/10.1002/9781118557426.ch9 [Google Scholar] [Crossref]

34. Bonabeau, E. (2002). Agent based modeling: Methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences, 99(Suppl. 3), 7280–7287. https://doi.org/10.1073/pnas.082080899 [Google Scholar] [Crossref]

35. Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., Ramage, D., Segal, A., & Seth, K. (2017). Practical secure aggregation for privacy preserving machine learning. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 1175–1191. https://doi.org/10.1145/3133956.3133982 [Google Scholar] [Crossref]

36. Bonomi, F., Milito, R., Natarajan, P., & Zhu, J. (2012). Fog computing: A platform for Internet of Things and analytics. Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, 13–16. https://doi.org/10.1145/2342509.2342513 [Google Scholar] [Crossref]

37. Bowling, M., & Veloso, M. (2002). Multi agent learning using a variable learning rate. Artificial Intelligence, 136(2), 215–250. https://doi.org/10.1016/S0004-3702(02)00121-2 [Google Scholar] [Crossref]

38. Boyd, S., Ghosh, A., Prabhakar, B., & Shah, D. (2006). Randomized gossip algorithms. IEEE Transactions on Information Theory, 52(6), 2508–2530. https://doi.org/10.1109/TIT.2006.874516 [Google Scholar] [Crossref]

39. Boysen, N., de Koster, R., & Weidinger, F. (2019). Warehousing in the e commerce era: A survey. European Journal of Operational Research, 277(2), 396–411. https://doi.org/10.1016/j.ejor.2018.08.023 [Google Scholar] [Crossref]

40. Brewer, E. A. (2012). CAP twelve years later: How the “rules” have changed. Computer, 45(2), 23–29. https://doi.org/10.1109/MC.2012.37 [Google Scholar] [Crossref]

41. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., & Zaremba, W. (2016). OpenAI Gym. arXiv. https://doi.org/10.48550/arXiv.1606.01540 [Google Scholar] [Crossref]

42. Brous, P., Janssen, M., & Herder, P. (2020). The dual effects of the Internet of Things (IoT): A systematic review of the benefits and risks of IoT adoption by organizations. International Journal of Information Management, 51, 101952. https://doi.org/10.1016/j.ijinfomgt.2019.05.008 [Google Scholar] [Crossref]

43. Brynjolfsson, E., Hitt, L. M., & Kim, H. H. (2011). Strength in numbers: How does data-driven decisionmaking affect firm performance? Proceedings of the International Conference on Information Systems (ICIS). https://doi.org/10.2139/ssrn.1819486 [Google Scholar] [Crossref]

44. Bucsoniu, L., Babuška, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics Part C Applications and Reviews, 38(2), 156– 172. https://doi.org/10.1109/TSMCC.2007.913919 [Google Scholar] [Crossref]

45. Chen, D., Doumeingts, G., & Vernadat, F. (2008). Architectures for enterprise integration and interoperability: Past, present and future. Computers in Industry, 59(7), 647–659. [Google Scholar] [Crossref]

46. https://doi.org/10.1016/j.compind.2007.12.016 [Google Scholar] [Crossref]

47. Chen, J., Li, Z., & Wang, Y. (2022). Multi robot task allocation in e commerce RMFS based on multi agent deep reinforcement learning. Mathematical Biosciences and Engineering, 20(2), 1–23. https://doi.org/10.3934/mbe.2023087 [Google Scholar] [Crossref]

48. Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications, 19(2), 171–209. https://doi.org/10.1007/s11036-013-0489-0 [Google Scholar] [Crossref]

49. Chiang, M., & Zhang, T. (2016). Fog and IoT: An overview of research opportunities. IEEE Internet of Things Journal, 3(6), 854–864. https://doi.org/10.1109/JIOT.2016.2584538 [Google Scholar] [Crossref]

50. Cimino, C., Negri, E., & Fumagalli, L. (2019). Review of digital twin applications in manufacturing. Computers in Industry, 113, 103130. https://doi.org/10.1016/j.compind.2019.103130 [Google Scholar] [Crossref]

51. Clark, S., & Watling, D. (2005). Modelling network travel time reliability under stochastic demand. [Google Scholar] [Crossref]

52. Transportation Research Part B: Methodological, 39(2), 119–140. [Google Scholar] [Crossref]

53. https://doi.org/10.1016/j.trb.2003.10.006 [Google Scholar] [Crossref]

54. Clarke, G., & Wright, J. W. (1964). Scheduling of vehicles from a central depot to a number of delivery points. Operations Research, 12(4), 568–581. https://doi.org/10.1287/opre.12.4.568 [Google Scholar] [Crossref]

55. Cobbe, K., Klimov, O., Hesse, C., Kim, T., & Schulman, J. (2020). Leveraging procedural generation to benchmark reinforcement learning. arXiv. https://doi.org/10.48550/arXiv.1912.01588 [Google Scholar] [Crossref]

56. Corvello, V., Iazzolino, G., & Verteramo, S. (2025). City logistics 4.0: A reconceptualization of the domain through technology and sustainability perspectives. Annals of Operations Research. https://doi.org/10.1007/s10479-025-06835-x [Google Scholar] [Crossref]

57. Cover, T. M., & Thomas, J. A. (2006). Elements of information theory (2nd ed.). Wiley. https://doi.org/10.1002/047174882X [Google Scholar] [Crossref]

58. Dabney, W., Rowland, M., Bellemare, M. G., & Munos, R. (2018). Distributional reinforcement learning with quantile regression. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11791 [Google Scholar] [Crossref]

59. Dantzig, G. B., & Ramser, J. H. (1959). The truck dispatching problem. Management Science, 6(1), 80–91. https://doi.org/10.1287/mnsc.6.1.80 [Google Scholar] [Crossref]

60. Davidsson, P., Henesey, L., Ramstedt, L., Törnquist, J., & Wernstedt, F. (2005). An analysis of agent based approaches to transport logistics. Transportation Research Part C: Emerging Technologies, 13(4), 255–271. https://doi.org/10.1016/j.trc.2005.07.002 [Google Scholar] [Crossref]

61. Davidsson, P., Henesey, L., Ramstedt, L., Törnquist, J., & Wernstedt, F. (2004). Agent based approaches to transport logistics. In Agent and multi agent systems: Technologies and applications (pp. 1–16). Springer. https://doi.org/10.1007/3-7643-7363-6_1 [Google Scholar] [Crossref]

62. Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113. https://doi.org/10.1145/1327452.1327492 [Google Scholar] [Crossref]

63. Dimakis, A. G., Kar, S., Moura, J. M. F., Rabbat, M. G., & Scaglione, A. (2010). Gossip algorithms for distributed signal processing. Proceedings of the IEEE, 98(11), 1847–1864. https://doi.org/10.1109/JPROC.2010.2052531 [Google Scholar] [Crossref]

64. Ding, Z., Wang, X., Li, J., & Zhang, Y. (2024). Identifying poisoning attacks in federated learning online. Scientific Reports, 14, 70375. https://doi.org/10.1038/s41598-024-70375-w Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv. https://doi.org/10.48550/arXiv.1702.08608 [Google Scholar] [Crossref]

65. Dragoni, N., Lanese, I., Larsen, S. T., Mazzara, M., Mustafin, R., & Safina, L. (2017). Microservices: Yesterday, today, and tomorrow. Present and Ulterior Software Engineering, 195–216. https://doi.org/10.1007/978-3-319-67425-4_12 [Google Scholar] [Crossref]

66. Dror, M., & Trudeau, P. (1989). Vehicle routing with stochastic demands: Properties and solution frameworks. Transportation Science, 23(3), 166–176. https://doi.org/10.1287/trsc.23.3.166 [Google Scholar] [Crossref]

67. Dubey, R., Gunasekaran, A., Childe, S. J., Wamba, S. F., & Papadopoulos, T. (2019). Big data analytics and artificial intelligence pathway to operational performance under the effects of entrepreneurial orientation and environmental dynamism: A study of manufacturing organizations. International Journal of Production Economics, 226, 107599. https://doi.org/10.1016/j.ijpe.2019.107599 [Google Scholar] [Crossref]

68. Dulac Arnold, G., Levine, N., Mankowitz, D. J., Li, J., Paduraru, C., Gowal, S., & Hester, T. (2021). An empirical investigation of the challenges of real world reinforcement learning. arXiv. https://doi.org/10.48550/arXiv.2003.11881 [Google Scholar] [Crossref]

69. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., & Zemel, R. (2012). Fairness through awareness. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214–226. https://doi.org/10.1145/2090236.2090255 [Google Scholar] [Crossref]

70. El Hamdi, S., Abouabdellah, A., & Bouchentouf, T. (2022). Logistics: Impact of Industry 4.0. Applied Sciences, 12(9), 4209. https://doi.org/10.3390/app12094209 [Google Scholar] [Crossref]

71. Engstrom, L., Ilyas, A., Santurkar, S., Tsipras, D., Janoos, F., Rudolph, L., & Madry, A. (2020). Implementation matters in deep RL: A case study on PPO and TRPO. arXiv. [Google Scholar] [Crossref]

72. https://doi.org/10.48550/arXiv.2005.12729 [Google Scholar] [Crossref]

73. Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., Dunning, I., Legg, S., & Kavukcuoglu, K. (2018). IMPALA: Scalable distributed deep RL with importance weighted actor learner architectures. International Conference on Machine Learning. https://doi.org/10.48550/arXiv.1802.01561 [Google Scholar] [Crossref]

74. Eugster, P. T., Felber, P. A., Guerraoui, R., & Kermarrec, A.-M. (2003). The many faces of publish/subscribe. ACM Computing Surveys, 35(2), 114–131. https://doi.org/10.1145/857076.857078 [Google Scholar] [Crossref]

75. Finn, C., Abbeel, P., & Levine, S. (2017). Model agnostic meta learning for fast adaptation of deep networks. International Conference on Machine Learning. https://doi.org/10.48550/arXiv.1703.03400 [Google Scholar] [Crossref]

76. Floridi, L., Cowls, J., Beltrametti, M., Chatila, R., Chazerand, P., Dignum, V., Luetge, C., Madelin, R., Pagallo, U., Rossi, F., Schafer, B., Valcke, P., & Vayena, E. (2018). AI4People—An ethical framework for a good AI society: Opportunities, risks, principles, and recommendations. Minds and Machines, 28(4), 689–707. https://doi.org/10.1007/s11023-018-9482-5 [Google Scholar] [Crossref]

77. Foerster, J. N., Assael, Y. M., de Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi agent reinforcement learning. Advances in Neural Information Processing Systems. https://doi.org/10.48550/arXiv.1605.06676 [Google Scholar] [Crossref]

78. Foerster, J. N., Farquhar, G., Afouras, T., Nardelli, N., & Whiteson, S. (2018). Counterfactual multi agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11794 [Google Scholar] [Crossref]

79. Foerster, J., Assael, Y. M., de Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi agent reinforcement learning. arXiv. https://doi.org/10.48550/arXiv.1605.06676 [Google Scholar] [Crossref]

80. Fu, J., Kumar, A., Nachum, O., Tucker, G., & Levine, S. (2020). D4RL: Datasets for deep data driven reinforcement learning. arXiv. https://doi.org/10.48550/arXiv.2004.07219 [Google Scholar] [Crossref]

81. Fujimoto, S., Hoof, H., & Meger, D. (2018). Addressing function approximation error in actor critic methods. arXiv. https://doi.org/10.48550/arXiv.1802.09477 [Google Scholar] [Crossref]

82. Fujimoto, S., van Hoof, H., & Meger, D. (2019). Addressing function approximation error in actor critic methods. In Proceedings of the 36th International Conference on Machine Learning (pp. 1587–1596). https://doi.org/10.48550/arXiv.1802.09477 [Google Scholar] [Crossref]

83. Fuller, A., Fan, Z., Day, C., & Barlow, C. (2020). Digital twin: Enabling technologies, challenges and open research. IEEE Access, 8, 108952–108971. https://doi.org/10.1109/ACCESS.2020.2998358 [Google Scholar] [Crossref]

84. Gijsbrechts, J., Boute, R., Van Mieghem, J., & Zhang, N. (2022). Can deep reinforcement learning improve inventory management? Performance and caveats. Manufacturing & Service Operations Management, 24(4), 1871–1898. https://doi.org/10.1287/msom.2021.1064 [Google Scholar] [Crossref]

85. Giuseppi, A., & others. (2025). Enhancing federated reinforcement learning: A consensus-based perspective. International Journal of Automation and Computing, 22, 1–22. [Google Scholar] [Crossref]

86. https://doi.org/10.1007/s11633-025-1550-8 [Google Scholar] [Crossref]

87. Gleave, A., Dennis, M., Wild, C., Kant, N., Levine, S., & Russell, S. (2020). Adversarial policies: Attacking deep reinforcement learning. International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1905.10615 [Google Scholar] [Crossref]

88. Grieves, M., & Vickers, J. (2017). Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems. In F. J. Kahlen, S. Flumerfelt, & A. Alves (Eds.), Transdisciplinary perspectives on complex systems (pp. 85–113). Springer. https://doi.org/10.1007/978-3-319-38756-7_4 [Google Scholar] [Crossref]

89. Gronauer, S., & Diepold, K. (2022). Multi agent deep reinforcement learning: A survey. Artificial Intelligence Review, 55, 895–943. https://doi.org/10.1007/s10462-021-09996-w [Google Scholar] [Crossref]

90. Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M. (2013). Internet of Things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7), 1645–1660. https://doi.org/10.1016/j.future.2013.01.010 [Google Scholar] [Crossref]

91. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM Computing Surveys, 51(5), Article 93. https://doi.org/10.1145/3236009 [Google Scholar] [Crossref]

92. Gunasekaran, A., Patel, C., & Tirtiroglu, E. (2001). Performance measures and metrics in a supply chain environment. International Journal of Operations & Production Management, 21(1/2), 71–87. https://doi.org/10.1108/01443570110358468 [Google Scholar] [Crossref]

93. Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor critic: Off policy maximum entropy deep reinforcement learning with a stochastic actor. https://doi.org/10.48550/arXiv.1801.01290 [Google Scholar] [Crossref]

94. Hall, R. W. (1978). Properties of the equilibrium state in transportation networks. Transportation Science, 12(3), 208–216. https://doi.org/10.1287/trsc.12.3.208 [Google Scholar] [Crossref]

95. Harby, A. A., & Zulkernine, F. (2025). Data lakehouse: A survey and experimental study. Information Systems, 127, 102460. https://doi.org/10.1016/j.is.2024.102460 [Google Scholar] [Crossref]

96. He, C., Annavaram, M., & Avestimehr, S. (2020). Group knowledge transfer: Federated learning of large CNNs at the edge. Advances in Neural Information Processing Systems, 33, 14068–14080. https://doi.org/10.48550/arXiv.2007.14513 [Google Scholar] [Crossref]

97. He, L., Xue, M., & Gu, B. (2020). Internet-of-things enabled supply chain planning and coordination with big data services: Certain theoretic implications. Journal of Management Science and Engineering, 5(1), 1–14. https://doi.org/10.1016/j.jmse.2020.03.002 [Google Scholar] [Crossref]

98. Helo, P., & Hao, Y. (2020). Blockchains in operations and supply chains: A model and reference implementation. Computers & Industrial Engineering, 136, 242–251. [Google Scholar] [Crossref]

99. https://doi.org/10.1016/j.cie.2019.07.023 [Google Scholar] [Crossref]

100. Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., & Meger, D. (2018). Deep reinforcement learning that matters. In Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11694 [Google Scholar] [Crossref]

101. Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., & Meger, D. (2018). Deep reinforcement learning that matters. arXiv. https://doi.org/10.48550/arXiv.1709.06560 [Google Scholar] [Crossref]

102. Hernández Leal, P., Kartal, B., & Taylor, M. E. (2019). A survey and critique of multi agent deep reinforcement learning. Autonomous Agents and Multi Agent Systems, 33, 750–797. https://doi.org/10.1007/s10458-019-09421-1 [Google Scholar] [Crossref]

103. Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., & Silver, D. (2018). Rainbow: Combining improvements in deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11796 [Google Scholar] [Crossref]

104. Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., & Silver, D. (2018). Rainbow: Combining improvements in deep reinforcement learning. arXiv. https://doi.org/10.48550/arXiv.1710.02298 [Google Scholar] [Crossref]

105. Hohenstein, N.-O., Feisel, E., Hartmann, E., & Giunipero, L. (2015). Research on the phenomenon of supply chain resilience. International Journal of Physical Distribution & Logistics Management, 45(1/2), 90–117. https://doi.org/10.1108/IJPDLM-05-2013-0128 [Google Scholar] [Crossref]

106. Holguín Veras, J., Jaller, M., Van Wassenhove, L. N., Pérez, N., & Wachtendorf, T. (2012). On the unique features of post disaster humanitarian logistics. Journal of Operations Management, 30(7–8), 494–506. https://doi.org/10.1016/j.jom.2012.08.003 [Google Scholar] [Crossref]

107. Hortelano, D., de Miguel, I., Barroso, R. J., Aguado, J. C., Merayo, N., Ruiz, L., Asensio, A., Masip-Bruin, X., Fernández, P., Lorenzo, R. M., & others. (2023). A comprehensive survey on reinforcement-learning-based [Google Scholar] [Crossref]

108. computation offloading techniques in edge computing systems. Journal of Network and Computer Applications, 216, 103669. https://doi.org/10.1016/j.jnca.2023.103669 [Google Scholar] [Crossref]

109. Hosseini, S., Ivanov, D., & Dolgui, A. (2019). Review of quantitative methods for supply chain resilience analysis. Transportation Research Part E: Logistics and Transportation Review, 125, 285–307. https://doi.org/10.1016/j.tre.2019.03.001 [Google Scholar] [Crossref]

110. Hsu, B. M., Hsu, L. Y., & Shu, M. H. (2013). Evaluation of supply chain performance using delivery-time performance analysis chart approach. Journal of Statistics and Management Systems, 16(1), 73–87. https://doi.org/10.1080/09720510.2013.777568 [Google Scholar] [Crossref]

111. Huang, J., Liu, J., Zhou, Y., Li, X., Ji, S., Xiong, H., & Dou, D. (2022). From distributed machine learning to federated learning: A survey. Knowledge and Information Systems, 64(4), 885–917. https://doi.org/10.1007/s10115-022-01664-x [Google Scholar] [Crossref]

112. Improving inventory management quality with reinforcement learning. (2025). Accounting Horizons. https://doi.org/10.2308/HORIZONS-2024-121 [Google Scholar] [Crossref]

113. Isard, M., Budiu, M., Yu, Y., Birrell, A., & Fetterly, D. (2007). Dryad: Distributed data parallel programs from sequential building blocks. Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, 59–72. https://doi.org/10.1145/1272996.1273005 [Google Scholar] [Crossref]

114. Ivanov, D. (2020). Viability of supply chain networks and supply chain resilience: A systems approach. Annals of Operations Research, 319, 1023–1063. https://doi.org/10.1007/s10479-020-03640-6 [Google Scholar] [Crossref]

115. Ivanov, D., & Dolgui, A. (2020). A digital supply chain twin for managing the disruption risks and resilience in the era of Industry 4.0. Production Planning & Control, 32(9), 775–788. https://doi.org/10.1080/09537287.2020.1768450 [Google Scholar] [Crossref]

116. Ivanov, D., & Dolgui, A. (2020). Viability of intertwined supply networks: Extending the supply chain resilience angles towards survivability. International Journal of Production Research, 58(10), 2904–2915. https://doi.org/10.1080/00207543.2020.1750727 [Google Scholar] [Crossref]

117. Jacobs, F. R., & Weston, F. C., Jr. (2007). Enterprise resource planning (ERP)—A brief history. Journal of Operations Management, 25(2), 357–363. https://doi.org/10.1016/j.jom.2006.11.005 [Google Scholar] [Crossref]

118. Jiang, Q., Shi, S., Zhu, X., & Zhang, X. (2022). Multi agent reinforcement learning for traffic signal control. arXiv. https://doi.org/10.48550/arXiv.2204.12190 [Google Scholar] [Crossref]

119. Jobin, A., Ienca, M., & Vayena, E. (2019). The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9), 389–399. https://doi.org/10.1038/s42256-019-0088-2 [Google Scholar] [Crossref]

120. Joe, W., & Lau, H. C. (2020). Deep reinforcement learning approach to solve dynamic vehicle routing problem with stochastic customers. In Proceedings of the Thirtieth International Conference on Automated Planning and Scheduling (ICAPS) (pp. 394–402). AAAI Press. [Google Scholar] [Crossref]

121. https://doi.org/10.1609/icaps.v30i1.6685 [Google Scholar] [Crossref]

122. Juliani, A., Berges, V. P., Vckay, E., Gao, Y., Henry, H., Mattar, M., & Lange, D. (2018). Unity: A general platform for intelligent agents. arXiv. https://doi.org/10.48550/arXiv.1809.02627 [Google Scholar] [Crossref]

123. Kache, F., & Seuring, S. (2017). Challenges and opportunities of digital information at the intersection of Big Data Analytics and supply chain management. International Journal of Operations & Production Management, 37(1), 10–36. https://doi.org/10.1108/IJOPM-02-2015-0078 [Google Scholar] [Crossref]

124. Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., D’Oliveira, R. G. L., Rouayheb, S. E., Evans, D., Gardner, J., Garrett, Z., Gascón, A., Ghazi, B., Gibbons, P. B., ... Wright, R. (2021). Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1–2), 1–210. https://doi.org/10.1561/2200000083 [Google Scholar] [Crossref]

125. Kephart, J. O., & Chess, D. M. (2003). The vision of autonomic computing. Computer, 36(1), 41–50. https://doi.org/10.1109/MC.2003.1160055 [Google Scholar] [Crossref]

126. Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwińska, A., Hassabis, D., Clopath, C., Kumaran, D., & Hadsell, R. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521–3526. https://doi.org/10.1073/pnas.1611835114

128. Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11), 1238–1274. https://doi.org/10.1177/0278364913495721

130. Kolat, M., Maciag, K., & Piotrowski, K. (2023). Multi-agent reinforcement learning for traffic signal control. Sustainability, 15(4), 3479. https://doi.org/10.3390/su15043479

131. Konda, V. R., & Tsitsiklis, J. N. (2003). Actor–critic algorithms. SIAM Journal on Control and Optimization, 42(4), 1143–1166. https://doi.org/10.1137/S0363012901385691

132. Kool, W., Van Hoof, H., & Welling, M. (2019). Attention, learn to solve routing problems! International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1803.08475

133. Kouhizadeh, M., Saberi, S., & Sarkis, J. (2021). Blockchain technology and the sustainable supply chain: Theoretically exploring adoption barriers. International Journal of Production Economics, 231, 107831. https://doi.org/10.1016/j.ijpe.2020.107831

134. Kouicem, D. E., Bouabdallah, A., & Lakhlef, H. (2018). Internet of Things security: A top-down survey. Computer Networks, 141, 199–221. https://doi.org/10.1016/j.comnet.2018.03.012

135. Kovács, G., & Spens, K. M. (2007). Humanitarian logistics in disaster relief operations. International Journal of Physical Distribution & Logistics Management, 37(2), 99–114. https://doi.org/10.1108/09600030710734820

136. Kritzinger, W., Karner, M., Traar, G., Henjes, J., & Sihn, W. (2018). Digital twin in manufacturing: A categorical literature review and classification. IFAC-PapersOnLine, 51(11), 1016–1022. https://doi.org/10.1016/j.ifacol.2018.08.474

137. Kulkarni, T. D., Narasimhan, K., Saeedi, A., & Tenenbaum, J. B. (2016). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. Advances in Neural Information Processing Systems. https://doi.org/10.48550/arXiv.1604.06057

138. Kumar, A., Zhou, A., Tucker, G., & Levine, S. (2020). Conservative Q-learning for offline reinforcement learning. arXiv. https://doi.org/10.48550/arXiv.2006.04779

139. Kwon, O., Lee, N., & Shin, B. (2014). Data quality management, data usage experience and acquisition intention of big data analytics. International Journal of Information Management, 34(3), 387–394. https://doi.org/10.1016/j.ijinfomgt.2014.02.002

140. Lambert, D. M., & Cooper, M. C. (2000). Issues in supply chain management. Industrial Marketing Management, 29(1), 65–83. https://doi.org/10.1016/S0019-8501(99)00113-3

141. Lamport, L. (1978). Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7), 558–565. https://doi.org/10.1145/359545.359563

142. Lanctot, M., Lockhart, E., Lespiau, J.-B., Zambaldi, V., Upadhyay, S., Pires, B. A. O., Yang, S., Tuyls, K., Pérolat, J., & Graepel, T. (2019). OpenSpiel: A framework for reinforcement learning in games. arXiv. https://doi.org/10.48550/arXiv.1908.09453

143. Laporte, G. (1992). The vehicle routing problem: An overview of exact and approximate algorithms. European Journal of Operational Research, 59(3), 345–358. https://doi.org/10.1016/0377-2217(92)90192-C

144. Laporte, G. (2007). What you should know about the vehicle routing problem. Naval Research Logistics, 54(8), 811–819. https://doi.org/10.1002/nav.20261

145. Le, D. N., & Fan, L. (2024). Digital twin in logistics and supply chain management: A systematic literature review and future research agenda. Computers & Industrial Engineering, 190, 109768. https://doi.org/10.1016/j.cie.2023.109768

146. Lee, D. H., & Kwon, H. (2023). A deep reinforcement learning approach to solve the team orienteering problem. In AIAA SCITECH 2023 Forum. https://doi.org/10.2514/6.2023-2662

147. Lee, D., Kim, S., & Cho, H. (2025). Digital twin-driven deep reinforcement learning for real-time control of automated guided vehicles in intralogistics. International Journal of Production Research. https://doi.org/10.1080/00207543.2025.2543491

148. Lee, E. A. (2008). Cyber-physical systems: Design challenges. Proceedings of the 11th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC), 363–369. https://doi.org/10.1109/ISORC.2008.25

149. Lesort, T., Díaz-Rodríguez, N., Goudou, J.-F., & Filliat, D. (2020). Continual learning for robotics: Definition, frameworks, challenges, and opportunities. Information Fusion, 58, 52–68. https://doi.org/10.1016/j.inffus.2019.12.004

151. Levine, S., Finn, C., Darrell, T., & Abbeel, P. (2016). End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 17(39), 1–40. https://doi.org/10.48550/arXiv.1504.00702

152. Levine, S., Kumar, A., Tucker, G., & Fu, J. (2020). Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv. https://doi.org/10.48550/arXiv.2005.01643

153. Li, J., Zhang, X., & Chen, H. (2025). Digital twin-driven deep reinforcement learning for real-time intralogistics optimization. International Journal of Production Research. https://doi.org/10.1080/00207543.2025.2543491

155. Li, M., Long, Y., Li, T., Liang, H., & Chen, C. L. P. (2024). Dynamic event-triggered consensus control for input-constrained multi-agent systems with a designable minimum inter-event time. IEEE/CAA Journal of Automatica Sinica, 11(3), 649–660. https://doi.org/10.1109/JAS.2023.123582

156. Li, T., Sahu, A. K., Talwalkar, A., & Smith, V. (2020). Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3), 50–60. https://doi.org/10.1109/MSP.2020.2975749

158. Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1509.02971

160. Lim, M. K., Bahr, W., & Leung, S. C. H. (2013). RFID in the warehouse: A literature analysis (1995–2010) of its applications, benefits, challenges and future trends. International Journal of Production Economics, 145(1), 409–430. https://doi.org/10.1016/j.ijpe.2013.05.006

161. Lin, K., Zhao, R., Xu, Z., & Zhou, J. (2018). Efficient large-scale fleet management via multi-agent deep reinforcement learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1774–1783). Association for Computing Machinery. https://doi.org/10.1145/3219819.3219993

162. Liu, J., Huang, J., Zhou, Y., Li, Y., et al. (2022). From distributed machine learning to federated learning: A survey. Knowledge and Information Systems, 64(4), 885–917. https://doi.org/10.1007/s10115-022-01664-x

164. Liu, X., Xu, Y., & Yang, S. (2024). Multi-agent deep reinforcement learning for multi-echelon inventory management. Journal of Supply Chain Management. https://doi.org/10.1177/10591478241305863

166. Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv. https://doi.org/10.48550/arXiv.1706.02275

167. Lu, Y., Liu, C., Wang, K. I.-K., Huang, H., & Xu, X. (2020). Digital twin-driven smart manufacturing: Connotation, reference model, applications and research issues. Robotics and Computer-Integrated Manufacturing, 61, 101837. https://doi.org/10.1016/j.rcim.2019.101837

168. Macal, C. M., & North, M. J. (2010). Tutorial on agent-based modelling and simulation. Journal of Simulation, 4(3), 151–162. https://doi.org/10.1057/jos.2010.3

169. MacCarthy, B. L., Blome, C., Olhager, J., Srai, J. S., & Zhao, X. (2016). Supply chain evolution—Theory, concepts and science. International Journal of Operations & Production Management, 36(12), 1696–1718. https://doi.org/10.1108/IJOPM-02-2016-0080

170. Mach, P., & Becvar, Z. (2017). Mobile edge computing: A survey on architecture and computation offloading. IEEE Communications Surveys & Tutorials, 19(3), 1628–1656. https://doi.org/10.1109/COMST.2017.2682318

172. Mai, T., Zhang, H., & Leung, V. C. M. (2020). Multi-agent actor-critic reinforcement learning based intelligent resource allocation. In 2020 IEEE Global Communications Conference (GLOBECOM) (pp. 1–6). https://doi.org/10.1109/GLOBECOM42002.2020.9322277

173. Malewicz, G., Austern, M. H., Bik, A. J. C., Dehnert, J. C., Horn, I., Leiser, N., & Czajkowski, G. (2010). Pregel: A system for large-scale graph processing. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, 135–146. https://doi.org/10.1145/1807167.1807184

175. Mao, Y., You, C., Zhang, J., Huang, K., & Letaief, K. B. (2017). A survey on mobile edge computing: The communication perspective. IEEE Communications Surveys & Tutorials, 19(4), 2322–2358. https://doi.org/10.1109/COMST.2017.2745201

176. Kearns, M., Neel, S., Roth, A., & Wu, Z. S. (2019). An empirical study of rich subgroup fairness for machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19) (pp. 100–109). Association for Computing Machinery. https://doi.org/10.1145/3287560.3287592

177. Mili, K. (2025). Adaptive vehicle routing for humanitarian aid in conflict settings: A stochastic and AI-based approach. Frontiers in Future Transportation, 6, 1603726. https://doi.org/10.3389/ffutr.2025.1603726

179. Minerva, R., Lee, G. M., & Crespi, N. (2020). Digital twin in the IoT context: A survey on technical features, scenarios, and architectural models. Proceedings of the IEEE, 108(10), 1789–1824. https://doi.org/10.1109/JPROC.2020.2998530

180. Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*), 220–229. https://doi.org/10.1145/3287560.3287596

182. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. International Conference on Machine Learning. https://doi.org/10.48550/arXiv.1602.01783

184. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236

185. Moshood, T. D., Nawanir, G., Sorooshian, S., Okfalisa, & van Viet, P. (2020). Digital twin-driven supply chain and logistics: A review of digital twin applications in supply chain management. Logistics, 4(2), 29. https://doi.org/10.3390/logistics4020029

186. Nazari, M., Oroojlooy, A., Snyder, L. V., & Takáč, M. (2018). Reinforcement learning for solving the vehicle routing problem. arXiv. https://doi.org/10.48550/arXiv.1802.04240

187. Nedic, A., & Ozdaglar, A. (2009). Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1), 48–61. https://doi.org/10.1109/TAC.2008.2009515

188. Negri, E., Fumagalli, L., & Macchi, M. (2017). A review of the roles of digital twin in CPS-based production systems. Procedia Manufacturing, 11, 939–948. https://doi.org/10.1016/j.promfg.2017.07.198

190. Ngai, E. W. T., Moon, K. K.-L., Riggins, F. J., & Yi, C. Y. (2008). RFID research: An academic literature review (1995–2005) and future research directions. International Journal of Production Economics, 112(2), 510–520. https://doi.org/10.1016/j.ijpe.2007.05.004

191. Nguyen, L. K. N., Howick, S., & Megiddo, I. (2024). A framework for conceptualising hybrid system dynamics and agent-based simulation models. European Journal of Operational Research, 315(3), 1153–1166. https://doi.org/10.1016/j.ejor.2024.01.027

192. Nichol, A., Achiam, J., & Schulman, J. (2018). On first-order meta-learning algorithms. arXiv. https://doi.org/10.48550/arXiv.1803.02999

193. Ning, B., Liu, Z., Fang, C., Yang, H., & Zhang, J. (2024). A survey on multi-agent reinforcement learning. Journal of Artificial Intelligence, 6(1), 1–32. https://doi.org/10.1016/j.jai.2024.02.003

194. Ning, Z., & Xie, L. (2024). A survey on multi-agent reinforcement learning and its application. Journal of Automation and Intelligence, 3(2), 73–91. https://doi.org/10.1016/j.jai.2024.02.003

195. Ning, Z., Zhang, Z., Xia, F., Ullah, N., Kong, X., & Hu, X. (2024). A survey on multi-agent reinforcement learning and its application. Journal of Artificial Intelligence, 1(1), 1–36. https://doi.org/10.1016/j.jai.2024.02.003

196. Nowzari, C., Garcia, E., & Cortés, J. (2019). Event-triggered communication and control of networked systems for multi-agent consensus. Automatica, 105, 1–27. https://doi.org/10.1016/j.automatica.2019.03.009

198. Olfati-Saber, R., Fax, J. A., & Murray, R. M. (2007). Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE, 95(1), 215–233. https://doi.org/10.1109/JPROC.2006.887293

199. Oliveira, T., Thomas, M., & Espadanal, M. (2014). Assessing the determinants of cloud computing adoption: An analysis of the manufacturing and services sectors. Information & Management, 51(5), 497–510. https://doi.org/10.1016/j.im.2014.03.006

200. Özdamar, L., & Ertem, M. A. (2015). Models, solutions and enabling technologies in humanitarian logistics. European Journal of Operational Research, 244(1), 55–65. https://doi.org/10.1016/j.ejor.2014.11.030

202. Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359. https://doi.org/10.1109/TKDE.2009.191

203. Panetto, H., & Molina, A. (2008). Enterprise integration and interoperability in manufacturing systems: Trends and issues. Computers in Industry, 59(7), 641–646. https://doi.org/10.1016/j.compind.2007.12.010

205. Papazoglou, M. P., & van den Heuvel, W.-J. (2007). Service-oriented architectures: Approaches, technologies and research issues. The VLDB Journal, 16(3), 389–415. https://doi.org/10.1007/s00778-007-0044-3

206. Pardo, F., Tavakoli, A., Levdik, V., & Kormushev, P. (2018). Time limits in reinforcement learning. arXiv. https://doi.org/10.48550/arXiv.1712.00378

207. Parisi, G. I., Kemker, R., Part, J. L., Kanan, C., & Wermter, S. (2019). Continual lifelong learning with neural networks: A review. Neural Networks, 113, 54–71. https://doi.org/10.1016/j.neunet.2019.01.012

208. Peng, X. B., Andrychowicz, M., Zaremba, W., & Abbeel, P. (2018). Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE International Conference on Robotics and Automation (ICRA) (pp. 3803–3810). IEEE. https://doi.org/10.1109/ICRA.2018.8460528

209. Pillac, V., Gueret, C., & Medaglia, A. L. (2013). A review of dynamic vehicle routing problems. European Journal of Operational Research, 225(1), 1–11. https://doi.org/10.1016/j.ejor.2012.08.015

210. Powell, W. B. (2019). A unified framework for stochastic optimization. European Journal of Operational Research, 275(3), 795–821. https://doi.org/10.1016/j.ejor.2018.07.014

211. Psaraftis, H. N., Wen, M., & Kontovas, C. A. (2016). Dynamic vehicle routing problems: Three decades and counting. Networks, 67(1), 3–31. https://doi.org/10.1002/net.21628

212. Qi, J., Zhou, Q., Lei, L., & Zheng, K. (2021). Federated reinforcement learning: Techniques, applications, and open challenges. Intelligence & Robotics, 1, 18–57. https://doi.org/10.20517/ir.2021.02

213. Qi, Q., & Tao, F. (2018). Digital twin and big data towards smart manufacturing and Industry 4.0: 360-degree comparison. IEEE Access, 6, 3585–3593. https://doi.org/10.1109/ACCESS.2018.2793265

214. Queiroz, M. M., Ivanov, D., Dolgui, A., & Wamba, S. F. (2020). Impacts of epidemic outbreaks on supply chains: Mapping a research agenda amid the COVID-19 pandemic through a structured literature review. Annals of Operations Research, 319, 1159–1196. https://doi.org/10.1007/s10479-020-03685-7

215. Rahimi, F., Møller, C., & Hvam, L. (2016). Business process management and IT management: The missing integration. International Journal of Information Management, 36(1), 142–154. https://doi.org/10.1016/j.ijinfomgt.2015.10.004

216. Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., & Barnes, P. (2020). Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 33–44. https://doi.org/10.1145/3351095.3372873

218. Rajkumar, R., Lee, I., Sha, L., & Stankovic, J. (2010). Cyber-physical systems: The next computing revolution. Proceedings of the 47th Design Automation Conference, 731–736. https://doi.org/10.1145/1837274.1837461

220. Rasheed, A., San, O., & Kvamsdal, T. (2020). Digital twin: Values, challenges and enablers from a modeling perspective. IEEE Access, 8, 21980–22012. https://doi.org/10.1109/ACCESS.2020.2970143

221. Rashid, T., Farquhar, G., Peng, B., & Whiteson, S. (2020). Monotonic value function factorisation for deep multi-agent reinforcement learning. arXiv. https://doi.org/10.48550/arXiv.2003.08839

222. Rashid, T., Samvelyan, M., De Witt, C. S., Farquhar, G., Foerster, J., & Whiteson, S. (2018). QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning. arXiv. https://doi.org/10.48550/arXiv.1803.11485

223. Ren, W., & Beard, R. W. (2008). Distributed consensus in multi-vehicle cooperative control. Springer. https://doi.org/10.1007/978-1-84800-015-5

224. Roijers, D. M., Vamplew, P., Whiteson, S., & Dazeley, R. (2014). A survey of multi-objective sequential decision-making. arXiv. https://doi.org/10.48550/arXiv.1402.0590

225. Roman, R., Zhou, J., & Lopez, J. (2013). On the features and challenges of security and privacy in distributed Internet of Things. Computer Networks, 57(10), 2266–2279. https://doi.org/10.1016/j.comnet.2012.12.018

227. Roughgarden, T., & Tardos, É. (2002). How bad is selfish routing? Journal of the ACM, 49(2), 236–259. https://doi.org/10.1145/506147.506153

228. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. https://doi.org/10.1038/s42256-019-0048-x

229. Samvelyan, M., Rashid, T., de Witt, C. S., Farquhar, G., Nardelli, N., Rudner, T. G. J., Hung, C.-M., Torr, P. H. S., Foerster, J., & Whiteson, S. (2019). The StarCraft multi-agent challenge. arXiv. https://doi.org/10.48550/arXiv.1902.04043

230. Santos, R., Costa, A., Rocha, A., & Barbosa, J. (2024). A simulation-based digital twin architecture for enhanced decision making in production systems. Computers & Industrial Engineering, 197, 110616. https://doi.org/10.1016/j.cie.2024.110616

231. Satyanarayanan, M. (2017). The emergence of edge computing. Computer, 50(1), 30–39. https://doi.org/10.1109/MC.2017.9

232. Satyanarayanan, M., Bahl, P., Caceres, R., & Davies, N. (2009). The case for VM-based cloudlets in mobile computing. IEEE Pervasive Computing, 8(4), 14–23. https://doi.org/10.1109/MPRV.2009.82

233. Sayed, A. H. (2014). Adaptation, learning, and optimization over networks. Foundations and Trends in Machine Learning, 7(4–5), 311–801. https://doi.org/10.1561/2200000051

234. Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2016). Prioritized experience replay. In Proceedings of the International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1511.05952

236. Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., Lillicrap, T., & Silver, D. (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839), 604–609. https://doi.org/10.1038/s41586-020-03051-4

237. Schulman, J., Levine, S., Moritz, P., Jordan, M. I., & Abbeel, P. (2015). Trust region policy optimization. International Conference on Machine Learning. https://doi.org/10.48550/arXiv.1502.05477

238. Schulman, J., Moritz, P., Levine, S., Jordan, M., & Abbeel, P. (2016). High-dimensional continuous control using generalized advantage estimation. International Conference on Learning Representations. https://doi.org/10.48550/arXiv.1506.02438

240. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv. https://doi.org/10.48550/arXiv.1707.06347

241. Schwartz, R., Dodge, J., Smith, N. A., & Etzioni, O. (2020). Green AI. Communications of the ACM, 63(12), 54–63. https://doi.org/10.1145/3381831

242. Seipolt, A., & Bauernhansl, T. (2024). Reinforcement learning and digital twin-driven optimization of production scheduling. Manufacturing Review, 11, 17. https://doi.org/10.1007/s43926-024-00087-0

243. Shalev-Shwartz, S., Shammah, S., & Shashua, A. (2016). Safe, multi-agent, reinforcement learning for autonomous driving. arXiv. https://doi.org/10.48550/arXiv.1610.03295

244. Shapiro, A., Dentcheva, D., & Ruszczyński, A. (2014). Lectures on stochastic programming: Modeling and theory (2nd ed.). SIAM. https://doi.org/10.1137/1.9781611973433

245. Sheu, J. B. (2007). An emergency logistics distribution approach for quick response to urgent relief demand in disasters. Transportation Research Part E: Logistics and Transportation Review, 43(6), 687–709. https://doi.org/10.1016/j.tre.2006.04.004

246. Shi, W., Cao, J., Zhang, Q., Li, Y., & Xu, L. (2016). Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3(5), 637–646. https://doi.org/10.1109/JIOT.2016.2579198

247. Shoham, Y., Powers, R., & Grenager, T. (2007). If multi-agent learning is the answer, what is the question? Artificial Intelligence, 171(7), 365–377. https://doi.org/10.1016/j.artint.2006.02.006

248. Sicari, S., Rizzardi, A., Grieco, L. A., & Coen-Porisini, A. (2015). Security, privacy and trust in Internet of Things: The road ahead. Computer Networks, 76, 146–164. https://doi.org/10.1016/j.comnet.2014.11.008

249. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. https://doi.org/10.1038/nature16961

250. Smith, S. L., Pavone, M., Bullo, F., & Frazzoli, E. (2010). Dynamic vehicle routing with priority classes of stochastic demands. SIAM Journal on Control and Optimization, 48(5), 3224–3245. https://doi.org/10.1137/090749347

251. Son, K., Kim, D., Kang, W., Hostallero, D., & Yi, Y. (2019). QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. International Conference on Machine Learning. https://doi.org/10.48550/arXiv.1905.05408

252. Spieser, K., Treleaven, K., Zhang, R., Frazzoli, E., Morton, D., & Pavone, M. (2014). Toward a systematic approach to the design and evaluation of automated mobility-on-demand systems. Road Vehicle Automation, 229–245. https://doi.org/10.1007/978-3-319-05990-7_20

253. Stranieri, F., Jorjani, S., & Trivedi, K. (2024). Performance of deep reinforcement learning algorithms in supply chain inventory management. International Journal of Production Research. https://doi.org/10.1080/00207543.2024.2311180

254. Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. arXiv. https://doi.org/10.48550/arXiv.1906.02243

255. Sukhbaatar, S., Szlam, A., & Fergus, R. (2016). Learning multi-agent communication with backpropagation. Advances in Neural Information Processing Systems. https://doi.org/10.48550/arXiv.1605.07736

256. Sunehag, P., Lever, G., Gruslys, A., Czarnecki, W. M., Zambaldi, V., Jaderberg, M., Lanctot, M., Sonnerat, N., Leibo, J. Z., Tuyls, K., & Graepel, T. (2017). Value decomposition networks for cooperative multi-agent learning. arXiv. https://doi.org/10.48550/arXiv.1706.05296

257. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44. https://doi.org/10.1023/A:1022633531479

258. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. IEEE Transactions on Neural Networks, 9(5), 1054. https://doi.org/10.1109/TNN.1998.712192

259. Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, C., & Wiewiora, E. (2009). Fast gradient descent methods for temporal difference learning with linear function approximation. In Proceedings of the 26th International Conference on Machine Learning (pp. 993–1000). https://doi.org/10.1145/1553374.1553501

260. Tako, A. A., & Robinson, S. (2012). The application of discrete event simulation and system dynamics in the logistics and supply chain context. Decision Support Systems, 52(4), 802–815. https://doi.org/10.1016/j.dss.2011.11.015

261. Tan, Q., Li, Z., & Wang, Y. (2025). Defending against backdoor attacks in federated learning for edge intelligence. CMES-Computer Modeling in Engineering & Sciences. https://doi.org/10.32604/cmes.2025.063811

262. Tang, C. S. (2006). Perspectives in supply chain risk management. International Journal of Production Economics, 103(2), 451–488. https://doi.org/10.1016/j.ijpe.2005.12.006

263. Tao, F., Cheng, J., Qi, Q., Zhang, M., Zhang, H., & Sui, F. (2018). Digital twin driven product design, manufacturing and service with big data. The International Journal of Advanced Manufacturing Technology, 94(9–12), 3563–3576. https://doi.org/10.1007/s00170-017-0233-1 [Google Scholar] [Crossref]

264. Tao, F., Zhang, H., Liu, A., & Nee, A. Y. C. (2019). Digital twin in industry: State of the art. IEEE Transactions on Industrial Informatics, 15(4), 2405–2415. https://doi.org/10.1109/TII.2018.2873186 [Google Scholar] [Crossref]

265. Thomas, P. S., & Brunskill, E. (2016). Data efficient off policy policy evaluation for reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning (pp. 2139–2148). https://doi.org/10.48550/arXiv.1604.00923 [Google Scholar] [Crossref]

266. Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., & Abbeel, P. (2017). Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 23–30). IEEE. https://doi.org/10.1109/IROS.2017.8202133 [Google Scholar] [Crossref]

267. Toth, P., & Vigo, D. (2014). Vehicle routing: Problems, methods, and applications (2nd ed.). SIAM. https://doi.org/10.1137/1.9781611973594 [Google Scholar] [Crossref]

268. Truex, S., Baracaldo, N., Anwar, A., Zhou, S., Ludwig, H., & Zhang, R. (2019). A hybrid approach to privacy preserving federated learning. Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, 1–12. https://doi.org/10.1145/3338501.3357370 [Google Scholar] [Crossref]

269. Tuyls, K., & Weiss, G. (2012). Multi agent learning: Basics, challenges, and prospects. AI Magazine, 33(3), 41–52. https://doi.org/10.1609/aimag.v33i3.2426 [Google Scholar] [Crossref]

270. Vamplew, P., Dazeley, R., Berry, A., Issabekov, R., & Dekker, E. (2011). Empirical evaluation methods for multi objective reinforcement learning algorithms. Machine Learning, 84, 51–80. https://doi.org/10.1007/s10994-010-5232-5 [Google Scholar] [Crossref]

271. van der Valk, W., Haijema, R., & Reiner, G. (2022). Supply chains in the era of digital twins: A review. Procedia Computer Science, 200, 227–234. https://doi.org/10.1016/j.procs.2022.08.019 [Google Scholar] [Crossref]

272. Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10295 [Google Scholar] [Crossref]

273. Van Wassenhove, L. N. (2006). Humanitarian aid logistics: Supply chain management in high gear. Journal of the Operational Research Society, 57(5), 475–489. https://doi.org/10.1057/palgrave.jors.2602125 [Google Scholar] [Crossref]

275. Varghese, B., & Buyya, R. (2018). Next generation cloud computing: New trends and research directions. Future Generation Computer Systems, 79, 849–861. https://doi.org/10.1016/j.future.2017.09.020 [Google Scholar] [Crossref]

277. Vernadat, F. B. (2007). Interoperable enterprise systems: Architectures and methods. Annual Reviews in Control, 31(1), 137–147. https://doi.org/10.1016/j.arcontrol.2007.03.004 [Google Scholar] [Crossref]

278. Villamizar, M., Garcés, O., Castro, H., Salamanca, L., Casallas, R., & Gil, S. (2015). Evaluating the monolithic and the microservice architecture pattern to deploy web applications in the cloud. 2015 10th Computing Colombian Conference (10CCC), 583–590. https://doi.org/10.1109/ColumbianCC.2015.7333476 [Google Scholar] [Crossref]

280. Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., Choi, D. H., Powell, R., Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss, M., Danihelka, I., Huang, A., Sifre, L., Cai, T., Agapiou, J., Jaderberg, M., … Silver, D. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782), 350–354. https://doi.org/10.1038/s41586-019-1724-z [Google Scholar] [Crossref]

282. Vlahogianni, E. I., Karlaftis, M. G., & Golias, J. C. (2014). Short-term traffic forecasting: Where we are and where we’re going. Transportation Research Part C: Emerging Technologies, 43, 3–19. https://doi.org/10.1016/j.trc.2014.01.005 [Google Scholar] [Crossref]

283. Vogels, W. (2009). Eventually consistent. Communications of the ACM, 52(1), 40–44. https://doi.org/10.1145/1435417.1435432 [Google Scholar] [Crossref]

284. Waller, M. A., & Fawcett, S. E. (2013). Data science, predictive analytics, and big data: A revolution that will transform supply chain design and management. Journal of Business Logistics, 34(2), 77–84. https://doi.org/10.1111/jbl.12010 [Google Scholar] [Crossref]

285. Wamba, S. F., Gunasekaran, A., Akter, S., Ren, S. J.-F., Dubey, R., & Childe, S. J. (2017). Big data analytics and firm performance: Effects of dynamic capabilities. Journal of Business Research, 70, 356–365. https://doi.org/10.1016/j.jbusres.2016.08.009 [Google Scholar] [Crossref]

286. Wang, D., Chen, Y., & Lee, D. (2025). Digital twin driven management strategies for logistics operations with unmanned vehicles. Scientific Reports, 15, 96641. https://doi.org/10.1038/s41598-025-96641-z [Google Scholar] [Crossref]

287. Wang, D., Sun, L., & Szeto, W. Y. (2020). Dynamic holding control to avoid bus bunching: A multi-agent deep reinforcement learning framework. Transportation Research Part C: Emerging Technologies, 116, 102661. https://doi.org/10.1016/j.trc.2020.102661 [Google Scholar] [Crossref]

288. Wang, G., Gunasekaran, A., Ngai, E. W. T., & Papadopoulos, T. (2016). Big data analytics in logistics and supply chain management: Certain investigations for research and applications. International Journal of Production Economics, 176, 98–110. https://doi.org/10.1016/j.ijpe.2016.03.014 [Google Scholar] [Crossref]

289. Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., Hassabis, D., & Botvinick, M. (2016). Learning to reinforcement learn. arXiv. https://doi.org/10.48550/arXiv.1611.05763 [Google Scholar] [Crossref]

291. Wang, J., Zhang, Z., & Wang, Y. (2022). More centralized training, still decentralized execution: Multi-agent conditional policy factorization. arXiv. https://doi.org/10.48550/arXiv.2209.12681 [Google Scholar] [Crossref]

292. Wang, L., Deng, T., Shen, Z.-J. M., Hu, H., & Qi, Y. (2022). Digital twin-driven smart supply chain. Frontiers of Engineering Management, 9(1), 56–70. https://doi.org/10.1007/s42524-021-0186-9 [Google Scholar] [Crossref]

293. Wang, X., Chen, X., Wang, Y., & Zhang, H. (2023). Event-triggered consensus control of heterogeneous leader-follower multi-agent systems. Science China Information Sciences, 66, 152202. https://doi.org/10.1007/s11432-022-3683-y [Google Scholar] [Crossref]

294. Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., & de Freitas, N. (2016). Dueling network architectures for deep reinforcement learning. Proceedings of the 33rd International Conference on Machine Learning. https://doi.org/10.48550/arXiv.1511.06581 [Google Scholar] [Crossref]

295. Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292. https://doi.org/10.1007/BF00992698 [Google Scholar] [Crossref]

296. Wen, J., Zhang, X., Lan, Y., Liu, Q., & Wang, J. (2022). From distributed machine learning to federated learning: A survey. Knowledge and Information Systems, 64(4), 885–917. https://doi.org/10.1007/s10115-022-01664-x [Google Scholar] [Crossref]

298. Wieland, A., & Wallenburg, C. M. (2013). The influence of relational competencies on supply chain resilience: A relational view. International Journal of Physical Distribution & Logistics Management, 43(4), 300–320. https://doi.org/10.1108/IJPDLM-08-2012-0243 [Google Scholar] [Crossref]

299. Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18 [Google Scholar] [Crossref]

301. Williams, R. J. (1992). Simple statistical gradient following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4), 229–256. https://doi.org/10.1007/BF00992696 [Google Scholar] [Crossref]

302. Winkelhaus, S., & Grosse, E. H. (2020). Logistics 4.0: A systematic review towards a new logistics system. International Journal of Production Research, 58(1), 18–43. https://doi.org/10.1080/00207543.2019.1612964 [Google Scholar] [Crossref]

305. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., & Yu, P. S. (2021). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1), 4–24. https://doi.org/10.1109/TNNLS.2020.2978386 [Google Scholar] [Crossref]

306. Wurman, P. R., D’Andrea, R., & Mountz, M. (2008). Coordinating hundreds of cooperative, autonomous vehicles in warehouses. AI Magazine, 29(1), 9–20. https://doi.org/10.1609/aimag.v29i1.2082 [Google Scholar] [Crossref]

308. Xia, Q., Ye, W., Tao, Z., Wu, J., & Li, Q. (2021). A survey of federated learning for edge computing: Research problems and solutions. High-Confidence Computing, 1(1), 100008. https://doi.org/10.1016/j.hcc.2021.100008 [Google Scholar] [Crossref]

310. Xie, X., Ban, X. (Jeff), Chen, H., Chen, Z., & Xu, S. (2023). Dynamic ridepooling with heterogeneous riders: A deep reinforcement learning approach. Transportation Science, 57(4), 1021–1044. https://doi.org/10.1287/trsc.2022.1188 [Google Scholar] [Crossref]

311. Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2), 1–19. https://doi.org/10.1145/3298981 [Google Scholar] [Crossref]

313. Yau, K. L. A., Qadir, J., Khoo, H. L., Ling, M. H., & Komisarczuk, P. (2017). A survey on reinforcement learning models and algorithms for traffic signal control. ACM Computing Surveys, 50(3), Article 34. https://doi.org/10.1145/3068287 [Google Scholar] [Crossref]

314. Yi, W., & Özdamar, L. (2007). A dynamic logistics coordination model for evacuation and support in disaster response activities. European Journal of Operational Research, 179(3), 1177–1193. https://doi.org/10.1016/j.ejor.2005.03.077 [Google Scholar] [Crossref]

315. Zhang, C., Patras, P., & Haddadi, H. (2019). Deep learning in mobile and wireless networking: A survey. IEEE Communications Surveys & Tutorials, 21(3), 2224–2287. https://doi.org/10.1109/COMST.2019.2904897 [Google Scholar] [Crossref]

317. Zhang, K., Yang, Z., & Başar, T. (2021). Multi-agent reinforcement learning: A selective overview of theories and algorithms. In Handbook of Reinforcement Learning and Control (pp. 321–384). Springer. https://doi.org/10.1007/978-3-030-60990-0_12 [Google Scholar] [Crossref]

318. Zhang, K., Yang, Z., & Başar, T. (2021). Multi-agent reinforcement learning: A selective overview of theories and algorithms. arXiv. https://doi.org/10.48550/arXiv.1911.10635 [Google Scholar] [Crossref]

320. Zhang, Q., Yang, L. T., Chen, Z., & Li, P. (2018). A survey on deep learning for big data. Information Fusion, 42, 146–157. https://doi.org/10.1016/j.inffus.2017.10.006 [Google Scholar] [Crossref]

321. Zhang, R., & Pavone, M. (2016). Control of robotic mobility on demand systems: A queueing theoretical perspective. The International Journal of Robotics Research, 35(1–3), 186–203. https://doi.org/10.1177/0278364915581863 [Google Scholar] [Crossref]

322. Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., & Chandra, V. (2018). Federated learning with non-IID data. arXiv. https://doi.org/10.48550/arXiv.1806.00582 [Google Scholar] [Crossref]

323. Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., … Sun, M. (2020). Graph neural networks: A review of methods and applications. AI Open, 1, 57–81. https://doi.org/10.1016/j.aiopen.2021.01.001 [Google Scholar] [Crossref]

324. Zhu, C., Dastani, M., & Wang, S. (2024). A survey of multi-agent deep reinforcement learning with communication. Autonomous Agents and Multi-Agent Systems, 38, 4. https://doi.org/10.1007/s10458-023-09633-6 [Google Scholar] [Crossref]
