Towards MeluBot: A Multimodal AI Agent Integrating Text, Voice, Image, and Automation for Education and Health
Authors
Gabriel Henrique Alencar Medeiros
SeaFortress / INSA Rouen Normandie (France)
Article Information
DOI: 10.51584/IJRIAS.2025.10100000123
Subject Category: Artificial Intelligence
Volume/Issue: 10/10 | Page No: 1392-1400
Publication Timeline
Submitted: 2025-10-20
Accepted: 2025-10-26
Published: 2025-11-13
Abstract
This paper presents MeluBot, a multimodal AI agent that integrates text, voice, and image modalities with workflow automation for interactive applications in education and healthcare. We describe its architectural design, enabling technologies, and use-case scenarios, and discuss its potential, limitations, and future directions. We also position MeluBot with respect to related work on multimodal agents, intelligent tutoring systems, and medical assistants.
Keywords
AI Agent
Similar Articles
- The Role of Artificial Intelligence in Revolutionizing Library Services in Nairobi: Ethical Implications and Future Trends in User Interaction
- ESPYREAL: A Mobile Based Multi-Currency Identifier for Visually Impaired Individuals Using Convolutional Neural Network
- Comparative Analysis of AI-Driven IoT-Based Smart Agriculture Platforms with Blockchain-Enabled Marketplaces
- AI-Based Dish Recommender System for Reducing Fruit Waste through Spoilage Detection and Ripeness Assessment
- SEA-TALK: An AI-Powered Voice Translator and Southeast Asian Dialects Recognition