Enhancing an RNN-Attention Yoruba Text Autocompletion System through an Optimized Adam Framework
Authors
Department of Information Systems and Technology, Kings University, Odeomu (Nigeria)
Department of Computer Science, Ajayi Crowther University, Oyo (Nigeria)
Athenahealth, Boston, Massachusetts (USA)
Department of Computer Science, Ajayi Crowther University, Oyo (Nigeria)
Federal School of Surveying, Oyo (Nigeria)
Article Information
DOI: 10.51584/IJRIAS.2025.10100000170
Subject Category: Language
Volume/Issue: 10/10 | Page No: 1940-1959
Publication Timeline
Submitted: 2025-11-06
Accepted: 2025-11-13
Published: 2025-11-20
Abstract
The development of effective neural models for low-resource languages is fundamentally constrained by two interrelated factors: architectural suitability for linguistic complexity and optimization stability on small datasets. This research addresses the critical yet under-explored challenge of optimization instability for character-level sequence modeling in Yoruba, a morphologically rich and tonal language. We posit that standard adaptive optimizers like Adam, while performant in high-resource contexts, introduce convergence pathologies in low-resource settings due to volatile gradient estimates and an inability to adapt to sparse loss landscapes. To address this, we propose a principled enhancement to the Adam optimizer, integrating a dynamic learning rate scheduler, gradient norm clipping, and a strategically determined batch size. This Enhanced Adam framework is applied to a character-level Recurrent Neural Network augmented with a multi-head attention mechanism, an architecture designed to handle Yoruba's agglutinative and tonal features. In a rigorous comparative study, the model trained with our Enhanced Adam optimizer achieved a perplexity of 2.07, a statistically significant 8.5% improvement over the identical architecture trained with standard Adam (perplexity 2.26). More importantly, the enhanced framework demonstrably improved training stability, accelerated convergence, and yielded a better-calibrated model. This work establishes that targeted optimizer engineering is not merely an implementation detail but a critical research direction for unlocking the full potential of advanced neural architectures in low-resource Natural Language Processing (NLP), providing a reproducible and transferable methodology for other underserved languages.
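To make the optimizer-level ideas in the abstract concrete, the sketch below shows a minimal NumPy implementation of Adam combined with the two stabilizers the abstract names: a dynamic (step-decay) learning-rate schedule and global gradient norm clipping. This is an illustrative reconstruction, not the authors' implementation; the class name `EnhancedAdam` and all hyperparameter values (`decay_rate`, `decay_every`, `max_grad_norm`) are hypothetical choices for demonstration. The `perplexity` helper shows how the reported metric relates to the mean per-character negative log-likelihood.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm.

    This is the standard gradient-clipping recipe for stabilizing RNN training
    (cf. Pascanu et al., 2013)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads]

def perplexity(mean_nll):
    """Perplexity is the exponential of the mean negative log-likelihood per character."""
    return float(np.exp(mean_nll))

class EnhancedAdam:
    """Adam with exponential step-decay LR scheduling and gradient norm clipping.

    A hypothetical sketch of an 'Enhanced Adam' framework; parameter names and
    defaults are illustrative, not taken from the paper."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 max_grad_norm=1.0, decay_rate=0.96, decay_every=100):
        self.params = params          # list of NumPy arrays, updated in place
        self.base_lr = lr
        self.b1, self.b2 = betas
        self.eps = eps
        self.max_grad_norm = max_grad_norm
        self.decay_rate = decay_rate
        self.decay_every = decay_every
        self.m = [np.zeros_like(p) for p in params]  # first-moment estimates
        self.v = [np.zeros_like(p) for p in params]  # second-moment estimates
        self.t = 0                                   # global step counter

    def step(self, grads):
        self.t += 1
        # Dynamic learning rate: decay by decay_rate every decay_every steps.
        lr = self.base_lr * self.decay_rate ** (self.t // self.decay_every)
        # Stabilize volatile gradient estimates before the moment updates.
        grads = clip_by_global_norm(grads, self.max_grad_norm)
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g ** 2
            m_hat = self.m[i] / (1 - self.b1 ** self.t)   # bias correction
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            self.params[i] -= lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

As a quick sanity check, running `EnhancedAdam` on a one-dimensional quadratic drives the parameter toward the minimum, and `perplexity(np.log(2.07))` recovers 2.07, matching how a character-level model's reported perplexity follows from its mean cross-entropy loss.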
Keywords
Low-Resource NLP, Yoruba Language, Text Autocompletion, Adam Optimizer, Optimization Stability, Gradient Clipping, Learning Rate Scheduling, RNN, Attention Mechanism.