c) Analysis: Rigorously comparing the training stability, convergence speed, and final performance against
both the standard Adam baseline and the best-performing RNN-Attention model from this work. Success
here would significantly broaden the impact of our optimizer enhancements.
1. Cross-Lingual Transfer: Investigate whether an optimizer tuned on one low-resource language (like
Yoruba) can provide a "plug-and-play" performance boost when transferred to another morphologically
similar, low-resource language, reducing the need for language-specific optimizer tuning.
2. Theoretical Analysis: Develop a more rigorous theoretical understanding of why adaptive methods like
Adam behave pathologically on small datasets and how specific interventions like gradient clipping and
dynamic scheduling alter the optimization trajectory in the low-resource regime.
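To make the object of the proposed analysis concrete, the following is a minimal pure-Python sketch of the two interventions named above: an Adam update step combined with global-norm gradient clipping and a linear-warmup learning-rate schedule. All function and parameter names here are illustrative assumptions, not the implementation used in this work.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return grads

def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup: ramp the learning rate over the first warmup_steps."""
    return base_lr * min(1.0, step / warmup_steps)

def adam_step(params, grads, m, v, step, base_lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8,
              max_norm=1.0, warmup_steps=100):
    """One Adam update with clipping and warmup applied (step starts at 1)."""
    grads = clip_by_global_norm(grads, max_norm)
    lr = warmup_lr(step, base_lr, warmup_steps)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g          # first-moment estimate
        v[i] = beta2 * v[i] + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m[i] / (1 - beta1 ** step)             # bias correction
        v_hat = v[i] / (1 - beta2 ** step)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params
```

The theoretical question is how the clipping threshold and the warmup horizon jointly bound the effective step size `lr * m_hat / sqrt(v_hat)` early in training, when the second-moment estimate `v` is still unreliable on small datasets.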
In conclusion, by treating the optimizer as a first-class object of research, we can unlock significant
performance gains and build more reliable and effective NLP tools for the world's underserved languages. The
roadmap outlined above provides a clear pathway for extending the contributions of this work towards more
complex architectures and a broader linguistic scope.