Indigenous People’s Language Identification Using Machine Learning for Linguistic Preservation
Authors
College of Engineering, Architecture and Technology, Notre Dame of Dadiangas University, General Santos City (Philippines)
Article Information
DOI: 10.51584/IJRIAS.2025.1010000080
Subject Category: Language
Volume/Issue: 10/10 | Page No: 973-984
Publication Timeline
Submitted: 2025-10-24
Accepted: 2025-10-30
Published: 2025-11-08
Abstract
Language is a fundamental aspect of human identity, closely tied to geographical origin, cultural heritage, and social belonging. However, many indigenous languages around the world are declining because of modernization, migration, and the growing influence of technology and global languages. When these languages are lost, the cultural values, oral traditions, and historical knowledge they carry often disappear with them. This study integrates Long Short-Term Memory (LSTM), Yoon Kim's Convolutional Neural Network (CNN), and TextConvoNet models into a mobile text-to-text identification and translation application for the Blaan dialects spoken in General Santos City, Polomolok, and Sarangani. The application aims to support the preservation and revitalization of the Blaan language by giving native speakers and learners an accessible platform for understanding, translating, and communicating in their local dialects.
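Yoon Kim's model, named above, classifies a sentence by sliding convolution filters of several widths over its word embeddings, keeping each filter's strongest response (max-over-time pooling), and passing the pooled features to a softmax layer. The following is a minimal NumPy sketch of that forward pass only, under explicit assumptions: the three dialect labels, the vocabulary size, the embedding and filter dimensions, and the helper names `init_params` and `kim_cnn_forward` are hypothetical; the weights are random rather than trained; and training details from Kim (2014), such as dropout and weight-norm constraints, are omitted. It is not the authors' implementation.

```python
import numpy as np

# Hypothetical class labels: one per Blaan-speaking area named in the abstract.
DIALECTS = ["General Santos City", "Polomolok", "Sarangani"]
PAD_ID = 0  # assumed id of the padding token in a hypothetical vocabulary


def init_params(vocab_size, emb_dim=16, num_filters=8, filter_widths=(3, 4, 5),
                num_classes=len(DIALECTS), seed=0):
    """Randomly initialised (untrained) weights for a Kim-style CNN."""
    rng = np.random.default_rng(seed)
    params = {"embedding": rng.normal(0.0, 0.1, (vocab_size, emb_dim))}
    for h in filter_widths:
        params[f"conv{h}_W"] = rng.normal(0.0, 0.1, (num_filters, h, emb_dim))
        params[f"conv{h}_b"] = np.zeros(num_filters)
    params["fc_W"] = rng.normal(0.0, 0.1, (num_classes, num_filters * len(filter_widths)))
    params["fc_b"] = np.zeros(num_classes)
    return params


def kim_cnn_forward(token_ids, params, filter_widths=(3, 4, 5)):
    """Forward pass only: embeddings -> convolutions -> max-over-time pooling -> softmax."""
    ids = list(token_ids)
    # Pad short inputs so the widest filter fits at least once.
    ids += [PAD_ID] * max(0, max(filter_widths) - len(ids))
    emb = params["embedding"][np.asarray(ids)]  # (seq_len, emb_dim)

    pooled = []
    for h in filter_widths:
        W, b = params[f"conv{h}_W"], params[f"conv{h}_b"]
        # One feature per filter for every window of h consecutive tokens.
        feats = np.stack([
            np.tensordot(W, emb[i:i + h], axes=([1, 2], [0, 1])) + b
            for i in range(emb.shape[0] - h + 1)
        ])  # (num_windows, num_filters)
        feats = np.maximum(feats, 0.0)      # ReLU
        pooled.append(feats.max(axis=0))    # keep each filter's strongest response

    z = np.concatenate(pooled)              # (num_filters * len(filter_widths),)
    logits = params["fc_W"] @ z + params["fc_b"]
    exp = np.exp(logits - logits.max())     # numerically stable softmax
    return exp / exp.sum()


params = init_params(vocab_size=50)
probs = kim_cnn_forward([4, 17, 9, 23, 2, 31], params)  # hypothetical token ids
print(DIALECTS[int(probs.argmax())], probs.round(3))
```

Because the weights are untrained, the printed label is arbitrary; in practice the embedding, filter, and output weights would be learned from labelled Blaan transcriptions. Using several filter widths (here 3, 4, and 5, the widths Kim used) lets the model respond to word n-grams of different lengths.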
To evaluate the usability and effectiveness of the application, User Acceptance Testing (UAT) was conducted with selected users. Data were collected through structured interviews, document analysis, and standardized evaluation tools to assess and validate the application. In the experiments, the TextConvoNet model achieved the highest accuracy, 74.00 percent, outperforming both the LSTM model and Yoon Kim's CNN model. This result suggests that TextConvoNet is effective at identifying and classifying Blaan dialects and highlights its potential for Natural Language Processing (NLP) applications.
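Classification accuracy is normally the share of held-out samples whose predicted label matches the reference label, so an accuracy of 74.00 percent would correspond, for instance, to 37 correct labels out of 50 test sentences (the test-set size is purely illustrative; the abstract does not state it). The sketch below assumes the three models were compared this way on a shared test set: it computes accuracy, counts (true, predicted) label pairs, and picks the most accurate model. The function names, dialect codes, labels, and predictions are all hypothetical and are not the study's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted dialect equals the reference dialect."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count (true, predicted) pairs; off-diagonal pairs show which dialects get confused."""
    return Counter(zip(y_true, y_pred))


def best_model(predictions_by_model, y_true):
    """Score each model on the same test labels and return the most accurate one."""
    scores = {name: accuracy(y_true, preds) for name, preds in predictions_by_model.items()}
    return max(scores, key=scores.get), scores


# Hypothetical test labels and model outputs (not the study's data).
y_true = ["GSC", "POL", "SAR", "POL", "SAR"]
predictions = {
    "LSTM": ["GSC", "SAR", "SAR", "GSC", "POL"],
    "Kim CNN": ["GSC", "SAR", "SAR", "POL", "POL"],
    "TextConvoNet": ["GSC", "POL", "SAR", "POL", "POL"],
}
name, scores = best_model(predictions, y_true)
print(name, scores)
print(confusion_counts(y_true, predictions[name]))
```

Accuracy alone can hide which dialects are mixed up, which is why the confusion counts are printed alongside it; with imbalanced dialect classes, per-class figures are usually worth reporting as well.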
Future research should expand the dataset with transcriptions from speakers of different age groups, locations, and communication contexts to improve model generalization and accuracy. Further refinement of the model architecture and parameter tuning are also recommended to improve dialect classification and translation. In addition, integrating speech-to-text and text-to-speech functions could enable real-time translation, pronunciation learning, and access for non-literate speakers, supporting the continued preservation and appreciation of indigenous languages.
Keywords
Natural Language Processing (NLP), TextConvoNet, Yoon Kim, LSTM
References
1. UNESCO Ad Hoc Expert Group on Endangered Languages. (2003, March 10–12). Language vitality and endangerment. International Expert Meeting on the UNESCO Programme Safeguarding of Endangered Languages, Paris, France. https://ich.unesco.org/doc/src/00120-EN.pdf
2. Lewis, M. P., Simons, G. F., & Fennig, C. D. (Eds.). (2015). Ethnologue: Languages of the world (18th ed.). SIL International. http://www.ethnologue.com
3. Komisyon sa Wikang Filipino. (2018). Kapasiyahan ng kalupunan ng mga komisyoner blg. 18-33 serye 2018 [Resolution of the Board of Commissioners No. 18-33, series of 2018].
4. Headland, T. N. (2003). Thirty endangered languages in the Philippines. Work Papers of the Summer Institute of Linguistics, University of North Dakota Session, 47(1). https://commons.und.edu/sil-work-papers/vol47/iss1/1
5. Pelila, J. R. O., Ayao-ao, S. L., & Casiano, M. B. (2023). If these languages could talk: The extinct languages of the Philippines. International Journal of Multidisciplinary Research and Publications (IJMRAP), 6(3), 127–134.
6. Villaluz, G. D., Tagalog, R. M. P., & Saway, A. L. B. (2023). Engaging Indigenous Community Towards a Talaandig Language Learning and Cultural Sustainability. ASEAN Journal of Community Engagement, 7(2), 129-150. https://doi.org/10.7454/ajce.v7i2.1227
7. Abinaya, N., Jayadharshini, P., Priyanka, S., Keerthika, S., & Santhiya, S. (2023). Identification of language from multi-lingual dataset using classification algorithms. Journal of Physics: Conference Series, 2664(1), 012009. https://doi.org/10.1088/1742-6596/2664/1/012009
8. Zhou, J., & Huang, T. (2023). Application of machine learning algorithm in electronic book database management system. SN Applied Sciences, 5(11), 287. https://doi.org/10.1007/s42452-023-05508-3
9. Soni, S., Chouhan, S. S., & Rathore, S. S. (2023). TextConvoNet: a convolutional neural network based architecture for text classification. Applied Intelligence, 53(11), 14249-14268. https://doi.org/10.1007/s10489-022-04221-9
10. Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. arXiv preprint arXiv:1408.5882. https://arxiv.org/abs/1408.5882
11. Bahad, P., Saxena, P., & Kamal, R. (2019). Fake News Detection using Bi-directional LSTM-Recurrent Neural Network. Procedia Computer Science, 165, 74-82. https://doi.org/10.1016/j.procs.2020.01.072
12. Rana, S., Kanji, R., & Jain, S. (2024). Comprehensive Analysis of Oversampling Techniques for Addressing Class Imbalance Employing Machine Learning Models. Recent Advances in Computer Science and Communications.
13. https://www.semanticscholar.org/paper/cbc2b83a63befe1d8631828d3fa7365c087579f5
14. Elsobky, Keshk, A., & Malhat, M. (2023). A Comparative Study for Different Resampling Techniques for Imbalanced Datasets. International Journal of Computers and Information (IJCI). https://www.semanticscholar.org/paper/d046d080e8caf355e5af48ae6dd6bdb14ec4cec3
15. Jha, G. (2024). Popular Machine Learning Models Prone to Overfitting and Why It ... Medium. https://medium.com/@post.gourang/popular-machine-learning-models-prone-to-overfitting-and-why-it-happens-8050e9c3a944
16. Fang, C. (2020). The Overfitting Iceberg. Machine Learning Blog, ML@CMU. https://blog.ml.cmu.edu/2020/08/31/4-overfitting/
17. Darshan, M. (2022). How do Kernel Regularizers work with neural networks? Analytics India Magazine. https://analyticsindiamag.com/ai-trends/kernel-regularizers-with-neural-networks/
18. Olamendy, J. C. (2024). Practical ML: Addressing Class Imbalance. Medium. https://medium.com/@juanc.olamendy/practical-ml-addressing-class-imbalance-25c4f1b97ee3
19. Singh, H. (2024). 10 Techniques to Solve Imbalanced Classes in Machine Learning. Analytics Vidhya. https://www.analyticsvidhya.com/articles/class-imbalance-in-machine-learning/
20. Lobel, J. W. (2015). Philippine and North Bornean languages: Issues in description, subgrouping, and reconstruction (Doctoral dissertation, University of Hawai'i at Manoa). http://www.ling.hawaii.edu/graduate/Dissertations/JasonLobelFinal.pdf
21. Glossika Content Team. (2023). Tagalog Language Overview: A Bigger Picture For Beginners. https://ai.glossika.com/blog/tagalog-language-overview
Similar Articles
- Evaluating the Impacts of Mind Mapping Strategy on Developing EFL Students’ Critical Reading Skills
- Significance of Reading Instructions for Language Improvement in Children with Down Syndrome
- Prenasalised Consonants in Liangmai
- Metadiscourse Matters: Definitions, Models, and Advantages for ESL/ EFL Writing
- Blank Minds and Stuck Voices: Understanding and Addressing Cognitive Anxiety in High-Stakes ESL Speaking Tests