Predicting Student Retention and Dropout Rates in Cronasia Foundation College Inc. Using Educational Data Mining and Machine Learning Regression Techniques
Authors
Graduate School, University of the Immaculate Conception, Davao City (Philippines)
Article Information
DOI: 10.47772/IJRISS.2025.91100456
Subject Category: Computer Science
Volume/Issue: 9/11 | Page No: 5812-5824
Publication Timeline
Submitted: 2025-12-01
Accepted: 2025-12-09
Published: 2025-12-18
Abstract
This study investigated the potential of machine learning (ML) and educational data mining (EDM) for predicting retention and dropout rates at Cronasia Foundation College Inc. (CFCI). Predicting student persistence is foundational to improving retention efforts in higher education institutions, yet understanding what contributes to retention and dropout remains difficult. The study therefore built predictive models from historical academic records, levels of engagement, socio-economic level, and psychological components, using a dataset of 9,100 student records (75% training and 25% testing). Five machine learning classifiers (Decision Trees, Random Forest, Support Vector Machines, Neural Networks, and Logistic Regression) were evaluated using accuracy, precision, recall, and F1-score. The Neural Network was reported as the best-performing model, with an accuracy of 80.42%; it had a precision of 0.840, recall of 0.895, and F1-score of 0.867 for retention (Class 0), and a precision of 0.692, recall of 0.582, and F1-score of 0.632 for dropout (Class 1). Random Forest and Decision Tree had similar accuracy: Random Forest reached 79.90% with a dropout F1-score of 0.622, and Decision Tree reached 80.58%. Logistic Regression had the lowest accuracy, 73.98%, and poor recall for dropout. These findings can prompt academic leaders to intervene and take responsibility for retaining students before their exit point or after their first semester. The study found that retention was most strongly related to intrinsic academic performance, attendance, and scholarship status.
The results of the study can aid in data-based decision-making in higher education by helping institutions develop focused programs to increase retention and decrease dropout.
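As a quick consistency check on the Neural Network metrics reported above, the F1-score can be recomputed as the harmonic mean of precision and recall, and the split sizes can be derived from the stated 75%/25% partition of 9,100 records. The sketch below is illustrative only and is not the authors' code: the helper name `f1_score` and the dictionary layout are assumptions, and only the numeric values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Neural Network figures as reported in the abstract (layout is hypothetical).
reported = {
    "retention (Class 0)": {"precision": 0.840, "recall": 0.895, "f1": 0.867},
    "dropout (Class 1)": {"precision": 0.692, "recall": 0.582, "f1": 0.632},
}

# Recompute F1 per class, rounded to three decimals like the reported values.
recomputed = {
    label: round(f1_score(m["precision"], m["recall"]), 3)
    for label, m in reported.items()
}

# Split sizes implied by a 75%/25% partition of 9,100 records.
total_records = 9100
test_size = round(total_records * 0.25)
train_size = total_records - test_size

for label, value in recomputed.items():
    print(f"{label}: reported F1 = {reported[label]['f1']}, recomputed F1 = {value}")
print(f"train = {train_size}, test = {test_size}")
```

Under these assumptions, the recomputed F1-scores agree with the reported ones to three decimal places for both classes, and the partition corresponds to 6,825 training records and 2,275 test records.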
Keywords
Student Retention, Student Dropout, Educational Data Mining