Machine Learning for Personalized Education: Optimizing Learning Outcomes among Diverse Online Learner Populations in Ghana Using Reinforcement Learning

Anthony Vincent Arkhurst
Joseph Kojo Asampanbilla
Isaac M. Ametemeh
5482-5489
Aug 12, 2025
Education


Faculty of Computer Science, Ghana Communication Technology University, Ghana

DOI: https://dx.doi.org/10.47772/IJRISS.2025.903SEDU0399

Received: 04 July 2025; Accepted: 12 July 2025; Published: 12 August 2025

ABSTRACT

The rapid growth of online learning platforms in Ghana offers increased educational access but faces serious challenges in addressing the diverse needs of learners (Agbe et al., 2022). Traditional models often fail to accommodate individual learning styles, digital literacy levels, and infrastructure constraints, which leads to suboptimal outcomes. Machine learning (ML), particularly reinforcement learning (RL), presents a promising approach to personalizing education by tailoring content to individual learner profiles.

The aim of this study is to develop an RL-based framework to optimize learning outcomes for diverse online learner populations in Ghana. The framework seeks to address challenges such as varying digital literacy, limited access to technology, and cultural differences by dynamically adapting learning paths using ML techniques (Asabere & Mends-Brew, 2021).

The methodology employed a qualitative approach, utilizing the Open University Learning Analytics Dataset (OULAD) with pre-processing to account for Ghanaian learner demographics. The RL framework used a Markov Decision Process (MDP) formulation with Q-learning and Deep Q-Networks (DQN) to model learner states, actions, and rewards (Baker & Inventado, 2014). Features were engineered to reflect local contexts, such as connectivity fluctuations and mobile learning prevalence.

The RL model successfully converged on personalized learning paths for three learner profiles: Visual Learners, Textual Learners, and Assessment-Oriented Learners. The Q-learning algorithm optimized content delivery (video, text, quizzes) across five learning levels, achieving higher cumulative rewards for tailored content strategies, with Visual Learners preferring videos, Textual Learners favouring text, and Assessment-Oriented Learners excelling with quizzes.

Keywords: Machine Learning, Reinforcement Learning, Personalized Education, Online Learning, Ghana

INTRODUCTION

The growth of online learning platforms globally presents both opportunities and challenges for educational systems. While these platforms offer increased accessibility and flexibility, ensuring effective learning outcomes for diverse student populations in digital environments remains a critical concern (Ally, 2019). In Ghana, the increased adoption of online learning has necessitated a closer examination of approaches that can cater to the varied needs and backgrounds of its learners. Traditional one-size-fits-all online education models often fail to address the unique learning styles, paces, and prior knowledge of individual students, potentially leading to disengagement and suboptimal learning outcomes, particularly among diverse populations (Mensah et al., 2010).

Personalized education, an approach that tailors learning experiences to individual student needs, has emerged as a promising way to enhance learning effectiveness. By adapting content, delivery methods, and assessment strategies, personalized learning aims to optimize engagement and achievement (Hwang et al., 2012). The advent of machine learning (ML) offers powerful tools to deliver this personalization at scale. ML algorithms can analyze large volumes of student data to identify patterns, predict learning behaviours, and recommend tailored interventions, with the potential to transform online education (Baker, 2016).

Problem Statement

Machine learning methods can be used to provide personalized learning experiences and improve student engagement in online education (Rajagopal et al., 2023). Although ML has clear potential for personalizing online education, its application to diverse learner populations in Ghana remains largely underexplored.

Online learners in Ghana face several challenges, including varying levels of digital literacy, limited access to technology, language barriers, differing cultural backgrounds, and uneven prior educational experiences. These factors matter when selecting an approach to ML personalization. Current research often lacks specific insight into how ML algorithms can be adapted and implemented to address these challenges and optimize learning outcomes in this context.

Furthermore, there is limited understanding of which ML-driven personalized interventions are most effective for different subgroups of online learners in Ghana. Questions remain about how to leverage student data ethically and effectively to inform these interventions while respecting privacy and cultural sensitivities (Slade, 2016). Without focused research in these areas, the potential of ML to transform online education and improve learning outcomes for all students in Ghana risks going unrealized. This research seeks to address this gap by investigating the application of a machine learning model for personalized education, with the specific aim of optimizing learning outcomes among diverse online learner populations in Ghana.

LITERATURE REVIEW

This literature review examines the current state of research on machine learning applications in personalized education, with a specific focus on reinforcement learning approaches and their potential application (Gu et al., 2024) within Ghana’s diverse online learning ecosystem. In developing countries like Ghana, where educational resources are often limited and learner populations are highly diverse, the potential for AI-driven personalized education systems to optimize learning outcomes is particularly significant.

Global Perspectives on AI in Education

The integration of artificial intelligence in education has gained significant momentum globally, with research demonstrating positive impacts on student engagement, performance, and motivation. AI can improve students’ performance, engagement, and motivation; at the same time, some challenges like bias and discrimination should be noted (Daradoumis et al., 2024). This comprehensive analysis of 85 studies reveals the broad potential of AI in e-learning while highlighting critical considerations for implementation.

Current AI and Machine Learning Initiatives

Ghana has demonstrated increasing interest in artificial intelligence and machine learning applications in education. Programs like the MTN Ghana Foundation’s coding and AI workshops are teaching young learners how to interact with AI and build the skills needed for tomorrow’s jobs (Knowledge Innovations, 2024). These initiatives represent foundational steps toward building the technical capacity necessary for implementing advanced personalized learning systems.

Educational institutions in Ghana have begun exploring AI applications, with research examining how AI-personalized learning systems affect academic performance across different age groups at Kumasi Technical University (Owusu et al., 2024). This research represents an important contribution to understanding the effectiveness of AI-powered personalized learning in Ghanaian higher education contexts.

Limited Research on Ghana’s Educational Contexts

Although worldwide research on AI in education is extensive, relatively little of it specifically addresses African educational contexts. Many existing studies focus on developed countries whose technological infrastructures, cultural contexts, and educational challenges differ from those found in Ghana and similar African nations.

Current reinforcement learning research in education primarily focuses on resource-rich environments with reliable internet connectivity and sophisticated technological infrastructure. There is limited research on how RL systems can be adapted for resource-constrained environments like those common in many parts of Ghana.

Finally, the development of reinforcement learning systems that can effectively personalize education across multiple languages and cultural contexts remains underexplored. Ghana’s linguistic diversity provides an excellent test bed for developing and evaluating such systems.

METHODOLOGY

To address the challenge of delivering effective and engaging online learning experiences for the diverse student population in Ghana, we propose a machine learning framework that primarily leverages the ability of reinforcement learning algorithms to learn and adapt to the skill set of each student, in contrast to traditional one-size-fits-all instruction.

The model will be trained on a rich dataset of learner characteristics (e.g., background, prior performance), engagement metrics (e.g., time spent on activities, interaction frequency), and learning outcomes (e.g., assessment scores, completion status). By learning the complex relationships between these features and student success, the reinforcement learning model can predict how a learner might perform on different learning resources or along various learning trajectories. This predictive power is crucial for proactively tailoring the learning path to maximize the likelihood of positive outcomes and for providing timely interventions when needed (Baker, 2016).

Data Acquisition

To enhance the personalization process, the proposed framework also incorporates a Markov Decision Process (MDP) based on the Bellman equation to recommend relevant learning resources. Data for simulation and training came from the Open University Learning Analytics Dataset (OULAD), retrieved from kaggle.com.

A crucial aspect of pre-processing for the Ghanaian context is adjusting for sampling bias in the original OULAD dataset, as its demographic and behavioural distributions may differ significantly from those of Ghanaian learners (Asabere et al., 2022). This requires careful weighting and calibration of features based on known distributions in the target population.
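The weighting step described above can be sketched as post-stratification, where each record is weighted by the ratio of the target-population share to the sample share of its stratum. This is an illustrative sketch only: the `age_band` column and its categories follow OULAD's studentInfo.csv, but the sample rows and the target shares are hypothetical placeholders, not figures from this study or from Ghanaian statistics.

```python
import pandas as pd

# Hypothetical sample standing in for a slice of OULAD's studentInfo.csv
sample = pd.DataFrame({
    "id_student": [1, 2, 3, 4, 5, 6, 7, 8],
    "age_band": ["0-35"] * 4 + ["35-55"] * 3 + ["55<="],
})

# Hypothetical target-population shares (placeholders; must sum to 1)
target_share = {"0-35": 0.70, "35-55": 0.25, "55<=": 0.05}


def poststratification_weights(df, column, target):
    """Weight each row by (target share / sample share) of its stratum."""
    sample_share = df[column].value_counts(normalize=True)
    ratio = pd.Series(target) / sample_share
    return df[column].map(ratio)


sample["weight"] = poststratification_weights(sample, "age_band", target_share)

# After weighting, the stratum shares match the target distribution
weighted_share = (
    sample.groupby("age_band")["weight"].sum() / sample["weight"].sum()
)
print(weighted_share.round(2).to_dict())
```

With real data, the target shares would come from a documented source on the Ghanaian learner population, and more than one stratification variable may be needed.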

Effective pre-processing ensures the data is suitable for machine learning model training while maintaining its contextual relevance to Ghanaian learners (Romero and Ventura, 2020).

Data Cleaning and Splitting

Missing data is a common challenge in educational datasets, especially in online learning environments where student engagement patterns vary significantly, largely because of differences in digital literacy. Our approach to handling missing values employed multiple imputation strategies, each tailored to the nature of the variable concerned.

Moreover, the interquartile range (IQR) method was employed as the primary outlier detection technique because of its robustness and interpretability. The IQR was used to define the outlier boundaries, and observations falling outside those boundaries were flagged.

During the cleaning process, records with identical student IDs and timestamps were identified and cross-referenced with system logs to distinguish legitimate repeated actions from data entry errors. An audit trail of the removed duplicates was maintained for quality assurance.
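The text does not say which imputation strategy was applied to which variable. As one possible reading, the sketch below assumes median imputation for numeric columns and most-frequent-value imputation for categorical ones. The column names follow OULAD's studentInfo.csv, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows with gaps in one categorical and one numeric column
df = pd.DataFrame({
    "imd_band": ["10-20%", None, "10-20%", "30-40%"],
    "studied_credits": [60.0, 120.0, None, 60.0],
})


def impute(frame):
    """Fill numeric gaps with the median and other gaps with the mode."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


clean = impute(df)
print(clean)
```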
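A minimal sketch of IQR-based flagging follows. The multiplier `k=1.5` is the conventional choice and is an assumption here, since the text does not state the value used; the click counts are made up for illustration.

```python
import numpy as np


def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical daily click counts for one learner
clicks = [12, 15, 14, 13, 16, 15, 14, 250]
mask = iqr_outlier_mask(clicks)
print(mask)  # only the last observation (250) is flagged
```

Flagged observations are marked rather than dropped, which matches the flagging described above and leaves the decision about removal to a later step.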
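The duplicate check can be sketched with pandas as below. The event table, its column names, and the `reason` field are hypothetical, and the cross-referencing against system logs is not modelled: the sketch only separates repeated (ID, timestamp) pairs into an audit table.

```python
import pandas as pd

# Hypothetical interaction log; the first two rows share an ID and timestamp
events = pd.DataFrame({
    "id_student": [101, 101, 102, 102, 103],
    "timestamp": [
        "2025-01-05 10:00", "2025-01-05 10:00",
        "2025-01-05 10:05", "2025-01-05 10:06",
        "2025-01-05 10:07",
    ],
    "activity": ["quiz", "quiz", "video", "video", "text"],
})

key = ["id_student", "timestamp"]

# Mark every repeat of a (student, timestamp) pair after its first occurrence
is_repeat = events.duplicated(subset=key, keep="first")

# Keep the removed rows, with a reason, as an audit trail
audit_trail = events[is_repeat].assign(reason="duplicate id_student + timestamp")
deduplicated = events[~is_repeat].reset_index(drop=True)

print(len(deduplicated), "rows kept;", len(audit_trail), "row(s) logged")
```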

The training set comprised 60% of the data and was used for model fitting and parameter learning. As the largest portion of the dataset, it was selected to ensure comprehensive representation of the diverse online learner population in Ghana. The objective was to learn optimal policy parameters for the reinforcement learning algorithms and to establish baseline performance metrics across different learner segments. The training set was also used to train the neural network architecture for student state representation and to calibrate the reward function for personalized learning pathways.

The validation set comprised 20% of the data and was used for hyperparameter tuning and model selection, serving as an unbiased evaluation mechanism during the model development phase.
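A split along these lines can be sketched with a shuffled index. The text specifies the 60% training and 20% validation portions; treating the remaining 20% as a held-out test set is an assumption, as are the function name and the random seed.

```python
import numpy as np


def split_indices(n_rows, train=0.6, val=0.2, seed=42):
    """Shuffle row indices and cut them into train / validation / remainder."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_train = int(round(train * n_rows))
    n_val = int(round(val * n_rows))
    return (
        order[:n_train],
        order[n_train:n_train + n_val],
        order[n_train + n_val:],  # assumed held-out test portion
    )


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

If learner segments are unevenly sized, a stratified split per segment would preserve their proportions in each portion.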

Data Loading and Preparation

First, the code below imports the necessary libraries:

import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt

Next, three datasets are loaded from CSV files into pandas DataFrames: studentInfo.csv, courses.csv, and assessments.csv. These files must be in the specified directory; otherwise, update the paths accordingly.

student_info_df = pd.read_csv("/content/data_set/studentInfo.csv")
courses_df = pd.read_csv("/content/data_set/courses.csv")
assessments_df = pd.read_csv("/content/data_set/assessments.csv")

Q-Learning

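Q-learning is an off-policy algorithm that directly approximates the optimal action-value function:

Q(s, a) ← Q(s, a) + α [R + γ max_a′ Q(s′, a′) − Q(s, a)]

where α is the learning rate, γ is the discount factor, R is the immediate reward, and s′ is the state reached after taking action a in state s.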
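As a worked example of a single update, the sketch below applies the rule to made-up numbers using the α and γ values reported in this paper (0.1 and 0.9). It also checks that the weighted-average form used in the training loop, (1 − α)·Q + α·(R + γ·max Q′), is algebraically the same update. The function name and the sample inputs are illustrative.

```python
def q_update(q_sa, reward, max_q_next, alpha=0.1, gamma=0.9):
    """Apply one temporal-difference update to a single Q(s, a) value."""
    td_target = reward + gamma * max_q_next
    return q_sa + alpha * (td_target - q_sa)


# Made-up inputs: Q(s, a) = 0.5, R = 1.0, max over a' of Q(s', a') = 2.0
new_q = q_update(0.5, 1.0, 2.0)

# The same update written as a weighted average of old value and target
blended = (1 - 0.1) * 0.5 + 0.1 * (1.0 + 0.9 * 2.0)

print(round(new_q, 4), round(blended, 4))
```

Here the target is 1.0 + 0.9 × 2.0 = 2.8, so the value moves one tenth of the way from 0.5 toward 2.8, giving 0.73.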

Q-Learning Training Loop

This section implements the core Q-learning algorithm. It iterates through each learner profile and runs a simulated learning process for a set number of episodes.
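The training loop in this section uses several names whose definitions are not shown in the paper: `student_profiles`, `n_states`, `n_actions`, `q_tables`, and `reward_logs`. A minimal setup sketch follows, assuming three content types, five learning levels, and zero-initialized Q-tables. The preference scores and the order of the content types are hypothetical placeholders; the values used in the study are not reported.

```python
import numpy as np

# Assumed mapping from action index to content type
CONTENT_TYPES = ["video", "text", "quiz"]
n_actions = len(CONTENT_TYPES)
n_states = 5  # learning levels 0..4

# Hypothetical preference score per content type for each learner profile
student_profiles = {
    "Visual Learner": [0.90, 0.65, 0.70],
    "Textual Learner": [0.65, 0.90, 0.70],
    "Assessment-Oriented": [0.65, 0.70, 0.90],
}

# One Q-table (levels x content types) and one reward log per profile
q_tables = {name: np.zeros((n_states, n_actions)) for name in student_profiles}
reward_logs = {name: [] for name in student_profiles}

print({name: table.shape for name, table in q_tables.items()})
```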

for profile, prefs in student_profiles.items():
    # Note: epsilon is not reset here, so it carries over between profiles
    for ep in range(episodes):
        state = 0
        total_reward = 0

        while state < n_states - 1:
            # Epsilon-greedy action selection
            if random.uniform(0, 1) < epsilon:
                action = random.randint(0, n_actions - 1)
            else:
                action = np.argmax(q_tables[profile][state])

            # Reward scales with the profile's preference and the current level
            reward = prefs[action] * (1 + 0.1 * state)
            # Advance one level only if the reward reaches the 0.6 threshold
            next_state = state + 1 if reward >= 0.6 else state

            # Q-value update
            q_tables[profile][state][action] = \
                (1 - alpha) * q_tables[profile][state][action] + \
                alpha * (reward + gamma * np.max(q_tables[profile][next_state]))

            state = next_state
            total_reward += reward

        # Decay the exploration rate after each episode
        epsilon = max(min_epsilon, epsilon * np.exp(-decay * ep))
        reward_logs[profile].append(total_reward)

Inside the loops:

The learning process starts at state = 0.
In each step, the agent (representing a learner of a specific profile) chooses an action (content type) based on an epsilon-greedy strategy: with probability epsilon, it explores a random action; otherwise, it exploits the action with the highest Q-value in the current state.
A reward is calculated based on the chosen action and the current state, reflecting the learner’s preference for the content and the progress made.
The next_state is determined: if the reward is above a threshold (0.6), the learner progresses to the next level; otherwise, they stay at the current level.
The Q-value for the current state and chosen action is updated using the Q-learning formula, which incorporates the immediate reward and the maximum future reward from the next state.
The state is updated, and the total_reward for the episode is accumulated.
After each episode, epsilon is decayed to reduce exploration over time, encouraging the agent to exploit learned knowledge. The total reward for the episode is stored in reward_logs.
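The reward and transition rules described in the steps above can be isolated in a small function. The rule itself (preference scaled by 1 + 0.1 × level, with a 0.6 threshold for progression) comes from the training loop; the function name and the preference vector are hypothetical and chosen only to show both branches.

```python
def step(prefs, state, action, threshold=0.6):
    """Return (reward, next_state) for one simulated learner interaction."""
    reward = prefs[action] * (1 + 0.1 * state)
    next_state = state + 1 if reward >= threshold else state
    return reward, next_state


# Hypothetical preference scores for video, text, quiz
prefs = [0.9, 0.4, 0.5]

# Preferred content at level 0: reward clears the threshold, learner advances
reward_a, next_a = step(prefs, state=0, action=0)

# Less-preferred content at level 0: reward falls short, learner stays
reward_b, next_b = step(prefs, state=0, action=1)

# At level 3 the level multiplier (1.3) lifts the quiz reward above 0.6
reward_c, next_c = step(prefs, state=3, action=2)

print(next_a, next_b, next_c)
```

Because the multiplier grows with the level, a content type that does not advance the learner at level 0 can do so at a higher level.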

Q-Learning Parameters and Initialization

The code defines parameters for the Q-learning algorithm: alpha (learning rate), gamma (discount factor), epsilon (exploration rate), min_epsilon (minimum exploration rate), decay (epsilon decay rate), and episodes (number of learning iterations).

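alpha = 0.1          # learning rate
gamma = 0.9          # discount factor
epsilon = 1.0        # initial exploration rate
min_epsilon = 0.01   # minimum exploration rate
decay = 0.005        # epsilon decay rate
episodes = 300       # number of learning episodes per profile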
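To see how the exploration rate evolves under these values, the sketch below replays the decay rule from the training loop on its own. Because that rule multiplies the current epsilon by exp(−decay × ep) each episode, the decay compounds from one episode to the next. The variable names `schedule` and `first_floor_episode` are introduced here for illustration.

```python
import numpy as np

epsilon, min_epsilon, decay, episodes = 1.0, 0.01, 0.005, 300

schedule = []
for ep in range(episodes):
    schedule.append(epsilon)  # exploration rate in effect during episode `ep`
    epsilon = max(min_epsilon, epsilon * np.exp(-decay * ep))

# First episode that runs at the minimum exploration rate
first_floor_episode = next(
    i for i, value in enumerate(schedule) if value == min_epsilon
)
print(first_floor_episode, schedule[-1])
```

Under this rule the floor of 0.01 is reached well before the 300 episodes are over, whereas a non-compounding schedule of exp(−decay × ep) would still be about 0.22 at the final episode. Since the loop does not reset epsilon between profiles, later profiles would also begin at the floor.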

Q-Learning Outcomes

The Q-learning algorithm converged toward distinct policies for each learner profile:

Visual Learners consistently preferred video content at lower levels, shifting slightly toward quizzes at higher levels.
Textual Learners showed the strongest response to text-based content across all learning levels.
Assessment-Oriented Learners demonstrated high preference for quizzes, achieving faster transitions to higher learning states.

The figure below shows the learning outcome (total reward) accumulated over 300 episodes for each profile:

Figure 1: Cumulative Rewards over Episodes

Learned Policies

The following table summarizes the best content delivery strategy learned by the RL model for each learner profile at different learning levels:

Table 1: Learned content delivery policy for each learner profile and learning level

Learner Type	Level 0	Level 1	Level 2	Level 3	Level 4
Visual Learner	Video	Video	Video	Quiz	Quiz
Textual Learner	Text	Text	Text	Text	Text
Assessment-Oriented	Quiz	Quiz	Quiz	Quiz	Quiz

These learned policies indicate that adaptive learning systems using reinforcement learning can effectively personalize content pathways that match learner preferences, potentially improving engagement and learning outcomes.
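A policy table of this kind can be read off a Q-table by taking the highest-valued action at each level. The sketch below shows the extraction step. The Q-values are hypothetical numbers chosen for illustration, not values from the study, and the order of the content types is assumed.

```python
import numpy as np

# Assumed mapping from action index to content type
CONTENT_TYPES = ["video", "text", "quiz"]


def greedy_policy(q_table):
    """Return the highest-valued content type at each learning level."""
    return [CONTENT_TYPES[a] for a in np.argmax(q_table, axis=1)]


# Hypothetical Q-table: rows are levels 0..4, columns are video, text, quiz
q_example = np.array([
    [2.1, 0.8, 1.0],
    [1.9, 0.7, 1.1],
    [1.6, 0.6, 1.2],
    [0.9, 0.5, 1.3],
    [0.4, 0.3, 0.8],
])

print(greedy_policy(q_example))
# → ['video', 'video', 'video', 'quiz', 'quiz']
```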

RESULTS

The reinforcement learning model was implemented to simulate the learning behavior of a diverse online student population based on profiles derived from the Open University Learning Analytics Dataset (OULAD). Student profiles were assigned based on demographic and academic variables from the studentInfo.csv, courses.csv, and assessments.csv datasets.

Three learner profiles were defined:

Visual Learners (prefer video-based content)
Textual Learners (prefer reading materials)
Assessment-Oriented Learners (prefer quizzes and testing)

A Q-learning algorithm was used to optimize the delivery of learning content types (video, text, and quizzes) across five learning levels. The model aimed to find an optimal learning path that maximizes cumulative reward, which in this context reflects knowledge gain or learning satisfaction.

RECOMMENDATIONS

A comprehensive personalized learning system could integrate these approaches. For instance, supervised learning models could predict the immediate outcome of a learner engaging with a specific resource, while a reinforcement learning agent uses these predictions as part of its reward function to guide the long-term path generation. Collaborative or content-based filtering could inform the set of potential actions available to the RL agent or provide initial recommendations before sufficient learner data is available for purely model-based approaches.
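One way to picture this combination is a reward that blends the observed outcome with a supervised model's predicted probability of success. The sketch below is purely illustrative: `predicted_success` is a stand-in logistic function with made-up weights, not a trained model, and the feature names and the `mix` parameter are hypothetical.

```python
import math


def predicted_success(engagement, prior_score, weights=(1.5, 2.0), bias=-1.5):
    """Stand-in for a trained supervised model: a logistic score in (0, 1).
    The weights and bias are placeholders, not fitted values."""
    z = weights[0] * engagement + weights[1] * prior_score + bias
    return 1.0 / (1.0 + math.exp(-z))


def shaped_reward(observed_reward, engagement, prior_score, mix=0.5):
    """Blend the observed reward with the model's predicted success."""
    prediction = predicted_success(engagement, prior_score)
    return (1 - mix) * observed_reward + mix * prediction


# A more engaged learner with a stronger record earns a higher shaped reward
low = shaped_reward(0.2, engagement=0.1, prior_score=0.1)
high = shaped_reward(0.2, engagement=0.9, prior_score=0.9)
print(round(low, 3), round(high, 3))
```

The `mix` parameter controls how much the agent trusts the prediction relative to the observed signal; setting it to 0 recovers the plain reward.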

This framing provides a structured approach to developing a machine learning model for personalized education, allowing for the prediction of learning outcomes and the dynamic generation of adaptive learning paths tailored to the diverse needs of online learners in Ghana. The specific choice of supervised learning algorithms, RL techniques, state and action spaces, and reward functions will be critical design decisions guided by the characteristics of the learning environment and the available data.

Interpretation and Implications

These findings demonstrate the utility of reinforcement learning in online education platforms. By modeling student preferences and learning progression as a Markov Decision Process, educational content can be dynamically optimized to suit individual learner needs. This is particularly relevant for Ghana’s online learning landscape, where learner diversity (age, education level, digital literacy) requires personalized support mechanisms.

LMS Integration as a Recommendation Module

The personalized learning system can be integrated into existing Learning Management Systems (LMS) as a recommendation module through several approaches:

API-Based Integration:

RESTful API endpoints allowing the ML system to receive learner data and return recommendations
Webhook implementation for real-time event processing
OAuth authentication to maintain data security across systems
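The request and response exchange behind such an endpoint might look like the sketch below, written as a plain function so that it is independent of any web framework. The JSON field names, the profile key, and the Q-values are all hypothetical; authentication and webhook handling are left out.

```python
import json

# Assumed mapping from action index to content type
CONTENT_TYPES = ["video", "text", "quiz"]

# Hypothetical learned Q-values: profile -> level -> score per content type
Q_TABLES = {
    "visual": [[2.1, 0.8, 1.0], [1.9, 0.7, 1.1]],
}


def handle_recommendation(body):
    """Parse a JSON request from the LMS and return a JSON recommendation."""
    request = json.loads(body)
    profile, level = request.get("profile"), request.get("level")
    table = Q_TABLES.get(profile)
    if table is None or not isinstance(level, int) or not 0 <= level < len(table):
        return json.dumps({"error": "unknown profile or level"})
    scores = table[level]
    best = max(range(len(scores)), key=scores.__getitem__)
    return json.dumps({
        "learner_id": request.get("learner_id"),
        "recommended_content": CONTENT_TYPES[best],
    })


print(handle_recommendation('{"learner_id": "L001", "profile": "visual", "level": 0}'))
```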

Plugin Architecture:

Modular design enabling deployment across different LMS platforms common in Ghana
Configurable components that adapt to varying institutional needs
Lightweight implementation options for resource-constrained environments

Data Synchronization:

Offline-first architecture with synchronization capabilities
Batch processing options for limited-connectivity environments
Data compression techniques to minimize bandwidth requirements

This integration approach acknowledges the heterogeneous technology landscape in Ghanaian educational institutions, where various LMS solutions may be deployed with different levels of technical infrastructure (Antwi-Boampong, 2020).
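For the batch and compression options listed under Data Synchronization, a minimal sketch using only the Python standard library is shown below. Events queued while offline are serialized to compact JSON and gzip-compressed into one payload for upload when connectivity returns. The event fields are hypothetical.

```python
import gzip
import json


def pack_events(events):
    """Serialize a batch of queued events to compact JSON and compress it."""
    payload = json.dumps(events, separators=(",", ":")).encode("utf-8")
    return gzip.compress(payload)


def unpack_events(blob):
    """Reverse pack_events on the receiving side."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Hypothetical events recorded while the device was offline
queued = [
    {"learner_id": "L001", "level": 2, "content": "video", "reward": 1.08},
    {"learner_id": "L001", "level": 3, "content": "quiz", "reward": 0.65},
] * 25

blob = pack_events(queued)
raw_size = len(json.dumps(queued).encode("utf-8"))
print(raw_size, "bytes raw ->", len(blob), "bytes compressed")
```

Interaction logs tend to be repetitive, so batching before compression generally saves more bandwidth than compressing events one at a time.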

REFERENCES

Manikanda Rajagopal, BaigMuntajeeb Ali, S. Sharon Priya, W. Banu, Madhavi G. M, Punamkumar, 2023 Eighth International Conference on Science Technology Engineering and Mathematics. https://doi.org/10.1109/ICONSTEM56934.2023.10142626
Agbe, E., Sefa-Nyarko, C. and Kabutey, P. (2022) ‘Digital divide and online learning experiences during COVID-19 in Ghana’, International Journal of Educational Research Open, 3(2), pp. 100-112.
Sharon Slade, Applications of Student Data in Higher Education: Issues and Ethical Considerations, September 6, 2016 https://doi.org/10.18665/283891
Feng Gu, Peng Wang, Huiyuan Jiao, Xuxi Li, Lixian Wang, Comparison of the Application of Machine Learning Technology in Online Teaching, Published in 7th International Conference on Education, Network and Information Technology (ICENIT) https://doi.org/10.1109/icenit61951.2024.00027
Antwi-Boampong, A. (2020) ‘Towards a faculty blended learning adoption model for higher education in developing countries’, Education and Information Technologies, 25(3), pp. 2129-2152.
Asabere, N.Y. and Mends-Brew, E. (2021) ‘Distance Learning and E-Learning Service Adoption in Ghanaian Higher Education during COVID-19’, International Journal of Technology in Education, 4(2), pp. 256-282.
Baker, R.S. and Inventado, P.S. (2014) ‘Educational data mining and learning analytics’, in Learning Analytics, Springer, New York, NY, pp. 61-75.
Doleck, T., Lemay, D.J., Basnet, R.B. and Bazelais, P. (2020) ‘Predictive analytics in education: A comparison of deep learning frameworks’, Education and Information Technologies, 25(3), pp. 1951-1963.
Khosravi, H., Shum, S.B., Chen, G., Conati, C., Tsai, Y.S., Kay, J., Knight, S., Martinez-Maldonado, R., Sadiq, S. and Gašević, D. (2022) ‘Explainable artificial intelligence in education’, Computers and Education: Artificial Intelligence, 3, p.100074.
Nye, B.D. (2015) ‘Intelligent tutoring systems by and for the developing world: A review of trends and approaches for educational technology in a global context’, International Journal of Artificial Intelligence in Education, 25(2), pp. 177-203.
Ally, M. (2019). Foundations of educational theory for online learning (2nd ed.). Athabasca University Press.
Baker, R. S. J. d. (2016). Stupid tutoring systems, intelligent humans. International Journal of Artificial Intelligence in Education, 26(2-3), 600-614.
Hwang, G. J., Chen, N. S., & Li, L. Y. (2012). A study on personalized learning assistance based on ubiquitous learning environment. Educational Technology & Society, 15(1), 237-251.
Means, B., Toyama, Y., Murphy, R., Bakia, M., & Jones, K. (2010). Evaluation of evidence-based practices in online learning: A meta-analysis and review of online learning studies. U.S. Department of Education, Office of Planning, Evaluation, and Policy Development.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Graves, A. (2012). Supervised sequence labelling with recurrent neural networks. Springer.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
Linden, G., Smith, B., & York, J. (2003). Amazon. com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76-80.
Ricci, F., Rokach, L., & Shapira, B. (2011). Introduction to recommender systems handbook. In Recommender systems handbook (pp. 1-35). Springer.
Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. McGraw-Hill.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT press.