A Machine Learning Model for Analysis and Prediction of Football Match Outcomes in the English Premier League
Authors
Department of Computer Science, Federal University of Technology, Akure, Ondo State (Nigeria)
Department of Computer Science, Federal University of Technology, Akure, Ondo State (Nigeria)
Article Information
DOI: 10.51584/IJRIAS.2026.11010020
Subject Category: Computer Science
Volume/Issue: 11/1 | Page No: 244-252
Publication Timeline
Submitted: 2025-12-27
Accepted: 2026-01-03
Published: 2026-01-24
Abstract
Football stands as the world's most popular sport, captivating billions globally. The English Premier League, in particular, is widely regarded as the pinnacle of professional football, with an immense global audience. Its dynamic and unpredictable nature fuels a large industry built around match analysis, driven by the desire to anticipate match outcomes. Early approaches to football match prediction often relied on static historical data, assumed independence among events, adapted slowly to football's rapid evolution, and could not capture complex nonlinear interactions among multiple features. This study develops a machine learning model that analyses English Premier League matches and predicts their outcomes, addressing these gaps with ensemble machine learning algorithms that provide accurate, real-time analysis. The study used Random Forest (RF), XGBoost, and LightGBM. Evaluated with standard classification metrics (Accuracy, Precision, Recall, F1-Score, and ROC-AUC), Random Forest achieved the best overall performance, with an accuracy of 87.14% and an ROC-AUC of 99.00%. An ensemble combining the three models further improved prediction consistency. The study demonstrates the effectiveness of machine learning for match prediction and, from an industry perspective, offers practical recommendations for football stakeholders to improve retention, efficiency, and competitiveness.
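The abstract names the evaluation metrics and an ensemble of Random Forest, XGBoost and LightGBM, but does not say how outcomes are encoded or how the three models are combined. The sketch below is a minimal illustration under stated assumptions: three hypothetical outcome labels (home win, draw, away win), soft voting (averaging each model's class probabilities) as the combination rule, and macro-averaged metrics. The probability values are made-up placeholders standing in for trained-model outputs, not results from the study.

```python
# Minimal sketch: soft-voting ensemble plus the abstract's evaluation metrics.
# ASSUMPTIONS (not stated in the paper): outcomes are labelled H/D/A, each base
# model outputs one probability per class, the ensemble averages those
# probabilities (soft voting), and metrics are macro-averaged over classes.
from typing import Dict, List, Sequence

CLASSES = ["H", "D", "A"]  # hypothetical labels: home win, draw, away win
Probs = List[List[float]]  # one row of class probabilities per match


def soft_vote(model_probs: Sequence[Probs]) -> Probs:
    """Average class probabilities across models, match by match."""
    n_models = len(model_probs)
    n_matches = len(model_probs[0])
    return [
        [sum(m[i][c] for m in model_probs) / n_models for c in range(len(CLASSES))]
        for i in range(n_matches)
    ]


def predict(probs: Probs) -> List[str]:
    """Choose the most probable outcome for each match (first class wins ties)."""
    return [CLASSES[max(range(len(row)), key=row.__getitem__)] for row in probs]


def classification_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    prec_sum = rec_sum = f1_sum = 0.0
    for c in CLASSES:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        prec_sum += prec
        rec_sum += rec
        f1_sum += f1
    k = len(CLASSES)
    return {"accuracy": accuracy, "precision": prec_sum / k,
            "recall": rec_sum / k, "f1": f1_sum / k}


def binary_auc(scores: Sequence[float], is_pos: Sequence[bool]) -> float:
    """ROC-AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, is_pos) if y]
    neg = [s for s, y in zip(scores, is_pos) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))  # needs at least one positive and one negative


def macro_ovr_auc(y_true: Sequence[str], probs: Probs) -> float:
    """One-vs-rest ROC-AUC for each class, averaged without weighting."""
    aucs = [binary_auc([row[j] for row in probs], [t == c for t in y_true])
            for j, c in enumerate(CLASSES)]
    return sum(aucs) / len(aucs)


# Made-up probabilities standing in for the three trained models' outputs.
rf_probs = [[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6], [0.5, 0.2, 0.3]]
xgb_probs = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.3, 0.3, 0.4]]
lgbm_probs = [[0.5, 0.3, 0.2], [0.4, 0.3, 0.3], [0.2, 0.3, 0.5], [0.4, 0.3, 0.3]]
y_true = ["H", "D", "A", "A"]

ensemble = soft_vote([rf_probs, xgb_probs, lgbm_probs])
y_pred = predict(ensemble)
print(y_pred, classification_metrics(y_true, y_pred), macro_ovr_auc(y_true, ensemble))
```

With scikit-learn, equivalent pieces exist as `VotingClassifier(voting="soft")`, `precision_recall_fscore_support(..., average="macro")` and `roc_auc_score(..., multi_class="ovr")`; XGBoost's `XGBClassifier` and LightGBM's `LGBMClassifier` follow the scikit-learn estimator interface, so they can be passed to the voting wrapper. The toy data also shows why accuracy and ROC-AUC can diverge: AUC scores only how well the probabilities rank matches, while accuracy scores the single most probable label.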
Keywords
Football, Machine learning, Dataset, Random Forest, XGBoost, LightGBM
Similar Articles
- What the Desert Fathers Teach Data Scientists: Ancient Ascetic Principles for Ethical Machine-Learning Practice
- Comparative Analysis of Some Machine Learning Algorithms for the Classification of Ransomware
- Comparative Performance Analysis of Some Priority Queue Variants in Dijkstra’s Algorithm
- Transfer Learning in Detecting E-Assessment Malpractice from a Proctored Video Recordings.
- Dual-Modal Detection of Parkinson’s Disease: A Clinical Framework and Deep Learning Approach Using NeuroParkNet