Future Work
To extend the impact of this research, we propose the following directions:
Multi-Center Validation: Extending data collection to other counties and hospitals in Kenya will test
whether the model generalizes and remains robust across diverse populations.
Real-Time Deployment: The model will be integrated into a web-based or mobile platform to support
on-the-ground screening by health professionals and community health volunteers (CHVs).
Model Explainability: To improve transparency and clinical trust, future versions will incorporate
explainable AI techniques, such as SHAP (SHapley Additive exPlanations) and LIME (Local
Interpretable Model-agnostic Explanations), to show how specific features contribute to each
individual prediction.
Incorporation of Additional Biomarkers: As local diagnostic capacity improves, future models can
include biomarkers such as HbA1c, insulin resistance indices, and genomic data to further refine predictions.
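To make the Model Explainability item concrete, the following minimal sketch computes exact Shapley values for a single patient under a small logistic risk model. It illustrates the idea that SHAP builds on and is not the study's model or pipeline: the feature names, coefficients, intercept, and baseline patient are all hypothetical, chosen only for illustration. Features outside a coalition are replaced by baseline values, which is one common convention among several.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical coefficients for illustration only; not the study's fitted model.
WEIGHTS = {"age": 0.04, "bmi": 0.09, "systolic_bp": 0.02}
INTERCEPT = -7.0
# Hypothetical reference patient (for example, cohort averages).
BASELINE = {"age": 40.0, "bmi": 24.0, "systolic_bp": 120.0}


def predict_risk(features):
    """Return a logistic risk score in [0, 1] for one patient."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + exp(-z))


def shapley_values(patient, baseline=BASELINE, model=predict_risk):
    """Exact Shapley values; features outside a coalition take baseline values."""
    names = list(patient)
    n = len(names)

    def value(coalition):
        # Model output when only the coalition's features use the patient's values.
        mixed = {k: (patient[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    contributions = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = set(subset)
                total += weight * (value(without | {name}) - value(without))
        contributions[name] = total
    return contributions


patient = {"age": 58.0, "bmi": 31.5, "systolic_bp": 145.0}
phi = shapley_values(patient)
# List features by the size of their contribution to this patient's risk score.
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} {contribution:+.4f}")
```

The contributions sum to the difference between this patient's predicted risk and the baseline patient's predicted risk, which is what lets a clinician read them as a per-feature breakdown of one prediction. Exact enumeration grows exponentially with the number of features, so a real deployment would more likely rely on an efficient implementation such as the SHAP library.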