Improved Supervised Machine Learning Classification Approach For Heart Disease Detection

Authors

Michael Funskin

Computer and Information Systems, Towson University, Towson (USA)

Article Information

DOI: 10.47772/IJRISS.2026.10190030

Subject Category: Computer Science

Volume/Issue: 10/19 | Page No: 370-379

Publication Timeline

Submitted: 2026-01-06

Accepted: 2026-01-29

Published: 2026-02-14

Abstract

Heart disease remains one of the leading causes of death globally, highlighting the need for tools that support early and accurate detection. This work develops a machine learning model to predict heart disease using the UCI Heart Disease dataset, which combines data from four clinical cohorts (Cleveland, Hungarian, Switzerland, and VA Long Beach). The data were preprocessed by filling missing values with median imputation, normalizing numeric features to a common range with min–max scaling, and applying SMOTE (Synthetic Minority Over-sampling Technique) to correct class imbalance. Five classification algorithms (Logistic Regression, Support Vector Machine (SVM), Random Forest, k-Nearest Neighbors (kNN), and XGBoost) were trained and evaluated, with XGBoost achieving the best performance: an accuracy of 0.88, an F1-score of 0.88, a ROC-AUC of 0.88, and a PR-AUC of 0.87. SHAP (SHapley Additive exPlanations) analysis identified oldpeak (ST depression), ca (number of major vessels), thalach (maximum heart rate achieved), exang (exercise-induced angina), and thal (thalassemia type) as the most significant predictors of heart disease, consistent with prior work and reinforcing their importance in clinical diagnosis. Overall, the model balances accuracy, interpretability, and consistency across datasets, making it suitable for integration into clinical decision-support systems.
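The preprocessing and evaluation steps named in the abstract can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical; the medians and min/max bounds are assumed to be fit on the training split only and then reused on test data; SMOTE is assumed to be applied to the training split only and is shown in a simplified form (a random minority seed point and a single interpolation factor per synthetic row); and ROC-AUC is computed through its rank interpretation rather than by integrating the ROC curve.

```python
import random
from statistics import median

# Hypothetical helpers illustrating the abstract's pipeline; NOT the paper's implementation.
# Assumption: fit_* functions see training rows only; the learned values are reused on test rows.


def fit_medians(rows):
    """Per-column medians of observed (non-None) values; assumes each column has at least one."""
    return [median([r[j] for r in rows if r[j] is not None]) for j in range(len(rows[0]))]


def impute(rows, medians):
    """Replace each missing value (None) with its column's median."""
    return [[m if v is None else v for v, m in zip(r, medians)] for r in rows]


def fit_minmax(rows):
    """Per-column (min, max) bounds learned from the given rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def minmax_scale(rows, bounds):
    """Map each value to (v - min) / (max - min); constant columns map to 0.0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(r, bounds)]
            for r in rows]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(X, y, minority, k=5, seed=0):
    """Simplified binary SMOTE: add synthetic minority rows until both classes are equal.

    Each synthetic row lies on the segment between a random minority row and one of
    its k nearest minority-class neighbours. Requires at least 2 minority rows.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(max(0, n_needed)):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        others = [p for j, p in enumerate(minority_rows) if j != i]
        neighbours = sorted(others, key=lambda p: _sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return X + synthetic, y + [minority] * len(synthetic)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up rows with two numeric features; None marks a missing value.
    X_train = [[63.0, 2.3], [41.0, None], [57.0, 0.0], [45.0, 1.4], [60.0, 3.1]]
    y_train = [1, 0, 0, 0, 1]
    meds = fit_medians(X_train)
    X_filled = impute(X_train, meds)
    X_prep = minmax_scale(X_filled, fit_minmax(X_filled))
    X_bal, y_bal = smote(X_prep, y_train, minority=1)
    print(len(X_bal), y_bal.count(1), y_bal.count(0))  # prints: 6 3 3
```

Fitting the medians and bounds on training rows alone keeps test-set information out of preprocessing, and oversampling only the training split avoids evaluating on synthetic rows. In practice, scikit-learn and imbalanced-learn provide tested implementations of these steps.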

Keywords

Heart disease, Machine Learning, Model
