Improved Supervised Machine Learning Classification Approach For Heart Disease Detection
Authors
Computer and Information Systems Towson University Towson (USA)
Article Information
DOI: 10.47772/IJRISS.2026.10190030
Subject Category: Computer Science
Volume/Issue: 10/19 | Page No: 370-379
Publication Timeline
Submitted: 2026-01-06
Accepted: 2026-01-29
Published: 2026-02-14
Abstract
Heart disease remains one of the leading causes of death globally, highlighting the need for tools that support early and accurate detection. This work develops a machine learning model to predict heart disease from the UCI Heart Disease dataset, which combines data from four clinical cohorts (Cleveland, Hungarian, Switzerland, and VA Long Beach). The data were preprocessed by imputing missing values with the median, normalizing numeric ranges with min–max scaling, and applying SMOTE to correct class imbalance. Five classification algorithms (Logistic Regression, Support Vector Machine (SVM), Random Forest, k-Nearest Neighbors (kNN), and XGBoost) were trained and evaluated. XGBoost achieved the best performance, with an accuracy of 0.88, F1-score of 0.88, ROC-AUC of 0.88, and PR-AUC of 0.87. SHAP analysis identified oldpeak (ST depression), ca (number of major vessels), thalach (maximum heart rate achieved), exang (exercise-induced angina), and thal (thalassemia type) as the most significant predictors of heart disease, which is consistent with prior work and reinforces their importance in clinical diagnosis. Overall, the model balanced accuracy, interpretability, and consistency across datasets, and is suitable for integration into clinical decision-support systems.
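The three preprocessing steps named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names (`impute_median`, `min_max_scale`, `smote`), the toy rows, and the neighbour count `k` are all assumptions made for illustration, and the `smote` function is a simplified version of the interpolation idea rather than a full implementation of the published algorithm.

```python
import math
import random
from statistics import median


def impute_median(rows):
    """Replace None entries in each column with that column's median."""
    cols = list(zip(*rows))
    medians = [median(v for v in col if v is not None) for col in cols]
    return [[medians[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, v in enumerate(row)] for row in rows]


def smote(minority, n_new, k=2, seed=0):
    """Simplified SMOTE: build n_new synthetic points, each interpolated
    between a random minority sample and one of its k nearest minority
    neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(p, base))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy rows: [age, oldpeak]; None marks a missing value. Values are made up.
X = [[63, 2.3], [41, None], [56, 0.8], [None, 1.4], [70, 3.1], [52, 0.0]]
y = [1, 0, 0, 0, 1, 0]

X = min_max_scale(impute_median(X))
minority = [row for row, label in zip(X, y) if label == 1]
n_new = y.count(0) - y.count(1)  # synthetic samples needed to balance classes
X_bal = X + smote(minority, n_new, k=1)
y_bal = y + [1] * n_new
```

In a real experiment, the imputation medians and scaling ranges would normally be computed on the training split only, and oversampling would be applied only to training data, so that synthetic points never leak into the evaluation set. The abstract does not state how the study ordered these steps.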
Keywords
Heart disease, Machine Learning, Model