A Comparative Analysis and Implementation of Supervised Machine Learning Models for GUI-Based Heart Disease Prediction
- Noor Asyikin Sulaiman
- Hazman Haizaini Mohamad
- Azdiana Md Yusop
- Muhammad Noorazlan Shah Zainudin
- Adie Mohd Khafe
- Siti Fatimah Sulaiman
- Md Pauzi Abdullah
- 4969-4977
- Sep 13, 2025
- Education
Noor Asyikin Sulaiman1*, Hazman Haizaini Mohamad2, Azdiana Md Yusop3, Muhammad Noorazlan Shah Zainudin4, Adie Mohd Khafe5, Siti Fatimah Sulaiman6, Md Pauzi Abdullah7
1,3,4,5,6Centre for Telecommunication Research & Innovation (CeTRI), Faculty Technology dan Kejuruteraan Elektronik dan Computer (FTKEK), University Technical Malaysia Melaka (UTeM), Hang Tuah Jaya, 76100, Durian Tunggal, Melaka, Malaysia
2Civil Aviation Authority of Malaysia (CAAM), Jalan Airport, 98000, Miri, Sarawak, Malaysia
7Centre of Electrical Energy Systems (CEES), Faculty of Electrical Engineering, University Technology Malaysia (UTM), 81310 Skudai, Johor, Malaysia
*Corresponding Author
DOI: https://dx.doi.org/10.47772/IJRISS.2025.908000401
Received: 10 August 2025; Accepted: 18 August 2025; Published: 13 September 2025
ABSTRACT
Heart failure is a critical global health issue, contributing substantially to morbidity and mortality worldwide. Early and accurate detection is essential for timely intervention and improved patient outcomes. This study presents a comparative analysis and implementation of five supervised machine learning algorithms for heart disease prediction: Decision Tree (DT), Logistic Regression (LR), Support Vector Machine (SVM), Gaussian Naive Bayes (GNB), and K-Nearest Neighbors (KNN). An open-access dataset comprising 1,025 patient records with 13 relevant features was used, with a 70:30 training-to-testing split. The performance of each model was evaluated based on accuracy, precision, recall, and F1 score. Results showed that the Decision Tree model outperformed the other models, with 94.48% accuracy, 90.79% precision, 96.66% recall, and a 94.56% F1 score. The effect of the max_depth hyperparameter was also analysed to optimise the DT model. Logistic Regression and SVM performed competitively but with lower metrics, while KNN recorded the lowest accuracy, highlighting its limitations for complex datasets. Finally, a Graphical User Interface (GUI) was developed using Python’s Tkinter library. Integrating the optimised Decision Tree model, the GUI allows healthcare practitioners to input 13 clinical parameters and instantly receive heart disease risk predictions. The GUI facilitates decision-making in patient pre-screening and early intervention treatment. It aims to enhance diagnostic precision while delivering an accessible, cost-effective tool for healthcare settings.
Keywords: Predictive Model, Machine Learning, Decision Trees, Heart Disease, Graphical User Interface
INTRODUCTION
Heart disease is a leading cause of death worldwide. Each year, an estimated 17.9 million people die from heart disease, accounting for 31% of all fatalities globally (Chua et al. 2022). Heart disease, particularly cardiac arrest, can strike at any moment and in any place, sometimes with no signs or indicators (Chua et al. 2022).
Meanwhile, machine learning has applications in various fields, including healthcare (Bi et al. 2019; Li et al. 2020), finance (Moeini Najafabadi et al. 2019), marketing (Agarwal et al. 2022), building management (Sulaiman et al. 2022), and agriculture (Zainudin et al. 2021). For example, machine learning (ML) has emerged as a game changer in the financial sector. ML models, particularly those based on supervised learning, play a role in predictive analytics. These models can anticipate future stock prices or market trends by examining historical data, including indicators such as price fluctuations and trade volumes. This capability provides significant information to investors, resulting in more informed decision-making and better risk management. One such study, "Making investment decisions in stock markets using a forecasting-Markowitz based decision-making approaches" by Zahra Moeini Najafabadi, Mehdi Bijari, and Mehdi Khashei, aims to support financial decisions using forecasting-Markowitz-based approaches. The methods used are time series prediction methods, including autoregressive models, autoregressive moving average models, and artificial neural networks (Moeini Najafabadi et al. 2019).
In the medical field, machine learning is commonly applied to chronic diseases such as cardiovascular disease, cancers, chronic respiratory diseases, and diabetes to help prevent them from worsening (Thakur and Paika 2023). Medical practitioners state that heart disease, also known as cardiovascular disease, is one of the common issues in that field (Suseendran et al. 2019). Thus, predicting this disease early is essential so that intervention can begin before the patient’s condition deteriorates. Many studies have used machine learning to predict heart failure and assess the condition of the patient. The key component in predicting heart failure is the set of features in the dataset, because finding suitable features for the model can significantly improve its performance. Based on recent research, the decision tree method is suitable for predicting heart failure because it does not require feature scaling to be effective and can handle missing values without imputation.
Vijaya Saraswathi et al. (2022) presented a study on heart disease prediction using decision trees and support vector machines (SVM). The paper notes that predicting heart disease is a challenging task due to the multifaceted nature of cardiac health indicators, and highlights that employing machine learning algorithms in the medical domain can provide significant benefits. Its results show that the decision tree achieved higher accuracy than the SVM, at 89.6% versus 87.6%. Another study, by C. Sateesh and R. Balamanigandan, explores heart disease prediction using an innovative decision tree technique to increase accuracy compared to convolutional neural networks (CNN). The results from this study indicate that the decision tree achieves an accuracy of 87.75%, compared to 84.5% for the CNN (Sateesh and Balamanigandan 2022). These findings suggest that decision trees are suitable for the prediction of heart disease.
This paper aims to develop a predictive system for heart disease detection using various machine learning algorithms, focusing on the design, implementation, and performance evaluation of multiple classification models. The system is developed with an interactive and user-friendly graphical interface that enables users to input relevant health parameters and receive immediate predictive outcomes. To achieve this, five machine learning models—Support Vector Machine (SVM), Logistic Regression, Decision Tree, Gaussian Naive Bayes, and K-Nearest Neighbors (KNN)—are implemented and trained using patient datasets. The results from each model are analyzed and compared to identify which algorithm offers the highest prediction accuracy. By leveraging the capabilities of machine learning, this study seeks to optimize classification performance, minimize human error, and improve system reliability in decision-making processes. The core emphasis of this research lies in algorithmic design and comparative evaluation, making it relevant for engineering applications where intelligent data-driven systems are required. This work contributes to the development of efficient, scalable, and accurate predictive models that can be integrated into real-world applications for automated health-related assessments.
METHODOLOGY
Model selection is a critical phase in the development of a machine learning-based classification system, as it determines the most appropriate algorithm for achieving optimal predictive performance. This process involves evaluating multiple models, not only across different algorithmic families but also within the same model type using varied hyperparameter configurations. For this study, five classification models were selected—Support Vector Machine (SVM), Logistic Regression, Decision Tree, Gaussian Naive Bayes, and K-Nearest Neighbors (KNN). The selection was guided by the nature of the problem, which involves binary classification using a structured numerical dataset. These models are well-established in the field of supervised learning and are particularly suitable for tasks involving health-related data classification due to their interpretability, computational efficiency, and proven effectiveness in prior studies.
Dataset
An open-access heart disease dataset was obtained from the Kaggle platform (www.kaggle.com) for the purpose of model training and evaluation. The dataset comprises clinical and biological information from 1,025 individuals, represented by 13 input attributes that are relevant to cardiovascular health. Each record is labelled with a “target” variable, which indicates the presence or absence of heart disease. Specifically, a value of “0” denotes that the individual does not have heart disease, while a value of “1” signifies that the individual is diagnosed with the condition. This labelled structure facilitates supervised learning and enables the development of binary classification models for heart disease prediction.
The dataset consists of 13 attributes, each representing a clinically relevant parameter associated with cardiovascular health. These attributes serve as input features for the supervised machine learning models used in this study. Table 1 provides a detailed description of each attribute and its corresponding meaning. The variables include demographic information (such as age and sex), physiological measurements (such as resting blood pressure and serum cholesterol), and results from diagnostic tests (such as electrocardiographic readings and stress test indicators). This structured feature set allows for a comprehensive representation of patient health profiles, which is essential for building robust predictive models.
Table 1: Description of Dataset Attributes
Attribute | Meaning |
Age | Age in years |
Sex | 1 = male, 0 = female |
Cp | Chest Pain Type |
Trestbps | Resting Blood Pressure in mmHg on admission to the hospital |
Chol | Serum cholesterol in mg/dl |
Restecg | Resting electrocardiographic results (values 0,1,2) |
Thalach | Maximum heart rate achieved (71=min, 135 = 25%, 152=50%, 166=75%, 202=Max) |
Exang | Exercise induced angina |
Oldpeak | S-T depression induced by exercise relative to rest |
Slope | The slope of the peak exercise S-T segment |
Ca | Number of major vessels (0-3) coloured by fluoroscopy |
Data Splitting
The dataset, comprising 1,025 instances and 13 features, is partitioned into training and testing sets using a 70:30 ratio through scikit-learn's train_test_split function, run in a Jupyter Notebook. This standard two-part split is commonly applied in supervised learning to facilitate model development, parameter tuning, and performance evaluation. The training set, which contains 70% of the total data (717 instances), is used to train and optimize the predictive models. The remaining 30% (308 instances) is allocated for testing, allowing for an objective assessment of each model’s generalization capability.
Prior to the split, the feature matrix and target vector are defined as X and y, respectively. These are then divided into four subsets: X_train, X_test, y_train, and y_test. The train_test_split function is configured with the parameters test_size=0.3 and random_state=1984, where test_size specifies the proportion of data assigned to testing, and random_state ensures reproducibility by setting a fixed seed for random number generation.
This data partitioning approach ensures that the machine learning models are trained on a representative subset of the data and evaluated on previously unseen samples, enabling a fair comparison of their predictive accuracy.
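The split described above can be sketched as follows. This is a minimal illustration, not the authors' notebook: the feature matrix here is synthetic random data standing in for the Kaggle file (an assumption made so the snippet runs on its own), while test_size and random_state are the values stated in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset (assumption): 1,025 rows, 13 features,
# and a binary target, matching only the shape described in the text.
rng = np.random.default_rng(0)
X = rng.normal(size=(1025, 13))
y = rng.integers(0, 2, size=1025)

# 70:30 split with the fixed seed reported in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1984
)

print(X_train.shape, X_test.shape)  # (717, 13) (308, 13)
```

With 1,025 rows, the 30% test share is rounded up to 308 instances, which leaves the 717 training instances reported above.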
Model Training, Tuning and Evaluation
Five machine learning models were developed and evaluated: Logistic Regression, Decision Tree, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Gaussian Naive Bayes. Model development and training were conducted using the Jupyter Notebook environment. The dataset was pre-processed and partitioned into training and testing subsets to ensure unbiased evaluation. Each model was trained on the training set and tested on the unseen data to assess its generalization capability.
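A training loop over the five model families might look like the sketch below. It assumes scikit-learn estimators with default settings (plus max_iter=1000 for Logistic Regression to avoid convergence warnings); the paper does not report the hyperparameters used for these models, and the data here is generated with make_classification purely as a placeholder.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data (assumption): same shape as the dataset, not the real records.
X, y = make_classification(n_samples=1025, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1984
)

# Default hyperparameters are an assumption; the paper does not list them.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "GNB": GaussianNB(),
}

# Fit each model on the training set and score it on the held-out test set.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.4f}")
```

Because the data is synthetic, the printed accuracies are not comparable to the figures reported in this paper.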
To evaluate the classification performance of the machine learning models, four widely used metrics were employed: accuracy, precision, recall, and F1-score. Accuracy represents the overall proportion of correctly predicted instances and serves as a general indicator of the model’s effectiveness. Precision measures the proportion of true positive cases among all instances that the model predicted as positive, indicating the reliability of positive classifications. Recall, on the other hand, reflects the model’s ability to correctly identify all actual positive cases, which is particularly important in medical diagnoses where missing a positive case could have serious consequences. The F1-score, which is the harmonic mean of precision and recall, provides a balanced evaluation by considering both false positives and false negatives. This metric is especially useful when the dataset exhibits class imbalance, ensuring that both types of errors are accounted for in the model assessment.
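The four metrics can be written directly in terms of confusion-matrix counts. The helper below is an illustrative sketch; the function name and the example counts are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so that no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these example counts, accuracy is 0.85, precision is 0.80, recall is about 0.889, and F1 is about 0.842, which shows how F1 sits between precision and recall.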
A confusion matrix was used to provide a detailed breakdown of prediction results. The matrix was generated using the confusion_matrix function, comparing the true labels from the test set (stored in y_test) with the predicted labels (y_pred1). This enabled calculation of all performance metrics and facilitated insight into classification errors.
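As a small illustration of this step, the snippet below calls scikit-learn's confusion_matrix on two short label lists. The variable names y_test and y_pred1 follow the text; the label values are made up for the example.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 1 = heart disease, 0 = no heart disease.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred1 = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels the matrix is laid out as [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_pred1)
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=3 FP=1 FN=1 TP=3
```

The four counts are the inputs from which accuracy, precision, recall, and F1 score are derived.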
Hyperparameter Tuning for Decision Tree
To further improve model performance and avoid overfitting, hyperparameter tuning was applied to the Decision Tree classifier. Hyperparameters are configuration variables that govern the learning process and must be set before training. Among the available parameters, max_depth was selected for tuning, as it directly controls the complexity of the tree by limiting how deep the model can grow. Setting the depth too high may lead to overfitting, where the model memorizes noise in the training data, while a shallow tree may result in underfitting.
Through iterative experimentation, various values of max_depth were tested to identify the optimal setting that balances predictive performance on both the training and testing sets. The final selected value ensured that the model maintained high accuracy while generalizing well to unseen data. This tuned model was subsequently integrated into the graphical user interface for real-time heart failure risk prediction.
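One way to run such a sweep is sketched below, assuming scikit-learn's DecisionTreeClassifier and a depth range of 1 to 6. The data is a synthetic placeholder and the tree's random_state is an arbitrary choice, so the printed scores will differ from those reported in this paper.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data (assumption), shaped like the dataset in the text.
X, y = make_classification(n_samples=1025, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1984
)

# Record (training accuracy, testing accuracy) for each candidate depth.
depth_scores = {}
for depth in range(1, 7):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    depth_scores[depth] = (
        tree.score(X_train, y_train),
        tree.score(X_test, y_test),
    )

for depth, (train_acc, test_acc) in depth_scores.items():
    print(f"max_depth={depth}: train={train_acc:.4f} test={test_acc:.4f}")
```

Comparing the two columns is what reveals overfitting: a training score that keeps rising while the testing score stalls or falls suggests the tree has become too deep.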
Graphical User Interface (GUI) Development
A graphical user interface (GUI) was developed using Python’s Tkinter library to enable interactive heart failure risk prediction. The interface allows clinical users to input patient data, which is then processed by an embedded decision tree classifier trained on the heart disease dataset described above. The model computes and displays the predicted risk level and associated score in real time.
The GUI emphasizes clarity, usability, and interpretability, facilitating integration into clinical workflows. By providing immediate and comprehensible risk feedback, the system supports informed decision-making in the early detection and management of heart failure.
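A minimal Tkinter form along these lines is sketched below. It is not the authors' implementation: the function names, window title, and result messages are hypothetical, and the feature list assumes the column names of the commonly distributed version of the dataset (fbs and thal do not appear in Table 1). The model is passed in as a callable so the form does not depend on any particular classifier.

```python
# Assumed feature order; "fbs" and "thal" are not listed in Table 1.
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]


def parse_inputs(raw_values):
    """Convert the text typed into the form into floats, in FEATURES order."""
    if len(raw_values) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(raw_values)}")
    return [float(value) for value in raw_values]


def build_gui(predict_fn):
    """Build the input form; predict_fn maps a list of 13 floats to 0 or 1."""
    # Imported here so parse_inputs can be used on machines without a display.
    import tkinter as tk
    from tkinter import messagebox

    root = tk.Tk()
    root.title("Heart Disease Prediction")

    # One labelled entry box per feature.
    entries = []
    for row, name in enumerate(FEATURES):
        tk.Label(root, text=name).grid(row=row, column=0, sticky="w")
        entry = tk.Entry(root)
        entry.grid(row=row, column=1)
        entries.append(entry)

    def on_predict():
        try:
            values = parse_inputs([entry.get() for entry in entries])
        except ValueError as exc:
            messagebox.showerror("Invalid input", str(exc))
            return
        at_risk = predict_fn(values) == 1
        messagebox.showinfo(
            "Prediction",
            "At risk of heart disease" if at_risk else "Not at risk of heart disease",
        )

    tk.Button(root, text="Predict", command=on_predict).grid(
        row=len(FEATURES), column=0, columnspan=2
    )
    return root
```

With a fitted scikit-learn tree named tree, the window could be opened with build_gui(lambda v: int(tree.predict([v])[0])).mainloop().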
RESULTS AND DISCUSSION
This section presents and discusses the performance results of five machine learning models: Logistic Regression, Decision Tree, KNN, SVM, and Gaussian Naive Bayes.
Performance of Decision Tree Model with Different ‘max_depth’ Values
Table 2 summarizes the performance of the Decision Tree classifier across varying values of the max_depth parameter, ranging from 1 to 6. A clear upward trend is observed, where greater tree depth generally leads to improved predictive accuracy.
At max_depth = 1, the model achieves 75% accuracy, reflecting underfitting due to an overly simplistic structure that fails to capture the complexity of the data. The same accuracy is recorded at max_depth = 2, indicating limited model expressiveness. When the depth increases to 3, the accuracy rises sharply to 85.39%, suggesting that the model begins to capture more meaningful patterns. Further increments continue to enhance performance, with accuracies of 86.36% and 91.23% at depths 4 and 5, respectively. The highest accuracy of 94.48% is attained at max_depth = 6, demonstrating the model’s ability to effectively capture intricate decision boundaries and improve classification reliability.
These findings highlight the critical role of hyperparameter tuning in balancing underfitting and overfitting. In this study, the best performance within the tested range of 1 to 6 was achieved with max_depth = 6, making it the configuration selected for integration into the subsequent graphical user interface (GUI) application.
Table 2: Accuracy of Decision Tree with Different max_depth Values
Max_depth | Accuracy (%) |
1 | 75 |
2 | 75 |
3 | 85.39 |
4 | 86.36 |
5 | 91.23 |
6 | 94.48 |
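The selection of the best depth from Table 2 can be reproduced with a few lines of Python. The accuracy values are copied from the table; the variable names are illustrative.

```python
# Accuracy (%) by max_depth, copied from Table 2.
accuracy_by_depth = {1: 75.0, 2: 75.0, 3: 85.39, 4: 86.36, 5: 91.23, 6: 94.48}

# Depth with the highest test accuracy.
best_depth = max(accuracy_by_depth, key=accuracy_by_depth.get)

# Gain in accuracy (percentage points) from each extra level of depth.
gains = {
    depth: round(accuracy_by_depth[depth] - accuracy_by_depth[depth - 1], 2)
    for depth in range(2, 7)
}

print(best_depth)  # 6
print(gains)
```

The largest single gain, 10.39 percentage points, occurs between depths 2 and 3, matching the sharp rise described in the text.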
Model Comparison
Table 3 presents the evaluation metrics—accuracy, precision, recall, and F1 score—for all five developed machine learning models. Among these, the Decision Tree model demonstrates the most outstanding overall performance, achieving values exceeding 90% across all metrics. This superior performance highlights the model’s ability to establish clear and accurate classification boundaries. Its robustness against outliers and capacity to model nonlinear relationships contribute significantly to its effectiveness. Additionally, the high scores can be attributed to effective hyperparameter tuning, particularly the optimization of max_depth, which enhances the model’s generalization and predictive power.
In contrast, the K-Nearest Neighbors (KNN) model records the lowest accuracy at 73.38%, reflecting limitations in handling complex and overlapping data distributions. Its precision of 78.33% is higher than its accuracy and close to that of GNB (79.87%), suggesting that it performs reasonably well in limiting false positives. However, its recall of 62.67% is the lowest among all models, indicating that it fails to detect a significant number of actual positive cases, which lowers the overall F1 score.
Logistic Regression (LR) and Support Vector Machine (SVM) achieve moderate but consistent performances, with F1 scores around 85%. Gaussian Naive Bayes (GNB) records slightly lower results across all metrics but still performs within acceptable thresholds for general classification tasks.
Table 3: Evaluation Metrics for All Models
Model | Accuracy | Precision | Recall | F1 Score |
DT | 94.48% | 90.79% | 96.66% | 94.56% |
SVM | 84.42% | 80% | 90.66% | 85% |
LR | 85.39% | 83.02% | 88% | 85.44% |
GNB | 82.14% | 79.87% | 84.66% | 82.20% |
KNN | 73.38% | 78.33% | 62.67% | 69.63% |
Figures 1 to 4 visually compare model performance across the four metrics. Figure 1 clearly shows that the Decision Tree outperforms all other models in terms of accuracy, followed by Logistic Regression and SVM, with KNN performing the worst. In Figure 2, the Decision Tree maintains its lead in precision, while KNN, although it also has the lowest precision, comes much closer to the other models in this metric than its overall accuracy would suggest.
Figure 3 illustrates recall performance, where the Decision Tree again dominates, indicating its effectiveness in identifying actual positive cases. SVM and LR also show satisfactory recall values, whereas KNN struggles significantly. Lastly, Figure 4 confirms the Decision Tree’s dominance in F1 score—demonstrating a balanced and reliable classification performance—while KNN’s poor F1 score reflects its instability and reduced suitability for complex classification tasks.
Figure 1: Accuracy Performance of Machine Learning Models
Figure 2: Precision Performance of Machine Learning Models
Figure 3: Recall Performance of Machine Learning Models
Figure 4: F1 Score Performance of Machine Learning Models
The evaluation and comparison of model performances not only highlight the Decision Tree as the most reliable classifier for this study but also serve as the basis for implementing it in a practical application. Building on these findings, the next section presents the development of a Graphical User Interface (GUI) designed to integrate the chosen model and provide a user-friendly tool for heart disease prediction.
Graphical User Interface for Heart Predictive System
A Graphical User Interface (GUI) was developed to operationalize the trained Decision Tree model for heart disease prediction. The GUI comprises 13 input fields, each corresponding to a clinical feature from the dataset. These include parameters such as age, resting blood pressure, serum cholesterol, and other relevant medical indicators. The interface is designed to allow end-users, particularly healthcare practitioners, to manually input patient data in a structured and guided manner.
To enhance usability and minimize input errors, each field is supplemented with descriptions and constraints on valid input formats or ranges. This design ensures data integrity and facilitates accurate predictions. Figure 5 illustrates the main input interface of the GUI.
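Range checks of this kind could be implemented as in the sketch below. The limits shown are hypothetical placeholders chosen for illustration; the constraints actually used in the GUI are not listed in this paper, and only a few of the 13 fields are shown.

```python
# Hypothetical valid ranges (assumption); not the limits used in the actual GUI.
VALID_RANGES = {
    "age": (1, 120),         # years
    "sex": (0, 1),           # 1 = male, 0 = female
    "trestbps": (50, 250),   # mmHg
    "chol": (100, 600),      # mg/dl
    "thalach": (60, 220),    # beats per minute
}


def validate_field(name, raw):
    """Return (value, None) if raw is a number within range, else (None, message)."""
    low, high = VALID_RANGES[name]
    try:
        value = float(raw)
    except ValueError:
        return None, f"{name}: '{raw}' is not a number"
    if not low <= value <= high:
        return None, f"{name}: {value:g} is outside the range {low} to {high}"
    return value, None


print(validate_field("age", "54"))    # (54.0, None)
print(validate_field("age", "abc"))
print(validate_field("sex", "2"))
```

Returning a message instead of raising lets the interface show every invalid field to the user at once before a prediction is attempted.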
Figure 5: Interface of GUI
Upon completion of data entry, the user initiates prediction by clicking the Predict button. The system then processes the inputs through the embedded Decision Tree model and presents the prediction outcome on a subsequent screen. The result clearly states whether the patient is at risk of heart disease, as demonstrated in Figure 6.
Figure 6: Example of GUI-Based Prediction Result
This GUI serves as a lightweight, platform-independent application that bridges machine learning model outputs with practical clinical use. It facilitates real-time, interpretable decision support and may assist clinicians in preliminary screening or patient triage in settings where rapid assessment is beneficial.
CONCLUSIONS
This study aimed to develop a predictive system for heart disease using machine learning techniques. Five models (Decision Tree, Logistic Regression, KNN, SVM, and Gaussian Naive Bayes) were successfully developed and trained using Jupyter Notebook. All models were evaluated for their accuracy, precision, recall, and F1 score. Results show that the Decision Tree outperformed the other models, achieving above 90% on all evaluated metrics, followed by the Logistic Regression model. These results highlight the Decision Tree model’s effectiveness in correctly identifying heart disease cases and its potential reliability in real-world applications.
A Graphical User Interface (GUI) was developed for the Decision Tree model using the Tkinter library to improve end-user accessibility to the system. The GUI allows users to key in 13 clinical parameters and immediately obtain a heart disease prediction. Early and accurate prediction can help healthcare providers pre-screen heart disease patients and plan the necessary treatment to save patients’ lives. The system provides a promising foundation for decision support in medical diagnostics and can be further enhanced through model optimization, extended feature sets, and clinical validation in future work.
ACKNOWLEDGMENTS
The authors would like to thank the Centre for Research and Innovation Management (CRIM), University Technical Malaysia Melaka (UTeM), for sponsoring this work.
REFERENCES
- Chua, S., Sia, V., & Nohuddin, P. N. (2022). Comparing machine learning models for heart disease prediction. 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET). https://doi.org/10.1109/iicaiet55139.2022.9936861
- Bi, Q., Goodman, K. E., Kaminsky, J., & Lessler, J. (2019). What is machine learning? A primer for the epidemiologist. American Journal of Epidemiology, 188(12), 2222–2239. https://doi.org/10.1093/aje/kwz189
- P. Li, A. U. Haq, S. U. Din, J. Khan, A. Khan and A. Saboor. (2020) “Heart Disease Identification Method Using Machine Learning Classification in E-Healthcare,” in IEEE Access, vol. 8, pp. 107562-107582, 2020, doi: 10.1109/ACCESS.2020.3001149.
- Moeini Najafabadi, Z., Bijari, M., & Khashei, M. (2019). Making investment decisions in stock markets using a forecasting-Markowitz based decision-making approaches. Journal of Modelling in Management, 15(2), 647–659. https://doi.org/10.1108/jm2-12-2018-0217
- Agarwal, S. Taware, S.A. Yadav, D. Gangodkar, A.L.N. Rao, V.K. Srivastav. (2022) “Customer – Churn Prediction Using Supervised Machine Learning Technique Along With Recommendation”. International Research Journal of Modernization in Engineering Technology and Science. https://doi.org/10.56726/irjmets46565
- N. A. Sulaiman, K.W. Chuink, M.N.S. Zainudin, A. M. Yusop, S.F. Sulaiman, M.P. Abdullah. (2022). “Data-driven fault detection and diagnosis for centralised chilled water air conditioning system”, Przegląd Elektrotechniczny, 1(1), pp. 217–221. https://doi.org/10.15199/48.2022.01.47.
- Zainudin, M.N.S., Hussin, N., Saad, W.H.M., Radzi S.M., Noh Z.M., Sulaiman N.A., Razak M.S.J.A. (2021). “A Framework for Chili Fruits Maturity Estimation using Deep Convolutional Neural Network”, Przegląd Elektrotechniczny, vol. 97 (12), pp. 77 – 81. https://doi.org/10.15199/48.2021.12.13.
- Thakur, J., & Paika, R. (2023). The world noncommunicable disease federation’s international certification course of primary health-care physician in noncommunicable diseases: Key to strengthen primary health-care interventions in noncommunicable diseases. International Journal of Noncommunicable Diseases, 8(3), 115. https://doi.org/10.4103/jncd.jncd_90_23.
- Suseendran, N. Zaman, M. Thyagaraj, and R. K. Bathla. (2019). “Heart Disease Prediction and Analysis using PCO, LBP and Neural Networks,” 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), Dec. 2019.
- V. Saraswathi, K. Gajavelly, A. K. Nikath, R. Vasavi, and R. R. Anumasula. (2022). “Heart Disease Prediction Using Decision Tree and SVM,” in Algorithms for intelligent systems, pp. 69–78. doi: 10.1007/978-981-16-7389-4_7.
- C. Sateesh and R. Balamanigandan. (2022). “Heart Disease Prediction using Innovative Decision tree Technique for increasing the Accuracy compared with Convolutional Neural Networks,” 2022 2nd International Conference on Innovative Practices in Technology and Management (ICIPTM), Gautam Buddha Nagar, India, 2022, pp. 583-587, doi: 10.1109/ICIPTM54933.2022.9754196.