Detecting Kidney Stones Using Urine Test Analysis: A Machine Learning Perspective
Isaac Osei1, Acheampong Baafi-Adomako2, Dennis Opoku Boadu2
1Amity University
2University of Ghana
DOI: https://doi.org/10.51244/IJRSI.2024.1110061
Received: 02 October 2024; Accepted: 07 October 2024; Published: 20 November 2024
Kidney stones, a prevalent urological condition, can cause severe discomfort and serious health complications if untreated. Traditional diagnostic methods, such as CT scans and ultrasounds, while effective, are often costly, expose patients to radiation, and may not be accessible in low-resource settings. This study explores a machine learning-based alternative that uses urine test data for kidney stone detection, aiming to provide a non-invasive, cost-effective, and accessible diagnostic tool. The study evaluates various machine learning models, including Random Forest (RF), Support Vector Machine (SVM), Logistic Regression, Decision Trees, and Gradient Boosting, to predict kidney stones using urine analysis data. Key urine parameters analyzed include specific gravity, pH, osmolality, conductivity, urea, and calcium concentrations. With a dataset of 79 samples, each labeled for kidney stone presence, preprocessing steps ensured data quality through normalization and exploratory analysis. Models were trained on 80% of the data and tested on the remaining 20%, with performance measured through accuracy, precision, recall, F1 score, and AUC-ROC metrics. The Random Forest model achieved the highest performance, with an accuracy of 94%, precision of 0.95, recall of 0.94, F1 score of 0.94, and AUC-ROC of 0.94, while Gradient Boosting achieved a slightly higher AUC-ROC at 0.96. Feature analysis identified osmolality and urea as the most significant predictors, followed by specific gravity and calcium concentration. These findings align with clinical knowledge on kidney stone formation. The high accuracy and reliability of the Random Forest model underscore its potential as a diagnostic tool for kidney stones. However, limitations include the need for larger datasets to improve generalizability and model transparency for clinical trust. Addressing these factors and facilitating integration into clinical workflows could enhance early detection, improve patient outcomes, and offer a promising alternative to traditional methods.
Keywords: Machine Learning, Classification Algorithm, Kidney Stones, Random Forest, Support Vector Machine.
Kidney stones, or renal calculi, are solid formations resulting from the aggregation of minerals and salts within the kidneys. They can manifest anywhere along the urinary tract, from the kidneys to the bladder, often due to highly concentrated urine that facilitates the crystallization of minerals. The incidence of kidney stones is rising globally, leading to considerable health complications and increased healthcare expenses. It’s projected that around 12% of people worldwide will experience a kidney stone during their lifetime, with recurrence rates for those affected being as high as 50% within five years of an initial episode (Romero et al., 2010; Pearle et al., 2014). Traditional diagnostic methods for kidney stones include imaging techniques like non-contrast computed tomography (CT), ultrasound, and X-rays. While these methods are generally effective, they come with drawbacks. CT scans, regarded as the gold standard, expose patients to ionizing radiation and can be expensive (Fulgham et al., 2013). Ultrasound, although less risky and more affordable, might miss smaller stones or provide less detailed imaging. This has sparked interest in developing non-invasive, cost-effective, and rapid diagnostic alternatives that could be utilized in primary care settings or even at home.
Problem Statement
Kidney stones are a prevalent and recurrent urological condition that leads to significant pain, morbidity, and healthcare costs globally. While traditional diagnostic methods like computed tomography (CT) scans and ultrasounds are effective, they come with several limitations, including high costs, radiation exposure, and limited accessibility in resource-limited settings. These methods also require advanced medical infrastructure and skilled personnel, making them less practical for primary care or remote settings. Urine analysis, being non-invasive, cost-effective, and widely available, offers valuable insights into the biochemical conditions that predispose individuals to kidney stone formation. However, interpreting urine analysis data can be complex and demands sophisticated analytical methods to detect subtle patterns indicative of kidney stones. Despite the potential advantages, the use of machine learning for urine test analysis in kidney stone detection is still underexplored. Existing studies have been constrained by small sample sizes, data imbalances, and a lack of comprehensive feature sets. Moreover, integrating these models into clinical practice presents challenges related to model interpretability, data security, and the necessity for clinical validation.
Objectives
The following are the research objectives:
Urine Analysis in Kidney Stone Detection
Urine analysis has been a cornerstone in the clinical evaluation of kidney stones. It provides crucial information on the urine’s chemical composition, helping identify factors contributing to stone formation. Parameters typically measured include pH, specific gravity, and concentrations of calcium, oxalate, uric acid, citrate, and creatinine, among others (Rodgers et al., 2017). These measurements help identify individuals at risk of developing kidney stones and inform preventive and therapeutic strategies. Recent advancements in urine analysis techniques, such as liquid chromatography-mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) spectroscopy, have enhanced the accuracy and sensitivity in detecting urinary metabolites associated with kidney stones. However, these advanced methods generate complex datasets that require sophisticated analytical tools capable of managing large data volumes and uncovering subtle patterns that might elude traditional statistical methods.
Machine Learning in Medical Diagnostics
Machine learning (ML), a branch of artificial intelligence (AI), focuses on developing algorithms that learn from data to make predictions. Unlike traditional programming, where explicit instructions are provided to the computer, ML algorithms enhance their performance as they process more data. This ability makes ML particularly suitable for medical diagnostics, where variable relationships can be intricate and non-linear (Esteva et al., 2019). In kidney stone detection, ML can analyze urine test data, identifying patterns and combinations of urinary parameters indicative of stone formation. By training ML models on extensive datasets of urine analysis results paired with diagnostic outcomes, these models can learn to predict the presence of kidney stones with high accuracy.
Urine Analysis and Kidney Stones
Urine analysis is a crucial diagnostic tool that provides insights into the biochemical environment conducive to kidney stone formation. Key parameters include pH, specific gravity, osmolality, conductivity, urea, and calcium concentrations. These metrics can help determine an individual’s risk of developing kidney stones and inform clinical decisions regarding preventive and therapeutic strategies. Recent technological advancements, such as liquid chromatography-mass spectrometry (LC-MS) and nuclear magnetic resonance (NMR) spectroscopy, have enhanced the precision of detecting urinary metabolites. However, the complexity of these data sets requires advanced analytical methods capable of managing large volumes of data and identifying complex patterns.
Recent Studies and Findings
Lin et al. (2020) used support vector machines (SVM) to predict kidney stones based on urine metabolomics data, achieving an accuracy of 88%. Chen et al. (2019) applied deep learning techniques to urine microscopy images, resulting in an accuracy of 92%. Black et al. (2020) created a deep learning algorithm utilizing ResNet-101 to identify kidney stone composition from images, achieving an accuracy of 85.71%, which underscores the potential of deep learning in medical image analysis for detecting kidney stones. These studies highlight the efficacy of ML models in improving diagnostic accuracy.
Serrat et al. (2017) developed the myStone system, which utilizes a Random Forest classifier for automatic kidney stone classification from images, achieving an accuracy of 63%. Esteva et al. (2019) demonstrated the utility of gradient boosting (GB) models in clinical urine tests, achieving an accuracy of 92%. Despite the complexity and interpretability challenges of GB models, they outperformed simpler models in terms of accuracy. Bédard et al. (2020) explored logistic regression, finding it less accurate than more sophisticated models like random forest (RF) and GB.
Analyzing feature importance within these models reveals that osmolality and urea are critical predictors of kidney stones. Specific gravity and calcium concentration are also significant, while pH and conductivity, though less influential, contribute to the model’s overall performance (Rodgers & Webber, 2017).
Wu et al. (2018) compared the performance of SVM, logistic regression, and RF models. Kourou et al. (2018) compared decision trees, k-nearest neighbors (KNN), and naïve Bayes classifiers, highlighting decision trees for their interpretability, though RF and GB outperformed them in accuracy. Kazemi & Mirroshandel (2018) compared several individual classifiers to an ensemble learning approach for predicting kidney stone types from textual data, achieving a high accuracy of 97.10%.
Integrating Machine Learning into Clinical Practice and Real-World Applications
Implementing ML models in clinical settings presents challenges, primarily related to model interpretability and data security. Techniques like SHAP values and LIME can enhance model transparency, making them more acceptable for clinical use (Chen et al., 2019). Data privacy and security are critical, necessitating compliance with regulations such as HIPAA and GDPR. Robust encryption, secure storage, and strict access controls are essential to safeguard patient data (Esteva et al., 2019).
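As an illustration of how such explanations can be generated, the sketch below applies SHAP's TreeExplainer to a Random Forest fitted on stand-in urine data; it assumes the open-source `shap`, `pandas`, and `scikit-learn` packages and is not drawn from the studies cited above.

```python
# A minimal SHAP interpretability sketch (not the cited authors' code),
# using synthetic stand-in data with the study's six urine features.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(79, 6)),
                 columns=["gravity", "ph", "osmo", "cond", "urea", "calc"])
y = rng.integers(0, 2, size=79)  # illustrative binary kidney-stone labels

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# The summary plot ranks features (e.g., osmolality, urea) by their
# average contribution to the model's predictions.
shap.summary_plot(shap_values, X)
```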
Oba et al. (2021) investigated ML models in resource-limited settings, emphasizing their potential to offer accessible and cost-effective diagnostic tools. While advanced models like RF and GB provide high accuracy, simpler models like logistic regression are easier to implement in low-resource environments.
Challenges and Future Directions
While promising, the application of ML for kidney stone detection faces several challenges. Large, high-quality datasets are crucial for training accurate models. Existing studies often suffer from small sample sizes, limiting generalizability. Future research should focus on expanding datasets and including diverse patient populations and urine parameters. Integrating ML models into clinical workflows requires collaboration among data scientists, clinicians, and regulatory bodies to ensure safety, effectiveness, and user-friendliness. Training healthcare providers to use and interpret these models is also critical. Improving model interpretability remains a significant challenge. Transparent models that offer clear, actionable insights are essential for gaining clinicians’ trust and facilitating adoption.
Conclusion
The application of machine learning to urine test analysis for kidney stone detection has the potential to revolutionize medical diagnostics. Recent studies demonstrate the effectiveness of various ML models, including Random Forest (RF), Gradient Boosting (GB), Deep Learning models, and Support Vector Machines (SVM), in accurately predicting kidney stones. Identifying key urine parameters, such as osmolality, urea, specific gravity, and calcium concentration, aligns with clinical knowledge.
Despite promising results, challenges like dataset size, model interpretability, and clinical integration need addressing. Future research should focus on expanding datasets, enhancing model transparency, and validating models in clinical settings to ensure their practical applicability and improve patient outcomes.
Table I A Summary Review Of Related Works
# | Year | Author | Title | Data | Classifier | Accuracy |
1 | 2020 | Black et al. | Deep learning computer vision algorithm for detecting kidney stone composition | Images | ResNet-101 | 85.71% |
2 | 2021 | Williams et al. | Urine and stone analysis for the investigation of the renal stone former: a consensus conference | N/A | N/A | N/A |
3 | 2009 | Rule et al. | Kidney Stones and the Risk for Chronic Kidney Disease | Text | N/A | N/A |
4 | 2017 | Serrat et al. | myStone: A system for automatic kidney stone classification | Images | RF | 63.00% |
5 | 2021 | Mao et al. | Relationship between urine specific gravity and the prevalence rate of kidney stone | Text | N/A | N/A |
6 | 2018 | Kazemi & Mirroshandel | A novel method for predicting kidney stone type using ensemble learning | Text | Ensemble | 97.10% |
7 | 2018 | Wu et al. | Machine learning approach for detecting urinary stone disease in microscopic images | Images | Support Vector Machine (SVM) | 89% |
8 | 2020 | Lin et al. | Urine metabolomics analysis for early detection of kidney stones using support vector machines | Text | Support Vector Machine (SVM) | 88% |
9 | 2019 | Chen et al. | Detecting kidney stones in urine microscopy images using convolutional neural networks | Images | Convolutional Neural Network (CNN) | 92% |
10 | 2017 | Rodgers & Webber | The role of urine analysis in the management of urolithiasis | Text | K-Nearest Neighbours (KNN) | 84% |
Classification is used to predict the class of future instances based on historical data. In previous research, experts have applied various data mining techniques, such as clustering and classification, to accurately diagnose kidney stones and kidney diseases. In this study, several machine learning algorithms (classifiers) are employed to detect kidney stones: Random Forest (RF), Logistic Regression, K-Nearest Neighbors (KNN), Decision Trees, Gaussian Naïve Bayes, Support Vector Machine (SVM), Multi-Layer Perceptron (ANN), and Gradient Boosting.
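For illustration, these classifiers map directly onto scikit-learn estimators; the sketch below shows one plausible instantiation with default hyperparameters, not the tuned settings used in this study.

```python
# Illustrative instantiation of the classifiers used in this study
# (scikit-learn defaults; not the authors' exact hyperparameters).
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

classifiers = {
    "Random Forest (RF)": RandomForestClassifier(random_state=42),
    "Logistic Regression (LR)": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors (KNN)": KNeighborsClassifier(),
    "Decision Tree (DT)": DecisionTreeClassifier(random_state=42),
    "Gaussian Naive Bayes (GNB)": GaussianNB(),
    "Support Vector Machine (SVM)": SVC(probability=True),  # probability=True enables AUC
    "Multi-Layer Perceptron (ANN)": MLPClassifier(max_iter=1000, random_state=42),
    "Gradient Boosting (GB)": GradientBoostingClassifier(random_state=42),
}
```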
Data Source
Secondary data was used for this research. The dataset was downloaded from Kaggle (uploaded by Vuppala Adithya Sairam, Kaggle Datasets Expert). It comprises 79 observations and 7 distinct features: 6 predictive features (gravity, ph, osmo, cond, urea, calc) and 1 class label (target) (Fig. 1).
Fig. 1. Attributes and details of the dataset
Feature Description
With the exception of the target variable (target), which is categorical, all remaining features are numeric. The table below (Table II) gives a short description of each feature in the dataset.
Table II A Short Description Of The Features In The Dataset
# | Feature | Description |
1 | gravity | Specific gravity of the urine sample |
2 | ph | pH level of the urine sample |
3 | osmo | Osmolality of the urine sample |
4 | cond | Conductivity of the urine sample |
5 | urea | Urea concentration in the urine sample |
6 | calc | Calcium concentration in the urine sample |
7 | target | A binary target variable indicating the presence (1) or absence (0) of kidney stones |
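For illustration, the dataset can be loaded and inspected with pandas as sketched below; the CSV file name is a placeholder assumption, not necessarily the name of the actual Kaggle file.

```python
# Minimal loading and inspection sketch; the CSV file name is an assumption.
import pandas as pd

df = pd.read_csv("kidney_stone_urine_analysis.csv")  # hypothetical file name

print(df.shape)               # expected: (79, 7)
print(df.dtypes)              # gravity, ph, osmo, cond, urea, calc numeric; target binary
print(df.isnull().sum())      # check for missing values (none reported for this dataset)
print(df.duplicated().sum())  # check for duplicate rows (none reported)
```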
Process Model (Working Process)
The dataset was loaded and pre-processed, followed by exploratory analysis to uncover hidden patterns and insights. It was then divided into training and testing sets in a 4:1 ratio: eighty percent (80%) for training and twenty percent (20%) for testing. The training set was used to train the various models (classifiers), while the testing set was used to validate them. The models were evaluated using several metrics so that the best one could be chosen. The diagram below (Fig. 2) depicts the process flow of the proposed model, and a minimal split sketch follows the figure.
Fig. 2. Flow chart (Process flow) of the proposed model
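A minimal sketch of the 4:1 (80/20) split using scikit-learn is shown below; it continues the loading sketch above, and the stratify argument is an assumption made to preserve class proportions, not a detail stated in the paper.

```python
# Minimal 4:1 (80/20) train/test split sketch, continuing the earlier snippet.
from sklearn.model_selection import train_test_split

X = df.drop(columns=["target"])   # the six urine features
y = df["target"]                  # 1 = kidney stone present, 0 = absent

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)  # stratification assumed
```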
Data Pre-processing
Real-world data is often not in an ideal format for machine learning applications. It may contain noise and missing values. To address these issues and generate accurate predictions, the data must be processed thoroughly. Consequently, the dataset underwent extensive pre-processing, including data cleaning, transformation, normalization, and handling of imbalanced data, among other techniques.
Data cleaning generally involves identifying and addressing noise, fake data, duplicate entries, and missing values. To ensure accurate and useful results, it is essential to remove noise and fill in missing values. Fortunately, this dataset had no missing values, duplicate entries, or fake data.
Transformation involves converting data from one format to another to enhance its comprehensibility. This process includes tasks such as aggregation, data type casting, encoding, and smoothing. All numeric and categorical features were converted to their appropriate data types and formats.
Scaling involves modifying the range of feature values to a standard scale without altering the differences in their ranges. This ensures that each feature contributes equally to the model, thereby enhancing the performance and accuracy of machine learning algorithms. The standardization (Z-score) method was used to scale all the features, as sketched below.
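A minimal standardization sketch using scikit-learn's StandardScaler follows; fitting the scaler on the training set only is a common practice assumed here, not a detail stated in the paper.

```python
# Z-score standardization sketch: fit on the training set, apply to both sets.
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_scaled = scaler.transform(X_test)        # apply the same transform to test data
```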
Dimensionality reduction (data reduction) involves removing unwanted or less relevant features. In this study, no features were removed.
Handling imbalanced data entails adjusting the class distribution to prevent bias during analysis and modelling. The dataset was moderately imbalanced, with 34 observations from patients with kidney stones and 45 from patients without. The dataset therefore needed to be balanced to prevent this bias. Oversampling was applied to the kidney stone class to increase its observations to match the class without kidney stones. The diagrams below depict the dataset before and after balancing, and a code sketch follows Fig. 3.
Fig. 3. Dataset before and after balancing
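One common way to perform such oversampling is with imbalanced-learn's RandomOverSampler, as sketched below; the paper does not name the specific library used, and applying the resampling to the training portion only is an assumption.

```python
# Random oversampling sketch for the minority (kidney stone) class,
# assuming the `imbalanced-learn` package; continues the earlier snippets.
import pandas as pd
from imblearn.over_sampling import RandomOverSampler

ros = RandomOverSampler(random_state=42)
X_train_bal, y_train_bal = ros.fit_resample(X_train_scaled, y_train)

print(y_train.value_counts())                 # imbalanced class counts before
print(pd.Series(y_train_bal).value_counts())  # equal class counts after oversampling
```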
Metrics for Evaluating Machine Learning Models
Confusion Matrix: A Confusion Matrix is a simple yet effective method to evaluate the performance of a classification model. It achieves this by comparing the number of positive and negative instances that were correctly or incorrectly classified (Osei & Adomako, 2024).
Table III Confusion Matrix
| Predicted Positive | Predicted Negative |
Actual Positive | TP | FN |
Actual Negative | FP | TN |
True Positives (TP):
True positives are instances where both the predicted class and the actual class are positive (1).
True Negatives (TN):
True negatives are instances where both the predicted class and the actual class are negative (0).
False Negatives (FN):
False negatives are instances where the predicted class is negative (0), but the actual class is positive (1).
False Positives (FP):
False positives are instances where the predicted class is positive (1), but the actual class is negative (0).
From the confusion matrix, metrics such as accuracy, precision, recall, and F1-score can be calculated using the following formulas.
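The standard definitions of these metrics, numbered (3.1) to (3.4) to match the references later in the text, are:

```latex
\begin{align}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \tag{3.1}\\
\text{Precision} &= \frac{TP}{TP + FP} \tag{3.2}\\
\text{Recall} &= \frac{TP}{TP + FN} \tag{3.3}\\
\text{F1-score} &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3.4}
\end{align}
```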
Area under Curve: The Area under Curve (AUC) is a valuable metric with values ranging from 0 to 1. The closer the AUC is to 1, the better the machine learning model is at distinguishing between kidney stone cases and non-kidney stone cases. A model that completely differentiates between the two classes has an AUC of 1. Conversely, if all non-kidney stone instances are incorrectly classified as kidney stones and vice versa, the AUC is 0 (Osei & Adomako, 2024).
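For illustration, all of these metrics can be computed with scikit-learn once the classifiers have been fitted; the sketch below continues the earlier snippets, treats weighted averaging as an assumption about how the scores in Table IV were aggregated, and is not the authors' exact code.

```python
# Evaluation sketch: accuracy, precision, recall, F1, AUC, and the confusion
# matrix for each classifier (continues the earlier sketches; averaging assumed).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

for name, clf in classifiers.items():
    clf.fit(X_train_bal, y_train_bal)
    y_pred = clf.predict(X_test_scaled)
    y_prob = clf.predict_proba(X_test_scaled)[:, 1]  # probability of kidney stone

    print(name)
    print("  Accuracy :", accuracy_score(y_test, y_pred))
    print("  Precision:", precision_score(y_test, y_pred, average="weighted"))
    print("  Recall   :", recall_score(y_test, y_pred, average="weighted"))
    print("  F1-score :", f1_score(y_test, y_pred, average="weighted"))
    print("  AUC      :", roc_auc_score(y_test, y_prob))
    print("  Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```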
Deployment of the Proposed Model
With the help of the Flask framework, HTML, and CSS, the model was deployed as a web-based application that can easily be integrated into existing healthcare systems. The figures below show the respective interfaces, and a minimal deployment sketch follows Fig. 6.
Fig. 4. Homepage of the web application
Fig. 5. Prediction phase
Fig. 6. Results or output phase
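A minimal sketch of how such a Flask deployment might look is given below; the serialized model and scaler file names, routes, and template names are hypothetical and are not taken from the authors' application.

```python
# Minimal Flask deployment sketch (illustrative; file and template names are assumptions).
from flask import Flask, render_template, request
import joblib

app = Flask(__name__)
model = joblib.load("rf_model.joblib")   # hypothetical serialized Random Forest
scaler = joblib.load("scaler.joblib")    # hypothetical serialized StandardScaler

FEATURES = ["gravity", "ph", "osmo", "cond", "urea", "calc"]

@app.route("/")
def home():
    return render_template("index.html")  # hypothetical homepage template

@app.route("/predict", methods=["POST"])
def predict():
    # Read the six urine parameters from the submitted form, in a fixed order.
    values = [[float(request.form[f]) for f in FEATURES]]
    prediction = model.predict(scaler.transform(values))[0]
    result = "Kidney stone detected" if prediction == 1 else "No kidney stone detected"
    return render_template("result.html", result=result)  # hypothetical results template

if __name__ == "__main__":
    app.run(debug=True)
```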
Exploratory Data Analysis (EDA)
Fig. 7. Boxplots for continuous variables in the dataset
From the above box plots, the following insights were discovered:
Fig. 8. Correlation coefficients between the continuous variables
Fig. 8. Pair plots for continuous variables
Confirmatory Data Analysis (CDA)
A parametric statistical test (logistic regression) was applied to the variables against the target to test for statistically significant associations. The following deductions were made:
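For reference, one way to run such a test is with statsmodels' Logit, which reports a coefficient and p-value for each urine parameter; the sketch below continues the earlier data-loading snippet and is an assumption about the procedure, not the authors' exact code.

```python
# Logistic regression significance test sketch, assuming the `statsmodels` package
# and the DataFrame `df` loaded earlier; p-values indicate association, not causation.
import statsmodels.api as sm

X_cda = sm.add_constant(df.drop(columns=["target"]))  # add intercept term
logit_model = sm.Logit(df["target"], X_cda).fit()
print(logit_model.summary())  # coefficients, standard errors, and p-values per feature
```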
Confusion Matrix
The figure below (Fig. 9) depicts the confusion matrices for the various classifiers.
Fig. 9. Confusion Matrices for the classifiers
Performance Comparison of Various Classifiers
The table below depicts a comparison of the different metrics used to evaluate the classifiers.
Table IV Comparison Of Classifiers Using Various Evaluation Metrics
# | Classifier | ACC | PRE | REC | F1 | AUC |
1 | Logistic Regression (LR) | 78% | 81% | 78% | 78% | 88% |
2 | ANN | 83% | 83% | 83% | 83% | 91% |
3 | SVM | 83% | 84% | 83% | 84% | 83% |
4 | KNN | 78% | 77% | 78% | 76% | 76% |
5 | Decision Tree (DT) | 83% | 83% | 83% | 83% | 83% |
6 | Random Forest (RF) | 94% | 95% | 94% | 94% | 94% |
7 | Gaussian Naïve Bayes (GNB) | 72% | 74% | 72% | 73% | 86% |
8 | Gradient Boosting (GB) | 89% | 89% | 89% | 89% | 96% |
ACC = Accuracy, PRE = Precision, REC = Recall, F1 = F1-Score, AUC = Area under Curve
Fig. 10. Accuracies for the classifiers
Results and Performance Analysis
The confusion matrices in Fig. 9 highlighted the rates of false positives (FP) and false negatives (FN), which are crucial considerations for any model. A false positive may lead to unnecessary treatment, while a false negative, especially in cases of undetected kidney stones, could result in a severe misdiagnosis. The Random Forest classifier showed a low incidence of FP and FN, enhancing its reliability. The false positives indicate that some records of patients without kidney stones exhibit characteristics similar to those of patients with kidney stones, while the false negatives suggest that some patients with kidney stones exhibit characteristics similar to those of patients without.
Table IV reports accuracy, precision, recall, and F1 score, as defined in equations (3.1) to (3.4), along with AUC, for the various classifiers. The Random Forest (RF) model achieved a 94% accuracy rate, outperforming the other classifiers. Precision, the ratio of correctly predicted positive observations to all predicted positives, was highest for RF (0.95), indicating a lower false-positive rate. Recall, the proportion of actual positive cases that were correctly predicted, was also highest for RF (0.94).
The F1 score, the harmonic mean of precision and recall, accounts for both false positives and false negatives. Although it is less intuitive than accuracy, the F1 score is often more informative, especially with imbalanced class distributions. RF scored highest on this metric as well. The final metric, Area under Curve (AUC), measures the total area under the ROC curve, which extends from (0, 0) to (1, 1); a score closer to 1 signifies better performance. Here, RF scored 0.94, although Gradient Boosting (GB) recorded a slightly higher 0.96.
Overall, the RF model outperformed all other classifiers in all metrics except AUC, suggesting that RF performed well on the dataset used for this research.
Benchmarking
The table below (Table V) shows the accuracy of some related work as compared to this work.
Table V Results Comparison Of The Related Works
# | Ref | CU | TD | ACC |
1 | This Study | Random Forest | Urine analysis data | 94% |
2 | Wu et al. | Support Vector Machine | Urine analysis data from clinical trials | 89% |
3 | Lin et al. | Support Vector Machine | Urine metabolomics data | 88% |
4 | Chen et al. | Convolutional Neural Network (CNN) | Urine microscopy images | 92% |
5 | Esteva et al. | Gradient Boosting | Urine analysis data integrated with clinical metadata | 92% |
6 | Rodgers and Webber | K-Nearest Neighbors (KNN) | Urine chemistry profiles | 84% |
7 | Black et al. | ResNet-101 | Images | 86% |
Ref = Reference, CU = Classifier used, TD = Type of data, ACC = Accuracy
The table above offers a comparative analysis of the accuracy of various machine learning models utilized for detecting kidney stones through urine test analysis across different types of datasets. The Random Forest model from this study shows the highest accuracy at 94%, followed by the Gradient Boosting and CNN models, both with accuracies of 92%. The range of dataset types and accuracy rates underscores the flexibility of machine learning approaches to different forms of urine analysis data and indicates potential for further enhancement in prediction accuracy.
Practical Implications
The machine learning approach, especially the Random Forest model, presents several advantages over traditional diagnostic methods:
Summary
The research demonstrated that machine learning, particularly the Random Forest classifier, can effectively detect kidney stones using urine test analysis. The high accuracy and reliability of the Random Forest model highlight its potential as a valuable diagnostic tool, offering a non-invasive, cost-effective, and accessible means of detecting kidney stones. By identifying key urine parameters, such as osmolality, urea, specific gravity, and calcium concentration, the study aligns with clinical knowledge and emphasizes the relevance of these features in kidney stone formation. This machine learning approach can significantly enhance early detection and patient outcomes, providing a promising alternative to traditional diagnostic methods.
Despite the promising results, certain limitations need to be addressed, including the necessity for larger and more diverse datasets and improved model interpretability. Future research should focus on expanding the dataset, developing hybrid models that combine machine learning with traditional diagnostic methods, and conducting clinical trials to validate model performance in real-world settings.
In conclusion, integrating machine learning models into clinical practice represents a significant advancement in leveraging data-driven approaches to enhance healthcare outcomes. By addressing current limitations and focusing on practical implementation, this method could substantially improve kidney stone management and patient care.
Challenges and problems encountered
The following challenges and problems were encountered during the research work:
Recommendations
Limitations
Conclusion
This study demonstrates the potential of using machine learning models to detect kidney stones through urine test analysis. However, addressing the outlined limitations and following the recommendations is crucial for advancing this research and improving patient care. Future work should focus on expanding the dataset, enhancing model interpretability, and conducting clinical validations to ensure the practical applicability of these models in healthcare settings.