EXPLAINABLE MACHINE LEARNING MODEL FOR PREDICTING THE SEVERITY LEVEL OF CHRONIC KIDNEY DISEASE

TOYBA, MOHAMMED

URI: http://ir.bdu.edu.et/handle/123456789/16449

Date: 2024-04

Abstract:

Chronic kidney disease (CKD) has far-reaching impacts on individuals and healthcare systems. Without timely and accurate prediction of CKD severity, patients may face delayed interventions, suboptimal treatment plans, and increased risk of adverse health outcomes. Clinicians struggle to identify high-risk patients who require more aggressive interventions such as dialysis or kidney transplantation. This results in suboptimal utilization of healthcare resources and compromised patient outcomes. Although prior research has addressed CKD severity, typical machine learning models lack explainability, which makes it difficult to understand the factors driving a severity prediction. In clinical practice, this lack of explainability makes such models less trustworthy, less accepted, and harder to use. The objective of this research is to develop an explainable machine learning model for accurately predicting the severity of CKD. In this research, we used the GBM, XGBoost, and LightGBM algorithms for prediction and ELI5, LIME, and SHAP for model explainability. To address class imbalance in the dataset, we applied the Synthetic Minority Over-sampling Technique (SMOTE) prior to model training. We then compared the performance of the models before and after applying SMOTE. Additionally, we explored different train-validation-test split ratios: 60:20:20, 70:15:15, and 80:10:10. According to the experimental findings, XGBoost outperforms GBM and LightGBM, achieving a prediction accuracy of 97% with Bayesian hyperparameter optimization (HPO) after SMOTE. The 70:15:15 ratio performed best, providing a good balance between model training, validation, and testing. User studies indicate that LIME provides preferable explanations in terms of user understandability, trustworthiness, and satisfaction.
Further analysis of feature importance revealed that the most influential factors in the XGBoost model were sex, creatinine, age, heart disease, hypertension, diabetes, and blood pressure. Lastly, we integrated the explainable machine learning models into a mobile application so that medical professionals worldwide can use the proposed models.

Keywords: Chronic Kidney Disease, Explainable Machine Learning, Black-box model, Severity level
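
The train-validation-test ratios named in the abstract (60:20:20, 70:15:15, 80:10:10) can be illustrated with a small sketch. This is not the thesis code: the function name `three_way_split`, the per-class (stratified) shuffling, and the toy label list are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def three_way_split(labels, ratios=(0.70, 0.15, 0.15), seed=0):
    """Hypothetical helper: return (train, val, test) index lists.

    Indices are shuffled within each class so that every partition keeps
    roughly the same class proportions as the full label list.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(n * ratios[0])
        n_val = round(n * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy, made-up severity labels: 80 samples of class 0 and 20 of class 1.
toy_labels = [0] * 80 + [1] * 20
for ratios in [(0.60, 0.20, 0.20), (0.70, 0.15, 0.15), (0.80, 0.10, 0.10)]:
    train, val, test = three_way_split(toy_labels, ratios)
    print(ratios, len(train), len(val), len(test))
```

Splitting within each class is one possible design choice for an imbalanced dataset; the abstract does not state whether the splits were stratified.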
