Developing an Explainable Model for Early Detection of Malaria Outbreaks Using Machine Learning Approach with Hyperparameter Optimization

Girma, Moges Ayenew

URI: http://ir.bdu.edu.et/handle/123456789/16463

Date: 2024-07

Abstract:

Malaria is a global health challenge and remains one of the leading communicable causes of death worldwide. Current malaria monitoring often relies on basic surveillance systems, which do not efficiently capture the factors that influence the occurrence of malaria outbreaks in Ethiopia. Conventional statistical models are slow or limited in scope, which delays responses, and they are not flexible enough to handle complex, non-linear relationships in the data. In addition, many machine learning models operate as black boxes, making their decision-making process difficult to understand. Early detection of epidemics using existing technology is crucial for effective disease control and prevention strategies. This study aims to develop an explainable machine learning model for the early detection of malaria outbreaks. The datasets used to build the model were collected from the Amhara Regional Health Bureau and the Amhara Public Health Institute in Ethiopia. The model predicts whether or not an outbreak has occurred based on the information in the dataset. Multiple models, including Logistic Regression, Decision Tree, K-Nearest Neighbors, Artificial Neural Network, Random Forest, and Extreme Gradient Boosting (XGBoost), were trained to predict the occurrence of malaria outbreaks using the collected dataset. SMOTE (Synthetic Minority Over-sampling Technique) was applied to address class imbalance. Cross-validation was used to limit overfitting by splitting the data into multiple subsets and iteratively training the model on all but one subset while validating it on the held-out subset. Hyperparameters were optimized using Bayesian optimization, Grid Search, and Random Search. Among these techniques, Grid Search yielded the best combinations of hyperparameters. The performance of the prediction models was evaluated with accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC).
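The class-balancing step can be illustrated with a minimal sketch of the interpolation idea behind SMOTE. This is not the study's code: the function name `smote_like_oversample`, the toy feature values, and the choice of k are assumptions made purely for illustration, and a real pipeline would more likely use an existing library implementation.

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature tuples
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical toy data: (rainfall, temperature) for outbreak weeks,
# standing in for the minority class. These values are invented.
outbreak_weeks = [(120.0, 24.0), (135.0, 25.5), (110.0, 23.0), (140.0, 26.0)]
new_points = smote_like_oversample(outbreak_weeks, n_new=6)
```

Each synthetic point lies on a line segment between two real minority samples, so the new points stay inside the region already occupied by the minority class. To avoid leaking synthetic points into validation data, oversampling of this kind is normally applied only to the training folds.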
The experimental results indicate that XGBoost outperformed the other machine learning techniques, achieving an accuracy of 0.98 and an AUC of 0.99 after SMOTE. Its combination of gradient boosting, regularization, feature importance, efficiency, and flexibility makes XGBoost a powerful choice for machine learning tasks. The model explainability techniques LIME and SHAP were employed to make the model more understandable. The study has significant potential to save lives, optimize resources, strengthen healthcare systems, and contribute to global health goals.

Key words: Explainable ML techniques, Hyperparameter Optimization, Malaria Outbreaks, SMOTE
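For readers unfamiliar with the evaluation metrics named in the abstract, the following sketch shows how accuracy, precision, recall, and F1-score follow from the confusion counts of a binary outbreak classifier. The function name `classification_metrics` and the label vectors are hypothetical; they are not taken from the study's data or results.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels
    (1 = outbreak, 0 = no outbreak)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Invented example labels: 3 outbreak weeks among 8 weeks
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced outbreak data, accuracy alone can look high even when outbreaks are missed, which is why precision, recall, and F1-score are reported alongside it.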
