PREDICTING DIABETES BASED ON RISK FACTORS AND ASSOCIATED DISEASES USING ENSEMBLE MACHINE LEARNING

HAILEMARIAM, MULUALEM SIMENEH

URI: http://ir.bdu.edu.et/handle/123456789/16442

Date: 2024-06

Abstract:

The aim of this research is to develop a predictive model for diabetes based on risk factors and associated diseases using ensemble machine learning. The problem addressed is the need for timely detection and prediction of diabetes, so that appropriate action can be taken to prevent complications and improve public health. The study followed an experimental research design. The data were obtained from the CDC and were collected through the Behavioral Risk Factor Surveillance System (BRFSS). The original dataset contained 253,680 instances and was imbalanced. After data preprocessing and class balancing by random undersampling of the majority class, 70,692 instances were used for modeling. The number of attributes was reduced from the original 21 to 18 using a wrapper feature selection method (recursive feature elimination). To construct the best model, six experiments were conducted using the Random Forest, CatBoost, bagging decision tree, AdaBoost, XGBoost, and Extra Trees algorithms, with the dataset split into training, validation, and test sets in a ratio of 80%, 10%, and 10%, respectively. Model performance was evaluated using precision, recall, accuracy, F1 score, AUC, and the confusion matrix. The overall accuracies of Random Forest, CatBoost, bagging decision tree, AdaBoost, XGBoost, and Extra Trees are 90.16%, 88.94%, 88.97%, 87.87%, 88.81%, and 89.86%, respectively. Random Forest was the best predictive model, with an accuracy of 90.16% and an ROC AUC of 96%. Model explainability was addressed using Local Interpretable Model-agnostic Explanations (LIME) to understand and interpret how the model makes its predictions.

Key words: diabetes, risk factors, associated diseases, LIME, ensemble machine learning, predictive model.
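The two data preparation steps named in the abstract, random undersampling of the majority class and an 80%/10%/10% split, can be sketched as follows. This is a minimal illustration on toy data using only the Python standard library, not the author's implementation: the function names, the toy rows, the fixed seed, and the choice of a plain (non-stratified) shuffle are all assumptions made for the example.

```python
import random
from collections import defaultdict


def undersample_majority(rows, label_of, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # sample without replacement, so no row is duplicated
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


def split_80_10_10(rows, seed=0):
    """Shuffle and split into train/validation/test parts
    (hypothetical helper; the split is not stratified)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy data (an assumption, not the BRFSS data): 900 negatives, 100 positives.
toy = [(i, 0) for i in range(900)] + [(i, 1) for i in range(100)]
balanced = undersample_majority(toy, label_of=lambda row: row[1])
train, val, test = split_80_10_10(balanced)
print(len(balanced), len(train), len(val), len(test))
```

Balancing before splitting, as done here, follows the order in which the abstract lists the steps; whether the original study balanced the validation and test sets as well is not stated beyond that.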
