BDU IR

PREDICTING DIABETES BASED ON RISK FACTORS AND ASSOCIATED DISEASES USING ENSEMBLE MACHINE LEARNING


dc.contributor.author HAILEMARIAM, MULUALEM SIMENEH
dc.date.accessioned 2025-02-14T11:36:15Z
dc.date.available 2025-02-14T11:36:15Z
dc.date.issued 2024-06
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/16442
dc.description.abstract The aim of this research is to develop a predictive model for diabetes based on risk factors and associated diseases using ensemble machine learning. The research addresses the need to improve public health and to support correct action, emphasizing timely detection and prediction of diabetes to prevent complications. The study follows an experimental research design. The data come from the CDC and were collected through the Behavioral Risk Factor Surveillance System (BRFSS). The dataset contains 253,680 instances and is class-imbalanced. After data preprocessing and class balancing by random under-sampling of the majority class, 70,692 instances were used for modeling. The attributes were reduced from the original 21 features to 18 using a wrapper feature selection method (recursive feature elimination). To construct the best model, six experiments were conducted with the dataset split into training, validation and test sets in the ratio 80%, 10% and 10% respectively, using the Random Forest, CatBoost, bagging decision tree, AdaBoost, XGBoost and Extra Trees algorithms. The performance of the models was evaluated using precision, recall, accuracy, F1 score, AUC and the confusion matrix. The overall accuracies of Random Forest, CatBoost, bagging decision tree, AdaBoost, XGBoost and Extra Trees are 90.16%, 88.94%, 88.97%, 87.87%, 88.81% and 89.86% respectively. Random Forest is the best predictive model among them, with an accuracy of 90.16% and an ROC score of 96%. Model explainability, that is, understanding and interpreting how the machine learning model makes its predictions, is provided using local interpretable model-agnostic explanations (LIME). Key words: diabetes, risk factors, associated diseases, LIME, ensemble machine learning, predictive model. en_US
dc.language.iso en_US en_US
dc.subject Computer Science en_US
dc.title PREDICTING DIABETES BASED ON RISK FACTORS AND ASSOCIATED DISEASES USING ENSEMBLE MACHINE LEARNING en_US
dc.type Thesis en_US
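
The abstract's class-balancing and data-splitting steps can be illustrated with a minimal sketch. This is not the thesis code: the function names, the seed, the non-stratified shuffle and the toy data are assumptions made for illustration only, and the toy rows stand in for the BRFSS records.

```python
import random


def undersample_majority(rows, label_of, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sample without replacement; the minority class is kept whole.
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


def split_80_10_10(rows, seed=0):
    """Shuffle, then cut into train/validation/test at 80%/10%/10%."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Synthetic, imbalanced toy data as (feature, label) pairs; NOT the BRFSS dataset.
toy = [(i, 0) for i in range(900)] + [(i, 1) for i in range(100)]
balanced = undersample_majority(toy, label_of=lambda r: r[1])
train, val, test = split_80_10_10(balanced)
```

Under-sampling keeps every minority-class row and discards majority-class rows at random, which is why the balanced set is much smaller than the original one.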

