Abstract:
Bank distress in the global banking industry has been a major concern for stakeholders,
investors, and the business world at large. Predictive analysis of a bank's financial situation and
customer relationships helps to anticipate and mitigate the conditions that lead to bank
collapse. This study aims to design a bank distress
prediction model. To meet the research objectives, secondary sources of data
will be used. Forecasting bank distress requires an efficient Bank Distress Prediction (BDP)
model, and a wide range of Machine Learning (ML) models has
been developed for this purpose. However, existing BDP models perform poorly
because of redundant and irrelevant features and the class imbalance
problem. Class imbalance arises when the data come from two groups and the minority group
contains considerably fewer samples than the majority group. The imbalanced nature of
distress data makes it harder for classification algorithms to learn an accurate model.
Training on imbalanced data leads to poor predictions for the minority class, which is
considered more important than the majority class. These challenges degrade the
performance of the distress prediction model, which depends on the predictor’s ability to cope with
such data problems. In this study, we propose a bank distress prediction model that addresses the class
imbalance problem using feature selection (to select the significant features), the Synthetic
Minority Over-sampling Technique (SMOTE) (to produce balanced data), and Random Forest
(RF) for classification. We further implement four classifiers: Logistic
Regression (LR), K-Nearest Neighbors (KNN), Decision Tree (DT), and Support Vector Machine
(SVM). Random Forest is applied to the resampled dataset. To evaluate
the performance of the proposed model, we ran experiments on imbalanced data from the Polish
Bankruptcy dataset in the UCI Machine Learning Repository. The proposed model is
expected to help stakeholders anticipate the future status of businesses and make decisions
accordingly. The experimental results show that the proposed model attains
83% prediction accuracy, compared with 78% for the Decision Tree, on the Polish
Bankruptcy data. We conclude that the proposed model improves BDP performance
and offers a practical way of dealing with the imbalanced dataset problem.
Keywords: Decision Tree, Support Vector Machine, Bank Distress Prediction, Synthetic Minority
Over-sampling Technique, Logistic Regression, K-Nearest Neighbors, Random Forest
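
The oversampling step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the toy data below are assumptions made for illustration only; they are not the implementation used in this study, which would more likely rely on an existing library such as imbalanced-learn.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base row (excluding the row itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (made up for illustration): 2 features, 8 majority rows vs 3 minority rows.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Generate just enough synthetic rows to match the majority class size.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))  # prints: 8 8
```

Because every synthetic row is a convex combination of two real minority rows, it stays inside the region already occupied by the minority class. In a full pipeline of the kind described above, the resampled rows would then be passed to the classifier.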