Abstract:
Software development is a highly structured process for creating and maintaining systems that
range from simple applications to complex enterprise software. Even when a well-defined process
is followed, unforeseen events can occur at any stage of the software development life cycle
(SDLC) and lead to losses or project failure.
Software projects inherently involve risk, and identifying and predicting risks accurately
remains a challenge. In particular, most risks arise in the requirements and design phases,
from which they propagate to later phases and increase economic losses. To address this challenge, this
study develops a software risk prediction model using homogeneous ensemble machine learning
algorithms. These algorithms were selected for their proven effectiveness in handling complex
datasets and their ability to achieve high prediction accuracy. We followed an experimental
research methodology: we collected datasets related to requirements and design from publicly
available sources such as Zenodo and the Harvard education dataset, totaling around 400
instances. These datasets were then used to train and validate the machine
learning algorithms. We built risk prediction models with four homogeneous ensemble
algorithms, each using decision trees as base learners: Gradient Boost, Random Forest,
AdaBoost, and Bagging, which scored 98.67%, 97.3%, 96.0%, and 96.0%, respectively. Gradient
Boost was selected to construct the final risk prediction model because of its superior
performance and ability to handle complex data. By employing this
model, software development organizations can improve their ability to identify and mitigate
risks, thereby improving the quality and reliability of their software products.
Keywords: ensemble machine learning algorithms, requirements phase, design phase, software
risk prediction.
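
The idea of a homogeneous ensemble named above can be sketched with bagging, one of the four
algorithms listed. This is an illustrative sketch only, not the study's implementation: it
assumes decision stumps (one-split trees) as the shared base learner, uses synthetic data in
place of the requirements and design datasets, and all function names (fit_stump, fit_bagging,
and so on) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Return the one-split rule (feature, threshold, labels) with fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            # left is never empty because t is one of the observed values
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(v != left_label for v in left) + sum(v != right_label for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_bagging(X, y, n_estimators=15, seed=0):
    """Train the same base learner on n_estimators bootstrap resamples of the data."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict_bagging(ensemble, row):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]


def accuracy(ensemble, X, y):
    return sum(predict_bagging(ensemble, r) == v for r, v in zip(X, y)) / len(y)


# Synthetic stand-in data (assumption): 3 numeric features, label 1 = "risky".
rng = random.Random(42)
X = [[rng.random() for _ in range(3)] for _ in range(120)]
y = [1 if 0.6 * row[0] + 0.4 * row[1] > 0.5 else 0 for row in X]

# Simple hold-out split: first 90 rows for training, last 30 for validation.
X_train, y_train = X[:90], y[:90]
X_test, y_test = X[90:], y[90:]

ensemble = fit_bagging(X_train, y_train, n_estimators=15, seed=0)
acc = accuracy(ensemble, X_test, y_test)
print(f"hold-out accuracy: {acc:.2f}")
```

The ensemble is homogeneous because every member is the same kind of learner trained on a
different resample. Boosting methods such as AdaBoost and Gradient Boost instead train their
trees sequentially, each one focusing on the errors of the previous ones.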