Abstract:
Software development is the process of building a system through a sequence of phases,
collectively known as the software development life cycle. Artifacts may change in any
phase, and changes accepted by the change approval board proceed to the maintenance
process. Implementing changes is difficult and expensive because modules and artifacts
in earlier versions usually differ from those in newer versions. In addition, a change
usually affects other modules and artifacts, so the time required to implement it varies
and is often high. The type of maintenance, the impact of changes, and maintenance time can be
determined by analyzing software repository data such as issue descriptions, the issue's
created and resolved dates, personnel assigned to resolve the issue, and a list of affected
versions. However, selecting the appropriate software repositories and extracting the
relevant information from them is challenging. Previous research has used software
repositories to identify maintenance types, analyze change impact, and estimate
maintenance time. However, these studies each focus on a single maintenance task and on
specific software types, so their generality across maintenance tasks and software types
is questionable. In addition, they used a limited amount of data, drawn only from version
history. In this research, we therefore extracted relevant information from software
repositories using extraction tools such as PyDriller. A linear support
vector classifier, random forest, logistic regression, LSTM, Bi-LSTM, and other machine
learning algorithms were applied to predict maintenance types and change impacts, and an
artificial neural network was used to estimate maintenance time. The experimental results
show that random forest and LSTM performed best, with accuracies of 94% and 95%,
respectively; the other machine learning algorithms also performed dependably. The
artificial neural network achieved a mean squared error of 0.0028. According to
Pearson’s and Spearman’s correlation analyses, maintenance type and maintenance time are
positively correlated, while change impact is negatively correlated with both maintenance
type and maintenance time.
Keywords: Software development, change impact, software repositories, maintenance
tasks, PyDriller
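
The correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes that maintenance type, maintenance time, and change impact have each been encoded as one number per issue; the function names and the toy values below are hypothetical and are not taken from the study. Pearson's coefficient measures linear association, and Spearman's coefficient is Pearson's coefficient computed on ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values for six issues, not data from the study.
maintenance_type = [0, 1, 1, 2, 2, 3]              # assumed numeric encoding
maintenance_days = [1.0, 2.5, 3.0, 6.0, 5.5, 9.0]  # assumed time to resolve
change_impact = [9, 7, 8, 4, 3, 1]                 # assumed impact score

print("type vs. time (Pearson):   ", pearson(maintenance_type, maintenance_days))
print("type vs. impact (Spearman):", spearman(maintenance_type, change_impact))
```

With these toy values the first coefficient is positive and the second is negative, which mirrors only the direction of the relationships reported above, not their magnitude.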