Abstract:
Software development comprises several stages that developers follow, from planning through to
deployment and maintenance. One of the most important steps in the software development life
cycle is software testing. Software testing is a process used to check the correctness, completeness,
and quality of developed software. Regression testing is a type of testing that encompasses
the activities of retesting a software system after changes have occurred. Due to limited resources,
only a subset of test cases is usually executed for each release. As a result, identifying the test cases
that are most likely to detect the majority of errors becomes a challenge. Test case prioritization
therefore offers paramount advantages for generating efficient test cases in regression testing and
other testing approaches. Test case prioritization can be based on different criteria. In this research, we
have employed requirement correlation and fault severity to build a model for test case
prioritization. Previous researchers designed test case prioritization techniques based on
requirements. However, they applied hard computing algorithms, which leads to inflexible and
imprecise results. To perform this study, we used an experimental research approach. For the
experiment, we prepared one thousand (1,000) test cases, which were labeled by experts as
positive or negative. The datasets fed to our proposed model were pre-processed using
Natural Language Processing (NLP) principles. We used Term Frequency-Inverse Document
Frequency (TF-IDF) vectorization after preprocessing to convert the textual data into
vectorized form. The machine learning techniques applied to build the model are Support Vector
Machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes (NB), and Decision Tree (DT). To
identify the best-performing model, we executed twelve different combinations of experiments.
The accuracies obtained across the experiments range from 75% to 94%. SVM based on
requirement correlation and fault severity outperformed the other techniques for test case
prioritization. When only fault severity is considered, KNN is the best technique. The reason behind this
result is that KNN performs well with a small number of features, whereas SVM is the better
technique for a large number of features.
Keywords: Machine Learning, NLP, Software Testing, Test Case, Test Case Prioritization
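
To make the pipeline summarized above more concrete, the following minimal sketch (an illustration added here, not the implementation evaluated in this paper) shows how pre-processed test-case text could be converted to TF-IDF vectors and classified with the four learners named in the abstract. The use of scikit-learn, the toy test-case descriptions, the labels, and all hyperparameters are assumptions made purely for illustration.

# Illustrative sketch only: TF-IDF vectorization of test-case text followed by
# training the four classifiers named in the abstract (SVM, KNN, NB, DT).
# The toy data, labels, and hyperparameters below are assumptions, not the
# study's actual dataset or settings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical pre-processed test-case descriptions and expert labels:
# 1 = positive (high-priority) test case, 0 = negative (low-priority) test case.
docs = [
    "verify login fails with invalid password severity high requirement auth",
    "check footer copyright text severity low requirement ui",
    "validate payment rollback on gateway timeout severity high requirement payment",
    "confirm tooltip text on hover severity low requirement ui",
    "ensure account lockout after repeated failures severity high requirement auth",
    "verify page title capitalization severity low requirement ui",
]
labels = [1, 0, 1, 0, 1, 0]

# Convert the textual test cases into TF-IDF feature vectors.
X = TfidfVectorizer().fit_transform(docs)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.33, random_state=42, stratify=labels
)

# Train and evaluate each of the four classifiers on the toy split.
models = {
    "SVM": SVC(kernel="linear"),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "NB": MultinomialNB(),
    "DT": DecisionTreeClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))

In such a setup, the predicted label (or the classifier's decision score) for each test case would determine its position in the prioritized execution order; the reported 75%-94% accuracy range refers to the experiments described in the paper, not to this toy example.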