SOFTWARE METRICS SELECTION FOR SOFTWARE FAULT PREDICTION USING MACHINE LEARNING TECHNIQUES

BIHONEGN, ABEBE GETAHUN

URI: http://ir.bdu.edu.et/handle/123456789/12636

Date: 2021-07

Abstract:

Software quality can be assessed in the early phases of development by computing the values of metrics used in software fault prediction (SFP) models. Although many software metrics exist, it is not necessary to use all of them for SFP, so metric selection is a critical step in building a software fault prediction model. A wide range of machine learning models has been developed for selecting software metrics for fault prediction, but most researchers do not examine which class of metrics (process or product) is better for binary-class software fault prediction. Because process and product metrics are the core of software measurement, knowing which metrics are the most important indicators of faults is vital for accurate prediction. The contribution of this study is a binary-class fault prediction model built with the Support Vector Machine (SVM) and Naïve Bayes (NB) classification algorithms. Rather than selecting metrics with feature selection methods, we trained and tested the model on each metric dataset and checked its performance on each metric dataset through different experiments. The experimental results indicate that, among product metrics, function in, coupling between methods, afferent coupling and function out are the most significant for the SFP model with the SVM classifier, with accuracies of 98.7%, 98.5%, 98.2% and 97.2% respectively, whereas data access metrics, coupling between objects, inheritance coupling and efferent coupling are the most significant with the NB classifier, with accuracies of 99.1%, 99.0%, 98.5% and 98.4% respectively.
Among process metrics, total lines of code, delta line comment and max code churn are the most significant for the SFP model with the SVM classifier, with accuracies of 98.7%, 98.2% and 97.8% respectively, whereas number of defects in the previous version, max code churn and number of distinct committers are the most significant with the NB classifier, with accuracies of 99.5%, 95.7% and 95.2% respectively. Our model identifies the effect of each process and product metric on the software fault prediction model, and its relationship to that model, and selects the most significant metrics. Therefore, selecting software metrics by using each metric's threshold values as training and testing data is more appropriate than selecting metrics using feature selection methods.
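
The per-metric evaluation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a one-feature Gaussian Naïve Bayes classifier written from scratch, a single 70/30 train/test split, and synthetic data. The metric names `informative_metric` and `noise_metric` and all function names are hypothetical and do not come from the study's datasets.

```python
import math
import random


def fit_gaussian_nb(xs, ys):
    """Fit a one-feature Gaussian Naive Bayes model.

    Returns {label: (prior, mean, variance)} for each class label.
    """
    params = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(vals) / len(vals)
        # Small constant keeps the variance strictly positive.
        var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
        params[label] = (len(vals) / len(xs), mean, var)
    return params


def predict(params, x):
    """Return the label with the highest log-posterior for value x."""
    def log_post(prior, mean, var):
        return (math.log(prior)
                - 0.5 * math.log(2 * math.pi * var)
                - (x - mean) ** 2 / (2 * var))
    return max(params, key=lambda label: log_post(*params[label]))


def accuracy_per_metric(rows, labels, metric_names, train_fraction=0.7, seed=0):
    """Train and test one classifier per metric; return {metric: accuracy}."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(len(idx) * train_fraction)
    train, test = idx[:cut], idx[cut:]
    scores = {}
    for j, name in enumerate(metric_names):
        model = fit_gaussian_nb([rows[i][j] for i in train],
                                [labels[i] for i in train])
        hits = sum(predict(model, rows[i][j]) == labels[i] for i in test)
        scores[name] = hits / len(test)
    return scores


def make_synthetic(n=400, seed=1):
    """Hypothetical data: one metric separates the classes, one is pure noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        faulty = rng.random() < 0.5
        informative = rng.gauss(8.0 if faulty else 2.0, 1.0)
        noise = rng.gauss(5.0, 1.0)
        rows.append([informative, noise])
        labels.append(int(faulty))
    return rows, labels


rows, labels = make_synthetic()
scores = accuracy_per_metric(rows, labels, ["informative_metric", "noise_metric"])
# Metrics ordered from highest to lowest single-metric accuracy.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Ranking metrics by the accuracy each one achieves on its own mirrors the abstract's idea of comparing metrics by model performance rather than by a feature selection method. A real study would likely swap in a library classifier (for example an SVM) and use cross-validation instead of a single split.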
