Software Defect Prediction Using Source Code Metric and Semantic Features: A Deep Learning Approach

Esrael, Geremew

dc.contributor.author	Esrael, Geremew
dc.date.accessioned	2021-08-13T06:25:36Z
dc.date.available	2021-08-13T06:25:36Z
dc.date.issued	2021-02-02
dc.identifier.uri	http://ir.bdu.edu.et/handle/123456789/12385
dc.description.abstract	Software Defect Prediction (SDP) aims to identify defective modules so that testing resources can be allocated cost-effectively, which makes it an economically important activity in software quality assurance. Many contributions in this area use Deep Learning (DL) models to learn feature representations from source code metrics and/or contextual data; however, their predictions still suffer from high false-positive rates and unactionable alerts. Specifically, obtaining Combined Defect Data (CDD), the lack of proper feature-combination methods, and learning effective feature representations from CDD are among the major challenges in supervised defect prediction. Combined features have the potential to improve the performance of prediction models based on individual feature sets and thereby reduce false-positive rates. Thus, the goal of this research was to improve the performance of individual-feature-based defect prediction models by proposing and testing two novel frameworks: CDDM (Combined Defect Data Modelling), which builds CDD from source code metrics and context representation, and LHFR (Learning Hybrid Feature Representation), which learns hybrid feature representations from CDD for SDP. Specifically, we use a new hybrid deep neural network that consists of a Multi-Layered Perceptron (MLP), which learns a more discriminative representation of the hand-crafted features, and a Bi-directional Long Short-Term Memory (Bi-LSTM) network, which learns semantic features. To evaluate the effectiveness of the proposed frameworks, we conduct extensive experiments on a benchmark of 12 software defect datasets (each with four types of features), using five traditional performance indicators. Compared with a random forest model built solely on the individual feature sets, our LHFR approach improves the average accuracy, precision, recall, F1-score, and AUC by 1.6%, 0.71%, 1.6%, 2.27%, and 0.01%, respectively.
This work contributes a file-level combined software defect dataset based on 12 open-source Java systems. It also enhances an existing framework for generating hand-crafted datasets so that it includes additional predictive features based on software context. However, the main contribution is a feature-combination methodology that may be used to discover effective feature combinations and representations that increase defect prediction performance.	en_US
dc.language.iso	en_US	en_US
dc.subject	computer science	en_US
dc.title	Software Defect Prediction Using Source Code Metric and Semantic Features: A Deep Learning Approach	en_US
dc.type	Thesis	en_US