BDU IR

Software Defect Prediction Using Source Code Metric and Semantic Features: A Deep Learning Approach


dc.contributor.author Esrael, Geremew
dc.date.accessioned 2021-08-13T06:25:36Z
dc.date.available 2021-08-13T06:25:36Z
dc.date.issued 2021-02-02
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/12385
dc.description.abstract Software Defect Prediction (SDP) aims to identify defective modules so that testing resources can be allocated cost-effectively, an economically important activity in software quality assurance. Many contributions have been made in this area by using Deep Learning (DL) models to learn feature representations from source code metrics and/or contextual data; however, the resulting predictions still suffer from high false-positive rates and unactionable alerts. Specifically, obtaining Combined Defect Data (CDD), the lack of proper feature-combining methods, and learning effective feature representations from CDD are some of the major challenges encountered in the supervised defect prediction domain. Combined features have the potential to improve the performance of individual-feature-based prediction models and thereby reduce false-positive rates. Thus, the goal of this research was to improve the performance of individual-feature-based defect prediction models by proposing and testing two novel frameworks: CDDM (Combined Defect Data Modelling), which builds CDD from source code metrics and context representation, and LHFR (Learning Hybrid Feature Representation), which learns hybrid feature representations from CDD for SDP. Specifically, we use a new hybrid deep neural network that consists of a Multi-Layered Perceptron (MLP) to learn a more discriminative representation of the hand-crafted features and a Bi-directional Long Short-Term Memory (Bi-LSTM) network to learn semantic features. To evaluate the effectiveness of the proposed frameworks, we conduct extensive experiments on a benchmark of 12 software defect datasets (each with four types of features), using five traditional indicators. Compared to the random forest model built solely on the individual feature set, our LHFR approach improves the average accuracy, precision, recall, F1-score, and AUC by 1.6%, 0.71%, 1.6%, 2.27%, and 0.01%, respectively.
This work contributes a file-level combined software defect dataset based on 12 open-source Java systems. It also enhances an existing hand-crafted dataset generation framework to include additional software context-based predictive features. However, the main contribution is a feature combination methodology that may be used to discover effective feature combinations and representations that increase defect prediction performance. en_US
dc.language.iso en_US en_US
dc.subject computer science en_US
dc.title Software Defect Prediction Using Source Code Metric and Semantic Features: A Deep Learning Approach en_US
dc.type Thesis en_US

