BDU IR

AUTOMATIC SOURCE CODE VULNERABILITY DETECTION, CLASSIFICATION AND PRIORITIZATION USING DEEP LEARNING ALGORITHM

dc.contributor.author MELESE, AWOKE
dc.date.accessioned 2022-11-17T11:48:19Z
dc.date.available 2022-11-17T11:48:19Z
dc.date.issued 2022-08
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/14445
dc.description.abstract Software vulnerabilities are receiving growing attention worldwide because of their impact on software quality attributes such as availability, reliability, and security. A serious problem in this area is the cyber-attack, in which intruders gain access to a software system and compromise its integrity. The existing literature has studied automated source code vulnerability detection; however, most of it focuses on binary classification, which decides only whether source code is vulnerable or not, and does not address multi-class classification or prioritization of the vulnerabilities. The objective of this study is to perform multi-class classification and prioritization of source code vulnerabilities. To train the models, we collected a dataset from an online repository, totaling 6,130 samples across all classes: Sensitive Information Exposure (SIE), Structured Query Language (SQL) injection, Uniform Resource Locator (URL) redirect, Cross-Site Scripting (XSS), missing authorization, and safe. We used the Cyclomatic Complexity (CC) metric, the Lines of Code (LOC) metric, and the severity level of vulnerabilities to prioritize vulnerabilities. The main focus is therefore the detection, classification, and prioritization of vulnerabilities in source code written in the PHP (Hypertext Preprocessor) programming language. To do this, we constructed Long Short-Term Memory (LSTM), Bayesian Neural Network (BNN), and Autoencoder (AE) deep learning models. The BNN model achieved an accuracy of 84%, the LSTM model 94%, and the AE model 77%. The results show that LSTM outperforms the BNN and AE models, because LSTM performs well when the input sequence has long-range dependencies. Using this model, the study prioritized source code vulnerabilities based on severity and complexity. Moreover, we compared the performance of multi-class classification with a recent researcher's work on binary classification. The comparison shows that binary and multi-class classification achieved accuracies of 94% and 95%, respectively. We can therefore deduce that multi-class classification of source code vulnerabilities does not reduce classification performance. Keywords: software complexity, source code vulnerabilities, severity of source code vulnerability, deep learning en_US
dc.language.iso en_US en_US
dc.subject FACULTY OF COMPUTING en_US
dc.title AUTOMATIC SOURCE CODE VULNERABILITY DETECTION, CLASSIFICATION AND PRIORITIZATION USING DEEP LEARNING ALGORITHM en_US
dc.type Thesis en_US
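
The abstract states that vulnerabilities are prioritized using severity level, Cyclomatic Complexity (CC), and Lines of Code (LOC), but it does not say how the three signals are combined. The following is only a minimal sketch of one possible ordering, assuming severity is compared first, then CC, then LOC. The severity scale, the Finding fields, the file names, and the sample values are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the abstract does not specify one.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Finding:
    file: str                   # hypothetical PHP source file name
    vuln_class: str             # e.g. "SQL injection", "XSS"
    severity: str               # a key of SEVERITY_RANK
    cyclomatic_complexity: int  # CC of the affected code
    lines_of_code: int          # LOC of the affected code


def priority_key(f: Finding) -> tuple:
    # Assumed ordering: severity first, then CC, then LOC.
    return (SEVERITY_RANK[f.severity], f.cyclomatic_complexity, f.lines_of_code)


def prioritize(findings: list) -> list:
    # Highest priority first.
    return sorted(findings, key=priority_key, reverse=True)


# Made-up example findings, for illustration only.
findings = [
    Finding("login.php", "SQL injection", "critical", 12, 140),
    Finding("profile.php", "XSS", "high", 7, 90),
    Finding("redirect.php", "URL redirect", "medium", 3, 25),
    Finding("report.php", "XSS", "high", 15, 300),
]

ranked = prioritize(findings)
for rank, f in enumerate(ranked, start=1):
    print(rank, f.file, f.vuln_class, f.severity)
```

A lexicographic ordering is used here only because it needs no weights. A weighted score over the same three inputs would be an equally plausible reading of the abstract.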

