AUTOMATIC SOURCE CODE VULNERABILITY DETECTION, CLASSIFICATION AND PRIORITIZATION USING DEEP LEARNING ALGORITHM

MELESE, AWOKE

dc.contributor.author	MELESE, AWOKE
dc.date.accessioned	2022-11-17T11:48:19Z
dc.date.available	2022-11-17T11:48:19Z
dc.date.issued	2022-08
dc.identifier.uri	http://ir.bdu.edu.et/handle/123456789/14445
dc.description.abstract	Software vulnerabilities are receiving growing attention worldwide because of their impact on software quality attributes such as availability, reliability, and security. A serious problem in this area is the cyber-attack, in which intruders gain access to a software system and compromise its integrity. The existing literature has studied automated source code vulnerability detection; however, most work focuses on binary classification, which only decides whether source code is vulnerable or non-vulnerable, and does not address multi-class classification or prioritization of the vulnerabilities. The objective of this study is to perform multi-class classification and prioritization of source code vulnerabilities. To train the models, we collected a dataset from an online repository containing a total of 6,130 samples across six classes: Sensitive Information Exposure (SIE), Structured Query Language (SQL) injection, Uniform Resource Locator (URL) redirect, Cross-Site Scripting (XSS), missing authorization, and safe. We used the Cyclomatic Complexity (CC) metric, the Lines of Code (LOC) metric, and the severity level of each vulnerability to prioritize vulnerabilities. The main focus is therefore on the detection, classification, and prioritization of vulnerabilities in source code written in the PHP (Hypertext Preprocessor) programming language. To do this, we constructed Long Short-Term Memory (LSTM), Bayesian Neural Network (BNN), and Autoencoder (AE) deep learning models. The LSTM model achieved an accuracy of 94%, the BNN model 84%, and the AE model 77%. LSTM therefore outperforms the BNN and AE models, which we attribute to its ability to capture long-range dependencies in input sequences.
Using this model, the study prioritized source code vulnerabilities based on their severity and complexity. Moreover, we compared the performance of our multi-class classification with the binary classification reported in a recent prior study. The comparison shows that binary and multi-class classification achieved accuracies of 94% and 95%, respectively. We can therefore conclude that multi-class classification of source code vulnerabilities does not reduce classification performance. Keywords: software complexity, source code vulnerabilities, severity of source code vulnerability, deep learning	en_US
dc.language.iso	en_US	en_US
dc.subject	FACULTY OF COMPUTING	en_US
dc.title	AUTOMATIC SOURCE CODE VULNERABILITY DETECTION, CLASSIFICATION AND PRIORITIZATION USING DEEP LEARNING ALGORITHM	en_US
dc.type	Thesis	en_US