AUTOMATIC SOURCE CODE VULNERABILITY DETECTION, CLASSIFICATION AND PRIORITIZATION USING DEEP LEARNING ALGORITHM

MELESE, AWOKE

URI: http://ir.bdu.edu.et/handle/123456789/14445

Date: 2022-08

Abstract:

Software vulnerabilities are receiving growing attention worldwide because of their impact on software quality attributes such as availability, reliability, and security. A serious problem in this area is the cyber-attack, in which intruders exploit vulnerabilities to gain access to a software system and compromise its integrity. The existing literature has studied automated source code vulnerability detection; however, most of it focuses on binary classification, which decides only whether source code is vulnerable or non-vulnerable, and does not address multi-class classification or prioritization of the vulnerabilities. The objective of this study is to perform multi-class classification and prioritization of source code vulnerabilities. To train the models, we collected a dataset from an online repository, with a total of 6,130 samples across all classes: Sensitive Information Exposure (SIE), Structured Query Language (SQL) injection, Uniform Resource Locator (URL) redirect, Cross-Site Scripting (XSS), missing authorization, and safe. We used the Cyclomatic Complexity (CC) metric, the Lines of Code (LOC) metric, and the severity level of each vulnerability to prioritize vulnerabilities. The main focus is the detection, classification, and prioritization of vulnerabilities in source code written in the PHP (Hypertext Preprocessor) programming language. To this end, we constructed Long Short-Term Memory (LSTM), Bayesian Neural Network (BNN), and Autoencoder (AE) deep learning models. The BNN model achieved an accuracy of 84%, the LSTM model 94%, and the AE model 77%. The LSTM therefore outperformed the BNN and AE models, which we attribute to its suitability for input sequences with long-range dependencies. Using this model, the study prioritized source code vulnerabilities based on severity and complexity. Moreover, we compared the performance of our multi-class classification with the binary classification reported in a recent previous study. The comparison shows that binary and multi-class classification achieved accuracies of 94% and 95%, respectively. We can therefore deduce that multi-class classification of source code vulnerabilities does not reduce classification performance.

Keywords: software complexity, source code vulnerabilities, severity of source code vulnerability, deep learning
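
The abstract names three prioritization inputs (severity level, Cyclomatic Complexity, and Lines of Code) but does not state how they are combined. The following is only a minimal sketch of one plausible ordering, not the study's actual method. It assumes a four-level severity scale and that higher severity, then higher CC, then higher LOC means higher priority; the `Finding` type, the `SEVERITY_RANK` table, and the sample file names and values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the abstract does not specify the levels used.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class Finding:
    file: str                   # PHP file in which the vulnerability was detected
    vuln_class: str             # predicted class, e.g. "SQL injection" or "XSS"
    severity: str               # key into SEVERITY_RANK
    cyclomatic_complexity: int  # CC of the affected code
    lines_of_code: int          # LOC of the affected code


def prioritize(findings):
    """Order findings by severity, then CC, then LOC, all descending.

    Assumption: more severe and more complex code is handled first.
    """
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_RANK[f.severity],
            f.cyclomatic_complexity,
            f.lines_of_code,
        ),
        reverse=True,
    )


# Made-up example data, for illustration only.
findings = [
    Finding("profile.php", "XSS", "high", 7, 90),
    Finding("redirect.php", "URL redirect", "medium", 3, 25),
    Finding("login.php", "SQL injection", "critical", 12, 140),
    Finding("search.php", "XSS", "high", 15, 60),
]

ranked = prioritize(findings)
for rank, f in enumerate(ranked, start=1):
    print(rank, f.file, f.vuln_class, f.severity)
```

A tuple sort key keeps the ordering rule explicit and easy to change; a weighted score over the three inputs would be an equally reasonable alternative if the metrics need to trade off against each other.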
