Abstract:
Software vulnerabilities are receiving increasing attention worldwide because of their
impact on software quality attributes such as availability, reliability, and security, among
others. One of the most serious problems in this area is the cyber-attack, in which intruders
gain access to a software system and compromise its integrity. The existing literature has
studied the automation of source code vulnerability detection; however, most of this work has
focused on binary classification, which decides only whether the source code is vulnerable or
non-vulnerable, and has not addressed multi-class classification and prioritization of those
vulnerabilities. The objective of this study is the multi-class classification
and prioritization of source code vulnerabilities. To train the models, the dataset was collected
from an online repository. We collected a total of 6,130 samples covering the following
classes: Sensitive Information Exposure (SIE), Structured Query Language (SQL) injection,
Uniform Resource Locator (URL) redirect, Cross-Site Scripting (XSS), missing authorization,
and safe. We used the Cyclomatic Complexity (CC) metric, the Lines of Code (LOC) metric, and
the severity level of each vulnerability to prioritize vulnerabilities. The main focus is
therefore on the detection, classification, and prioritization of vulnerabilities in source
code written in the
Hypertext Preprocessor (PHP) programming language. To this end, we built three deep learning
models: Long Short-Term Memory (LSTM), Bayesian Neural Network (BNN), and Autoencoder (AE).
The BNN model achieved an accuracy of 84%, the LSTM model 94%, and the AE model 77%. The LSTM
model therefore outperformed the BNN and AE models, which we attribute to its ability to
capture long-range dependencies in input sequences. Using this model, the study prioritized
source code vulnerabilities based on their severity and on the complexity of the source code.
Moreover, we compared the performance of our multi-class classification with a recent binary
classification study by previous researchers. The comparison shows that binary and
multi-class classification achieved accuracies of 94% and 95%, respectively. We can therefore
deduce that multi-class classification of source code vulnerabilities does not reduce
classification performance.
Keywords: software complexity, source code vulnerabilities, severity of source code
vulnerability, deep learning
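
The abstract states that vulnerabilities are prioritized using the severity level, the CC metric, and the LOC metric, but it does not say how the three are combined. The following is a minimal sketch of one possible scheme, assuming a simple lexicographic ordering (severity first, then CC, then LOC). The severity scale, the Finding record, the prioritize function, and the sample file names are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the study's actual scale is not given in the abstract.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    """One detected vulnerability plus the metrics used to rank it (illustrative)."""
    file: str
    vuln_class: str  # e.g. "SQL injection", "XSS", "URL redirect"
    severity: str    # a key of SEVERITY_RANK
    cc: int          # cyclomatic complexity of the affected code
    loc: int         # lines of code of the affected code


def prioritize(findings):
    """Order findings most urgent first: by severity, then CC, then LOC.

    Assumption: higher severity, higher complexity, and larger size all
    raise the priority. The study may weight these differently.
    """
    return sorted(
        findings,
        key=lambda f: (SEVERITY_RANK[f.severity], f.cc, f.loc),
        reverse=True,
    )


# Made-up sample data for illustration only.
findings = [
    Finding("profile.php", "XSS", "medium", cc=20, loc=300),
    Finding("search.php", "SQL injection", "high", cc=12, loc=90),
    Finding("redirect.php", "URL redirect", "low", cc=3, loc=25),
    Finding("login.php", "SQL injection", "high", cc=12, loc=140),
]

for rank, f in enumerate(prioritize(findings), start=1):
    print(rank, f.file, f.vuln_class, f.severity, f.cc, f.loc)
```

In this sketch, ties on severity are broken by CC and then by LOC, so two equally severe findings are ordered by how complex and how large the affected code is. Samples classified as safe would simply be left out of the input list.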