Abstract:
The software development life cycle (SDLC) is a process for developing high-quality
software products that satisfy client needs. Detecting and classifying software
requirements is one of the most challenging and tedious tasks in the SDLC because it is
usually done manually, and many software projects fail because of issues related to
requirements. In this paper, we propose a supervised machine learning (SML)
approach to automate the detection and classification process of software requirements
from Amharic textual documents such as meeting minutes, interview notes,
requirements specifications, user guides, reports, and memos. Automating the software
requirements detection and classification process can reduce development cost, time,
personnel, and the risk of software project failure. The dataset was prepared from five
different software development companies in Addis Ababa, Ethiopia. We split the
dataset into an 80% training set and a 20% test set, and both sets passed through three text
preprocessing steps (tokenization, normalization, and stemming), followed by feature
extraction using TF-IDF and word2vec. We then tuned the five most common
ML classifier algorithms (SVM, NB, KNN, LR, and DT) and trained them. To
compare the algorithms, we used 10-fold cross-validation and repeated the
experiment for each model 10 times. Finally, we compared the performance of the
algorithms based on the 10-fold CV average accuracy, precision, recall, and
F1-score. The SVM model performed best for classification, with 95%
accuracy, 87% precision, 88% recall, and an 89% F1-score.
Keywords— Functional requirements, Non-functional requirements, Natural
Language Processing, Machine Learning, Amharic requirement specification.
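
The evaluation loop summarized in the abstract (TF-IDF features, a classifier, and 10-fold cross-validation averaged over folds) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes a toy English dataset in place of the Amharic documents, whitespace tokenization in place of the tokenization, normalization, and stemming steps, a hand-written KNN (one of the five classifiers named above) with no tuning, and accuracy as the only metric. word2vec features are omitted. All function names and sentences are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Stand-in for the tokenization, normalization, and stemming steps.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency, learned from training documents only.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    # Sparse TF-IDF vector; terms unseen during training are dropped.
    tf = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train_vecs, train_labels, vec, k=3):
    # Majority vote among the k most similar training vectors.
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(pair[0], vec), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(docs, labels, folds=10, k=3):
    # Fold i holds out every item whose index is congruent to i modulo folds.
    scores = []
    for i in range(folds):
        test_idx = [j for j in range(len(docs)) if j % folds == i]
        train_idx = [j for j in range(len(docs)) if j % folds != i]
        idf = fit_idf([docs[j] for j in train_idx])
        train_vecs = [tfidf(docs[j], idf) for j in train_idx]
        train_labels = [labels[j] for j in train_idx]
        hits = sum(
            knn_predict(train_vecs, train_labels, tfidf(docs[j], idf), k) == labels[j]
            for j in test_idx
        )
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)

# Hypothetical toy data: functional (FR) vs. non-functional (NFR) requirements.
actions = ["register accounts", "reset passwords", "upload reports",
           "search records", "print invoices"]
qualities = ["respond quickly", "stay available", "remain secure",
             "scale smoothly", "recover automatically"]
docs = ["the system shall let users " + a for a in actions] + \
       ["the system must " + q for q in qualities]
labels = ["FR"] * len(actions) + ["NFR"] * len(qualities)

score = cross_val_accuracy(docs, labels, folds=10, k=3)
print(f"mean 10-fold accuracy: {score:.2f}")
```

The IDF weights are refit inside each fold so that held-out sentences never influence the features used to classify them. With only ten toy sentences, 10-fold cross-validation reduces to leave-one-out, so the printed score says nothing about performance on real requirements documents.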