USE CASE IDENTIFICATION FROM THE REQUIREMENT TEXTS USING MACHINE LEARNING APPROACH

TADILO, GETANEH GELAGAY

URI: http://ir.bdu.edu.et/handle/123456789/15764

Date: 2023-06

Abstract:

The requirement gathering and design phases play important roles in the software development lifecycle. One task within requirements gathering is identifying use cases and actors. A use case is a specification of a set of actions performed by a system that yields an observable result of value to one or more actors or other stakeholders of the system. However, identifying use cases is difficult because clients often use ambiguous language when speaking with software analysts, and a literal interpretation can lead to a misunderstanding of the software's functional requirements. Most existing studies on use case identification rely on natural language processing (NLP) and identify use cases and actors through case studies, heuristic rules, and checklist methods, which waste time and resources; others do not cluster the types of relationships between use cases in the requirement text. To address these gaps, we set the objective of identifying use cases and actors, as well as clustering the relationships between use cases in the requirements text, using different machine learning approaches. To perform this study, we used an experimental research approach. We applied machine learning techniques, namely support vector machine (SVM), naive Bayes (NB), logistic regression (LR), and random forest (RF), to build the model. For the experiment, we prepared a dataset of 1884 requirement texts, which were labeled by Boost Software Development PLC experts for identifying actors and use cases. For clustering relationships, we used an unlabeled dataset of 1600 entries, to which the unsupervised clustering algorithms DBSCAN, K-means, and hierarchical clustering were applied. The datasets fed to our proposed model were pre-processed using NLP principles, and we used the TF-IDF and word2vec feature extraction methods.
Based on our experiment, we observed that for use case identification, LR, RF, and SVM had the best accuracy of 98%, whereas NB had an accuracy of 95%. For actor identification, SVM, RF, and LR had the best accuracy of 99%, whereas NB had an accuracy of 97%. For relationship type clustering, K-means had the best silhouette score of 0.76, outperforming DBSCAN and hierarchical clustering; we attribute this to the small size of the dataset, for which K-means was the preferable choice in our research.

Keywords: requirements text, actor, use cases, relationship types, machine learning, natural language processing
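
For readers unfamiliar with the silhouette score used above to compare the clustering algorithms, the following is a minimal pure-Python sketch of how the metric is commonly defined. It is an illustration only, not the code used in the study: the function names, the Euclidean distance, and the toy points are assumptions made for this example. For each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to the members of any other cluster, and the point's score is (b - a) / max(a, b); the overall score is the mean over all points and lies between -1 and 1.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient of a clustering (hypothetical helper).

    points: list of numeric vectors (e.g. TF-IDF or word2vec features)
    labels: cluster label assigned to each point
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: lowest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumed for illustration): two well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1]   # splits each group across clusters

print("good clustering:", silhouette_score(toy_points, good_labels))
print("bad clustering:", silhouette_score(toy_points, bad_labels))
```

A score close to 1 indicates compact, well-separated clusters, while a score near or below 0 indicates overlapping or misassigned points, which is why a higher value is read as a better clustering.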
