Abstract:
Early software size estimation helps teams plan software projects and their resources in advance,
especially in agile methods. The Common Software Measurement International Consortium
(COSMIC) method is an objective functional size measurement standard that evolved to overcome
the shortcomings of previous approaches. With the proliferation of agile practices in the
software industry, a large number of requirements are not clearly defined in the early phases of
software development and are left unmeasured. This leads to inaccurate size and
effort estimates and, in turn, to the failure of software projects. It is also challenging to apply
COSMIC in agile development because COSMIC needs strict formalization of
requirements, whereas agile relies on less formal specifications. By exploiting the
advantages of COSMIC and agile methods, in this study we address these problems by
developing domain-specific vocabularies for automating COSMIC functional size
estimation in agile development. We employ an experimental research methodology
for implementing our proposed approach. We further pretrain a generic BERT model
over requirements engineering domain texts and produce a new domain-specific
pretrained model called RE-BERT. Using RE-BERT, we develop deep learning classifiers
and regressors for COSMIC-based functional process classification and size estimation
tasks, respectively. The experimental results show that the RE-BERT Seq. Classifier
achieves 78.97% prediction accuracy, outperforming the other classifier models
(RE-BERT LSTM, RE-BERT Bi-LSTM, BASE-BERT LSTM, BASE-BERT Bi-LSTM, and
BASE-BERT Seq. Classifier). Overall, RE-BERT-based classifiers provide
an average improvement of 1.40% to 4.80% over BASE-BERT classifiers. For the size
estimation task, RE-BERT MLP achieves a mean absolute error (MAE) of 0.691 and a mean
squared error (MSE) of 0.988, outperforming the other regression models (BASE-BERT MLP,
RE-BERT regressor, and BASE-BERT regressor). Likewise, RE-BERT-based regressors provide
an average improvement of 1.23% to 3.19% over BASE-BERT regressor models. In general,
domain-specific pretrained models have a promising effect on improving the performance of
machine learning or deep learning models on downstream tasks in that domain.
Keywords: BERT, COSMIC, Functional Size Estimation, Domain-specific
Pretraining, Downstream Tasks, RE-BERT, Agile Development
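
The size estimation results above are reported as MAE and MSE. As a reading aid, the following minimal sketch shows how these two error metrics are conventionally computed for a regressor. The function names and the sample values are hypothetical and purely illustrative; they are not the study's code or data.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the absolute differences between actual and predicted sizes."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between actual and predicted sizes."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical functional sizes for four requirements (illustrative values only).
actual_sizes = [4.0, 6.0, 3.0, 5.0]
predicted_sizes = [5.0, 6.0, 2.0, 7.0]

mae = mean_absolute_error(actual_sizes, predicted_sizes)
mse = mean_squared_error(actual_sizes, predicted_sizes)
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")
```

Because MSE squares each error, it penalizes large misestimates more heavily than MAE, which is why the two metrics are usually reported together.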