Abstract:
Techniques for handling multiword expressions (MWEs) have become an important area of
research in natural language processing (NLP). A multiword expression is a lexical unit
larger than a single word whose meaning may be idiomatic or compositional. The purpose
of this study is to investigate how to automate multiword expression detection for the
Amharic language.
The presence of multiword expressions affects many NLP tasks, including machine
translation, question answering, word sense disambiguation (WSD), information retrieval
and next-word prediction. Multiword expressions in other languages, such as English,
Japanese and Indian languages, have been identified through a variety of approaches;
however, no prior research has addressed multiword expression detection for Amharic.
This study therefore develops a multiword expression detection model for Amharic using a
supervised machine learning approach. A dataset of 3,300 samples was collected from
Amharic textbooks, the Amharic Bible, fiction, Amharic idiom books, Amharic dictionaries
and novels, and an experimental research methodology was followed to develop the model.
TF-IDF and Keras embedding techniques were applied to vectorize the dataset for the
traditional machine learning and deep learning models, respectively. In the experiments,
the multilayer perceptron (MLP) outperformed the support vector machine (SVM), LSTM and
BiLSTM models, achieving an accuracy of 98.94 percent. We attribute this result to the
suitability of MLP for classification problems in which each input is assigned a class
label, and to its multiple layers of neurons, which enable it to learn more complex
patterns. Overall, the study indicates that a larger dataset, including multiword
expressions of more than two words, is needed.
Keywords: NLP, Machine Learning, Multiword, Multiword Expression, Multiword
Expression Detection.
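To illustrate the TF-IDF vectorization step used for the traditional machine learning models, the following is a minimal sketch in Python, assuming each candidate expression has already been tokenized into words. The function name `tfidf_vectors`, the smoothed IDF formula (the same default used by scikit-learn's `TfidfVectorizer`), the L2 row normalization and the English placeholder phrases are all assumptions for illustration; the study's actual Amharic tokenization, vectorizer settings and data are not described in this abstract.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Compute TF-IDF vectors for a list of tokenized candidate expressions.

    Assumption: raw term counts as TF, smoothed IDF
    idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2-normalized rows.
    """
    n = len(docs)
    # Sorted vocabulary gives a stable column order for the feature matrix.
    vocab = sorted({tok for doc in docs for tok in doc})
    # Document frequency: number of candidates containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for doc in docs:
        tf = Counter(doc)
        row = [tf[t] * idf[t] for t in vocab]
        # Normalize to unit length; guard against an empty candidate.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


# Hypothetical placeholder candidates standing in for tokenized Amharic phrases.
candidates = [
    ["kick", "the", "bucket"],
    ["kick", "the", "ball"],
    ["spill", "the", "beans"],
]
vocab, rows = tfidf_vectors(candidates)
print(vocab)
```

Each row is a fixed-length feature vector that could then be passed to a classifier such as an MLP or SVM. A token shared by every candidate (here "the") receives the lowest IDF weight, while tokens that occur in only one candidate are weighted most heavily.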