Abstract:
Domain-specific jargon consists of words used in formal communication among experts of the same profession; however, such words are difficult for non-experts and the wider public to understand. Experts in an organization use domain-specific Amharic jargon words in scientific and science communication to maintain the communication conventions of their domain. These words make it harder for people outside the domain to grasp the main theme of the disseminated content. Identifying domain-specific Amharic jargon words is therefore required to recover the key information in a writer’s discourse and to support further lexical processing. We followed a design science research approach to conduct our study and develop a solution. Machine learning classification algorithms are trained on the dataset to build models that predict whether a text is jargony or non-jargony. We employed three popular text classifiers, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Naïve Bayes (NB), with TF-IDF features, and labeled the dataset for this two-way classification. We developed a hybrid system that combines machine learning with a knowledge-based component for domain-specific Amharic jargon word identification. For the knowledge-based component, we prepared a knowledge source containing a list of domain-specific Amharic jargon words and their meanings. The SVM, ANN, and NB models achieve classification accuracies of 96.2%, 95.2%, and 94.7%, respectively. The knowledge-based component performs best when fewer input sentences are entered: for inputs of 20, 40, 60, and 80 test items, we observed accuracies of 88.2%, 86.7%, 85.4%, and 83.1%, respectively. We therefore observed promising results from the hybrid of machine learning and a knowledge base for identifying jargon words in jargony text.
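
To make the hybrid idea concrete, the following is a minimal sketch rather than the system described above. It assumes a two-stage flow (classify a sentence first, then look up its words in the knowledge source), which the abstract does not spell out. It uses only the Python standard library, with a small Naïve Bayes classifier over TF-IDF weights standing in for the three trained models. The training sentences, the knowledge-base entries, and all function names are hypothetical, and English placeholder words stand in for Amharic text.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: English placeholders stand in for Amharic sentences.
TRAIN = [
    ("the patient shows acute myocardial infarction", "jargony"),
    ("administer intravenous analgesic after biopsy", "jargony"),
    ("the court issued a writ of habeas corpus", "jargony"),
    ("the child went to school this morning", "non-jargony"),
    ("we ate lunch near the river", "non-jargony"),
    ("the market opens early on saturday", "non-jargony"),
]

# Hypothetical knowledge source: jargon word -> plain-language meaning.
KNOWLEDGE_BASE = {
    "myocardial": "relating to the heart muscle",
    "infarction": "tissue death caused by loss of blood supply",
    "biopsy": "removal of a tissue sample for examination",
    "writ": "a formal written court order",
}

def tokenize(text):
    # Whitespace tokenization; real Amharic text would need its own tokenizer.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(tokens, idf):
    # Term frequency times idf; terms unseen during fitting are dropped.
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

def train_nb(samples, idf, alpha=1.0):
    # Multinomial Naive Bayes with TF-IDF weights used as fractional counts.
    weights, class_counts = defaultdict(Counter), Counter()
    for tokens, label in samples:
        class_counts[label] += 1
        for t, w in tfidf(tokens, idf).items():
            weights[label][t] += w
    total_docs = sum(class_counts.values())
    model = {}
    for label, count in class_counts.items():
        denom = sum(weights[label].values()) + alpha * len(idf)
        model[label] = {
            "prior": math.log(count / total_docs),
            "loglik": {t: math.log((weights[label][t] + alpha) / denom) for t in idf},
        }
    return model

def predict(model, tokens, idf):
    # Pick the class with the highest log posterior score.
    feats = tfidf(tokens, idf)
    scores = {
        label: p["prior"] + sum(w * p["loglik"][t] for t, w in feats.items())
        for label, p in model.items()
    }
    return max(scores, key=scores.get)

def identify_jargon(text, model, idf, kb):
    # Stage 1: classify the sentence. Stage 2: look up its words in the knowledge base.
    tokens = tokenize(text)
    label = predict(model, tokens, idf)
    if label != "jargony":
        return label, {}
    return label, {t: kb[t] for t in tokens if t in kb}

samples = [(tokenize(text), label) for text, label in TRAIN]
idf = fit_idf([tokens for tokens, _ in samples])
model = train_nb(samples, idf)

print(identify_jargon("the patient had a biopsy", model, idf, KNOWLEDGE_BASE))
print(identify_jargon("we went to the market", model, idf, KNOWLEDGE_BASE))
```

In this sketch the knowledge base is consulted only for sentences the classifier flags as jargony, so a misclassified sentence never reaches the lookup stage. That ordering is one possible design, chosen here for illustration.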