Abstract:
Textual Entailment Recognition is the task of deciding whether a Text entails a Hypothesis, where the Hypothesis is a sentence that is short compared with the Text. The task is challenging because of semantic variability: the same meaning can be expressed by different sentences. It tries to judge the relationship between Text and Hypothesis pairs as a human would. However, the process is complex and needs a dynamic procedure to recognize the most probable entailment. Human experience, background knowledge, and the context of the text sentences also matter for entailment recognition. In this thesis, we propose a combination of multiple features for Amharic Textual Entailment Recognition, including word overlap, Jaccard similarity, cosine similarity, bi-gram matching, noun and verb match, Named Entity Recognition, an antonym feature, negation, and numeric mismatch (i.e., date mismatch and number mismatch). These features are used to train machine learning classifiers that predict which of three classes (Entailment, Contradiction, or Unknown) a Text-Hypothesis pair belongs to. We applied four machine learning algorithms, namely Support Vector Machine (SVM), Naïve Bayes classifier (NB), Artificial Neural Network (ANN), and Random Forest classifier (RF), and we annotated the dataset according to the three-way classification. We compare results on two-way classification (i.e., Entailment and No Entailment) and three-way classification. For two-way classification, the accuracy is 94% for SVM, 94% for ANN, and 92% for RF; for three-way classification, the accuracy is 93%, 93.5%, 90%, and 84% for SVM, ANN, RF, and NB, respectively.
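To make the lexical features named above concrete, the following is a minimal sketch of how word overlap, Jaccard similarity, cosine similarity, and bi-gram matching might be computed for a Text-Hypothesis pair. It is an illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, tokenization is plain whitespace splitting, overlap and bi-gram scores are normalized by the Hypothesis, and cosine similarity uses raw term counts. The exact definitions used in the thesis may differ.

```python
import math
from collections import Counter


def tokens(sentence):
    # Assumption: whitespace tokenization. Real Amharic text would likely
    # need normalization and morphological processing first.
    return sentence.split()


def word_overlap(text, hypothesis):
    # Fraction of distinct Hypothesis tokens that also occur in the Text.
    t, h = set(tokens(text)), set(tokens(hypothesis))
    return len(t & h) / len(h) if h else 0.0


def jaccard(text, hypothesis):
    # Size of the token intersection divided by the size of the union.
    t, h = set(tokens(text)), set(tokens(hypothesis))
    union = t | h
    return len(t & h) / len(union) if union else 0.0


def cosine(text, hypothesis):
    # Cosine of the angle between raw term-count vectors.
    t, h = Counter(tokens(text)), Counter(tokens(hypothesis))
    dot = sum(t[w] * h[w] for w in t.keys() & h.keys())
    norm_t = math.sqrt(sum(c * c for c in t.values()))
    norm_h = math.sqrt(sum(c * c for c in h.values()))
    return dot / (norm_t * norm_h) if norm_t and norm_h else 0.0


def bigram_match(text, hypothesis):
    # Fraction of distinct Hypothesis bi-grams that also occur in the Text.
    def bigrams(sentence):
        tok = tokens(sentence)
        return set(zip(tok, tok[1:]))

    t, h = bigrams(text), bigrams(hypothesis)
    return len(t & h) / len(h) if h else 0.0


def feature_vector(text, hypothesis):
    # One row of classifier input for a single Text-Hypothesis pair.
    return [
        word_overlap(text, hypothesis),
        jaccard(text, hypothesis),
        cosine(text, hypothesis),
        bigram_match(text, hypothesis),
    ]
```

A vector such as `feature_vector(text, hypothesis)` would be extended with the remaining features (noun and verb match, named entities, antonyms, negation, numeric mismatch) before being passed to a classifier.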