Abstract:
Dependency parsing is an NLP task that analyzes the relationships between the words of a sentence. It plays an important role in NLP tasks such as machine translation, information extraction, and speech recognition. Although many studies have developed dependency parsers, they have mainly concentrated on English and other European languages. Attempts have been made to apply existing dependency parsers to Arabic and Hebrew by adding morphological features, but the resulting parsers did not show a significant improvement over the original systems.
In this study, a dependency parser for the Amharic language is implemented. The system has two network models: the first model uses an Amharic treebank to learn and then predict arc-eager transition actions, and the predicted transitions are used to build an unlabeled dependency structure. The second model predicts the relationship types of a given unlabeled dependency structure in order to build the labeled dependency tree of a sentence. The study also introduces another way of assigning relationship types, suited especially to languages with small datasets: a separate model learns the relationship type from the part-of-speech tag of the dependent word and the part-of-speech tag of the head word. This method reduces the number of output classes from 2n+2 to n, where n is the number of relationship types in the language, and increases the number of examples per class in the dataset.
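The following minimal Python sketch (not the authors' implementation; relation names are hypothetical) illustrates the class-count argument: a single labeled arc-eager classifier must choose among LEFT-ARC and RIGHT-ARC for every label plus SHIFT and REDUCE (2n+2 classes), whereas the two-model design needs only four unlabeled transitions and n label classes.

# Hypothetical inventory of n = 3 relation types
RELATION_TYPES = ["nsubj", "obj", "advmod"]

# Single-model output space: labeled transitions, 2n + 2 classes
labeled_transitions = (
    [f"LEFT-ARC({r})" for r in RELATION_TYPES]
    + [f"RIGHT-ARC({r})" for r in RELATION_TYPES]
    + ["SHIFT", "REDUCE"]
)

# Two-model output spaces: 4 unlabeled transitions, then n label classes
unlabeled_transitions = ["LEFT-ARC", "RIGHT-ARC", "SHIFT", "REDUCE"]
label_classes = list(RELATION_TYPES)

print(len(labeled_transitions))    # 2*3 + 2 = 8
print(len(unlabeled_transitions))  # 4
print(len(label_classes))          # 3

# The second model would map (dependent POS tag, head POS tag) features to one
# of the n relation types, e.g. the hypothetical pair ("NOUN", "VERB") -> "nsubj".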
Finally, the parser was evaluated on a syntactically annotated dataset of Amharic sentences and achieves unlabeled and labeled attachment scores of 91.54% and 81.4%, respectively.
The main challenge in this study was the availability and quality of a treebank for Amharic sentences. As the experiments show, the parser could be more effective with a larger dataset such as the English or Chinese Penn Treebank.