Abstract:
Every day, a massive amount of information is reported through various media such as television, radio, social media, and web blogs in the form of video, audio, or text. As the number of unstructured documents on the internet grows, finding relevant events has become tedious and labor-intensive. This explosion of events calls for more structured and efficient event extraction tools that automatically extract and structure events. Event extraction takes natural language text from news articles and social media and produces structured events, specified by certain criteria, that are relevant to a particular user or application. Event extraction is used in a variety of natural language processing applications, including information retrieval, information extraction, question answering, document summarization, and knowledge and reasoning, among others. We propose an event extraction model for Amharic texts that extracts free events by combining a deep learning approach with natural language processing techniques. In this paper, we first address Amharic language-specific issues and then propose a Bidirectional Long Short-Term Memory (BiLSTM) network with Word2vec embeddings to detect and classify Amharic events in unstructured documents. In addition to event detection and classification, the model extracts event arguments that carry additional information about events (time and place). We also implement three deep learning approaches (CNN, LSTM, and BiLSTM) separately for event detection and event classification and compare the performance of each model. We prepared an Amharic dataset of 9,050 instances for the event extraction model and a corpus of 420,910 documents for building the Word2vec model. The experimental results show that the BiLSTM approach performs best for Amharic event detection and event classification, with accuracies of 94% and 89%, respectively.
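To make the BiLSTM-with-Word2vec idea concrete, here is a minimal, illustrative sketch of how such a classifier is typically wired: each token is mapped to a word vector, one LSTM reads the sentence left to right, another reads it right to left, and the two final hidden states are concatenated and passed to a softmax layer over event types. This is not the authors' implementation. The helper names (`embed`, `LSTMCell`, `BiLSTMClassifier`), the dimensions, the random untrained weights, and the zero-vector handling of unknown tokens are all hypothetical assumptions. It is written in dependency-free Python only so that the forward pass is easy to read.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def embed(tokens, table, dim):
    """Word2vec-style lookup; unknown tokens map to a zero vector (assumption)."""
    return [table.get(t, [0.0] * dim) for t in tokens]

class LSTMCell:
    """One LSTM direction with input (i), forget (f), output (o) gates and candidate (g)."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate over the concatenation [x; h_prev], plus a bias.
        self.W = {k: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation [x; h_prev]
        pre = {k: [p + q for p, q in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def run(self, xs):
        """Process the sequence and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            h, c = self.step(x, h, c)
        return h

class BiLSTMClassifier:
    """Forward and backward LSTMs, concatenated, then a softmax over event types."""
    def __init__(self, emb_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMCell(emb_dim, hidden_dim, rng)
        self.bwd = LSTMCell(emb_dim, hidden_dim, rng)
        self.out = rand_matrix(num_classes, 2 * hidden_dim, rng)

    def predict_proba(self, vectors):
        h_fwd = self.fwd.run(vectors)
        h_bwd = self.bwd.run(list(reversed(vectors)))
        logits = matvec(self.out, h_fwd + h_bwd)
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]
```

For example, with a toy embedding table standing in for a trained Word2vec model, `BiLSTMClassifier(emb_dim=8, hidden_dim=4, num_classes=3).predict_proba(embed(tokens, table, 8))` returns one probability per hypothetical event type. In practice the weights would be trained, and a framework such as PyTorch or Keras would replace the hand-written cell; the bidirectional reading is what lets the model use context on both sides of a trigger word.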