INFORMATION EXTRACTION MODEL FROM GE’EZ TEXTS

Worke, Wolde Asfaw

URI: http://ir.bdu.edu.et/handle/123456789/13220

Date: 2021-09

Abstract:

Currently, there is a vast volume of unstructured textual data on the internet that offers useful information for health care, commerce, education, religious, cultural, historical, and other domains. As the amount of unstructured data grows, extracting valuable information from it becomes harder because of information overload on the internet. Tools and techniques that automatically extract and explore useful information to satisfy users' needs have therefore become critical. Information extraction derives structured information from unstructured text using Natural Language Processing (NLP) and statistical techniques. In this study, we developed an information extraction model for Ge'ez text that extracts named entities. A limitation of the study is that it does not consider relations between entities, entity attributes, or text from scanned images, voice, and video. The proposed model has three main components: a dataset preprocessing phase, a training (learning) and testing phase, and a prediction phase. The preprocessing phase performs sentence tokenization, stop word removal, affix removal (stemming), and sequence padding. The training and testing phase trains the model and then tests the learned model, and the prediction phase predicts (extracts) the categories of the text. Training, validation, and testing accuracy are used as the evaluation metrics for the information extraction model. The model was trained and tested on a dataset of 5270 sentences (63262 tokens) from the Ethiopian Orthodox Tewahedo Church in Addis Ababa. We used two experimental setups, Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM), with an 80% training and 20% testing split of the dataset.
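The preprocessing phase and the 80/20 evaluation setup described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes the corpus is segmented by the Ethiopic wordspace (፡, U+1361) and full stop (።, U+1362); the stop-word and prefix lists are hypothetical placeholders; and the helper names (`preprocess`, `build_vocab`, `pad_sequences`, `train_test_split`, `accuracy`) are illustrative. The LSTM and BiLSTM models themselves are not shown.

```python
import random
import re

# Assumed delimiters (Ethiopic punctuation):
#   U+1361 wordspace, U+1362 full stop, U+1363 comma, U+1364 semicolon.
SENTENCE_END = "\u1362"
WORD_BOUNDARY = re.compile(r"[\s\u1361\u1363\u1364]+")

# HYPOTHETICAL illustrative lists, not the ones used in the thesis.
STOP_WORDS = {"ዘ", "እም", "ውስተ"}
PREFIXES = ("ወ",)  # e.g. the proclitic conjunction ወ ("and")
SUFFIXES = ()


def split_sentences(text):
    """Split raw text into sentences on the Ethiopic full stop."""
    return [s.strip() for s in text.split(SENTENCE_END) if s.strip()]


def tokenize(sentence):
    """Split a sentence on whitespace and Ethiopic word separators."""
    return [t for t in WORD_BOUNDARY.split(sentence) if t]


def strip_affixes(token, min_stem=2):
    """Naive stemming: remove at most one listed prefix and one listed
    suffix, always keeping at least `min_stem` characters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= min_stem:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= min_stem:
            token = token[:-len(s)]
            break
    return token


def preprocess(text):
    """Sentence split -> tokenize -> drop stop words -> strip affixes."""
    return [
        [strip_affixes(t) for t in tokenize(s) if t not in STOP_WORDS]
        for s in split_sentences(text)
    ]


def build_vocab(sentences):
    """Map each distinct token to an integer id; 0 is reserved for padding."""
    vocab = {}
    for sent in sentences:
        for tok in sent:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def pad_sequences(seqs, maxlen, value=0):
    """Truncate each id sequence to `maxlen`, then pad at the end with `value`."""
    padded = []
    for seq in seqs:
        seq = list(seq[:maxlen])
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle deterministically and split into (train, test), e.g. 80% / 20%."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * (1 - test_ratio))
    return items[:cut], items[cut:]


def accuracy(y_true, y_pred, ignore=None):
    """Share of positions where prediction equals the gold label,
    skipping positions whose gold label equals `ignore` (e.g. padding)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != ignore]
    return sum(t == p for t, p in pairs) / len(pairs)


if __name__ == "__main__":
    sents = preprocess("ዘ\u1361ወሰላም\u1361ሰማይ\u1362 እም\u1361ቤት\u1363ሰላም\u1362")
    vocab = build_vocab(sents)
    ids = [[vocab[t] for t in s] for s in sents]
    print(pad_sequences(ids, maxlen=4))  # -> [[1, 2, 0, 0], [3, 1, 0, 0]]
```

If the split is done at the sentence level, an 80/20 split of the 5270-sentence corpus gives 4216 training and 1054 test sentences. Post-padding is an arbitrary choice in this sketch (Keras's `pad_sequences` pads at the front by default). Passing `ignore=0` to `accuracy` excludes padded positions, which would otherwise inflate token-level accuracy on short sentences.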
Finally, in the experimental evaluation, the LSTM model achieved 98.89% training, 98.89% validation, and 95.78% testing accuracy, while the BiLSTM model achieved 98.59% training, 97.96% validation, and 96.21% testing accuracy. To build a full-fledged information extraction system, future research should incorporate part-of-speech tagging to extract relationships between entities (relation extraction).

Keywords: EOTC, NLP, IE, LSTM, BiLSTM.
