Abstract:
The internet currently holds a vast volume of unstructured textual data that offers
useful information for health care, commerce, education, religion, culture, history,
and other domains. As the amount of unstructured data grows, extracting valuable
information from it has become a critical and challenging task, and tools and techniques
for automatic text extraction are needed to find and explore information that satisfies
users' needs despite the overload of information on the internet. Information extraction
is the task of extracting structured text from unstructured data using Natural Language
Processing and statistical techniques. In this study, we developed an information
extraction model for Ge'ez text that extracts named entity information. A limitation of
the study is that relations between entities and their attributes, as well as text in
scanned documents, images, voice, and video, were not considered. The proposed model has
three main components: dataset preprocessing, a training (learning) and testing phase,
and a prediction phase. The preprocessing phase performs sentence tokenization, stop word
removal, affix removal (stemming), and sequence padding; the training and testing phase
trains the model and tests the learned model; and the prediction phase predicts, or
extracts, the categories of texts.
In this work, training, validation, and testing accuracy are used as the evaluation
metrics for the information extraction model for Ge'ez text. We trained and tested the
model on a dataset of 5270 sentences (63262 tokens) collected from the Addis Ababa
Ethiopian Orthodox Tewahedo Church. We used two experimental setups, i.e., long
short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM), for the
experimental evaluation, with a dataset split ratio of 80% training and 20% testing.
Finally, in the experimental evaluation, the long short-term memory model achieved
98.89% training, 98.89% validation, and 95.78% testing accuracy, and the bidirectional
long short-term memory model achieved 98.59% training, 97.96% validation, and 96.21%
testing accuracy. For the design of a full-fledged information extraction system,
further research should incorporate part-of-speech tagging for extracting relationships
between entities, i.e., relation extraction.
Keywords: EOTC, NLP, IE, LSTM, BiLSTM.
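
The preprocessing steps and the 80/20 split described in the abstract can be sketched
roughly as follows. This is a minimal illustration and not the implementation used in
the study: every function name, the stop word and affix lists passed in, the reserved
padding and unknown ids, and the maximum sequence length are assumptions introduced
here. Only the Ethiopic punctuation code points (U+1361 to U+1364) are taken from the
Unicode standard.

```python
import random
import re

# Ethiopic wordspace, full stop, comma, and semicolon (U+1361 to U+1364).
ETHIOPIC_PUNCT = "\u1361\u1362\u1363\u1364"
PAD_ID = 0  # assumed id reserved for padding
UNK_ID = 1  # assumed id for tokens missing from the vocabulary


def tokenize(sentence):
    """Split on whitespace and Ethiopic punctuation, dropping empty strings."""
    return [t for t in re.split(rf"[\s{ETHIOPIC_PUNCT}]+", sentence) if t]


def remove_stop_words(tokens, stop_words):
    """Drop tokens found in a caller-supplied (hypothetical) stop word set."""
    return [t for t in tokens if t not in stop_words]


def strip_affixes(token, prefixes, suffixes):
    """Remove at most one listed prefix and one listed suffix.

    An affix is only removed if at least two characters remain afterwards.
    """
    for p in prefixes:
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in suffixes:
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[:-len(s)]
            break
    return token


def build_vocab(token_lists):
    """Assign an integer id to each distinct token, in order of first appearance."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for tokens in token_lists:
        for t in tokens:
            vocab.setdefault(t, len(vocab))
    return vocab


def encode_and_pad(tokens, vocab, max_len):
    """Map tokens to ids, truncate to max_len, and right-pad with PAD_ID."""
    ids = [vocab.get(t, UNK_ID) for t in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def train_test_split(items, test_ratio=0.2, seed=0):
    """Shuffle with a fixed seed and split into training and testing parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * (1 - test_ratio))
    return items[:cut], items[cut:]
```

The padded id sequences produced by `encode_and_pad` would then be the input to an
LSTM or BiLSTM tagger. A real system would replace the placeholder stop word and affix
lists with linguistically motivated Ge'ez resources.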