Abstract:
In recent years, deep learning has been widely applied to pattern classification, object detection, image segmentation, speech recognition, and other fields. Printed text image recognition has remained one of the most challenging tasks for decades. Text recognition plays an important role in document image processing: it is a document image analysis technique that recognizes the useful content of text documents so that they can be archived in softcopy for different purposes. The technique converts a given text image into its most probable matching character in the script of a given language.
Traditional machine learning approaches struggle with such time-dependent sequential data. Moreover, the Ethiopic script uses a large number of characters, many of which are visually similar, which makes OCR development challenging. For this study, we prepared a new synthetic dataset containing 20,000 text-line images and 517,610 characters. The performance of the proposed multi-script Ethiopic OCR model is evaluated on this artificially generated printed dataset. The proposed model addresses the text image recognition problem at the text-line level. To extract informative feature values from text-line images of the scripts, preprocessing activities such as binarization, text-line segmentation, and size normalization are performed. We then propose a model that combines a CNN, an LSTM, and CTC in a single framework: the CNN automatically extracts features from the raw images, the LSTM learns the sequential structure, and CTC transcribes the LSTM output to character labels, directly decoding the input sequence to the output without any post-processing module. Experimental results are reported as character error rate (CER), which counts insertions, deletions, and substitutions of characters in the predicted output; the proposed model achieves a state-of-the-art result of 9.5% CER.
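The two evaluation steps named above (CTC decoding without a post-processing module, and CER as the ratio of insertions, deletions, and substitutions to reference length) can be sketched in pure Python. This is an illustrative sketch, not the paper's implementation: the function names are our own, the decoder shown is simple greedy (best-path) CTC collapsing, and CER is computed with a standard Levenshtein edit distance.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy (best-path) CTC decoding: collapse repeated labels, drop blanks.

    `frame_labels` is the per-frame argmax of the LSTM output; `blank` is the
    CTC blank index (assumed 0 here).
    """
    decoded, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            decoded.append(lab)
        prev = lab
    return decoded


def cer(reference, hypothesis):
    """Character error rate: (insertions + deletions + substitutions)
    divided by the reference length, via Levenshtein edit distance."""
    m, n = len(reference), len(hypothesis)
    row = list(range(n + 1))  # edit distances for the empty reference prefix
    for i in range(1, m + 1):
        diag, row[0] = row[0], i
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            diag, row[j] = row[j], min(
                row[j] + 1,      # deletion
                row[j - 1] + 1,  # insertion
                diag + cost,     # substitution (or match)
            )
    return row[n] / max(m, 1)
```

For example, `ctc_greedy_decode([1, 1, 0, 2, 2, 0, 2])` collapses to `[1, 2, 2]`, and `cer("abcd", "abd")` is `0.25` (one deletion over four reference characters).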