AMHARIC LANGUAGE VISUAL SPEECH RECOGNITION USING COMPUTER VISION AND MACHINE LEARNING APPROACH

dc.contributor.author TAMRIE, ZELALEM
dc.date.accessioned 2020-10-06T12:20:40Z
dc.date.available 2020-10-06T12:20:40Z
dc.date.issued 2020
dc.identifier.uri http://hdl.handle.net/123456789/11277
dc.description.abstract Lip motion reading is the process of recognizing the words spoken in a video, with or without an audio signal, by observing the motion of the speaker's lips. It is particularly valuable when the environment is noisy or the listener has a hearing impairment. Although previously developed lip motion reading systems record good results, their accuracy is limited because they do not apply appropriate image enhancement methods and feature extraction algorithms. To address this gap, we propose machine learning and computer vision techniques for Amharic language lip motion reading. Because no existing dataset was available, we collected videos of Amharic speech by recording them directly with mobile devices. Fourteen Amharic words frequently spoken by patients or faculty members in the hospital were selected for recording. To minimize noise caused by hand shake during recording, we used a stabilizer to hold the camera steady, and we recorded the speakers at a fixed place and time to remove lighting effects. After collecting the Amharic speech videos, we located the mouth by detecting the face automatically with the Viola-Jones algorithm. After the region of interest was computed, gamma correction was applied to the color images and contrast limited adaptive histogram equalization to the gray-level images. For feature extraction, both deep and traditional machine learning algorithms were implemented: a convolutional neural network (CNN) for the deep approach and the histogram of oriented gradients (HOG) for the traditional one. The CNN takes color images as input and HOG takes gray-level images, as these pairings gave the best performance. The extracted features were then fed to a Support Vector Machine (SVM) and a random forest, independently and in combination, to recognize the spoken word. Since this is a multi-class classification task, we used a one-versus-all SVM, in which one classifier is generated per class. Our system records 72%, 82%, and 85% accuracy with HOG, CNN, and combined features, respectively, using SVM, and 69%, 75%, and 77% accuracy with the same features using random forest. We also evaluated the model with a confusion matrix: the word ‘ተሽሎኛል’ has the highest recognition rate and the word ‘ራስ ምታት’ the lowest. en_US
dc.language.iso en en_US
dc.subject Computer Science en_US
dc.title AMHARIC LANGUAGE VISUAL SPEECH RECOGNITION USING COMPUTER VISION AND MACHINE LEARNING APPROACH en_US
dc.type Thesis en_US
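
The pipeline summarized in the abstract can be illustrated with short, hedged sketches. First, the mouth region of interest: the abstract names the Viola-Jones algorithm for face detection, and OpenCV ships Viola-Jones Haar cascade detectors. A minimal sketch, assuming the mouth occupies roughly the lower third of the detected face box (a common heuristic; the record does not give the exact rule):

import cv2

# Viola-Jones face detector bundled with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def mouth_roi(frame):
    """Return the mouth region of the largest detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    # Assumption: the mouth sits in roughly the lower third of the face box.
    return frame[y + 2 * h // 3 : y + h, x : x + w]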
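
Next, the two enhancement steps the abstract names: gamma correction on the color ROI and contrast limited adaptive histogram equalization (CLAHE) on the gray-level ROI. A minimal OpenCV sketch; the gamma value and CLAHE parameters are illustrative assumptions, not values from the thesis:

import cv2
import numpy as np

def gamma_correct(bgr, gamma=1.5):
    """Brighten or darken a color image via a gamma lookup table."""
    table = (((np.arange(256) / 255.0) ** (1.0 / gamma)) * 255).astype("uint8")
    return cv2.LUT(bgr, table)

def clahe_equalize(bgr):
    """Convert to gray and apply contrast limited adaptive histogram equalization."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)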
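
HOG feature extraction on the enhanced gray-level ROI can be sketched with scikit-image; the fixed ROI size and the cell and block geometry below are assumptions:

import cv2
from skimage.feature import hog

def hog_features(gray_roi, size=(64, 64)):
    """Resize the ROI to a fixed size and return its HOG descriptor."""
    resized = cv2.resize(gray_roi, size)
    return hog(
        resized,
        orientations=9,
        pixels_per_cell=(8, 8),
        cells_per_block=(2, 2),
        feature_vector=True,
    )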
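
The abstract specifies a CNN over color images but not its architecture, so the small Keras network below is purely illustrative. In a setup like the one described, the network would first be trained on the 14 word classes with a softmax head; the activations of the last dense layer would then serve as the feature vector:

from tensorflow.keras import layers, models

def build_cnn_extractor(input_shape=(64, 64, 3), feature_dim=128):
    """Map a color mouth ROI to a fixed-length feature vector (untrained sketch)."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(feature_dim, activation="relu"),  # feature layer
    ])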
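
Finally, classification and evaluation: the abstract feeds HOG, CNN, and combined features to a one-versus-all SVM (one classifier per class) and a random forest, and evaluates with a confusion matrix. A minimal scikit-learn sketch; the SVM kernel and forest size are assumptions:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def combine(X_hog, X_cnn):
    """One plausible reading of 'combined features': concatenate the descriptors."""
    return np.hstack([X_hog, X_cnn])

def evaluate(X_train, y_train, X_test, y_test):
    """Fit both classifiers on one feature set and report their accuracy."""
    for name, clf in [
        ("one-vs-all SVM", OneVsRestClassifier(SVC(kernel="linear"))),
        ("random forest", RandomForestClassifier(n_estimators=100)),
    ]:
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        print(name, accuracy_score(y_test, pred))
        print(confusion_matrix(y_test, pred))  # per-word recognition rates

The concatenation above is only one way to combine the two descriptors; the record does not say how the combination was performed.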

