Abstract:
Spoken language identification is the task of deciding which language a speaker is speaking. It is used as a front-end process in human-computer interaction, speech-to-text translation, speech-to-speech translation, and automatic routing of callers to the intended operator. Many studies on spoken language identification have used Gaussian mixture model (GMM), i-vector, and neural network approaches. However, GMM and i-vector approaches are not robust in noisy environments, and although deep neural networks perform better on short utterances, they are computationally expensive. To overcome these problems, we propose a noise-resistant spoken language identification model for four Ethiopian languages: Amharic, Tigrigna, Oromo, and Somali. For the dataset, we used noisy recordings of meetings, discussions, conferences, and reports. Because back-propagation neural networks are slow to train, we propose models based on a feed-forward neural network (FFNN) and on convolutional neural networks (CNNs). The first model uses acoustic features with an FFNN classifier; we compared five acoustic features and obtained the best accuracy, 88%, with delta Mel-frequency cepstral coefficients (MFCCs). In the second approach, we used an end-to-end CNN and a CNN combined with a support vector machine (SVM). We obtained an accuracy of 98% with the end-to-end CNN and 97% with the CNN-SVM. Thus, the SVM can reduce the training time of the CNN without significantly degrading accuracy.
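The best-performing acoustic feature in the first model is the delta MFCC. As a minimal sketch (not the paper's implementation), delta coefficients are commonly computed from the static MFCC matrix with a regression over a small symmetric window; the window half-width `N = 2` below is an assumed typical value, not taken from the abstract.

```python
import numpy as np

def delta(features, N=2):
    """Compute delta (first-order regression) coefficients.

    features: array of shape (num_frames, num_ceps), e.g. static MFCCs.
    N: regression window half-width (N=2 is a common choice, assumed here).
    Returns an array of the same shape with one delta per static coefficient.
    """
    features = np.asarray(features, dtype=float)
    # Standard regression denominator: 2 * sum_{n=1..N} n^2
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Replicate edge frames so the window is defined at the boundaries.
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    out = np.empty_like(features)
    for t in range(len(features)):
        # d_t = sum_n n * (c_{t+n} - c_{t-n}) / denom
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n])
            for n in range(1, N + 1)
        ) / denom
    return out
```

For a linearly increasing coefficient track, the interior delta values equal the slope, which is a quick sanity check; in practice the resulting delta matrix would be concatenated with (or used in place of) the static MFCCs before feeding the FFNN classifier.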