SPECTROGRAM IMAGE AND MEL-FREQUENCY CEPSTRAL COEFFICIENT ASSISTED CONVOLUTIONAL NEURAL NETWORK FOR TEXT INDEPENDENT AMHARIC LANGUAGE SPEAKER IDENTIFICATION

Mulusew, Fentahun Getu

dc.contributor.author	Mulusew, Fentahun Getu
dc.date.accessioned	2022-03-18T06:35:51Z
dc.date.available	2022-03-18T06:35:51Z
dc.date.issued	2021-11-09
dc.identifier.uri	http://ir.bdu.edu.et/handle/123456789/13211
dc.description.abstract	Speech is a natural way of transferring information between a speaker and a listener. Speaker identification is the process of determining who is speaking from his or her unique voiceprint features. Speaker identification systems have been studied for different languages using traditional Mel-Frequency Cepstral Coefficients, Gaussian Mixture Models, i-vector methods, and fusion techniques. However, such models are limited by their dependence on hand-crafted feature engineering, processing time, susceptibility to noise, and poor performance on short utterances. Moreover, because every natural language has its own characteristics, an identical speaker identification model cannot be used for different languages. In this thesis, an end-to-end Convolutional Neural Network (CNN) and a combined CNN with Support Vector Machine (SVM) approach were used for text-independent Amharic language speaker identification. Speech signals were collected from thirty speakers of both genders. The dataset contains 1,500 speech samples, each 10 seconds long; 1,200 samples were used for training and 300 for testing. After collection and pre-processing, each speech sample is transformed into a spectrogram image using digital signal processing techniques. The resized spectrogram images are then used as input to the proposed model, which learns and extracts speaker-specific spectral features. The end-to-end CNN achieved 94.4% accuracy, and the CNN with SVM approach achieved 98.8%. Finally, to evaluate the proposed model, we compared it with a pre-trained AlexNet model on our dataset; the pre-trained end-to-end AlexNet model reached 80% accuracy. Thus, using a CNN for feature extraction and an SVM for classification improves on both the accuracy and the training time of the end-to-end CNN and the pre-trained AlexNet model. Keywords: CNN, MFCC, STFT, SVM, AlexNet, Spectrogram, Speaker Identification, Amharic Language	en_US
dc.language.iso	en_US	en_US
dc.subject	INFORMATION TECHNOLOGY	en_US
dc.title	SPECTROGRAM IMAGE AND MEL-FREQUENCY CEPSTRAL COEFFICIENT ASSISTED CONVOLUTIONAL NEURAL NETWORK FOR TEXT INDEPENDENT AMHARIC LANGUAGE SPEAKER IDENTIFICATION	en_US
dc.type	Thesis	en_US