
AN ENSEMBLE OF VECTOR QUANTIZATION AND CNN FOR DIGITAL BASED TEXT INDEPENDENT AMHARIC LANGUAGE SPEAKER RECOGNITION


dc.contributor.author SEWUNET, ASMARE
dc.date.accessioned 2024-03-21T10:51:28Z
dc.date.available 2024-03-21T10:51:28Z
dc.date.issued 2023-10
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/15705
dc.description.abstract Speaker recognition is the process of identifying a speaker from an acquired voice sample. Speaker recognition systems have been studied for different languages using various techniques, such as Mel Frequency Cepstrum Coefficients, Gaussian Mixture Models, i-vector methods, vector quantization, and fusion techniques. However, such models are limited by their dependence on hand-crafted feature engineering, degraded performance as the dataset grows, susceptibility to noise and mimicry, and poor accuracy on short utterances. Moreover, because every natural language has its own characteristics, a single speaker recognition model cannot be applied unchanged across languages. In this thesis, an ensemble of vector quantization (VQ) and a Convolutional Neural Network (CNN) is used for text-independent Amharic language speaker identification. Speech signals were collected from 200 speakers of both genders, giving a total of 2,000 speech samples of 10 seconds each; 80% of the samples were used for training and 20% for testing. After collection and pre-processing, each speech signal is transformed into a spectrogram image, and the CNN extracts spectral features from the resized spectrograms. In parallel, VQ extracts features from the framed voice signal. The CNN and VQ features are then combined into an ensemble feature vector and passed to the classifier (a sketch of the VQ classification step appears after this record). We also evaluated an end-to-end CNN, CNN features on a VQ classifier, and combined CNN-VQ features on a CNN classifier. In our experiments, the ensemble CNN-VQ feature vector with a VQ classifier achieved 97.23% accuracy, compared with 89.61% for the end-to-end CNN, 74.87% for CNN features on a VQ classifier, and 91.45% for CNN-VQ features on a CNN classifier. Finally, to benchmark the proposed model, we compared it with a pre-trained AlexNet on our dataset, which achieved 90.19% accuracy. Thus, using CNN and VQ for feature extraction with VQ as the classifier improves the accuracy of speaker identification. Keywords: CNN, Signal processing, VQ and Spectrogram en_US
dc.language.iso en_US en_US
dc.subject Information Technology en_US
dc.title AN ENSEMBLE OF VECTOR QUANTIZATION AND CNN FOR DIGITAL BASED TEXT INDEPENDENT AMHARIC LANGUAGE SPEAKER RECOGNITION en_US
dc.type Thesis en_US
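
The best result in the abstract comes from classifying the ensemble features with VQ. In the classic VQ speaker-identification scheme, which the abstract appears to follow, each enrolled speaker is represented by a codebook clustered from that speaker's training feature vectors, and an unknown utterance is assigned to the speaker whose codebook quantizes its frames with the lowest average distortion. Below is a minimal Python sketch of that step only; the codebook size of 64, the per-frame feature input, and the names train_codebook, avg_distortion, and identify are illustrative assumptions rather than the thesis implementation, and the CNN feature branch is omitted.

import numpy as np
from scipy.cluster.vq import kmeans, vq

CODEBOOK_SIZE = 64  # assumed size; the abstract does not state one

def train_codebook(frames, k=CODEBOOK_SIZE):
    # frames: (n_frames, n_features) array of per-frame feature vectors
    # for one speaker; the k-means cluster centres form the speaker's codebook.
    codebook, _ = kmeans(frames.astype(float), k)
    return codebook

def avg_distortion(frames, codebook):
    # Mean Euclidean distance from each frame to its nearest codeword.
    _, dist = vq(frames.astype(float), codebook)
    return float(dist.mean())

def identify(frames, codebooks):
    # codebooks: {speaker_id: codebook}; return the speaker whose
    # codebook reproduces the utterance with the lowest distortion.
    return min(codebooks, key=lambda s: avg_distortion(frames, codebooks[s]))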

