SPECTROGRAM IMAGE ASSISTED SPEAKER INDEPENDENT GE’EZ LANGUAGE PRONUNCIATION CLASSIFICATION

Bekalu, Mogne

URI: http://ir.bdu.edu.et/handle/123456789/15454

Date: 2023-03-15

Abstract:

Language is the medium through which people communicate in everyday life. Ge’ez is a classical language of Ethiopia in which ancient histories and manuscripts were written. Ge’ez has four word pronunciation categories: Tenesh (ተነሽ), Tetay (ተጣይ), Wodaki (ወዳቂ), and Seyaf (ሰያፍ). Each has its own manner of utterance that distinguishes it from the others, and knowing a word's pronunciation category is the first requirement for reading Ge’ez scripts proficiently. This study aims to reduce the difficulty of assigning Ge’ez words to their correct pronunciation category through spectrogram-assisted Ge’ez language pronunciation classification. A total of 2308 word utterances were recorded directly from Ge’ez students and Ge’ez experts (three men and three women) using an Infinix Smart 4 at a sampling rate of 16 kHz. Because the recording environment was uncontrolled, MMSE noise removal was applied to reduce the noise. FFT and STFT were used to generate the spectrogram and the Mel spectrogram, respectively. Before classification, preprocessing was performed both on the audio and on the generated spectrogram images. MFCCs (Mel-frequency cepstral coefficients) extracted from the enhanced WAV files and GLCM texture features extracted from the spectrogram images were used as features. An SVM with combined MFCC features (MFCC, delta MFCC, and delta-delta MFCC) reached an accuracy of 83.116%. With the GLCM texture features the accuracy was 70.04%, and with combined MFCC and GLCM texture features it was 88.96%. A SoftMax classifier using combined texture and MFCC features reached 85.93%, and a KNN classifier using the same combined features reached 84.55%. An SVM with MFCC and textural features achieved the highest reported accuracy, 91.12%.

Key Words: GLCM, Spectrogram, MFCC, SoftMax, pronunciation classification
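
The abstract does not say how the GLCM (gray-level co-occurrence matrix) texture features were configured. As an illustration only, the sketch below computes a normalized GLCM and three common texture descriptors for a small quantized image, using only the Python standard library. The function names `glcm` and `texture_features`, the one-pixel horizontal offset, the symmetric counting, and the choice of descriptors (contrast, angular second moment, homogeneity) are assumptions made here; they are not taken from the thesis.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy).

    `image` is a list of rows holding integer gray levels in [0, levels).
    The offset and the symmetric counting are illustrative assumptions.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    # Count the pair in both directions.
                    counts[j][i] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, angular second moment (ASM), and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


if __name__ == "__main__":
    # Toy 4x4 image with 4 gray levels (not data from the thesis).
    toy = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 2, 2, 2],
        [2, 2, 3, 3],
    ]
    print(texture_features(glcm(toy, levels=4)))
```

In a pipeline like the one described, a spectrogram image would first be quantized to a small number of gray levels, and the resulting texture descriptors would be concatenated with the MFCC-based vector before being passed to a classifier such as an SVM.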
