Abstract:
Language is the medium through which humans communicate in everyday life. Ge’ez is a
classical language of Ethiopia in which ancient histories and manuscripts were written.
Ge’ez has four word pronunciation categories, Tenesh/ተነሽ, Tetay/ተጣይ, Wodaki/ወዳቂ, and
Seyaf/ሰያፍ, each with a distinct manner of utterance that sets it apart from the others.
The first thing anyone must learn to read Ge’ez scripts proficiently is the pronunciation
category of each word. This study is proposed
to reduce the difficulty of assigning Ge’ez words to their correct pronunciation category
through spectrogram-assisted Ge’ez pronunciation classification. A total of 2308 word
utterances were recorded directly from Ge’ez students and Ge’ez experts (three men and
three women) using an Infinix Smart 4 smartphone at a sampling rate of 16 kHz. Because the
recording environment was uncontrolled, we applied MMSE (minimum mean-square error) noise
removal to mitigate the noise. The FFT and the STFT were used to generate the spectrogram
and the Mel spectrogram, respectively. Before classification, preprocessing was applied
both to the audio and to the generated spectrogram images. Two feature sets were extracted:
MFCCs (Mel-frequency cepstral coefficients) from the enhanced WAV files, and texture
features from the spectrogram images using the GLCM (gray-level co-occurrence matrix). SVM
with the combined MFCC features (MFCC, delta MFCC, and delta-delta MFCC) achieved an
accuracy of 83.116%, and with the GLCM texture features an accuracy of 70.04%. Combining
the MFCC and GLCM texture features gave an accuracy of 88.96%. With the combined texture
and MFCC features, a SoftMax classifier achieved 85.93% and a KNN classifier 84.55%. SVM
with MFCC and textural features achieved the highest accuracy, 91.12%.
Key Words: GLCM, Spectrogram, MFCC, SoftMax, pronunciation classification
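
As an illustration of the GLCM texture features named above, the following minimal sketch builds a normalized, symmetric co-occurrence matrix for a single horizontal offset and derives three common texture statistics from it. It is not the study's implementation: the 4x4 patch, the number of gray levels, the offset, the choice of statistics, and the 39-dimensional MFCC placeholder (13 MFCC + 13 delta + 13 delta-delta) are all assumptions made for the example.

```python
from itertools import product


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r, c in product(range(rows), range(cols)):
        r2, c2 = r + dy, c + dx
        if 0 <= r2 < rows and 0 <= c2 < cols:
            i, j = image[r][c], image[r2][c2]
            counts[i][j] += 1.0
            counts[j][i] += 1.0  # count both directions so the matrix is symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, angular second moment, and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


# Hypothetical spectrogram patch, already quantized to 4 gray levels (0..3).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(patch, levels=4)
features = texture_features(matrix)

# Placeholder MFCC vector (assumed 13 MFCC + 13 delta + 13 delta-delta values);
# in practice these would come from the enhanced audio, not zeros.
mfcc_vector = [0.0] * 39

# Feature-level combination: concatenate both feature sets into one vector
# that a classifier such as an SVM could consume.
combined = mfcc_vector + list(features.values())
```

Concatenation is only one way to combine the two feature sets; since MFCC and texture values live on different scales, a real pipeline would normally standardize the features before classification.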