AMHARIC LIGHTWEIGHT SPEECH SYNTHESIS SYSTEM FOR BLIND AND VISUALLY IMPAIRED PEOPLE

Addis, Workneh Teklie

URI: http://ir.bdu.edu.et/handle/123456789/15451

Date: 2022-07-29

Abstract:

A lightweight, on-device Amharic speech synthesizer based on state-of-the-art techniques would be a significant contribution to language technology, since speech synthesis is used in services such as audio navigation, newscasting, tourism interpretation, and reading for the blind. The aim of this study is to develop a lightweight speech synthesis system for the Amharic language that runs on a Raspberry Pi and on a mobile phone CPU using LPCNet. In an LPCNet speech synthesis system, errors in the Bark-frequency cepstral coefficients (BFCC) are tolerated better than errors in the fundamental frequency (F0), and for audio samples with a signal-to-noise ratio (SNR) below 10 dB the accuracy of the pitch estimator drops significantly. We therefore also developed an easy-to-use speech dataset analysis algorithm based on waveform amplitude distribution analysis (WADA). To carry out the study, we first prepared an audio dataset containing 3055 audio samples with their corresponding transcriptions. We then analyzed the dataset with our WADA-based algorithm to exclude bad audio samples, extracted features, and trained LPCNet. During synthesis, LPCNet computes 16 linear predictive coefficients, which represent the vocal tract response, computes the excitation source using a neural network, and adds the two according to the source-filter model to output the predicted speech. The proposed system was trained for 7 epochs with a batch size of 50, a learning rate of 0.01, and a decay rate of 5×10⁻⁵. Finally, to evaluate performance we used the most common subjective evaluation method, and the proposed system achieved mean opinion scores (MOS) of 3.95 for intelligibility and 3.6 for naturalness.
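The source-filter step described above can be illustrated with a minimal sketch. It assumes the usual linear-prediction recurrence, in which each output sample is the sum of a prediction from past outputs and an excitation sample. The function name `lpc_synthesize` and the plain-list excitation are hypothetical stand-ins for illustration only; in the actual system the excitation comes from the neural network, and this is not the thesis code.

```python
def lpc_synthesize(lpc_coeffs, excitation):
    """Source-filter synthesis: s[t] = sum_k a_k * s[t-k] + e[t].

    lpc_coeffs: linear predictive coefficients a_1..a_p (p = 16 in the abstract).
    excitation: excitation samples e[t]; here a plain list (assumption),
                standing in for the neural network output.
    """
    order = len(lpc_coeffs)
    history = [0.0] * order  # past output samples, most recent first
    output = []
    for e_t in excitation:
        # Linear prediction from previous outputs (vocal tract filter).
        prediction = sum(a * s for a, s in zip(lpc_coeffs, history))
        # Add the excitation (source) to obtain the new speech sample.
        s_t = prediction + e_t
        output.append(s_t)
        if order:
            history = [s_t] + history[:-1]
    return output


# Toy example: a 16-tap filter with one non-zero coefficient and an impulse input.
coeffs = [0.5] + [0.0] * 15
samples = lpc_synthesize(coeffs, [1.0, 0.0, 0.0])  # [1.0, 0.5, 0.25]
```

Because the filter part is a short weighted sum per sample, the neural network only has to model the excitation, which is one reason this kind of synthesizer can be light enough for small CPUs.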
We also compared the proposed system with a speech synthesizer based on a high-fidelity generative adversarial network (HiFi-GAN) by training Tacotron2 for 6000 iterations. With the same training duration, the HiFi-GAN based synthesizer achieved MOS of 4.1 for intelligibility and 3.75 for naturalness, but we were not able to run it on the Raspberry Pi due to its computational complexity.

Keywords: On-device TTS, Amharic Text to speech, Amharic reading system, Amharic speech synthesis
