Abstract:
The development of a lightweight, on-device Amharic speech synthesizer based on state-of-the-art
techniques would be a significant advance in Amharic language technology, as speech synthesis has
many uses in services such as audio navigation, newscasting, tourism interpretation, and reading
for the blind. The aim of this study is to develop a lightweight speech synthesis system for the
Amharic language that runs on a Raspberry Pi and on mobile phone CPUs using LPCNet. In the LPCNet
speech synthesis system, errors in the Bark-frequency cepstral coefficients (BFCCs) are tolerated
better than errors in the fundamental frequency (F0), and for audio samples with a signal-to-noise
ratio (SNR) below 10 dB the accuracy of the pitch estimator drops significantly. We therefore
developed an easy-to-use speech dataset analysis algorithm based on waveform amplitude
distribution analysis (WADA). To achieve the aim of this study, we first prepared an audio dataset
containing 3055 audio samples with their corresponding transcriptions. We then analyzed the
dataset with our WADA-based algorithm to discard bad audio samples, extracted features, and
trained LPCNet. During synthesis, LPCNet computes 16 linear predictive coefficients, which
represent the vocal tract response, and computes the excitation source with a neural network; it
then adds the two according to the source-filter model to output the predicted speech.
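The WADA-based screening step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it follows the model of Kim and Stern's WADA-SNR method, in which clean-speech amplitudes follow a gamma distribution with shape 0.4 and the noise is additive white Gaussian. Instead of a published lookup table it builds its own table by Monte Carlo simulation. The function names, the -20 to 60 dB grid, and the synthetic test signal are hypothetical; only the 10 dB cut-off used in `keep_clip` comes from the text above.

```python
import numpy as np

# Assumptions (WADA-SNR model): clean-speech amplitudes ~ Gamma(shape=0.4),
# noise is additive white Gaussian and independent of the speech.
ALPHA = 0.4
SNR_GRID_DB = np.arange(-20.0, 61.0, 1.0)  # hypothetical table range (dB)


def wada_statistic(z: np.ndarray, eps: float = 1e-10) -> float:
    """G_z = log(E|z|) - E[log|z|]; grows with SNR and is gain-invariant (up to eps)."""
    a = np.abs(np.asarray(z, dtype=np.float64)) + eps
    return float(np.log(a.mean()) - np.log(a).mean())


def synthetic_clip(snr_db: float, n: int, seed: int) -> np.ndarray:
    """Hypothetical test signal: gamma-distributed 'speech' plus Gaussian noise at snr_db."""
    rng = np.random.default_rng(seed)
    speech = rng.gamma(ALPHA, 1.0, n) * rng.choice([-1.0, 1.0], n)
    speech /= np.sqrt(np.mean(speech ** 2))  # unit speech power
    noise = rng.standard_normal(n)
    noise /= np.sqrt(np.mean(noise ** 2))    # unit noise power
    return speech + noise * 10.0 ** (-snr_db / 20.0)


def build_table(n: int = 200_000, seed: int = 0) -> np.ndarray:
    """Monte Carlo stand-in for a G_z-versus-SNR lookup table.

    The same seed is reused for every SNR, so only the noise scale changes
    between grid points, which keeps the table smooth.
    """
    table = np.array([wada_statistic(synthetic_clip(s, n, seed)) for s in SNR_GRID_DB])
    return np.maximum.accumulate(table)      # enforce monotonicity for np.interp


def estimate_snr_db(z: np.ndarray, table: np.ndarray) -> float:
    """Invert the table by linear interpolation (clamped to the grid ends)."""
    return float(np.interp(wada_statistic(z), table, SNR_GRID_DB))


def keep_clip(z: np.ndarray, table: np.ndarray, min_snr_db: float = 10.0) -> bool:
    """Keep a clip only if its estimated SNR reaches the pitch-tracking limit."""
    return estimate_snr_db(z, table) >= min_snr_db


if __name__ == "__main__":
    table = build_table()
    for snr in (0.0, 5.0, 20.0, 30.0):
        clip = synthetic_clip(snr, 32_000, seed=1)
        print(snr, round(estimate_snr_db(clip, table), 1), keep_clip(clip, table))
```

Because the statistic is insensitive to overall gain, recordings made at different levels can be screened with one table; real speech with long silences departs from the gamma model, so a practical tool would likely need voice-activity handling on top of this sketch.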
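The source-filter step can be illustrated with a small NumPy sketch. It is not LPCNet's code. In LPCNet the 16 LPCs are derived from each frame's BFCC features, and the excitation is sampled from the neural network. Here, by contrast, the coefficients are estimated directly from a waveform with the Levinson-Durbin recursion, and the "excitation" is the true prediction residual, so adding the linear prediction back reproduces the signal exactly. The AR(2) test signal and all function names are hypothetical.

```python
import numpy as np

ORDER = 16  # number of linear predictive coefficients, as stated in the abstract


def autocorr(frame: np.ndarray, max_lag: int) -> np.ndarray:
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(max_lag + 1)])


def levinson_durbin(r: np.ndarray, order: int) -> np.ndarray:
    """Solve the normal equations; returns a_1..a_p such that p_t = sum_k a_k * s_{t-k}."""
    a = np.zeros(order + 1)
    a[0] = 1.0                      # error filter A(z) = 1 + sum_k a[k] z^-k
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err              # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k          # remaining prediction-error energy
    return -a[1:]                   # flip sign to get predictor coefficients


def residual(signal: np.ndarray, lpc: np.ndarray) -> np.ndarray:
    """Excitation e_t = s_t - p_t (inverse filtering, zero initial history)."""
    p = len(lpc)
    padded = np.concatenate([np.zeros(p), signal])
    pred = np.array([np.dot(lpc, padded[t : t + p][::-1]) for t in range(len(signal))])
    return signal - pred


def synthesize(excitation: np.ndarray, lpc: np.ndarray) -> np.ndarray:
    """Source-filter synthesis: s_t = p_t + e_t, with p_t computed from past outputs."""
    p = len(lpc)
    s = np.zeros(len(excitation) + p)
    for t, e in enumerate(excitation):
        s[t + p] = np.dot(lpc, s[t : t + p][::-1]) + e
    return s[p:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stable AR(2) "vocal tract" driven by white noise.
    signal = synthesize(rng.standard_normal(4000), np.array([1.3, -0.8]))
    lpc = levinson_durbin(autocorr(signal, ORDER), ORDER)
    exc = residual(signal, lpc)
    print(np.allclose(synthesize(exc, lpc), signal))  # prediction + excitation = signal
    print(np.sum(exc ** 2) < np.sum(signal ** 2))     # residual carries less energy
```

The low-energy, noise-like residual is what makes the split attractive for on-device use: the linear filter handles the spectral envelope cheaply, leaving the network to model only the excitation.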
The proposed system was trained for 7 epochs with a batch size of 50, a learning rate of 0.01, and
a decay rate of 5 × 10⁻⁵. Finally, to evaluate performance we used the most commonly used
subjective evaluation method, the mean opinion score (MOS); the proposed system achieved MOS
results of 3.95 for intelligibility and 3.6 for naturalness. We also compared the proposed system
with a speech synthesizer based on the high-fidelity generative adversarial network (HiFi-GAN),
for which Tacotron2 was trained for 6000 iterations with the same training duration. The
HiFi-GAN-based synthesizer achieved MOS results of 4.1 for intelligibility and 3.75 for
naturalness, but we were not able to run it on the Raspberry Pi due to its computational
complexity.
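The abstract does not say how the decay rate is applied. Assuming it is inverse-time decay per optimizer step, lr_t = lr_0 / (1 + d·t), which is how the legacy Keras optimizers interpret their `decay` argument, the effective learning rate during training can be computed as below. The number of steps per epoch depends on how the dataset is sliced into training sequences and is not given, so `steps_per_epoch` is a hypothetical value.

```python
def learning_rate(step: int, lr0: float = 0.01, decay: float = 5e-5) -> float:
    """Assumed inverse-time decay: lr_t = lr0 / (1 + decay * t), t = optimizer steps so far."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return lr0 / (1.0 + decay * step)


if __name__ == "__main__":
    steps_per_epoch = 2_000       # hypothetical; depends on the sequence slicing
    for epoch in range(1, 8):     # 7 epochs, as in the abstract
        print(epoch, f"{learning_rate(epoch * steps_per_epoch):.6f}")
```

Under this schedule the rate halves after 1/d = 20,000 steps, so whether the decay matters over 7 epochs depends entirely on the (unstated) number of steps per epoch.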
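MOS is the arithmetic mean of listener ratings on a five-point absolute category rating scale (1 = bad, 5 = excellent). The sketch below computes it together with a normal-approximation 95% confidence interval, a common companion statistic that the abstract does not report. The ratings in the example are made up for illustration and are not the study's data.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings: list[int], z: float = 1.96) -> tuple[float, float]:
    """Return (MOS, half-width of the normal-approximation 95% confidence interval)."""
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    m = float(mean(ratings))
    half = z * stdev(ratings) / math.sqrt(len(ratings)) if len(ratings) > 1 else 0.0
    return m, half


if __name__ == "__main__":
    intelligibility = [4, 4, 5, 3, 4, 4, 5, 3]  # hypothetical listener scores
    m, h = mos_with_ci(intelligibility)
    print(f"MOS = {m:.2f} ± {h:.2f}")
```

Reporting the interval alongside the mean helps judge whether gaps such as 3.95 versus 4.1 exceed the spread across listeners.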
Keywords: On-device TTS, Amharic text-to-speech, Amharic reading system,
Amharic speech synthesis