BDU IR

AMHARIC LIGHTWEIGHT SPEECH SYNTHESIS SYSTEM FOR BLIND AND VISUALLY IMPAIRED PEOPLE

dc.contributor.author Addis, Workneh Teklie
dc.date.accessioned 2023-07-04T07:22:42Z
dc.date.available 2023-07-04T07:22:42Z
dc.date.issued 2022-07-29
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/15451
dc.description.abstract A lightweight, on-device Amharic speech synthesizer based on state-of-the-art techniques would be a significant advance in language technology, since speech synthesis has many uses in services such as audio navigation, newscasting, tourism interpretation, and reading for the blind. The aim of this study is to develop a lightweight speech synthesis system for the Amharic language that runs on a Raspberry Pi and on a mobile phone CPU using LPCNet. In an LPCNet speech synthesis system, errors in the Bark-frequency cepstral coefficients (BFCC) are tolerated better than errors in the fundamental frequency (F0), and for audio samples with a signal-to-noise ratio (SNR) below 10 dB the accuracy of the pitch estimator drops significantly. We therefore developed an easy-to-use speech dataset analysis algorithm based on waveform amplitude distribution analysis (WADA). To carry out the study, we first prepared an audio dataset containing 3055 audio samples with their corresponding transcriptions. We then analyzed the dataset using our WADA-based algorithm to exclude bad audio samples, extracted features, and trained LPCNet. During the synthesis phase, LPCNet computes 16 linear predictive coefficients, which represent the vocal tract response, computes the excitation source using a neural network, and adds the two according to the source-filter model to output the predicted speech. The proposed system was trained for 7 epochs using a batch size of 50, a learning rate of 0.01, and a decay rate of 5×10⁻⁵. Finally, to evaluate performance we used the mean opinion score (MOS), the most commonly used subjective evaluation method, and the proposed system achieved MOS results of 3.95 and 3.6 for intelligibility and naturalness, respectively.
We also compared the proposed system with a high-fidelity generative adversarial network (HiFi-GAN) based speech synthesizer by training Tacotron2 for 6000 iterations. With the same training duration, the HiFi-GAN based synthesizer achieved MOS results of 4.1 and 3.75 for intelligibility and naturalness, respectively, but we were not able to run it on the Raspberry Pi due to its computational complexity. Keywords: On-device TTS, Amharic text to speech, Amharic reading system, Amharic speech synthesis en_US
dc.language.iso en_US en_US
dc.subject Electrical and Computer Engineering en_US
dc.title AMHARIC LIGHTWEIGHT SPEECH SYNTHESIS SYSTEM FOR BLIND AND VISUALLY IMPAIRED PEOPLE en_US
dc.type Thesis en_US
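
The source-filter step described in the abstract (a linear prediction from past samples plus an excitation signal) can be illustrated with a minimal sketch. This is not the thesis code or the LPCNet implementation: the function name `lpc_synthesize` is hypothetical, the excitation is passed in as a plain list rather than produced by a neural network, and a single fixed set of coefficients is used where a real system would update them frame by frame.

```python
from typing import List, Sequence


def lpc_synthesize(excitation: Sequence[float],
                   lpc_coeffs: Sequence[float]) -> List[float]:
    """Source-filter synthesis sketch: s[t] = e[t] + sum_k a[k] * s[t-k].

    excitation: the excitation signal e[t] (assumed given here; in the
                abstract it is produced by a neural network).
    lpc_coeffs: linear predictive coefficients a[1..M]; the abstract
                uses M = 16, but any order works in this sketch.
    """
    order = len(lpc_coeffs)
    output: List[float] = []
    for t, e in enumerate(excitation):
        # Linear prediction from up to `order` previously generated samples.
        prediction = 0.0
        for k in range(1, order + 1):
            if t - k >= 0:
                prediction += lpc_coeffs[k - 1] * output[t - k]
        # Adding the excitation to the prediction gives the output sample.
        output.append(prediction + e)
    return output


if __name__ == "__main__":
    # A unit impulse through a first-order filter decays geometrically.
    print(lpc_synthesize([1.0, 0.0, 0.0], [0.5]))  # [1.0, 0.5, 0.25]
```

With all 16 coefficients set to zero the prediction vanishes and the output equals the excitation, which is a quick sanity check on the sign convention assumed here.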

