BDU IR

Amharic Text-to-Speech Synthesis Using Deep Learning Approach

dc.contributor.author Tadele, Demeke Sntie
dc.date.accessioned 2024-12-06T07:57:40Z
dc.date.available 2024-12-06T07:57:40Z
dc.date.issued 2024-12
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/16309
dc.description.abstract Text-to-Speech (TTS) synthesis generates artificial speech from text for a variety of uses, including telephone services, reading electronic documents aloud, and speaking aids for people with disabilities. TTS models are currently available for many languages, such as English, Afan Oromo, Tigrigna, and Welaytta. However, research on the Amharic language is extremely rare, and the work that exists suffers from some limitations. Written text in any language contains both standard words (SWs) and nonstandard words (NSWs), such as numbers, abbreviations, currency amounts, and dates. NSWs cannot be handled by applying "letter-to-sound" rules alone. Previous work converted text to speech using rule-based and Hidden Markov Model (HMM) approaches. The main problem with HMM-based synthesis is that certain features are hard-coded by humans, and these are not necessarily the best features for synthesizing speech. To solve this problem, we used LSTM and BiLSTM deep learning approaches, because deep learning can learn complex patterns in data and synthesize speech without hand-coded features. The LSTM model achieved a mel-cepstral distortion (MCD), mean squared error (MSE), and mean absolute error (MAE) of 0.2961, 0.0940, and 0.2474, respectively, and the BiLSTM model achieved 0.2910, 0.0916, and 0.2400, respectively. On these objective metrics, the BiLSTM model performs better. For subjective evaluation, the mean opinion score (MOS) was used to rate intelligibility and naturalness, which scored 4.14 and 3.93, respectively. Keywords: Text-to-Speech translation, Deep learning, long short-term memory (LSTM), and Bidirectional LSTM. en_US
dc.language.iso en_US en_US
dc.subject Information Technology en_US
dc.title Amharic Text-to-Speech Synthesis Using Deep Learning Approach en_US
dc.type Thesis en_US

