BDU IR

END TO END SPEECH RECOGNITION FOR AMHARIC LANGUAGE USING DEEP LEARNING

dc.contributor.author Yohannes, Ayana
dc.date.accessioned 2024-04-19T08:32:03Z
dc.date.available 2024-04-19T08:32:03Z
dc.date.issued 2023-07
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/15771
dc.description.abstract Speech recognition, also known as automatic speech recognition (ASR), is a technology that enables software to transcribe spoken language into text. Traditional ASR systems, however, are built from several separate components, such as language, acoustic, and pronunciation models with dictionaries, which are time-consuming to build and can limit performance. This study proposes an approach that replaces much of the speech pipeline with a single recurrent neural network (RNN) architecture. The proposed architecture is a hybrid that combines a convolutional neural network (CNN) with a recurrent neural network (RNN) and is trained with a connectionist temporal classification (CTC) loss function. We performed three main experiments using different datasets: one with clean audio data consisting of 576,656 valid sentences, another with noisy audio data containing 20,000 valid sentences, and a third that combined both datasets, resulting in 596,656 valid sentences. The system was evaluated using the word error rate (WER) metric and achieved 2% WER on noise-free data, 7% WER on noisy data, and 5% WER on the combined data. Because the approach does not depend on hand-built dictionaries, it reduces the human effort required to develop an ASR system and can improve the efficiency and accuracy of such systems, making them more practical for real-world applications. For future work, we suggest adding dialectal and spontaneous speech to the dataset. Fine-tuning the model on specific tasks could also tailor its performance to particular objectives or domains. en_US
dc.language.iso en_US en_US
dc.subject Software Engineering en_US
dc.title END TO END SPEECH RECOGNITION FOR AMHARIC LANGUAGE USING DEEP LEARNING en_US
dc.type Thesis en_US
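
The abstract reports its results as word error rate (WER). As an illustration only, the sketch below shows the standard definition of that metric: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a system hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the record does not describe the thesis's own evaluation code or text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace. Real evaluations often
    normalize punctuation and spelling first; that step is not shown here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a hypothesis that gets one word wrong in a four-word reference scores 0.25, so a reported WER of 2% corresponds to roughly one word error per fifty reference words.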

