Abstract:
Speech recognition, also known as automatic speech recognition (ASR), is a technology
that enables software to transcribe spoken language into text. Traditional ASR methods, however, require multiple separately built components, such as acoustic, pronunciation, and language models with hand-crafted dictionaries, which are time-consuming to construct and can limit performance. This study proposes replacing much of that pipeline with a single end-to-end neural architecture: a hybrid model that combines a convolutional neural network (CNN) with a recurrent neural network (RNN), trained with a connectionist temporal classification (CTC) loss function. We
perform three main experiments on different datasets: one with clean audio comprising 576,656 valid sentences, another with noisy audio containing 20,000 valid sentences, and a third that combines both, for a total of 596,656 valid sentences. The system was evaluated using the word error rate (WER) metric,
achieving 2% WER on the clean data, 7% WER on the noisy data, and 5% WER on the combined data. This approach has significant implications for speech recognition: it reduces the human effort required to create pronunciation dictionaries and improves the efficiency and accuracy of ASR systems, making them more practical for real-world applications. For future work, we suggest including dialectal and spontaneous speech in the dataset; fine-tuning the model on specific tasks or domains could further tailor its performance and improve its effectiveness in those areas.
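
To make the hybrid CNN-RNN-CTC architecture concrete, the following is a minimal, hypothetical PyTorch sketch of such a model. All layer sizes, the 29-character alphabet, the mel-spectrogram input shape, and the names introduced here (CNNRNNCTC, n_mels, hidden, n_chars) are illustrative assumptions, not the configuration reported in this study.

```python
import torch
import torch.nn as nn

class CNNRNNCTC(nn.Module):
    """Sketch of a hybrid CNN + bidirectional RNN acoustic model trained
    with CTC. All sizes are illustrative assumptions, not the paper's."""

    def __init__(self, n_mels=80, hidden=256, n_chars=29):
        super().__init__()
        # CNN front end: local spectro-temporal feature extraction
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            # stride (2, 1): halve the frequency axis, keep the time axis
            nn.Conv2d(32, 32, kernel_size=3, stride=(2, 1), padding=1),
            nn.ReLU(),
        )
        # Bidirectional GRU models long-range temporal context
        self.rnn = nn.GRU(32 * (n_mels // 2), hidden, num_layers=2,
                          bidirectional=True, batch_first=True)
        # Per-frame log-probabilities over characters plus the CTC blank
        self.fc = nn.Linear(2 * hidden, n_chars + 1)

    def forward(self, x):              # x: (batch, 1, n_mels, time)
        f = self.conv(x)               # (batch, 32, n_mels // 2, time)
        b, c, m, t = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, t, c * m)
        out, _ = self.rnn(f)           # (batch, time, 2 * hidden)
        return self.fc(out).log_softmax(dim=-1)

# One illustrative training step; nn.CTCLoss expects (time, batch, classes)
model = CNNRNNCTC()
ctc = nn.CTCLoss(blank=29, zero_infinity=True)
spec = torch.randn(4, 1, 80, 200)          # batch of log-mel spectrograms
log_probs = model(spec).permute(1, 0, 2)   # (time=200, batch=4, classes=30)
targets = torch.randint(0, 29, (4, 30))    # character indices (0..28)
input_lengths = torch.full((4,), 200, dtype=torch.long)
target_lengths = torch.full((4,), 30, dtype=torch.long)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

At inference time, a greedy CTC decode (take the per-frame argmax, collapse repeated characters, drop blanks) or a beam search converts the frame-level log-probabilities into a character sequence, which is why no separate pronunciation dictionary is needed.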