Constructing a Model for Ethiopian Traditional Musical Instrument Classification Using Deep Learning Approach

dc.contributor.author Sewunet, Mosie Fentahune
dc.date.accessioned 2024-12-05T07:54:19Z
dc.date.available 2024-12-05T07:54:19Z
dc.date.issued 2024-02
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/16287
dc.description.abstract Music is a form of communication that expresses cultural assumptions about social relations and defines subcultures, organizations, classes, or nations. Previous studies have classified the sounds of both traditional and modern musical instruments, but most focus on solo instruments played on the diatonic music scale, while work on polyphonic material concentrates mainly on recognizing the predominant instruments. In addition, there is little research on, and no available dataset for, classifying Ethiopian traditional musical instrument sounds (ETMIS). To address this gap, we developed classification models for ETMIS using a convolutional neural network (CNN), long short-term memory (LSTM), and bidirectional LSTM (BILSTM). The task is a multilabel classification problem: the model must identify every musical instrument present in a given audio recording. We collected 5,104 audio files from YouTube and the Bahirdar Mahber Kidusan Training Center; the dataset was annotated by music experts at the Mulalem Culture Center. Our study examines the effects of noise reduction methods, audio normalization techniques, hyperparameter values, audio segment duration, feature extraction methods, and augmentation techniques on model performance. Augmentation increased the dataset from 5,104 to 8,500 instances. In experiments comparing Mel-frequency cepstral coefficients (MFCC), Delta MFCC, Delta2 MFCC, and their concatenation, plain MFCC yielded the best results for the CNN, LSTM, and BILSTM models. Dropout and early stopping were employed to prevent overfitting. The CNN model achieved the highest test accuracy, 96%, after augmentation without noise reduction, while the LSTM and BILSTM models achieved test accuracies of 93% and 94%, respectively. Precision values for the CNN, BILSTM, and LSTM models were 94%, 92%, and 91%, and recall values were 94%, 93%, and 93%, respectively. Based on these evaluations, we conclude that the CNN model is the most suitable choice for the Ethiopian traditional musical instrument sound dataset. The model lists every instrument played in a recording; it does not describe what an instrument looks like or characterize the sound it makes. Keywords: ETMIS, MFCC, Delta MFCC, Delta2 MFCC, LSTM, CNN, BILSTM, and WAV. en_US
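
For readers unfamiliar with the features the abstract compares, the following is a minimal Python sketch of MFCC, Delta MFCC, Delta2 MFCC, and concatenated extraction using librosa. The parameter values (22,050 Hz sample rate, 3-second segments, 40 coefficients) are illustrative assumptions, not the settings reported in the thesis.

import numpy as np
import librosa

def extract_features(path, sr=22050, duration=3.0, n_mfcc=40):
    # Load a fixed-length mono segment of the WAV file.
    y, sr = librosa.load(path, sr=sr, duration=duration, mono=True)
    # Zero-pad short clips so every segment yields the same feature shape.
    target_len = int(sr * duration)
    if len(y) < target_len:
        y = np.pad(y, (0, target_len - len(y)))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # static MFCCs
    delta = librosa.feature.delta(mfcc)                     # first derivative
    delta2 = librosa.feature.delta(mfcc, order=2)           # second derivative
    # The abstract's "concatenation" variant stacks all three feature sets;
    # the study found plain MFCC performed best.
    concat = np.concatenate([mfcc, delta, delta2], axis=0)
    return mfcc, delta, delta2, concat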
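The abstract reports that augmentation grew the dataset from 5,104 to 8,500 instances but does not name the techniques used. The sketch below shows three waveform augmentations commonly used for audio classification (time stretching, pitch shifting, additive noise); these specific transforms and their parameters are assumptions for illustration only.

import numpy as np
import librosa

def augment(y, sr):
    # Speed up playback by 10% without changing pitch.
    stretched = librosa.effects.time_stretch(y, rate=1.1)
    # Shift pitch up by two semitones without changing tempo.
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)
    # Inject low-level Gaussian noise.
    noisy = y + 0.005 * np.random.randn(len(y))
    return [stretched, shifted, noisy]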
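To make the model description concrete, here is a hedged Keras sketch of a multilabel CNN classifier consistent with the abstract: sigmoid outputs so several instruments can be detected in one clip, with dropout and early stopping against overfitting. The layer sizes, number of classes, and input shape are illustrative assumptions, not the thesis's reported architecture.

import tensorflow as tf
from tensorflow.keras import layers, models, callbacks

NUM_CLASSES = 8             # hypothetical number of instrument labels
INPUT_SHAPE = (40, 130, 1)  # assumed n_mfcc x frames x channel for 3-s segments

model = models.Sequential([
    layers.Input(shape=INPUT_SHAPE),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),  # dropout, as mentioned in the abstract
    # Sigmoid rather than softmax: each instrument is an independent
    # yes/no label, which is what makes the task multilabel.
    layers.Dense(NUM_CLASSES, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",  # standard multilabel loss
              metrics=["binary_accuracy"])

# Early stopping, also mentioned in the abstract.
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[early_stop])

With binary cross-entropy and per-label sigmoid outputs, the network can output "washint and kebero both present" for a single clip, matching the abstract's goal of identifying all instruments played in the audio.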
dc.language.iso en_US en_US
dc.subject Computer Science en_US
dc.title Constructing a Model for Ethiopian Traditional Musical Instrument Classification Using Deep Learning Approach en_US
dc.type Thesis en_US

