dc.description.abstract |
This thesis addresses Automatic Music Transcription (AMT) for the Begena, a ten-stringed Ethiopian musical instrument. The primary aim is to overcome the challenges of manual music transcription, which is time-consuming, inconsistent, and costly, along with the limitations of existing AMT models and systems, such as the absence of scale identification, variations in notation, and low precision. A novel aspect of this research is the integration of Music Genre Classification (MGC) into AMT for the Begena. The investigation combines manual feature extraction methods and spectrogram-based experiments within the MGC model. The manual feature extraction approach employs several traditional machine learning algorithms and a Convolutional Neural Network (CNN) architecture.
Meanwhile, the spectrogram-based experiment uses Constant-Q Transform (CQT), Mel scale, and Mel-frequency cepstral coefficient (MFCC) features, trained with several deep learning algorithms. The MFCC-trained models perform best, achieving perfect accuracy (1.00) in both the validation and testing phases for all models. For the AMT task, three deep learning architectures, namely a CNN, a Convolutional Recurrent Neural Network (CRNN), and a Generative Adversarial Network (GAN), are implemented and trained on distinct dataset groups. The GAN model, designed to guide the CRNN model during training, proves the most effective, achieving frame-level F1 scores of 0.867 on validation and 0.860 on testing.
Furthermore, an attempt to refine AMT predictions through MGC-based auto-correction yields only minimal improvement. Analysis of the experiments reveals robust performance on certain playing styles, such as Qoutera songs, but weaker performance on Derib songs. The models show strengths in note identification, capturing note overlaps, and scale identification, while also exhibiting limitations in the form of note omissions and note-expansion errors. In conclusion, this research contributes to the advancement of AMT for traditional instruments such as the Begena. The study represents a starting point rather than an endpoint, leaving room for further improvement. Exploring more advanced architectures and refining the dataset, particularly its time-interval annotations, holds promise for greater accuracy and reliability. Adopting Connectionist Temporal Classification (CTC) is another promising avenue: by removing the need for explicit time-interval annotations, it could simplify the learning process and improve overall transcription performance.
Keywords: Automatic Music Transcription, Begena, Music Genre Classification, Convolutional Neural
Network, Convolutional Recurrent Neural Network, Generative Adversarial Network, Constant-Q
Transform, Mel scale, Mel-frequency cepstral coefficients. |
en_US |