Abstract:
Machine learning approaches are applied across many disciplines, and the approach used in each area is implemented with a supervised or unsupervised learning method. A new and rapidly growing research area, called Music Information Retrieval (MIR), has emerged with the digitalization of music; it emphasizes the extraction of information from music audio and musical notes. This technology focuses on categorizing given audio music into several classes based on its characteristics. It is a rich research area that includes genre classification, song identification, chord recognition, sound event detection, and mood detection.
Zema is defined as tactical shouting that produces a sweet song with zema notation for listeners. Zema classification is one category of MIR, defined as the technique of grouping audio zema into appropriate classes. The first composer of this spiritual melody was St. Yared, who established three Zema forms: Geez, Ezil, and Araray. He gave six compositions of zema and stated the features of each. Kum Zema is one of his compositions; it is sung with the voice alone, without instruments such as the Kebero, Tsinatsil, and Mekuamia.
The main motivations for conducting this study were that most of the flock, as well as some disciples who passed through traditional school, cannot properly identify each zema genre, and that a knowledge gap exists between modern and traditional education on zema genres. Moreover, most previous studies were carried out on data that lack inter- as well as intra-class similarity within the dataset. Our dataset was prepared from recorded audio Zema collected from experts. Each audio zema was segmented into equal clips of 10 seconds, and each segment was converted into a visual representation called a spectrogram.
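The segmentation and spectrogram steps described above can be sketched as follows. This is a minimal illustration using NumPy only; the function names, frame sizes, and sample rate are our own assumptions for the sketch, not parameters reported by the study:

```python
import numpy as np

def segment_audio(samples, sr, clip_seconds=10):
    """Split a 1-D waveform into equal clips of clip_seconds;
    any shorter remainder at the end is dropped."""
    clip_len = sr * clip_seconds
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

def spectrogram(clip, n_fft=512, hop=256):
    """Magnitude spectrogram: a Hann-windowed FFT over overlapping frames,
    returned as a (frequency_bins, time_frames) array."""
    window = np.hanning(n_fft)
    frames = [clip[i:i + n_fft] * window
              for i in range(0, len(clip) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T
```

The resulting 2-D array can then be rendered or resized to the fixed image size expected by the network's input layer.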
We applied a convolutional neural network (CNN) for classification because of its strong performance in image processing. The spectrogram, at a specified size, becomes the input to the CNN, and each layer of the network filters the image. Features are extracted from the spectrogram, and finally a softmax classifier assigns the input audio to one of the three classes. The research method we used is experimental, and our model, SYKZC, achieved 98% training accuracy and 88% testing accuracy.
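The final softmax step, which maps the CNN's output scores to probabilities over the three Zema classes, can be sketched as follows. This is a minimal illustration; the logit values are made up, and only the class names come from the abstract:

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# The three Zema forms named in the abstract; the logits are hypothetical
# final-layer outputs of the CNN for one input spectrogram.
classes = ["Geez", "Ezil", "Araray"]
probs = softmax(np.array([2.0, 0.5, 0.1]))
predicted = classes[int(np.argmax(probs))]
```

The class with the highest probability is taken as the model's prediction for the input clip.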