Abstract:
The widespread availability of portable and mobile phone cameras has led
to a significant increase in the number of captured images. However, effectively
describing the information contained in these images remains a major challenge.
Image caption generation, a complex task in computer vision and natural language
processing, aims to automatically generate accurate descriptions of image content.
Unfortunately, many existing image captioning approaches rely on simplistic
feature extraction that does not fully exploit object detection or color
information, leading to inaccurate or incomplete descriptions. Holy pictures,
which hold special religious significance, have received surprisingly little
attention in image captioning research. To address this gap, we propose the
Holy Pictures Amharic Caption Generation system, which leverages object
detection, color features, and attention-based caption decoding.
Our approach involved collecting 2,300 holy pictures from various sources and
manually localizing objects and preparing captions and object labels for the
2,070 images in the training and validation sets. To enhance image quality, we applied
contrast-limited adaptive histogram equalization (CLAHE), YCrCb color space
conversion, and bilateral filtering for noise removal.
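As an illustration, a minimal sketch of this preprocessing stage, assuming an OpenCV implementation in which CLAHE is applied to the luminance channel in YCrCb space; the clip limit, tile size, and bilateral filter parameters are illustrative rather than values from our experiments:

```python
import cv2

def enhance(image_bgr):
    # Work in YCrCb so contrast enhancement touches only the luminance channel.
    y, cr, cb = cv2.split(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb))

    # Contrast-limited adaptive histogram equalization (CLAHE) on luminance.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    y = clahe.apply(y)

    enhanced = cv2.cvtColor(cv2.merge((y, cr, cb)), cv2.COLOR_YCrCb2BGR)

    # Edge-preserving bilateral filter for noise removal.
    return cv2.bilateralFilter(enhanced, d=9, sigmaColor=75, sigmaSpace=75)
```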
For the caption generation process, we utilized the SSD (Single Shot MultiBox
Detector) model for accurate object localization and color feature extraction.
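Conceptually, each SSD detection can be paired with a color descriptor computed over its bounding box. The sketch below assumes a hypothetical ssd_detect wrapper around a pretrained SSD and uses the mean YCrCb value of each region as a stand-in color feature:

```python
import cv2

def region_color_features(image_bgr, detections):
    """Pair each detection with a simple color descriptor: the mean
    YCrCb value over its bounding box."""
    ycrcb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2YCrCb)
    features = []
    for label, (x1, y1, x2, y2) in detections:
        region = ycrcb[y1:y2, x1:x2].reshape(-1, 3)
        features.append((label, region.mean(axis=0)))  # (Y, Cr, Cb) means
    return features

# Usage with the hypothetical SSD wrapper:
# detections = ssd_detect(image)  # e.g. [("angel", (40, 10, 180, 200)), ...]
# color_feats = region_color_features(image, detections)
```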
The input images were encoded with the XceptionV3 architecture to produce image
features, which were then decoded by an LSTM with an attention mechanism to
generate descriptive captions.
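A compact sketch of such an encoder-attention-decoder in tf.keras; here the abstract's XceptionV3 is assumed to refer to the Keras Xception backbone, and all layer sizes are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

class BahdanauAttention(layers.Layer):
    """Additive attention over the encoder's spatial feature grid."""
    def __init__(self, units):
        super().__init__()
        self.W1 = layers.Dense(units)
        self.W2 = layers.Dense(units)
        self.V = layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, regions, depth); hidden: decoder LSTM state.
        scores = self.V(tf.nn.tanh(
            self.W1(features) + self.W2(tf.expand_dims(hidden, 1))))
        weights = tf.nn.softmax(scores, axis=1)      # one weight per region
        return tf.reduce_sum(weights * features, 1)  # context vector

# Encoder: pretrained CNN backbone without its classifier head; a 299x299
# input yields a 10x10x2048 grid, reshaped to 100 region vectors that the
# LSTM decoder attends over before emitting each word of the caption.
encoder = tf.keras.applications.Xception(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
```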
Our experimental results demonstrated superior performance, achieving scores of
0.84, 0.88, 0.89, and 0.81 on the BLEU-4, CIDEr, SPICE, and HEM metrics,
respectively. These results
highlight the effectiveness of leveraging advanced image processing techniques, color
values, and object detection in improving the accuracy and detail of image captions.
However, a key limitation is the small dataset, which restricts generalization.
Future research should focus on expanding the dataset to improve the model's
applicability and performance.
Keywords: image caption; color space; LSTM; SSD; object detection; color
feature; attention