BDU IR

TOPIC CLUSTERING ON AMHARIC COMMENTS USING BERT EMBEDDING AND PARTITIONING ALGORITHMS.


dc.contributor.author HAIMANOT, HAILU
dc.date.accessioned 2022-12-31T06:40:32Z
dc.date.available 2022-12-31T06:40:32Z
dc.date.issued 2022-08
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/14784
dc.description.abstract Topic clustering is one of the methods for organizing comments posted on Online Media Service (OMS) news. Online news, such as social media news, receives many comments every day. However, most comments are not organized well enough to make relevant information on a specific topic easy to find. Topic clustering for short text documents is a very challenging task, especially for comments that are very concise and contain few words per document. In addition, short text suffers from data sparsity and irregularity, and most words appear only once in a short text. To the best of our knowledge, there is no prior work on clustering short Amharic comments. To address these problems, we developed an Amharic comment topic clustering model using contextual sentence representation and partition-based algorithms. This thesis aims to design and develop a topic clustering model for Amharic comments on OMS news. We used BERT (Bidirectional Encoder Representations from Transformers) models for contextual sentence representation. Transfer learning is used to produce sentence embeddings of Amharic comments with English BERT. Finally, we applied the mini-batch K-means and fuzzy C-means clustering algorithms. We conducted experiments on the two models, and the results show that BERT embedding with mini-batch K-means and BERT embedding with fuzzy C-means both achieve a value of 1.0 for the V-measure score, adjusted Rand score, and adjusted mutual information score. However, fuzzy C-means has a lower silhouette score (0.996) than mini-batch K-means (0.998). Mini-batch K-means clustering is more accurate and takes less time to compute. Fuzzy C-means clustering produces results comparable to mini-batch K-means clustering, but it takes longer to compute. Therefore, the mini-batch K-means clustering algorithm was found to be more appropriate for clustering Amharic comments on news. en_US
dc.language.iso en_US en_US
dc.subject INFORMATION TECHNOLOGY en_US
dc.title TOPIC CLUSTERING ON AMHARIC COMMENTS USING BERT EMBEDDING AND PARTITIONING ALGORITHMS. en_US
dc.type Thesis en_US
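
The abstract names fuzzy C-means as one of the two partitioning algorithms applied to the sentence embeddings. The following is a minimal, dependency-free sketch of the standard fuzzy C-means update rules, not the thesis's implementation. The toy 2-D vectors stand in for BERT sentence embeddings, and the function name `fuzzy_c_means` and its parameters (`m`, `iters`, `tol`, `seed`) are illustrative assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Cluster equal-length vectors; return (centers, memberships).

    memberships[i][j] is the degree to which point i belongs to cluster j;
    each row sums to 1. m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of all points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update: inverse-distance weighting across centers.
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            inv_sum = sum(inv)
            new_row = [v / inv_sum for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once memberships have stabilized.
        if shift < tol:
            break
    return centers, u


# Toy stand-ins for sentence embeddings: two well-separated groups.
points = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [5.0, 5.1], [5.1, 5.0], [4.95, 5.05],
]
centers, memberships = fuzzy_c_means(points, c=2)

# Hard labels: the cluster with the highest membership for each point.
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
print(labels)
```

Unlike K-means, each comment receives a membership degree for every cluster, so a hard label is obtained only in the last step by taking the largest membership. The extra per-point membership update is one plausible reason for the longer computation time the abstract reports, though the abstract itself does not give the cause.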