BDU IR

THESAURUS-BASED QUERY EXPANSION FOR DESIGNING SIDAAMA INFORMATION RETRIEVAL

dc.contributor.author Kayimo, Senbeto
dc.date.accessioned 2020-03-16T09:24:46Z
dc.date.available 2020-03-16T09:24:46Z
dc.date.issued 2020-03-16
dc.identifier.uri http://hdl.handle.net/123456789/10355
dc.description.abstract Individuals, groups, companies and institutions produce and store large volumes of information in the course of their business activities, but not every stored document is relevant to a user's need at a given time. Information retrieval (IR) systems are developed to organize this information and find the documents that satisfy users' needs, yet irrelevant documents are still retrieved together with the relevant ones. Synonym mismatches and short, ambiguous queries are major challenges that limit the performance of IR systems. This research aims to design and evaluate a Sidaama text retrieval system that applies thesaurus-based query expansion to reduce the omission of relevant text files caused by synonym mismatch. We used a handcrafted thesaurus to expand user queries. The proposed IR system has three components: indexing, query expansion and searching. We used an inverted file index as the index structure, the vector space model (VSM) with TF*IDF term weighting, and cosine similarity to score documents against queries. To evaluate the effectiveness of the proposed system, we prepared a dataset of 250 documents and 10 queries. Relevance judgements between the queries and the documents were made by experts, and precision, recall and F-measure were used to quantify the effectiveness of the prototype system. Two experiments were conducted: a baseline VSM without query expansion, and the VSM with thesaurus-based query expansion. In the first experiment, the standard VSM achieved average precision, recall and F-measure of 55.13%, 77.91% and 63.19%, respectively. In the second experiment, the VSM with thesaurus-based query expansion achieved average precision, recall and F-measure of 58.18%, 90.99% and 69.67%, respectively.
The experimental results show that the IR system with thesaurus-based query expansion outperforms the standard VSM by 6.48 percentage points in F-measure, with a visible recall improvement of 13.08 percentage points over the baseline. We therefore conclude that thesaurus-based query expansion combined with the VSM is preferable to the standard VSM for Sidaama information retrieval. The major challenges that limited performance during the study were the absence of a rule-based stemming algorithm and of a standardized Sidaama thesaurus; both require further study. en_US
dc.language.iso en en_US
dc.subject Information Technology en_US
dc.title THESAURUS-BASED QUERY EXPANSION FOR DESIGNING SIDAAMA INFORMATION RETRIEVAL en_US
dc.type Thesis en_US
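
The indexing and searching components described in the abstract (an inverted file index, the VSM with TF*IDF weighting, and cosine similarity) can be sketched in Python as below. This is a minimal illustration, not the thesis's implementation: the function names, the toy English documents standing in for Sidaama text, raw-count TF and the ln(N/df) IDF variant are all assumptions, and no stemming is applied.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split into word characters; no stemming is applied.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Inverted file index: term -> {doc_id: raw term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def compute_idf(index, n_docs):
    # Assumed variant: idf(t) = ln(N / df(t)); a term in every document gets weight 0.
    return {term: math.log(n_docs / len(postings)) for term, postings in index.items()}


def doc_vectors(index, idf):
    """TF*IDF weight for every (document, term) pair, read off the postings lists."""
    vectors = defaultdict(dict)
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            vectors[doc_id][term] = tf * idf[term]
    return vectors


def search(query, index, idf, vectors):
    """Rank documents by cosine similarity between TF*IDF query and document vectors."""
    q_tf = Counter(t for t in tokenize(query) if t in index)
    q_vec = {t: tf * idf[t] for t, tf in q_tf.items()}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    if q_norm == 0:
        return []
    # Only documents in the postings of some query term can score above zero.
    candidates = {d for t in q_vec for d in index[t]}
    results = []
    for d in candidates:
        dv = vectors[d]
        dot = sum(w * dv.get(t, 0.0) for t, w in q_vec.items())
        d_norm = math.sqrt(sum(w * w for w in dv.values()))
        if dot > 0:  # dot > 0 implies d_norm > 0
            results.append((d, dot / (q_norm * d_norm)))
    # Highest score first; ties broken by document ID.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy collection (English words stand in for Sidaama text).
docs = {
    "d1": "the car is fast",
    "d2": "an automobile is parked",
    "d3": "fast food shop",
}
index = build_index(docs)
idf = compute_idf(index, len(docs))
vectors = doc_vectors(index, idf)
print(search("fast car", index, idf, vectors))
```

In this toy collection d1 ranks first and d3 second, while d2 is never retrieved because it says "automobile" rather than "car". That is the synonym-mismatch problem the thesis targets.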
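
Thesaurus-based query expansion appends synonyms from a thesaurus to the user's query terms before searching. The sketch below assumes a hypothetical hand-built thesaurus stored as a Python dict (English toy words in place of Sidaama terms) and uses a simple term-overlap match only to make the effect on recall visible; how the thesis weights expanded terms inside the VSM is not stated in the abstract.

```python
import re

# Hypothetical handcrafted thesaurus: headword -> synonyms.
# English toy words stand in for Sidaama terms.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "fast": ["quick"],
}


def tokenize(text):
    # Lowercase and split into word characters; no stemming.
    return re.findall(r"\w+", text.lower())


def expand_query(query, thesaurus):
    """Original query terms, each followed by its synonyms, without duplicates."""
    expanded = []
    for term in tokenize(query):
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def matching_docs(terms, docs):
    """IDs of documents that share at least one term with the query."""
    wanted = set(terms)
    return sorted(d for d, text in docs.items() if wanted & set(tokenize(text)))


docs = {
    "d1": "the car is fast",
    "d2": "an automobile is parked",
    "d3": "fast food shop",
}
query = "fast car"
print(expand_query(query, THESAURUS))
# ['fast', 'quick', 'car', 'automobile', 'vehicle']
print(matching_docs(tokenize(query), docs))
# ['d1', 'd3']
print(matching_docs(expand_query(query, THESAURUS), docs))
# ['d1', 'd2', 'd3']
```

Expansion lets d2 match through the synonym "automobile". Expanded terms can also pull in documents that are not relevant, which is one reason precision may rise much less than recall.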
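
Precision, recall and F-measure for a single query follow from the set of retrieved documents and the set judged relevant: P = |ret ∩ rel| / |ret|, R = |ret ∩ rel| / |rel| and F = 2PR / (P + R). The sketch below computes these values and averages them over queries; the function names and toy judgements are hypothetical, and macro-averaging over queries is an assumption about how the reported averages were obtained.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and balanced F-measure for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_average(runs):
    """Mean (P, R, F) over queries; runs is a list of (retrieved, relevant) pairs."""
    scores = [precision_recall_f1(ret, rel) for ret, rel in runs]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Toy judgements for two hypothetical queries.
runs = [
    (["d1", "d3"], ["d1", "d2"]),
    (["d1", "d2", "d3"], ["d1", "d2"]),
]
print(macro_average(runs))

# The reported improvements are absolute differences in percentage points.
print(round(69.67 - 63.19, 2))  # F-measure gain: 6.48
print(round(90.99 - 77.91, 2))  # recall gain: 13.08
```

The mean of per-query F-measures generally differs from the F-measure computed from mean precision and mean recall, so the two should not be mixed when comparing runs.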

