BDU IR

AUTOMATIC THESAURUS CONSTRUCTION FROM AFAN OROMO TEXT


dc.contributor.author Wakjira, Firomsa
dc.date.accessioned 2020-03-17T09:32:04Z
dc.date.available 2020-03-17T09:32:04Z
dc.date.issued 2020-03-17
dc.identifier.uri http://hdl.handle.net/123456789/10539
dc.description.abstract A thesaurus is a reference work of words or of information about a particular field or set of concepts, in particular a book of words and their synonyms, or a list of subject headings or descriptors, usually with a cross-reference system, used to organize a collection of documents for reference and retrieval. One of the major problems of modern information retrieval systems is the vocabulary problem: the discrepancy between the terms used to describe documents and the terms searchers use to describe their information need, which leads to information overload or information mismatch. One way of handling the vocabulary problem is to use a thesaurus, which shows the relationships between terms, together with query expansion, which supplies alternative terms for a query to improve retrieval effectiveness. Because manual thesaurus construction is labor-intensive, and therefore expensive to build and hard to update in a timely manner, an automatic thesaurus for Afan Oromo is implemented using the term-clustering approach. In this research, 36869 words selected from the collected documents are used and are suggested to improve the expansion process and to retrieve more relevant documents for the user's query. The results of the experiment are encouraging: the accuracy of the system is 56.6% on Afan Oromo documents, and 73.11% of the terms in the collection are registered as similar. The main challenge is the complexity of Afan Oromo, which results in under-stemmed or over-stemmed terms because the documents are not properly preprocessed. The performance and accuracy of the system improve if the documents are properly preprocessed, and the system is more effective on large collections spanning multiple domains. The quality of the clusters is measured by intra-cluster and inter-cluster techniques, and the result registers 1.33. en_US
dc.language.iso en en_US
dc.subject Information Technology en_US
dc.title AUTOMATIC THESAURUS CONSTRUCTION FROM AFAN OROMO TEXT en_US
dc.type Thesis en_US
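
The abstract names a term-clustering approach and query expansion but does not give the algorithm. The following is only a minimal sketch of one common way to do this, under stated assumptions: terms are represented by the documents they occur in, two terms are treated as related when the cosine similarity of those vectors reaches a threshold, and a query is expanded with the related terms. The function names, the threshold value, and the English placeholder documents are all hypothetical and are not taken from the thesis; a real Afan Oromo pipeline would also need tokenization, stop-word removal, and stemming.

```python
import math
from collections import defaultdict


def term_vectors(docs):
    """Map each term to a {document index: frequency} vector."""
    vectors = defaultdict(lambda: defaultdict(int))
    for i, doc in enumerate(docs):
        for term in doc.lower().split():
            vectors[term][i] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v[d] for d, w in u.items() if d in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_thesaurus(docs, threshold=0.8):
    """Link every pair of terms whose similarity reaches the threshold.

    The threshold is an illustrative assumption, not a value from the thesis.
    """
    vectors = term_vectors(docs)
    terms = sorted(vectors)
    thesaurus = {t: set() for t in terms}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                thesaurus[a].add(b)
                thesaurus[b].add(a)
    return thesaurus


def expand_query(query, thesaurus):
    """Return the query terms followed by their related terms, without repeats."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term, *sorted(thesaurus.get(term, ()))]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Placeholder documents (hypothetical, English tokens for readability).
docs = ["engine wheel road", "engine wheel", "river fish", "river fish water"]
thesaurus = build_thesaurus(docs)
print(expand_query("river road", thesaurus))
```

The pairwise comparison is quadratic in the vocabulary size, so for a vocabulary on the order of the 36869 words mentioned in the abstract, a sparse matrix or an approximate nearest-neighbor method would likely be needed in practice.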

