dc.description.abstract |
Information retrieval is a mechanism for finding relevant material of an unstructured
nature that satisfies a user’s information needs from within a large collection. Since
there are usually many ways to express the same concepts, the terms in the user’s query
may not appear in a relevant document. Conversely, many words have more than
one meaning, which may confuse the retrieval system. This research applies latent
semantic indexing to handle synonymous and polysemous words in Silt’e text
documents and users’ queries. The Silt’e text retrieval system developed in this study has indexing and
searching subsystems. Indexing organizes the index terms, while searching matches
query terms against index terms in order to retrieve relevant documents. For the
experiment, 700 Silt’e text documents and 56 queries were used
to test the prototype of the system. The Silt’e text document corpus was prepared by the researcher
and comprises various reports from the Silt’e culture and tourism bureau as well as books. In addition,
various text preprocessing techniques, including tokenization, normalization, stop word
removal and stemming, were used to identify content-bearing words. Experimental results
show that the prototype achieved, on average, 68% recall, 79% precision and a 72% F-measure.
The major challenges that affect the performance of the IR prototype include the lack
of a standard dataset for the Silt’e language and the inability of the Silt’e stemmer to conflate
inflected Silt’e words to their stems. Therefore, improving the performance of
the prototype requires developing a standard Silt’e dataset as well as a more effective Silt’e stemmer.
Keywords: Information Retrieval, Latent Semantic Indexing, Singular Value
Decomposition |
en_US |