Abstract:
Text summarization is one of the applications of natural language processing and is becoming prevalent for data condensation. A growing number of Amharic service providers are publishing their content online. To mention a few, the Ethiopian Reporter, Addis Admass, and Addis Zena update their websites regularly with daily news. Although some research has been done on Amharic text summarization using different algorithms, most of it has addressed the summarization of a single Amharic news article. Nowadays, information consumers are drowning in natural language text. As the web has expanded text collections on a variety of topics, consumers now face a significant amount of redundancy in the texts they encounter online. Previous researchers have applied different text summarizers to Amharic single-document summarization, where the generated summary is drawn from a single Amharic news article. However, if a user wants comprehensive information on a certain topic, or on several topics at once, it is quite likely that a single document will not provide all the required information. In such a case, multiple documents selected by the user are given to a multi-document summarization system. This study focuses on Amharic multi-document news text summarization using the latent semantic analysis (LSA) algorithm. The algorithm has three phases: input matrix creation, singular value decomposition, and sentence selection. We use NetBeans with Python, through the NetBeans Python plug-in environment.
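The three LSA phases named above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `lsa_summarize`, the whitespace tokenizer, the raw term-frequency weighting, the number of retained dimensions `k`, and the sentence-scoring rule (length of each sentence vector in the reduced space, weighted by the singular values) are all assumptions made for illustration. For multi-document input, the sentences of all selected documents are assumed to be pooled into one non-empty list.

```python
import numpy as np

def lsa_summarize(sentences, rate=0.3, k=2):
    """Hypothetical sketch: pick round(rate * n) sentences using LSA."""
    # Phase 1: input matrix creation (terms x sentences, raw term counts).
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.split():
            A[index[w], j] += 1.0

    # Phase 2: singular value decomposition, A = U * diag(sigma) * Vt.
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(sigma))

    # Phase 3: sentence selection. Assumed scoring rule: the norm of each
    # sentence's column in the top-k rows of Vt, scaled by sigma.
    scores = np.sqrt(((sigma[:k, None] * Vt[:k, :]) ** 2).sum(axis=0))
    n_keep = max(1, round(rate * len(sentences)))
    top = sorted(np.argsort(-scores)[:n_keep])  # restore original order
    return [sentences[j] for j in top]
```

Under these assumptions, calling `lsa_summarize(all_sentences, rate=0.2)` would return roughly one fifth of the pooled sentences in their original order, which corresponds to the 20% extraction rate used in the experiments.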
The data sources for this study were the Ethiopian Reporter, Addis Admass, Addis Zena, and Walta Information Centre. In total, 83 Amharic news texts were collected, and tests were conducted at 20%, 30%, and 40% extraction rates. The performance of the LSA text summarizer was evaluated using intrinsic evaluation techniques, with both subjective and objective evaluation methods. The thesis introduces sentence-extraction-based summarization for Amharic multi-document news text using the LSA text summarizer. In the subjective evaluation, the summary generated at a 20% extraction rate scored 68.00%, the summary generated at 30% scored 76.00%, and the summary generated at 40% scored 85.00% out of 100%.
In the objective evaluation, we used the F-measure. The summarizer achieved an average F-measure of 66.86% at a 20% extraction rate, and F-measures of 74.12% at 30% and 79.71% at 40%.
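For readers unfamiliar with the metric, the F-measure is the harmonic mean of precision and recall. The sketch below assumes that system and reference summaries are compared as sets of selected sentence indices; the abstract does not state the unit of comparison, so this choice and the function name `f_measure` are illustrative only.

```python
def f_measure(system, reference):
    """Hypothetical sketch: F-measure over sets of selected sentence indices."""
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    if overlap == 0:
        return 0.0  # no shared sentences (also covers empty inputs)
    precision = overlap / len(system)     # share of extracted sentences that are correct
    recall = overlap / len(reference)     # share of reference sentences that were extracted
    return 2 * precision * recall / (precision + recall)
```

For example, if the summarizer extracts sentences {0, 2, 5} and the reference summary contains {0, 2, 7, 9}, precision is 2/3, recall is 1/2, and the F-measure is 4/7, about 57.1%.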