BDU IR

MAPREDUCE BASED-DENCLUE: AN EFFECTIVE DENSITY-BASED CLUSTERING FOR BIG DATA ANALYSIS

dc.contributor.author Dagne, Ephrem
dc.date.accessioned 2020-03-17T09:21:49Z
dc.date.available 2020-03-17T09:21:49Z
dc.date.issued 2020-03-17
dc.identifier.uri http://hdl.handle.net/123456789/10533
dc.description.abstract In the contemporary world, people face the challenge of an uncontrolled growth and accumulation of huge volumes of data. Advances in ubiquitous technologies and their widespread use contribute heavily to the complexity of this data in terms of volume and heterogeneity. Since customary data mining methods fail to process such data, extended techniques are needed to deal with big data. Clustering, or segmentation, is one of the most widely used unsupervised data analysis techniques in data science; it groups data points based on similarity. Among density-based clustering methods, DENCLUE offers notable characteristics that other clustering techniques do not possess: it rests on a solid mathematical foundation and works with noisy, high-dimensional feature vectors. In this research, we propose a model for parallelizing DENCLUE using MapReduce, tailored for the big data scenario. We implemented a new approach, named MR-based DENCLUE (MR-DENCLUE), that can generate several clusters of data points residing and running on parallel machines. In our experiments on two different datasets (UCI SEEDS, ABALONE and MFCCs datasets), running on 4 data nodes and a master node on a Hadoop cluster, the execution times of MR-DENCLUE were recorded to be 48.22% and 73.8% of the execution times of DENCLUE, respectively. Thus, our experiments confirm that the proposed MR-DENCLUE clusters efficiently in terms of speed when tested on different datasets. Moreover, the proposed method is designed to scale across Hadoop data nodes, so that data nodes can be added to accommodate massive datasets. In this way, we deliver a scheme to resolve the issues of velocity and volume, two major characteristics of big data analytics. en_US
dc.language.iso en en_US
dc.subject Information Technology en_US
dc.title MAPREDUCE BASED-DENCLUE: AN EFFECTIVE DENSITY-BASED CLUSTERING FOR BIG DATA ANALYSIS en_US
dc.type Thesis en_US

