Abstract:
Telecommunications companies collect and store enormous amounts of data, including call detail data, network data, and customer data. Telecom companies that cluster their customers poorly have difficulty delivering the product or service that satisfies their customers' needs.
In this research work, the Knowledge Discovery in Databases (KDD) methodology is followed to conduct the data mining process. The mobile subscribers' call detail record (CDR) data, along with billing information and customer category, are collected, cleansed, transformed, and integrated for the clustering experiments. Various experiments are conducted using the MATLAB data mining tool.
In this thesis, two clustering algorithms, K-means and Fuzzy C-means, were used to cluster customers using real CDR datasets from the ethio telecom mobile company. Based on the Silhouette evaluation criterion, the optimal number of clusters was three. The research comprised two major experimental phases. In the first phase, K-means and Fuzzy C-means clustering were applied directly to the data set. In the second phase, the data set was normalized before K-means and Fuzzy C-means clustering were applied. With normalization, we obtained satisfactory results.
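The second experimental phase (normalize, then cluster) can be illustrated with a minimal sketch. It is not the MATLAB setup used in this work: the two features (monthly call minutes and monthly bill), the six sample rows, min-max scaling as the normalization method, and the farthest-first choice of starting centroids are all assumptions made only so the example is small and deterministic.

```python
def min_max_normalize(rows):
    """Rescale every column to [0, 1]; a constant column maps to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]
            for row in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(rows, k):
    """Deterministic seeding (an assumption): start from the first row, then
    repeatedly add the row farthest from the centroids chosen so far."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        nxt = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def k_means(rows, k, max_iterations=100):
    """Plain K-means: assign each row to its nearest centroid, recompute the
    centroids as cluster means, and stop when assignments no longer change."""
    centroids = farthest_first(rows, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Hypothetical customers: [monthly call minutes, monthly bill]
customers = [[120, 15.0], [135, 18.0], [900, 140.0],
             [950, 150.0], [3000, 600.0], [3200, 640.0]]

normalized = min_max_normalize(customers)
labels, centroids = k_means(normalized, k=3)
print(labels)
```

Scaling both features to the same range keeps the feature with the larger numeric range from dominating the distance computation, which is the usual motivation for normalizing before distance-based clustering.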
The valued clusters were found using both the K-means and the Fuzzy C-means clustering algorithms. When these clusters were compared with the ground-truth labeled data set, the K-means clustering algorithm was 99.6% accurate in clustering telecom customers, while the Fuzzy C-means clustering algorithm was 99.1% accurate.
The K-means clustering algorithm has a high purity value and a low entropy value, which indicates good clustering. Compared to K-means, the Fuzzy C-means clustering algorithm has a lower purity value and a higher entropy value. K-means clustering is more accurate and takes less time to compute. Fuzzy C-means clustering produces results comparable to those of K-means clustering, but it takes longer to compute. Therefore, the K-means clustering algorithm was found to be more appropriate for clustering telecom users.
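For readers unfamiliar with the two measures, the following sketch shows one common way to compute purity and entropy against ground-truth labels. The exact definitions used in this work are not stated in the abstract, so the formulas here (majority-class purity and size-weighted, base-2 cluster entropy) are assumptions, as are the sample cluster assignments and category names.

```python
from collections import Counter, defaultdict
from math import log2

def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against ground-truth labels.

    purity  = fraction of points that belong to the majority class
              of their cluster (higher is better, maximum 1.0)
    entropy = size-weighted average of the class entropy inside each
              cluster, in bits (lower is better, minimum 0.0)
    """
    clusters = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        clusters[cid].append(label)

    n = len(true_labels)
    purity = sum(max(Counter(m).values()) for m in clusters.values()) / n

    entropy = 0.0
    for members in clusters.values():
        size = len(members)
        h = -sum((c / size) * log2(c / size)
                 for c in Counter(members).values())
        entropy += (size / n) * h
    return purity, entropy

# Hypothetical example: eight customers, three clusters, one misplaced point.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
true_labels = ["low", "low", "mid", "mid", "mid", "mid", "high", "high"]

purity, entropy = purity_and_entropy(cluster_ids, true_labels)
print(purity, round(entropy, 4))
```

In this example seven of the eight points sit in a cluster dominated by their own category, so purity is 7/8, and only the mixed cluster contributes to the entropy.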