Review and Comparative Analysis of Data Clustering Algorithms
- September 16, 2020
- Posted by: RSIS Team
- Categories: IJRIAS, Mathematics
International Journal of Research and Innovation in Applied Science (IJRIAS) | Volume V, Issue III, February 2020 | ISSN 2454–6186
Ugonna Victor Okolichukwu¹, Beatrice Adenike Sunday², Friday E. Onuodu³
¹ Department of Computer Science Education, Federal College of Education, Eha-Amufu, Enugu State, Nigeria
² Department of Computer Science, School of Postgraduate Studies, Ignatius Ajuru University of Education, Rivers State, Nigeria
³ Department of Computer Science, University of Port Harcourt, Rivers State, Nigeria
Abstract— Data mining is the process of extracting useful information from large volumes of raw data in order to solve a given problem such as clustering; it is also known as Knowledge Discovery from Data (KDD). Clustering is a machine learning technique of great importance in data mining analysis. It groups similar data objects into the same group (cluster) on the basis of their characteristics and similarity. A good clustering algorithm produces high intra-cluster similarity and low inter-cluster similarity. Clustering algorithms are commonly grouped into partitioning, hierarchical and density-based algorithms. Partitioning algorithms split the data objects into a number of groups called partitions, each of which represents a cluster. Hierarchical algorithms create a hierarchy, or tree, of clusters for the data objects. Density-based algorithms group data objects according to the density of their neighbourhood and locate clusters in regions of high density. The purpose of this paper is to compare hierarchical, partitioning and density-based clustering algorithms on the basis of their observed features and functions, using the ability to handle noise and outliers as the metric. We summarise the findings in a table of their performance and conclude that density-based algorithms handle outliers and noise more readily than hierarchical and partitioning algorithms.
Keywords— Data mining, Clustering, Data clustering algorithm, Outlier.
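
To make the noise-handling criterion concrete, the following minimal Python sketch contrasts a density-based labelling with a partitioning-style assignment on a toy dataset. It is an illustration only, not the procedure evaluated in this paper: the sample points, the parameter names `eps` and `min_pts`, the fixed centroids and the helper names (`region_query`, `dbscan`, `assign_to_nearest`) are all assumptions introduced here. The density-based part follows the general DBSCAN idea; the partitioning part is only the nearest-centroid assignment step used by k-means-style methods, not a full algorithm.

```python
import math

NOISE = -1  # label used for points that belong to no dense region


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


def assign_to_nearest(points, centroids):
    """Partitioning-style assignment: every point goes to its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


# Two tight groups of four points each, plus one isolated point (hypothetical data).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 30),
]

labels = dbscan(points, eps=1.5, min_pts=3)
partition_labels = assign_to_nearest(points, centroids=[(0.5, 0.5), (10.5, 10.5)])

print(labels)            # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(partition_labels)  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Under these assumed parameters the density-based labelling marks the isolated point as noise, whereas the nearest-centroid assignment forces it into the second cluster, which is the behaviour the comparison metric is meant to capture.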