Clustering Algorithms for High Dimensional Data Literature Review
- March 28, 2018
- Posted by: RSIS
- Category: Computer Science and Engineering
International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue III, March 2018 | ISSN 2321–2705
S. Geetha#, K. Thurkai Muthuraj*
#Department of Computer Applications, Mepco Schlenk Engineering College, Sivakasi, TamilNadu, India
*Tata Consultancy Services
Abstract – Complex, high-dimensional data sets are increasingly common. Clustering such data is challenging because high dimensionality degrades the time complexity, space complexity, scalability and accuracy of clustering methods. This review is intended to help readers identify clustering algorithms suitable for high-dimensional data.
I. INTRODUCTION
High-dimensional data arise in many modern scientific and business domains. Clustering in high-dimensional spaces is difficult [1]. For clustering high-dimensional data, the techniques of Dimensionality Reduction, Subspace Clustering and Co-Clustering help to address the problem of high dimensionality [20].
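One commonly cited reason why high dimensionality makes clustering difficult is that distances between points tend to look increasingly alike as the number of dimensions grows, so "near" and "far" neighbours become hard to tell apart. The text above does not spell this out, so the following is only an illustrative sketch under stated assumptions: points are drawn uniformly at random from the unit hypercube, Euclidean distance is used, and the function name `relative_contrast` is hypothetical.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances
    from one random query point to n_points random points.

    Assumption: all coordinates are uniform in [0, 1).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# The contrast between the nearest and the farthest point is
# printed for an increasing number of dimensions.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(200, dim), 3))
```

Under these assumptions the printed contrast is much smaller at 1000 dimensions than at 2, which is one way to see why methods that reduce or select dimensions before clustering can help.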
Clustering helps to reveal the structure of a large data set and to summarize it [2]. Clustering methods are available for categorical data, spatial data, etc. They are applied in object recognition, pattern recognition, image processing, text mining and information retrieval [21].
Clustering means partitioning data points into a set of groups. Representing the data by a small number of clusters simplifies it [3]. A number of clustering algorithms have been introduced for high-dimensional data. They are broadly classified into partitioning and hierarchical methods. Partitioning methods are subdivided into K-means and K-medoids [8]; CLARA and CLARANS are popular for dealing with large data sets. Hierarchical methods are further classified into agglomerative and divisive. BIRCH, Chameleon, ROCK and CURE are examples of hierarchical methods that deal with large amounts of data [5]. Other categories of clustering methods are model-based clustering, density-based clustering, grid-based clustering and constraint-based clustering.
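To make the partitioning idea concrete, the following is a minimal sketch of K-means in plain Python. It is an illustration only, not the formulation of any particular cited work: the function name `kmeans`, the fixed iteration count, the Euclidean distance and the toy data are all assumptions chosen for brevity, and the initial centroids are supplied by the caller rather than chosen at random.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Minimal K-means sketch.

    points    -- list of equal-length numeric tuples
    centroids -- initial cluster centres (assumption: given by the caller)
    Returns (labels, centroids) after n_iter assignment/update rounds.
    """
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Toy data (hypothetical): two well-separated groups in two dimensions.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

K-medoids follows the same assign-and-update pattern but restricts each cluster centre to be one of the actual data points instead of a computed mean.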
Cluster analysis has been an area of research for many decades, and new methods are still being developed. Section II discusses popular and widely used clustering algorithms.