RSIS International

Clustering Algorithms for High Dimensional Data Literature Review


International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue III, March 2018 | ISSN 2321–2705

S. Geetha#, K. Thurkai Muthuraj*

#Department of Computer Applications, Mepco Schlenk Engineering College, Sivakasi, Tamil Nadu, India
*Tata Consultancy Services

Abstract – In the modern world, complex data sets are growing. Clustering high-dimensional data is challenging because of the dimensionality problem, which affects the time complexity, space complexity, scalability and accuracy of clustering methods. This review is intended to help readers find clustering algorithms suitable for high-dimensional data.

I. INTRODUCTION

High-dimensional data are common in modern scientific and business domains. Clustering in high-dimensional spaces is difficult [1]. Three approaches help address the problem of high dimensionality when clustering such data: dimensionality reduction, subspace clustering and co-clustering [20].
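One commonly cited reason for this difficulty, which the review itself does not spell out, is that distances between random points tend to look alike as the number of dimensions grows, so "near" and "far" become hard to tell apart. The sketch below illustrates this on synthetic data. The function name `relative_contrast`, the uniform data and the sample sizes are illustrative assumptions, not something taken from the cited works.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the distances from one random
    query point to n_points random points, all uniform in [0, 1]^dim.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast between the farthest and nearest neighbour is expected
    # to shrink as the dimension increases.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

A small contrast means the nearest and farthest points are almost equally distant from the query, which weakens any clustering method that relies on distance alone.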

Clustering helps in understanding the structure of a large data set and in summarizing it [2]. Clustering methods are available for categorical data, spatial data and other data types. They are applied in object recognition, pattern recognition, image processing, text mining and information retrieval [21].

Clustering means partitioning data points into a set of groups. Representing the data by a small number of clusters simplifies it [3]. A number of clustering algorithms have been introduced for high-dimensional data. They are broadly classified into partitioning and hierarchical methods. Partitioning methods are subdivided into K-means and K-medoids [8]; CLARA and CLARANS are popular for dealing with large data sets. Hierarchical methods are further classified into agglomerative and divisive. BIRCH, Chameleon, ROCK and CURE are examples of hierarchical methods that deal with large amounts of data [5]. Other categories of clustering methods are model-based, density-based, grid-based and constraint-based clustering.
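To make the idea of a partitioning method concrete, the following is a minimal sketch of K-means using the usual assign-then-update iteration (often called Lloyd's iteration). It is an illustration under stated assumptions rather than the procedure of any cited paper: the function name `kmeans`, the caller-supplied initial centers and the toy data are all hypothetical.

```python
import math


def kmeans(points, init_centers, max_iter=100):
    """Partition points into len(init_centers) groups.

    points and init_centers are lists of equal-length numeric tuples.
    Returns (centers, labels), where labels[i] is the cluster of points[i].
    """
    centers = [list(c) for c in init_centers]
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


# Toy data (assumed): two well-separated groups in two dimensions.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = kmeans(points, init_centers=[points[0], points[3]])
print(labels)
print(centers)
```

K-medoids follows the same assign-then-update pattern but restricts each center to be an actual data point, which makes it less sensitive to outliers.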

Cluster analysis has been an area of research for many decades, and new methods are still being developed. Section II discusses popular and widely used clustering algorithms.