
International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue III, March 2018 | ISSN 2321–2705

An Effective Big Data Role of Data Storage and Job Tracking Analysis through the HDFS

S. Janardhan1, Rajeshkumar P.2, B. Madhu Sudhan Reddy3


1Student, Master of Computer Applications, SKIIMS, Srikalahasti, Andhra Pradesh, India
2Research Scholar, Computer Science, Bharathiar University, Coimbatore, India
3Assistant Professor, Master of Computer Applications, SKIIMS, Srikalahasti, India

Abstract: Big data inspires new ways to transform processes, organizations, entire industries, and even society itself, yet extensive media coverage makes it hard to distinguish hype from reality. Big data refers to collections of massive and complex data sets that encompass huge quantities of data, social media analytics, data management capabilities, and real-time data. It includes e-mail messages, photographs, business transactions, surveillance video recordings, social media posts, mobile phone GPS signals, and readings from RFID readers, microphones, cameras, sensors, and activity logs. Such data exceeds the processing capacity of conventional database systems and is therefore handled by platforms such as Hadoop, which processes it using the MapReduce programming paradigm. Whatever the label, organizations are starting to understand and explore how to process and analyze a vast array of information in new ways. This paper explains the HDFS architecture, high-volume data storage, and query processing.
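As a concrete illustration of the MapReduce paradigm referenced above, the following minimal Java sketch follows the standard Hadoop word-count example: the map phase emits a (word, 1) pair for every token in the input, and the reduce phase sums the counts per word. The input and output paths passed as command-line arguments are hypothetical; this is a sketch of the paradigm, not code from the systems discussed in this paper.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: split each input line into tokens and emit (word, 1).
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each distinct word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0] and args[1] are hypothetical HDFS input/output paths.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

In a classic Hadoop 1.x deployment, submitting such a job hands it to the JobTracker, which schedules map and reduce tasks on TaskTrackers co-located with the data blocks.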

Keywords: Hadoop, HDFS, NameNode, Secondary NameNode, JobTracker, DataNode, TaskTracker

I. INTRODUCTION

Hadoop is an ecosystem of open-source components that fundamentally changes the way enterprises store, process, and analyze data. Unlike traditional systems, Hadoop enables multiple types of analytic workloads to run on the same data, at the same time, at massive scale on industry-standard hardware. CDH, Cloudera’s open-source platform, is the most popular distribution of Hadoop and related projects in the world (with support available via a Cloudera Enterprise subscription). The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences from other distributed file systems are significant. HDFS [1] is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project.
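To make the streaming-access model concrete, the following minimal Java sketch reads a file from HDFS through the standard org.apache.hadoop.fs.FileSystem API. The NameNode address (hdfs://namenode:8020) and the file path /data/sample.txt are hypothetical placeholders, assumed here only for illustration.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // In a real deployment fs.defaultFS comes from core-site.xml;
    // the NameNode address below is a hypothetical placeholder.
    conf.set("fs.defaultFS", "hdfs://namenode:8020");

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/data/sample.txt"); // hypothetical file path

    // fs.open() returns a stream over the file; the client reads blocks
    // directly from DataNodes after the NameNode supplies their locations.
    try (FSDataInputStream in = fs.open(file);
         BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    } finally {
      fs.close();
    }
  }
}

Note that the application never contacts the NameNode for file contents, only for block metadata; this separation is what lets HDFS sustain high-throughput sequential reads over very large files.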




