The Role of Predictive Big Data Analysis of Airline Data Report by using Hive

International Journal of Research and Scientific Innovation (IJRSI) | Volume V, Issue III, March 2018 | ISSN 2321–2705

Ankaiah.G1, Rajeshkumar.P2, Munihemakumar3

1Student, Master of Computer Applications, SKIIMS, Srikalahasti, Andhra Pradesh, India
2Research Scholar, Computer Science, Bharathiar University, Coimbatore, India
3Asst. Professor, Master of Computer Applications, SKIIMS, Srikalahasti, Andhra Pradesh, India

Abstract: The analysis of the airline data set is performed using Cloudera, which runs Hadoop in the cloud. Hive and HiveQL statements are used to query the data. The data is visualized by exporting the output of each Hive query to Excel and plotting it as line and scatter charts. The visualization reveals patterns in the data, such as the relationships between flight diversions and flight distance and between flight cancellations and flight distance. The U.S. Department of Transportation’s (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights appears in DOT’s monthly Air Travel Consumer Report, published about 30 days after the month’s end, as well as in summary tables posted on the BTS website. Summary statistics and raw data are made available to the public at the time the Air Travel Consumer Report is released.

Keywords: Hive, Java, Hadoop, Real-Time Big Data Analytics (RTBDA), airline data.

I. INTRODUCTION

Big data [1] is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend toward larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.”