RDBMS Based Hadoop Metadata and Log Data Management Optimization

Haiying Che, Octave Iradukunda, Khalilov Shahin

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Metadata is currently one of the fastest-growing sub-segments of enterprise data management. Even so, its growth is not keeping pace with the rapid increase in Big Data projects that organizations are initiating, a mismatch now referred to as the 'Big Data Gap'. This paper introduces a novel approach that combines Apache Hadoop with a relational database to reduce query time and resource usage and to increase fault tolerance and efficiency. Hadoop's metadata and log files are migrated synchronously to PostgreSQL (PGSQL) and managed through a graphical user interface. The experiments used a dataset of hundreds of thousands of movie ratings and reduced the NameNode's resource usage by delegating log and metadata analysis to PGSQL. Queries in PGSQL ran 1.5 times faster than in Hadoop, and the data is held in a structured format, unlike in Hadoop. Although the technique was implemented on a single node, it outperformed existing Hadoop deployments both on premises and in the cloud. The GUI, which uses charts and graphs, makes metadata and log data management easier. The results suggest that the proposed approach performs better than the existing solution and sharply decreases Big Data hardware usage and budget.

Original language: English
Title of host publication: ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728123455
Publication status: Published - Dec 2019
Event: 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019 - Chongqing, China
Duration: 11 Dec 2019 → 13 Dec 2019

Publication series

Name: ICSIDP 2019 - IEEE International Conference on Signal, Information and Data Processing 2019

Conference

Conference: 2019 IEEE International Conference on Signal, Information and Data Processing, ICSIDP 2019
Country/Territory: China
City: Chongqing
Period: 11/12/19 → 13/12/19

Keywords

  • PostgreSQL
  • Relational Databases
  • big data
  • hadoop
  • log data
  • log management
  • metadata
  • query optimization
  • resource optimization
