What is Hive?

Hive is a data warehouse tool in the Hadoop ecosystem. It provides an SQL-like dialect for querying data stored in the Hadoop Distributed File System (HDFS).

Features of Hive:-

  • Tools to enable easy data extract/transform/load (ETL)
  • A mechanism to project structure on a variety of data formats
  • Access to files stored either directly in HDFS or in other data storage systems such as HBase
  • Query execution through MapReduce jobs.
  • An SQL-like language called HiveQL that facilitates querying and managing large data sets residing in Hadoop.
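
To make the "project structure" and "MapReduce" points above more concrete, here is a minimal Python sketch. It is only a simplified model, not how Hive is actually implemented: the sample lines, the schema and all function names are made up for illustration. It applies a schema to raw delimited text at read time and then computes roughly what a HiveQL query like SELECT category, SUM(amount) FROM orders GROUP BY category would return, split into map, shuffle and reduce steps.

```python
from collections import defaultdict

# Hypothetical raw file contents: comma-delimited lines, as they might sit in HDFS.
RAW_LINES = [
    "1,alice,books,12.50",
    "2,bob,games,30.00",
    "3,alice,books,7.25",
    "4,carol,games,15.00",
]

# Hypothetical schema, applied only when the data is read (schema on read).
SCHEMA = [("order_id", int), ("customer", str), ("category", str), ("amount", float)]


def project(line):
    """Apply the schema to one raw line and return a dict keyed by column name."""
    fields = line.split(",")
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}


def map_phase(rows):
    """Emit one (category, amount) pair per row."""
    for row in rows:
        yield row["category"], row["amount"]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def total_by_category(lines):
    """Run the whole pipeline: project, map, shuffle, reduce."""
    rows = (project(line) for line in lines)
    return reduce_phase(shuffle(map_phase(rows)))


result = total_by_category(RAW_LINES)
print(result)
```

The raw lines stay untouched on disk and the structure exists only in the reader, which is the idea behind projecting a schema onto existing files. On a real cluster the map and reduce steps run as distributed tasks, and scheduling them adds start-up time to every query.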

Limitations of Hive:-

  • Hive is best suited for data warehouse applications, where a large data set is maintained and mined for insights, reports, etc.
  • Hive traditionally does not provide record-level update, insert or delete (newer releases add limited support through transactional tables).
  • Hive queries have higher latency than queries on a traditional relational database, because of the start-up overhead of the MapReduce jobs submitted for each Hive query.
  • As Hadoop is a batch-oriented system, Hive doesn’t support OLTP (Online Transaction Processing).
  • Hive is close to OLAP (Online Analytical Processing) but not ideal, since there is significant latency between issuing a query and receiving a reply, due both to the overhead of MapReduce jobs and to the size of the data sets Hadoop was designed to serve.