Hive is one of the important tools in the Hadoop ecosystem. It provides an SQL-like dialect for querying data stored in the Hadoop Distributed File System (HDFS).
Features of Hive:-
- Tools to enable easy data extract/transform/load (ETL)
- A mechanism to project structure on a variety of data formats
- Access to files stored either directly in HDFS or in other data storage systems such as HBase
- Query execution through MapReduce jobs.
- An SQL-like language called HiveQL that facilitates querying and managing large data sets residing in Hadoop.
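
As a rough illustration of the features above, the HiveQL sketch below projects a table structure onto a tab-delimited file already stored in HDFS and then queries it. The table name `page_views`, its columns and the path `/data/page_views` are hypothetical, chosen only for this example.

```sql
-- Hypothetical example: table name, columns and HDFS path are assumptions.
-- EXTERNAL means Hive only records the schema; the files stay where they are.
CREATE EXTERNAL TABLE IF NOT EXISTS page_views (
  user_id   STRING,
  url       STRING,
  view_time TIMESTAMP
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/page_views';

-- With the MapReduce execution engine, an aggregate query like this
-- is compiled into one or more MapReduce jobs.
SELECT url, COUNT(*) AS view_count
FROM page_views
GROUP BY url
ORDER BY view_count DESC
LIMIT 10;
```

Prefixing the query with `EXPLAIN` prints the execution plan, which is one way to see the stages Hive generates for it.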
Limitations of Hive:-
- Hive is best suited for data warehouse applications, where a large data set is maintained and mined for insights, reports, etc.
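- Hive traditionally does not provide record-level update, insert or delete. Later releases (Hive 0.14 onwards) add limited support, but only for tables configured as ACID transactional tables.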
- Hive queries have higher latency than queries in a traditional relational database, because of the start-up overhead of the MapReduce jobs submitted for each Hive query.
- As Hadoop is a batch-oriented system, Hive doesn’t support OLTP (Online Transaction Processing).
- Hive is close to OLAP (Online Analytical Processing) but not ideal for it, since there is significant latency between issuing a query and receiving a reply. This is due both to the overhead of MapReduce jobs and to the size of the data sets Hadoop was designed to serve.
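
Because classic Hive tables lack record-level updates, changes are commonly applied by rewriting a table or partition in bulk. The sketch below shows that pattern under stated assumptions: `page_views` and `page_views_clean` are hypothetical tables with the same three columns, and the URL rewrite rule is invented purely for illustration.

```sql
-- Hypothetical tables: page_views (source) and page_views_clean (target).
-- Instead of updating individual rows, the whole target is rewritten
-- in a single batch job.
INSERT OVERWRITE TABLE page_views_clean
SELECT
  user_id,
  CASE WHEN url = '/home.html' THEN '/' ELSE url END AS url,
  view_time
FROM page_views
WHERE user_id IS NOT NULL;
```

Each such statement runs as a batch job, so it suits periodic bulk corrections rather than the frequent small writes typical of OLTP workloads.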