– Designed for batch processing.
– Interactive, near-real-time query capabilities added to Hive through the Tez engine
– HiveQL query language
Allows data stored in HDFS to be accessed from within Hadoop or from databases and data warehouses.
Compare Hive & RDBMS
Hive:
– Focused on analytics.
– Supports sequential inserts and appends.
– Low-cost storage using local disks.
– Fast data access with data skipping and sorting.
RDBMS:
– Focused on real-time queries and analytics.
– Supports random INSERT and UPDATE.
– Expensive storage using SAN technology.
– Fast data access through indexing.
hive> CREATE TABLE sample(id INT);
hive> DESCRIBE sample;
How are Hive SQL statements processed?
Clients connect to a Hive server instance.
Hive parses and plans the query.
The query is converted to MapReduce jobs.
The MapReduce jobs run on Hadoop.
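The conversion step can be pictured with a small Python model. This is only a conceptual sketch, not Hive's actual implementation: it assumes a query along the lines of SELECT gender, AVG(salary) FROM salaries GROUP BY gender, and the rows and function names (map_phase, shuffle_phase, reduce_phase, run_query) are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows of (gender, age, salary, code); the values are invented.
ROWS = [
    ("F", 34, 5000, 1),
    ("M", 41, 4000, 2),
    ("F", 29, 7000, 1),
    ("M", 52, 6000, 3),
]


def map_phase(rows):
    """Emit one (key, value) pair per row: the GROUP BY column and the salary."""
    for gender, _age, salary, _code in rows:
        yield gender, salary


def shuffle_phase(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group; the aggregate modelled here is AVG."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


def run_query(rows):
    """Chain the three phases in the order a MapReduce job would run them."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

The map step keeps only the columns the query needs, the shuffle groups them by the GROUP BY key, and the reduce step computes the aggregate per group. In a real cluster each phase runs in parallel across many nodes.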
-- Managed table: Hive owns the data files under its warehouse directory.
CREATE TABLE customer (custID INT, fName STRING, lName STRING, birthday TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- External table: Hive reads the files at LOCATION; dropping the table leaves them in place.
CREATE EXTERNAL TABLE SALARIES (
gender STRING, age INT, salary INT, code INT) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' LOCATION '/home/custsalaries/';
-- Moves the file from its HDFS path into the table's directory, replacing any existing data.
LOAD DATA INPATH '/home/custsalaries.csv' OVERWRITE INTO TABLE customer;
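To show what ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' implies when the files are read, here is a rough Python model of splitting one text line into typed columns. It is a sketch under assumptions, not Hive code: SCHEMA and parse_row are hypothetical names, the sample lines are invented, and treating a missing or unparsable field as NULL (None) is a simplification of how Hive's text reader typically behaves.

```python
# Column names and types mirror the SALARIES table definition.
SCHEMA = [("gender", str), ("age", int), ("salary", int), ("code", int)]


def parse_row(line, delimiter=","):
    """Split one text line on the delimiter and cast each field to its column type."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for index, (name, cast) in enumerate(SCHEMA):
        try:
            row[name] = cast(fields[index])
        except (IndexError, ValueError):
            # Assumption modelled here: a missing or unparsable field reads as NULL.
            row[name] = None
    return row


if __name__ == "__main__":
    print(parse_row("F,34,52000,1"))
    print(parse_row("M,abc,41000"))
```

The point of the sketch is that the table definition is applied when the data is read: the files stay plain text, and the column types only take effect at query time.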