– Designed for batch processing.
– Interactive, near-real-time query capabilities added to Hive through the Tez execution engine
– HiveQL query language
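As a rough illustration of the points above, a HiveQL query reads much like standard SQL. The sketch below assumes the SALARIES table from the table samples later in this document exists and holds data, and that Tez is installed on the cluster; the age threshold is an arbitrary value chosen for the example.

```sql
-- Assumption: Tez is available; otherwise leave the execution engine at its default.
SET hive.execution.engine=tez;

-- Average salary per gender for people aged 30 or older.
SELECT gender, AVG(salary) AS avg_salary
FROM salaries
WHERE age >= 30
GROUP BY gender;
```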
Allows data stored in HDFS to be accessed from within Hadoop or from databases and data warehouses.
Compare Hive & RDBMS
Hive
– Focused on analytics
– Supports sequential inserts and appends
– Low-cost storage using local disks
– Many nodes
– Fast data access with data skipping and sorting
– MapReduce
RDBMS
– Focused on real-time queries and analytics
– Supports random INSERT and UPDATE
– Expensive storage using SAN technology
– Few nodes
– Fast data access through indexing
– Parallel queries
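Where an RDBMS relies on indexes, Hive typically gets fast access by skipping data it does not need to read. One possible sketch, assuming the SALARIES table from the table samples later in this document: the table name salaries_orc, the choice of code as the partition column, and the literal values in the final query are all hypothetical and only serve the example.

```sql
-- Hypothetical partitioned copy of SALARIES, stored in the ORC columnar format.
CREATE TABLE salaries_orc (
  gender STRING,
  age INT,
  salary INT
)
PARTITIONED BY (code INT)
STORED AS ORC;

-- Allow Hive to derive the partition values from the data itself.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the SELECT list.
INSERT INTO TABLE salaries_orc PARTITION (code)
SELECT gender, age, salary, code
FROM salaries;

-- Only the partition for code = 1 is read; other partitions are skipped.
SELECT gender, salary
FROM salaries_orc
WHERE code = 1 AND age > 40;
```

Partition pruning is only one form of data skipping. ORC files also keep per-column minimum and maximum statistics that let Hive skip blocks of rows whose values cannot match a filter.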
$ hive
hive> CREATE TABLE sample(id INT);
hive> DESCRIBE sample;
How are Hive SQL statements processed?
– A client connects to a Hive server instance.
– The client submits a query.
– Hive parses and plans the query.
– The query is converted to MapReduce jobs.
– The MapReduce jobs run on Hadoop.
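The parse-and-plan step can be inspected without running the query by prefixing it with EXPLAIN. This is a minimal sketch, assuming the SALARIES table from the table samples below exists; the query itself is only an example.

```sql
-- Show the execution plan Hive would use instead of executing the query.
EXPLAIN
SELECT gender, COUNT(*) AS people
FROM salaries
GROUP BY gender;
```

The output lists the stages Hive plans to run. With the MapReduce engine, a GROUP BY like this one usually appears as a stage with a map-side operator tree and a reduce-side operator tree; the exact plan depends on the Hive version and the configured execution engine.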
Table samples
CREATE TABLE customer (custID INT, fName STRING, lName STRING, birthday TIMESTAMP) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
CREATE EXTERNAL TABLE SALARIES (
gender STRING, age INT, salary INT, code INT) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' LOCATION '/home/custsalaries/';
LOAD DATA INPATH '/home/custsalaries.csv' OVERWRITE INTO TABLE customers;
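LOAD DATA INPATH moves a file that is already in HDFS into the table's directory, and OVERWRITE replaces whatever the table held before. As a hedged variation, the sketch below loads from the client's local filesystem and appends instead. The path /tmp/customer.csv is hypothetical, and the target is the customer table created above.

```sql
-- LOCAL copies the file from the local filesystem rather than moving it within HDFS.
-- Without OVERWRITE, the rows are added to the existing table contents.
LOAD DATA LOCAL INPATH '/tmp/customer.csv' INTO TABLE customer;

-- Quick check that the rows are readable with the declared schema.
SELECT custID, fName, lName
FROM customer
LIMIT 5;
```

Hive does not validate the file against the table schema at load time. Fields that cannot be parsed as the declared type show up as NULL when the table is queried.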