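Hive is a data warehouse tool built on top of Hadoop, originally developed at Facebook. Its query language is HiveQL (HQL), which is similar to SQL. Hadoop can store any kind of data:
· Structured data, such as database tables
· Unstructured data, such as video, audio, PDF and text files
· Semi-structured data, such as XML
Hive lets you query data stored in Hadoop using HiveQL.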
Difference between SQL & HiveQL
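- In SQL we can insert data values row by row. Classic HQL has no row-level insert; data is loaded in bulk from files. (INSERT ... VALUES was added in Hive 0.14.)
- In SQL we can update any row or column. Classic HQL cannot, because the data is stored in HDFS, and files in HDFS are not meant to be modified in place once written. (Since Hive 0.14, UPDATE works only on transactional (ACID) tables.)
- In SQL we can use DELETE. In HQL the same restriction applies as for UPDATE.
- In Hive, every table is stored as a directory in HDFS.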
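Because rows cannot be changed in place on a plain (non-transactional) table, a common workaround is to rewrite the whole table from a query. The following is a minimal sketch, assuming a table employee(id, name, salary) like the one created later in this post; the id value 101 and the 10% raise are made up for illustration.

```sql
-- "Update": rewrite the table, changing the salary of one (hypothetical) employee.
INSERT OVERWRITE TABLE employee
SELECT id,
       name,
       CASE WHEN id = 101 THEN CAST(salary * 1.1 AS FLOAT) ELSE salary END
FROM employee;

-- "Delete": rewrite the table, keeping every row except the unwanted one.
INSERT OVERWRITE TABLE employee
SELECT id, name, salary
FROM employee
WHERE id <> 101;
```

Both statements rewrite all of the table's files, so this approach suits occasional bulk corrections rather than frequent single-row changes.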
HQL datatypes
Like other RDBMSs (Oracle, MySQL, SQL Server), Hive also has its own datatypes:
TinyInt | Float | Map
SmallInt | Double | Array
BigInt | String | Struct
Here Map, Array and Struct are called collection datatypes.
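To make the collection datatypes concrete, here is a small sketch of a table that uses all three. The table name employee_details, its columns, and the chosen delimiters are assumptions made for illustration and are not part of the original example.

```sql
-- Hypothetical table with one column of each collection datatype.
CREATE TABLE employee_details (
  id      INT,
  name    STRING,
  skills  ARRAY<STRING>,                    -- e.g. java,hive,pig
  phones  MAP<STRING, STRING>,              -- e.g. home:1234,work:5678
  address STRUCT<city:STRING, zip:STRING>   -- e.g. Pune,411001
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':';

-- Array elements are read by index, map values by key, struct fields with a dot.
SELECT name, skills[0], phones['home'], address.city
FROM employee_details;
```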
Creating Hive tables:
Hive tables can be created in two ways:
1. Managed tables or Internal tables
2. External tables
Managed tables or Internal tables:
user@machine:~$ hive
hive> create table employee(id int, name string, salary float)
    > row format delimited
    > fields terminated by '\t';
Important points:
- A String column can hold any kind of character data.
- In SQL you must create the schema or table before you can insert data. In HQL you can either create the table and then load data, or place the data files in HDFS first and create a table over them afterwards.
- If you end the create statement with ; right after the column list, without the row format lines, Hive falls back to its default field delimiter (Ctrl-A). A tab-separated file then loads without any error, but queries return NULL, NULL instead of the actual data. That is why you need the "row format delimited" and "fields terminated by" lines.
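The NULL behaviour described in the last point can be reproduced with a sketch like the following. The table name employee_nodelim and the file path are hypothetical; the file is assumed to be tab-separated with lines such as an id, a name and a salary.

```sql
-- No ROW FORMAT clause, so Hive expects its default field delimiter (Ctrl-A, \001).
CREATE TABLE employee_nodelim (id INT, name STRING, salary FLOAT);

-- The load succeeds: Hive does not validate file contents at load time.
LOAD DATA LOCAL INPATH '/home/user/employee.txt' INTO TABLE employee_nodelim;

-- Each whole line is treated as the first field, which cannot be parsed as INT,
-- so the columns typically come back as NULL instead of the real values.
SELECT * FROM employee_nodelim;
```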
Loading data into Hive tables:
Data can be loaded in two ways: either from the local file system or from HDFS.
Loading data from local file system:
hive> load data local inpath '<filepath>' into table <tablename>;
Loading data from hdfs:
hive> load data inpath '<filepath>' into table <tablename>;
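- For the local file system, a relative path is resolved against the current working directory, typically /home/<user> if Hive was started from the home directory.
- For HDFS, a relative path is resolved against the user's HDFS home directory, /user/<user>.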
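As a concrete illustration of the two load forms, the sketch below loads into the employee table created earlier. The file names and paths are hypothetical. Note that loading from the local file system copies the file into the table's directory, whereas loading from HDFS moves it there.

```sql
-- Copy a local file (hypothetical path) into the table's directory.
LOAD DATA LOCAL INPATH '/home/user/employee.txt' INTO TABLE employee;

-- Move a file that is already in HDFS (hypothetical path) into the table's directory.
LOAD DATA INPATH '/user/user/employee1.txt' INTO TABLE employee;

-- OVERWRITE replaces the table's existing files instead of adding to them.
LOAD DATA LOCAL INPATH '/home/user/employee.txt' OVERWRITE INTO TABLE employee;

-- Quick check that the rows are readable.
SELECT * FROM employee LIMIT 5;
```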
Two terms come up often here: metadata, which means data about the data (table names, columns, datatypes, locations), and the metastore, which is where Hive keeps that metadata.
External Tables:
hive> create external table employeeE(id int, name string, salary float)
    > row format delimited
    > fields terminated by '\t'
    > location '/vimal/newfolder';
Concept:
- When we create an internal table, a directory with the table's name is created under the warehouse directory. When we create an external table, no directory is created for the table name; the table simply refers to an existing location, here /vimal/newfolder.
- For data that is shared with other tools or users, use an external table rather than an internal table.
Internal Table:
/user/hive/warehouse/
    employee (directory)
        employee (file)
        employee1 (file)
External Table:
/vimal/newfolder/
    employee (file)
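One way to check which kind of table you have, and where its files live, is DESCRIBE FORMATTED. The sketch below uses the employee and employeeE tables from this post and assumes the default warehouse location. It also shows the practical consequence of the layout above: dropping a managed table removes its data, while dropping an external table leaves the files in place.

```sql
-- Shows "Table Type: MANAGED_TABLE" and a Location under the warehouse directory.
DESCRIBE FORMATTED employee;

-- Shows "Table Type: EXTERNAL_TABLE" and the Location given in the create statement.
DESCRIBE FORMATTED employeeE;

-- Managed table: the metadata and the table's directory with its files are removed.
DROP TABLE employee;

-- External table: only the metadata is removed; files in /vimal/newfolder remain.
DROP TABLE employeeE;
```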