Monday, January 30, 2017

Basic Linux Commands Part 2



Hi guys! In the previous article, I described some basic Linux commands and editors. If you are new to this series, you can read that article at the link below.
Most of us are quite familiar with the Windows OS. Linux is also a good platform, and once you start working on it, you will realize that yourself.
Note: Most issues in Big Data come from the Linux security side. I will try to cover that as well. Today we will learn some new topics under Linux basics. Be careful: Linux is case sensitive.
echo command:
                This command prints text to the terminal.
                Eg. user@machine:~$ echo "This is sample statement by vimal"

Linux identifies the standard streams by file descriptor numbers:
0 – for standard input (stdin)
1 – for standard output (stdout)
2 – for standard error (stderr)
You can use these numbers to redirect each stream to a different place, as shown in the example below.
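A minimal sketch of stream redirection (the file names out.txt, err.txt, and all.txt are just examples):
user@machine:~$ ls /etc /nosuchdir 1>out.txt 2>err.txt          (the listing of /etc goes to out.txt, the error message to err.txt)
user@machine:~$ ls /etc /nosuchdir >all.txt 2>&1                (send stderr to the same place as stdout)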





Mail command:
To send a mail, we use the mail command. Here -s sets the subject, user is the recipient, and messagebody.txt supplies the message body:
user@machine:~$ mail -s "This is subject" user < messagebody.txt
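You can also pipe the message body in directly instead of redirecting it from a file; a minimal sketch (user@example.com is a placeholder address):
user@machine:~$ echo "This is the message body" | mail -s "This is subject" user@example.com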

grep command:
                grep is a searching and filtering tool: it reads input line by line and prints only the lines that match a pattern.
                eg. user@machine:~$ grep sometext file.txt
                       user@machine:~$ grep sometext ./*
                       user@machine:~$ grep sometext ./* | cut -d: -f1 | uniq          (list only the names of the files that contain the text)
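A few commonly used grep options, as a minimal sketch (file.txt and dirname are placeholders):
user@machine:~$ grep -i sometext file.txt          (case-insensitive match)
user@machine:~$ grep -n sometext file.txt          (show line numbers with the matches)
user@machine:~$ grep -r sometext dirname           (search recursively through a directory)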



Pipes & Directional operators:
•  user@machine:~$ Program1 | Program2
   -  The output of Program1 becomes the input of Program2.
   -  Eg.
   -  user@machine:~$ ps aux                             (to check processes)
   -  user@machine:~$ ps aux | sort | uniq | less        (sort the process list, drop adjacent duplicates, then page through it)
•  user@machine:~$ Program1 && Program2
   -  Program2 executes only when Program1 succeeds (exits with status 0); use || to run the second command only when the first one fails.
   -  Eg.
   -  user@machine:~$ ls file.txt && echo "Success"              (echo runs only if file.txt exists)
   -  user@machine:~$ ls filenotexist.txt || echo "Unsuccessful" (echo runs because ls failed)
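Pipes, redirection, and && can be combined in one line; a minimal sketch (log.txt and count.txt are placeholder file names):
user@machine:~$ grep -c error log.txt > count.txt && echo "Count saved"          (count matching lines, save the count, and confirm only if grep succeeded)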

Package management with apt-get:
                apt-get is mainly used to install software and packages on Debian-based systems such as Ubuntu.
•  user@machine:~$ apt-get update
   -  This may fail with "Permission denied". To avoid that, run it with sudo:
•  user@machine:~$ sudo apt-get update               (refresh the package lists)
•  user@machine:~$ sudo apt-get upgrade              (upgrade the installed packages)
•  user@machine:~$ sudo apt-get install <applicationname>
   -  To install a software package or application
•  user@machine:~$ apt-cache search editor
   -  To search for an editor
•  user@machine:~$ sudo apt-get remove <applicationname>
   -  To remove a software package or application
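Putting it together, a minimal sketch using the vim editor as an example package:
user@machine:~$ apt-cache search editor | grep -i vim          (find vim-related packages)
user@machine:~$ sudo apt-get install vim                       (install the editor)
user@machine:~$ sudo apt-get remove vim                        (uninstall it again)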








Wednesday, January 25, 2017

Learn Big Data: Hive



Hive is a data warehouse tool built on top of Hadoop, originally developed by Facebook; its SQL-like query language is called HiveQL (HQL). Hadoop can support any kind of data:
•         Structured data, like database tables
•         Unstructured data, like video, audio, PDF, and txt files
•         Semi-structured data, like XML
Hive queries this data through HiveQL.
Difference between SQL & HiveQL

  • In SQL we can insert data values row by row, but not in HQL.
  • In SQL we can update any row or column, but not in HQL, because the data is stored in HDFS; once data is put into HDFS, you should not change its contents.
  • In SQL we can use DELETE, but not in HQL.
  • In Hive, every table is created as a directory.

HQL datatypes
Like other RDBMSs (Oracle, MySQL, SQL Server), Hive also has databases, and its tables support the following datatypes:

TinyInt         Float           Map
SmallInt        Double          Array
BigInt          String          Struct

Here Map, Array, and Struct are called collection datatypes.
Creating hive tables:
Hive tables can be created in two ways:
1.       Managed tables or Internal tables
2.       External tables
Managed tables or Internal tables:
user@machine:~$ hive
                hive> create table employee(id int, name string, salary float)
                    > row format delimited
                    > fields terminated by '\t';


Important points:
•  A string column can hold any kind of data.
•  In SQL you must create the schema or table before inserting data, but in HQL you can either create the table and then load data, or put the data in first and then create a table over it.
•  If you put the ; right after the column list in the create statement (leaving out the row format delimited and fields terminated by lines), Hive will not give you any error, but queries will return null,null instead of the actual data, because the default field delimiter will not match your file. So you need to write the delimiter & terminated lines; the sketch below shows this pitfall.
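A minimal sketch of that pitfall, assuming a tab-delimited file /home/user/emp.txt (a hypothetical path):
hive> create table employee2(id int, name string, salary float);          -- no delimiter clause
hive> load data local inpath '/home/user/emp.txt' into table employee2;
hive> select * from employee2;                                            -- returns NULL NULL NULL instead of the real rows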
Loading data into HIVE tables:
Data can be loaded in two ways: either from the local file system or from HDFS.
Loading data from the local file system:
hive> load data local inpath '<filepath>' into table <tablename>;
Loading data from HDFS:
hive> load data inpath '<filepath>' into table <tablename>;
•  If it is the local file system, the default path is /home/<username>.
•  If it is HDFS, it is /user/<username>; a worked example follows below.
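A minimal sketch loading a tab-delimited file into the employee table created above (the path /home/user/employee.txt is hypothetical):
hive> load data local inpath '/home/user/employee.txt' into table employee;
hive> select * from employee;          -- verify that the rows loaded correctly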
Here you will come across two terms: metadata, which means data about the data, and the metastore, which is the place where Hive stores that metadata.
External Tables:
hive> create external table employeeE(id int, name string, salary float)
                > row format delimited
                > fields terminated by '\t'
                > location '/vimal/newfolder';
Concept:
•  When we create an internal table, a directory with the table name is created in the warehouse. When we create an external table, no directory with the table name is created in the warehouse; the table just references an existing location such as /vimal/newfolder.
•  For data that other users or tools also need (global usage), prefer an external table over an internal one.

Internal Table:
/user/hive/warehouse/
                employee  (directory)
                                employee (file)
                                employee1 (file)
External Table:
/vimal/newfolder/
                employee (file)
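You can verify this layout with hadoop fs commands; a minimal sketch (the warehouse path assumes Hive's default configuration):
user@machine:~$ hadoop fs -ls /user/hive/warehouse/employee          (internal table: a directory named after the table, holding the data files)
user@machine:~$ hadoop fs -ls /vimal/newfolder                       (external table: just the data files at the location you chose)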