Slide 1 © 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com Big Data Analytics using Pig

Introduction to Pig | Pig Architecture | Pig Fundamentals

Big Data Analytics using Pig

Scope of PPT – BIG Data Analytics via PIG

ᗍ Introduction to Big Data and Hadoop

ᗍ Introduction to Pig

ᗍ Hadoop Pig Architecture

ᗍ BIG Data Analytics via Pig

ᗍ BIG Data & Hadoop Job Trends

ᗍ BIG Data & Hadoop Course Syllabus

Get Started with BIG Data & Hadoop

Big Data and its Challenges

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Systems and enterprises generate huge amounts of data, from terabytes to petabytes of information.

It is very difficult to manage data at this scale.

Who Generates Big Data?

Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data?

Today, managing Big Data at this scale is becoming a problem for all of us.

Hadoop can be used to process such huge data sets easily. We will see how.

Before that, let’s understand what Hadoop is.

Hadoop and its Characteristics

Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.

It is an open-source data management technology with scale-out storage and distributed processing.

Hadoop Characteristics

ᗍ Flexible

ᗍ Reliable

ᗍ Economical

ᗍ Scalable

Hadoop Ecosystem

ᗍ Flume: import or export of unstructured or semi-structured data

ᗍ Sqoop: import or export of structured data

ᗍ Apache Oozie: workflow

ᗍ Pig Latin: data analysis

ᗍ Hive: DW system

ᗍ HBase

ᗍ MapReduce framework

ᗍ Other YARN frameworks (MPI, Giraph)

ᗍ YARN: cluster resource management

ᗍ HDFS (Hadoop Distributed File System)

Page 9: Introduction to Pig | Pig Architecture | Pig Fundamentals

© 2015 Blue Camphor Technologies (P) Ltd. Slide 9© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com

Need for Pig

Java is not a preferred language for many data analysts

About 200 lines of Java code correspond to roughly 10 lines of Pig

Many built-in operations are available for common data tasks such as joining, grouping and filtering

Page 10: Introduction to Pig | Pig Architecture | Pig Fundamentals

© 2015 Blue Camphor Technologies (P) Ltd. Slide 10© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com

Where to use Pig?

Pig is a data flow language, so it is most suitable for:

ᗍ Quickly changing data processing requirements

ᗍ Processing data from multiple channels

ᗍ Quick hypothesis testing

ᗍ Time-sensitive data refreshes

ᗍ Data profiling using sampling

Page 11: Introduction to Pig | Pig Architecture | Pig Fundamentals

© 2015 Blue Camphor Technologies (P) Ltd. Slide 11© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com

What is Pig?

ᗍ It is an open source data flow language

ᗍ Pig Latin is used to express the queries and data manipulation operations in simple scripts

ᗍ Pig converts the scripts into a sequence of underlying MapReduce jobs

Page 12: Introduction to Pig | Pig Architecture | Pig Fundamentals

© 2015 Blue Camphor Technologies (P) Ltd. Slide 12© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com

Let’s internalize Pig

Let’s find the people who “overall” visit “highly ranked” pages.

Visits

User    URL                  Time
John    www.cbn.com          7:00
John    www.trap.com         7:05
John    www.myblog.com       9:00
John    www.flickr.com       9:05
Linda   cnn.com/index.htm    11:00

Pages

Page URL          Page Rank
www.cbn.com       0.9
www.flickr.com    0.9
www.myblog.com    0.6
www.trap.com      0.3

Page 13: Introduction to Pig | Pig Architecture | Pig Fundamentals

© 2015 Blue Camphor Technologies (P) Ltd. Slide 13© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com

Internalizing Pig

1. Load Visits (user, url, time)

2. Load Pages (url, pagerank)

3. Join on url = url

4. Group by user

5. Compute average pagerank
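
The data flow on this slide (load Visits and Pages, join on url, group by user, average the pagerank) can be sketched in plain Python using the sample rows from the Visits and Pages tables. This is only an illustration of the logic, not Pig Latin and not how Pig executes it. The function name `average_pagerank_by_user` and the `THRESHOLD` cut-off for "highly ranked" are assumptions made for this example; the slides do not state a threshold.

```python
from collections import defaultdict

# Sample rows copied from the Visits and Pages tables in the deck.
visits = [
    ("John", "www.cbn.com", "7:00"),
    ("John", "www.trap.com", "7:05"),
    ("John", "www.myblog.com", "9:00"),
    ("John", "www.flickr.com", "9:05"),
    ("Linda", "cnn.com/index.htm", "11:00"),
]
pages = [
    ("www.cbn.com", 0.9),
    ("www.flickr.com", 0.9),
    ("www.myblog.com", 0.6),
    ("www.trap.com", 0.3),
]


def average_pagerank_by_user(visits, pages):
    """Join visits to pages on url, group by user, average the pagerank."""
    rank_by_url = dict(pages)           # lookup table used for the join
    ranks_by_user = defaultdict(list)   # group by user
    for user, url, _time in visits:
        if url in rank_by_url:          # inner join: unmatched urls are dropped
            ranks_by_user[user].append(rank_by_url[url])
    return {user: sum(ranks) / len(ranks) for user, ranks in ranks_by_user.items()}


averages = average_pagerank_by_user(visits, pages)

# Hypothetical cut-off for "highly ranked"; not taken from the slides.
THRESHOLD = 0.5
frequent_visitors = [user for user, avg in averages.items() if avg > THRESHOLD]

print(averages)
print(frequent_visitors)
```

Linda's only visit is to a URL that does not appear in Pages, so the inner join used in this sketch leaves her out of the result.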

Pig in Industry

Since Pig is a data flow language, it is a natural fit for:

ᗍ Data factory operations

ᗍ Typically data is brought from multiple servers to HDFS

ᗍ Pig is used for cleaning the data and preprocessing it

ᗍ It helps data analysts and researchers quickly prototype their theories

ᗍ Since Pig is extensible, data analysts can more easily run their own scripting language programs (for example in Ruby or Python) against large data sets

Page 15: Introduction to Pig | Pig Architecture | Pig Fundamentals

© 2015 Blue Camphor Technologies (P) Ltd. Slide 15© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com

Ways to Handle Pig

ᗍ Grunt Mode:

• It is the interactive mode of Pig

• Very useful for testing, syntax checking and ad-hoc data exploration

ᗍ Script Mode:

• Runs a set of instructions from a file

• Similar to a SQL script file

ᗍ Embedded Mode:

• Executes Pig programs from a Java program

• Suitable for creating Pig scripts on the fly

Page 16: Introduction to Pig | Pig Architecture | Pig Fundamentals

© 2015 Blue Camphor Technologies (P) Ltd. Slide 16© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com

Modes of Pig

All of the different Pig invocations can run in the following modes:

Local

ᗍ In this mode, the entire Pig job runs as a single JVM process

ᗍ Reads and stores data on the local Linux file system

Map Reduce

ᗍ In this mode, the Pig job runs as a series of MapReduce jobs

ᗍ Input and output paths are assumed to be HDFS paths
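
On the command line, the execution mode is usually selected with Pig's `-x` flag (`pig -x local` or `pig -x mapreduce`). The sketch below only builds such a command as a list of arguments and does not run it. The helper name `build_pig_command` and the script name `visits.pig` are hypothetical and used purely for illustration.

```python
# The two execution modes named on the slide, mapped to values of Pig's -x flag.
EXEC_TYPES = {"local": "local", "mapreduce": "mapreduce"}


def build_pig_command(mode, script_path):
    """Return the argument list for running a Pig script in the given mode."""
    if mode not in EXEC_TYPES:
        raise ValueError(f"unknown Pig execution mode: {mode!r}")
    return ["pig", "-x", EXEC_TYPES[mode], script_path]


# 'visits.pig' is a hypothetical script name.
for mode in ("local", "mapreduce"):
    print(" ".join(build_pig_command(mode, "visits.pig")))
```

Keeping the mode as a parameter makes it easy to test a script against local files first and then run the same script against HDFS paths on the cluster.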

Pig Components

1. Pig Data Flows

ᗍ Pig Latin is used to express data flows

2. Execution Environments

ᗍ Distributed execution on a Hadoop cluster

ᗍ Local execution in a single JVM

Page 18: Introduction to Pig | Pig Architecture | Pig Fundamentals

© 2015 Blue Camphor Technologies (P) Ltd. Slide 18© 2015 BlueCamphor Technologies (P) Ltd. www.skillspeed.com

Pig Programs Execution

Pig is just a wrapper on top of the MapReduce layer.

It parses, optimizes and converts the Pig script into a series of MapReduce jobs.

In other words, Pig turns the transformations into a series of MapReduce jobs.
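
To give a feel for what "a series of MapReduce jobs" means, here is a minimal Python sketch of how a single group-and-average step could be expressed as map, shuffle and reduce phases. It is a conceptual illustration only; Pig's actual parser, optimizer and job planner are considerably more involved. The function names and the sample records are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, value) pair per input record: key = user, value = pagerank."""
    for user, pagerank in records:
        yield user, pagerank


def shuffle(pairs):
    """Collect all values that share a key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce each key's list of values to a single average."""
    return {key: sum(values) / len(values) for key, values in grouped.items()}


# Hypothetical (user, pagerank) records, as they might look after a join.
records = [("John", 0.9), ("John", 0.3), ("Linda", 0.6)]
result = reduce_phase(shuffle(map_phase(records)))
print(result)
```

A longer script with several joins and groupings would be translated into more than one such job, with the output of one job feeding the next.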

Job Trends – Hadoop

Why SkillSpeed?

Course Curriculum from Industry Experts

Instructor Led Live Virtual Sessions

Lifetime access to Course Content via LMS

100% Placement Assistance

24x7 Support

Course Topics

Module 1: Introduction to Big Data and Hadoop

Module 2: HDFS Internals, Hadoop Configurations and Data Loading

Module 3: Introduction to Map Reduce

Module 4: Advanced Map Reduce Concepts

Module 5: Introduction to Pig

Module 6: Advanced Pig and Introduction to Hive

Module 7: Advanced Hive Concepts

Module 8: Extending Hive and HBase Introduction

Module 9: Advanced HBase and Oozie Introduction

Module 10: Project Set-up Discussion

Contact Us

Lines open 24/7

To know more about the course, please contact:

IND: +91-90660-20904

USA: 1866-607-6547 (Toll Free)

Or reach us at [email protected]
