Instructor-Led Training Course Catalog Q1’17 ®


Contents

Welcome to MapR Academy

Course Formats
  On-demand Training (ODT)
  Public Training: Classroom (ILT) and Virtual (vILT) Training
    Standard Per Seat Pricing – USD Only
    Details
  Private Training: Classroom (ILT) and Virtual (vILT) Training
    Standard Per Course Pricing
    Details

Current and Planned Courses
  Summary of ODT Titles - 2017

Certifications

How to Schedule Instructor-led Training
  Lead Times for Scheduling Training
    All Customers
  Private Training Class Size, Extra Instructor Fees
    Onsite Classroom and Public Instructor-led Training
    Virtual Instructor-led Training

Cancellation Policies
  Public Classes
  Right to Cancel Scheduled Courses
  Private / Onsite Classes
  Refunds

Cluster Administrator Course Descriptions
  ADM 2000 - Hadoop Cluster Administration
    About this course
    Prerequisites for Success in this Course
    Certification
    Syllabus
      Get Started
      Lesson 1: Prepare for Installation
      Lesson 2: Install the MapR Distribution
      Lesson 3: Verify and Test the Cluster
      Lesson 4: Users, Groups, and System Settings
      Lesson 5: Configure Topology
      Lesson 6: Configure MapR Volumes
      Lesson 7: Job Logs and Scheduling
      Lesson 8: Access the Cluster
      Lesson 9: Snapshots
      Lesson 10: Mirrors
      Lesson 11: Monitor and Manage the Cluster
      Lesson 12: Disk and Node Maintenance
      Lesson 13: Troubleshooting
  ADM 2100 – Upgrade a MapR Cluster
    About this course
    Prerequisites for Success in the Course
    Syllabus
      Get Started
      Lesson 1: Plan the Upgrade
      Lesson 2: Upgrade MapR Core
      Lesson 3: Upgrade Ecosystem Components and MapR Clients

Developer Course Descriptions
  DEV 3000 - Developing Hadoop Applications
    About this course
    Prerequisites for Success in this Course
    Certification
    Syllabus
      Lesson 1: Introduction to Developing Hadoop Applications
      Lesson 2: Job Execution Framework - MapReduce v1 & v2
      Lesson 3: Write a MapReduce program
      Lesson 4: Use the MapReduce API
      Lesson 5: Managing, monitoring, and testing MapReduce jobs
      Lesson 6: Managing Performance
      Lesson 7: Working with Data
      Lesson 8: Launching jobs
      Lesson 9: Using non-Java programs (Streaming MapReduce)
  DEV 3100 - HBase for Analysts and Architects
    About This Course
    Prerequisites for Success in the Course
    Certification
    Syllabus
      Lesson 1 - Introduction to Apache HBase
      Lesson 2 – Apache HBase Data Model
      Lesson 3 – Apache HBase Architecture
      Lesson 4 - Basic Schema Design
      Lesson 5 - Design Schemas for Complex Data Structures
      Lesson 6 - Using Hive to Query HBase
  DEV 3200 - HBase Application Design and Build
    About This Course
    Prerequisites for Success in the Course
    Certification
    Syllabus
      Lesson 1 - Introduction to Apache HBase
      Lesson 2 – Apache HBase Data Model
      Lesson 3 – Apache HBase Architecture
      Lesson 4 - Basic Schema Design
      Lesson 5 - Design Schemas for Complex Data Structures
      Lesson 6 - Using Hive to Query HBase
      Lesson 1 - Java client API Part 1
      Lesson 2 - Java API Part 2
      Lesson 3 - Java Client API for Administrative Features
      Lesson 4 - Advanced HBase Java API
      Lesson 5 – HBase Design Schema Review
      Lesson 6 – Working with MapReduce on HBase
      Lesson 7 - Bulk Loading of Data
      Lesson 8 - Performance
      Lesson 9 - Security
  DEV 3500 - Real-time Stream Processing with MapR
    About This Course
    Prerequisites for Success in the Course
    Syllabus
      Lesson 1: Introduction to MapR Streams
      Lesson 2: MapR Streams Architecture
      Lesson 3: Introduction to Producers and Consumers
      Lesson 4: Producer and Consumer Details
  DEV 3600 - Developing Spark Applications
    About This Course
    Prerequisites for Success in the Course
    Certification
    Syllabus
      Lesson 1 – Introduction to Apache Spark
      Lesson 2 – Load and Inspect data in Spark
      Lesson 3 – Build a simple Spark Application
      Lesson 4 - Work with Pair RDD
      Lesson 5 - Work with Spark DataFrames
      Lesson 6 - Monitor a Spark Application
      Lesson 7 – Introduction to Apache Spark Data Pipelines
      Lesson 8 – Create an Apache Spark Streaming Application
      Lesson 9 – Use Apache Spark GraphX to Analyze Flight Data
      Lesson 10 – Use Apache Spark MLLib to Predict Flight Delays
    Certification

Data Analyst Courses
  DA 4000 – Self-service SQL with Apache Drill
    Prerequisites for Success in the Course
    Syllabus
      Lesson 1 – SQL Queries
      Lesson 2 – Self Describing Data
      Lab Exercises
      Lesson 3 – Architecture
  DA 4500 – Data Analysis with Apache Pig and Hive
    About this Course
    Prerequisites for Success in the Course
    Syllabus
      Lesson 1 – Apache Pig fits in the Hadoop ecosystem
      Lesson 2 – Extract, Transform, and Load Data with Apache Pig
      Lesson 4 – How Apache Hive fits in the Hadoop ecosystem
      Lesson 5 – Create tables and load data in Apache Hive
      Lesson 6 – Query data with Apache Hive

Learning Paths and Certifications
  MapR Certified Cluster Administrator (MCCA)
  MapR Certified Hadoop Developer (MCHD)
  MapR Certified HBase Developer (MCHBD)
  MapR Certified Apache® Spark Developer (MCSD)
  MapR Certified Data Analyst (MCDA)
  More Information on Certification

Welcome to MapR Academy

MapR Academy offers a full suite of Big Data courses on Apache Hadoop ecosystem projects, such as Spark, MapReduce, Hive, and Pig, as well as courses on installing and maintaining MapR’s Converged Data Platform Elements, and developing projects and Use Case scenarios. The courses are offered in formats that meet the needs of all learner types, are use-case driven, and result in measurable performance improvement.

• Classroom courses are the premier learning experience, offering depth, dialog with instructors and other learners, and the opportunity to do stretch labs in a collaborative, supportive environment.

• Free On-demand courses can be taken any time, or can function as learning materials that support (not replace) the Classroom courses, helping learners extend and consolidate their learning.

• Course content is the same in both on-demand and classroom formats.

Course Formats

On-demand Training (ODT)

• Free, available 24/7

• Positioned to be taken standalone

• Also supports ILT and vILT training

• English only

• Self-registration

• All on-demand courses include:

  o Lab exercises, lab guide, slide guide, and job aids as appropriate

  o Quizzing and self-assessment pre-test

  o Instruction on how to access a sandbox (free)

  o Instruction on how to access a cluster (user-paid)

© Copyright 2017 MapR Technologies Inc.

Public Training: Classroom (ILT) and Virtual (vILT) Training

Standard Per Seat Pricing – USD Only

Type    2-day    3-day    4-day    5-day
ILT     $1600    $2400    $3200    $4000
vILT    $1400    $2100    $2800    $3500

Details

• 12 people max per classroom ILT course

• 10 people max per vILT course

• All courses include:

  o Certified MapR Instructor who is an SME in the topic and is expert in classroom facilitation and course delivery techniques

  o Collaboration and assistance for all students on completion of exercises

  o Lab exercises, a lab guide, slide guide, and job aids as appropriate

  o Course cluster provided for completing labs

  o Certification exam fee included (one exam attempt only, taken on the student’s own time)

See the Certification section of this guide for more details.

Private Training: Classroom (ILT) and Virtual (vILT) Training

Standard Per Course Pricing

Note that all Private training is sold and scheduled by MapR directly to clients. Training Services Partners can be deployed as contract instructors to fulfill international Private training delivery engagements.

Type    2-day      3-day      4-day      5-day
ILT     $14,000    $21,000    $28,000    --
vILT    $10,000    $15,000    $20,000    --

Details

• Onsite at Client location

• 12 people max per classroom ILT course

• 10 people max per vILT course

• All courses include:

  o Certified MapR Instructor who is an SME in the topic and is expert in classroom facilitation and course delivery techniques

  o Collaboration and assistance for all students on completion of exercises

  o Lab exercises, a lab guide, slide guide, and job aids as appropriate

  o Course cluster provided for completing labs

  o Certification exam fee included (one exam attempt only, taken on the student’s own time, not in class)

Current and Planned Courses

KEY:

• ILT: Instructor-led Training, in a Public classroom or Private - Onsite at client

• vILT: Virtual instructor-led Training – live, but via Webex Training Center. Can be Public virtual or Private virtual for a single client

• On-Demand Training: Video-based learning, available 24/7/365 for all learners

Summary of ODT Titles – CURRENT as of 1/2017

MapR Academy ODT Titles                                         ILT – vILT   ODT

ESS 100 – Introduction to Big Data                              Y            Y
ESS 101 – Apache Hadoop Essentials                              Y            Y
ESS 102 – MapR Converged Data Platform Essentials               Y            Y

ADM 200 – Install a MapR Cluster                                Y            Y
ADM 201 – Configure a MapR Cluster                              Y            Y
ADM 202 – Data Access and Protection                            Y            Y
ADM 203 – Cluster Maintenance                                   Y            Y
ADM 210 – Upgrade a MapR Cluster                                Y            Y

DEV 301 – Developing Hadoop Applications                        Y            Y
DEV 320 – HBase Data Model and Schema Design                    Y            Y
DEV 325 – HBase Architecture                                    Y            Y
DEV 330 – Developing HBase Applications: Basics                 Y            Y
DEV 335 – Developing HBase Applications: Advanced               Y            Y
DEV 340 – Bulkloading, Security, Performance                    Y            Y
DEV 350 – MapR Streams Essentials                               Y            Y
DEV 351 – Developing MapR Streams Applications                  Y            Y
DEV 360 – Apache Spark Essentials                               Y            Y
DEV 361 – Build and Monitor Apache Spark Applications           Y            Y
DEV 362 – Create Data Pipeline Applications Using Apache Spark  Y            Y

DA 410 – Drill Essentials                                       Y            Y
DA 415 – Drill Architecture                                     Y            Y
DA 440 – Apache Hive                                            Y            Y
DA 450 – Apache Pig                                             Y            Y

Certifications

Certification                                    List Price
MCCA – MapR Certified Cluster Administrator      $250.00
MCHBD – MapR Certified HBase Developer           $250.00
MCHD – MapR Certified Hadoop Developer           $250.00
MCSD – MapR Certified Apache® Spark Developer    $250.00
MCDA – MapR Certified Data Analyst               $250.00


How to Schedule Instructor-led Training

Lead Times for Scheduling Training

All Customers

Refer to the website for the latest course syllabi.

• MapR requires 45 days' notice to schedule delivery of any course, from the date of the signing of the Closed-Won Opportunity.

• You will be asked the following questions:

  o Address where training will take place

  o Training contact or POC name

  o Training dates requested

  o Classroom or virtual training?

  o Training topics

  o Who is being trained, and what are the expectations? (developers, data scientists, etc.)

Private Training Class Size, Extra Instructor Fees

Onsite Classroom and Public Instructor-led Training

• Classroom Instructor-led training enrollment is limited to 12 students.

• If the class enrollment exceeds 12, an extra instructor may be present to provide a quality learning event.

• Fees for an extra instructor vary from $4500 to $7500 per 3-day engagement – rate varies based on contract instructor rates.

• Extra T&E applies to extra instructors.

Virtual Instructor-led Training

• Virtual Instructor-led training enrollment is limited to 10 students.

• If the class enrollment exceeds 10, an extra instructor may be present in the online environment to provide a quality learning event, handle chat, and troubleshoot technical questions.

• Fees for an extra instructor vary from $4500 to $7500 per 3-day engagement – rate varies based on contract instructor rates.

• Extra T&E applies to extra instructors.


Cancellation Policies

Public Classes

Cancellation requests from the student must be received at least 14 business days prior to the start of class to be eligible for a refund. If preferred, MapR, at our discretion, can transfer your registration to a future class of equal value.

Right to Cancel Scheduled Courses

MapR will review class enrollments at least 2 weeks prior to the class start date. In the case of low enrollment, classes may be cancelled or rescheduled at any time at our discretion. In the event a class is cancelled, MapR will transfer your registration to another class of equal value or refund your fee in full. MapR is not responsible for non-refundable travel or other expenses incurred by the student.

Private / Onsite Classes

Scheduling: Onsite classes must be scheduled 45 business days before the class start date.

Cancellation: Class cancellation requests must be received three (3) weeks before class start date. MapR prefers cancellation information to be received four (4) weeks before class start date.

The client will incur expenses if cancellation occurs within 3 weeks of the class start date, to cover MapR's cost of rescheduling vendors. These can include travel expenses, hotel fees and airline change fees for instructors. In addition, if MapR uses a subcontractor instructor who charges a penalty for late cancellation within 2 weeks of the scheduled course date, MapR will pass those fees along to the reseller. These fees are a percentage of the subcontractor rate.

Refunds

• Students who fail to cancel at least 14 business days before the start of class and/or do not attend the class will not receive a refund and will be charged the full amount.

• Certification Fees are non-refundable.

• On-Demand Fees for items like cluster time purchases are non-refundable.


Cluster Administrator Course Descriptions

ADM 2000 - Hadoop Cluster Administration

• Classroom, Virtual and On-demand formats

• For system administrators who are (or will be) in charge of installing, architecting and maintaining Hadoop

• Prior Hadoop knowledge not required!

About this course

This introductory course is designed to teach Hadoop administrators how to install, configure and maintain a MapR Hadoop cluster. You will learn how to test hardware for performance consistency, install the MapR Distribution including Apache Hadoop, create baseline benchmarks, and configure and maintain a Hadoop cluster. The course includes extensive hands-on labs with real-world system administrator scenarios, using virtualized clusters.

Prerequisites for Success in this Course

• Completion of ESS 100 Introduction to Big Data, ESS 101 Apache Hadoop Essentials, and ESS 102 MapR Converged Data Platform Essentials

• A background in Linux system administration (able to navigate the Linux file system, use an editor at the command-line interface, add users/groups, and execute common commands)

• A Linux system, PC or Mac with access to ssh and scp (using PuTTY, Cygwin, or similar tools)

Certification

This course helps prepare you for the MapR Certified Cluster Administrator (MCCA) certification exam.

Syllabus

Included in this 3-day course are:

• Access to a multi-node cluster (one per student)

• Slide guide

• Lab guide

• A MapR Enterprise trial license


DAY 1

Get Started

• Understand the lab environment

• Connect to your cluster

Lesson 1: Prepare for Installation

• Identify node types

• Prepare and verify cluster hardware

• Audit the cluster nodes

• Run pre-install tests

• Plan a service layout

Lesson 2: Install the MapR Distribution

• Install the MapR Distribution

• Add a MapR license

• Explore the MCS

Lesson 3: Verify and Test the Cluster

• Verify cluster status

• Run post-install benchmark tests

• Explore the cluster structure

Lesson 4: Users, Groups, and System Settings

• Manage users and groups

• Configure system settings

DAY 2

Lesson 5: Configure Topology

• Define topology

• Configure node topology

Lesson 6: Configure MapR Volumes

• Using volumes

• Volume properties

• Configure volumes

• Create volumes and set quotas

Lesson 7: Job Logs and Scheduling

• Logging options

• Configure YARN log aggregation


• Configure the Fair Scheduler

Lesson 8: Access the Cluster

• Access data

• Modify cluster files

• Configure client access

• Configure Virtual IP Addresses

• Use Access Control Expressions

Lesson 9: Snapshots

• Understand snapshots

• Configure and use snapshots

DAY 3

Lesson 10: Mirrors

• Understand how mirrors work

• Configure and use local mirrors

• Cascading and remote mirrors

Lesson 11: Monitor and Manage the Cluster

• Monitor the cluster

• Configure and respond to alarms

• Balance cluster resources

• Manage logs and snapshots

• Add and remove services

Lesson 12: Disk and Node Maintenance

• Replace a failed disk

• Add and remove disks

• Perform node maintenance

• Add nodes

Lesson 13: Troubleshooting

• Troubleshoot different problem types

• Use support utilities

ADM 2100 – Upgrade a MapR Cluster

• 1 day duration

• Classroom, Virtual and On-demand formats

• For system administrators who are (or will be) in charge of installing, architecting and maintaining Hadoop

• Prior Hadoop knowledge not required!

About this course

Get hands-on experience upgrading a MapR cluster to the latest version. This course takes students through the process of upgrading a MapR cluster, beginning with what to include in a cluster upgrade plan, and how to perform pre-upgrade testing. Each student is given a 3-node cluster to use to upgrade MapR core software, and the Hive and Pig ecosystem components.

Prerequisites for Success in the Course

Review the following prerequisites carefully and decide if you are ready to succeed in this programming-oriented course. The Instructor will move forward with lab exercises, assuming that you have mastered the skills listed below.

Required:

• Basic Linux knowledge, including familiarity with basic command-line options such as mv, cp, cd, ls, ssh, and scp

• The ability to use a Linux text editor such as vi

• Access to, and the ability to use, a laptop with a terminal program installed (such as Terminal on the Mac, or PuTTY and WinSCP on Windows)

• Basic familiarity with administering the MapR Converged Data Platform, including the ability to use the command line and the MapR Control System (MCS) user interface

Recommended:

• Completion of the ADM 2000 curriculum, which comprises the on-demand courses ADM 200 through ADM 203, or equivalent experience using a MapR cluster

Syllabus

Included in this one-day course are:

• Access to a multi-node cluster (one per student)

• Slide guide

• Lab guide

• A MapR Enterprise trial license


DAY 1

Get Started

• Configure the training environment

• Understand the lab environment

• Connect to your cluster

Lesson 1: Plan the Upgrade

• Identify upgrade methods

• Develop a plan and prepare to upgrade

• Run a pre-upgrade test plan

• Summarize the upgrade process

Lesson 2: Upgrade MapR Core

• Prepare to upgrade

• Upgrade MapR core software

Lesson 3: Upgrade Ecosystem Components and MapR Clients

• Upgrade ecosystem components

• Upgrade Hive and Pig

• Upgrade MapR Clients

• Additional Upgrade Considerations

• Apply a Patch


Developer Course Descriptions

DEV 3000 - Developing Hadoop Applications

• Available in On-demand, Classroom and Virtual Instructor-led formats.

• For developers and programmers interested in developing applications on Hadoop (MapReduce)

• This is a programming course; you must have Java programming experience for best results.

About this course

This course teaches developers how to write Hadoop applications using MapReduce and YARN in Java. The course covers debugging, managing jobs, improving performance, working with custom data, managing workflows, and using other programming languages for MapReduce.

Prerequisites for Success in this Course

● Beginner-to-intermediate fluency with Java or object-oriented programming in an IDE

● Basic Hadoop knowledge (helpful but not required)

● A Linux, PC or Mac computer with a MapR Sandbox downloaded (for the On-demand course)

● A connection to a Hadoop cluster via SSH and web browser (for the ILT or vILT course)

Certification

This course helps prepare you for the MapR Certified Hadoop Developer (MCHD) certification exam.

Syllabus

DAY 1

Lesson 1: Introduction to Developing Hadoop Applications

• Illustrate the MapReduce model conceptually

• Brief history of MapReduce

• Discuss how MapReduce works at a high level

• Define how data flows in MapReduce

• Hands-on Exercises
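As a rough illustration of the data flow this lesson covers, the sketch below models the three MapReduce stages (map, shuffle/sort, reduce) in plain Python on a single machine. It is not course material and does not use the Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_word_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group values by key and order the keys, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups, reducer):
    """Call the reducer once per key with all of that key's values."""
    return [reducer(key, values) for key, values in groups]


def word_count_mapper(line):
    # Emit (word, 1) for every word in the line.
    return [(word.lower(), 1) for word in line.split()]


def word_count_reducer(word, counts):
    # Sum the per-occurrence counts for one word.
    return (word, sum(counts))


def run_word_count(lines):
    # Hypothetical driver: wires the three stages together in order.
    pairs = map_phase(lines, word_count_mapper)
    groups = shuffle_phase(pairs)
    return reduce_phase(groups, word_count_reducer)
```

In a real cluster the map and reduce calls run in parallel on many nodes and the shuffle moves data over the network; the sketch keeps only the ordering of the stages and the shape of the data passed between them.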


Lesson 2: Job Execution Framework - MapReduce v1 & v2

• Describe the MapReduce v1 job execution framework

• Compare MapReduce v1 to MapReduce v2 (YARN)

• Describe how jobs execute in YARN

• Describe how to manage jobs in YARN

• Hands-on Exercises

Lesson 3: Write a MapReduce program

• Summary of the programming problem

• Design and implement the Mapper class, Reducer class and driver

• Build and execute the code then examine the output

• Hands-on Exercises

DAY 2

Lesson 4: Use the MapReduce API

• API overview

• Mapper input processing and Reducer output processing data flow

• Explore the Mapper, Reducer and Job class API

• Hands-on Exercises

Lesson 5: Managing, monitoring, and testing MapReduce jobs

• Work with counters

• Use the MCS to monitor jobs

• Use the Hadoop CLI to manage jobs

• Display job history and logs

• Write unit tests for MapReduce programs

• Hands-on Exercises

Lesson 6: Managing Performance

• Review components of MapReduce performance

• Enhance performance in your MapReduce jobs

• Overview of MapR performance enhancements

• Hands-on Exercises


DAY 3

Lesson 7: Working with Data

• Work with sequence files

• Working with the distributed cache

• Working with HBase

• Hands-on Exercises

Lesson 8: Launching jobs

• Implement programmatic job control in the driver

• Use MapReduce chaining

• Use Oozie to manage MapReduce workflows

• Hands-on Exercises

Lesson 9: Using non-Java programs (Streaming MapReduce)

• Overview of the MapReduce streaming paradigm

• Configure MapReduce streaming parameters

• Define the programming contract for mappers and reducers

• Monitor and debug MapReduce streaming jobs

• Hands-on Exercises
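The programming contract mentioned in this lesson can be sketched as follows. Hadoop Streaming hands records to a mapper or reducer program as lines of text on standard input and reads tab-separated key/value lines back from standard output, with the reducer's input sorted by key. The example below is a minimal sketch under those assumptions, not course material: it works on lists of lines rather than real standard input so it can be tried locally, and the input format ("city temperature") and the `simulate` helper are invented for illustration.

```python
from itertools import groupby


def mapper(lines):
    """Streaming-style mapper: each input line yields zero or one 'key<TAB>value' line."""
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed records instead of failing the task
        city, temperature = parts
        yield f"{city}\t{temperature}"


def reducer(sorted_lines):
    """Streaming-style reducer: lines arrive sorted by key, so equal keys are adjacent."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for city, group in groupby(parsed, key=lambda kv: kv[0]):
        hottest = max(int(value) for _, value in group)
        yield f"{city}\t{hottest}"


def simulate(lines):
    """Local stand-in for the framework: map, sort the mapper output, then reduce."""
    return list(reducer(sorted(mapper(lines))))
```

Because the reducer only relies on equal keys being adjacent, the same logic could be written in any language that can read and write lines of text, which is the point of the streaming approach.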

DEV 3100 - HBase for Analysts and Architects

• Classroom or virtual delivery

• Not sold separately. Must be bundled with another course.

• Included in DEV 3200 – HBase Application Design and Build

• 1 day duration

About This Course

This non-programming course comprises the on-demand courses DEV 320 and 325 only and is taught in classroom or virtual delivery styles over one day. Targeted towards data analysts, data architects and application developers, the goal of this course is to enable you to design HBase schemas based on design guidelines. You will learn about the various elements of schema design and how to design for data access patterns. The course offers an in-depth look at designing row keys, avoiding hot-spotting and designing column families. It discusses how to transition from a relational model to an HBase model. You will learn the differences between tall tables and wide tables. Concepts are conveyed through lectures, hands-on labs and analysis of scenarios.

Prerequisites for Success in the Course

Required:

• Basic Hadoop knowledge – helpful but not required

• Prior experience with SQL preferred, but not required

• A Linux, Windows or MacOS computer with the MapR sandbox installed (On-demand course). The lab exercises for this class will include installing Hive components on a MapR cluster and executing various commands and sample programs.

• Connection to a Hadoop cluster via SSH and web browser (for the ILT and vILT course)

Recommended:

• Completion of ESS 100 Introduction to Big Data, and ESS 101 Apache Hadoop Essentials

Certification

• This course is part of the preparation for the MapR Certified HBase Developer (MCHBD) certification exam.

Syllabus

Lesson 1 - Introduction to Apache HBase

● Differentiate between RDBMS and HBase

● Identify typical HBase Use Cases

Lesson 2 – Apache HBase Data Model

● Describe the HBase data model and data model components

● Describe how the logical data model maps to physical storage on disk

● Use data model operations

● Create an HBase table

Lesson 3 – Apache HBase Architecture

● Identify the components of an HBase cluster

● Describe how the HBase components work together

● Describe how regions work and their benefits

● Define the function of minor and major compactions

● Describe Region Server splits

● Describe how HBase handles fault tolerance

● Differentiate MapR-DB from HBase

Lesson 4 - Basic Schema Design

• List the elements of schema design

• Design row keys for data access patterns

• Design table shape & column families for data access patterns

• Define column family properties

• Design schema for given scenario

Lesson 5 - Design Schemas for Complex Data Structures

• Transition from relational model to HBase

• Use intelligent keys

• Use secondary indexes or Lookup tables

• Design for other complex data structures

• Evolve schemas over time

Lesson 6 - Using Hive to Query HBase

• Use Hive to query HBase/MapR tables

DEV 3200 - HBase Application Design and Build

• Classroom or virtual delivery

• 3 days duration

About This Course

Learn how to architect and write HBase programs using Hadoop as a distributed NoSQL datastore. This course introduces HBase architecture, the HBase data model, and the most important APIs for writing programs. The course also introduces schema design, performance tuning, bulk-loading of data, and storing complex data structures.

• This course comprises all HBase courses and is intended for HBase Application Developers. Corresponding on-demand courses are:

o DEV 320 - HBase Data Model and Architecture

o DEV 325 - HBase Schema Design

o DEV 330 - Developing HBase Applications: Basics

o DEV 335 - Developing HBase Applications: Advanced

o DEV 340 – Bulkloading, Security, Performance

Prerequisites for Success in the Course

Review the following prerequisites carefully and decide if you are ready to succeed in this programming-oriented course. The Instructor will move forward with lab exercises, assuming that you have mastered the skills listed below.

Required:

• Basic Linux knowledge, including familiarity with basic command-line options such as mv, cp, cd, ls, ssh, and scp

• Access to, and the ability to use, a laptop with a terminal program installed (such as Terminal on the Mac, or PuTTY and WinSCP on Windows)

• Beginner-to-intermediate fluency with Java or object-oriented programming in an IDE such as Eclipse

Recommended:

• Completion of ESS 100 Introduction to Big Data, and ESS 101 Apache Hadoop Essentials

• Optional: Basic Hadoop and database knowledge

Certification

• This course is part of the preparation for the MapR Certified HBase Developer (MCHBD) certification exam.

Syllabus

DAY 1

Lesson 1 - Introduction to Apache HBase ● Differentiate between RDBMS and HBase

● Identify typical HBase Use Cases

Lesson 2 – Apache HBase Data Model ● Describe the HBase data model and data model components

● Describe how logical data model maps physical storage on disk

● Use data model operations

● Create an HBase table

Lesson 3 – Apache HBase Architecture ● Identify the components of an HBase cluster

● Describe how the HBase components work together

● Describe how regions work and their benefits

● Define the function of minor and major compactions

● Describe Region Server splits

● Describe how HBase handles fault tolerance

● Differentiate MapR-DB from HBase

Lesson 4 - Basic Schema Design

• List the elements of schema design

• Design row keys for data access patterns

• Design table shape & column families for data access patterns

• Define column family properties

• Design schema for given scenario

Lesson 5 - Design Schemas for Complex Data Structures

• Transition from relational model to HBase

• Use intelligent keys

• Use secondary indexes or Lookup tables

• Design for other complex data structures

• Evolve schemas over time

Lesson 6 - Using Hive to Query HBase

• Use Hive to query HBase/MapR tables


DAY 2

Lesson 1 - Java client API Part 1

• Define the CRUD operations from the HBase Java API and discuss when and how to use them:

• Get, Put, Delete, Scan

• Describe the data flow between Client & Server when using these APIs

• Define the various helper classes for these APIs: KeyValue, Result, ResultScanner (Scan)

• Lab on Java Client API Get, Put, Delete, Scan: Use these APIs to create an application

Lesson 2 - Java API Part 2

• Client-side write buffer

• HTable Batch operations

• checkAndPut: atomic put operation

• KeyValue, Result Objects

• Atomic put with checkAndPut

• Lab on Java Client API HTable Batch, checkAndPut

• Use HTable Batch APIs in an application

• Use HTable checkAndPut APIs for row transactions in an application

Lesson 3 - Java Client API for Administrative Features

• HTableDescriptor

• HColumnDescriptor

• HBaseAdmin

• Lab: Create tables and define properties using the HBaseAdmin Java interface

DAY 3

Lesson 4 - Advanced HBase Java API

• Filters

• Counters

• Lab

• Using Filters in an application

• Using Counter Increment for row transactions in an application

Lesson 5 – HBase Design Schema Review

• Explanation of Time Series application implementation

• Lab: Programming a Time Series application

• Explanation of Social Application implementation

• Lab: Programming a Social Application


Lesson 6 – Working with MapReduce on HBase

• How is MapReduce used on HBase?

• How to program MapReduce applications for HBase

• Lab: Reading from HBase and writing back daily statistics

Lesson 7 - Bulk Loading of Data

• Use the importtsv bulk load tool

• Use a MapReduce job to import data

• Lab: Using importtsv and MapReduce to load from a file into HBase

Lesson 8 - Performance

• Performance Considerations

• Monitoring

• Benchmarking

• Lab: YCSB Benchmarking

Lesson 9 - Security

• Authentication, Authorization, Auditing, Encryption

• Access Control Expressions, roles, permissions

• Lab: Tables Authorization


DEV 3500 - Real-time Stream Processing with MapR

• Classroom or virtual delivery

• 1 day duration

• For developers interested in designing and developing MapR Streams applications.

About This Course

This course is targeted towards developers and administrators to give them the core concepts necessary to build simple MapR Streams applications. It introduces the benefits of MapR Streams for developing big data processing applications. Then, the course provides a basic framework for building producer and consumer applications, and discusses options for configuring these applications. Core concepts are taught using real use case scenarios that form the basis of hands-on labs.

Prerequisites for Success in the Course

Review the following prerequisites carefully and decide if you are ready to succeed in this programming-oriented course. The Instructor will move forward with lab exercises, assuming that you have mastered the skills listed below.

Required:

• Basic Linux knowledge, including familiarity with basic command-line options such as mv, cp, cd, ls, ssh, and scp

• Knowledge of application development principles

• Access to, and the ability to use, a laptop with a terminal program installed (such as terminal on the Mac, or PuTTY and WinSCP on Windows)

• Connection to a MapR cluster via SSH and web browser (for the ILT and vILT course)

Recommended:

• Knowledge of functional programming

• Knowledge of Java

• Knowledge of the MapR Converged Data Platform

Syllabus

DAY 1

Lesson 1: Introduction to MapR Streams

• Summarize the motivation behind MapR Streams

• Apply MapR Streams to common use cases

Lesson 2: MapR Streams Architecture

• Define core components of MapR Streams

• Summarize the life of a message in MapR Streams


Lesson 3: Introduction to Producers and Consumers

• Create a stream

• Develop a Java producer

• Develop a Java consumer

Lesson 4: Producer and Consumer Details

• Describe producer properties and options

• Describe consumer properties and options

• Explain messaging semantics


DEV 3600 - Developing Spark Applications

• Classroom or virtual delivery

• 3 days Duration

About This Course

This course enables developers to get started developing big data applications with Apache Spark. In the first part of the course, you will use Spark’s interactive shell to load and inspect data. The course then describes the various modes for launching a Spark application. You will then go on to build and launch a standalone Spark application. The concepts are taught using scenarios that also form the basis of hands-on labs.

Prerequisites for Success in the Course

Review the following prerequisites carefully and decide if you are ready to succeed in this programming-oriented course. The Instructor will move forward with lab exercises, assuming that you have mastered the skills listed below.

Required

• Basic to intermediate Linux knowledge, including the ability to use a text editor such as vi, and familiarity with basic command-line options such as mv, cp, ssh, grep, cd, and useradd

• Knowledge of application development principles

• A Linux, Windows or MacOS computer with the MapR Sandbox installed (On-demand course)

• Connection to a Hadoop cluster via SSH and web browser (for the ILT and vILT course)

Recommended

• Knowledge of functional programming

• Knowledge of Scala or Python

• Beginner fluency with SQL

Certification

• This course is part of the preparation for the MapR Certified Spark Developer (MCSD) certification exam.


Syllabus

DAY 1

Lesson 1 – Introduction to Apache Spark

• Describe the features of Apache Spark

o Advantages of Spark

o How Spark fits in with the Big Data application stack

o How Spark fits in with Hadoop

• Define Apache Spark components

Lesson 2 – Load and Inspect data in Spark

• Describe different ways of getting data into Spark

• Create and use Resilient Distributed Datasets (RDD)

• Apply transformations to RDDs

• Use actions on RDDs

o Lab - Load and Inspect Data in RDD

• Cache intermediate RDDs

• Use Spark DataFrames for simple queries

o Lab - Load and Inspect Data in DataFrames

Lesson 3 – Build a simple Spark Application

• Define the lifecycle of a Spark program

• Define the function of SparkContext

o Lab - Create the application

• Define different ways to run a Spark application

• Run your Spark application

o Lab - Launch the application

DAY 2

Lesson 4 - Work with Pair RDD

• Describe pair RDD

• Why use pair RDD

• Create pair RDD

• Apply transformations and actions to pair RDD

• Control partitioning across nodes

• Changing partitions

• Determine the partitioner


Lesson 5 - Work with Spark DataFrames

• Create Apache Spark DataFrames

• Work with data in DataFrames

• Create user defined functions

• Repartition DataFrame

Lesson 6 - Monitor a Spark Application

• Describe the components of the Spark execution model

• Use the SparkUI to monitor a Spark application

• Debug & tune Spark applications

DAY 3

Lesson 7 – Introduction to Apache Spark Data Pipelines

• Identify Components of Apache Spark Unified Stack

• Benefits of the Apache Spark Unified Stack over Hadoop eco-system

• Describe Data Pipeline Use Cases

Lesson 8 – Create an Apache Spark Streaming Application

• Spark Streaming Architecture

• Create DStreams

• Create a simple Spark Streaming Application

o Lab: Create a Spark Streaming Application

• DStream Operations

o Lab: Apply operations on DStreams

§ Apply DStream Operations

§ Use Spark SQL to query DStreams

• Define Window Operations

o Lab: Add windowing operations

• Describe how DStreams are fault-tolerant


Lesson 9 – Use Apache Spark GraphX to Analyze Flight Data

§ Describe GraphX

§ Define a property graph

o Lab: Create a Property Graph

§ Perform operations on Graphs

o Lab: Apply Graph Operations

Lesson 10 – Use Apache Spark MLlib to Predict Flight Delays

§ Describe Spark MLlib

§ Describe a generic classification workflow

§ Describe common terms for supervised learning

§ Use a decision tree for Classification and Regression

§ Lab: Create a DecisionTree model to predict flight delays on streaming data

Certification

• This course is part of the preparation for the MapR Certified Spark Developer (MCSD) certification exam.


Data Analyst Courses

DA 4000 – Self-service SQL with Apache Drill

• Classroom or virtual delivery

• 2 days Duration

• For data analysts and developers who want to learn SQL-on-Hadoop technologies.

This course is intended for data analysts, architects, and developers. It covers how to use Drill to explore known or unknown data without writing code. You will write SQL queries on a variety of data types, including structured data in a Hive table, semi-structured data in HBase or MapR-DB, and complex data file types such as Parquet and JSON. The course shows how a query is received and executed by Drill. You will learn the different services involved at each step, and how Drill optimizes a query for distributed SQL execution.

Prerequisites for Success in the Course

Required:

• Basic Linux knowledge, including familiarity with basic command-line options such as mv, cp, cd, ls, ssh, and scp

• Access to, and the ability to use, a laptop with a terminal program installed (such as terminal on the Mac, or PuTTY and WinSCP on Windows)

• Beginner to intermediate fluency with SQL

Recommended:

• Completion of ESS 100 Introduction to Big Data, and ESS 101 Apache Hadoop Essentials

• Optional: Basic Hadoop and database knowledge

Syllabus

Lesson 1 – SQL Queries

• Perform familiar SQL queries with Drill on structured content

• Perform familiar SQL queries on semi-structured content

• Join structured and semi-structured content into a single query

• Explore unknown data with Drill Explorer


Lesson 2 – Self-Describing Data

• Define self-describing data

• Determine how Drill discovers the schema of data

• Use Drill Explorer to explore unknown data and determine its structure to perform queries

• Create a view and visualize the view with BI tools

Lab Exercises

• Familiar SQL queries on structured Hive data

• Familiar SQL queries on complex data

o Query Parquet data

o Query JSON data

o A single query that joins Hive, HBase and JSON

• Explore Multiple Data Sources with the Drill Explorer

o Drill Explorer Interface

o Data sources

o Discover data schema

o Preview data

o Save a view

Lesson 3 – Architecture

• List the components of the Drillbit

• List the steps involved with executing a query

• Describe how Drill executes a SQL query

• Define the stages involved with query planning

• Describe how Drill optimizes a query

Certification

• This course is part of the preparation for the MapR Certified Data Analyst (MCDA) certification exam.


DA 4500 – Data Analysis with Apache Pig and Hive

• Classroom or Virtual

• For data analysts and developers interested in the data pipeline

• For data scientists and business analysts who are familiar with SQL and want to use data on HDFS

• This is a programming course; you must have some programming experience to do the exercises

About this Course

This course covers how to use Pig and Hive as part of a single data flow in a Hadoop cluster. The course begins with manipulating semi-structured raw data files in Pig, using the Grunt shell and the Pig Latin programming language. Once the raw data has been manipulated into structured tables, the tables are exported from Pig and imported into Hive. The structured data can then be queried in Hive, and some basic data analysis can be performed.

Prerequisites for Success in the Course

Review the following prerequisites carefully and decide if you are ready to succeed in this programming-oriented course. The Instructor will move forward with lab exercises, assuming that you have mastered the skills listed below.

Required:

• Familiarity with a command-line interface, such as a Unix shell

• Familiarity with relational database tools, such as SQL

• Access to, and the ability to use, a laptop with an internet connection and a terminal program installed (such as terminal on the Mac, or PuTTY on Windows).

Recommended:

• Familiarity with Hadoop

• Completion of ESS 100 Introduction to Big Data


Syllabus

DAY 1

Lesson 1 – How Apache Pig fits in the Hadoop ecosystem

• Data pipeline

• Pig Philosophy

Lesson 2 – Extract, Transform, and Load Data with Apache Pig

• Load data into relations

• Debug Pig scripts

• Perform simple manipulations

• Save relations as files

Lesson 3 – Manipulate Data with Apache Pig

• Subset relations

• Combine relations

• Use UDFs on relations

DAY 2

Lesson 4 – How Apache Hive fits in the Hadoop ecosystem

• Understand the data pipeline

• Describe other SQL-on-Hadoop tools

Lesson 5 – Create tables and load data in Apache Hive

• Create databases

• Create simple, external, and partitioned tables

• Alter and drop tables

Lesson 6 – Query data with Apache Hive

• Query tables

• Manipulate tables with UDFs

• Combine and store tables

• Use cases of Hive

Certification

• This course is part of the preparation for the MapR Certified Data Analyst (MCDA) certification exam.


Learning Paths and Certifications

Current and soon-to-be-available MapR user certifications, with their affiliated courses, are listed below.

MapR Certified Cluster Administrator (MCCA) Preparatory Course:

§ ADM 2000 - Hadoop Operations: Cluster Administration

MapR Certified Hadoop Developer (MCHD) Preparatory Course:

§ DEV 3000 - Developing Hadoop Applications

MapR Certified HBase Developer (MCHBD) Preparatory Courses:

§ DEV 3100 - HBase for Analysts and Architects

§ DEV 3200 - HBase Applications Design / Build

MapR Certified Apache© Spark Developer (MCSD) Preparatory Course:

§ DEV 3600 - Developing Apache Spark Applications

MapR Certified Data Analyst (MCDA) Preparatory Courses:

§ DA 4000 - Self-service SQL Analytics with Apache Drill

§ DA 4500 - Data Analysis with Apache Pig and Apache Hive

More Information on Certification

The most up-to-date certification exam details can be found at https://www.mapr.com/services/mapr-academy/certification.