Introduction to Cloudera Data Warehouse Self-Service Analytics in the Cloud with CDP 200220

Copyright © 2010–2020 Cloudera. All rights reserved. Not to be reproduced or shared without prior written consent from Cloudera.

Cloudera Data Warehouse (CDW) Overview
Chapter 1G

Course Chapters

▪ Cloudera Data Warehouse (CDW) Overview

▪ The CDW Web Interface

▪ Creating Database Catalogs and Virtual Warehouses (Data Engineering Track)

▪ Querying Data from CDW Web Interface (Data Analyst Track)

▪ Managing Virtual Warehouses (Data Engineering Track)

▪ Querying Data Using CLI and Third-Party Integration (Data Analyst Track)


Chapter Topics

Cloudera Data Warehouse (CDW) Overview

▪ Introduction to This Course

▪ What Is CDW?

▪ Benefits of CDW


Course Learning Objectives

▪ Identify the primary purpose and benefits of CDW

▪ Access CDW and navigate different pages within it 

Data Engineer

▪ Create a Database Catalog and a Virtual Warehouse in CDW

▪ Describe how CDW scales automatically

▪ Change settings for Hive and Impala Virtual Warehouses 

Data Analyst

▪ Run a query using Hue or DAS with a CDW Virtual Warehouse

▪ Connect to a Virtual Warehouse from the command line

▪ Connect a third-party tool to a CDW Virtual Warehouse



Learning Objective

▪ Identify the primary purpose and components of CDW


What Is CDW?

▪ CDW enables creation of data warehouses and data marts for analysts

▪ CDW has two components
  ─ Database Catalogs
  ─ Virtual Warehouses



Learning Objective

▪ Identify five benefits that CDW provides


Benefits

Data warehouses and data marts are

▪ Automatically configured and isolated

▪ Optimized for existing workloads when moved to the cloud

▪ Auto-scaled to meet varying demands

▪ Auto-suspended to save costs

▪ Compliant with security controls


The CDW Web Interface
Chapter 2G


Chapter Topics

The CDW Web Interface

▪ How to Access CDW

▪ CDW Orientation


Learning Objective

▪ Access CDW using Single Sign-On


Exercise

▪ Access your CDP home page

▪ Click on the Data Warehouse icon to access CDW



Learning Objective

▪ Navigate the different pages within CDW


Exercise

1. Explore the sidebar: Click the grid to show the list of CDP applications, collapse and expand the sidebar, and view the Help

2. Find the Overview page and expand the Environments section

3. Click on an entity and look for the entities that are associated with it (note the Database Catalog and Virtual Warehouses)

4. Filter the Database Catalog or Virtual Warehouses section to show only a few, including the Database Catalog or Virtual Warehouse you just noted

5. Go to the Database Catalogs page and find the same Database Catalog that you just noted, using the filter or page navigation if necessary

6. Go to the Virtual Warehouses page and find the same Virtual Warehouses you just noted, using the filter or page navigation if necessary


Creating Database Catalogs and Virtual Warehouses (Data Engineering Track)
Chapter 3E


Chapter Topics

Creating Database Catalogs and Virtual Warehouses (Data Engineering Track)

▪ Creating a Database Catalog

▪ Creating a Virtual Warehouse


Learning Objective

▪ Create a Database Catalog in CDW


Exercise

▪ Create a Database Catalog for testing



Learning Objective

▪ Create a Virtual Warehouse in CDW


Decision Factors (Type)

▪ Use case
  ─ What features are needed

▪ File format

▪ Personal preferences


Decision Factors (Size)

▪ Concurrent queries or users

▪ Query complexity

▪ Data set size


Exercise

▪ Create a Virtual Warehouse associated with your test Database Catalog
  ─ Use Hive
  ─ Use the smallest size

▪ Suspend it and the test Database Catalog after it’s been created


Querying Data from CDW Web Interface (Data Analyst Track)
Chapter 3A


Chapter Topics

Querying Data from CDW Web Interface (Data Analyst Track)

▪ Accessing Query Editors

▪ Setting Workload Password

▪ Querying with Hue

▪ Querying with DAS


Learning Objective

▪ Access the query editors for Hue and DAS from a CDW Virtual Warehouse


Exercise

▪ Open a query editor
  ─ If necessary, continue to the next video to set up your workload password



Learning Objective

▪ Set a workload password to allow access to Hue and DAS


Exercise

▪ Create your workload password

▪ Test it by accessing Hue or DAS



Learning Objective

▪ Run a query on a table using Hue


Exercise

▪ In Hue, complete a simple SELECT * query on a table you can access



Learning Objective

▪ Run a query on a table using DAS


Exercise

▪ In DAS, complete a simple SELECT * query on a table you can access


Managing Virtual Warehouses (Data Engineering Track)
Chapter 4E


Chapter Topics

Managing Virtual Warehouses (Data Engineering Track)

▪ Auto-Scaling

▪ Additional Tuning for Hive

▪ Additional Tuning for Impala


Learning Objective

▪ Describe how CDW auto-scales Virtual Warehouses

▪ Set the auto-scale range for a Virtual Warehouse


Exercise

▪ Change the auto-scale settings
  ─ For a new Virtual Warehouse, on creation
  ─ For an existing Virtual Warehouse

Pay attention to what values are available for different initial sizes



Learning Objective

▪ Set auto-scale settings unique to Hive Virtual Warehouses


Hive Settings

Headroom = Number of nodes to keep free

Wait Time = Amount of time a query waits in the queue
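
As a rough illustration of how each setting could act as a scale-up trigger, here is a minimal Python sketch. It is a simplified model, not CDW's actual algorithm: the function names, parameter names, and the exact comparisons (strictly fewer free nodes, waiting at least as long as the threshold) are assumptions made for this example.

```python
def scale_up_for_headroom(free_nodes: int, headroom: int) -> bool:
    """Headroom trigger (assumed rule): scale up when fewer nodes are
    free than the number the Headroom setting asks to keep free."""
    return free_nodes < headroom


def scale_up_for_wait_time(queued_seconds: float, wait_time_seconds: float) -> bool:
    """Wait Time trigger (assumed rule): scale up once a query has been
    in the queue for at least the configured Wait Time."""
    return queued_seconds >= wait_time_seconds
```

With a Headroom of 2, for example, `scale_up_for_headroom(1, 2)` is true because only one node is free, while `scale_up_for_headroom(2, 2)` is false.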


Exercise

▪ Change the Headroom/Wait Time settings for a new or existing Hive Virtual Warehouse



Learning Objective

▪ Set auto-scale settings unique to Impala Virtual Warehouses


Autoscale Mode

▪ Conservative
  ─ Auto-scale up 60 seconds after maximum utilization
  ─ Immediately auto-scale down when possible

▪ Aggressive
  ─ Immediately auto-scale up at maximum utilization
  ─ Auto-scale down 60 seconds after demand drops

▪ Balanced
  ─ Auto-scale up 30 seconds after maximum utilization
  ─ Auto-scale down 30 seconds after demand drops
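
The three modes differ only in how long they wait before scaling in each direction, which can be summarized as a small lookup table. The Python sketch below encodes the delays listed above. The names (`AutoscaleDelays`, `AUTOSCALE_MODES`, `should_scale_up`, `should_scale_down`) are hypothetical and are not part of any CDW API.

```python
from typing import NamedTuple


class AutoscaleDelays(NamedTuple):
    """Seconds to wait before acting, per direction (0 means immediately)."""
    scale_up_delay_s: int
    scale_down_delay_s: int


# Delays taken from the Autoscale Mode descriptions above.
AUTOSCALE_MODES = {
    "conservative": AutoscaleDelays(scale_up_delay_s=60, scale_down_delay_s=0),
    "aggressive": AutoscaleDelays(scale_up_delay_s=0, scale_down_delay_s=60),
    "balanced": AutoscaleDelays(scale_up_delay_s=30, scale_down_delay_s=30),
}


def should_scale_up(mode: str, seconds_at_max_utilization: float) -> bool:
    """True once maximum utilization has lasted as long as the mode's delay."""
    return seconds_at_max_utilization >= AUTOSCALE_MODES[mode].scale_up_delay_s


def should_scale_down(mode: str, seconds_since_demand_dropped: float) -> bool:
    """True once demand has stayed low as long as the mode's delay."""
    return seconds_since_demand_dropped >= AUTOSCALE_MODES[mode].scale_down_delay_s
```

Read this way, Conservative favors cost (slow to add capacity, quick to release it), Aggressive favors responsiveness, and Balanced sits between the two.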


Exercise

▪ Change the Autoscale Mode setting for a new or existing Impala Virtual Warehouse


Querying Data Using CLI and Third-Party Integration (Data Analyst Track)
Chapter 4A


Chapter Topics

Querying Data Using CLI and Third-Party Integration (Data Analyst Track)

▪ Using Impala Shell

▪ Using Third-Party Tools


Learning Objective

▪ Connect to Impala Shell from the command line


Installing Impala Shell

▪ If connecting from a CDP node, skip installation

▪ Install requires:
  ─ Non-Windows computer
  ─ Python 2.7
  ─ A pip installer
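
The prerequisites above can be checked before attempting the install. The following Python sketch is one way to do that. The function names are hypothetical, the Python 2.7 check mirrors the requirement as stated on this slide (later releases of Impala Shell may differ), and the install command assumes the package is published on PyPI as `impala-shell`.

```python
import platform
import shutil
import sys

# Assumed install command: the package name on PyPI is taken to be "impala-shell".
INSTALL_COMMAND = ["pip", "install", "impala-shell"]


def missing_prerequisites(system, python_version, has_pip):
    """Return the slide's requirements that the given machine does not meet."""
    problems = []
    if system == "Windows":
        problems.append("a non-Windows computer")
    if tuple(python_version[:2]) != (2, 7):
        problems.append("Python 2.7")
    if not has_pip:
        problems.append("a pip installer")
    return problems


def check_this_machine():
    """Apply the same checks to the machine running this script."""
    return missing_prerequisites(
        platform.system(),
        sys.version_info,
        shutil.which("pip") is not None,
    )
```

If `check_this_machine()` returns an empty list, running `INSTALL_COMMAND` (for example with `subprocess.run`) would be the next step. Otherwise the returned list names what is missing.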


Exercise

▪ Install Impala Shell
  ─ Unless using a cluster node

▪ Use Impala Shell to query a table



Learning Objective

▪ Connect a third-party tool to Hive


CDW and Third-Party Tools

▪ Integration with third-party tools using
  ─ ODBC
  ─ JDBC

▪ Example: Tableau with ODBC
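
For the JDBC route, a third-party tool typically needs a connection URL for the Hive Virtual Warehouse. The Python sketch below shows the general shape such a URL might take, assuming HiveServer2 is reached over HTTP with SSL. The function name, the host name in the usage note, and the default values are illustrative assumptions. In practice, use the URL that CDW provides for your Virtual Warehouse rather than assembling one by hand.

```python
def hive_jdbc_url(host: str, database: str = "default",
                  http_path: str = "cliservice") -> str:
    """Build a HiveServer2 JDBC URL that uses HTTP transport over SSL.

    The parameter values are assumptions for illustration only.
    """
    return (
        f"jdbc:hive2://{host}/{database}"
        f";transportMode=http;httpPath={http_path};ssl=true"
    )
```

For example, `hive_jdbc_url("hs2-example.invalid")` produces `jdbc:hive2://hs2-example.invalid/default;transportMode=http;httpPath=cliservice;ssl=true`, where the host is a placeholder.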


Exercise

▪ Connect a business intelligence (BI) application to Hive
  ─ Download a driver
  ─ Download a BI application if necessary
