52
1 YOUR DATA, NO LIMITS Kent Graziano Senior Technical Evangelist Snowflake Computing Changing the Game with Cloud Data Warehousing @KentGraziano

Changing the game with cloud dw

Embed Size (px)

Citation preview

Page 1: Changing the game with cloud dw

1

Y O UR DA TA , NO L I M I T S

Kent Graziano Senior Technical EvangelistSnowflake Computing

Changing the Game with Cloud Data Warehousing

@KentGraziano

Page 2: Changing the game with cloud dw

2

My Bio•Senior Technical Evangelist, Snowflake Computing•Oracle ACE Director (DW/BI)•OakTable•Blogger – The Data Warrior•Certified Data Vault Master and DV 2.0 Practitioner•Former Member: Boulder BI Brain Trust (#BBBT)•Member: DAMA Houston & DAMA International•Data Architecture and Data Warehouse Specialist

•30+ years in IT•25+ years of Oracle-­related work•20+ years of data warehousing experience

•Author & Co-­Author of a bunch of books (Amazon)•Past-­President of ODTUG and Rocky Mountain Oracle User Group

Page 3: Changing the game with cloud dw

3

Agenda

•Data Challenges•What is a Cloud Data Warehouse?•What can a Cloud DW do for me?•Cool Features of Snowflake•Other Cloud DW – Redshift, Azure, BigQuery•Real Metrics

Page 4: Changing the game with cloud dw

Data challenges today

Page 5: Changing the game with cloud dw

5

Scenarios with affinity for cloud

Gartner 2016 Predictions:

By 2018, six billion connected things will be requesting support.

Connecting applications, devices, and “things”

Reaching employees, business partners, and consumers

Anytime, anywhere mobility

On demand, unlimited scale

Understanding behavior;; generating, retaining, and analyzing data

Page 6: Changing the game with cloud dw

6

40 Zettabytes by 2020

Web ERP3rd party apps Enterprise apps IoTMobile

Page 7: Changing the game with cloud dw

7

It’s not the data itself

it’s how you take full advantage of the insight it provides

Web ERP3rd party apps Enterprise apps IoTMobile

Page 8: Changing the game with cloud dw

8

All possible data All possible actions

Most firms don’t consistently turn data into action

73% 29%of firms

aspire to be data-­driven.

of firms are good at turning data into action.

Source: Forrester

Page 9: Changing the game with cloud dw

9

New possibilities with the cloud

•More & more data “born in the cloud”•Natural integration point for data•Capacity on demand•Low-­cost, scalable storage•Compute nodes

Page 10: Changing the game with cloud dw

10

Cloud characteristics & attributes

DYNAMIC EASY FLEXIBLE SECURE

ScalableElasticAdaptive

Lower costFaster implementation

Supports many scenarios

Trust by design

Page 11: Changing the game with cloud dw

11

The evolution of data platformsData warehouse & platform softwareVertica, Greenplum,

Paraccel, Hadoop,Redshift

Data warehouse applianceTeradata

1990s 2000s 2010s

Cloud-­native Data

WarehouseSnowflake

1980s

Relational databaseOracle, DB2,SQL Server

Page 12: Changing the game with cloud dw

12

What is a Cloud-­Native DW?

•DW-­ Data Warehouse•Relational database•Uses standard SQL•Optimized for fast loads and analytic queries

•aaS – As a Service•Like SaaS (e.g. SalesForce.com)•No infrastructure set up•Minimal to no administration•Managed for you by the vendor•Pay as you go, for what you use

Page 13: Changing the game with cloud dw

13

Goals of Cloud DW

•Make your life easier•So you can load and use your data faster

•Support business•Make data accessible to more people•Reduce time to insights

•Handle big data too!•Schema-­less ingestion

Page 14: Changing the game with cloud dw

14

Common customer scenarios

Data warehouse for SaaS offerings

Use Cloud DW as back-­end data warehouse supporting data-­driven SaaS products

noSQL replacementReplace use of noSQL system (e.g. Hadoop) for transformation and SQL analytics of multi-­structured data

Data warehouse modernizationConsolidate legacy datamarts and support

new projects

Page 15: Changing the game with cloud dw

15

Over 250 customers demonstrate what’s possible

Up to 200x faster reports that enable analysts to make decisions in minutes rather than days

Load and update data in near real time by replacing legacy data warehouse + Hadoop clusters

Developing new applications that provide secure (HIPPA) access to analytics to 11,000+ pharmacies

Page 16: Changing the game with cloud dw

16

Introducing Snowflake

Page 17: Changing the game with cloud dw

17

About Snowflake

Experienced, accomplishedleadership team

2012 Founded by

industry veterans with over 120

database patents

Vision: A world with

no limits on data

First datawarehousebuilt for the cloud

Over 230enterprise customers inone yearsince GA

Page 18: Changing the game with cloud dw

18

The 1st Data Warehouse Built for the Cloud

Data Warehousing…

• SQL relational database• Optimized storage & processing• Standard connectivity – BI, ETL, …

•Existing SQL skills and tools•“Load and go” ease of use•Cloud-­based elasticity to fit any scale

Data scientists

SQL users & tools

…for Everyone

Page 19: Changing the game with cloud dw

19

Concurrency Simplicity

Fully managed with a pay-­as-­you-­go model. Works on any data

Multiple groups access data simultaneously

with no performance degradation

Multi petabyte-­scale, up to 200x faster performance

and 1/10th the cost

200x

The Snowflake difference

Performance

Page 20: Changing the game with cloud dw

20

The Data Warrior’sTop 10 Cool ThingsAbout Snowflake

(A Data Geeks Guide to DWaaS)

Page 21: Changing the game with cloud dw

21

#10 – Persistent Result Sets

•No setup•In Query History•By Query ID

•24 Hours•No re-­execution•No Cost for Compute

Page 22: Changing the game with cloud dw

22

#9 Connect with JDBC & ODBC

Data Sources

Custom & Packaged Applications

ODBC WEB UIJDBC

Interfaces

Java

>_

Scripting

Reporting & Analytics

Data Modeling, Management & Transformation

SDDM

SPARK too!https://github.com/snowflakedb/

JDBC drivers available via MAVIN.orgPython driver available via PyPI

Page 23: Changing the game with cloud dw

23

#8 -­ UNDROPUNDROP TABLE <table name>

UNDROP SCHEMA <schema name>UNDROP DATABASE <db name>

Part of Time Travel feature: AWESOME!

Page 24: Changing the game with cloud dw

24

#7 Fast Clone (Zero-­Copy)•Instant copy of table, schema, or database:CREATE OR REPLACE TABLE MyTable_V2CLONE MyTable

• With Time Travel:CREATE SCHEMAmytestschema_clone_restoreCLONE testschemaBEFORE (TIMESTAMP =>TO_TIMESTAMP(40*365*86400));;

Page 25: Changing the game with cloud dw

25

#6 – JSON Support with SQL

Apple 101.12 250 FIH-­2316

Pear 56.22 202 IHO-­6912

Orange 98.21 600 WHQ-­6090

"firstName": "John", "lastName": "Smith", "height_cm": 167.64, "address":

"streetAddress": "21 2nd Street", "city": "New York", "state": "NY","postalCode": "10021-3100"

, "phoneNumbers": [

"type": "home", "number": "212 555-1234" , "type": "office", "number": "646 555-4567"

]

Structured data(e.g. CSV)

Semi-structured data(e.g. JSON, Avro, XML)

• Optimized storage• Flexible schema - Native• Relational processing

select v:lastName::string as last_namefrom json_demo;;

Page 26: Changing the game with cloud dw

26

#5 – Standard SQL w/Analytic Functions

Complete SQL database• Data definition language (DDLs)• Query (SELECT)• Updates, inserts and deletes (DML)• Role based security• Multi-­statement transactions

select Nation, Customer, Totalfrom (select

n.n_name Nation,c.c_name Customer,sum(o.o_totalprice) Total,rank() over (partition by n.n_nameorder by sum(o.o_totalprice) desc)

customer_rankfrom orders o,customer c,nation nwhere o.o_custkey = c.c_custkeyand c.c_nationkey = n.n_nationkeygroup by 1, 2)

where customer_rank <= 3order by 1, customer_rank

Page 27: Changing the game with cloud dw

27

Snowflake’s multi-cluster, shared data architecture

Centralized storage

Instant, automatic scalability & elasticity

ServiceComputeStorage

#4 – Separation of Storage & Compute

Page 28: Changing the game with cloud dw

28

#3 – Support Multiple Workloads

Scale processing horsepower up and down on-­the-­fly, with zero downtime or disruption

Multi-­cluster “virtual warehouse” architecture scales concurrent users & workloads without contention

Run loading & analytics at any time, concurrently, to get data to users faster

Scale compute to support any workload

Scale concurrency without performance impact

Accelerate the data pipeline

Page 29: Changing the game with cloud dw

29

#2 – Secure by Design with Automatic Encryption of Data!

Authentication

Embedded multi-­factor authentication

Federated authentication available

Access control

Role-­based access control model

Granular privileges on all objects & actions

Data encryption

All data encrypted, always, end-­to-­end

Encryption keys managed automatically

External validation

Certified against enterprise-­class requirements

HIPPA Certified!

Page 30: Changing the game with cloud dw

30

#1 -­ Automatic Query Optimization

•Fully managed with no knobs or tuning required•No indexes, distribution keys, partitioning, vacuuming,…

•Zero infrastructure costs•Zero admin costs

Page 31: Changing the game with cloud dw

31

Other Cloud Data Warehouse Offerings

Page 32: Changing the game with cloud dw

32

Amazon Redshift

•Amazon's data warehousing offering in AWS • First announced in fall 2012 and GA in early 2013• Derived ParAccel (Postgres) moved to the cloud

•Pluses•Maturity: based on Paraccel, on the market for almost 10 years. • Ecosystem: Deeper integration with other AWS products • Amazon backing

Page 33: Changing the game with cloud dw

33

Amazon Redshift•Challenges (vs Snowflake)• Semi-­structured data: Redshift cannot natively handle flexible-­schema data (e.g. JSON, Avro, XML) at scale.• Concurrency: Redshift architecture means that there is a hard concurrency limit that cannot be addressed short of creating a second, independent Redshift cluster.• Scaling: Scaling a cluster means read-­only mode for hours while data is redistributed. Every new cluster has a complete copy of data, multiplying costs for dev, test, staging, and production as well as for datamarts created to address concurrency scaling limitations.•Management overhead: Customers report spending hours, often every week, doing maintenance such as reloading data, vacuuming, updating metadata.

Page 34: Changing the game with cloud dw

34

Customer Analysis – Snowflake vs Redshift

•Ability to provision compute and storage separately –•Storage might grow exponentially but compute needs may not •No need to add nodes to cluster to accommodate storage

•Compute capacity can be provisioned during business hours and shut down when not required (saving $$$)•Predictable/ exact processing power for user queries with dedicated separate warehouses•No concurrency issues between warehouses•No constraints on completing the ETL run before business hours •Analytical workload and ETL can run in parallel

Page 35: Changing the game with cloud dw

35

Customer Analysis

•0 maintenance/ 100% managed •No tuning (distkey, sortkey, vacuum, analyze)•100% uptime, no backups•Can restore at transaction level through time travel feature

•3X-­ 5X better compression compared to Redshift•Auto compressed

•Data at rest is encrypted by default•With Redshift, performance is degraded by 2x-­3x

Page 36: Changing the game with cloud dw

36

Customer Analysis

•Supports cross database joins•With cloning feature, we can spin up Dev/ test by cloning entire prod database in seconds → run tests→and drop the clone•There is no charge for a clone. Only incremental updates on storage are charged

•Instant Resize (scale up) •NO 20+ hrs read only mode like Redshift! •Resize also allows provisioning higher compute capacity for faster processing when required

Page 37: Changing the game with cloud dw

37

Microsoft Azure DW•Based on Analytics Platform System •Emerging out of MSFT’s on-­premises MPP data warehouse DataAllegro acquisition in 2008

•Pluses•Maturity: very mature (on-­prem) database for over 20 years • Ecosystem: Deep integration with Azure and SQL server ecosystem• Separation of compute and storage: can scale compute without the need of unloading/loading/moving the underlying at the database level. • Integration w/ Big Data & Hadoop: allows querying 'semi-­structured/unstructured' data, such as Hadoop file formats ORC, RC, and Parquet. This works via the external table concept

Page 38: Changing the game with cloud dw

38

Microsoft Azure DW

•Challenges (vs Snowflake)• Concurrency: Azure architecture means there are hard concurrency limit (currently 32 users) per DWU (Data Warehouse Units) that cannot be addressed short of creating a second warehouse and copy of the data.• Scaling: Cannot scale clusters easily and automatically.•Management overhead: Azure is difficult manage, specially with considerations around data distribution, statistics, indices, encryption, metadata operations, replication (for disaster recovery) and more.• Security: lack of end to end encryption. • Support for modern programmability: Lacks support for wide range of APIs due to commitments to its own ecosystem.

Page 39: Changing the game with cloud dw

39

Google BigQuery

•Query processing service offered in the Google Cloud, first launched in 2010• Follow up offering from Dremel service developed internally for Google only

•Pluses• Google-­scale horsepower: Runs jobs across a boatload of servers. As a result, BigQuery can be very fast on many individual queries.• Absolutely zero management: You submit your job and wait for it to return–that's it. No management of infrastructure, no management of database configuration, no management of compute horsepower, etc.

Page 40: Changing the game with cloud dw

40

Google BigQuery

•Challenges (vs Snowflake)• BigQuery is not a data warehouse: Does not implement key features that are expected in a relational database, which means existing database workloads will not work in BigQuery without non-­trivial change • Performance degradation for JOINs: Limits on how many tables can be joined• BigQuery is a black box: You submit your job and it finishes when it finishes–users have no ability to control SLAs nor performance. • Lots of usage limitations: Quotas on how many concurrent jobs can be run, how many queries can run per day, how much data can be processed at once, etc.• Obscure pricing: Prices per query (based on amount of data processed by a query), making it difficult to know what it will cost and making costs add up quickly• BigQuery only recently introduced full SQL support

Page 41: Changing the game with cloud dw

41

Snowflakein Action Today

Page 42: Changing the game with cloud dw

42

Simplifying the data pipeline

Event Data

Kafka Hadoop SQL Database Analysts & BI Tools

Import Processor

Key-­value Store

Event Data

Kafka Amazon S3 Snowflake Analysts & BI Tools

Scenario• Evaluating event data from various sources

Pain Points• 2+ hours to make new data available for

analytics• Significant management overhead • Expensive infrastructure

SolutionSend data from Kafka to S3 to Snowflake with schemaless ingestion and easy querying

Snowflake Value• Eliminate external pre-processing• Fewer systems to maintain• Concurrency without contention & performance

impact

Page 43: Changing the game with cloud dw

43

Simplifying the Data pipeline EDW

Game Event DataInternal Data

Third-­party Data

Analysts & BI Tools

StagingnoSQL Database

ExistingEDW

Game Event DataInternal Data

Third-­party Data

Analysts & BI Tools

SnowflakeKinesis

Cleanse Normalize Transform

11-­24 hours 15 minutes

ScenarioComplex pipeline slowing down analytics

Pain Points• Fragile data pipeline• Delays in getting updated data• High cost and complexity• Limited data granularity

SolutionSend data from Kinesis to S3 to Snowflake with schemaless ingestion and easy querying

Snowflake Value• >50x faster data updates• 80% lower costs• Nearly eliminated pipeline failures• Able to retain full data granularity

Page 44: Changing the game with cloud dw

44

Delivering compelling results

Simpler data pipelineReplace noSQL database with Snowflake for storing & transforming JSON event data Snowflake: 1.5 minutes

noSQL data base: 8 hours to prepare data

Snowflake: 45 minutes

Data warehouse appliance: 20+ hours

Faster analyticsReplace on-­premises data warehouse with Snowflake for analytics workload

Significantly lower costImproved performance while adding new workloads-­-­at a fraction of the cost Snowflake: added 2 new workloads for $50K

Data warehouse appliance: $5M + to expand

Page 45: Changing the game with cloud dw

45

The fact that we don’t need to do any configuration or tuning is great because we can focus on analyzing data instead of on managing and tuning a data warehouse.”

Craig Lancaster, CTO

The combination of Snowflake and Looker gives our business users a powerful, self-­service tool to explore and analyze diverse data, like JSON, quickly and with ease.”

We went from an obstacle and cost-­center to a value-­added partner providing business intelligence and unified data warehousing for our global web property business lines.”

JP Lester, CTO

Music & film distributionPublishers of Ask.com, Dictionary.com, About.com,

and other premium websitesInternet access service company toover 30 million smartphone users

• Replaced MySQL• Substituted Snowflakenative JSON handlingand queries with SQLin place of MapReduce• Integrated with Hadooprepository

• Consolidated global web data pipelining• Replaced 36-­node datawarehouse• Replaced 100-­node Hadoop cluster

Erika Baske, Head of BI

• Eliminated bottlenecksand scaling pain-­points• Consolidated multiplecloud data marts• Now handling larger datasets and higher concurrency with ease

Page 46: Changing the game with cloud dw

46

Cloud-­scale data warehouse

Page 47: Changing the game with cloud dw

47

Steady growth in data processing

•Over 20 PB loaded to date!•Multiple customers with >1PB

•Multiple customers averaging >1M jobs / week

•>1PB / day processed

•Experiencing 4X data processinggrowth over last six months

Jobs / day

Page 48: Changing the game with cloud dw

48

What does a Cloud-­native DW enable?Cost effective storage and analysis of GBs, TBs, or even PB’s

Lightning fast query performance

Continuous data loading without impacting query performance

Unlimited user concurrencyODBC WEB UIJDBC

Interfaces

Java

>_

Scripting

Full SQL relational support of both structured and semi-­structured data

Support for the tools and languages you already use

Page 49: Changing the game with cloud dw

49

Making Data Warehousing Great Again!

Page 50: Changing the game with cloud dw

50

As easy as 1-­2-­3!

Discover the performance, concurrency, and simplicity of Snowflake

1 Visit Snowflake.net

2 Click “Try for Free”

3 Sign up & register

Snowflake is the only data warehouse built for the cloud. You can automatically scale compute up, out, or down—independent of storage. Plus, you have the power of a complete SQL database, with zero management, that can grow with you to support all of your data and all of your users. With Snowflake On Demand™, pay only for what you use.

Sign up and receive$400 worth of free

usage for 30 days!

Page 51: Changing the game with cloud dw

Kent GrazianoSnowflake [email protected] Twitter @KentGraziano

More info athttp://snowflake.net

Visit my blog athttp://kentgraziano.com

Contact Information

Page 52: Changing the game with cloud dw

YOUR DATA, NO LIMITS

Thank you