Data Virtualization: Revolutionizing data cloning

a.k.a. copy data management

kylehailey.com [email protected] @datavirt


Data virtualization

• Fast becoming the new norm

• Used by over 100 of the Fortune 500

• Enables DevOps

DevOps movement

• Clarify goals
• Define metrics
• Identify constraints
• Set priorities
• Iterate fast

• Continuous Integration
• Cloud
• Agile
• Kanban
• Kata

“IT is the factory floor of this century”

The Goal: Theory of Constraints

Improvement not made at the constraint is an illusion.

Factory floor optimization

Not a relay race: the line can only move as fast as its constraint.

Tune before the constraint: work stockpiles in front of the constraint.

Tune after the constraint: the stations downstream starve.

Factory floor: straightforward

Goal: find the constraint and optimize it.
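The stockpiling and starvation argument can be made concrete with a toy simulation. This is a minimal sketch under assumptions that are not from the slides: a serial line of three hypothetical stations, an illustrative helper `simulate_line`, work arriving at a fixed rate, and each station moving at most its capacity per tick. The middle station (capacity 2) is the constraint.

```python
from typing import List, Tuple


def simulate_line(capacities: List[int], ticks: int, arrivals: int) -> Tuple[int, List[int]]:
    """Toy serial production line (hypothetical model, not from the deck).

    Each tick, `arrivals` units join the queue of station 0, and every station
    moves at most its capacity from its own queue to the next station's queue.
    Returns (units finished, queue waiting in front of each station).
    """
    queues = [0] * len(capacities)
    finished = 0
    for _ in range(ticks):
        queues[0] += arrivals
        # Process the last station first so a unit advances at most one station per tick.
        for i in reversed(range(len(capacities))):
            moved = min(queues[i], capacities[i])
            queues[i] -= moved
            if i + 1 < len(capacities):
                queues[i + 1] += moved
            else:
                finished += moved
    return finished, queues


if __name__ == "__main__":
    scenarios = {
        "baseline": [5, 2, 5],
        "tune before constraint": [8, 2, 5],  # bigger pile in front of station 1
        "tune after constraint": [5, 2, 8],   # station 2 still receives only 2 per tick
        "tune the constraint": [5, 4, 5],     # the only change that raises output
    }
    for name, caps in scenarios.items():
        done, queues = simulate_line(caps, ticks=10, arrivals=8)
        print(f"{name:24} finished={done:3} queues={queues}")
```

With these made-up numbers, tuning the first or the last station leaves output at 16 units in 10 ticks (tuning the first one only grows the pile in front of the constraint from 32 to 62 units), while doubling the constraint doubles output to 32.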

The Phoenix Project

What is the constraint in IT?

Put your energy into the constraint

Top 5 constraints in IT

1. Dev environments setup
2. QA setup
3. Code architecture
4. Development
5. Product management

- Gene Kim

“One of the most powerful things that organizations can do is to enable development and testing to get environment they need when they need it”

Data is the constraint

CIO Magazine survey:

• 60% of projects over schedule
• 85% delayed waiting for data

Only getting worse

Gartner: "Data Doomsday": by 2017, one third of IT in crisis

In this presentation:

• Data Constraint
• Solution
• Use Cases

Typical Architecture (diagram): the production instance, file system and database, plus separate full physical copies, each with its own instance, file system and database, for Backup, Reporting, and Dev, QA, UAT. Label: "Triple Tax".

Copies

• Oracle customers: 8-12 copies per database
• Fortune 2000: thousands of multi-TB databases
• Downstream storage is staggering: 3 petabytes at just one client
• Hardware: storage, systems, network, rack space, power, cooling
• People: thousands of hours per year just for DBAs, plus system admins, storage admins, backup admins and network admins
• Tens of millions of dollars for data center modernizations

Copies require people and time

Companies are unaware

Developer or analyst; boss, storage admin, DBA

Metrics: time, old data, storage

Other: analysts, audits, data center modernization

"We say no, no, no until we can't say no anymore": the response when IT was asked for copies of the production DB

Biggest problems in application development:

1. Waiting to check in code
2. Production bugs
3. Expensive, slow QA

Development: bottlenecks (frustration, waiting)

Development: bugs (old, unrepresentative data)

Development: subsets (false negatives, false positives, bugs in production)

Production wall

QA: long setup times

(Chart: cost to correct a bug vs. delay in fixing the bug. Source: Barry Boehm, Software Engineering Economics (1981).)

QA: destructive tests, refresh time

(Timeline: seven 20-minute tests, each separated by an 8-hour refresh.)


Solution

Development, QA, UAT: 99% of blocks are identical

Thin clone: the copies share the identical blocks and store only their own changes
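A back-of-the-envelope comparison of full copies and thin clones. This is a minimal sketch under assumptions: a hypothetical 1 TB database, three copies (Dev, QA, UAT), and 1% of blocks changed per clone, taken from the slide's "99% identical"; compression and file-system metadata are ignored, and the function names are illustrative.

```python
def full_copies_tb(db_tb: float, copies: int) -> float:
    """Physical copies: every environment stores the whole database."""
    return db_tb * copies


def thin_clones_tb(db_tb: float, copies: int, changed_fraction: float) -> float:
    """Thin clones: identical blocks are stored once and shared; each clone
    stores only the blocks it has changed (assumed to be `changed_fraction`)."""
    return db_tb + db_tb * changed_fraction * copies


if __name__ == "__main__":
    # Hypothetical 1 TB database, 3 copies, 1% of blocks changed per copy.
    print(full_copies_tb(1.0, 3), thin_clones_tb(1.0, 3, 0.01))
```

Under these assumptions three full copies take 3 TB, while three thin clones take about 1.03 TB, and the gap widens with the 8-12 copies per database cited earlier.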

Technology core: file system snapshots

• EMC Symmetrix: 16 snapshots; write performance impact; no snapshots of snapshots
• NetApp & EMC VNX: 255 snapshots
• ZFS: compression; unlimited snapshots; snapshots of snapshots
• DxFS: compression; unlimited snapshots; snapshots of snapshots; shared cache in memory

Also check out new SSD storage such as Pure Storage and EMC XtremIO.

Snapshot 1 – full backup once only at link time

Jonathan Lewis © 2013 Virtual DB


(Blocks: a b c d e f g h i)

We start with a full backup, analogous to a level 0 RMAN backup. It includes the archived redo log files needed for recovery, so the source database must run in ARCHIVELOG mode.

Snapshot 2 (from SCN)

(Changed blocks b' c' alongside the original blocks a b c d e f g h i)

The "backup from SCN" is analogous to a level 1 incremental backup (which includes the relevant archived redo logs). It is sensible to enable block change tracking (BCT).

Delphix executes standard RMAN scripts.
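A toy model of what the "backup from SCN" selects. The per-block SCN dictionary is a simplification assumed here (Oracle records a change SCN in each block header), and `incremental_from_scn` is a hypothetical helper, not an RMAN interface.

```python
from typing import Dict


def incremental_from_scn(block_scn: Dict[str, int], since_scn: int) -> Dict[str, int]:
    """Toy 'backup from SCN': keep only the blocks whose last-change SCN is
    newer than `since_scn`. This sketch inspects every block; block change
    tracking (BCT) exists so that RMAN can avoid such a full scan."""
    return {block: scn for block, scn in block_scn.items() if scn > since_scn}


if __name__ == "__main__":
    # Snapshot 1 was taken at SCN 100; since then blocks b and c have changed.
    blocks = {name: 100 for name in "abcdefghi"}
    blocks.update({"b": 150, "c": 160})
    print(incremental_from_scn(blocks, since_scn=100))
```

Only b and c are shipped, which is why the changed-block copies b' and c' are all that the appliance has to store for snapshot 2.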

Apply Snapshot 2

(Blocks: a b c d e f g h i, plus b' c')

The Delphix appliance unpacks the RMAN backup and "overwrites" the initial backup with the changed blocks, but DxFS makes new copies of those blocks rather than overwriting them in place.

Drop Snapshot 1

(Blocks: a b' c' d e f g h i)

The call to RMAN leaves us with a new level 0 backup, waiting for recovery. But we can pick the snapshot root block, so we have EVERY level 0 backup.
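The apply and drop steps rest on copy-on-write plus reference counting: changed blocks go to new locations, unchanged blocks are shared, and a physical block is freed only when no snapshot points to it. The class below is a hypothetical toy model of that idea, not DxFS internals; all names in it are illustrative.

```python
from typing import Dict, List


class SnapshotStore:
    """Toy copy-on-write snapshot store (hypothetical model, not DxFS internals).

    A snapshot maps logical block names to physical block ids. Unchanged blocks
    are shared between snapshots; a physical block is freed only when no
    snapshot references it any more.
    """

    def __init__(self) -> None:
        self.physical: Dict[int, str] = {}  # physical id -> block contents
        self.refcount: Dict[int, int] = {}  # physical id -> referencing snapshots
        self.snapshots: Dict[str, Dict[str, int]] = {}
        self._next_id = 0

    def _allocate(self, contents: str) -> int:
        """Write contents to a brand-new physical block; never overwrite in place."""
        pid = self._next_id
        self._next_id += 1
        self.physical[pid] = contents
        self.refcount[pid] = 0
        return pid

    def _register(self, name: str, block_map: Dict[str, int]) -> None:
        for pid in block_map.values():
            self.refcount[pid] += 1
        self.snapshots[name] = block_map

    def full_backup(self, name: str, blocks: Dict[str, str]) -> None:
        """Snapshot 1: every block is stored once (the level 0 analogue)."""
        self._register(name, {lb: self._allocate(data) for lb, data in blocks.items()})

    def apply_incremental(self, base: str, name: str, changed: Dict[str, str]) -> None:
        """New snapshot = base + changed blocks written to NEW physical blocks."""
        block_map = dict(self.snapshots[base])  # shares every unchanged block
        for lb, data in changed.items():
            block_map[lb] = self._allocate(data)
        self._register(name, block_map)

    def drop(self, name: str) -> None:
        """Remove a snapshot and free the physical blocks no one else uses."""
        for pid in self.snapshots.pop(name).values():
            self.refcount[pid] -= 1
            if self.refcount[pid] == 0:
                del self.refcount[pid]
                del self.physical[pid]

    def read(self, name: str) -> List[str]:
        """Block contents of a snapshot, in logical block order."""
        return [self.physical[pid] for _, pid in sorted(self.snapshots[name].items())]


if __name__ == "__main__":
    store = SnapshotStore()
    store.full_backup("snapshot1", {b: b for b in "abcdefghi"})
    store.apply_incremental("snapshot1", "snapshot2", {"b": "b'", "c": "c'"})
    print(len(store.physical))  # 9 original blocks + 2 new copies
    store.drop("snapshot1")
    print(len(store.physical), store.read("snapshot2"))
```

After the drop only the superseded b and c are freed, and snapshot 2 still reads as a complete a b' c' d e f g h i, which is the sense in which it has become a new level 0.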

Creating a vDB

The first step in creating a vDB is to take a snapshot of the filesystem as at the backup you want (then roll it forward).

(Diagram: "My vDB" and "Your vDB" are each a filesystem snapshot over the shared blocks a b' c' d e f g h i; when My vDB changes block i, only My vDB gets the new block i'.)
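The "My vDB / Your vDB" picture can be modelled as a private overlay on top of a shared, read-only snapshot. This is a hypothetical sketch (`VirtualDB` is an illustrative name); real vDBs do this at the file-system block level with copy-on-write, not in a Python dictionary.

```python
from types import MappingProxyType
from typing import Dict, Mapping


class VirtualDB:
    """Toy writable vDB over a shared read-only snapshot (hypothetical model).

    Reads fall through to the snapshot; writes go to a private overlay, so one
    vDB's changes are invisible to every other vDB built on the same snapshot.
    """

    def __init__(self, snapshot: Mapping[str, str]) -> None:
        self._base = snapshot
        self._overlay: Dict[str, str] = {}

    def read(self, block: str) -> str:
        return self._overlay.get(block, self._base[block])

    def write(self, block: str, data: str) -> None:
        self._overlay[block] = data

    def private_blocks(self) -> int:
        """Blocks this vDB has had to store for itself."""
        return len(self._overlay)


if __name__ == "__main__":
    # Blocks as in the diagram: a b' c' d e f g h i, frozen so no vDB can modify them.
    blocks = {b: b for b in "adefghi"}
    blocks.update({"b": "b'", "c": "c'"})
    snapshot = MappingProxyType(blocks)
    mine, yours = VirtualDB(snapshot), VirtualDB(snapshot)
    mine.write("i", "i'")
    print(mine.read("i"), yours.read("i"), mine.private_blocks(), yours.private_blocks())
```

Each clone pays storage only for the blocks it changes, which is what makes handing every developer a full-size copy affordable.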

Fuel is not the car

Challenges

1. Technical
2. Bureaucracy

Bureaucracy

• Developer: asks for a DB, gets access
• Manager: approves
• DBA: requests a system, sets up the DB
• System admin: requests storage, sets up the machine
• Storage admin: allocates storage (takes a snapshot)

Why are hand-offs so expensive? (Diagram labels: 1 hour, 1 day, 9 days)
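A simple lead-time model shows why hand-offs dominate: each step adds its queue time, not just its work time. This is a sketch with hypothetical numbers (1 hour of work and 15 hours, roughly two working days, of queue per hand-off); it does not try to reproduce the slide's 9-day figure, and `Handoff`, `lead_time_days` and `touch_time_ratio` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Handoff:
    role: str
    work_hours: float   # hands-on time for this step
    queue_hours: float  # time the request waits before anyone starts on it


def lead_time_days(steps: List[Handoff], hours_per_day: float = 8.0) -> float:
    """Elapsed working days from request to delivery: work plus waiting, summed."""
    return sum(s.work_hours + s.queue_hours for s in steps) / hours_per_day


def touch_time_ratio(steps: List[Handoff]) -> float:
    """Fraction of the elapsed time in which someone is actually working."""
    work = sum(s.work_hours for s in steps)
    return work / sum(s.work_hours + s.queue_hours for s in steps)


if __name__ == "__main__":
    # Hypothetical values for the roles on the slide.
    roles = ("manager", "DBA", "system admin", "storage admin")
    chain = [Handoff(role, 1.0, 15.0) for role in roles]
    print(lead_time_days(chain), touch_time_ratio(chain))
```

Here four hours of real work become eight working days, and about 94% of the elapsed time is spent waiting in queues.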

Technical Challenge (diagram: the source database LUNs live on the production filer; a snapshot is cloned to Targets A, B and C. A second version of the diagram adds a separate development filer.)

Technical Challenge

Everything between production and a virtual target:

1. Source: copy, Time Flow, purge
2. Storage: clone (snapshot), compress, share cache
3. Target: provision; mount, recover, rename; self service, roles & security
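The Time Flow and purge steps amount to keeping an ordered history of snapshots, trimming it to a retention window, and finding the right starting point for a requested time. This is a hypothetical sketch (`TimeFlow` and its methods are illustrative names): it tracks timestamps only, whereas with copy-on-write storage the blocks still referenced by newer snapshots would survive a purge. As the vDB slides note, a real vDB is then rolled forward from the chosen snapshot.

```python
import bisect
from datetime import datetime, timedelta
from typing import List, Optional


class TimeFlow:
    """Toy Time Flow for one source: sorted snapshot timestamps (hypothetical model)."""

    def __init__(self) -> None:
        self.snapshots: List[datetime] = []

    def add(self, taken_at: datetime) -> None:
        """Record a new snapshot, keeping the list sorted."""
        bisect.insort(self.snapshots, taken_at)

    def purge(self, now: datetime, retention: timedelta) -> None:
        """Drop every snapshot older than the retention window."""
        cutoff = now - retention
        self.snapshots = self.snapshots[bisect.bisect_left(self.snapshots, cutoff):]

    def as_of(self, when: datetime) -> Optional[datetime]:
        """Latest snapshot taken at or before `when`, i.e. where provisioning starts."""
        i = bisect.bisect_right(self.snapshots, when)
        return self.snapshots[i - 1] if i else None


if __name__ == "__main__":
    start = datetime(2015, 1, 1)
    flow = TimeFlow()
    for day in range(40):  # one snapshot per day for 40 days
        flow.add(start + timedelta(days=day))
    flow.purge(now=start + timedelta(days=39), retention=timedelta(days=30))
    print(len(flow.snapshots), flow.as_of(start + timedelta(days=20, hours=5)))
```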

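How do you get data virtualization?

1. Source sync
2. Storage snapshots
3. Target deploy

Coverage of the three parts:

• ZFS: storage snapshots yes (unlimited)
• EMC SRDF: storage snapshots yes (16 or 255)
• NetApp SMO: storage snapshots yes (255)
• Oracle EM 12c: source sync via Data Guard; storage snapshots via NetApp or ZFS; deploy automation yes (Oracle only, no branching)
• Actifio: source sync yes; storage snapshots yes; deploy automation yes (no branching)
• Delphix: source sync yes; storage snapshots yes; deploy automation yes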

Actifio (diagram): production instances, an Actifio appliance, and target instances.

Oracle Snap Clone (diagram): production instances replicated with Data Guard to a ZFS Storage Appliance (ZFSSA) or NetApp; EM 12c provisions the target instances. A variant uses Data Guard to Solaris ZFS on any storage.

Incremental forever (diagram): changes are collected from the production instances into a Time Flow, and target instances access their data over NFS.

Database Virtualization

Three physical copies become three virtual copies on a Data Virtualization Appliance

Before Virtual Data (diagram): production plus separate physical copies, each with its own instance, file system and database, for Dev, QA, UAT, Reporting and Backup: the "triple data tax".

With Virtual Data (diagram): production plus Dev & QA, Reporting and Backup instances, all served from one file system and database copy on a Data Virtualization Appliance.

Use Cases

1. Development and QA
2. Production Support
3. Business

Development: Virtual Data

• Fast
• Free
• Full size
• Self service

Virtual Data: Easy (diagram: one source feeding a DVA that serves four development instances)

Development Virtual Data: Parallelize (gif by Steve Karam)

Development Virtual Data: Full size

Development Virtual Data: Self Service

QA: Virtual Data

• Fast
• Parallel
• Rollback
• A/B testing

QA Virtual Data: Parallel (diagram: Prod feeds a DVA's Production Time Flow, which serves Dev and QA instances)

• Eliminate build time
• Find bugs fast
• Run parallel QA

QA Virtual Data: Fast Refresh

(Timeline: the same 20-minute tests that were each separated by an 8-hour refresh, contrasted with a 20-minute test on virtual data.)

• Fast
• Full
• Fresh
• Efficient

QA with Virtual Data: Rewind (diagram: Prod feeds a DVA's Production Time Flow, which serves a QA instance)
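Rewind can be pictured as bookmarking a clone's private changes before a destructive test and restoring them afterwards, instead of refreshing the whole environment. This is a hypothetical sketch (`RewindableClone`, `bookmark` and `rewind` are illustrative names, not a product API), built on the same overlay-over-snapshot idea as a thin clone.

```python
from typing import Dict, Mapping


class RewindableClone:
    """Toy QA vDB (hypothetical model): private writes over a shared snapshot,
    plus bookmarks that undo a destructive test without a full refresh."""

    def __init__(self, snapshot: Mapping[str, str]) -> None:
        self._base = snapshot                # shared, never written
        self._overlay: Dict[str, str] = {}  # this clone's changed blocks
        self._bookmarks: Dict[str, Dict[str, str]] = {}

    def read(self, block: str) -> str:
        return self._overlay.get(block, self._base[block])

    def write(self, block: str, data: str) -> None:
        self._overlay[block] = data

    def bookmark(self, name: str) -> None:
        # Only the overlay is copied, i.e. just the blocks this clone has changed.
        self._bookmarks[name] = dict(self._overlay)

    def rewind(self, name: str) -> None:
        """Return to the bookmark, discarding every write made since."""
        self._overlay = dict(self._bookmarks[name])


if __name__ == "__main__":
    qa = RewindableClone({"orders": "v0", "customers": "v0"})
    qa.bookmark("before_test")
    qa.write("orders", "deleted by destructive test")
    qa.rewind("before_test")
    print(qa.read("orders"))
```

The cost of a rewind grows with the number of changed blocks rather than the size of the database, which is why it can replace the 8-hour refresh between destructive tests.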

QA with Virtual Data: A/B (diagram: two instances provisioned from the same Production Time Flow on a DVA, one tested with Index 1 and the other with Index 2)

Data Version Control (1/30/2015)

(Diagram: Prod feeds a DVA's Production Time Flow; separate Dev and QA environments are provisioned for release 2.1 and for release 2.2.)

Production Support

• Backups
• Recovery
• Forensics
• Migration
• Consolidation

Backups: storage requirements for a 9 TB database with 1 TB of change per day and 30 days of backups

(Chart: storage on a 0 to 70 scale, weeks 1 to 4, for the original database, Oracle backups and Delphix.)
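The chart's values did not survive extraction, so this does not reproduce them. It is a back-of-the-envelope model under stated assumptions: the traditional scheme is assumed to be a weekly full plus daily incrementals kept for 30 days, the incremental-forever scheme keeps one full copy plus each later day's changed blocks (assumed all distinct), and compression is ignored. Both function names are hypothetical.

```python
def weekly_full_storage_tb(db_tb: float, change_tb_per_day: float, days: int) -> float:
    """Assumed traditional scheme: a full backup every 7th day and a daily
    incremental on the other days, all retained for `days` days."""
    total = 0.0
    for day in range(days):
        total += db_tb if day % 7 == 0 else change_tb_per_day
    return total


def incremental_forever_storage_tb(db_tb: float, change_tb_per_day: float, days: int) -> float:
    """One initial full copy, then only the blocks changed on each later day."""
    return db_tb + change_tb_per_day * (days - 1)


if __name__ == "__main__":
    # Slide's figures: 9 TB database, 1 TB of change per day, 30 days retained.
    print(weekly_full_storage_tb(9, 1, 30), incremental_forever_storage_tb(9, 1, 30))
```

Under these assumptions 30 days of retention needs 70 TB with weekly fulls versus 38 TB incremental forever, before any compression.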

Recovery (diagram: a Drop on the source instance; a VDB provisioned from the DVA's Production Time Flow is used to recover)

Forensics (diagram: Source feeds a DVA's Production Time Flow, from which a development instance is provisioned)

Migration (diagram: Source feeds a DVA whose Prod & VDB Time Flow serves a development instance: "Development (the new production)")

Business Intelligence

• ETL
• Temporal
• Confidence testing
• Federated databases
• Audits

Business Intelligence: ETL and Refresh Windows (timeline markers: 1pm, 10pm, 8am, noon)

Business Intelligence: batch taking too long (chart: batch windows for 2011 through 2015, with time markers 1pm, 10pm, 8am, noon, 9pm and 6am)

Business Intelligence: ETL and DW Refreshes (diagram: a Prod instance feeding a DW & BI instance)

Virtual Data: Fast Refreshes (diagram: Prod feeds a DVA's Production Time Flow, which serves BI and DW instances; ETL 24x7)

• Collect only changes
• Refresh in minutes

Temporal Data

Confidence testing

Modernization: Federated (diagram: Source1 and Source2 feed a DVA, each with its own Production Time Flow)

“I looked like a hero” – Tony Young, CIO, Informatica

Audit

Live Archive (diagram: Prod feeds a DVA, which serves an instance)

Use Case Summary

1. Development & QA
2. Production Support
3. Business

How expensive is the Data Constraint?

DVA at Fortune 500 companies:

• Dev throughput increased 2x
• Faster financial close, BI refreshes, surgical recovery and projects

Virtual Data Quotes

• Projects: “12 months to 6 months.” – New York Life
• Insurance product: “about 50 days ... to about 23 days” – Presbyterian Health
• “Can't imagine working without it” – State of California

Summary

• Problem: data is the constraint
• Solution: virtualize data
• Results:
  - Half the time for projects
  - Higher quality
  - Increased revenue

Thank you!

• Kyle Hailey | Oracle ACE and Technical Evangelist, Delphix
– [email protected]

– kylehailey.com

– slideshare.net/khailey

– @datavirt