Immutability Changes Everything! October 10, 2012 Pat Helland Salesforce.com


Page 1: Immutability Changes Everything!

Immutability Changes Everything!

October 10, 2012
Pat Helland
Salesforce.com

Page 2: Immutability Changes Everything!

Outline

  Introduction
  Accountants Don’t Use Erasers
  Keeping the Stone Tablets Safe
  Hey! Versions Are Immutable, Too!
  Immutability by Reference
  Immutability Is in the Eye of the Beholder
  Normalization Is for Sissies
  Conclusion

Page 3: Immutability Changes Everything!

Some Industry Trends to Consider

                Old                                      New
Computation     Expensive (CPUs)                         Cheap (Manycore Computers)
Disk Storage    Expensive                                Cheap (Cheap Commodity Disks)
Coordination    Easy (Latches Don’t Often Hit)           Hard (Latches Stall a Lot, etc.)
DRAM            Expensive                                DRAM / SSD Getting Cheap

We Can Afford to Keep Immutable Copies of Lots of Data
We Need Immutability to Coordinate with Fewer Challenges

Page 4: Immutability Changes Everything!

Increasing Storage, Distribution, and Ambiguity

Increasing Storage
  Cost per Gigabyte/Terabyte/Petabyte is dropping
  We can keep LOTS OF data for a LONG time
Increasing Distribution
  More and more, we have data and work spread across a great distance
  Data within the Datacenter may be far away…
  Data within a many-core chip may be far away…
Increasing Ambiguity
  When trying to coordinate with systems that are farther away, there’s more that’s happened since you’ve heard the news
  Can you take action with incomplete knowledge?
  Can you wait for enough knowledge?

This may be easing as we get faster and flatter networks in the datacenter
Instruction opportunities lost waiting for a semaphore increase with more cores…


Page 6: Immutability Changes Everything!

“Append-Only” Computing

Many kinds of computing are “Append-Only”
  Observations are recorded forever (or a long time)
  Derived results are calculated on demand
  You can’t rewrite history
Database transaction logs record all the changes made to the database
  High-speed appends to the log
  You never modify the log other than by appending to it
The database is a cache of a subset of the log!
  The latest value of each record is kept in the database
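The idea that the database is a cache of the log can be sketched in a few lines. This is a minimal illustration, not how any particular database implements its log; `LogRecord`, `AppendOnlyLog`, and `materialize` are hypothetical names, and the keys and values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """One immutable change record (field names are illustrative)."""
    lsn: int    # position of the record in the log
    key: str
    value: str


class AppendOnlyLog:
    def __init__(self):
        self._records = []

    def append(self, key, value):
        record = LogRecord(lsn=len(self._records), key=key, value=value)
        self._records.append(record)  # the only mutation allowed is an append
        return record

    def records(self):
        return tuple(self._records)   # read-only view of the history


def materialize(log):
    """Rebuild the 'database': the latest value of each key found in the log."""
    db = {}
    for record in log.records():
        db[record.key] = record.value
    return db


log = AppendOnlyLog()
log.append("acct-1", "balance=100")
log.append("acct-2", "balance=50")
log.append("acct-1", "balance=75")
db = materialize(log)
print(db)
```

The log keeps all three records, while the materialized dictionary keeps only the latest value per key, so it can be thrown away and rebuilt from the log at any time.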

Page 7: Immutability Changes Everything!

Accounting: Recorded & Derived Knowledge

Accountants don’t use erasers
  All entries in the ledger remain in the ledger
  Corrections can be made but only by new entries
  A company’s quarterly results are published
    o They include small corrections to the previous quarter…
  Small fixes are OK!
Some entries describe observed facts
  We received these credits and debits
Some entries are derived facts
  We amortized these capital expenses at this rate based on their cost and usage
  Your current balance depends on last month’s balance with applied debits & credits
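A small sketch of a ledger that is corrected only by new entries, with the balance as a derived fact. The entry memos and amounts are invented for illustration, and `Entry`, `post`, and `balance` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    memo: str
    amount: Decimal  # positive = credit, negative = debit


ledger = []  # append-only by convention: entries are never edited or removed


def post(memo, amount):
    ledger.append(Entry(memo, Decimal(amount)))


def balance(opening=Decimal("0")):
    """Derived fact: the opening balance plus every posted entry."""
    return opening + sum((e.amount for e in ledger), Decimal("0"))


post("invoice 17 received", "120.00")
post("rent paid", "-80.00")
# The invoice should have been 102.00. Fix it with a new entry, not an eraser.
post("correction to invoice 17", "-18.00")
print(balance())
```

The original, incorrect entry stays in the ledger; the correction sits beside it, and the derived balance reflects both.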

Page 8: Immutability Changes Everything!

The Append-Only View of Distributed Single-Master Computing

Single-Master computing means somehow we order the changes
  Centralized Computing
  Two-Phase Commit or Paxos
  Optimistic Concurrency Control
  Somehow, we semantically apply one change at a time
Each change is layered over its predecessors
  We can perceive a new set of values superseding the old ones
  This may be transactional or single-record changes but they appear in an order
We continue to append new knowledge over the immutable history
  The new version of the truth is interpreted through the older versions

Page 9: Immutability Changes Everything!

Distributed Computing “Back in the Day”

Back before telephones, people used messengers
  Kids walking through town or riding bicycles to deliver the message
  The US Postal Service or the Pony Express would deliver the message
Sometimes, people used fancy forms to capture the computing
  Add new data to a new part of the form
  Tear off the back copy of the form and file it
  Send the remaining portions to the next participant
  Each participant received the data they needed and added the new information to the form
  You cannot update earlier data on the form…
    o You can only append new knowledge to the form!
Distributed computing was append-only!
  New messages, new additions to the forms…
  You couldn’t overwrite what had been written!

[Figure: multi-part forms, each with Part 1, Part 2, and Part 3]


Page 11: Immutability Changes Everything!

Files, Blocks, & Replication for Durability & Availability

GFS and HDFS (and others) offer highly-available files
  A file is a bunch of blocks (or chunks)
  The file (as a file name and description of needed blocks) is highly available
  Each block (chunk) is replicated within the cluster for durability and availability
    o Blocks are typically replicated three times with scrubbing
    o Replicas are placed across fault-zones
Each file is immutable and (typically) single writer
  The file is created, one process can append to it, it lives for a while and is deleted
  Multi-writer files are hard (GFS had some challenges with failures and replicas)
Immutable files and immutable blocks empower this replication
  The file system has no concept of a change to a complete file
  Each block’s immutability allows it to be replicated (and have extra replicas, too)

High Availability of Immutable Blocks Is Affordable Now!
Google, Amazon, Yahoo, Microsoft, and more keep Petabytes & Exabytes

Page 12: Immutability Changes Everything!

Widely Sharing Immutable Files Is Easy

Immutable files have an identity and a content
  Neither the identity nor the content can change
You can copy the immutable file whenever and wherever you want
  Since you can’t change it, you don’t need to track where it’s landed!
You can share the same immutable copy across users
  As long as you track reference counts (when it’s OK to delete it), you can use one copy of the file to share across many users
You can distribute immutable files wherever you want
  Same identity, same contents, location independent!

Published Books are Immutable!
  Sometimes later editions repair previous bugs
  This is versioning of the book
  Versions are immutable objects!
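Sharing one immutable copy across users with reference counts might look like the sketch below. It assumes the file's identity is a SHA-256 hash of its content, which is one possible choice and not something the slides specify; `ImmutableFileStore` and its methods are hypothetical names.

```python
import hashlib


class ImmutableFileStore:
    """Toy store: one physical copy per identity, shared via reference counts."""

    def __init__(self):
        self._content = {}   # identity -> bytes
        self._refcount = {}  # identity -> number of users sharing the copy

    def add(self, data: bytes) -> str:
        file_id = hashlib.sha256(data).hexdigest()  # assumed identity scheme
        self._content.setdefault(file_id, bytes(data))
        self._refcount[file_id] = self._refcount.get(file_id, 0) + 1
        return file_id

    def read(self, file_id: str) -> bytes:
        return self._content[file_id]

    def release(self, file_id: str) -> None:
        self._refcount[file_id] -= 1
        if self._refcount[file_id] == 0:  # nobody shares it: OK to delete
            del self._content[file_id]
            del self._refcount[file_id]

    def copies(self) -> int:
        return len(self._content)


store = ImmutableFileStore()
alice = store.add(b"Tuesday's price list")
bob = store.add(b"Tuesday's price list")
print(alice == bob, store.copies())
```

Because the content never changes, the second `add` only bumps a counter; the copy is deleted once every user has released it.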

Page 13: Immutability Changes Everything!

Names and Immutability… Watch Out for the Slippery Slope

GFS (Google File System) and HDFS (Hadoop Distributed File System) provide immutable files
  Immutable blocks (chunks) are replicated across Data Nodes
  Immutable files are a sequence of blocks (chunks)
The immutable files are identified with a GUID
  The contents of a file are immutable and labeled with a GUID
  The GUID will always refer to exactly that file and its contents
GFS and HDFS also provide a namespace which can be changed
  The logical name of the immutable file may be changed to something else
  It takes care in usage to ensure that you have predictable results

Is Something Really Immutable When Its Name Can Change?

Page 14: Immutability Changes Everything!

Storing Immutable Data in an Eventually Consistent Store

Consider a strongly consistent catalog
  Single master control over a namespace yielding GUIDs for the file blobs
Now, keep the GUID to immutable blob storage in Dynamo or Riak
  The eventually consistent store will NEVER give you the wrong answer
  Each GUID will only yield one result because you never store different values
  Self-managing and master-less blob-store!

[Figure: HDFS, with a NameNode (NameSpace, Block & DataNode Mgmt) over many DataNodes, shown alongside an RDBMS as the Name Space and Riak as the File/Block Store, with files/blocks identified by GUID]
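Why an eventually consistent store cannot return a wrong answer for immutable blobs can be shown with a toy model. This is a sketch under simplifying assumptions (in-memory replicas, a manual sync step) and does not use the Dynamo or Riak APIs; `BlobStore`, `Replica`, and `anti_entropy` are hypothetical names.

```python
import uuid


class Replica:
    def __init__(self):
        self.blobs = {}  # GUID -> immutable bytes


class BlobStore:
    """Toy master-less store: a write lands on one replica and spreads later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]

    def put(self, blob: bytes) -> str:
        guid = str(uuid.uuid4())             # a fresh GUID is never reused
        self.replicas[0].blobs[guid] = blob  # other replicas lag behind
        return guid

    def anti_entropy(self):
        # Background sync. No conflict resolution is needed, because a GUID
        # is only ever stored with one value.
        merged = {}
        for replica in self.replicas:
            merged.update(replica.blobs)
        for replica in self.replicas:
            replica.blobs.update(merged)

    def get(self, guid, replica_index):
        # Returns the blob, or None if this replica has not heard of it yet.
        return self.replicas[replica_index].blobs.get(guid)


store = BlobStore()
guid = store.put(b"block contents")
early = store.get(guid, replica_index=2)  # may be missing, never wrong
store.anti_entropy()
late = store.get(guid, replica_index=2)
print(early, late)
```

A lagging replica can answer "not here yet", but it can never return a stale value, since no GUID is ever written with two different contents.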


Page 16: Immutability Changes Everything!

Versions and History

Linear Version History (a.k.a. Strongly Consistent):
  One version replaces another: one parent and one child in the sequence
  Each version is immutable
  Each version has an identity
  Typically, each new version is viewed as a replacement for the earlier one
DAG (Directed Acyclic Graph) Version History (a.k.a. Eventual Consistency):
  Each version may have one or more parents
  Each parent may have one or more children
  Each parent may have children with different parents
  Each version is immutable
  Each version has an identity (but we may now need vector clocks to describe)
  Each version may be viewed as one of many replacement versions for its parents

Versions Are Immutable and (Should) Have Immutable Names
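The mention of vector clocks can be made concrete with a comparison function. This is the textbook comparison over clocks represented as `{replica_name: counter}` dictionaries, a representation assumed here for illustration; `compare` is a hypothetical name.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks given as {replica_name: counter} dicts.

    A missing replica name counts as 0.
    """
    names = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in names)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in names)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a is an ancestor of b in the version DAG
    if b_le_a:
        return "after"       # a is a descendant of b
    return "concurrent"      # neither descends from the other


print(compare({"x": 1}, {"x": 2}))
print(compare({"x": 2, "y": 0}, {"x": 1, "y": 1}))
```

In a linear history every pair of versions is "before" or "after"; "concurrent" results are what turn the history into a DAG.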

Page 17: Immutability Changes Everything!

Strongly Consistent Transactions Viewed as Versions

In a Database, ACID transactions appear as if they have serial order
  This is called serializability
  I know there are reduced degrees of consistency but this is usually close to true
Transaction T1 commits at one point and Transaction T2 at a later one
  Transaction T1 presents a consistent view of the entire database
  Transaction T2 presents a different and later view of the database

An Active Database Is Constantly Presenting New Versions of Its Data
  Transaction T1 Is a Version of the Database
  Later, Transaction T2 Is a Version of the Database

Everything Changeable Can Be Understood as a Bunch of Versions
How Do You Identify the Versions? Can You See Old Ones?
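One way to picture committed transactions as versions is a store where each commit yields a new read-only snapshot identified by a number. This is only a conceptual sketch that copies the whole database on every commit, which real systems avoid; `VersionedDB` is a hypothetical name.

```python
from types import MappingProxyType


class VersionedDB:
    """Toy database: every commit produces a new immutable version."""

    def __init__(self):
        self._versions = [MappingProxyType({})]  # version 0 is the empty database

    def commit(self, changes: dict) -> int:
        new_state = dict(self._versions[-1])     # start from the latest version
        new_state.update(changes)
        self._versions.append(MappingProxyType(new_state))
        return len(self._versions) - 1           # the version's identity

    def read(self, key, version=None):
        index = -1 if version is None else version
        return self._versions[index].get(key)


db = VersionedDB()
t1 = db.commit({"x": 1})
t2 = db.commit({"x": 2, "y": 3})
print(db.read("x"), db.read("x", version=t1))
```

Here the version number answers "How do you identify the versions?", and passing it to `read` answers "Can you see old ones?".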

Page 18: Immutability Changes Everything!

BigTable & HBase: Interpreting the Immutable Entrails

BigTable & HBase:
  Log: When a change occurs, write a record in the log to ensure it’s durable
    o Limited notion of transactions
  Major Compaction: an image (key sorted) of the key-value pairs at a point in time
  Minor Compaction: a set of new key-values (or new values for existing keys)
    o Represents changes to a set of keys since the last major compaction
Both BigTable & HBase function by writing immutable files
  There is not an “update-in-place” to change the data
  There is an append to a new file (Minor Compaction) describing a new version
Both BigTable & HBase provide a programmer perspective of versions
  Each key has a set of versions (in a linear, strongly-consistent sequence)
  A read may get the latest version or may get an earlier version

Immutability Is at the Heart of BigTable & HBase
  Data Change Is By Appending to Files Which Become Immutable
  User Semantics Present Immutable Versions of Key-Values
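The log, minor compaction, and major compaction roles described above can be sketched with an in-memory toy. It is not the BigTable or HBase implementation: the in-memory buffer (`memtable`) is an assumption added so there is something to flush, files are modeled as tuples, and `TinyLSM` is a hypothetical name.

```python
class TinyLSM:
    """Toy log-structured store: changes are appended, files are never edited."""

    def __init__(self):
        self.log = []        # append-only record of every change
        self.memtable = {}   # assumed buffer of changes not yet in a file
        self.files = []      # immutable key-sorted files, oldest first

    def put(self, key, value):
        self.log.append((key, value))
        self.memtable[key] = value

    def minor_compaction(self):
        # Freeze recent changes into a new immutable, key-sorted file.
        if self.memtable:
            self.files.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def major_compaction(self):
        # Write one new file holding the latest value of every key.
        merged = {}
        for file in self.files:   # oldest first, so newer values win
            merged.update(file)
        self.files = [tuple(sorted(merged.items()))] if merged else []

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for file in reversed(self.files):   # newest file first
            for k, v in file:
                if k == key:
                    return v
        return None


db = TinyLSM()
db.put("a", 1)
db.minor_compaction()
db.put("a", 2)
db.put("b", 3)
db.minor_compaction()
print(db.get("a"), len(db.files))
```

No file is ever modified after it is written: a minor compaction adds a file, and a major compaction replaces the set of files with one new file.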


Page 20: Immutability Changes Everything!

DataSets: Immutable Collections of Data

A DataSet is a fixed collection of tables:
  The schema for each table is created when the DataSet is made
  The contents of each table are created when the DataSet is made
A DataSet is immutable:
    o It is created, it may be consumed for reading, and it may be deleted
DataSets may be relational or some other representation…

[Figure: DataSet-X, containing a Schema and Table1, Table2, … TableN]

Page 21: Immutability Changes Everything!

DataSets Referenced by a Relational Database

DataSets can be present within the relational store
  The meta-data for the DataSet is visible within the relational database
  We may choose to store the DataSet “by-reference” but the contents are semantically present within the relational store

[Figure: a Relational Database listing DataSet-X, DataSet-Y, and DataSet-Z; DataSet-X (Schema, Table1, Table2, … TableN) is stored elsewhere]

Page 22: Immutability Changes Everything!

Functional Calculations Outside a Relational DB

Functional versus Dysfunctional calculations
  A functional calculation takes a set of inputs and predictably creates an output
  The entire calculation and pieces of it are idempotent
    o Idempotence: Doing it more than once is the same as doing it once!
Work using DataSets can be performed outside the relational store
  The inputs may exist outside the relational store
  The computation may happen outside the relational store
  The results may be stored outside the relational store
  The results may appear (by reference) inside the relational store

[Figure: DataSet-M, DataSet-N, DataSet-O, DataSet-P, and DataSet-R connected through a Functional Calculation]

Idempotence: It’s Not That Hard!
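A functional calculation over an immutable input can be made idempotent by storing its output under the identity of that input. The sketch below assumes a SHA-256 hash of the serialized input as that identity; `run`, `input_identity`, `total_by_customer`, and the sample orders are all hypothetical.

```python
import hashlib
import json

results = {}  # output store, keyed by the identity of the input


def input_identity(orders) -> str:
    """Assumed identity scheme: a hash of the serialized input."""
    blob = json.dumps(orders, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def total_by_customer(orders):
    """A functional calculation: the output depends only on the input."""
    totals = {}
    for customer, amount in orders:
        totals[customer] = totals.get(customer, 0) + amount
    return totals


def run(orders) -> str:
    key = input_identity(orders)
    results[key] = total_by_customer(orders)  # a rerun stores the same value
    return key


orders = [["ann", 5], ["bob", 7], ["ann", 3]]
first = run(orders)
second = run(orders)  # doing it more than once is the same as doing it once
print(first == second, results[first])
```

Running the job twice leaves the result store exactly as it was after the first run, which is what makes a retry after a failure safe.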

Page 23: Immutability Changes Everything!

Relational Operations on Immutable DataSets

You can meaningfully apply relational operations across locked relational data and immutable DataSets
  Relational operations are value based and require locking semantics
  Database concurrency control temporarily freezes the changing data
  Relational JOINS require frozen snapshots to be meaningful
Locking presents a version of the Relational DB which can be joined
  Named and frozen DataSets may also be joined with the classic data

[Figure: Join TableA and Table1. A Relational Database holds TableA, TableB, and a reference to DataSet-X; DataSet-X (Schema, Table1, Table2, … TableN) is stored elsewhere]
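Joining changing relational data with an immutable DataSet can be pictured as joining two frozen things: a snapshot of the changing table and the DataSet's table. In this sketch a dictionary copy stands in for locking, and the table contents, `TABLE1`, `table_a`, and `join` are invented for illustration.

```python
from types import MappingProxyType

# Table1 from an immutable DataSet: sku -> product name (read-only view).
TABLE1 = MappingProxyType({"sku-1": "widget", "sku-2": "gadget"})

# TableA in the relational database: order -> sku. This data keeps changing.
table_a = {"order-9": "sku-1", "order-10": "sku-2"}


def join(table_a_snapshot, table1):
    """Value-based join of a frozen TableA snapshot with Table1 on sku."""
    return sorted(
        (order, sku, table1[sku])
        for order, sku in table_a_snapshot.items()
        if sku in table1
    )


snapshot = dict(table_a)         # stands in for the freeze that locking provides
table_a["order-11"] = "sku-1"    # a later change does not disturb the join
rows = join(snapshot, TABLE1)
print(rows)
```

The DataSet side needs no locking at all, since it cannot change; only the relational side has to be frozen for the duration of the join.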


Page 25: Immutability Changes Everything!

DataSets Are Semantically Immutable

A DataSet is semantically immutable
  It has a set of tables, rows, and columns
  It may have semi-structured data (e.g. JSON)
  It may have app-defined data
DataSets may be defined as a SELECTION, PROJECTION, or JOIN over previously existing DataSets
  Semantically, all that data is copied into a new DataSet
  Physically, optimizations can occur

[Figure: DataSet-X, containing a Schema and Table1, Table2, … TableN]

Page 26: Immutability Changes Everything!

Optimizing DataSets for Read Patterns

DataSets are semantically immutable but may be physically changed
  You can add an index or two
  You can denormalize tables to optimize for read access
  You can make a copy of a table with far fewer columns for fast access
  You can place partitions of the DataSet close to where they are being read
You can dynamically watch the read usage of a DataSet and create optimizations for the new reader

[Figure: DataSet-X (Schema, Table1, Table2, … TableN) augmented with Index #1 and a denormalization of parts of Table1 & Table2]
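Adding an index is a physical change that leaves the semantics alone: every read returns the same answer, only faster. A minimal sketch with made-up rows; `TABLE1`, `INDEX`, `lookup_scan`, and `lookup_indexed` are hypothetical names.

```python
# Table1 of an immutable DataSet, stored as a tuple of (key, value) rows.
TABLE1 = (("k3", "c"), ("k1", "a"), ("k2", "b"))


def lookup_scan(key):
    """Read path before optimization: scan every row."""
    for k, v in TABLE1:
        if k == key:
            return v
    return None


# Physical optimization added later: key -> row position. It is safe to build
# once and keep forever, because the rows underneath can never change.
INDEX = {k: position for position, (k, _) in enumerate(TABLE1)}


def lookup_indexed(key):
    """Read path after optimization: same answers, one dictionary probe."""
    position = INDEX.get(key)
    return None if position is None else TABLE1[position][1]


print(lookup_scan("k2"), lookup_indexed("k2"))
```

With mutable data the index would need maintenance on every update; with an immutable DataSet it never goes stale.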

Page 27: Immutability Changes Everything!

Immutability and “Big Data”

Massively parallel computations usually are functional and based on immutable inputs
  MapReduce (Hadoop) and Dryad take immutable files as input
  The work is cut into pieces, each of which is immutable
Functional computation (based on immutable inputs) is idempotent
  It’s OK to croak and restart

Immutability Is the Backbone of “Big Data” Computations!

Functional Computation with Immutable Inputs
Failure and Restart Based on the Idempotent Nature of Functional Computing over Immutable Inputs
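"It's OK to croak and restart" can be demonstrated with a toy word count over immutable pieces, where a simulated crash discards one attempt and the piece is simply run again. This does not use the Hadoop or Dryad APIs; `PIECES`, `word_count`, and `run_with_restarts` are hypothetical names.

```python
PIECES = ("a b a", "b c")  # immutable input, already cut into pieces


def word_count(piece):
    """Pure function of its input piece."""
    counts = {}
    for word in piece.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def run_with_restarts(pieces, fail_first_attempt_on=()):
    outputs, attempts = {}, {}
    for i, piece in enumerate(pieces):
        while i not in outputs:
            attempts[i] = attempts.get(i, 0) + 1
            if i in fail_first_attempt_on and attempts[i] == 1:
                continue  # simulated crash: drop the attempt and restart it
            outputs[i] = word_count(piece)
    total = {}
    for output in outputs.values():
        for word, count in output.items():
            total[word] = total.get(word, 0) + count
    return total


clean = run_with_restarts(PIECES)
retried = run_with_restarts(PIECES, fail_first_attempt_on={1})
print(clean == retried, clean)
```

Because each piece is immutable and the per-piece function is pure, the restarted run produces exactly the result of the clean run.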

Page 28: Immutability Changes Everything!

Immutability as a Semantic Prism

DataSets show an immutable semantic perspective
  Even if the underlying representation is augmented or completely replaced
The King James Bible is character for character immutable
  Even when printed in a different font…
  Even when digitized…
  Even when accompanied by different pictures… ???... Hmm…
Is a DataSet changed if there is a loss-less transformation to a new schema representation?
  The new address field has more capacity… Is that OK?
  The ENUM values are mapped to a new underlying representation… Is that OK?

It’s Not Enough to Have the Right Bits! You Have to Know How to Interpret Them…
  “President Bush” meant a different thing in 1990 versus 2005
  The word “Fanny” is interpreted differently in the US versus Australia

You Need to Know What the Immutable Bits Actually Mean!


Page 30: Immutability Changes Everything!

Why Normalize?

Normalization’s goal is to eliminate update anomalies
  Can be changed without “funny behavior”
  Each data item lives in one place

Emp #  Emp Name  Mgr #  Mgr Name  Emp Phone  Mgr Phone
47     Joe       13     Sam       5-1234     6-9876
18     Sally     38     Harry     3-3123     5-6782
91     Pete      13     Sam       2-1112     6-9876
66     Mary      02     Betty     5-7349     4-0101

Classic problem with de-normalization: can’t update Sam’s phone # since there are many copies
De-normalization is OK if you aren’t going to update!
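The update anomaly in the denormalized employee table can be reproduced directly. The rows mirror the slide's table; the replacement phone number `6-0000` and the names `ROWS` and `sams_phones` are made up for illustration.

```python
# Denormalized rows from the slide: each employee row repeats the manager's data.
ROWS = (
    {"emp": 47, "name": "Joe", "mgr": 13, "mgr_name": "Sam", "mgr_phone": "6-9876"},
    {"emp": 18, "name": "Sally", "mgr": 38, "mgr_name": "Harry", "mgr_phone": "5-6782"},
    {"emp": 91, "name": "Pete", "mgr": 13, "mgr_name": "Sam", "mgr_phone": "6-9876"},
    {"emp": 66, "name": "Mary", "mgr": 2, "mgr_name": "Betty", "mgr_phone": "4-0101"},
)


def sams_phones(table):
    """Every distinct phone number recorded for manager 13 (Sam)."""
    return {row["mgr_phone"] for row in table if row["mgr"] == 13}


before = sams_phones(ROWS)

# An update that reaches only one of Sam's copies (hypothetical new number).
updated = [dict(row) for row in ROWS]
updated[0]["mgr_phone"] = "6-0000"
after = sams_phones(updated)
print(before, after)
```

After the partial update Sam has two phone numbers on record, which is the "funny behavior" normalization exists to prevent. If the rows are never updated, the duplication is harmless.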

Page 31: Immutability Changes Everything!

We Are Swimming in a Sea of Immutable Data

[Figure: a Data Owning Service publishes a Price-List; Listening Partner Services (Service-1, Service-5, Service-7, Service-8) hold copies of Monday’s, Tuesday’s, and Wednesday’s Price-Lists]

Page 32: Immutability Changes Everything!

Think First Before You Normalize

For God’s Sake, Don’t Normalize Immutable Data!
Unless It’s to Optimize Space in the Representation…

Page 33: Immutability Changes Everything!

People Normalize ‘Cuz Their Professor Said To
That’s Why We Need All Those Joins…

Culture: the Way We Do Things Around Here

If All You Have Is a Database, Everything Looks Like a Nail…


Page 35: Immutability Changes Everything!

Takeaways

Things have changed towards immutability
  We need immutability to coordinate at ever increasing distances
  We can afford immutability because we have room to store versions for a long time
Versioning allows a changing view of objects with immutable backing
  Linear (strongly consistent) version histories for some (e.g. BigTable, HBase)
  Directed-Acyclic-Graph (eventually consistent) history for others (e.g. Dynamo, Riak)
Increasingly, systems are based on writing immutable data
  Log-Structured Merge trees (e.g. HBase, BigTable, LevelDB, etc.) as implementation
  Layering immutable data over a distributed file system offers robustness and scale
Immutability extends consistent relational systems
  Very large immutable DataSets may be embedded by reference in relational stores
  The semantics of immutable DataSets joins cleanly with the changing relational data
Semantically immutable data may be changed for optimization
  Projections, redundant copies, denormalization, column stores, indexing and more…
  Semantically immutable means the user behavior doesn’t change
Immutability is the backbone of emerging “Big Data” systems
  MapReduce, Hadoop, and more leverage immutable snapshots

Page 36: Immutability Changes Everything!

Immutability Changes Everything!