© 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.

AWS Webcast - Optimize your database for the cloud with DynamoDB – A Deep Dive into Global Secondary Indexes


DESCRIPTION

Amazon DynamoDB is a fully managed, highly scalable distributed database service. Global Secondary Indexes (GSIs) give you the flexibility to query your DynamoDB tables in new and powerful ways. In this session, we will:
• Describe how GSIs work under the covers to ensure consistent low latency at any scale.
• Walk through various access patterns so that you learn how to take full advantage of GSIs and implement best-practice designs that scale efficiently and cost-effectively.
This session is designed for developers and architects seeking to build rich applications that require performance and availability with absolute data durability.


Page 1

Optimize Your Database for the Cloud

with DynamoDB

A Deep Dive into

Global Secondary Indexes (GSI)

David Pearson

Siva Raghupathy

Page 2

database service
• automated operations
• predictable performance
• durable low latency
• cost effective

Page 3

Durable Low Latency

WRITES
• Continuously replicated to 3 AZs
• Quorum acknowledgment
• Persisted to disk (custom SSD)

READS
• Strongly or eventually consistent
• No trade-off in latency

Page 4

Recent Announcements

Secondary Indexes (Local and Global)

DynamoDB Local

• Disconnected development with full API support

• No network

• No usage costs

• No SLA

Fine-Grained Access Control

• Direct-to-DynamoDB access for mobile devices

Geospatial and Transaction Libraries

Page 5

DynamoDB Concepts

table

Page 6

DynamoDB Concepts

table

items

Page 7

DynamoDB Concepts

attributes

items

table

schema-less: schema is defined per attribute

Page 8

DynamoDB Concepts

hash

hash keys
• mandatory for all items in a table
• key-value access pattern

Page 9

DynamoDB Concepts

partition 1 .. N

hash keys
• mandatory for all items in a table
• key-value access pattern
• determines data distribution
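To make "determines data distribution" concrete, here is a minimal sketch of hashing a key value onto one of N partitions. The slides do not describe DynamoDB's internal hash function, so MD5 is only a stand-in, and the names `partition_for` and `distribution` are hypothetical.

```python
import hashlib


def partition_for(hash_key: str, num_partitions: int) -> int:
    """Map a hash key value to a partition number in [0, num_partitions).

    MD5 is an illustrative stand-in, not DynamoDB's actual hash function.
    """
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribution(keys, num_partitions):
    """Count how many keys land on each partition."""
    counts = [0] * num_partitions
    for key in keys:
        counts[partition_for(key, num_partitions)] += 1
    return counts


# Many distinct key values spread across all partitions.
many_users = [f"user-{i}" for i in range(10_000)]
print(distribution(many_users, 4))

# Only two distinct key values can never touch more than two partitions.
two_values = ["M", "F"] * 5_000
print(distribution(two_values, 4))
```

The second call shows why a hash key with few distinct values concentrates load, a point the Best Practices slides return to.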

Page 10

DynamoDB Concepts

range

hash

range keys
• model 1:N relationships
• enable rich query capabilities
• composite primary key

• all items for a hash key
• ==, <, >, >=, <=
• “begins with”
• “between”
• sorted results
• counts
• top / bottom N values
• paged responses
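The query capabilities listed above all follow from range key values being kept in sorted order under one hash key. The sketch below imitates that with a sorted Python list; it is an in-memory illustration only, and the helper names and sample dates are made up for the example.

```python
from bisect import bisect_left, bisect_right

# Range key values stored under a single hash key, kept sorted.
range_keys = sorted(
    ["2013-03-10", "2013-03-22", "2013-04-10", "2013-04-23", "2013-05-01"]
)


def between(keys, low, high):
    """Inclusive range, like the "between" condition."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


def begins_with(keys, prefix):
    """Prefix match; "\uffff" sorts after every character used in these keys."""
    return keys[bisect_left(keys, prefix):bisect_right(keys, prefix + "\uffff")]


def top_n(keys, n):
    """Largest n values in descending order."""
    return keys[::-1][:n]


def pages(keys, page_size):
    """Yield results one page at a time."""
    for start in range(0, len(keys), page_size):
        yield keys[start:start + page_size]


print(between(range_keys, "2013-03-01", "2013-03-31"))
print(begins_with(range_keys, "2013-04"))
print(top_n(range_keys, 2))
print(list(pages(range_keys, 2)))
```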

Page 11

DynamoDB Concepts

local secondary indexes (LSI)
• alternate range key + same hash key
• index and table data is co-located (same partition)

Page 12

LSI Attribute Projections

Table: A1 (hash), A2 (range), A3, A4, A5

LSIs
• KEYS_ONLY: A1 (hash), A3 (range), A2 (table key)
• INCLUDE A3: A1 (hash), A4 (range), A2 (table key), A3 (projected)
• ALL: A1 (hash), A4 (range), A2 (table key), A3 (projected), A5 (projected)
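The three projection types can be sketched as a function that decides which attributes of a table item are copied into an index entry. This is an illustrative model only; `projected_attributes` is a hypothetical helper, and the attribute names A1 to A5 come from the slide.

```python
def projected_attributes(item, table_keys, index_keys, projection, include=()):
    """Return the subset of `item` that would be stored in the index entry.

    KEYS_ONLY -> index keys plus table keys
    INCLUDE   -> keys plus the listed non-key attributes
    ALL       -> every attribute of the item
    """
    names = set(table_keys) | set(index_keys)
    if projection == "INCLUDE":
        names |= set(include)
    elif projection == "ALL":
        names = set(item)
    elif projection != "KEYS_ONLY":
        raise ValueError(f"unknown projection type: {projection}")
    return {k: v for k, v in item.items() if k in names}


item = {"A1": "h", "A2": "r", "A3": 3, "A4": 4, "A5": 5}
table_keys = ("A1", "A2")  # table hash and range key

print(projected_attributes(item, table_keys, ("A1", "A3"), "KEYS_ONLY"))
print(projected_attributes(item, table_keys, ("A1", "A4"), "INCLUDE", include=["A3"]))
print(projected_attributes(item, table_keys, ("A1", "A4"), "ALL"))
```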

Page 13

DynamoDB Concepts

global secondary indexes (GSI)

any attribute indexed as new hash and/or range key

Page 14

Local Secondary Index (LSI) vs. Global Secondary Index (GSI)

1. LSI: Key = hash key and a range key
   GSI: Key = hash or hash-and-range
2. LSI: Hash key is the same attribute as that of the table. Range key can be any scalar table attribute
   GSI: The index hash key and range key (if present) can be any scalar table attributes
3. LSI: For each hash key, the total size of all indexed items must be 10 GB or less
   GSI: No size restrictions for global secondary indexes
4. LSI: Query over a single partition, as specified by the hash key value in the query
   GSI: Query over the entire table, across all partitions
5. LSI: Eventual consistency or strong consistency
   GSI: Eventual consistency only
6. LSI: Read and write capacity units consumed from the table
   GSI: Every global secondary index has its own provisioned read and write capacity units
7. LSI: Query will automatically fetch non-projected attributes from the table
   GSI: Query can only request projected attributes. It will not fetch any attributes from the table

Page 15

GSI Attribute Projections

Table: A1 (hash), A2, A3, A4, A5

GSIs
• KEYS_ONLY: A5 (hash), A3 (range), A1 (table key)
• INCLUDE A3: A5 (hash), A4 (range), A1 (table key), A3 (projected)
• ALL: A4 (hash), A5 (range), A1 (table key), A2 (projected), A3 (projected)
• KEYS_ONLY: A2 (hash), A1 (table key)

Page 16

GSI Query Pattern

Query covered by GSI

• Query GSI & get the attributes

Query not covered by GSI

• Query GSI to get the table key(s)

• BatchGetItem/GetItem from table

• 2 or more round trips to DynamoDB

Tip: If you need very low latency then project all required attributes into GSI
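The difference between a covered and a non-covered query can be sketched with two dictionaries standing in for the table and the index. Everything here is an in-memory assumption for illustration: `TABLE`, `TYPE_INDEX` and `query_by_type` are hypothetical names, and the second lookup merely stands in for a BatchGetItem/GetItem call.

```python
# Table items keyed by their primary key (UserId, FileId).
TABLE = {
    ("1", "1"): {"UserId": "1", "FileId": "1", "Name": "File1", "Type": "JPG", "Size": 1000},
    ("1", "2"): {"UserId": "1", "FileId": "2", "Name": "File2", "Type": "PDF", "Size": 100},
}

# Index on Type that projects the table keys plus Name (Size is NOT projected).
TYPE_INDEX = {
    "JPG": [{"Type": "JPG", "UserId": "1", "FileId": "1", "Name": "File1"}],
    "PDF": [{"Type": "PDF", "UserId": "1", "FileId": "2", "Name": "File2"}],
}


def query_by_type(type_value, wanted):
    """Return (results, round_trips) for the requested attributes."""
    round_trips = 1  # the index query itself
    entries = TYPE_INDEX.get(type_value, [])
    if all(set(wanted) <= set(entry) for entry in entries):
        # Covered: every requested attribute is projected into the index.
        return [{k: entry[k] for k in wanted} for entry in entries], round_trips
    # Not covered: use the table keys from the index to read the full items.
    round_trips += 1  # stand-in for BatchGetItem / GetItem
    items = [TABLE[(entry["UserId"], entry["FileId"])] for entry in entries]
    return [{k: item[k] for k in wanted} for item in items], round_trips


print(query_by_type("PDF", ["Name"]))
print(query_by_type("PDF", ["Name", "Size"]))
```

Requesting only `Name` is answered from the index in one round trip; adding `Size` forces the second trip, which is the latency cost the tip above is about.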

Page 17

How do GSI updates work

[Diagram: Client, Table (primary table partitions), Global Secondary Index. Step 2: asynchronous update (in progress)]

Page 18

Table operation and number of GSI updates

• Item not in index before or after update: 0
• Update introduces a new indexed attribute: 1
• Update deletes the indexed attribute: 1
• Update changes the value of an indexed attribute from A to B: 2

1 table update = 0, 1 or 2 GSI updates
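The counting rule above can be written as a small function. This is a sketch of the slide's three cases, with `None` meaning the indexed attribute is absent. The `projected_changed` parameter covers a case the slide does not list (key unchanged, only a projected attribute changed); treating that as one index write is an assumption based on general DynamoDB behavior, not on this deck.

```python
def gsi_updates(old_value, new_value, projected_changed=False):
    """Number of index writes caused by one table write.

    old_value / new_value: value of the indexed attribute before and after
    the table write, or None if the attribute is absent.
    """
    if old_value is None and new_value is None:
        return 0  # item is not in the index before or after
    if old_value is None or new_value is None:
        return 1  # one put into, or one delete from, the index
    if old_value != new_value:
        return 2  # delete the old index entry, put the new one
    # Key unchanged: assumed to cost one write only if a projected
    # attribute changed (not covered by the slide).
    return 1 if projected_changed else 0


print(gsi_updates(None, None))
print(gsi_updates(None, "A"))
print(gsi_updates("A", None))
print(gsi_updates("A", "B"))
```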

Page 19

GSI EXAMPLES

Page 20

Example 1: Multi-tenant application for file storing and sharing

Access Patterns

1. Users should be able to query all the files they own

2. Search by File Name

3. Search by File Type

4. Search by Date Range

5. Keep track of Shared Files

6. Search by descending order of File Size

Page 21

DynamoDB Data Model

Users

• Hash key = UserId (S)

• Attributes = User Name (S), Email (S), Address (SS), etc.

User_Files

• Hash key = UserId (S) – This is also the tenant id

• Range key = FileId (S)

• Attributes = Name (S), Type (S), Size (N), Date (S), SharedFlag (S), S3key (S)

Page 22

Global Secondary Indexes

Table Name   Index Name        Attribute to Index   Projected Attribute
User_Files   NameIndex         Name                 KEYS
User_Files   TypeIndex         Type                 KEYS + Name
User_Files   DateIndex         Date                 KEYS + Name
User_Files   SharedFlagIndex   SharedFlag           KEYS + Name
User_Files   SizeIndex         Size                 KEYS + Name

Page 23

Access Pattern 1

Find all files owned by a user

• Query (UserId = 2)

UserId (Hash)   FileId (Range)   Name    Date         Type   SharedFlag   Size   S3key
1               1                File1   2013-04-23   JPG                 1000   bucket\1
1               2                File2   2013-03-10   PDF    Y            100    bucket\2
2               3                File3   2013-03-10   PNG    Y            2000   bucket\3
2               4                File4   2013-03-10   DOC                 3000   bucket\4
3               5                File5   2013-04-10   TXT                 400    bucket\5

Page 24

Access Pattern 2

Search by file name

• Query (IndexName = NameIndex, UserId = 1, Name = File1)

NameIndex
UserId (hash)   Name (range)   FileId
1               File1          1
1               File2          2
2               File3          3
2               File4          4
3               File5          5

Page 25

Access Pattern 3

Search for file name by file Type

• Query (IndexName = TypeIndex, UserId = 2, Type = DOC)

TypeIndex
UserId (hash)   Type (range)   FileId   Name (projected)
1               JPG            1        File1
1               PDF            2        File2
2               DOC            4        File4
2               PNG            3        File3
3               TXT            5        File5

Page 26

Access Pattern 4

Search for file name by date range

• Query (IndexName = DateIndex, UserId = 1, Date between 2013-03-01 and 2013-03-29)

DateIndex
UserId (hash)   Date (range)   FileId   Name (projected)
1               2013-03-10     2        File2
1               2013-04-23     1        File1
2               2013-03-10     3        File3
2               2013-03-10     4        File4
3               2013-04-10     5        File5
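As a hedged illustration, the date-range query above could be expressed as a parameter dictionary in the shape of DynamoDB's low-level Query API. The table, index and attribute names come from the slides; the expression syntax is the current API style rather than what the slides show, `date_range_request` and `evaluate_locally` are hypothetical helpers, and nothing is sent over the network. `Date` is aliased as `#d` to sidestep any clash with DynamoDB reserved words.

```python
def date_range_request(user_id, start, end):
    """Build Query parameters for DateIndex (no network call is made)."""
    return {
        "TableName": "User_Files",
        "IndexName": "DateIndex",
        "KeyConditionExpression": "UserId = :u AND #d BETWEEN :start AND :end",
        "ExpressionAttributeNames": {"#d": "Date"},
        "ExpressionAttributeValues": {
            ":u": {"S": user_id},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }


# DateIndex rows from the slide: (UserId, Date, FileId, Name).
DATE_INDEX_ROWS = [
    ("1", "2013-03-10", "2", "File2"),
    ("1", "2013-04-23", "1", "File1"),
    ("2", "2013-03-10", "3", "File3"),
    ("2", "2013-03-10", "4", "File4"),
    ("3", "2013-04-10", "5", "File5"),
]


def evaluate_locally(request, rows):
    """Apply the same key condition to in-memory rows, for illustration."""
    values = request["ExpressionAttributeValues"]
    user, start, end = (values[k]["S"] for k in (":u", ":start", ":end"))
    return [
        name
        for (uid, date, _file_id, name) in rows
        if uid == user and start <= date <= end
    ]


request = date_range_request("1", "2013-03-01", "2013-03-29")
print(evaluate_locally(request, DATE_INDEX_ROWS))  # ['File2']
```

ISO-formatted date strings sort lexicographically in date order, which is why a string range key is enough for the "between" condition here.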

Page 27

Access Pattern 5

Search for names of Shared files

• Query (IndexName = SharedFlagIndex, UserId = 1, SharedFlag = Y)

SharedFlagIndex
UserId (hash)   SharedFlag (range)   FileId   Name (projected)
1               Y                    2        File2
2               Y                    3        File3

Page 28

Access Pattern 6

Query for file names by descending order of file size

• Query (IndexName = SizeIndex, UserId = 1, ScanIndexForward = false)

SizeIndex
UserId (hash)   Size (range)   FileId   Name (projected)
1               100            2        File2
1               1000           1        File1
2               2000           3        File3
2               3000           4        File4
3               400            5        File5

Page 29

Example 2: Find top score for game G1

Game-scores-table
Id (hash key)   User    Game   Score   Date
1               Bob     G1     1300    2012-12-23 18:00:00
2               Bob     G1     1450    2012-12-23 19:00:00
3               Jay     G1     1600    2012-12-24 20:00:00
4               Mary    G1     2000    2012-10-24 17:00:00
5               Ryan    G2     123     2012-03-10 15:00:00
6               Jones   G2     345     2012-03-20 15:00:00

Page 30

GameScoresIndex

Game-scores-table: Id (hash key), User, Game, Score, Date

Game-scores-index: Game (hash), Score (range), Id (table key), User (projected), Date (projected)

Page 31

Game-scores-index
Game (Hash)   Score (Range)   Id   User    Date
G1            2000            4    Mary    2012-10-24 17:00:00
G1            1600            3    Jay     2012-12-24 20:00:00
G1            1450            2    Bob     2012-12-23 19:00:00
G1            1300            1    Bob     2012-12-23 18:00:00
G2            345             6    Jones   2012-03-20 15:00:00
G2            123             5    Ryan    2012-03-10 15:00:00

Page 32

Query: Find top score for game G1
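The query itself did not survive extraction from this slide. One plausible shape, offered as an assumption rather than the presenters' code, is a Query on Game-scores-index with the hash key set to G1, results read in descending range key order, and a limit of one item. The sketch builds parameters in the shape of the low-level Query API and evaluates them against the index rows from the previous slide; `top_score_request` and `simulate` are hypothetical helpers and no request is sent.

```python
def top_score_request(game):
    """Query parameters: highest Score first, return a single item."""
    return {
        "TableName": "Game-scores-table",
        "IndexName": "Game-scores-index",
        "KeyConditionExpression": "Game = :g",
        "ExpressionAttributeValues": {":g": {"S": game}},
        "ScanIndexForward": False,  # descending order of the range key (Score)
        "Limit": 1,
    }


# Index rows from the slide: (Game, Score, Id, User).
INDEX_ROWS = [
    ("G1", 2000, 4, "Mary"),
    ("G1", 1600, 3, "Jay"),
    ("G1", 1450, 2, "Bob"),
    ("G1", 1300, 1, "Bob"),
    ("G2", 345, 6, "Jones"),
    ("G2", 123, 5, "Ryan"),
]


def simulate(request, rows):
    """Evaluate the request against in-memory rows, for illustration only."""
    game = request["ExpressionAttributeValues"][":g"]["S"]
    matches = sorted(
        (row for row in rows if row[0] == game),
        key=lambda row: row[1],
        reverse=not request["ScanIndexForward"],
    )
    return matches[: request["Limit"]]


print(simulate(top_score_request("G1"), INDEX_ROWS))  # [('G1', 2000, 4, 'Mary')]
```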

Page 33

DATA MODELING WITH GSI

Page 34

Modeling 1:1 relationships

Use a table with a Hash key or a GSI with a hash key

Example:

• Users

Hash key = UserID

• Users-email-GSI

Hash key = Email

Users Table
Hash key        Attributes
UserId = bob    Email = [email protected], JoinDate = 2011-11-15
UserId = fred   Email = [email protected], JoinDate = 2011-12-01, Sex = M

Users-email-GSI
Hash key                    Attributes
Email = [email protected]   UserId = bob, JoinDate = 2011-11-15
Email = [email protected]   UserId = fred, JoinDate = 2011-12-01, Sex = M

Page 35

Modeling 1:N relationships

Use a table with a Hash and Range key or a GSI with a hash and range key

Example:

• One (1) User can play many (N) Games

User-Games-GSI
Hash Key        Range key        Attributes
UserId = bob    GameId = Game1   HighScore = 10500, ScoreDate = 2011-10-20
UserId = fred   GameId = Game2   HighScore = 12000, ScoreDate = 2012-01-10
UserId = bob    GameId = Game3   HighScore = 20000, ScoreDate = 2012-02-12

User-Games-Table
Hash Key        Attributes
UserId = bob    GameId = Game1, HighScore = 10500, ScoreDate = 2011-10-20
UserId = fred   GameId = Game2, HighScore = 12000, ScoreDate = 2012-01-10
UserId = bob    GameId = Game3, HighScore = 20000, ScoreDate = 2012-02-12

Page 36

Modeling N:M relationships

Use GSI

• Example: 1 user plays multiple games and 1 game has multiple users

User-Games-Table
Hash Key        Range key
UserId = bob    GameId = Game1
UserId = fred   GameId = Game2
UserId = bob    GameId = Game3

Game-Users-GSI
Hash Key         Range key
GameId = Game1   UserId = bob
GameId = Game2   UserId = fred
GameId = Game3   UserId = bob

Page 37

Best Practices

Choose a GSI Hash Key with high cardinality

Employee-Table: Id (hash), Name, Sex, DOB, Address

SexDOB-GSI: Sex (Hash), DOB, Id, Name, Address

Cardinality of Sex = 2 (M/F)

Solution: Generate aliases for M/F by suffixing a known range of integers (say 1 to 100) and query for each value M_1 to M_100
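The suffixing scheme can be sketched as follows: each write picks a random suffix, and a read fans out over every suffix and merges the results. The dictionary stands in for the GSI, and the names `sharded_key`, `all_shard_keys` and `query_all` are hypothetical.

```python
import random

SHARDS = 100  # the "known range of integers" from the slide (1 to 100)


def sharded_key(sex, rng=random):
    """Hash key value to store on write, e.g. 'M_17'."""
    return f"{sex}_{rng.randint(1, SHARDS)}"


def all_shard_keys(sex):
    """Every hash key value that must be queried on read: M_1 .. M_100."""
    return [f"{sex}_{n}" for n in range(1, SHARDS + 1)]


def query_all(index, sex):
    """Fan out one query per shard and merge the results."""
    results = []
    for key in all_shard_keys(sex):
        results.extend(index.get(key, []))
    return results


# Build an in-memory stand-in for the index: sharded key -> employee Ids.
employees = [{"Id": i, "Sex": "M" if i % 2 else "F"} for i in range(1, 1001)]
index = {}
for employee in employees:
    index.setdefault(sharded_key(employee["Sex"]), []).append(employee["Id"])

print(len(index))                   # up to 200 distinct hash key values instead of 2
print(len(query_all(index, "M")))   # 500
```

The trade-off is that one logical read becomes up to 100 queries, in exchange for index writes that spread across many hash key values.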

Page 38

Best Practices

Take advantage of Sparse Indexes

Game-scores-table
Id (hash)   User    Game   Score   Date         Award
1           Bob     G1     1300    2012-12-23
2           Bob     G1     1450    2012-12-23
3           Jay     G1     1600    2012-12-24
4           Mary    G1     2000    2012-10-24   Champ
5           Ryan    G2     123     2012-03-10
6           Jones   G2     345     2012-03-20

Award-GSI
Award (hash)   Id   User   Score
Champ          4    Mary   2000
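A sparse index contains only the items that actually carry the indexed attribute. The sketch below models that with plain dictionaries, using the rows from the slide; `build_sparse_index` is a hypothetical helper, not a DynamoDB API.

```python
def build_sparse_index(items, index_key):
    """Index only the items that have the indexed attribute."""
    index = {}
    for item in items:
        if index_key in item:  # items without the attribute are skipped
            index.setdefault(item[index_key], []).append(item)
    return index


game_scores = [
    {"Id": 1, "User": "Bob", "Game": "G1", "Score": 1300},
    {"Id": 2, "User": "Bob", "Game": "G1", "Score": 1450},
    {"Id": 3, "User": "Jay", "Game": "G1", "Score": 1600},
    {"Id": 4, "User": "Mary", "Game": "G1", "Score": 2000, "Award": "Champ"},
    {"Id": 5, "User": "Ryan", "Game": "G2", "Score": 123},
    {"Id": 6, "User": "Jones", "Game": "G2", "Score": 345},
]

award_index = build_sparse_index(game_scores, "Award")
print(award_index)  # one entry for six table items
```

Because five of the six items never enter the index, it costs less to store and write to, and a query for award holders reads only the relevant entries.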

Page 39

Best Practices

Query GSI for quick item lookups

• Fewer read capacity units consumed

Mail Box-Table: ID (hash key), Timestamp (range key), Attribute1, Attribute2, Attribute3, …, LargeAttachment

Mail Box-lookup-GSI: ID (hash key), Timestamp (range key), Attribute1, Attribute2, Attribute3

Page 40

Best Practices

Provision enough throughput for GSI

• One update to the table may result in two writes to an index

If GSIs do not have enough write capacity, table writes will eventually be throttled down to what the "slowest" index can consume
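A rough sizing estimate follows from the two points above. The sketch assumes the standard definition of a write capacity unit (one write per second for an item of up to 1 KB) and defaults to the worst case of two index writes per table write; the function name and parameters are hypothetical, and real workloads should be checked against observed metrics.

```python
import math


def gsi_write_capacity(table_writes_per_sec, index_entry_bytes,
                       index_writes_per_table_write=2):
    """Estimate write capacity units to provision on one GSI.

    index_writes_per_table_write: 0, 1 or 2 depending on the update pattern;
    2 is the worst case (the indexed attribute changes value).
    """
    # One write unit covers up to 1 KB; larger entries need more units.
    units_per_write = max(1, math.ceil(index_entry_bytes / 1024))
    return math.ceil(
        table_writes_per_sec * index_writes_per_table_write * units_per_write
    )


print(gsi_write_capacity(100, 500))       # 200
print(gsi_write_capacity(100, 1500, 1))   # 200
print(gsi_write_capacity(50, 2048, 2))    # 200
```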

Page 41

Debugging Throughput Issues

ProvisionedThroughputExceededException (HTTP status code 400)

• "The level of configured provisioned throughput for one or more global secondary indexes of the table was exceeded. Consider increasing your provisioning level for the under-provisioned global secondary indexes with the UpdateTable API"

Page 42

Debugging Throughput Issues

GSI CloudWatch Metrics

• ProvisionedReadCapacityUnits vs. ConsumedReadCapacityUnits

• ProvisionedWriteCapacityUnits vs. ConsumedWriteCapacityUnits

• ReadThrottleEvents

• WriteThrottleEvents
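Comparing provisioned with consumed capacity comes down to simple arithmetic once the metric values are in hand. The sketch assumes the consumed-capacity metric is read as a Sum over the sampling period, so it is divided by the period length to get units per second. The 0.8 threshold and both function names are hypothetical choices for illustration.

```python
def utilization(consumed_sum, period_seconds, provisioned_units):
    """Average consumed units per second as a fraction of provisioned units."""
    return (consumed_sum / period_seconds) / provisioned_units


def under_provisioned(consumed_sum, period_seconds, provisioned_units,
                      throttle_events, threshold=0.8):
    """Flag an index that was throttled or is running close to its limit.

    threshold is an arbitrary illustrative cut-off, not an AWS recommendation.
    """
    return (
        throttle_events > 0
        or utilization(consumed_sum, period_seconds, provisioned_units) >= threshold
    )


# 3000 units consumed in a 60 s period against 100 provisioned units.
print(utilization(3000, 60, 100))            # 0.5
print(under_provisioned(3000, 60, 100, 0))   # False
print(under_provisioned(3000, 60, 100, 5))   # True
```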

Page 43

Questions
