
Parallel Databases
COMP3211 Advanced Databases

Dr Nicholas Gibbins - [email protected]

Overview

• The I/O bottleneck

• Parallel architectures

• Parallel query processing
  • Inter-operator parallelism
  • Intra-operator parallelism
  • Bushy parallelism

• Concurrency control

• Reliability


The I/O Bottleneck

The Memory Hierarchy, Revisited

Type       Capacity            Latency
Registers  10^1 bytes          1 cycle
L1         10^4 bytes          <5 cycles
L2         10^5 bytes          5-10 cycles
RAM        10^9-10^10 bytes    20-30 cycles (10^-8 s)
Hard Disk  10^11-10^12 bytes   10^6 cycles (10^-3 s)

The I/O Bottleneck

Access time to secondary storage (hard disks) dominates the performance of DBMSs

Two approaches to addressing this:
• Main memory databases (expensive!)
• Parallel databases (cheaper!)

Increase I/O bandwidth by spreading data across a number of disks
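
As a rough illustration (the 100 MB/s per-disk transfer rate here is an assumption for the arithmetic, not a figure from the table above): scanning a 10^11-byte (100 GB) table from a single disk at 100 MB/s takes about 1,000 seconds; striped evenly across 10 disks read in parallel, the same scan takes about 100 seconds.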

Definitions

Parallelism
• An arrangement or state that permits several operations or tasks to be performed simultaneously rather than consecutively

Parallel Databases
• have the ability to split:
  • processing of data
  • access to data
• across multiple processors, multiple disks

Why Parallel Databases?

• Hardware trends

• Reduced elapsed time for queries

• Increased transaction throughput

• Increased scalability

• Better price/performance

• Improved application availability

• Access to more data

• in short, for better performance


Parallel Architectures

Shared Memory Architecture
• Tightly coupled
• Symmetric Multiprocessor (SMP)

P = processor
M = memory (for buffer pool)

[Diagram: several processors P connected to a single global memory]

Software – Shared Memory
• Less complex database software
• Limited scalability
• Single buffer
• Single database storage

Shared Disc Architecture
• Loosely coupled
• Distributed Memory

S = switch

[Diagram: processors P, each with its own memory M, connected through a switch S to shared discs]

Software – Shared Disc
• Avoids memory bottleneck
• Same page may be in more than one buffer at once – can lead to incoherence
• Needs global locking mechanism
• Single logical database storage
• Each processor has its own database buffer

Shared Nothing Architecture
• Massively Parallel
• Loosely Coupled
• High Speed Interconnect (between processors)

[Diagram: processors P, each with its own memory M and disc, linked by an interconnect]

Software – Shared Nothing
• Each processor owns part of the data
• Each processor has its own database buffer
• One page is only in one local buffer – no buffer incoherence
• Needs distributed deadlock detection
• Needs multiphase commit protocol
• Needs to break SQL requests into multiple sub-requests

Hardware vs. Software Architecture

• It is possible to use one software strategy on a different hardware arrangement

• Also possible to simulate one hardware configuration on another
  • Virtual Shared Disk (VSD) makes an IBM SP shared nothing system look like a shared disc setup (for Oracle)

• From this point on, we deal only with shared nothing

Shared Nothing Challenges

• Partitioning the data

• Keeping the partitioned data balanced

• Splitting up queries to get the work done

• Avoiding distributed deadlock

• Concurrency control

• Dealing with node failure


Parallel Query Processing

Dividing up the Work

[Diagram: an Application sends a query to a Coordinator Process, which farms work out to several Worker Processes]

Database Software on each node

[Diagram: each node runs the full DBMS with its own worker processes (W1, W2); coordinator processes (C1, C2) run on the nodes where applications (App1, App2) connect]

Inter-Query Parallelism

Improves throughput

Different queries/transactions execute on different processors
• (largely equivalent to material in lectures on concurrency)

Intra-Query Parallelism

Improves response times (lower latency)

Intra-operator (horizontal) parallelism
• Operators decomposed into independent operator instances, which perform the same operation on different subsets of data

Inter-operator (vertical) parallelism
• Operations are overlapped
• Pipeline data from one stage to the next without materialisation

Bushy (independent) parallelism
• Subtrees in query plan executed concurrently


Intra-Operator Parallelism

Intra-Operator Parallelism

[Diagram: a single SQL query is decomposed into subset queries, one per processor]

Partitioning

Decomposition of operators relies on data being partitioned across the servers that comprise the parallel database
• Access data in parallel to mitigate the I/O bottleneck

Partitions should aim to spread I/O load evenly across servers

Choice of partitions affords different parallel query processing approaches (a small sketch follows this list):
• Range partitioning
• Hash partitioning
• Schema partitioning
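
As an illustration, a minimal sketch of how a row might be assigned to a partition under range and hash partitioning (the boundaries, key names and partition count are assumptions for the example, not from the slides; schema partitioning simply assigns whole tables to different discs):

    # Hypothetical sketch: assigning a row to a partition by range or by hash.
    RANGE_BOUNDS = [("A", "H"), ("I", "P"), ("Q", "Z")]   # assumed alphabetic ranges

    def range_partition(name: str) -> int:
        """Index of the range partition owning this name key."""
        first = name[0].upper()
        for i, (lo, hi) in enumerate(RANGE_BOUNDS):
            if lo <= first <= hi:
                return i
        raise ValueError(f"no partition for key {name!r}")

    def hash_partition(key, n_partitions: int = 4) -> int:
        """Index of the hash partition owning this key."""
        return hash(key) % n_partitions

    print(range_partition("Gibbins"))   # 0, i.e. the A-H partition
    print(hash_partition(42))           # some partition in 0..3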

Range Partitioning

[Diagram: rows are divided across three discs by key ranges A-H, I-P and Q-Z]

Hash Partitioning

[Diagram: rows of a table are distributed across discs by a hash of the key]

Schema Partitioning

[Diagram: different tables (Table 1, Table 2) are placed on different discs]

Rebalancing Data

1. Data in proper balance
2. Data grows, performance drops
3. Add new nodes and disc
4. Redistribute data to new nodes

Intra-Operator Parallelism

Example query:
SELECT c1,c2 FROM t WHERE c1>5.5

Assumptions:
• 100,000 rows
• Predicates eliminate 90% of the rows

Considerations for query plans:
• Data shipping
• Query shipping

Data Shipping

[Plan: all four partitions t1-t4 are shipped to one site, which applies σc1>5.5 and then πc1,c2]

Data Shipping

[Diagram: four workers each send their full partition of 25,000 tuples over the network to the coordinator (also a worker), which applies the predicate and produces the 10,000 (c1,c2) result tuples]

Query Shipping

[Plan: each partition t1-t4 applies σc1>5.5 and πc1,c2 locally; the four partial results are combined]

Query Shipping

[Diagram: four workers each apply the predicate locally and send only 2,500 tuples over the network; the coordinator combines them into the 10,000 (c1,c2) result tuples]

Query Shipping Benefits

• Database operations are performed where the data are, as far as possible

• Network traffic is minimised

• For basic database operators, code developed for serial implementations can be reused

• In practice, a mixture of query shipping and data shipping has to be employed
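
To make the contrast concrete, a toy sketch of query shipping, with in-memory lists standing in for the four workers' partitions (the data and names are invented; the selectivity of the random data will not match the 10% in the example above):

    import random

    # Hypothetical sketch: each "worker" evaluates the subset query locally
    # (query shipping), so only qualifying (c1, c2) pairs cross the "network".
    partitions = [[(random.uniform(0, 10), i) for i in range(25_000)] for _ in range(4)]

    def worker_subset_query(partition):
        # SELECT c1, c2 FROM t WHERE c1 > 5.5, evaluated where the data lives
        return [(c1, c2) for (c1, c2) in partition if c1 > 5.5]

    shipped = [worker_subset_query(p) for p in partitions]   # per-worker results
    result = [row for rows in shipped for row in rows]       # coordinator unions them
    print(sum(len(r) for r in shipped), "tuples crossed the network, not 100,000")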


Inter-Operator Parallelism

Inter-Operator Parallelism

Allows operators with a producer-consumer dependency to be executed concurrently
• Results produced by producer are pipelined directly to consumer
• Consumer can start before producer has produced all results
• No need to materialise intermediate relations on disk (although available buffer memory is a constraint)
• Best suited to single-pass operators
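
One way to picture this pipelining is as a chain of lazy operators, each consuming tuples as the previous one yields them. The Python generator sketch below is only an analogy (a real executor would run the stages concurrently on different processors):

    # Hypothetical sketch: scan -> select -> project as a pipeline of generators.
    # No stage materialises its full output; tuples flow through one at a time.

    def scan(table):
        for row in table:
            yield row                     # producer

    def select(rows, predicate):
        for row in rows:
            if predicate(row):
                yield row                 # consumer of scan, producer for project

    def project(rows, columns):
        for row in rows:
            yield tuple(row[c] for c in columns)

    table = [{"c1": 3.0, "c2": "x"}, {"c1": 7.2, "c2": "y"}, {"c1": 9.9, "c2": "z"}]
    pipeline = project(select(scan(table), lambda r: r["c1"] > 5.5), ["c1", "c2"])
    for t in pipeline:
        print(t)   # (7.2, 'y') then (9.9, 'z'), with no intermediate relation built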

Inter-Operator Parallelism

[Timeline diagram: a serial plan runs Scan, then Join, then Sort in sequence; with pipelining, Scan, Join and Sort overlap in time]

Intra- + Inter-Operator Parallelism

[Timeline diagram: several Scan, Join and Sort instances run in parallel (intra-operator) and the stages overlap with each other (inter-operator)]

The Volcano Architecture

Basic operators as usual:
• scan, join, sort, aggregate (sum, count, average, etc)

The Exchange operator
• Inserted between the steps of a query to:
  • Pipeline results
  • Direct streams of data to the next step(s), redistributing as necessary

Provides mechanism to support both vertical and horizontal parallelism
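
A minimal sketch of what an exchange step does, assuming a hash-based redistribution to a fixed number of consumer instances (the function and names are invented for illustration; Volcano's real operator also handles flow control and process boundaries):

    # Hypothetical sketch: an exchange step hash-partitions a producer's output
    # stream across n consumer input queues, so each consumer instance sees
    # only the tuples it is responsible for.
    from collections import deque

    def exchange(producer, n_consumers, key):
        queues = [deque() for _ in range(n_consumers)]
        for tup in producer:                       # consume the upstream stream
            queues[hash(key(tup)) % n_consumers].append(tup)
        return queues                              # one input stream per consumer

    orders = [("o1", "cust7"), ("o2", "cust3"), ("o3", "cust7")]
    queues = exchange(iter(orders), n_consumers=2, key=lambda t: t[1])
    for i, q in enumerate(queues):
        print(f"consumer {i} receives {list(q)}")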

Exchange Operators

Example query:

SELECT county, SUM(order_item)
FROM customer, order
WHERE order.customer_id = customer.customer_id
GROUP BY county
ORDER BY SUM(order_item)

Exchange Operators

[Serial plan: SCAN Customer and SCAN Order feed a HASHJOIN, whose output is GROUPed and then SORTed]

Exchange Operators

[Parallel plan: two SCANs over Customer partitions feed an EXCHANGE that redistributes tuples to three HASHJOIN instances; three SCANs over Order partitions feed a second EXCHANGE into the same HASHJOINs; a further EXCHANGE feeds two GROUP instances, and a final EXCHANGE feeds the SORT]

Bushy Parallelism

Execute subtrees concurrently

[Diagram: a query tree whose independent subtrees (a projection over R ⨝ S and a selection over T ⨝ U) are executed concurrently before the final join]


Parallel Query Processing

Some Parallel Queries

• Enquiry

• Co-located Join

• Directed Join

• Broadcast Join

• Repartitioned Join

Combine aspects of intra-operator and bushy parallelism

Orders Database

CUSTOMER (CKEY, CNAME, …, CNATION, …)
ORDER (OKEY, DATE, …, CKEY, …, SKEY, …)
SUPPLIER (SKEY, SNAME, …, SNATION, …)

Enquiry/Query without join

"How many customers live in the UK?"

1. Count matching tuples in each partition of CUSTOMER
2. Pass counts to coordinator
3. Sum counts and return

[Diagram: worker tasks run SCAN and COUNT over their CUSTOMER partitions; the coordinator SUMs the counts]
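
A toy sketch of this plan, with lists standing in for CUSTOMER partitions (the data and names are invented):

    # Hypothetical sketch: each worker counts locally, the coordinator sums.
    partitions = [
        [("c1", "UK"), ("c2", "FR")],
        [("c3", "UK"), ("c4", "UK")],
        [("c5", "DE")],
    ]

    def worker_count(partition, nation):
        # SCAN + COUNT, evaluated on the worker's own partition
        return sum(1 for (_ckey, cnation) in partition if cnation == nation)

    counts = [worker_count(p, "UK") for p in partitions]  # one count per worker
    print(sum(counts))  # coordinator's SUM -> 3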

Co-located join

“Which customers placed orders in July?”

ORDER, CUSTOMER partitioned on CKEY

Therefore, corresponding entries are on the same node

1. Join CUSTOMER and ORDER on each partition
2. Pass joined relations to coordinator
3. Take union and return

[Diagram: each worker task SCANs its CUSTOMER and ORDER partitions and JOINs them locally; the coordinator takes the UNION]

Directed join (Parallel associative join)

“Which customers placed orders in July?”

ORDER partitioned on OKEY
CUSTOMER partitioned on CKEY

1. Scan ORDER on each partition
2. Send tuples to appropriate CUSTOMER node based on ORDER.CKEY
3. Join ORDER tuples with each CUSTOMER fragment
4. Send joined relations to coordinator
5. Take union and return

[Diagram: worker task 1 SCANs an ORDER partition and routes tuples by CKEY to worker task 2, which SCANs its CUSTOMER fragment and JOINs; the coordinator takes the UNION]
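
A toy sketch of the directed join's routing step, assuming CUSTOMER is hash-partitioned on CKEY so a sender can compute the destination node directly (all names and data are invented):

    # Hypothetical sketch: ORDER tuples are "directed" to the CUSTOMER node
    # that owns their CKEY, then joined locally on each node.
    N_NODES = 2
    node_of = lambda ckey: hash(ckey) % N_NODES      # CUSTOMER's partitioning rule

    # CUSTOMER fragments, partitioned on CKEY using node_of
    customers = [("c1", "Bob"), ("c2", "Ann")]
    customer_frags = [[] for _ in range(N_NODES)]
    for ckey, name in customers:
        customer_frags[node_of(ckey)].append((ckey, name))

    order_partition = [("o1", "c1"), ("o2", "c2")]   # one ORDER partition

    # Step 2: direct each ORDER tuple to the node owning its CKEY
    inboxes = [[] for _ in range(N_NODES)]
    for okey, ckey in order_partition:
        inboxes[node_of(ckey)].append((okey, ckey))

    # Step 3: local join on each node; steps 4-5: coordinator takes the union
    result = []
    for node in range(N_NODES):
        frag = dict(customer_frags[node])
        result += [(okey, ckey, frag[ckey]) for okey, ckey in inboxes[node] if ckey in frag]
    print(result)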

Broadcast join (Parallel nested loop join)

“Which customers and suppliers are in the same country?”

SUPPLIER partitioned on SKEY
CUSTOMER partitioned on CKEY
Join on CNATION=SNATION

1. Scan SUPPLIER on each partition
2. Send tuples to all CUSTOMER nodes (broadcast)
3. Join SUPPLIER tuples with each CUSTOMER fragment
4. Send joined relations to coordinator
5. Take union and return

[Diagram: worker task 1 SCANs a SUPPLIER partition and broadcasts its tuples to every worker task 2, which SCANs its CUSTOMER fragment and JOINs; the coordinator takes the UNION]

Repartitioned join (Parallel hash join)

“Which customers and suppliers are in the same country?”

SUPPLIER partitioned on SKEY
CUSTOMER partitioned on CKEY
Join on CNATION=SNATION

1. Scan SUPPLIER, CUSTOMER
2. Repartition on *NATION and send to appropriate worker for Task 3
3. Join SUPPLIER and CUSTOMER tuples
4. Send joined relations to coordinator
5. Take union and return

[Diagram: worker tasks 1 and 2 SCAN the SUPPLIER and CUSTOMER partitions and repartition both by nation to worker task 3, which JOINs; the coordinator takes the UNION]
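
A toy sketch of the repartitioned join: both inputs are rehashed on the join attribute (nation), so matching tuples meet at the same Task 3 worker (all names and data are invented):

    # Hypothetical sketch: repartition SUPPLIER and CUSTOMER on nation,
    # then join locally at each Task 3 worker.
    N_WORKERS = 2
    worker_of = lambda nation: hash(nation) % N_WORKERS

    suppliers = [("s1", "UK"), ("s2", "FR")]                 # (SKEY, SNATION)
    customers = [("c1", "UK"), ("c2", "FR"), ("c3", "UK")]   # (CKEY, CNATION)

    sup_in = [[] for _ in range(N_WORKERS)]
    cust_in = [[] for _ in range(N_WORKERS)]
    for skey, nation in suppliers:                           # step 2: repartition
        sup_in[worker_of(nation)].append((skey, nation))
    for ckey, nation in customers:
        cust_in[worker_of(nation)].append((ckey, nation))

    result = []
    for w in range(N_WORKERS):                               # step 3: local hash join
        by_nation = {}
        for skey, nation in sup_in[w]:
            by_nation.setdefault(nation, []).append(skey)
        for ckey, nation in cust_in[w]:
            result += [(ckey, skey, nation) for skey in by_nation.get(nation, [])]
    print(result)                                            # steps 4-5: union at coordinator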


Concurrency Control

Concurrency and Parallelism

• A single transaction may update data in several different places
• Multiple transactions may be using the same (distributed) tables simultaneously
• One or several nodes could fail

• Requires concurrency control and recovery across multiple nodes for:
  • Locking and deadlock detection
  • Two-phase commit to ensure ‘all or nothing’

Locking and Deadlocks

• With Shared Nothing architecture, each node is responsible for locking its own data
• No global locking mechanism

• However:
  • T1 locks item A on Node 1 and wants item B on Node 2
  • T2 locks item B on Node 2 and wants item A on Node 1
  • Distributed Deadlock

Resolving Deadlocks

Simple approach – Timeouts

1. Timeout T2 after its wait exceeds a certain interval
   • Interval may need a random element to avoid ‘chatter’, i.e. both transactions giving up at the same time and then trying again
2. Rollback T2 to let T1 proceed
3. Restart T2, which can now complete

Resolving Deadlocks

More sophisticated approach (used by DB2)

• Each node maintains a local ‘wait-for’ graph
• A distributed deadlock detector (DDD) runs at the catalogue node for each database
• Periodically, all nodes send their graphs to the DDD
• DDD records all locks found in wait state
• A transaction becomes a candidate for termination if it is found in the same lock wait state on two successive iterations
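
A toy sketch of the global view the DDD can build: merge the local wait-for graphs and look for cycles (the graph representation and cycle check are invented for illustration; DB2's actual candidate rule is the successive-iterations test above):

    # Hypothetical sketch: merge local wait-for graphs and find transactions
    # that are part of a cycle (i.e. can "reach" themselves by waiting).
    def find_cycles(edges):
        graph = {}
        for waiter, holder in edges:
            graph.setdefault(waiter, set()).add(holder)

        def reaches(start, target):
            seen, stack = set(), [start]
            while stack:
                node = stack.pop()
                for nxt in graph.get(node, ()):
                    if nxt == target:
                        return True
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            return False

        return [t for t in graph if reaches(t, t)]

    node1 = [("T1", "T2")]   # local graph on Node 1: T1 waits for T2
    node2 = [("T2", "T1")]   # local graph on Node 2: T2 waits for T1
    print(find_cycles(node1 + node2))   # ['T1', 'T2'] - distributed deadlock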


Reliability

Reliability

We wish to preserve the ACID properties for parallelised transactions
• Isolation is taken care of by the 2PL protocol
• Isolation implies Consistency
• Durability can be taken care of node-by-node, with proper logging and recovery routines
• Atomicity is the hard part. We need to commit all parts of a transaction, or abort all parts

The two-phase commit protocol (2PC) is used to ensure that Atomicity is preserved

Two-Phase Commit (2PC)

Distinguish between:
• The global transaction
• The local transactions into which the global transaction is decomposed

The global transaction is managed by a single site, known as the coordinator

Local transactions may be executed on separate sites, known as the participants

Phase 1: Voting

• Coordinator sends a “prepare T” message to all participants
• Participants respond with either “vote-commit T” or “vote-abort T”
• Coordinator waits for participants to respond within a timeout period

Phase 2: Decision

• If all participants return “vote-commit T” (to commit), send “commit T” to all participants. Wait for acknowledgements within timeout period.
• If any participant returns “vote-abort T”, send “abort T” to all participants. Wait for acknowledgements within timeout period.
• When all acknowledgements are received, the transaction is completed.
• If a site does not acknowledge, resend the global decision until it is acknowledged.
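
A compressed sketch of both phases from the coordinator's point of view, with participants modelled as objects that vote and then apply the decision (message passing, logging and timeouts are omitted; all names are invented):

    # Hypothetical sketch of 2PC: voting phase, then decision phase.
    def two_phase_commit(participants, txn):
        # Phase 1: voting - collect "vote-commit" / "vote-abort" from everyone
        votes = [p.prepare(txn) for p in participants]

        # Phase 2: decision - commit only if every participant voted to commit
        decision = "commit" if all(v == "vote-commit" for v in votes) else "abort"
        for p in participants:
            p.decide(txn, decision)      # resent until acknowledged, in a real system
        return decision

    class Participant:
        def __init__(self, can_commit=True):
            self.can_commit = can_commit
        def prepare(self, txn):          # would write <ready T> to its log first
            return "vote-commit" if self.can_commit else "vote-abort"
        def decide(self, txn, decision): # would write <commit T>/<abort T>, then ack
            self.state = decision

    print(two_phase_commit([Participant(), Participant()], "T1"))                  # commit
    print(two_phase_commit([Participant(), Participant(can_commit=False)], "T2"))  # abort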

Normal Operation

[Message sequence between coordinator C and participant P:

Voting Phase
C → P: prepare T
P → C: vote-commit T

Decision Phase (once vote-commit T has been received from all participants)
C → P: commit T
P → C: ack]

Logging

[Same sequence as Normal Operation, annotated with log writes:
C writes <begin-commit T>, then sends prepare T
P writes <ready T>, then sends vote-commit T
Once vote-commit T is received from all participants, C writes <commit T>, then sends commit T
P writes <commit T>, then sends ack
After all acks, C writes <end T>]

Aborted Transaction

[Two variants of the sequence, when vote-abort T is received from at least one participant:

If this participant voted to commit: C writes <begin-commit T> and sends prepare T; P writes <ready T> and sends vote-commit T; C writes <abort T> and sends abort T; P writes <abort T> and sends ack; C writes <end T>

If this participant voted to abort: C writes <begin-commit T> and sends prepare T; P sends vote-abort T; C writes <abort T> and sends abort T, which P acknowledges after writing <abort T>; C writes <end T>]

State Transitions

[Commit case: coordinator moves INITIAL → WAIT (sends prepare T) → COMMIT (vote-commit T received from all participants, sends commit T); participant moves INITIAL → READY (sends vote-commit T) → COMMIT (receives commit T, acks)

Abort case: coordinator moves INITIAL → WAIT → ABORT (vote-abort T received from at least one participant, sends abort T); a participant that voted to commit moves INITIAL → READY → ABORT, while a participant that voted to abort moves INITIAL → ABORT]

Coordinator State Diagram

INITIAL → WAIT: send “prepare T”
WAIT → COMMIT: receive “vote-commit T” (from all), send “commit T”
WAIT → ABORT: receive “vote-abort T”, send “abort T”
COMMIT/ABORT: complete on receiving “ack” (from all)

Participant State Diagram

INITIAL → READY: receive “prepare T”, send “vote-commit T”
INITIAL → ABORT: receive “prepare T”, send “vote-abort T”
READY → COMMIT: receive “commit T”, send “ack”
READY → ABORT: receive “abort T”, send “ack”

Dealing with Failures

If the coordinator or a participant fails during the commit, two things happen:
• The other sites will time out while waiting for the next message from the failed site and invoke a termination protocol
• When the failed site restarts, it tries to work out the state of the commit by invoking a recovery protocol

The behaviour of the sites under these protocols depends on the state they were in when the site failed

Termination Protocol: Coordinator

Timeout in WAIT
• Coordinator is waiting for participants to vote on whether they're going to commit or abort
• A missing vote means that the coordinator cannot commit the global transaction
• Coordinator may abort the global transaction

Termination Protocol: Coordinator

Timeout in COMMIT/ABORT
• Coordinator is waiting for participants to acknowledge successful commit or abort
• Coordinator resends the global decision to participants who have not acknowledged

Termination Protocol: Participant

Timeout in INITIAL
• Participant is waiting for a “prepare T”
• May unilaterally abort the transaction after a timeout
• If “prepare T” arrives after unilateral abort, either:
  • resend the “vote-abort T” message, or
  • ignore it (coordinator then times out in WAIT)

Termination Protocol: Participant

Timeout in READY
• Participant is waiting for the instruction to commit or abort – blocked without further information
• Alternatively, use the cooperative termination protocol – contact the other participants to find one that knows the decision

Cooperative Termination Protocol

Assumes that participants are aware of each other
• Coordinator sends the list of participants with "prepare T"

If a participant P times out while waiting for the global decision, it contacts the other participants to see if they know it

The response from each participant depends on its state and any vote it has sent:
• INITIAL – hasn't yet voted, so unilaterally aborts by sending "abort T"
• READY – voted to abort, so sends "abort T"
• READY – voted to commit, but doesn't know the global decision, so sends "uncertain T"
• ABORT/COMMIT – knows the global decision, so sends "commit T" or "abort T"

If all participants return "uncertain T", then P remains blocked

Recovery Protocol: Coordinator

Failure in INITIAL
• Commit not yet begun; restart the commit procedure

Recovery Protocol: Coordinator

Failure in WAIT
• Coordinator has sent “prepare T”, but has not yet received all vote-commit/vote-abort messages from participants
• Recovery restarts the commit procedure by resending “prepare T”

Recovery Protocol: Coordinator

Failure in COMMIT/ABORT
• If the coordinator has received all “ack” messages, complete successfully
• Otherwise, invoke the termination protocol (i.e. resend the global decision)

Recovery Protocol: Participant

Failure in INITIAL
• Participant has not yet voted
• Coordinator cannot have reached a decision
• Participant should unilaterally abort by sending “vote-abort T”

(what was the coordinator doing while the participant was down?)

Recovery Protocol: Participant

Failure in READY
• Participant has voted, but doesn't know what the global decision was
• Treat as a timeout in READY (use the cooperative termination protocol)

Recovery Protocol: Participant

Failure in COMMIT/ABORT
• The “ack” message has been sent
• Participant need take no action


2PC Variants

2PC Performance

Costs associated with 2PC:
• Number of messages transmitted between coordinator and participants
• Number of times that logs are accessed

We can improve the performance of 2PC if we can reduce either of these
• Coordinator keeps state information about current transactions in memory (doesn't need to consult logs)

Two proposed approaches:
• Presumed-Abort
• Presumed-Commit

Presumed-Abort

Improves performance by letting the coordinator forget about transactions (remove them from memory) in certain circumstances

If the global decision was to abort the transaction, write <abort T> to the log and forget T
• If a participant asks the coordinator about the global decision and it isn't in memory, tell the participant that the transaction was aborted
• Coordinator doesn't need to write <end T>

If the global decision was to commit the transaction, only forget it and write <commit T> and <end T> to the log once all "ack" messages have been received from participants

Presumed-Commit

Assumes that, if no information about a transaction is in memory, it must have been committed

If the global decision is to commit, the coordinator writes <commit T> to the log, sends "commit T" and forgets the transaction

If the global decision is to abort, the coordinator writes <abort T> to the log and sends "abort T"
• It only writes <end T> and forgets T when all "ack" messages have been received


Three-Phase Commit

Three-Phase Commit

As we saw earlier, 2PC can still block in certain circumstances
• A participant times out in READY and is unable to find out the global decision

3PC is non-blocking in the event of site failure (but not network partition)

Adds an additional state between WAIT/READY and COMMIT
• PRECOMMIT – the process is ready to commit but has not yet committed

Some changes to termination and recovery protocols from 2PC

Coordinator State Diagram

INITIAL → WAIT: send “prepare T”
WAIT → PRECOMMIT: receive “vote-commit T” (from all), send “prepare-commit T”
WAIT → ABORT: receive “vote-abort T”, send “abort T”
PRECOMMIT → COMMIT: receive “ready-commit T” (from all), send “commit T”
COMMIT/ABORT: complete on receiving “ack”

Participant State Diagram

INITIAL → READY: receive “prepare T”, send “vote-commit T”
INITIAL → ABORT: receive “prepare T”, send “vote-abort T”
READY → PRECOMMIT: receive “prepare-commit T”, send “ready-commit T”
READY → ABORT: receive “abort T”, send “ack”
PRECOMMIT → COMMIT: receive “commit T”, send “ack”

3PC Termination Protocol: Coordinator

Timeout in PRECOMMIT
• Coordinator does not know if non-responding participants have moved to PRECOMMIT, but it does know that they are all in READY at least (so have all voted to commit)
• Move all participants to PRECOMMIT by sending "prepare-commit T", then send "commit T"

3PC Termination Protocol: Coordinator

Timeout in COMMIT/ABORT
• Coordinator does not know if participants have performed the commit or abort, but knows that they are in either PRECOMMIT or READY
• Participants follow their own recovery protocols

3PC Termination Protocol: Participant

Timeout in READY
• Participant has voted to commit, but does not know the global decision
• Elects a new coordinator, which proceeds according to its state:
  • WAIT – new coordinator globally aborts
  • PRECOMMIT – new coordinator globally commits
  • ABORT – all participants will also move into ABORT

3PC Recovery Protocol: Coordinator

Failure in WAIT
• Participants will have already terminated the transaction due to the termination protocol
• Coordinator needs to ask participants for the outcome

3PC Recovery Protocol: Coordinator

Failure in PRECOMMIT
• Participants will have already terminated the transaction due to the termination protocol
• Coordinator needs to ask participants for the outcome

3PC Recovery Protocol: Participant

Failure in PRECOMMIT
• Participant must ask to determine how the other participants have terminated the transaction


Parallel Utilities

Parallel Utilities

Ancillary operations can also exploit the parallel hardware
• Parallel Data Loading/Import/Export
• Parallel Index Creation
• Parallel Rebalancing
• Parallel Backup
• Parallel Recovery


Next Lecture: Distributed Databases