

8/2/2019 CC Conclude

http://slidepdf.com/reader/full/cc-conclude 1/22

Transactions, Concluded, and the Future of Data Management

Zachary G. Ives, University of Pennsylvania

CIS 550 – Database & Information Systems

December 4, 2003

Slide content courtesy of Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke 


Final Administrivia

Project demos today and tomorrow

Final exam handed out at the end of today’s class 

Finals plus project reports due by 1PM, 12/18/2003

Project reports should be ballpark 10-15 pages

Remember, quality and clarity of presentation matter!

Also, email me a brief message detailing:

Your contributions to the project

Your group members’ contributions and your assessment of “group dynamics” 

Turn in at my office, 576 Levine Hall, or to my assistant, Kathy Venit, in 308 Levine Hall


Last Time… 

We were discussing isolation levels

How to keep transactions from interfering with one another

Or at least, how to minimize this

Recall that the strongest version of isolation was serializability


Theory of Serializability

A schedule of a set of transactions is a linear ordering of their actions

e.g. for the simultaneous deposits example:

R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)

A serial schedule is one in which all the steps of each transaction occur consecutively

A serializable schedule is one which is equivalent to some serial schedule (i.e., given any initial state, the final state is the same as one produced by some serial schedule)

The example above is neither serial nor serializable
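A minimal sketch of why this interleaving is not serializable: the Python below replays it against a single in-memory balance. The starting balance of 100 and the deposits of 10 (T1) and 20 (T2) are made-up values, and `run` is a hypothetical helper, not part of any DBMS.

```python
from itertools import permutations

def run(schedule, deposits, initial=100):
    """Replay a schedule of (txn, op) steps on one balance.

    On "R" a transaction copies the balance into a private variable; on "W"
    it writes back that private copy plus its own deposit.
    """
    balance = initial
    local = {}  # value each transaction read
    for txn, op in schedule:
        if op == "R":
            local[txn] = balance
        elif op == "W":
            balance = local[txn] + deposits[txn]
    return balance

deposits = {1: 10, 2: 20}

# R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
interleaved = [(1, "R"), (2, "R"), (1, "W"), (2, "W")]

# Every serial schedule: all steps of one transaction, then all of the other's.
serial_results = {
    order: run([(t, op) for t in order for op in ("R", "W")], deposits)
    for order in permutations(deposits)
}

print("interleaved:", run(interleaved, deposits))  # interleaved: 120
print("serial:", serial_results)  # serial: {(1, 2): 130, (2, 1): 130}
```

Both serial orders end at 130, but the interleaving ends at 120: T1's deposit is lost because T2 writes back a balance computed from a stale read. One initial state that no serial schedule can reproduce is enough to show the schedule is not serializable.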


Questions of Concern

Given a schedule S, is it serializable?

How can we "restrict" transactions in progress to guarantee that only serializable schedules are produced?


Conflicting Actions

Consider a schedule S in which there are two consecutive actions Ii and Ij of transactions Ti and Tj, respectively

If Ii and Ij refer to different data items, then swapping Ii and Ij does not matter

If Ii and Ij refer to the same data item Q, then swapping Ii and Ij matters if and only if at least one of the actions is a write

e.g., in Ri(Q) Wj(Q), Ti reads the value of Q from before Tj's write, whereas in Wj(Q) Ri(Q), Ti reads the value that Tj wrote
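The conflict rule above can be written as a small predicate. This is an illustrative sketch; the `Action` tuple and the `conflicts` function are hypothetical names chosen here. It also requires the two actions to come from different transactions, since the actions of a single transaction are never reordered.

```python
from typing import NamedTuple

class Action(NamedTuple):
    txn: int   # transaction id, e.g. 1 for T1
    op: str    # "R" or "W"
    item: str  # data item, e.g. "Q"

def conflicts(a: Action, b: Action) -> bool:
    """True if swapping a and b could change what the schedule computes:
    different transactions, same item, and at least one write."""
    return a.txn != b.txn and a.item == b.item and "W" in (a.op, b.op)

print(conflicts(Action(1, "R", "Q"), Action(2, "W", "Q")))  # True
print(conflicts(Action(1, "R", "Q"), Action(2, "R", "Q")))  # False: two reads
print(conflicts(Action(1, "W", "Q"), Action(2, "W", "P")))  # False: different items
```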


Testing for Serializability

Given a schedule S, we can construct a digraph G = (V, E) called a precedence graph

V: all transactions in S

E: Ti → Tj whenever an action of Ti precedes and conflicts with an action of Tj in S

Theorem:

A schedule S is conflict serializable if and only if its precedence graph contains no cycles

Note that testing for a cycle in a digraph can be done in time O(|V|²)
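A sketch of the precedence-graph test, assuming a schedule is encoded as a list of (transaction, operation, item) tuples (an encoding chosen here for illustration). It builds the edges by comparing every pair of actions, then looks for a cycle with depth-first search, which runs in O(|V| + |E|) time and so stays within the quadratic bound stated above.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Add Ti -> Tj whenever an action of Ti precedes a conflicting action of Tj."""
    nodes = {txn for txn, _, _ in schedule}
    edges = defaultdict(set)
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges[ti].add(tj)
    return nodes, edges

def has_cycle(nodes, edges):
    """DFS with three colors: reaching a GRAY node again means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GRAY  # n is on the current DFS path
        for m in edges[n]:
            if color[m] == GRAY:
                return True
            if color[m] == WHITE and visit(m):
                return True
        color[n] = BLACK  # n and everything reachable from it are finished
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)

# The simultaneous-deposits schedule R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
s = [(1, "R", "X.bal"), (2, "R", "X.bal"), (1, "W", "X.bal"), (2, "W", "X.bal")]
nodes, edges = precedence_graph(s)
print(dict(edges))              # {1: {2}, 2: {1}}
print(has_cycle(nodes, edges))  # True: not conflict serializable
```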


An Example

[Figure: a schedule interleaving T1, T2 and T3 (actions R(X,Y,Z); R(X) W(X); R(Y); W(Y); R(Y); R(X); W(Z)) and its precedence graph over T1, T2, T3]

Cyclic: Not serializable.


Another Example

[Figure: a schedule interleaving T1, T2 and T3 (actions R(X); W(X); R(X) W(X); R(Y); W(Y); R(Y); W(Y)) and its precedence graph over T1, T2, T3]

Acyclic: serializable


Producing the Equivalent Serial Schedule

If the precedence graph for a schedule is acyclic, then an equivalent serial schedule can be found by a topological sort of the graph

For the second example, the equivalent serial schedule is:

R1(Y)W1(Y) R2(X)W2(X) R2(Y)W2(Y) R3(X)W3(X)
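The topological sort can be sketched with Kahn's algorithm, which repeatedly emits a transaction that has no remaining incoming edges. The schedule `s` below is an assumed interleaving chosen to be consistent with the serial order given here for the second example; it is not necessarily the exact schedule on the slide. The function names are hypothetical, and ties are broken by transaction number only to make the output deterministic.

```python
from collections import defaultdict, deque

def precedence_edges(schedule):
    """Ti -> Tj whenever an action of Ti precedes a conflicting action of Tj."""
    edges = defaultdict(set)
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges[ti].add(tj)
    return edges

def serial_order(schedule):
    """Kahn's algorithm. Returns the transactions in an equivalent serial
    order, or None if the precedence graph has a cycle."""
    txns = sorted({t for t, _, _ in schedule})
    edges = precedence_edges(schedule)
    indegree = {t: 0 for t in txns}
    for src in list(edges):
        for dst in edges[src]:
            indegree[dst] += 1
    ready = deque(t for t in txns if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for dst in sorted(edges[t]):
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return order if len(order) == len(txns) else None

# Assumed schedule: R2(X) W2(X) R3(X) W3(X) R1(Y) W1(Y) R2(Y) W2(Y)
s = [(2, "R", "X"), (2, "W", "X"), (3, "R", "X"), (3, "W", "X"),
     (1, "R", "Y"), (1, "W", "Y"), (2, "R", "Y"), (2, "W", "Y")]
print(serial_order(s))  # [1, 2, 3]

# The lost-update schedule has a cycle, so there is no serial order.
lost_update = [(1, "R", "X"), (2, "R", "X"), (1, "W", "X"), (2, "W", "X")]
print(serial_order(lost_update))  # None
```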


Locking and Serializability

We said that, to guarantee a serializable schedule, a transaction holds all of its locks until it terminates (a condition called strict locking)

It turns out that this discipline is crucial to guaranteeing serializability

Note that the first (bad) example could have been produced if transactions acquired and immediately released locks.


Well-Formed, Two-Phased Transactions

A transaction is well-formed if it acquires at least a shared lock on Q before reading Q or an exclusive lock on Q before writing Q, and doesn’t release the lock until the action is performed

Locks are also released by the end of the transaction

A transaction is two-phased if it never acquires a lock after unlocking one

i.e., there are two phases: a growing phase in which the transaction acquires locks, and a shrinking phase in which locks are released
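Both definitions can be checked mechanically for a single transaction. In this sketch (an illustrative encoding, not a standard one) a transaction is a list of (operation, item) pairs: S and X acquire shared and exclusive locks, U unlocks, and R and W read and write. Locks still held at the end are treated as released when the transaction ends; the function names are hypothetical.

```python
def is_well_formed(ops):
    """Every R is covered by an S or X lock on its item and every W by an
    X lock, i.e. no lock is released before the action it protects."""
    held = {}  # item -> strongest mode currently held ("S" or "X")
    for op, item in ops:
        if op in ("S", "X"):
            if held.get(item) != "X":  # an S request never downgrades an X
                held[item] = op
        elif op == "U":
            held.pop(item, None)
        elif op == "R" and item not in held:
            return False
        elif op == "W" and held.get(item) != "X":
            return False
    return True  # any remaining locks are released when the transaction ends

def is_two_phase(ops):
    """No lock is acquired after any lock has been released."""
    released = False
    for op, _ in ops:
        if op == "U":
            released = True
        elif op in ("S", "X") and released:
            return False
    return True

# Acquire a lock and release it right after each action:
eager = [("S", "bal"), ("R", "bal"), ("U", "bal"),
         ("X", "bal"), ("W", "bal"), ("U", "bal")]
# Grow, then shrink:
two_phase = [("X", "bal"), ("R", "bal"), ("W", "bal"), ("U", "bal")]

print(is_well_formed(eager), is_two_phase(eager))          # True False
print(is_well_formed(two_phase), is_two_phase(two_phase))  # True True
```

The `eager` transaction is well-formed but not two-phased, which is exactly the acquire-and-immediately-release pattern that can produce non-serializable schedules.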


Two-Phased Locking Theorem

If all transactions are well-formed and two-phase, then any schedule in which conflicting locks are never granted ensures serializability

i.e., there is a very simple scheduler!

However, if some transaction is not well-formed or two-phase, then there is some schedule in which conflicting locks are never granted but which fails to be serializable

i.e., one bad apple spoils the bunch.
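The "very simple scheduler" in the theorem only has to refuse lock requests that conflict with locks held by other transactions. Below is a minimal sketch of such a lock table; the class and method names are hypothetical, and a real lock manager would also queue waiting requests and detect deadlocks. `release_all` drops a transaction's locks only when it commits or aborts, as in strict locking.

```python
class LockTable:
    """Grants S/X locks only if they don't conflict with other holders:
    S is compatible with S; X is compatible with nothing."""

    def __init__(self):
        self.holders = {}  # item -> {txn: mode}

    def request(self, txn, item, mode):
        held = self.holders.setdefault(item, {})
        others = [m for t, m in held.items() if t != txn]
        if mode == "S":
            ok = all(m == "S" for m in others)
        else:  # "X"
            ok = not others
        if ok and held.get(txn) != "X":  # never downgrade an existing X lock
            held[txn] = mode
        return ok  # False: the caller must wait (or abort)

    def release_all(self, txn):
        """Release every lock txn holds (at commit or abort)."""
        for held in self.holders.values():
            held.pop(txn, None)

lt = LockTable()
print(lt.request(1, "X.bal", "S"))  # True:  R1(X.bal)
print(lt.request(2, "X.bal", "S"))  # True:  R2(X.bal), shared locks are compatible
print(lt.request(1, "X.bal", "X"))  # False: W1(X.bal) must wait for T2
```

Replaying the simultaneous deposits this way, neither transaction can upgrade to an exclusive lock while the other holds a shared one, so the lost update cannot occur. The price is that T1 and T2 now wait for each other (a deadlock), which a real system resolves by aborting one of them.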


Summary of Transactions

Transactions are all-or-nothing units of work, guaranteed despite concurrency or failures in the system

Theoretically, the “correct” execution of transactions is serializable (i.e., equivalent to some serial execution)

Practically, this may adversely affect throughput, which motivates isolation levels

With isolation levels, users can specify the level of “incorrectness” they are willing to tolerate 


What to Look for Down the Road

… well, no one really knows the answer to this… 

… But here are some hints, ideas, and hot directions 

Sensors and streaming data

Peer-to-peer meets databases

“The Semantic Web” 

Collaborative data sharing


Sensors and Streaming Data

No databases at all… 

… Instead we have networks of simple sensors

Madden, starting at MIT

Gehrke, Cornell

Widom, Stanford

Queries are in SQL

Data is live and “streaming”, so we compute aggregates over “windows”
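As a small illustration of a window aggregate, the sketch below keeps a running average over the most recent `size` readings of a stream, holding only the window (not the whole stream) in memory. The function name and the count-based sliding window are assumptions for illustration; windows can also be defined by time, and can tumble rather than slide.

```python
from collections import deque

def sliding_avg(readings, size):
    """After each new reading, yield the average of the last `size` readings."""
    window = deque()
    total = 0.0
    for r in readings:
        window.append(r)
        total += r
        if len(window) > size:
            total -= window.popleft()  # evict the oldest reading
        yield total / len(window)

print(list(sliding_avg([20, 22, 24, 30], size=2)))  # [20.0, 21.0, 23.0, 27.0]
```

Because `sliding_avg` is a generator, it can consume an unbounded stream of sensor readings and emit a result per arrival.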


What’s Interesting Here

We’re not talking about data on disk; we’re talking about queries over “current readings”

Sensors are generally “stupid” and may be battery-operated

A lot of challenges are networking-related: how to aggregate data before it gets sent, etc.

The next step (e.g., work initiated here @ Penn): including sensors that capture images, a very different problem!

This has many more compelling applications: security, monitoring, correlating multiple sensors, rescue operations, military logistics and coordination, etc.
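One way to aggregate data before it gets sent, sketched here under assumed names, is for each sensor to forward a small partial state instead of its raw readings: for AVG, a (sum, count) pair that its parent merges with its own. The routing tree, node names and readings in the usage lines are made up.

```python
from dataclasses import dataclass

@dataclass
class PartialAvg:
    """Partial state for AVG: what one node forwards to its parent."""
    total: float = 0.0
    count: int = 0

    def add(self, reading: float) -> "PartialAvg":
        return PartialAvg(self.total + reading, self.count + 1)

    def merge(self, other: "PartialAvg") -> "PartialAvg":
        return PartialAvg(self.total + other.total, self.count + other.count)

    def result(self) -> float:
        return self.total / self.count

def aggregate(node, children, readings):
    """Combine a node's own reading with the partial states of its subtrees.
    `children` maps node -> list of child nodes; `readings` maps node -> value."""
    state = PartialAvg().add(readings[node])
    for child in children.get(node, []):
        state = state.merge(aggregate(child, children, readings))
    return state

children = {"base": ["a", "b"], "a": ["c"]}
readings = {"base": 10.0, "a": 20.0, "b": 30.0, "c": 40.0}
print(aggregate("base", children, readings).result())  # 25.0
```

Each link in the tree carries a single (sum, count) pair no matter how many sensors lie below it, which is what saves network traffic and battery.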


Peer-to-Peer Computing

Fundamentally, our model of DBMSs tends to be centralized

Even for data integration: there’s a single mediator 

This has many implications: central administration, central coordination, etc.

What can be gained from borrowing a page from peer-to-peer systems like Napster, Kazaa, etc.?

A better architecture?

Solutions to many problems unsolved by distributed DBMSs?

Replication, object location, distributed optimization, resiliency to failure,… 

New types of applications, e.g., in integration?


P2P Work 

As a new architecture for storage and querying

PIER (Berkeley), P-Grid (EPFL), Medusa (MIT)

A better way of thinking about translating and exchanging data

Piazza (Washington), Orchestra (Penn), Hyperion (Toronto), work at Trento


The Semantic Web

In some ways, a very “pie-in-the-sky” vision 

But some real and concrete problems might be partly solvable

Goal is really very similar to data integration, where somehow we have mappings between the schemas

Currently, most people in the SW community are from the knowledge representation community and use RDF

Focus: very rich ways of describing schemas (“ontologies”) that blend querying with class definitions

“Teachers are people who teach students” 

“Tenure-track professors are teachers at universities who can get tenure”; etc.

Implicit take on the problem: if we create better languages for describing ontologies, it’s easier to mediate between schemas


Holes in the Semantic Web

What issues and concerns came up in the data integration assignment you had?

Do you think a richer schema language would help for these?

Do you think “better normalization” would help? 

Fundamentally, we need:

Languages for not only describing relationships, but transformations between formats (e.g., XML schemas)

Automatic or partly automated ways of discovering mappings and correspondences

These are all database problems, and the solution likely must come from the DB community

This is part of what P2P systems like Piazza, Hyperion try to address


My Take on the Future

We’ve evolved from a world where data management is about controlling the data

Instead, data management is about translating and transforming data using declarative languages

It should ultimately become much like TCP or SOAP: a set of standard services for “getting stuff” from one point to another, or from one form to another

It’s the plumbing that connects different applications using different formats

Orchestra project at Penn: focuses on how to build a system for supporting collaborative science

People publish and map data in different schemas

What happens if people start updating it? How do you propagate, manage, trace, reconcile changes?