CC Conclude

Transactions, Concluded, and the Future of Data Management

Zachary G. Ives

University of Pennsylvania

CIS 550 – Database & Information Systems

December 4, 2003

Slide content courtesy of Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke



2

Final Administrivia

Project demos today and tomorrow

Final exam handed out at the end of today’s class

Finals plus project reports due by 1PM, 12/18/2003

Project reports should be roughly 10-15 pages

Remember, quality and clarity of presentation matter!

Also, email me a brief message detailing:

Your contributions to the project

Your group members’ contributions and your assessment of “group dynamics”

Turn in at my office, 576 Levine Hall, or to my assistant, Kathy Venit, in 308 Levine Hall



3

Last Time…

We were discussing isolation levels

How to keep transactions from interfering with one another

Or at least, how to minimize this

Recall the strongest version of isolation was serializability



4

Theory of Serializability

A schedule of a set of transactions is a linear ordering of their actions

e.g. for the simultaneous deposits example:

R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)

A serial schedule is one in which all the steps of each transaction occur consecutively

A serializable schedule is one which is equivalent to some serial schedule (i.e. given any initial state, the final state is the same as one produced by some serial schedule)

The example above is neither serial nor serializable
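
A minimal sketch of why the interleaved deposit schedule is not serializable. The starting balance and deposit amounts are assumptions chosen for illustration, and `run_schedule` is a hypothetical helper, not part of any DBMS:

```python
def run_schedule(schedule, initial_balance, deposits):
    """Execute a list of (txn, action) steps against one shared balance.

    "R" copies the shared balance into the transaction's local variable;
    "W" writes (local value + that transaction's deposit) back.
    """
    balance = initial_balance
    local = {}
    for txn, action in schedule:
        if action == "R":
            local[txn] = balance
        elif action == "W":
            balance = local[txn] + deposits[txn]
        else:
            raise ValueError(f"unknown action {action!r}")
    return balance


# Assumed amounts: T1 deposits 50, T2 deposits 20, X.bal starts at 100.
deposits = {1: 50, 2: 20}

interleaved = [(1, "R"), (2, "R"), (1, "W"), (2, "W")]  # R1 R2 W1 W2
serial_12 = [(1, "R"), (1, "W"), (2, "R"), (2, "W")]    # T1 then T2
serial_21 = [(2, "R"), (2, "W"), (1, "R"), (1, "W")]    # T2 then T1

interleaved_result = run_schedule(interleaved, 100, deposits)
serial_results = {run_schedule(s, 100, deposits) for s in (serial_12, serial_21)}

print(interleaved_result, serial_results)
```

Both serial orders end with the same balance, while the interleaved schedule loses T1's deposit, so its final state matches no serial schedule.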



5

Questions of Concern

Given a schedule S, is it serializable?

How can we "restrict" transactions in progress to guarantee that only serializable schedules are produced?



6

Conflicting Actions

Consider a schedule S in which there are two consecutive actions Ii and Ij of transactions Ti and Tj respectively

If Ii and Ij refer to different data items, then swapping Ii and Ij does not matter

If Ii and Ij refer to the same data item Q, then swapping Ii and Ij matters if and only if at least one of the actions is a write

Ri(Q) Wj(Q) can produce a different result than Wj(Q) Ri(Q): Ti reads the value of Q before Tj's write in the first order and after it in the second
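
The swap rule above can be sketched as a small predicate. The `Action` type and its field names are assumptions introduced for illustration:

```python
from typing import NamedTuple


class Action(NamedTuple):
    txn: int    # index of the issuing transaction
    kind: str   # "R" for read, "W" for write
    item: str   # name of the data item


def conflicts(a: Action, b: Action) -> bool:
    """True if swapping two consecutive actions could change the outcome.

    Actions conflict when they come from different transactions, touch
    the same data item, and at least one of them is a write.
    """
    return a.txn != b.txn and a.item == b.item and "W" in (a.kind, b.kind)


print(conflicts(Action(1, "R", "Q"), Action(2, "W", "Q")))  # read/write on Q
print(conflicts(Action(1, "R", "Q"), Action(2, "R", "Q")))  # two reads
print(conflicts(Action(1, "W", "Q"), Action(2, "W", "P")))  # different items
```

Two reads of the same item never conflict, which is why shared locks can coexist.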



7

Testing for Serializability

Given a schedule S, we can construct a digraph G = (V, E) called a precedence graph

V : all transactions in S

E : Ti → Tj whenever an action of Ti precedes and conflicts with an action of Tj in S

Theorem:

A schedule S is conflict serializable if and only if its precedence graph contains no cycles

Note that testing for a cycle in a digraph can be done in time O(|V|^2)
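
A minimal sketch of the test, assuming a schedule is represented as a list of `(txn, kind, item)` tuples (a representation chosen here for illustration). Cycle detection uses depth-first search, which runs in O(|V| + |E|) and so stays within the O(|V|^2) bound:

```python
def precedence_graph(schedule):
    """Build {txn: set of successor txns} from (txn, kind, item) steps."""
    graph = {txn: set() for txn, _, _ in schedule}
    for i, (ti, ki, qi) in enumerate(schedule):
        for tj, kj, qj in schedule[i + 1:]:
            # Edge Ti -> Tj: an earlier action of Ti conflicts with a later one of Tj.
            if ti != tj and qi == qj and "W" in (ki, kj):
                graph[ti].add(tj)
    return graph


def has_cycle(graph):
    """Depth-first search; reaching a node still on the stack means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


# The simultaneous-deposits schedule: R1(X) R2(X) W1(X) W2(X)
deposits = [(1, "R", "X"), (2, "R", "X"), (1, "W", "X"), (2, "W", "X")]
# A serial schedule of the same two transactions
serial = [(1, "R", "X"), (1, "W", "X"), (2, "R", "X"), (2, "W", "X")]

print(precedence_graph(deposits), is_conflict_serializable(deposits))
print(precedence_graph(serial), is_conflict_serializable(serial))
```

For the deposits schedule the graph has edges in both directions between T1 and T2, so the check reports it as not conflict serializable.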



8

An Example

T1 T2 T3

R(X,Y,Z)

R(X)W(X)

R(Y)

W(Y)

R(Y)

R(X)

W(Z)

T1 T2 T3

Cyclic: Not serializable.



9

Another Example

T1 T2 T3

R(X)

W(X)

R(X)W(X)

R(Y)

W(Y)

R(Y)

W(Y)

T1 T2 T3

Acyclic: serializable



10

Producing the Equivalent Serial Schedule

If the precedence graph for a schedule is acyclic, then an equivalent serial schedule can be found by a topological sort of the graph

For the second example, the equivalent serial schedule is:

R1(Y)W1(Y) R2(X)W2(X) R2(Y)W2(Y) R3(X)W3(X)
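
A sketch of the topological-sort step using Python's standard `graphlib` module (Python 3.9+). The edge set T1 → T2 → T3 below is an assumption for illustration, chosen to agree with the serial order T1, T2, T3 given above:

```python
from graphlib import CycleError, TopologicalSorter


def serial_order(successors):
    """Return a transaction order consistent with the precedence graph.

    `successors` maps each transaction to the set of transactions it must
    precede. Returns None when the graph is cyclic (no serial order exists).
    """
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    predecessors = {v: set() for v in successors}
    for v, outs in successors.items():
        for w in outs:
            predecessors.setdefault(w, set()).add(v)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


# Assumed precedence graph: T1 -> T2 -> T3
acyclic = {"T1": {"T2"}, "T2": {"T3"}, "T3": set()}
# A cyclic graph for contrast
cyclic = {"T1": {"T2"}, "T2": {"T1"}}

print(serial_order(acyclic))
print(serial_order(cyclic))
```

When several topological orders exist, any one of them yields an equivalent serial schedule.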



11

Locking and Serializability

We said that for a serializable schedule, a transaction must hold all locks until it terminates (a condition called strict locking)

It turns out that this is crucial to guarantee serializability

Note that the first (bad) example could have been produced if transactions acquired and immediately released locks.



12

Well-Formed, Two-Phased Transactions

A transaction is well-formed if it acquires at least a shared lock on Q before reading Q or an exclusive lock on Q before writing Q, and doesn’t release the lock until the action is performed

Locks are also released by the end of the transaction

A transaction is two-phased if it never acquires a lock after unlocking one

i.e., there are two phases: a growing phase in which the transaction acquires locks, and a shrinking phase in which locks are released
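
Both definitions can be checked mechanically. The sketch below assumes a transaction is given as a list of `(op, item)` pairs with the hypothetical operation names `"slock"`, `"xlock"`, `"unlock"`, `"R"`, and `"W"`:

```python
def is_well_formed(ops):
    """Every read holds at least a shared lock, every write an exclusive
    lock, and all locks are released by the end of the transaction."""
    held = {}  # item -> "S" or "X"
    for op, item in ops:
        if op == "slock":
            held.setdefault(item, "S")   # keep an existing X lock as is
        elif op == "xlock":
            held[item] = "X"
        elif op == "unlock":
            held.pop(item, None)
        elif op == "R" and item not in held:
            return False
        elif op == "W" and held.get(item) != "X":
            return False
    return not held


def is_two_phased(ops):
    """No lock is acquired after the first unlock."""
    unlocked = False
    for op, _ in ops:
        if op == "unlock":
            unlocked = True
        elif op in ("slock", "xlock") and unlocked:
            return False
    return True


# Growing phase, then shrinking phase.
good = [("slock", "A"), ("R", "A"), ("xlock", "B"), ("W", "B"),
        ("unlock", "A"), ("unlock", "B")]
# Well-formed, but locks B after releasing A: not two-phased.
early_release = [("xlock", "A"), ("W", "A"), ("unlock", "A"),
                 ("xlock", "B"), ("W", "B"), ("unlock", "B")]
# Reads without any lock: not well-formed.
no_lock = [("R", "A")]

for txn in (good, early_release, no_lock):
    print(is_well_formed(txn), is_two_phased(txn))
```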



13

Two-Phased Locking Theorem

If all transactions are well-formed and two-phase, then any schedule in which conflicting locks are never granted is serializable

i.e., there is a very simple scheduler!

However, if some transaction is not well-formed or two-phase, then there is some schedule in which conflicting locks are never granted but which fails to be serializable

i.e., one bad apple spoils the bunch.
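
The "very simple scheduler" only has to refuse conflicting lock requests. A minimal sketch of that compatibility check for a single data item, where `can_grant` and the `holders` map are hypothetical names introduced here:

```python
def can_grant(requested, txn, holders):
    """Decide whether `txn` may take a lock in mode `requested` on one item.

    `holders` maps each transaction currently holding a lock on the item
    to its mode: "S" (shared) or "X" (exclusive).
    """
    others = {t: mode for t, mode in holders.items() if t != txn}
    if requested == "S":
        # Shared locks are compatible only with other shared locks.
        return all(mode == "S" for mode in others.values())
    if requested == "X":
        # An exclusive lock is compatible with no other holder.
        return not others
    raise ValueError(f"unknown lock mode {requested!r}")


print(can_grant("S", "T2", {"T1": "S"}))  # shared with shared
print(can_grant("X", "T2", {"T1": "S"}))  # exclusive vs. shared
print(can_grant("X", "T1", {"T1": "S"}))  # upgrade by the sole holder
```

A real lock manager would also queue blocked requests and handle deadlock; this sketch covers only the grant decision.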



14

Summary of Transactions

Transactions are all-or-nothing units of work, guaranteed despite concurrency or failures in the system

Theoretically, the “correct” execution of transactions is serializable (i.e. equivalent to some serial execution)

Practically, this may adversely affect throughput, hence isolation levels

With isolation levels, users can specify the level of “incorrectness” they are willing to tolerate



15

What to Look for Down the Road

… well, no one really knows the answer to this…

… But here are some hints, ideas, and hot directions

Sensors and streaming data

Peer-to-peer meets databases

“The Semantic Web”

Collaborative data sharing



16

Sensors and Streaming Data

No databases at all…

… Instead we have networks of simple sensors

Madden, starting at MIT

Gehrke, Cornell

Widom, Stanford

queries are in SQL

data is live and “streaming”

we compute aggregates over “windows”
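
A minimal sketch of an aggregate over a sliding window of the most recent readings. The count-based window and the name `windowed_average` are assumptions for illustration; real stream systems also support time-based windows:

```python
from collections import deque


def windowed_average(readings, window_size):
    """Yield the average of the last `window_size` readings after each arrival."""
    window = deque(maxlen=window_size)
    total = 0.0
    for value in readings:
        if len(window) == window_size:
            total -= window[0]   # the oldest reading is about to be evicted
        window.append(value)
        total += value
        yield total / len(window)


# Hypothetical sensor readings arriving one at a time.
averages = list(windowed_average([10, 20, 30, 40], window_size=2))
print(averages)
```

Because `readings` can be any iterable, the same function works over an unbounded stream without storing more than one window of data.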



17

What’s Interesting Here

We’re not talking about data on disk – we’re talking about queries over “current readings”

Sensors are generally “stupid” and may be battery-operated

A lot of challenges are networking-related: how to aggregate data before it gets sent, etc.

The next step (e.g., work initiated here @ Penn): including sensors that capture images – a very different problem!

This has many more compelling applications – security, monitoring, correlating multiple sensors, rescue operations, military logistics and coordination, etc.



18

Peer-to-Peer Computing

Fundamentally, our model of DBMSs tends to be centralized

Even for data integration: there’s a single mediator

This has many implications: central administration, central coordination, etc.

What can be gained from borrowing a page from peer-to-peer systems like Napster, Kazaa, etc.?

A better architecture?

Solutions to many problems unsolved by distributed DBMSs?

Replication, object location, distributed optimization, resiliency to failure,…

New types of applications, e.g., in integration?



19

P2P Work

As a new architecture for storage and querying

PIER (Berkeley), P-Grid (EPFL), Medusa (MIT)

A better way of thinking about translating and exchanging data

Piazza (Washington), Orchestra (Penn), Hyperion (Toronto), work at Trento



20

The Semantic Web

In some ways, a very “pie-in-the-sky” vision

But some real and concrete problems might be partly solvable

Goal is really very similar to data integration, where somehow we have mappings between the schemas

Currently, most people in the SW community are from the knowledge representation community and use RDF

Focus: very rich ways of describing schemas – “ontologies” – that blend querying with class definitions

“Teachers are people who teach students”

“Tenure-track professors are teachers at universities who can get tenure”; etc.

Implicit take on the problem: if we create better languages for describing ontologies, it’s easier to mediate between schemas



21

Holes in the Semantic Web

What issues and concerns came up in the data integration assignment you had?

Do you think a richer schema language would help for these?

Do you think “better normalization” would help?

Fundamentally, we need:

Languages for describing not only relationships, but also transformations between formats (e.g., XML schemas)

Automatic or partly automated ways of discovering mappings and correspondences

These are all database problems, and the solution likely must come from the DB community

This is part of what P2P systems like Piazza, Hyperion try to address



22

My Take on the Future

We’ve evolved from a world where data management is about controlling the data

Instead, data management is about translating and transforming data using declarative languages

It should ultimately become much like TCP or SOAP – a set of standard services for “getting stuff” from one point to another, or from one form to another

It’s the plumbing that connects different applications using different formats

Orchestra project at Penn: focuses on how to build a system for supporting collaborative science

People publish and map data in different schemas

What happens if people start updating it? How do you propagate, manage, trace, reconcile changes?
