19
An Introduction to Relational Algebra James McMurray PhD Student Department of Empirical Inference 09/12/2014

Introduction to Relational Algebra

Embed Size (px)

Citation preview

Page 1: Introduction to Relational Algebra

An Introduction to Relational Algebra

James McMurray

PhD StudentDepartment of Empirical Inference

09/12/2014

Page 2: Introduction to Relational Algebra

Outline of talk

IntroductionWhat is Relational Algebra?

Relational Algebra operatorsCore operatorsJoinsSet operatorsRename operator

ConclusionConclusion

Page 3: Introduction to Relational Algebra

What is Relational Algebra?

I Algebra for modelling and querying data stored in relationaldatabases.

I A relational database consists of a set of named relations,which contain named and typed attributes, and the datastored in tuples.

I A relation is a table.

I An attribute is a column.

I A tuple is a row.

I Relational Algebra queries are compositional, so any queryreturns a relation, which can in turn be queried.

Page 4: Introduction to Relational Algebra

Database schema

I The schema of the database is the names of the relation,names and types of attributes - the structure of the database.

I Usually, the schema is fixed in advance, and the datainstances may change with time.

I All attributes are typed - i.e. Ints, strings, etc., theimplementation depends on the specific databasemanagement system (DBMS).

I In addition to typed values, there exists the null value which isused for missing values.

I A key is an attribute which is unique for every tuple (usuallyan ID, or row ID), this is important for indexing the relationsin practice.

I Note there is a debate over where relations (tables) shouldhave singular or plural names. Ruby on Rails uses plurals,Django uses singular names.

Page 5: Introduction to Relational Algebra

Example database

DVD

movie rating

‘Edge Of Tomorrow’ 4‘Elysium’ 9‘Gattaca’ 8

A relation DVD, with the attributes movie (String) and rating (Int).

Store

name movie price

‘CheapBay’ ‘Edge Of Tomorrow’ 10‘DVD World’ ‘Elysium’ 20‘DVD World’ ‘Gattaca’ 10

‘Prime’ ‘Edge Of Tomorrow’ 20‘Prime’ ‘Elysium’ 15

A relation Store, with the attributes name (String), movie (String) and price (Int)

Page 6: Introduction to Relational Algebra

Relational Algebra operators

I The simplest query is just the relation name, which returnsthe relation (table).

I Operators act on this to filter, slice and combine the data.

I The projection operator, Π[attr ], returns the relation with onlythe chosen attributes, from the relation to which it is applied.

I The selection operator, σ[cond ], returns a relation with onlythe tuples matching the conditions provided, from the relationto which it is applied.

I Note we use ∧ for AND, and ∨ for OR.

Page 7: Introduction to Relational Algebra

Example of projection and selection

DVD

movie rating

‘Edge Of Tomorrow’ 4‘Elysium’ 9‘Gattaca’ 8

ΠratingDVD =rating

498

σrating>=5DVD =movie rating

‘Elysium’ 9‘Gattaca’ 8

Page 8: Introduction to Relational Algebra

Example of combination

DVD

Movie Rating

‘Edge Of Tomorrow’ 4‘Elysium’ 9‘Gattaca’ 8

Πmovie (σrating>=5DVD) =movie

‘Elysium’‘Gattaca’

Page 9: Introduction to Relational Algebra

Relational Algebra operators (contd.)

I The tuple cross-product, ×, returns a relation with everycombination of the tuples.

I The natural join, ./, is the cross-product combined with theselection operator to match tuples according to sharedattribute names.

I Note any relation natural joined with itself, returns the samerelation itself.

I Note the natural join just syntactic sugar for combining thecross-product and a specific selection operator.

I The Theta join, ./θ, is the cross-product combined with ageneral selection operator.

I This is also just syntactic sugar, and is the standard JOIN ONoperator in DBMSs.

Page 10: Introduction to Relational Algebra

Example database

DVD

movie rating

‘Edge Of Tomorrow’ 4‘Elysium’ 9‘Gattaca’ 8

A relation DVD, with the attributes movie (String) and rating (Int)

Store

name movie price

‘CheapBay’ ‘Edge Of Tomorrow’ 10‘DVD World’ ‘Elysium’ 20‘DVD World’ ‘Gattaca’ 10

‘Prime’ ‘Edge Of Tomorrow’ 20‘Prime’ ‘Elysium’ 15

A relation Store, with the attributes name (String), movie (String) and price (Int)

Page 11: Introduction to Relational Algebra

Example cross-product

I The cross-product returns every possible combination of thetuples.

(ΠnameStore)× (ΠMovieDVD) =

name movie

‘CheapBay’ ‘Edge Of Tomorrow’‘CheapBay’ ‘Elysium’‘CheapBay’ ‘Gattaca’

‘DVD World’ ‘Edge Of Tomorrow’‘DVD World’ ‘Elysium’‘DVD World’ ‘Gattaca’

‘Prime’ ‘Edge Of Tomorrow’‘Prime’ ‘Elysium’‘Prime’ ‘Gattaca’

Page 12: Introduction to Relational Algebra

Example join

I The natural join combines the tuples in the relations, wherethe joint attribute (movie) is equal.

Store ./ DVD =

name movie price rating

‘CheapBay’ ‘Edge Of Tomorrow’ 10 4‘DVD World’ ‘Elysium’ 20 9‘DVD World’ ‘Gattaca’ 10 8

‘Prime’ ‘Edge Of Tomorrow’ 20 4‘Prime’ ‘Elysium’ 15 9

Page 13: Introduction to Relational Algebra

Relational Algebra operators (contd.)

I The union operator, ∪, concatenates/stacks the tuples.

I The difference operator, −, subtracts the second set of tuplesfrom the first .

I i.e. A− B returns tuples in A but not in B.

I The intersection operator, ∩, returns the tuples present inboth sets.

I Note the intersection operator is just syntactic sugar, we canalways write:

A ∩ B = A− (A− B)

Page 14: Introduction to Relational Algebra

Example difference

I We want to find all stores which only sell movies for less than12 dollars.

I We can do this by finding all stores which sell any movie for12 dollars or more, and then subtracting this from the set ofall stores.

(ΠnameStore) − (Πname(σprice>=12Store)) =name

‘CheapBay’

Page 15: Introduction to Relational Algebra

Relational Algebra operators (contd.)

I The rename operator, ρ[newnames], returns the relation with theattributes assigned to the new names.

I This is not just syntactic sugar.

I Technically we need attribute names to be the same to carryout the natural join.

I More importantly, it is necessary for disambiguation inself-joins.

ρnewname ΠratingDVD =newname

498

Page 16: Introduction to Relational Algebra

Self-join example

I We want to find every distinct pair of names of movies whichhave been rated - i.e. (‘Edge Of Tomorrow’, ‘Elysium’)

I In order to do this we have to use a self-join, and the renameoperator is critical (also note the importance of the greaterthan condition):

σmovie>movie2 ((ΠmovieDVD)× (ρmovie2 ΠmovieDVD)) =

movie movie2

‘Edge Of Tomorrow’ ‘Elysium’‘Edge Of Tomorrow’ ‘Gattaca’

‘Elysium’ ‘Gattaca’

Page 17: Introduction to Relational Algebra

More complicated example

I We want to find the maximum rating of all films sold byPrime.

I The trick we use is to find the set of ratings which are everless than another rating (via a self-join), and then subtractthis from the set of all ratings of films sold by Prime (denotedby PrimeRatings).

PrimeRatings := Πrating ((σname=‘prime’Store) ./ DVD)

Result := PrimeRatings −Πrating σrating<rating2( PrimeRatings× ρrating2PrimeRatings )

=rating

9

Page 18: Introduction to Relational Algebra

Even harder example

I We want to find the names of all the stores which sell ALL ofthe films which have ratings higher than 7.

I Trick is to calculate all possible, relevant store-movie tuples,then calculate the “could have been” tuples which were notobserved, then find which stores these were present for, andsubtract this from all the stores.

I This is Relational Division.

GoodMovie := Πmovie σrating>7DVD

Result :=ΠnameStore− Πname(((ΠnameStore)× GoodMovie)

−Πname,movieStore)

=name

‘DVD World’

Page 19: Introduction to Relational Algebra

Conclusion

I Relational algebra defines an algebra for querying relationaldatabases.

I Operators may seem simple but queries can becomecomplicated.

I Many other common extensions have been omitted here, suchas the semijoin, antijoin and relational division - these are allsyntactic sugar.

I In practice we do not query relational databases with relationalalgebra directly, but with a query language based upon it.

I The most popular is SQL, the Structured Query Language.

I Though there is a Relational Algebra interpeter called RA forSQLite (used for these examples).

Questions?