Upload
ngominh
View
214
Download
0
Embed Size (px)
Citation preview
LastTime:SQLJoinExpressions� Lasttime,begandiscussingSQLjoinsyntax� OriginalSQLform:
� SELECT…FROMt1,t2,…WHEREP� AnyjoinconditionsarespecifiedinWHEREclause
� FROMclauseproducesaCartesianproductoft1,t2,…� t1× t2× …� SchemaproducedbyFROMclauseist1.*È t2.*È…
� ANSI-standardSQL:WHEREclausemayonlyrefertothecolumnsgeneratedbytheFROMclause� AliasesinSELECTclauseshouldn’tbevisible(althoughmanydatabasesmakethemvisibleinWHEREclause)
2
SQLJoinExpressions(2)� SELECT…FROMt1,t2,…WHEREP
� t1× t2× …� SchemaofFROMclauseist1.*È t2.*È…(inthatorder)
� Toavoidambiguity,columnnamesinschemaalsoincludecorrespondingtablenames,e.g.t1.a,t1.b,t2.a,t2.c,etc.� Ifcolumnnameisunambiguous,predicatecanjustusecolumnnamebyitself
� Ifcolumnnameisambiguous,predicatemustspecifybothtablenameandcolumnname
� Example:SELECT*FROMt1,t2WHEREa>5ANDc =20;� Notvalid:columnnameaisambiguous(givenaboveschema)
� Valid:SELECT*FROMt1,t2WHEREt1.a>5ANDc =20;
3
AdditionalSQLJoinSyntax� SQL-92introducedseveralnewforms:
� SELECT…FROMt1JOINt2ONt1.a=t2.a� SELECT…FROMt1JOINt2USING(a1,a2,…)� SELECT…FROMt1NATURALJOINt2� CanspecifyINNER,[LEFT|RIGHT|FULL]OUTERJOIN
� AlsoCROSSJOIN,butcannotspecifyON,USING,orNATURAL
� ONclauseisnotthatchallenging� Similartooriginalsyntax,butallowsinner/outerjoins� Schemaof“FROMt1JOINt2ON…”ist1.*È t2.*
4
AdditionalSQLJoinSyntax(2)� USINGandNATURALjoinsaremorecomplicated
� SELECT…FROMt1JOINt2USING(a1,a2,…)� SELECT…FROMt1NATURALJOINt2� Joinconditionisinferredfromthecommoncolumnnames(NATURALJOIN),orgeneratedfromtheUSINGclause
� Alsoincludesaprojecttoeliminateduplicatecolumnnames(projectispartoftheFROMclause;affectsWHEREpredicate)
� ForSELECT*FROMt1NATURALJOINt2,orSELECT*FROMt1JOINt2USING(a1,a2,…):� DenotethejoincolumnsasJC.Thesehavenotablename.
� Fornaturaljoin,JC=t1Ç t2;otherwise,JC=attrs inUSINGclause� FROMclause’sschemaisJCÈ (t1– JC)È (t2– JC)
5
AdditionalSQLJoinSyntax(3)� ForSELECT*FROMt1NATURAL[???] JOINt2:
� Schemas:t1(a,b)andt2(a,c)� FROMschema:(a,t1.b,t2.c)
� Fornaturalinnerjoin:� Projectcanuseeithert1.a ort2.a togeneratea
� Fornaturalleftouterjoin:� Projectshoulduset1.a;t2.amaybeNULLforsomerows� (Similarfornaturalrightouterjoin,exceptt2.a isused)
� Fornaturalfullouterjoin:� ProjectshoulduseCOALESCE(t1.a,t2.a),sinceeithert1.aort2.a couldbeNULL
6
AdditionalSQLJoinSyntax(4)� SELECTt1.aFROMt1NATURALJOINt2
� Schemas:t1(a,b)andt2(a,c)� FROMschema:(a,t1.b,t2.c)
� ThisqueryisnotvalidundertheANSIstandard,becausethereisnot1.aoutsidetheFROMclause� Somedatabases(e.g.MySQL)willallowthisquery
� Thisqueryisvalid:� SELECTa,t2.cFROMt1NATURALJOINt2� (Technically,canalsosay“SELECTa,c”becausecwon’tbeambiguous)
7
AdditionalSQLJoinSyntax(5)� SELECT*FROMt1NATURALJOINt2NATURALJOINt3
� Schemas:t1(a,b),t2(a,c),t3(a,d)� FROMschema:(a,t1.b,t2.c,t3.d)
� Thisquerypresentsanotherchallenge� Step1:t1NATURALJOINt2
� Joinconditionis:t1.a =t2.a� Resultschemais(a,t1.b,t2.c)
� Step2:natural-jointhisresultwitht3� Joinconditionis:a =t3.a� Problem:column-referencea isambiguous
8
AdditionalSQLJoinSyntax(6)� SELECT*FROMt1NATURALJOINt2NATURALJOINt3
� Schemas:t1(a,b),t2(a,c),t3(a,d)� FROMschema:(a,t1.b,t2.c,t3.d)
� Generateplaceholdertablenamestoavoidambiguities� Step1(revised):t1NATURALJOINt2
� Joinconditionis:t1.a =t2.a� Resultschemais#R1(a,t1.b,t2.c)
� Step2(revised):natural-jointhisresultwitht3� Joinconditionis:#R1.a =t3.a� Resultschemais#R2(a,t1.b,t2.c,t3.d)
9
MappingSQLJoinsintoPlans� Summary:translatingSQLjoinshasitsownchallenges� Primarilycenteraroundnaturaljoins,andjoinswiththeUSINGclause:� Mustgenerateanappropriateschematoeliminateduplicatecolumns
� MustuseCOALESCE()operationsonjoin-columnsusedinfullouterjoins
� Mayneedtodealwithambiguouscolumnnameswhenmorethantwotablesarenatural-joinedtogether
� (Allsurmountable;justannoying…)
10
NestedSubqueries� SQLqueriescanalsoincludenestedsubqueries� SubqueriescanappearintheSELECTclause:
� SELECTcustomer_id,(SELECTSUM(balance)FROMloanJOINborrowerbWHEREb.customer_id=c.customer_id)tot_bal
FROMcustomerc;� (Computetotalofeachcustomer’sloanbalances)
� Mustbeascalarsubquery� Mustproduceexactlyonerowandonecolumn
� Thisisalmostalwaysacorrelatedsubquery� Innerqueryreferstoanenclosingquery’svalues� Requirescorrelatedevaluationtocomputetheresults
11
NestedSubqueries(2)� SubqueriescanalsoappearintheFROMclause:
� SELECTu.username,email,max_scoreFROMusersu,
(SELECTusername,MAX(score)ASmax_scoreFROMgame_scoresGROUPBYusername)ASs
WHEREu.username=s.username;� Calledaderivedrelation
� Thetableisproducedbyasubquery,insteadofbeingreadfromafile(a.k.a.abaserelation)
� Cannotbeacorrelatedsubquery� …atleast,notwithrespecttotheimmediatelyenclosingquery� Couldstillbecorrelatedwithaqueryfurtherout,ifparentappearsinaSELECTexpression,oraWHEREpredicate,etc.
12
NestedSubqueries(3)� SubqueriescanalsoappearintheWHEREclause:
� SELECTemployee_id,last_name,first_nameFROMemployeeseWHEREe.is_manager=0ANDEXISTS(SELECT*FROMemployeesm
WHEREm.department=e.departmentANDm.is_manager=1ANDm.salary<e.salary);
� (Findnon-manageremployeeswhomakemoremoneythansomemanagerinthesamedepartment)
� Also,IN/NOTINoperators,ANY/SOME/ALLqueries,andscalarsubqueriesaswell
� Again,couldbeacorrelatedsubquery,andoftenis.L
13
NestedSubqueries(4)� Previousexample:
� SELECTemployee_id,last_name,first_nameFROMemployeeseWHEREis_manager=0ANDEXISTS(SELECT*FROMemployeesm
WHEREm.department=e.departmentANDm.is_manager=1ANDm.salary<e.salary);
� NotethatEXISTS/NOTEXISTScancompleteafteronlyonerowisgeneratedbysubquery� Don’tneedtoevaluateentiresubqueryresult…� Definitelywanttooptimizesubplantoproducefirstrowasquicklyaspossible
14
SubqueriesinFROMClause� FROMsubqueries aretheeasiesttodealwith!J
� SELECTu.username,email,max_scoreFROMusersu,
(SELECTusername,MAX(score)ASmax_scoreFROMgame_scoresGROUPBYusername)ASs
WHEREu.username=s.username;� Togenerateexecutionplanforfullquery:
� Simplygenerateexecutionplanforthederivedrelation(e.g.recursivecalltoplannerwithsubquery’sAST)
� Usethesubquery’splanasaninputintotheouterquery(asifitwereanothertableintheFROMclause)
15
SubqueriesinFROMClause(2)� Ourexample:
� SELECTu.username,email,max_scoreFROMusersu,
(SELECTusername,MAX(score)ASmax_scoreFROMgame_scoresGROUPBYusername)ASs
WHEREu.username=s.username;� Subqueryplan:
game_scores
G usernameMAX(score)
users
θ
Πu.username,u.email,s.max_score
u.username =s.username
game_scores
G usernameMAX(score)
� Fullplan:
16
FROMSubqueriesandViews� Viewswillalsocreatesubqueries intheFROMclause
� CREATEVIEWtop_scoresASSELECTusername,MAX(score)ASmax_scoreFROMgame_scoresGROUPBYusername;
� SELECTu.username,email,max_scoreFROMusersu,top_scoressWHEREu.username=s.username;
� Simplesubstitutionofview’sdefinitioncreatesanestedsubquery intheFROMclause:� SELECTu.username,email,max_scoreFROMusersu,(SELECTusername,MAX(score)ASmax_score
FROMgame_scores GROUPBYusername)sWHEREu.username =s.username;
17
FROMSubqueriesandViews(2)� Twooptionsastohowthisisdone� Option1:
� Whenviewiscreated,databasecanconstructarelationalalgebraplanfortheview,andsaveit.
� Whenaqueryreferencestheview,simplyusetheview’splanasasubplaninthereferencingquery.
� Option2:� Whenviewiscreated,databaseparsesandverifiestheSQL,butdoesn’tgeneratearelationalalgebraplan.
� Whenaqueryreferencestheview,modifythequery’sSQLtousetheview’sdefinition,thengenerateaplan.
� Secondoptionrequiresmoreworkduringplanning,butpotentiallyallowsforgreateroptimizationstobeapplied
18
SubqueriesinSELECTClause� SubqueriesintheSELECTclausemustbescalarsubqueries:
� SELECTcustomer_id,(SELECTSUM(balance)FROMloanJOINborrowerbWHEREb.customer_id =c.customer_id)tot_bal
FROMcustomerc;� Mustproduceexactlyonerowandonecolumn
� Aneasy,generallyusefulapproach:� Representscalarsubquery asspecialkindofexpression� Duringqueryplanning,generateaplanforthesubquery� Whenselect-expressionisevaluated,recursivelyinvokethequeryexecutortoevaluatethesubquery togeneratearesult
� (Reportanerrorifdoesn’tproduceexactlyonerow/column!)
19
SubqueriesinSELECTClause(2)� SubqueriesintheSELECTclausemustbescalarsubqueries:
� SELECTcustomer_id,(SELECTSUM(balance)FROMloanJOINborrowerbWHEREb.customer_id =c.customer_id)tot_bal
FROMcustomerc;� Mustproduceexactlyonerowandonecolumn
� Ifscalarsubquery iscorrelated:� Mustreevaluatethesubquery foreachrowinouterquery
� Ifscalarsubquery isn’tcorrelated:� Canevaluatesubqueryonceandcachetheresult� (Thisisanoptimization;correlatedevaluationwillalsowork,althoughitisobviouslyunnecessarilyslow.)
20
SubqueriesinSELECTClause(3)� Correlatedscalarsubqueries intheSELECTclausecanfrequentlyberestatedasadecorrelated outerjoin:� SELECTcustomer_id,
(SELECTSUM(balance)FROMloanJOINborrowerbWHEREb.customer_id =c.customer_id)tot_bal
FROMcustomerc;� Equivalentto:
� SELECTc.customer_id,tot_balFROMcustomercLEFTOUTERJOIN
(SELECTb.customer_id,SUM(balance)tot_balFROMloanJOINborrowerbGROUPBYb.customer_id)tONt.customer_id =c.customer_id);
� Usually,outerjoinischeaperthancorrelatedevaluation
21
ScalarSubqueriesinOtherClauses� Scalarsubqueries canalsoappearinotherpredicates,e.g.WHEREclauses,HAVINGclauses,ONclauses,etc.
� Thesecasesaremorelikelytobeuncorrelated,whichmeanstheycanbeevaluatedonceandthencached
� Iftheyarecorrelated,theycanalsooftenberestatedasajoininanappropriatepartoftheexecutionplan� But,itcangetsignificantlymorecomplicated…
22
SubqueriesinWHEREClause� IN/NOTINclausesandEXISTS/NOTEXISTSpredicatescanalsoappearinWHEREandHAVINGclauses
� Example:FindbankcustomerswithaccountsatanybankbranchinLosAngeles� SELECT*FROMcustomercWHEREcustomer_id IN
(SELECTcustomer_id FROMdepositorNATURALJOINaccountNATURALJOINbranchWHEREbranch_city ='LosAngeles');
� Isthisquerycorrelated?� No;innerquerydoesn’treferenceenclosingqueryvalues
23
SubqueriesinWHEREClause(2)� Again,canimplementIN/EXISTSinasimpleandgenerallyusefulway:� CreatespecialINandEXISTSexpressionoperatorsthatincludeasubquery
� Duringplanning,anexecutionplanisgeneratedforeachsubquery inanINorEXISTSexpression
� WhenINorEXISTSexpressionisevaluated,recursivelyinvoketheexecutortoevaluatesubquery andtestrequiredcondition� e.g.INscansthegeneratedresultsfortheLHSvalue� e.g.EXISTSreturnstrueifarowisgeneratedbysubquery,orfalseifnorowsaregeneratedbythesubquery
24
SubqueriesinWHEREClause(3)� IN/NOTINclausesandEXISTS/NOTEXISTSpredicatescanalsobecorrelated� EXISTS/NOTEXISTSsubqueries arealmostalwayscorrelated
� Ifsubquery isnotcorrelated,canmaterializesubqueryresultsandreusethem� …buttheymaybelarge;wemaystillendupbeingverrry slow
� Previousapproachisn’tanywherenearideal� INoperatoreffectivelyimplementsajoinoperation,butwithoutanyoptimizations
� EXISTSisabitfaster,butsubquery isfrequentlycorrelated� Wouldgreatlyprefertoevaluatesubquery usingjoins,particularlyifwecaneliminatecorrelatedevaluation!
25
Semijoin andAntijoin� TwousefulrelationalalgebraoperationsinthecontextofIN/NOTINandEXISTS/NOTEXISTSqueries
� Relationsr(R)ands(S)� Thesemijoin r s isthecollectionofallrowsinr thatcanjoinwithsomecorrespondingrowins� {tr |tr Î r Ù $ ts Î s (join(tr,ts))}� join(tr,ts) isthejoincondition
� r s equivalenttoΠR(r s),butonlywithsets oftuples� Ifr ands aremultisets,theseexpressionsarenotequivalent,sinceatupleinr thatmatchesmultipletuplesinswillbecomeduplicatedinthenaturaljoin’sresult
26
Semijoin andAntijoin (2)� Theantijoin r s isthecollectionofallrowsinr thatdon’tjoinwithsomecorrespondingrowins� {tr |tr Î r Ù ¬$ ts Î s (join(tr,ts))}
� Alsocalledanti-semijoin,sincer s isequivalenttor – r s (isthecomplementof)
� Bothsemijoin andantijoin operationsareeasytocomputewithourvariousjoinalgorithms� Canincorporateintotheta-joinimplementationseasily
� CanusetheseoperationstorestatemanyIN/NOTINandEXISTS/NOTEXISTSqueries
27
ExampleINSubquery� Findallbankcustomerswhohaveanaccountatanybankbranchinthecitytheylivein� SELECT*FROMcustomercWHEREc.customer_city IN
(SELECTb.branch_cityFROMbranchbNATURALJOINaccounta
NATURALJOINdepositordWHEREd.customer_id =c.customer_id);
� Recall:brancheshaveabranch_name andabranch_city� Innerqueryisclearlycorrelatedwithouterquery� Naïvecorrelatedevaluationwouldbevery slowL
� Jointhreetablesininnerqueryforeverybankcustomer!
28
ExampleINSubquery (2)� Examplequery:
� SELECT*FROMcustomercWHEREc.customer_city IN(SELECTb.branch_cityFROMbranchbNATURALJOINaccounta
NATURALJOINdepositordWHEREd.customer_id =c.customer_id);
� Candecorrelate byextractinginnerquery,modifyingittofindallbranchesforallcustomers,inoneshot:� SELECT*FROMbranchbNATURALJOINaccounta
NATURALJOINdepositord� Includestuplesforeachbranchthateachcustomerhasaccountsat
29
ExampleINSubquery (3)� Couldtakeourinnerqueryandjoinitagainstcustomer
� SELECTc.*FROMcustomercJOIN(SELECT*FROMbranchbNATURALJOINaccounta
NATURALJOINdepositord)AStON(t.customer_id =c.customer_id AND
c.customer_city =t.branch_city);� Problems?
� Ifacustomerhasmultipleaccountsatlocalbranches,thecustomerwillappearmultipletimesintheresult
� Cause:theoutermostjoinwillduplicatecustomerrowsforeachmatchingrowinnestedquery
� Solution:useasemijoin tojoincustomerstothesubquery
30
ExampleINSubquery (4)� Ouroriginalcorrelatedquery:
� SELECT*FROMcustomercWHEREc.customer_city IN(SELECTb.branch_cityFROMbranchbNATURALJOINaccounta
NATURALJOINdepositordWHEREd.customer_id =c.customer_id);
� Thedecorrelated query:� SELECT*FROMcustomercSEMIJOIN
(SELECT*FROMbranchbNATURALJOINaccountaNATURALJOINdepositord)ASt
ON(t.customer_id =c.customer_id ANDc.customer_city =t.branch_city);
� (Note:writingasemijoin inSQLisn’twidelysupported…)
31
ExampleNOTEXISTSSubquery� Asimplerquery:findcustomerswhohavenobankbranchesintheirhomecity� SELECT*FROMcustomercWHERENOTEXISTS(SELECT*FROMbranchb
WHEREb.branch_city =c.customer_city);� Again,thisqueryrequirescorrelatedevaluation
� Notasbadaspreviousquery,sinceNOTEXISTSonlyhastoproduceonerowfrominnerquery,notalltherows…
� Ifthere’sanindexonbranch_city,thiswon’tbehorriblyslow,butagain,weareimplementingajoinhere
� (Wehavefastequijoinalgorithms;whynotusethem?)
32
ExampleNOTEXISTSSubquery (2)� Examplequery:
� SELECT*FROMcustomercWHERENOTEXISTS(SELECT*FROMbranchb
WHEREb.branch_city =c.customer_city);� Thisqueryisveryeasytowritewithanantijoin:
� SELECT*FROMcustomercANTIJOIN branchbONbranch_city =customer_city;
� Couldalsowritewithanouterjoin:� SELECTc.*FROMcustomercLEFTJOINbranchb
ONbranch_city =customer_cityWHEREbranch_city ISNULL;
� Thisapproachwon’tcreateduplicatesofcustomers,likeourpreviousINexamplewouldhave…
33
Summary:NestedSubqueries� Onlyscratchedthesurfaceofsubquery translationandoptimization� Anincrediblyrichtopic– tonsofinterestingresearch!
� Canusebasictoolswediscussedtodaytodecorrelateandoptimizeaprettybroadrangeofsubqueries� Outerjoins,sometimesagainstgroup/aggregateresults� Semijoins andantijoins forset-membershipsubqueries
� Animportantquestion,notconsideredfornow:� Isthetranslatedversionactuallyfaster?(Orwhenmultipleoptions,whichoptionisfastest?)
� Aplanner/optimizermustmakethatdecision
34