CompSci 516: Data Intensive Computing Systems
Lecture 5, 6, 7: Storage and Indexing
Instructor: Sudeepa Roy
Duke CS, Fall 2016 — CompSci 516: Data Intensive Computing Systems
Announcements

• Homework 1 – due on September 16 (Friday), 11:55 pm
• Homework 2: AWS account setup instructions
  – carefully use the credit
  – remember to turn off instances (to avoid additional charges)
• Conflict with the CS graduate students retreat for the September 30 (Friday) class
  – Midterm moved to October 12 (Wednesday)
  – A Piazza poll will be posted today for a make-up class for those who are participating; for others, there will be a class on Sept 30
  – Please fill it out by tomorrow
Where are we now?

We learnt
✓ Relational Model and Query Languages
✓ SQL, RA, RC
✓ Postgres (DBMS) – HW1
✓ Map-reduce and Spark – HW2

Next
• DBMS Internals
  – Storage
  – Indexing
  – Query Evaluation
  – Operator Algorithms
  – External sort
  – Query Optimization
Reading Material

• [RG]
  – Storage: Chapters 8.1, 8.2, 8.4, 9.4–9.7
  – Index: 8.3, 8.5
  – Tree-based index: Chapter 10.1–10.7
  – Hash-based index: Chapter 11

Additional reading
• [GUW]
  – Chapters 8.3, 14.1–14.4
Acknowledgement: The following slides have been created by adapting the instructor material of the [RG] book provided by the authors Dr. Ramakrishnan and Dr. Gehrke.
What will we learn?

• How does a DBMS organize files?
  – Record format, page format
• What is an index?
• What are different types of indexes?
  – Tree-based indexing:
    • B+ tree
    • insert, delete
  – Hash-based indexing
    • Static and dynamic (extendible hashing, linear hashing)
• How do we use an index to optimize performance?
Storage
DBMS Architecture

• A typical DBMS has a layered architecture
• The figure does not show the concurrency control and recovery components
  – to be done in “transactions”
• This is one of several possible architectures
  – each system has its own variations

[Figure: the layers, top to bottom –
  Query Parsing, Optimization, and Execution
  Relational Operators
  Files and Access Methods
  Buffer Management
  Disk Space Management
  DB (on disk)]

These layers must consider concurrency control and recovery
Data on External Storage

• Data must persist on disk across program executions in a DBMS
  – Data is huge
  – Must persist across executions
  – But has to be fetched into main memory when the DBMS processes the data
• The unit of information for reading data from disk, or writing data to disk, is a page
• Disks: can retrieve a random page at fixed cost
  – But reading several consecutive pages is much cheaper than reading them in random order
Disk Space Management

• The lowest layer of DBMS software manages space on disk
• Higher levels call upon this layer to:
  – allocate/de-allocate a page
  – read/write a page
• Size of a page = size of a disk block = data unit
• A request for a sequence of pages is often satisfied by allocating contiguous blocks on disk
• Space on disk is managed by the Disk-Space Manager
  – Higher levels don’t need to know how this is done, or how free space is managed
Buffer Management

Suppose
• 1 million pages in the db, but only space for 1000 in memory
• A query needs to scan the entire file
• The DBMS has to
  – bring pages into main memory
  – decide which existing pages to replace to make room for a new page – called the Replacement Policy
• Managed by the Buffer Manager
  – Files and access methods ask the buffer manager to access a page, mentioning the “record id” (soon)
  – The buffer manager loads the page if it is not already there
Buffer Management

• Data must be in RAM for the DBMS to operate on it
• A table of <frame#, page id> pairs is maintained
• Buffer pool = main memory partitioned into frames; each frame either contains a page from disk or is a free frame

[Figure: page requests from higher levels go to the BUFFER POOL in main memory; disk pages move between the DB on disk and free frames; the choice of frame is dictated by the replacement policy]
When a Page is Requested...

For every frame, store
• a dirty bit:
  – whether the page has been modified since it was brought into memory
  – initially 0 (off)
• a pin-count:
  – the number of times the page has been requested but not released (i.e., the number of current users)
  – initially 0
  – when the page is requested, the count is incremented
  – when the requestor releases the page, the count is decremented
  – the buffer manager only reads a new page into a frame whose pin-count is 0
  – if there is no page with pin-count 0, the buffer manager has to wait (or a transaction is aborted – later)
When a Page is Requested...

• Check if the page is already in the buffer pool
• If yes, increment the pin-count of that frame
• If no,
  – Choose a frame for replacement using the replacement policy
  – If the chosen frame is dirty (has been modified), write it to disk
  – Read the requested page into the chosen frame
• Pin the page (increase its pin-count) and return its address to the requestor
• If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several pages at a time
• Concurrency control & recovery may entail additional I/O when a frame is chosen for replacement
  – e.g., the Write-Ahead Log protocol: when we do Transactions
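The request path above can be sketched in Python. This is an illustrative toy, not any real DBMS's API (the class and method names are invented): a page table of frames carrying pin-counts and dirty bits, with a trivial "first unpinned frame" rule standing in for a real replacement policy.

```python
class FakeDisk:
    """Stand-in for the disk space management layer (illustrative only)."""
    def __init__(self):
        self.store = {}
        self.writes = 0
    def read(self, page_id):
        return self.store.get(page_id, b"")
    def write(self, page_id, data):
        self.store[page_id] = data
        self.writes += 1

class BufferManager:
    def __init__(self, num_frames):
        self.frames = {}              # page_id -> {data, pin_count, dirty}
        self.capacity = num_frames

    def request_page(self, page_id, disk):
        if page_id in self.frames:                # already buffered: just pin
            self.frames[page_id]["pin_count"] += 1
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:     # must evict a frame
            victim = next((pid for pid, f in self.frames.items()
                           if f["pin_count"] == 0), None)
            if victim is None:
                raise RuntimeError("all frames pinned")
            vf = self.frames.pop(victim)
            if vf["dirty"]:                       # write back before reuse
                disk.write(victim, vf["data"])
        frame = {"data": disk.read(page_id), "pin_count": 1, "dirty": False}
        self.frames[page_id] = frame
        return frame

    def release_page(self, page_id, modified=False):
        frame = self.frames[page_id]
        frame["pin_count"] -= 1                   # unpin
        frame["dirty"] = frame["dirty"] or modified
```

With a 2-frame pool, requesting a third page evicts an unpinned page, and the write-back happens only if that page was released as modified.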
Buffer Replacement Policy

• A frame is chosen for replacement by a replacement policy
• Least-Recently-Used (LRU)
  – add frames with pin-count 0 to the end of a queue
  – choose from the head
• Clock – an efficient approximation of LRU
  – number the frames 1 to N (N = #frames)
  – choose the next frame with pin-count 0 as a “hand” sweeps them circularly
• First-In-First-Out (FIFO)
• Most-Recently-Used (MRU), etc.
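One common clock variant can be sketched as follows (the function name is invented for illustration): each frame carries a reference bit that is set when the page is used; the hand sweeps the frames circularly, giving set bits a "second chance" by clearing them, and evicts the first unpinned frame whose bit is already clear.

```python
def clock_choose_victim(frames, hand):
    """frames: list of dicts with 'pin_count' and 'ref_bit'.
    Returns (victim_index, new_hand); raises if every frame is pinned."""
    for _ in range(2 * len(frames)):      # at most two full sweeps needed
        f = frames[hand]
        if f["pin_count"] == 0:
            if f["ref_bit"] == 0:
                return hand, (hand + 1) % len(frames)
            f["ref_bit"] = 0              # second chance: clear and move on
        hand = (hand + 1) % len(frames)
    raise RuntimeError("all frames pinned")
```

Unlike true LRU, clock needs no queue maintenance on every page access — only a bit set — which is why it is the cheap approximation used in practice.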
Buffer Replacement Policy

• The policy can have a big impact on the number of I/Os
• Depends on the access pattern
• Sequential flooding: a nasty situation caused by LRU + repeated sequential scans
  – What happens with 10 frames and 9 pages?
  – What happens with 10 frames and 11 pages?
  – #buffer frames < #pages in file means each page request in each scan causes an I/O
  – MRU is much better in this situation (but not in all situations, of course)
DBMS vs. OS File System

• Operating systems do disk space and buffer management too: why not let the OS manage these tasks?
• The DBMS can predict its page reference patterns much more accurately
  – can optimize
  – adjust the replacement policy
  – pre-fetch pages – already in buffer + contiguous allocation
  – pin a page in the buffer pool, force a page to disk (important for implementing concurrency control & recovery for Transactions)
• Differences in OS support: portability issues
• Some limitations, e.g., files can’t span disks
Files of Records

• A page or block is OK when doing I/O, but higher levels of a DBMS operate on records, and files of records
• FILE: a collection of pages, each containing a collection of records
• Must support:
  – insert/delete/modify record
  – read a particular record (specified using record id)
  – scan all records (possibly with some conditions on the records to be retrieved)
File Organization

• File organization: method of arranging a file of records on external storage
  – One file can have multiple pages
  – The record id (rid) is sufficient to physically locate the page containing the record on disk
  – Indexes are data structures that allow us to find the record ids of records with given values in index search key fields
• NOTE: several uses of “keys” in a database
  – Primary/foreign/candidate/super keys
  – Index search keys
Alternative File Organizations

Many alternatives exist, each ideal for some situations, and not so good in others:
• Heap (random order) files: suitable when typical access is a file scan retrieving all records
• Sorted files: best if records must be retrieved in some order, or only a “range” of records is needed
• Indexes: data structures to organize records via trees or hashing
  – Like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields
  – Updates are much faster than in sorted files
Unordered (Heap) Files

• Simplest file structure: contains records in no particular order
• As the file grows and shrinks, disk pages are allocated and de-allocated
• To support record-level operations, we must:
  – keep track of the pages in a file
  – keep track of free space on pages
  – keep track of the records on a page
• There are many alternatives for keeping track of this
Heap File Implemented as a List

• The header page id and Heap file name must be stored someplace
• Each page contains 2 ‘pointers’ plus data
• Problem:
  – to insert a new record, we may need to scan several pages on the free list to find one with sufficient space

[Figure: a header page links to two doubly linked lists of data pages – pages with free space, and full pages]
Heap File Using a Page Directory

• The entry for a page can include the number of free bytes on the page
• The directory is a collection of pages
  – a linked list implementation of the directory is just one alternative
  – much smaller than a linked list of all heap file pages!

[Figure: a header page starts the DIRECTORY, whose entries point to data pages 1..N]
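The point of the directory is that an insert scans directory entries, not data pages, to find room. A minimal sketch (names illustrative):

```python
def find_page_for_insert(directory, record_size):
    """directory: list of (page_id, free_bytes) entries.
    Return the id of a page with enough free space, or None."""
    for page_id, free_bytes in directory:
        if free_bytes >= record_size:
            return page_id
    return None   # caller allocates a fresh page and adds a directory entry
```

Because each directory entry is a few bytes, thousands of entries fit on one directory page, so this scan touches far fewer pages than walking a free list of data pages.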
How do we arrange a collection of records on a page?

• Each page contains several slots – one for each record
• A record is identified by <page-id, slot-number>
• Fixed-length records
• Variable-length records
• For both, there are options for
  – Record formats (how to organize the fields within a record)
  – Page formats (how to organize the records within a page)
Page Formats: Fixed-Length Records

• Record id = <page id, slot #>
• Packed: moving records for free space management changes the rid; may not be acceptable
• Unpacked: use a bitmap – scan the bit array to find an empty slot
• Each page may also contain additional info, like the id of the next page (not shown)

[Figure: PACKED – slots 1..N stored contiguously, followed by free space, with the number of records N at the end of the page; UNPACKED, BITMAP – slots 1..M with a bitmap marking which slots are occupied, and the number of slots M]
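The unpacked/bitmap layout can be sketched in a few lines (toy code, names invented): a bit per slot records occupancy, so a delete just clears a bit and never moves records — rids of the form (page id, slot number) stay stable.

```python
def insert_into_page(bitmap, record, slots):
    """bitmap: list of 0/1 flags; slots: record storage of the same length.
    Scan the bit array for an empty slot; return its number, or None if full."""
    for i, used in enumerate(bitmap):
        if not used:
            bitmap[i] = 1
            slots[i] = record
            return i
    return None

def delete_from_page(bitmap, slot_no):
    bitmap[slot_no] = 0   # record bytes may stay; the slot becomes reusable
```

Contrast with the packed layout, where deleting slot i would shift later records down and silently change their rids.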
Page Formats: Variable-Length Records

• Need to find a page with the right amount of space
  – Too small – cannot insert
  – Too large – waste of space
• If a record is deleted, need to move the records so that all free space is contiguous
  – need the ability to move records within a page
• Can maintain a directory of slots (next slide)
  – <record-offset, record-length>
  – deletion = set record-offset to -1
• The record id rid = <page, slot-in-directory> remains unchanged
Page Formats: Variable-Length Records

• Can move records on the page without changing the rid
  – so, attractive for fixed-length records too
• Store (record-offset, record-length) in each slot
• rids are unaffected by rearranging records within a page

[Figure: page i with records addressed by rids (i, 1), (i, 2), ..., (i, N); the SLOT DIRECTORY at the end of the page holds one (offset, length) entry per slot, the number of slots N, and a pointer to the start of free space]
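The slot directory idea can be sketched as a toy class (illustrative names; a real page would pack the directory into the page's own bytes): record bytes grow from one end, (offset, length) slots from the other, and compaction only rewrites offsets, never rids.

```python
class SlottedPage:
    def __init__(self):
        self.data = b""
        self.slots = []                      # slot directory: [offset, length]

    def insert(self, record: bytes) -> int:
        self.slots.append([len(self.data), len(record)])
        self.data += record
        return len(self.slots) - 1           # slot number = rid within page

    def delete(self, slot_no):
        self.slots[slot_no] = [-1, 0]        # offset -1 marks deletion

    def get(self, slot_no) -> bytes:
        off, length = self.slots[slot_no]
        if off == -1:
            raise KeyError("deleted record")
        return self.data[off:off + length]
```

Because callers hold only (page, slot) pairs, the page is free to shuffle its bytes internally to keep free space contiguous.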
Record Formats: Fixed Length

• Each field has a fixed length
  – the same for all records
  – the number of fields is also fixed
  – fields can be stored consecutively
• Information about field types is the same for all records in a file
  – stored in the system catalogs
• Finding the i-th field does not require a scan of the record
  – given the address of the record, the address of a field can be obtained easily

[Figure: fields F1..F4 of lengths L1..L4 stored consecutively from base address B; the address of F3 is B + L1 + L2]
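The address arithmetic in the figure is just a prefix sum of field lengths — a one-line sketch:

```python
def field_address(base, lengths, i):
    """Start address of field i (0-based), given fixed field lengths.
    E.g. field 2 starts at B + L1 + L2, as in the figure above."""
    return base + sum(lengths[:i])
```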
Record Formats: Variable Length

• Cannot use fixed-length slots for records
• Two alternative formats (# fields is fixed):
  1. use delimiters: fields delimited by special symbols ($), preceded by a field count
  2. use an array of field offsets at the start of each record
• The second offers direct access to the i-th field, efficient storage of NULLs (the special don’t know value), and small directory overhead
• Modification may be costly (it may grow a field so the record no longer fits in the page)
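The second format can be sketched as follows (toy encoding, names invented): the offset array stores where each field ends, so field i is always `payload[offsets[i]:offsets[i+1]]`, and a NULL/empty field costs only a repeated offset.

```python
def encode_record(fields):
    """fields: list of bytes values. Returns (offsets, payload)."""
    offsets, payload = [0], b""
    for f in fields:
        payload += f
        offsets.append(len(payload))     # offset of the *end* of each field
    return offsets, payload

def get_field(offsets, payload, i):
    """Direct access to field i -- no scan past delimiters needed."""
    return payload[offsets[i]:offsets[i + 1]]
```

With the delimiter format, by contrast, reading field i requires scanning past i delimiter symbols.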
Lecture 5 ends here
Indexes
Lecture 6 starts here

Announcements

• Homework 1 – due TODAY: September 16 (Friday), 11:55 pm
• Conflict with the CS graduate students retreat for the September 30 (Friday) class
  – Midterm moved to October 12 (Wednesday)
  – Regular class on September 30
  – Makeup lecture on September 29 (Thursday), 4:40–5:55 pm, LSRC D309: only for students going to the CS grad retreat (the room accommodates ~10 people)
Indexes

• An index on a file speeds up selections on the search key fields for the index
  – Any subset of the fields of a relation can be the search key for an index on the relation
  – “Search key” is not the same as “key” (key = minimal set of fields that uniquely identify a tuple)
• An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k
Alternatives for Data Entry k* in Index

• In a data entry k* we can store:
  1. (Alternative 1) the actual data record with key value k, or
  2. (Alternative 2) <k, rid>
     • rid = record id of the data record with search key value k, or
  3. (Alternative 3) <k, rid-list>
     • list of record ids of data records with search key k
• The choice of alternative for data entries is orthogonal to the indexing technique used to locate data entries with a given key value k
Alternatives for Data Entries: Alternative 1

• The index structure is a file organization for data records
  – instead of a Heap file or sorted file
• How many different indexes can use Alternative 1?
  – At most one index can use Alternative 1
  – Otherwise, data records are duplicated, leading to redundant storage and potential inconsistency
• If data records are very large, the number of pages with data entries is high
  – Implies the size of auxiliary information in the index is also large

Advantages/Disadvantages?
Alternatives for Data Entries: Alternatives 2, 3

• Data entries are typically much smaller than data records
  – So, better than Alternative 1 with large data records
  – Especially if search keys are small
• Alternative 3 is more compact than Alternative 2
  – but leads to variable-size data entries, even if search keys have fixed length

Advantages/Disadvantages?
Index Classification

• Primary vs. secondary
• Clustered vs. unclustered
• Tree-based vs. hash-based
Primary vs. Secondary Index

• If the search key contains the primary key, then it is called a primary index, otherwise secondary
  – Unique index: the search key contains a candidate key
• Duplicate data entries:
  – if they have the same value of the search key field k
  – a primary/unique index never has duplicates
  – other secondary indexes can have duplicates
Clustered vs. Unclustered Index

• If the order of data records in a file is the same as, or ‘close to’, the order of data entries in an index, then clustered, otherwise unclustered
  – Alternative 1 implies clustered
  – 2, 3 are typically unclustered
    • unless the file is sorted according to the search key
  – In practice, clustered also implies Alternative 1 (since sorted files are rare)
  – A file can be clustered on at most one search key
  – The cost of retrieving data records (range queries) through an index varies greatly based on whether the index is clustered or not
Clustered vs. Unclustered Index

• Suppose that Alternative (2) is used for data entries, and that the data records are stored in a Heap file
• To build a clustered index, first sort the Heap file
  – with some free space on each page for future inserts
  – Overflow pages may be needed for inserts
  – Thus, data records are ‘close to’, but not identical to, sorted

[Figure: CLUSTERED – index entries (index file) direct search to data entries whose order matches the order of the data records (data file); UNCLUSTERED – data entries point to data records in unrelated order]
Methods for Indexing

• Tree-based
• Hash-based
• (in detail later)
System Catalogs

• For each index:
  – structure (e.g., B+ tree) and search key fields
• For each relation:
  – name, file name, file structure (e.g., Heap file)
  – attribute name and type, for each attribute
  – index name, for each index
  – integrity constraints
• For each view:
  – view name and definition
• Plus statistics, authorization, buffer pool size, etc.
• (described in [RG] 12.1)

Catalogs are themselves stored as relations!
Remember Terminology

• Index search key (key): k
  – Used to search for a record
• Data entry: k*
  – Pointed to by k – this step is what the INDEX does
  – Contains record id(s) or the record itself
• Records or data
  – Actual tuples
  – Pointed to by record ids
Tree-based Index and B+-Tree
Range Searches

• “Find all students with gpa > 3.0”
  – If the data is in a sorted file, do a binary search to find the first such student, then scan to find the others
  – The cost of binary search can be quite high
Index file format

• Simple idea: create an “index file”
  – <first-key-on-page, pointer-to-page>, sorted on keys
• Can do binary search on the (smaller) index file
  – but it may still be expensive: apply this idea repeatedly

[Figure: an index file of keys k1, k2, ..., kN over data file pages 1..N; each index entry has the form P0, K1, P1, K2, P2, ..., Km, Pm]
Indexed Sequential Access Method (ISAM)

• Leaf pages contain data entries – also some overflow pages
• The DBMS organizes the layout of the index – a static structure
• If there are a number of inserts to the same leaf, a long overflow chain can be created
  – affects the performance

[Figure: non-leaf pages above primary leaf pages, which contain the data entries; overflow pages chain off the leaves]
B+ Tree

• The most widely used index
  – a dynamic structure
• Insert/delete at log_F N cost (= height of the tree)
  – F = fanout, N = no. of leaf pages
  – the tree is maintained height-balanced
• Minimum 50% occupancy
  – Each node contains d <= m <= 2d entries
  – The root contains 1 <= m <= 2d entries
  – The parameter d is called the order of the tree
• Supports equality and range-searches efficiently

[Figure: index entries (the index file, for direct search) above the data entries (the “sequence set”)]
B+ Tree Indexes

• Leaf pages contain data entries, and are chained (prev & next), sorted by search key
• Non-leaf pages have index entries, only used to direct searches:
  – each index entry has the form P0, K1, P1, K2, P2, ..., Km, Pm
Example B+ Tree

• Search begins at the root, and key comparisons direct it to a leaf
• Search for 5*, 15*, all data entries >= 24*, ...
  – Based on the search for 15*, we know it is not in the tree!

[Figure: root with keys 13, 17, 24, 30; leaves 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]
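The root-level key comparisons can be sketched for the example tree above, flattened to a single root over a list of sorted leaves (toy code; it uses the convention that an entry equal to a separator key goes to the right child):

```python
import bisect

def bplus_search(root_keys, leaves, k):
    """Return the leaf that must contain k, if k is anywhere in the tree."""
    child = bisect.bisect_right(root_keys, k)   # key comparisons at the root
    return leaves[child]

# The example tree from the slide:
root = [13, 17, 24, 30]
leaves = [[2, 3, 5, 7], [14, 16], [19, 20, 22], [24, 27, 29], [33, 34, 38, 39]]
```

Searching for 15* lands in the leaf [14, 16]; since 15 is not there, it is nowhere in the tree — one root-to-leaf path answers both presence and absence.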
Example B+ Tree

• Find
  – 28*?
  – 29*?
  – All > 15* and < 30*

[Figure: root with key 17 (entries <= 17 go to the left subtree, entries > 17 to the right); left child with keys 5, 13; right child with keys 27, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*]

• Note how the data entries in the leaf level are sorted
B+ Trees in Practice

• Typical order: d = 100. Typical fill-factor: 67%
  – average fanout F = 133
• Typical capacities:
  – Height 4: 133^4 = 312,900,721 records
  – Height 3: 133^3 = 2,352,637 records
• Can often hold the top levels in the buffer pool:
  – Level 1 = 1 page = 8 KB
  – Level 2 = 133 pages = 1 MB
  – Level 3 = 17,689 pages = 133 MB
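The capacity figures follow directly from the fanout; a quick arithmetic check (note 133^4 is exactly 312,900,721):

```python
F = 133                         # average fanout at 67% fill of order d = 100

assert F ** 2 == 17_689         # level 3: pages that fit in ~133 MB
assert F ** 3 == 2_352_637      # records reachable at height 3
assert F ** 4 == 312_900_721    # records reachable at height 4
```

This is why a B+ tree over hundreds of millions of records is only 4 levels deep, and why caching the top 2–3 levels leaves roughly one disk I/O per lookup.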
Inserting a Data Entry into a B+ Tree

• Find the correct leaf L
• Put the data entry onto L
  – If L has enough space, done
  – Else, must split L
    • into L and a new node L2
    • Redistribute entries evenly, copy up the middle key
    • Insert an index entry pointing to L2 into the parent of L
• This can happen recursively
  – To split an index node, redistribute entries evenly, but push up the middle key
    • Contrast with leaf splits
• Splits “grow” the tree; a root split increases the height
  – Tree growth: gets wider, or one level taller at the top

See this slide later; first, see the examples on the next few slides
Inserting 8* into the Example B+ Tree

STEP 1

[Figure: the original tree – root with keys 13, 17, 24, 30; leaves 2* 3* 5* 7* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]

• 8* belongs in the leftmost leaf, which overflows and splits into 2* 3* and 5* 7* 8*
• The entry 5 is to be inserted in the parent node (note that 5 is copied up and continues to appear in the leaf)
• Copy-up: 5 appears in the leaf and in the level above
• Observe how minimum occupancy is guaranteed
Inserting 8* into the Example B+ Tree

STEP 2

• Need to split the parent: the index node now holds keys 5, 13, 17, 24, 30 – too many
• It splits into nodes 5, 13 and 24, 30, and the entry 17 is to be inserted in the parent (new root) node
  – Note that 17 is pushed up and appears only once in the index. Contrast this with a leaf split.
• Note the difference between copy-up and push-up. What is the reason for this difference?
  – All data entries must appear as leaves (for easy range search)
  – There is no such requirement for index entries (so avoid the redundancy)
Example B+ Tree After Inserting 8*

[Figure: new root with key 17; left child with keys 5, 13; right child with keys 24, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]

• Notice that the root was split, leading to an increase in height
• In this example, we could avoid the split by redistributing entries (insert 8* into the 2nd leaf node from the left and copy its new low key up instead of 13)
• However, this is usually not done in practice – it requires accessing 1–2 extra pages every time (for the siblings), and average occupancy may remain unaffected as the file grows
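The leaf-split/copy-up step of this example can be replayed with a toy two-level tree of order d = 2 (each leaf holds at most 2d = 4 entries). This is a sketch only: index-node splits (push-up) and root growth are deliberately omitted, so after the insert the parent is temporarily over-full, exactly the situation STEP 2 then resolves.

```python
import bisect

def insert_entry(parent_keys, leaves, k):
    """Insert k into a two-level B+ tree sketch, splitting the leaf on
    overflow and copying the middle key up into the parent's key list."""
    i = bisect.bisect_right(parent_keys, k)     # find the target leaf
    leaf = leaves[i]
    bisect.insort(leaf, k)
    if len(leaf) > 4:                           # overflow: split the leaf
        mid = len(leaf) // 2
        left, right = leaf[:mid], leaf[mid:]
        leaves[i:i + 1] = [left, right]
        bisect.insort(parent_keys, right[0])    # copy up: key stays in leaf
```

Running it on the slide's tree reproduces STEP 1: the leftmost leaf splits into [2, 3] and [5, 7, 8], and 5 appears both in its leaf and in the parent.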
Deleting a Data Entry from a B+ Tree

• Start at the root, find the leaf L where the entry belongs
• Remove the entry
  – If L is at least half-full, done!
  – If L has only d-1 entries,
    • Try to redistribute, borrowing from a sibling (an adjacent node with the same parent as L)
    • If redistribution fails, merge L and the sibling
• If a merge occurred, must delete the entry (pointing to L or the sibling) from the parent of L
• Merges could propagate to the root, decreasing the height
• (Recall: each non-root node contains d <= m <= 2d entries)

See this slide later; first, see the examples on the next few slides
Example Tree: Delete 19*

• We had inserted 8*
• Now delete 19*
• Easy

[Figure – before deleting 19*: root 17; left child with keys 5, 13; right child with keys 24, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 19* 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]
Example Tree: Delete 19*

[Figure – after deleting 19*: the same tree, with the leaf 20* 22* in place of 19* 20* 22*]
Example Tree: Delete 20*

[Figure – before deleting 20*: root 17; left child with keys 5, 13; right child with keys 24, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 20* 22* | 24* 27* 29* | 33* 34* 38* 39*]
Example Tree: Delete 20*

• < 2 entries in the leaf node
• Redistribute with the sibling

[Figure – after deleting 20*, step 1: the leaf 22* borrows 24* from its sibling; the leaves become 22* 24* | 27* 29*]
Example Tree: Delete 20*

• Notice how the middle key (27) is copied up

[Figure – after deleting 20*, step 2: root 17; left child with keys 5, 13; right child with keys 27, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*]
Example Tree: ... And Then Delete 24*

[Figure – before deleting 24*: root 17; left child with keys 5, 13; right child with keys 27, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*]
Example Tree: ... And Then Delete 24*

• Once again, imbalance at a leaf
• Can we borrow from the sibling(s)?
• No – d-1 and d entries (d = 2)
• Need to merge

[Figure – after deleting 24*, step 1: the leaves 22* and 27* 29*, under the index node with keys 27, 30, are about to be merged]
Example Tree: ... And Then Delete 24*

• Imbalance at the parent
• Merge again
• But need to “pull down” the root index entry
• Observe the ‘toss’ of the old index entry 27

[Figure – after deleting 24*, step 2: root 17; left child with keys 5, 13; right child with key 30 only; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 27* 29* | 33* 34* 38* 39*]
Final Example Tree

• The root entry 17 must be pulled down into the merged node: the two index nodes contribute three index keys (5, 13, 30) but five pointers to leaves, so a fourth key is needed

[Figure – final tree: root with keys 5, 13, 17, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 22* 27* 29* | 33* 34* 38* 39*]
Example of Non-leaf Re-distribution

• An intermediate tree is shown
• In contrast to the previous example, we can re-distribute an entry from the left child of the root to the right child

[Figure: root with key 22; left child with keys 5, 13, 17, 20; right child with key 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 17* 18* | 20* 21* | 22* 27* 29* | 33* 34* 38* 39*]
After Re-distribution

• Intuitively, entries are re-distributed by ‘pushing through’ the splitting entry in the parent node
  – It suffices to re-distribute the index entry with key 20; we’ve re-distributed 17 as well for illustration

[Figure: root with key 17; left child with keys 5, 13; right child with keys 20, 22, 30; leaves 2* 3* | 5* 7* 8* | 14* 16* | 17* 18* | 20* 21* | 22* 27* 29* | 33* 34* 38* 39*]
Duplicates

• First option:
  – The basic search algorithm assumes that all entries with the same key value reside on the same leaf page
  – If they do not fit, use overflow pages (like ISAM)
• Second option:
  – Several leaf pages can contain entries with a given key value
  – Search for the leftmost entry with the key value, and follow the leaf-sequence pointers
  – Needs a modification of the search algorithm
    • if k* = <k, rid>, several entries may have to be searched
    • or include the rid in k – this becomes a unique index, with no duplicates
    • if k* = <k, rid-list>, this is some solution, but if the list is long, again a single entry can span multiple pages
A Note on ‘Order’

• Order (d) – denotes minimum occupancy
  – replaced by a physical space criterion in practice (‘at least half-full’)
  – Index pages can typically hold many more entries than leaf pages
  – Variable-sized records and search keys mean different nodes will contain different numbers of entries
  – Even with fixed-length fields, multiple records with the same search key value (duplicates) can lead to variable-sized data entries (if we use Alternative (3))
Summary

• Tree-structured indexes are ideal for range-searches, and also good for equality searches
• ISAM is a static structure
  – Only leaf pages are modified; overflow pages are needed
  – Overflow chains can degrade performance unless the size of the data set and the data distribution stay constant
• The B+ tree is a dynamic structure
  – Inserts/deletes leave the tree height-balanced; log_F N cost
  – High fanout (F) means the depth is rarely more than 3 or 4
  – Almost always better than maintaining a sorted file
  – The most widely used index in database management systems because of its versatility
  – One of the most optimized components of a DBMS
Hash-based Index
Hash-Based Indexes

• Records are grouped into buckets
  – Bucket = primary page plus zero or more overflow pages
• Hashing function h:
  – h(r) = bucket in which (the data entry for) record r belongs
  – h looks at the search key fields of r
  – No need for “index entries” in this scheme
Example: Hash-based Index

[Figure: two hash-based indexes over an Employee file.
Left (Alternative 1): the Employee file itself is hashed on AGE with h1 – h1(AGE) = 00 holds (Smith, 44, 3000), (Jones, 40, 6003), (Tracy, 44, 5004); h1(AGE) = 01 holds (Ashby, 25, 3000), (Basu, 33, 4003), (Bristow, 29, 2007); h1(AGE) = 10 holds (Cass, 50, 5004), (Daniels, 22, 6003).
Right (Alternative 2): an auxiliary index – a file of <SAL, rid> pairs hashed on SAL with h2, with one bucket holding entries for salaries 3000, 3000, 5004, 5004 and another holding 4003, 2007, 6003, 6003.]
Introduction

• Hash-based indexes are best for equality selections
  – Find all records with name = “Joe”
  – Cannot support range searches
  – But useful in implementing relational operators like join (later)
• Static and dynamic hashing techniques exist
  – trade-offs similar to ISAM vs. B+ trees
Static Hashing

• Pages containing data = a collection of buckets
  – each bucket has one primary page, and possibly overflow pages
  – buckets contain data entries k*

[Figure: key → h → h(key) mod N selects one of the primary bucket pages 0..N-1; overflow pages chain off the primary pages]
Static Hashing

• # primary pages fixed
  – allocated sequentially, never de-allocated; overflow pages if needed
• h(k) mod N = bucket to which the data entry with key k belongs
  – N = # of buckets
Static Hashing

• The hash function works on the search key field of record r
  – Must distribute values over the range 0 ... N-1
  – h(key) = (a*key + b) usually works well
    • bucket = h(key) mod N
    • a and b are constants, chosen to tune h
• Advantages:
  – # buckets known – pages can be allocated sequentially
  – search needs 1 I/O (if there is no overflow page)
  – insert/delete needs 2 I/O (if there is no overflow page) (why 2?)
• Disadvantages:
  – Long overflow chains can develop if the file grows, and degrade performance
  – Or waste of space if the file shrinks
• Solutions:
  – keep pages only, say, 80% full initially
  – periodically rehash if there are overflow pages (can be expensive)
  – or use Dynamic Hashing
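A minimal sketch of static hashing with the h(key) = (a*key + b) family (the constants a, b, N below are arbitrary illustrative choices; each bucket list stands for a primary page plus its overflow chain):

```python
N, a, b = 4, 7, 3                   # illustrative constants, not tuned

def bucket_of(key):
    return (a * key + b) % N        # h(key) mod N

buckets = [[] for _ in range(N)]    # each list = primary page + overflow

def insert(key):
    buckets[bucket_of(key)].append(key)

def search(key):
    # one "I/O" if the bucket has no overflow pages
    return key in buckets[bucket_of(key)]
```

With these constants, 10, 14, and 22 all land in the same bucket — a miniature of the overflow-chain problem that motivates dynamic hashing.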
Dynamic Hashing Techniques

• Extendible Hashing
• Linear Hashing
Extendible Hashing

• Consider static hashing: a bucket (primary page) becomes full
• Why not re-organize the file by doubling the # of buckets?
  – Reading and writing all pages (double # pages) is expensive
• Idea: use a directory of pointers to buckets
  – double the # of buckets by doubling the directory, splitting just the bucket that overflowed
  – The directory is much smaller than the file, so doubling it is much cheaper
  – Only one page of data entries is split
  – No overflow page (a new bucket, not a new overflow page)
  – The trick lies in how the hash function is adjusted
Example

• The directory is an array of size 4
  – each element points to a bucket
  – # bits to represent 4 = log 4 = 2 = global depth
• To find the bucket for search key r, take the last global-depth # bits of h(r)
  – assume h(r) = r
  – if h(r) = 5 = binary 101, it is in the bucket pointed to by 01

[Figure: directory elements 00, 01, 10, 11 (global depth 2) pointing to data pages, each with local depth 2 – Bucket A: 4* 12* 32* 16*; Bucket B: 1* 5* 21*; Bucket C: 10*; Bucket D: 15* 7* 19*]
Example: Insert

• If the bucket is full, split it
  – allocate a new page
  – re-distribute

• Suppose we insert 13*
  – binary = 1101
  – bucket 01
  – it has space, so insert

[Figure: the same directory as before; Bucket B becomes 1* 5* 21* 13*]
Example: Insert

• Suppose we insert 20*
  – binary = 10100
  – bucket 00
  – already full (4* 12* 32* 16*)
• To split, consider the last three bits of 10100
  – the last two bits are the same (00) – the data entry will belong to one of these two buckets
  – the third bit distinguishes them
[Figure – after inserting 20*: the directory doubles to eight elements 000–111 (global depth 3). Bucket A now holds 32* 16*, and its new ‘split image’ A2 holds 4* 12* 20*; both have local depth 3. Bucket B (1* 5* 21* 13*), Bucket C (10*), and Bucket D (15* 7* 19*) keep local depth 2, with two directory elements pointing to each.]
Example

• Global depth: max # of bits needed to tell which bucket an entry belongs to
• Local depth: # of bits used to determine if an entry belongs to this bucket
  – also tells whether a directory doubling is needed while splitting
  – no directory doubling is needed when 9* = 1001 is inserted (LD < GD for its bucket)
When does a bucket split cause directory doubling?

• Before the insert, the local depth of the bucket = global depth
• The insert causes the local depth to become > the global depth
• The directory is doubled by copying it over and ‘fixing’ the pointers to the split image page
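The whole 20* example above can be replayed with a toy directory (a sketch with invented names, assuming h(r) = r and bucket capacity 4; overflow-after-split, skewed keys, and deletes are not handled):

```python
class ExtendibleDir:
    def __init__(self, global_depth=2, bucket_cap=4):
        self.gd = global_depth
        self.cap = bucket_cap
        # one bucket per directory slot initially; slots share bucket objects
        self.dir = [{"ld": global_depth, "keys": []}
                    for _ in range(2 ** global_depth)]

    def slot(self, key):
        return key % (2 ** self.gd)         # last gd bits of h(key) = key

    def insert(self, key):
        b = self.dir[self.slot(key)]
        if len(b["keys"]) < self.cap:
            b["keys"].append(key)
            return
        if b["ld"] == self.gd:              # full and LD == GD: double
            self.dir = self.dir + self.dir  # copy the directory over
            self.gd += 1
        b["ld"] += 1                        # split on one more bit
        img = {"ld": b["ld"], "keys": []}
        for i in range(len(self.dir)):      # fix pointers to the split image
            if self.dir[i] is b and (i >> (b["ld"] - 1)) & 1:
                self.dir[i] = img
        old = b["keys"] + [key]             # re-distribute the entries
        b["keys"] = [k for k in old if self.dir[self.slot(k)] is b]
        img["keys"] = [k for k in old if self.dir[self.slot(k)] is img]
```

Loading the slide's buckets and inserting 13* then 20* reproduces the figure: the directory doubles to global depth 3, Bucket A keeps 32* 16*, and the split image gets 4* 12* 20*.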
Comments on Extendible Hashing

• If the directory fits in memory, an equality search is answered with one disk access (to access the bucket); else two
  – A 100 MB file with 100 bytes/record and 4 KB page size contains 10^6 records (as data entries) and 25,000 directory elements; chances are high that the directory will fit in memory
  – The directory grows in spurts, and, if the distribution of hash values is skewed, the directory can grow large
  – Multiple entries with the same hash value cause problems
• Delete:
  – If removal of a data entry makes a bucket empty, it can be merged with its ‘split image’
  – If each directory element points to the same bucket as its split image, we can halve the directory
Lecture 6 ends here
Linear Hashing

• This is another dynamic hashing scheme – an alternative to Extendible Hashing
• LH handles the problem of long overflow chains
  – without using a directory
  – handles duplicates and collisions
  – very flexible w.r.t. the timing of bucket splits
Lecture 7 starts here
Linear Hashing: Basic Idea
• Use a family of hash functions h0, h1, h2, ...
– hi(key) = h(key) mod (2^i N)
– N = initial # buckets
– h is some hash function (range is not 0 to N-1)
– If N = 2^d0, for some d0, hi consists of applying h and looking at the last di bits, where di = d0 + i
• Note: hi(key) = h(key) mod (2^(d0+i))
– hi+1 doubles the range of hi
• if hi maps to M buckets, hi+1 maps to 2M buckets
• similar to directory doubling
– Suppose N = 32, d0 = 5
• h0 = h mod 32 (last 5 bits)
• h1 = h mod 64 (last 6 bits)
• h2 = h mod 128 (last 7 bits), etc.
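The N = 32, d0 = 5 example above can be checked numerically. The stand-in hash function `h` below is an arbitrary assumption; the slides only require that its range not be limited to 0..N-1:

```python
N = 32   # initial number of buckets, N = 2^d0 with d0 = 5

def h(key):
    # arbitrary stand-in hash with a wide range (not 0..N-1)
    return key * 2654435761 % (1 << 32)

def h_i(i, key):
    # h_i(key) = h(key) mod (2^i * N), i.e., the last (d0 + i) bits of h(key)
    return h(key) % ((1 << i) * N)

for key in (7, 123, 99991):
    assert h_i(0, key) == h(key) % 32    # h0: last 5 bits
    assert h_i(1, key) == h(key) % 64    # h1: last 6 bits
    assert h_i(2, key) == h(key) % 128   # h2: last 7 bits
    # h_{i+1} refines h_i: a bucket b under h_i maps to b or b + 2^i * N
    assert h_i(1, key) % 32 == h_i(0, key)
```

The last assertion is the "doubling the range" property: each h_i bucket splits into exactly two h_{i+1} buckets.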
Linear Hashing: Rounds
• Directory avoided in LH by using overflow pages, and choosing the bucket to split round-robin
• During round Level, only hLevel and hLevel+1 are in use
• The buckets from start to last are split sequentially
– this doubles the no. of buckets
• Therefore, at any point in a round, we have
– buckets that have been split
– buckets that are yet to be split
– buckets created by splits in this round
Overview of LH File
• In the middle of a round Level
– originally buckets 0 to NLevel
[Figure (on blackboard): buckets 0 to NLevel that existed at the beginning of this round – this is the range of hLevel; Next = bucket to be split next; if hLevel(r) is in the range Next to NLevel, no need to go further; if hLevel(r) is in the range 0 to Next-1 (buckets split in this round), must use hLevel+1(r) to decide whether the entry is in the `split image' bucket; `split image' buckets are created (through splitting of other buckets) in this round]
• Buckets 0 to Next-1 have been split
• Next to NLevel are yet to be split
• Round ends when all NR initial (for round R) buckets are split
• Search: To find the bucket for data entry r, find hLevel(r):
– If hLevel(r) is in the range `Next to NLevel', r belongs here
– Else, r could belong to bucket hLevel(r) or bucket hLevel(r) + NLevel
• apply hLevel+1(r) to find out
Linear Hashing: Insert
• Insert: Find bucket by applying hLevel / hLevel+1:
– If the bucket to insert into is full:
1. Add an overflow page and insert the data entry
2. Split the Next bucket and increment Next
• Note: We are going to assume that a split is `triggered' whenever an insert causes the creation of an overflow page, but in general, we could impose additional conditions for better space utilization ([RG], p. 380)
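Putting the search and insert rules together, here is a hedged Python sketch of a linear-hashed file. Primary pages and overflow chains are flattened into plain lists, and a split is triggered whenever an insert overflows a bucket, as assumed above; all names are my own, not the [RG] code:

```python
class LinearHash:
    """Toy linear-hashed file with round-robin splitting."""

    def __init__(self, n0=4, capacity=4):
        self.n0 = n0              # initial # buckets, a power of 2 (N0)
        self.capacity = capacity  # data entries per primary page
        self.level = 0            # current round
        self.next = 0             # next bucket to split, round-robin
        self.buckets = [[] for _ in range(n0)]

    def _h(self, i, key):
        # h_i(key) = h(key) mod (2^i * N0)
        return (hash(key) & 0x7FFFFFFF) % ((1 << i) * self.n0)

    def _bucket_of(self, key):
        b = self._h(self.level, key)
        if b < self.next:                    # already split this round:
            b = self._h(self.level + 1, key) # h_{Level+1} decides
        return b

    def lookup(self, key):
        return key in self.buckets[self._bucket_of(key)]

    def insert(self, key):
        b = self._bucket_of(key)
        overflow = len(self.buckets[b]) >= self.capacity
        self.buckets[b].append(key)   # list tail stands in for overflow pages
        if overflow:
            self._split()             # overflow triggers a split at Next

    def _split(self):
        # split bucket `next` (which need not be the bucket that overflowed)
        n_level = (1 << self.level) * self.n0
        old = self.buckets[self.next]
        self.buckets[self.next] = []
        self.buckets.append([])       # split image: bucket next + n_level
        for k in old:
            self.buckets[self._h(self.level + 1, k)].append(k)
        self.next += 1
        if self.next == n_level:      # round over: start a new one
            self.level += 1
            self.next = 0
```

Note that the bucket that overflows and the bucket that gets split are usually different; that is exactly the point of round-robin splitting.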
Example of Linear Hashing
Level = 0, N0 = 4 = 2^d0, d0 = 2, Next = 0
[Figure: h1/h0 bucket labels (000/00, 001/01, 010/10, 011/11) and primary pages – the actual contents of the linear hashed file: 32* 44* 36* | 9* 25* 5* | 14* 18* 10* 30* | 31* 35* 7* 11*; a data entry r with h(r) = 5 goes in the primary bucket page for 01. The h1 labels are for illustration only!]
• Insert 43* = 101011
• h0(43) = 11: bucket full
• Insert into an overflow page
• Need a split at Next (= 0)
• Entries in 00 are distributed to 000 and 100
Example of Linear Hashing
[Figure: before and after inserting 43* = 101011; Level = 0, N0 = 4 = 2^d0, d0 = 2 in both states. After the split, Next = 1; bucket 000 holds 32*, the split image 100 holds 44* 36*, and bucket 11 (31* 35* 7* 11*) gets an overflow page containing 43*]
• Next is incremented after the split
• Note the difference between the overflow page of 11 and the split image of 00 (000 and 100)
Example of Linear Hashing
[Figure: same state – Level = 0, N0 = 4 = 2^d0, d0 = 2, Next = 1, overflow page 43* on bucket 11]
• Search for 18* = 10010
– 10 is between Next (= 1) and 4: this bucket has not been split
– 18 should be here
• Search for 32* = 100000 or 44* = 101100
– between 0 and Next - 1
– need h1
• Not all insertions trigger a split
– insert 37* = 100101: has space
• Splitting at Next?
– no overflow bucket needed
– just copy at the image/original
• Next = NLevel - 1 and a split?
– start a new round
– increment Level
– Next reset to 0
Example of Linear Hashing
[Figure: before and after inserting 37* = 100101; Level = 0, N0 = 4 = 2^d0, d0 = 2, Next = 1; 37* goes into bucket 01 (25* 9* 5* 37*), which has space; the overflow page 43* on bucket 11 is unchanged]
• Not all insertions trigger a split
• Insert 37* = 100101
– has space
Example of Linear Hashing
[Figure: insert 29* = 11101 into full bucket 01 at Next = 1; the bucket is split into 001 (25* 9*) and its image 101 (5* 37* 29*) with no overflow page needed; Next becomes 2; Level = 0, N0 = 4 = 2^d0, d0 = 2]
• Splitting at Next?
– no overflow bucket needed
– just copy at the image/original
Example: End of a Round
[Figure: two states. Left – Level = 0, N0 = 4 = 2^d0, d0 = 2, Next = 3 (after inserting 22*, 66*, 34* – check yourself): buckets 000 (32*), 001 (9* 25*), 010 (66* 18* 10* 34*), 011 (31* 35* 7* 11* with overflow 43*), 100 (44* 36*), 101 (5* 37* 29*), 110 (14* 30* 22*). Right – insert 50* = 110010: bucket 010 overflows (50* goes to its overflow page), which splits the last bucket 11 and ends the round; Level = 1, N1 = 8 = 2^d1, d1 = 3, Next = 0; bucket 011 now holds 35* 11* with overflow 43*, and the new bucket 111 holds 31* 7*]
LH vs. EH
• They are very similar
– going from hi to hi+1 is like doubling the directory
– LH: avoids the explicit directory, clever choice of split
– EH: always splits – higher bucket occupancy
• Uniform distribution: LH has lower average cost
– no directory level
• Skewed distribution
– many empty/nearly empty buckets in LH
– EH may be better
Summary
• Hash-based indexes: best for equality searches, cannot support range searches
• Static Hashing can lead to long overflow chains
• Extendible Hashing avoids overflow pages by splitting a full bucket when a new data entry is to be added to it
– duplicates may still require overflow pages
– directory to keep track of buckets, doubles periodically
– can get large with skewed data; additional I/O if this does not fit in main memory
Summary
• Linear Hashing avoids a directory by splitting buckets round-robin, and using overflow pages
– overflow pages not likely to be long
– duplicates handled easily
• For hash-based indexes, a skewed data distribution is one in which the hash values of data entries are not uniformly distributed – bad
Selection of Indexes
Different File Organizations
Search key = <age, sal>
• Heap files
– random order; insert at end-of-file
• Sorted files
– sorted on <age, sal>
• Clustered B+ tree file
– search key <age, sal>
• Heap file with unclustered B+-tree index
– on search key <age, sal>
• Heap file with unclustered hash index
– on search key <age, sal>
We need to understand the importance of appropriate file organization and index
Possible Operations
• Scan
– fetch all records from disk to buffer pool
• Equality search
– find all employees with age = 23 and sal = 50
– fetch page from disk, then locate qualifying record in page
• Range selection
– find all employees with age > 35
• Insert a record
– identify the page, fetch that page from disk, insert record, write back to disk (possibly other pages as well)
• Delete a record
– similar to insert
Understanding the Workload
• A workload is a mix of queries and updates
• For each query in the workload:
– Which relations does it access?
– Which attributes are retrieved?
– Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
• For each update in the workload:
– Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
– The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected
Choice of Indexes
• What indexes should we create?
– Which relations should have indexes? What field(s) should be the search key? Should we build several indexes?
• For each index, what kind of an index should it be?
– Clustered? Hash/tree?
More on Choice of Indexes
• One approach:
– Consider the most important queries
– Consider the best plan using the current indexes
– See if a better plan is possible with an additional index
– If so, create it
– Obviously, this implies that we must understand how a DBMS evaluates queries and creates query evaluation plans
– We will learn query execution and optimization later; for now, we discuss simple 1-table queries
• Before creating an index, must also consider the impact on updates in the workload!
• Trade-off: indexes can make queries go faster, updates slower. Require disk space, too.
Trade-offs for Indexes
• Indexes can make
– queries go faster
– updates slower
• Require disk space, too
Index Selection Guidelines
• Attributes in the WHERE clause are candidates for index keys
– exact match condition suggests hash index
– range query suggests tree index
– clustering is especially useful for range queries
• can also help on equality queries if there are many duplicates
• Try to choose indexes that benefit as many queries as possible
– since only one index can be clustered per relation, choose it based on important queries that would benefit the most from clustering
• Multi-attribute search keys should be considered when a WHERE clause contains several conditions
– order of attributes is important for range queries
• Note: clustered index should be used judiciously
– expensive updates, although cheaper than sorted files
Examples of Clustered Indexes
What is a good indexing strategy?

SELECT E.dno
FROM Emp E
WHERE E.age > 40

• B+ tree index on E.age can be used to get qualifying tuples
• How selective is the condition?
– if everyone is > 40, the index is not of much help; a scan is as good
– suppose 10% are > 40. Then?
• Depends on whether the index is clustered
– otherwise it can be more expensive than a linear scan
– if clustered, 10% I/O (+ index pages)
Examples of Clustered Indexes
What is a good indexing strategy? Group-By query:

SELECT E.dno, COUNT(*)
FROM Emp E
WHERE E.age > 10
GROUP BY E.dno

• Use E.age as search key?
– bad if many tuples have E.age > 10 or if the index is not clustered...
– ...using the E.age index and sorting the retrieved tuples by E.dno may be costly
• Clustered E.dno index may be better
– first group by, then count tuples with age > 10
– good when age > 10 is not too selective
• Note: the first option is good when the WHERE condition is highly selective (few tuples have age > 10); the second is good when it is not highly selective
Examples of Clustered Indexes
What is a good indexing strategy? Equality queries and duplicates:

SELECT E.dno
FROM Emp E
WHERE E.hobby='Stamps'

• Clustering on E.hobby helps
– hobby is not a candidate key; several matching tuples possible

SELECT E.dno
FROM Emp E
WHERE E.eid=50

• Does clustering help now? (eid = key)
– not much: at most one tuple satisfies the condition
Indexes with Composite Search Keys
• Composite search keys: search on a combination of fields
• Equality query: every field value is equal to a constant value, e.g. w.r.t. a <sal, age> index:
– age = 20 and sal = 75
• Range query: some field value is not a constant, e.g.:
– sal > 10
– <age, sal> does not help: the condition has to be on a prefix
[Figure: examples of composite key indexes using lexicographic order. Data records <name, age, sal> sorted by name: (bob, 12, 10), (cal, 11, 80), (joe, 12, 20), (sue, 13, 75). Data entries in the index sorted by <age, sal>: 11,80 12,10 12,20 13,75; sorted by <sal, age>: 10,12 20,12 75,13 80,11; sorted by <age>: 11 12 12 13; sorted by <sal>: 10 20 75 80]
Composite Search Keys
• To retrieve Emp records with age = 30 AND sal = 4000, an index on <age, sal> would be better than an index on age or an index on sal
– first find age = 30; among them search sal = 4000
• If the condition is: 20 < age < 30 AND 3000 < sal < 5000:
– clustered tree index on <age, sal> or <sal, age> is best
• If the condition is: age = 30 AND 3000 < sal < 5000:
– clustered <age, sal> index much better than <sal, age> index
– more index entries are retrieved for the latter
• Composite indexes are larger, updated more often
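The prefix rule can be checked with the data entries from the figure above; a small Python illustration where tuples stand in for data entries:

```python
# <age, sal> data entries from the figure, in lexicographic order
age_sal = [(11, 80), (12, 10), (12, 20), (13, 75)]
assert age_sal == sorted(age_sal)

# age = 12 AND sal > 5: the condition constrains a prefix (age), so all
# matches are contiguous in the index order -- the index helps
hits = [e for e in age_sal if e[0] == 12 and e[1] > 5]
assert hits == [(12, 10), (12, 20)]

# sal > 10 alone: matches are scattered in <age, sal> order (the index
# does not help), but contiguous in a <sal, age> index
sal_age = sorted((s, a) for a, s in age_sal)
assert sal_age == [(10, 12), (20, 12), (75, 13), (80, 11)]
assert [e for e in sal_age if e[0] > 10] == sal_age[1:]
```

The last two assertions show why attribute order matters for range queries: under <sal, age>, the entries with sal > 10 form one contiguous run.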
Index-Only Plans
• A number of queries can be answered without retrieving any tuples from one or more of the relations involved, if a suitable index is available

SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno
– index on <E.dno>

SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno
– tree index on <E.dno, E.sal>

SELECT AVG(E.sal)
FROM Emp E
WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000
– tree index on <E.age, E.sal>

• For index-only strategies, clustering is not important