This document is copyright (C) Stanford Computer Science and Marty Stepp, licensed under Creative Commons Attribution 2.5 License. All rights reserved. Based on slides created by Keith Schwarz, Julie Zelenski, Jerry Cain, Eric Roberts, Mehran Sahami, Stuart Reges, Cynthia Lee, and others.
CS106B, Lecture 27 Advanced Hashing
This document is copyright (C) Stanford Computer Science and Ashley Taylor, licensed under Creative Commons Attribution 2.5 License. All rights reserved. Based on slides created by Marty Stepp, Chris Gregg, Keith Schwarz, Julie Zelenski, Jerry Cain, Eric Roberts, Mehran Sahami, Stuart Reges, Cynthia Lee, and others.
Plan for Today
• Discuss how HashMaps differ from HashSets
• Another implementation for HashSet/Map: Cuckoo Hashing!
• Discuss qualities of a good hash function.
• Learn about another application for hashing: cryptography.

Hash map (15.4)
• A hash map is like a set where the nodes store key/value pairs:
    // key (ID)          value (name)
    map.put(51234562, "Ashley");
    map.put(62756179, "Amy");
    map.put(54727849, "Marty");
    map.put(46281955, "Seth");
  – Must modify the HashNode class to store a key and a value
[Figure: hash table with buckets at indexes 0 through 9 storing the pairs (62756179, Amy), (46281955, Seth), (51234562, Ashley), and (54727849, Marty).]

Hash map vs. hash set
  – The hashing is always done on the keys, not the values.
  – The contains function is now containsKey; there and in remove, you search for a node whose key matches a given key.
  – The add method is now put; if the given key is already there, you must replace its old value with the new one.
    map.put(54727849, "Chris");   // replace Marty with Chris
[Figure: hash table with buckets at indexes 0 through 9 storing the pairs (62756179, Amy), (46281955, Seth), (51234562, Ashley), and (54727849, Marty), with Marty replaced by Chris.]
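
As a rough illustration of the put and containsKey behavior described above, here is a minimal separate-chaining sketch in standard C++. The class name IntStringHashMap, the get method, and the fixed bucket count of 10 are assumptions made for this example; this is not the Stanford library's HashMap, and it omits remove and resizing.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical node type: one key/value pair in a bucket's chain.
struct HashNode {
    int key;
    std::string value;
    HashNode* next;
};

// Minimal chained hash map from int keys to string values (illustration only).
class IntStringHashMap {
public:
    explicit IntStringHashMap(int bucketCount = 10) : buckets(bucketCount, nullptr) {}

    ~IntStringHashMap() {
        for (HashNode* node : buckets) {
            while (node != nullptr) {
                HashNode* next = node->next;
                delete node;
                node = next;
            }
        }
    }

    // Copying is disabled to keep ownership of the nodes simple.
    IntStringHashMap(const IntStringHashMap&) = delete;
    IntStringHashMap& operator=(const IntStringHashMap&) = delete;

    // If the key is already present, replace its value; otherwise add a node.
    void put(int key, const std::string& value) {
        HashNode* node = find(key);
        if (node != nullptr) {
            node->value = value;
            return;
        }
        int index = indexOf(key);
        buckets[index] = new HashNode{key, value, buckets[index]};
    }

    bool containsKey(int key) const { return find(key) != nullptr; }

    // Returns the value for key, or an empty string if the key is absent.
    std::string get(int key) const {
        HashNode* node = find(key);
        return node != nullptr ? node->value : "";
    }

private:
    std::vector<HashNode*> buckets;

    // Hashing is done on the key only, never on the value.
    int indexOf(int key) const {
        return std::abs(key % static_cast<int>(buckets.size()));
    }

    // Walks the chain in the key's bucket looking for a matching key.
    HashNode* find(int key) const {
        for (HashNode* node = buckets[indexOf(key)]; node != nullptr; node = node->next) {
            if (node->key == key) return node;
        }
        return nullptr;
    }
};
```

Calling put(54727849, "Marty") and then put(54727849, "Chris") on this sketch leaves a single node for that key whose value is "Chris".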

Another Way to Hash
• Fun (but soon to be relevant) fact: cuckoo birds lay their eggs in other birds' nests
Source: wikimedia

Cuckoo Hashing
• What if we made contains really fast (look at at most two elements, no matter what)?
• Idea: have two arrays that store elements, where each array has its own hash function
• Try hashing the element into both arrays, and put it in an empty space
• If no space is empty, kick out one of the existing elements and move it to the other array.
• Contains just checks the corresponding spot in both arrays
• Slower add, but faster contains

Cuckoo Hashing
[Animation: two arrays, the first using hash function 3x % 4 and the second using hash function (2x + 1) % 4. The values 3, 6, 5, and 7 are inserted one at a time, and then 7 is searched for by looking in both arrays.]

Cuckoo Hashing
• What are the advantages or disadvantages of cuckoo hashing versus resolving collisions through chaining?
• What do we need to watch out for? When should we rehash?
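
The insert-and-evict idea behind cuckoo hashing can be sketched roughly as follows. This is only one possible arrangement, not the lecture's implementation: the class name CuckooSet, the Slot struct, the kick limit of twice the array size, and the policy of doubling both arrays when evictions keep cycling are all assumptions made for the example. The two hash functions follow the 3x % size and (2x + 1) % size pattern, and values are restricted to unsigned ints.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a cuckoo hash set for unsigned ints (hypothetical names throughout).
class CuckooSet {
public:
    explicit CuckooSet(std::size_t capacity = 4)
        : first(capacity), second(capacity) {}

    // Looks at exactly two slots, one per array.
    bool contains(unsigned value) const {
        const Slot& a = first[hash1(value)];
        const Slot& b = second[hash2(value)];
        return (a.used && a.value == value) || (b.used && b.value == value);
    }

    void add(unsigned value) {
        if (contains(value)) return;
        if (tryPlace(value)) return;
        // Evictions kept cycling. value now holds the element left without a
        // slot, so gather every element and rebuild with larger arrays.
        std::vector<unsigned> all;
        all.push_back(value);
        for (const Slot& s : first)  if (s.used) all.push_back(s.value);
        for (const Slot& s : second) if (s.used) all.push_back(s.value);
        rebuild(all);
    }

private:
    struct Slot {
        bool used = false;
        unsigned value = 0;
    };
    std::vector<Slot> first;
    std::vector<Slot> second;

    std::size_t hash1(unsigned x) const { return (3u * x) % first.size(); }
    std::size_t hash2(unsigned x) const { return (2u * x + 1u) % second.size(); }

    // Tries to store value, kicking occupants back and forth between arrays.
    // On failure, value is overwritten with the element that ended up homeless.
    bool tryPlace(unsigned& value) {
        Slot& s1 = first[hash1(value)];
        if (!s1.used) { s1.used = true; s1.value = value; return true; }
        Slot& s2 = second[hash2(value)];
        if (!s2.used) { s2.used = true; s2.value = value; return true; }

        bool useFirst = true;
        const std::size_t maxKicks = 2 * first.size();
        for (std::size_t kicks = 0; kicks < maxKicks; kicks++) {
            Slot& slot = useFirst ? first[hash1(value)] : second[hash2(value)];
            if (!slot.used) { slot.used = true; slot.value = value; return true; }
            std::swap(value, slot.value);  // evict the occupant and carry it onward
            useFirst = !useFirst;          // the evicted element tries the other array
        }
        return false;
    }

    // Doubles both arrays until every element can be placed.
    void rebuild(const std::vector<unsigned>& all) {
        bool placedAll = false;
        while (!placedAll) {
            std::size_t bigger = 2 * first.size();
            first.assign(bigger, Slot());
            second.assign(bigger, Slot());
            placedAll = true;
            for (unsigned v : all) {
                if (!tryPlace(v)) { placedAll = false; break; }
            }
        }
    }
};
```

One thing to watch out for shows up with arrays of size 4: the values 3, 7, and 11 all hash to index 1 in the first array and index 3 in the second, so three elements compete for two slots and eviction can never settle. The kick limit detects this, and the sketch responds by rehashing into larger arrays.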

Announcements
• Calligraphy announcements
  – Should start the 3rd part today or tomorrow at the latest
  – Starter code and Windows – please redownload
  – No late days may be used, no late submissions accepted
• Last class tomorrow – go to poll.ly/#/LdVNgWyo/G6z0awRv
• Final is on Saturday, at 8:30 AM, in Cubberley Auditorium
  – Everything from the course through today is fair game, emphasis is on second half materials (starting with pointers)
  – More information: https://web.stanford.edu/class/cs106b/exams/final.html
  – Practice exam is online – not guaranteed to match in format, etc.
  – Wednesday and Thursday will be final review
• Please give us feedback! cs198.stanford.edu

Hashing strings
• It is easy to hash an integer i (use index abs(i) % length).
  – How can we hash other types of values (such as strings)?
• If we could convert strings into integers, we could hash them.
  – What kind of integer is appropriate for a given string?
  – Does it matter what integer we choose? What should it be based on?

index      0    1    2    3    4    5    6    7
character  'H'  'i'  ' '  'D'  '0'  '0'  'd'  '!'

hashCode consistency
• A valid hashCode function must be consistent (must produce same results on each call)
    hashCode(x) == hashCode(x), if x's state doesn't change

hashCode and equality
• A valid hashCode function must be consistent with equality.
    a == b must imply that hashCode(a) == hashCode(b).
    Vector<int> v1;
    Vector<int> v2;
    v1.add(1);
    v2.add(3);
    v1.add(3);
    v2.insert(0, 1);
    // hashCode(v1) == hashCode(v2)
    a != b does NOT necessarily imply that hashCode(a) != hashCode(b) (why not?)
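
A small sketch of the equality requirement, using std::vector<int> in place of the course's Vector<int>. The hashCode function here is a hypothetical one written for this example (it is not the library's), and it returns an unsigned value so the arithmetic wraps around instead of overflowing a signed int.

```cpp
#include <vector>

// Hypothetical hashCode for a vector of ints. It depends only on the
// elements and their order, so equal vectors always get equal hash codes.
unsigned int hashCode(const std::vector<int>& v) {
    unsigned int hash = 17;
    for (int x : v) {
        hash = 31 * hash + static_cast<unsigned int>(x);
    }
    return hash;
}
```

Two vectors built in different ways, such as one that adds 1 then 3 and another that adds 3 and then inserts 1 at the front, end up equal and therefore get the same result from this function.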

hashCode distribution
• A good hashCode function is well-distributed.
  – For a large set of distinct values, they should generally return unique hash codes rather than often colliding into the same hash bucket.
  – This property is desired but not required. Why?

Possible hashCode 1
• Q: Is this a valid hash function? Is it good?
    int hashCode(string s) {   // #1
        return 42;
    }

Possible hashCode 2
• Q: Is this a valid hash function? Is it good?
    int hashCode(string s) {   // #2
        return randomInteger(0, 9999999);
    }

Possible hashCode 3
• Q: Is this a valid hash function? Is it good?
    int hashCode(string s) {   // #3
        return (int) &s;   // address of s (a pointer)
    }

Possible hashCode 4
• Q: Is this a valid hash function? Is it good?
    int hashCode(string s) {   // #4
        return s.length();
    }

Possible hashCode 5
• Q: Is this a valid hash function? Is it good?
    int hashCode(string s) {   // #5
        if (s.length() > 0) {
            return (int) s[0];   // ascii of 1st char
        } else {
            return 0;
        }
    }

Possible hashCode 6
• This function sums the characters' ASCII values.
  – Is it valid? Is it good?
  – What will collide?
    int hashCode(string s) {   // #6
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash += (int) s[i];   // ASCII of char
        }
        return hash;
    }

Measuring collisions
• Hash function = sum of characters of string.
• Add 50,000,000 article titles to a hash map with 50,000 buckets:

Idea: Weighted sum
    hash = s[0] + s[1] + s[2] + ... + s[n]
• Instead of adding, let's give each character a weight.
  – Multiply it by increasing powers of some prime number; say, 31.
  – This helps spread the strings' hash codes over the range of int values.
    hash = s[0] + (31 * s[1]) + (31^2 * s[2]) + ... + (31^n * s[n])

hashCode for strings
    int hashCode(string s) {
        int hash = 5381;
        for (int i = 0; i < (int) s.length(); i++) {
            hash = 31 * hash + (int) s[i];
        }
        return hash;
    }
  – FYI: Apart from the starting value (Java starts from 0 rather than 5381), the above is the hash function used for strings in Java.
  – As with any general hashing function, collisions are possible.
• Example: "Ea" and "FB" have the same hash value.
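
To compare the plain sum with the weighted version, the sketch below implements both in standard C++. The names sumHash and weightedHash are made up for this example, and the functions use std::uint32_t rather than int so that large values wrap around instead of overflowing a signed integer.

```cpp
#include <cstdint>
#include <string>

// Hash #6 from the slides: plain sum of the character codes.
// Any two strings with the same characters in a different order collide.
std::uint32_t sumHash(const std::string& s) {
    std::uint32_t hash = 0;
    for (unsigned char c : s) {
        hash += c;
    }
    return hash;
}

// Weighted version: multiply the running hash by 31 before adding each
// character, so a character's position affects the result.
std::uint32_t weightedHash(const std::string& s) {
    std::uint32_t hash = 5381;
    for (unsigned char c : s) {
        hash = 31 * hash + c;
    }
    return hash;
}
```

With these definitions, "stop" and "pots" collide under sumHash but not under weightedHash, while "Ea" and "FB" still collide under weightedHash because 69 * 31 + 97 and 70 * 31 + 66 both equal 2236.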

Measuring collisions
• Hash function = sum of characters of string, multiplying by 31.
• Add 50,000,000 article titles to a hash map with 50,000 buckets:

Hashing structs/objects
• By default you cannot add your own structs/objects to hash sets.
  – Our libraries don't know how to hash these objects.
    struct Point {
        int x;
        int y;
        ...
    };
    HashSet<Point> hset;
    Point p {17, 35};
    hset.add(p);
    ERROR: no matching function for call to 'hashCode(const Point&)'

Hashing structs/objects
• To make your own types hashable by our libraries:
  – 1) Overload the == operator.
  – 2) Write a hashCode function that takes your type as its parameter.
• "Add up" the object's state; scale/multiply parts to distribute the results.
    struct Point {
        int x;
        int y;
        ...
    };

    int hashCode(const Point& p) {
        return 1337 * p.y + 31 * p.x;
    }

    bool operator ==(const Point& p1, const Point& p2) {
        return p1.x == p2.x && p1.y == p2.y;
    }
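
For comparison, the same two ingredients (an equality operator and a hash function) are what the standard library's std::unordered_set needs, although it takes the hash as a functor type rather than a free hashCode function. The sketch below is an analogue in standard C++, not the course library's mechanism; PointHash and PointSet are names invented for this example.

```cpp
#include <cstddef>
#include <unordered_set>

struct Point {
    int x;
    int y;
};

// 1) Equality: two points are equal when both coordinates match.
bool operator==(const Point& p1, const Point& p2) {
    return p1.x == p2.x && p1.y == p2.y;
}

// 2) Hash functor: "adds up" the state with the same weights as the slide.
// Converting to std::size_t first makes the arithmetic wrap around safely.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        return 1337u * static_cast<std::size_t>(p.y)
             + 31u * static_cast<std::size_t>(p.x);
    }
};

// A hash set of points that uses PointHash and operator==.
using PointSet = std::unordered_set<Point, PointHash>;
```

Inserting Point{17, 35} twice into a PointSet leaves one element, since the second insert hashes to the same bucket and compares equal to the first.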

Hashing and Passwords
• We want to store a file of user passwords
  – When a user types a password, see if it matches our file
• Problem: anyone who can see our file can get all the passwords

User    Password
Ashley  password123
Shreya  traceComics
Seth    ki88leLuv

Hashing and Passwords
• What if we stored a unique code for each password instead of the string?
  – Hashing!
• Extra requirements for the hash function:
  – Want a large number of possible values (hard to find collisions)
  – Can't find the password from the hash (one-way)
  – Generally use a different hash function (e.g. SHA-256)
• The need for salting

User    Password
Ashley  17851691385
Marty   63158910316
Amy     90713593110

Hashing and Data Integrity
• A common "attack" in cryptography is man-in-the-middle
• How can you ensure that a hacker didn't interfere with the data?
• Get the hash from a trusted source – since hash functions only rarely have collisions, changes to data will lead to a different hash