Apache ZooKeeper

CMSC 491 Hadoop-Based Distributed Computing

Spring 2016 Adam Shook

What is it?

•  Apache ZooKeeper is an effort to develop and maintain an open-source server which enables highly reliable distributed coordination.
   –  Simple
   –  Replicated
   –  Ordered
   –  Fast

Provides

•  Configuration Information
•  Distributed Synchronization
•  Group Services

•  Each of these services is used in some form by distributed applications

Interface

•  ZooKeeper provides a very simple interface to a highly reliable and distributed service

•  Powerful abstractions can be built from this very simple interface

•  Currently interfaces are in Java and C
   –  Want to expand to Python, Perl, and REST.

The Core

•  Shared hierarchical namespace of data registers, called znodes

•  Unlike file systems, provides clients with high throughput, low latency, highly available, and ordered access to znodes

Quorum

Namespace

znodes

•  Meta-information:
   –  Configuration
   –  Status Information
   –  Location Information
   –  Whatever you want (that’s small)

znodes

•  Each node acts as a file and directory
•  1 MB maximum per znode
•  Persistent vs. Ephemeral
•  Sequential znodes
•  Full paths
   –  An optional “chroot” suffix can be appended to connection string
   –  “127.0.0.1:3000,127.0.0.1:3002/app/a”
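As an illustration of the chroot suffix, the following Python sketch splits a connection string like the one above into its server list and chroot path, then maps a client-visible path to the real path under the chroot. The helper names `split_connect_string` and `resolve` are hypothetical. Real ZooKeeper clients do this internally, and falling back to port 2181 (ZooKeeper's default client port) is an assumption of this sketch.

```python
def split_connect_string(connect_string):
    """Split 'host:port,host:port/chroot' into (servers, chroot).

    Hypothetical helper for illustration; not part of any ZooKeeper client.
    """
    # The chroot, if present, starts at the first '/' in the string.
    slash = connect_string.find("/")
    if slash == -1:
        hosts_part, chroot = connect_string, ""
    else:
        hosts_part, chroot = connect_string[:slash], connect_string[slash:]

    servers = []
    for entry in hosts_part.split(","):
        host, _, port = entry.partition(":")
        # Assumption: use ZooKeeper's default client port when none is given.
        servers.append((host, int(port) if port else 2181))
    return servers, chroot


def resolve(chroot, client_path):
    """Map a path as the client sees it to the real path under the chroot."""
    if client_path == "/":
        return chroot or "/"
    return chroot + client_path


servers, chroot = split_connect_string("127.0.0.1:3000,127.0.0.1:3002/app/a")
print(servers, chroot, resolve(chroot, "/locks"))
```

With the chroot in place, every path the client uses is interpreted relative to /app/a, so the application never needs to know where it sits in the shared namespace.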

Watchers

•  Tied to each znode

•  One-time trigger
•  Sent to the client
•  Includes the reason it was sent (event type and znode path)
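The one-time trigger behavior is easy to miss, so here is a small in-memory Python model of it. `ToyZnode` is a hypothetical stand-in written for this sketch, not the ZooKeeper client API; only the event name `NodeDataChanged` is borrowed from ZooKeeper.

```python
class ToyZnode:
    """In-memory stand-in for a single znode; not the ZooKeeper API."""

    def __init__(self, data=b""):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        # Reading with a watcher registers it for the NEXT change only.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        self.data = data
        # Fire and clear: each registered watcher is notified exactly once.
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher("NodeDataChanged")


events = []
node = ToyZnode(b"v1")
node.get(watcher=events.append)  # arm the watch
node.set(b"v2")                  # the watch fires
node.set(b"v3")                  # nothing is registered any more
print(events)
```

A client that wants to keep following a znode has to set a new watch each time one fires.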

That’s It

•  In a nutshell
•  Very basic service, from which powerful abstractions can be built

•  Let’s talk about how good it is!
   –  That is, if you don’t have any questions right now…
      •  You can ask. I don’t bite
         –  Really
            »  Promise

Use Case: Location Data

•  Servers store machine hostname as ephemeral znodes
   –  /app1/machine1
   –  /app1/machine87
   –  /app1/machine4
•  When a server is added, create a new znode
•  When a server is removed, znode is deleted
•  When a server fails, ZK will delete the ephemeral node
•  Allows for dynamic throttling of resources
•  Clients can choose a hostname from children of /app1 to connect to
   –  Set a child watch on /app1; if a server goes down, the client receives a notification and can choose a new server
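The location-data pattern above can be modeled in a few lines of Python. This is an in-memory sketch under stated assumptions: `ToyRegistry` and `pick_server` are hypothetical names, session expiry is triggered by hand, and nothing here talks to a real ZooKeeper ensemble.

```python
import random


class ToyRegistry:
    """In-memory model of ephemeral children under one parent such as /app1."""

    def __init__(self):
        self._owner = {}           # child name -> owning session id
        self._child_watchers = []  # one-shot child watches

    def register(self, session_id, name):
        self._owner[name] = session_id
        self._notify()

    def expire(self, session_id):
        # Mimics ZooKeeper deleting a failed server's ephemeral nodes.
        gone = [n for n, s in self._owner.items() if s == session_id]
        for name in gone:
            del self._owner[name]
        if gone:
            self._notify()

    def get_children(self, watcher=None):
        if watcher is not None:
            self._child_watchers.append(watcher)
        return sorted(self._owner)

    def _notify(self):
        fired, self._child_watchers = self._child_watchers, []
        for watcher in fired:
            watcher()


def pick_server(registry, on_change, rng=random):
    """Choose one registered hostname and leave a one-shot child watch."""
    children = registry.get_children(watcher=on_change)
    return rng.choice(children) if children else None


registry = ToyRegistry()
for session, host in [(1, "machine1"), (2, "machine87"), (3, "machine4")]:
    registry.register(session, host)
```

When `on_change` fires, the client would call `pick_server` again, which both re-reads the children and re-arms the watch.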

Use Case: Status

•  Use ZooKeeper as a heartbeat mechanism
•  “Master” service keeps data watches on znodes

•  Servers set the data of their node every 15 seconds

•  If the Master doesn’t receive a change notification within 20 seconds, it can assume that the server has failed and kill it before bad things happen.
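The master's timeout check reduces to simple arithmetic on timestamps. A minimal Python sketch, assuming the master records the time of the last data-change notification per znode; the function name `failed_servers` and the dictionary layout are hypothetical.

```python
HEARTBEAT_INTERVAL = 15  # seconds between updates, from the slide
FAILURE_TIMEOUT = 20     # seconds of silence before declaring failure


def failed_servers(last_update, now, timeout=FAILURE_TIMEOUT):
    """Return the znode paths whose last data change is older than `timeout`.

    `last_update` maps a znode path to the time (in seconds) at which the
    master last saw a data-change notification for it.
    """
    return sorted(path for path, seen in last_update.items()
                  if now - seen > timeout)


last_update = {
    "/app1/machine1": 100,
    "/app1/machine4": 85,
    "/app1/machine87": 79,
}
print(failed_servers(last_update, now=100))
```

The 5-second gap between the 15-second interval and the 20-second timeout leaves room for a slightly late heartbeat before a server is declared dead.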

Performance

Command Line Interface

•  Interactive usage of the namespace in a shell
   –  create [path] [data]
   –  delete [path]
   –  get [path]
   –  set [path] [data]
   –  ls [path]
   –  rmr [path]
   –  A number of other commands…

•  Tab completion!

API

•  Current and stable v3.4.6 (March 2014)
•  Requires only a list of ZK servers to connect
•  IMO, good but messy interface
•  Recommend building a nice wrapper API for getting/setting POD types and handling exceptions

Recipes!

•  We are going to talk about these:
   •  Configuration
   •  Distributed Locks
   •  Distributed Queue

Configuration

•  Configuration is often driven through key/value pairs stored in a file
   –  Can get messy when configuration is dynamic

•  Implementation is very straightforward, as it is what ZooKeeper was designed for

•  Each full-pathed znode is the key and the data associated with the znode is the value
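To show the path-as-key idea, this Python sketch flattens a nested configuration into a mapping from full znode path to value. The function name `to_znodes`, the `/config` root, and the choice to store values as UTF-8 text are assumptions made for illustration.

```python
def to_znodes(config, prefix="/config"):
    """Flatten a nested dict into {full znode path: bytes value}."""
    znodes = {}
    for key, value in config.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            # A nested section becomes a subtree of znodes.
            znodes.update(to_znodes(value, path))
        else:
            # znode data is a byte array, so encode the value as UTF-8 text.
            znodes[path] = str(value).encode("utf-8")
    return znodes


settings = {"db": {"host": "db1", "port": 5432}, "debug": False}
for path, data in sorted(to_znodes(settings).items()):
    print(path, data)
```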

Variables

•  Static Variables
   –  Those ones that are probably never going to change (not as much fun)

•  Dynamic Variables
   –  Changed by hand via command line or by the application itself
      •  Track status of processes
      •  Update historical data

Use of Watchers

•  Applications can change configuration on the fly for some variables

•  Whenever a variable changes, those watching a node can receive the changed variable and make the correct changes

•  Very useful for long-running applications that require the most up to date information
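Because watches fire only once, a long-running application has to re-arm its watch every time it reads a changed value. The Python sketch below models that loop in memory. `ToyZnode` and `LiveSetting` are hypothetical classes for this illustration, not part of the ZooKeeper or Curator APIs.

```python
class ToyZnode:
    """In-memory znode with one-shot data watches; not the ZooKeeper API."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        self.data = data
        # Clear the list before firing so a watcher can safely re-register.
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher()


class LiveSetting:
    """Keeps `value` in sync with a znode by re-arming the watch on each change."""

    def __init__(self, znode):
        self._znode = znode
        self.value = None
        self._refresh()

    def _refresh(self):
        # Read the current data and register this method as the next watch.
        self.value = self._znode.get(watcher=self._refresh)


node = ToyZnode(b"10")
setting = LiveSetting(node)
node.set(b"20")
node.set(b"30")
print(setting.value)
```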

Distributed Locks

•  A means to have distributed processes retrieve a lock for some operation
   –  Throttled updating of database
   –  Your use case here!

•  Exists in ZooKeeper's recipes directory and is distributed with the release in src/recipes/lock

Algorithm

•  Define a znode to hold the lock, say “/dlock”
1.  mypath = create(“/dlock/lock-”), with the sequence and ephemeral flags set
2.  children = getChildren(“/dlock”), no watch
3.  If mypath has the lowest number suffix in children, the client holds the lock; exit
4.  Call exists() on the node from children with the next lowest sequence number, with the watch flag set
    1.  i.e., if mypath is “/dlock/lock-6” and children contains 3, 4, 6, 7, call exists on “/dlock/lock-4”
5.  If exists is false, go to step 2
6.  If true, wait for watch trigger before going to step 2
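Steps 3 and 4 are the heart of the recipe: decide whether this client holds the lock and, if not, which single node to watch. A minimal Python sketch of just that decision, working on child names rather than a live ZooKeeper connection; `sequence_of` and `lock_step` are hypothetical names.

```python
def sequence_of(name):
    """Extract the integer sequence suffix from a name like 'lock-0000000006'."""
    return int(name.rsplit("-", 1)[1])


def lock_step(my_name, children):
    """Return ('acquired', None) or ('wait', name_of_node_to_watch)."""
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(my_name)
    if index == 0:
        # Lowest sequence number: this client holds the lock.
        return ("acquired", None)
    # Otherwise watch only the node immediately ahead of us.
    return ("wait", ordered[index - 1])


children = ["lock-7", "lock-3", "lock-6", "lock-4"]
print(lock_step("lock-6", children))
print(lock_step("lock-3", children))
```

Watching only the immediate predecessor means a released lock wakes one waiter instead of every client contending for it.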

Distributed Queues

•  A means to allow clients to asynchronously add elements to a queue and have a single processor application dequeue and process them.
   –  I can’t remember the last time I needed a queue
   –  Maybe you have a few

Algorithm

•  Designate a znode to hold the queue, say “/dqueue”
•  Enqueue: create(“/dqueue/queue-”), with sequence and ephemeral flags set.
   –  Returns a real path node /dqueue/queue-X, where X is a monotonic increasing number

•  Dequeue: getChildren(“/dqueue”), watch set to true
•  Process these nodes with the lowest number first
   –  No need to call getChildren() until the current received list is exhausted

•  If no children are in the queue, wait for watch notification before checking again
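The enqueue and dequeue steps above can be modeled in memory as follows. `ToyQueue` and `drain` are hypothetical names for this Python sketch. The counter stands in for the sequence number ZooKeeper appends to sequential znodes, which is zero-padded to ten digits so that names sort correctly as strings.

```python
class ToyQueue:
    """In-memory model of sequential children under /dqueue."""

    def __init__(self):
        self._next = 0    # stands in for the parent's sequence counter
        self._items = {}  # child name -> data

    def enqueue(self, data):
        name = f"queue-{self._next:010d}"
        self._next += 1
        self._items[name] = data
        return name

    def get_children(self):
        # Like getChildren(), the order of the returned list is not guaranteed.
        return list(self._items)

    def take(self, name):
        return self._items.pop(name)


def drain(queue):
    """Fetch the child list once and process it lowest number first."""
    processed = []
    for name in sorted(queue.get_children()):
        processed.append(queue.take(name))
    return processed


queue = ToyQueue()
first = queue.enqueue(b"a")
queue.enqueue(b"b")
queue.enqueue(b"c")
print(first, drain(queue))
```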

Priority Queue Extension

•  Two simple modifications to this algorithm!
   –  When enqueuing, pathnames end with queue-ZZ, where ZZ is the priority of the element
      •  Lower the number, higher the priority

   –  When dequeuing, if the watch notification is triggered on the “/dqueue” node, client needs to call getChildren() again and re-sort by priority.
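A short Python sketch of the priority ordering, under two assumptions that are not stated on the slide: the priority is written as exactly two digits, and the ten-digit sequence counter follows it directly. `priority_name` and `dequeue_order` are hypothetical helpers.

```python
def priority_name(priority, sequence):
    """Build a child name: 'queue-' + 2-digit priority + 10-digit sequence."""
    return f"queue-{priority:02d}{sequence:010d}"


def dequeue_order(children):
    """Sort by priority first, then by arrival order within a priority.

    Fixed-width fields make plain string sorting sufficient.
    """
    return sorted(children)


names = [
    priority_name(5, 0),
    priority_name(1, 1),
    priority_name(5, 2),
    priority_name(1, 3),
]
print(dequeue_order(names))
```

If a new element arrives while the client is working through an old child list, the list may no longer reflect the true priority order, which is why the slide calls for fetching and sorting the children again after a watch notification.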

Other Recipes

•  Group membership
•  Barriers
•  Two-phased commit
•  Leader Election

Apache Curator

•  “Curator n ˈkyoor͝ˌātər: a keeper or custodian of a museum or other collection - A ZooKeeper Keeper.”

•  Contains:
   •  Recipes
   •  Framework
   •  Utilities
   •  Client
   •  Errors
   •  Extensions

References

•  http://zookeeper.apache.org
•  http://curator.apache.org