Industrial project 234313
Sergey Semenko & Ivan Nesmeyanov
Under supervision of Eliezer Levy
Erlang routing mesh overview and implementation details.
Node structure. User processes
Figure: a virtual machine hosting one NM process and three W processes.
NM – Node Manager, the process in charge of communicating with the mesh and dispatching jobs to Workers. An NM can play one of two roles in the mesh: Root or Leaf. The roles are covered in the mesh topology description.
W – Worker, a process that executes jobs. It can execute one job at a time. When done, it notifies the NM and the client.
Node structure. Supervision tree
S – Supervisor, a system process in charge of restarting crashed processes. It does not implement user logic.
NM and W are user processes.
Figure: supervision tree with S above NM and three W processes.
Node Manager’s roles. Leaf
A Leaf Node Manager dispatches received jobs to its local Workers, receives their “done” notifications, and notifies its Root.
A Worker sends the job result directly to the client and notifies the Node Manager.
Figure: messages exchanged between NM and three W processes: job, done, result.
Node manager’s roles. Root
Roots are responsible for getting jobs from the web server and forwarding them to the least occupied of their Leaves.
Roots are also responsible for accepting join mesh requests from Node Managers and assigning roles to them.
Roots do not execute jobs on their local Workers.
Figure: one Root (R) with three Leaves (L).
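The forwarding rule above can be sketched in a few lines. This is only an illustration in Python, not the project's Erlang code: it assumes that "occupied" is measured by the number of jobs a Leaf currently holds, and the names LeafInfo, pick_least_occupied and forward_job are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LeafInfo:
    """What a Root is assumed to track about one registered Leaf."""
    name: str
    active_jobs: int = 0  # assumption: occupancy = number of jobs in flight


def pick_least_occupied(leaves):
    """Return the Leaf with the fewest active jobs, or None if there are none."""
    if not leaves:
        return None
    return min(leaves, key=lambda leaf: leaf.active_jobs)


def forward_job(leaves):
    """Choose a Leaf for a new job and record the job against it."""
    leaf = pick_least_occupied(leaves)
    if leaf is None:
        return None  # no Leaves registered; the caller must handle this
    leaf.active_jobs += 1
    return leaf.name
```

Returning None when no Leaf is registered leaves room for the caller to report the failure back to the sender.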
Mesh topology overview
Each HTTP request is forwarded by the web server to a randomly chosen Root.
Each Root forwards jobs to the least occupied Leaf registered with it.
Results are sent directly to the web server from the Workers.
Figure: Web Server at the top receiving HTTP requests, a row of Roots (R) receiving mesh job requests, and a row of Leaves (L) below.
Mesh topology. Join protocol
Each Root has at most MaxChildNum Leaves. A Root that has the maximum number of children is called saturated. A Root with fewer children is called hungry.
There are always at least MinRootNum Roots in an operating mesh. If there are fewer, each new node is assigned the Root role.
If all Roots are saturated, a new node is assigned to be a Root. Otherwise it becomes a Leaf of a hungry Root.
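The role assignment rules above can be written as one small function. This is a sketch in Python rather than the project's Erlang code. The constant values are made up for the example, and picking the first hungry Root found is an assumption, since the slides do not say which hungry Root is chosen.

```python
MIN_ROOT_NUM = 2   # illustrative value for MinRootNum
MAX_CHILD_NUM = 3  # illustrative value for MaxChildNum


def assign_role(roots):
    """Decide the role of a joining node.

    roots maps each Root's name to its current number of Leaves.
    Returns ("root", None) or ("leaf", name_of_parent_root).
    """
    # Rule 1: too few Roots, so the new node becomes a Root.
    if len(roots) < MIN_ROOT_NUM:
        return ("root", None)
    # Rule 2: a hungry Root exists, so the new node becomes its Leaf.
    hungry = [name for name, children in roots.items()
              if children < MAX_CHILD_NUM]
    if hungry:
        return ("leaf", hungry[0])  # assumption: first hungry Root wins
    # Rule 3: every Root is saturated, so the new node becomes a Root.
    return ("root", None)
```

For example, with roots {"r1": 3, "r2": 1} the new node becomes a Leaf of r2, and with {"r1": 3, "r2": 3} it becomes a new Root.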
Recovery protocol
When a Leaf crashes, its Root is notified and the Leaf's jobs are reassigned.
When a Root crashes, all its Leaves perform a join request. If such a Leaf gets the Root role, it reassigns its pending jobs.
When both a Leaf and its Root crash and no information about the job remains in the mesh, the web server resends the job after a timeout.
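The first recovery case, a Leaf crash handled by its Root, might look roughly like this. The slides do not say how the jobs are redistributed, so this Python sketch assumes they go to the least occupied surviving Leaves. The function name and the data layout are hypothetical.

```python
def reassign_after_leaf_crash(assignments, crashed_leaf):
    """Move the jobs of a crashed Leaf to the surviving Leaves.

    assignments maps each Leaf name to the list of job ids it holds and
    is updated in place. Returns the jobs that could not be placed
    because no Leaf survived.
    """
    orphaned = assignments.pop(crashed_leaf, [])
    unplaced = []
    for job in orphaned:
        if not assignments:
            unplaced.append(job)  # nobody left to take the job
            continue
        # Assumption: reuse the "least occupied Leaf" rule for reassignment.
        target = min(assignments, key=lambda leaf: len(assignments[leaf]))
        assignments[target].append(job)
    return unplaced
```

Jobs returned as unplaced correspond to the situation where the mesh can no longer serve them and the sender has to retry.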
Sending job protocol
The web server randomly chooses one of the Roots and sends the request to it. If that Root has no Leaves registered, the web server is notified and resends the request. Otherwise the job is forwarded to the Root's least occupied Leaf.
If all of a Leaf's Workers are occupied, the job is stored in the pending jobs list. When a Worker becomes available, it is assigned one of the pending jobs.
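The pending jobs list on the Leaf side can be sketched as follows. This is a Python illustration, not the project's code. The slides only say that a free Worker gets "one of" the pending jobs, so the first-in, first-out order used here is an assumption, and the class and method names are hypothetical.

```python
from collections import deque


class LeafScheduler:
    """Tracks idle Workers, running jobs and the pending jobs list."""

    def __init__(self, worker_count):
        self.idle = list(range(worker_count))  # ids of free Workers
        self.busy = {}                         # Worker id -> running job
        self.pending = deque()                 # jobs waiting for a Worker

    def submit(self, job):
        """Start the job on a free Worker, or queue it if all are busy."""
        if self.idle:
            worker = self.idle.pop()
            self.busy[worker] = job
            return worker
        self.pending.append(job)
        return None

    def job_done(self, worker):
        """Handle a "done" notification; return the next job started, if any."""
        self.busy.pop(worker)
        if self.pending:
            job = self.pending.popleft()  # assumption: oldest job first
            self.busy[worker] = job
            return job
        self.idle.append(worker)
        return None
```

A Worker never holds more than one job in this sketch, matching the one-job-at-a-time rule for Workers.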
Implementation details. Process groups
There are two registered pg2 groups: root_group and hungry_group.
When a node manager becomes a root, it joins root_group and hungry_group.
When a root reaches the maximum number of children, it leaves hungry_group.
When a saturated root loses a leaf, it rejoins hungry_group.
Access to the groups is synchronized by a global lock to avoid race conditions.
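The group membership rules above amount to a small state machine. The Python sketch below models the two groups as sets and uses threading.Lock as a stand-in for the global lock. It does not use pg2 or any Erlang API, and the class and method names are hypothetical.

```python
import threading


class MeshGroups:
    """In-memory model of root_group and hungry_group."""

    def __init__(self, max_child_num):
        self.max_child_num = max_child_num
        self.root_group = set()
        self.hungry_group = set()
        self._lock = threading.Lock()  # stand-in for the global lock

    def become_root(self, node):
        """A new root joins both groups."""
        with self._lock:
            self.root_group.add(node)
            self.hungry_group.add(node)

    def children_changed(self, node, child_count):
        """Update hungry_group after a root gains or loses a leaf."""
        with self._lock:
            if node not in self.root_group:
                return  # only roots can be hungry
            if child_count >= self.max_child_num:
                self.hungry_group.discard(node)  # saturated
            else:
                self.hungry_group.add(node)      # hungry again
```

Every read-modify-write of the groups happens under the lock, which is what prevents two joining nodes from acting on a stale view of hungry_group.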
Locking clarification
The Erlang way to synchronize access to a shared resource is to implement a “resource manager” process that receives access requests and executes them.
Because of the requirement to have no single point of failure, we decided not to implement such a process to synchronize access to the root groups.
Hence we were forced to implement a locking primitive, which is not the Erlang way to solve the problem.
Implementation details. Monitoring
Each leaf monitors its root (erlang:monitor() function).
Each root monitors its leaves.
Implementation details. Node manager module
Implements the gen_server behavior.
The role is kept in the state. When the role changes, only the corresponding field in the state changes.
The node manager is capable of processing other roles’ messages (this is useful when a leaf turns into a root and might still get job_done messages from its workers).
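The idea of keeping the role as a single field of the state, while still accepting messages meant for the previous role, can be illustrated like this. The sketch is in Python and does not reproduce gen_server. Only the job_done message name comes from the slides; become_root, the state fields and the function name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManagerState:
    role: str = "leaf"                          # "leaf" or "root"
    running: set = field(default_factory=set)   # jobs on local workers


def handle_message(state, message):
    """Process one (kind, payload) message and return the updated state."""
    kind, payload = message
    if kind == "become_root":
        # A role change touches only the role field; the rest is preserved.
        state.role = "root"
    elif kind == "job_done":
        # Handled whatever the current role is, so a former leaf can still
        # account for jobs its workers finish after it became a root.
        state.running.discard(payload)
    return state
```

Because the handler does not branch on the role before accepting job_done, a late notification from a local worker is not lost after the promotion.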
Implementation details. Worker module
Implements the gen_server behavior.
Its only function is to execute jobs.
Job execution itself is not of interest to us. It is simulated by sleeping for a certain amount of time.
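A simulated worker of this kind is short enough to show in full. The Python sketch below is an illustration only: the two callbacks stand in for the messages a worker sends to the client and to its node manager, and all names and the result string are hypothetical.

```python
import time


def execute_job(job_id, duration_s, notify_client, notify_manager):
    """Simulate one job by sleeping, then send the two notifications."""
    time.sleep(duration_s)             # stands in for the real work
    result = f"result of {job_id}"     # placeholder result
    notify_client(job_id, result)      # the result goes directly to the client
    notify_manager(job_id)             # "done" goes to the node manager
    return result
```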
Implementation details. Mesh module
Provides the interface for the sending jobs protocol.
Provides the interface for the joining mesh protocol.
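The sending side of such an interface could look like the following. This Python sketch follows the sending job protocol from the earlier slide: pick a random root, and try again if that root reports that it has no leaves. The retry limit, the callable-per-root representation and the function name are assumptions made for the example.

```python
import random


def send_job(job, roots, rng=random, max_attempts=5):
    """Send a job to a randomly chosen root, retrying on refusal.

    roots maps a root name to a callable that takes the job and returns
    the name of the leaf it was forwarded to, or None when the root has
    no leaves registered. Returns (root, leaf), or None if every attempt
    failed.
    """
    if not roots:
        return None
    names = list(roots)
    for _ in range(max_attempts):  # assumption: give up after a few tries
        root = rng.choice(names)
        leaf = roots[root](job)
        if leaf is not None:
            return root, leaf
    return None
```

Passing rng explicitly makes the random choice reproducible, for example by supplying random.Random(0) in a test.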