Industrial project 234313 Sergey Semenko & Ivan Nesmeyanov Under supervision of Eliezer Levy Erlang routing mesh overview and implementation details.


Industrial project 234313

Sergey Semenko & Ivan Nesmeyanov

Under supervision of Eliezer Levy

Erlang routing mesh overview and implementation details.


Node structure. User processes

[Diagram: a virtual machine hosting one NM and three W processes]

NM – Node Manager, the process in charge of communicating with the mesh and dispatching jobs to Workers. An NM can play one of two roles in the mesh: Root or Leaf. See the mesh topology description for details.

W – Worker, a process that executes jobs, one job at a time. When done, it notifies the NM and the client.


Node structure. Supervision tree

S – Supervisor, a system process in charge of restarting crashed processes. It does not implement user logic.

NM and W are user processes.

[Diagram: supervision tree with S at the top, above NM and three W processes]


Node Manager’s roles. Leaf

A Leaf Node Manager dispatches received jobs to its local workers, receives their “done” notifications, and notifies its root.

A Worker sends the job result directly to the client and notifies the Node Manager.

[Diagram: NM and three W processes, with arrows labelled job, done, and result]


Node manager’s roles. Root

Roots are responsible for getting jobs from the web server and forwarding them to the least occupied Leaf they have.

Roots are also responsible for accepting join-mesh requests from node managers and assigning roles to them.

Roots do not execute jobs on their local workers.

[Diagram: one Root (R) above three Leaves (L)]


Mesh topology overview

Each HTTP request is forwarded by the web server to a randomly chosen Root.

Each Root forwards jobs to the least occupied of the Leaves registered with it.

Results are sent directly from the workers to the web server.

[Diagram: HTTP requests arrive at the Web Server, which sends mesh job requests to a row of Roots (R); a row of Leaves (L) sits below the Roots]


Mesh topology. Join protocol

Each root has at most MaxChildNum leaves. A root that has the maximum number of children is called saturated. A root with fewer children is called hungry.

There are always at least MinRootNum roots in an operating mesh. If there are fewer, each new node is assigned the root role.

If all roots are saturated, a new node is assigned to be a Root. Otherwise it becomes a leaf of a hungry root.
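The role-assignment rule above can be modelled in a few lines. This is a minimal sketch in Python, not the project's Erlang code: the values of `MIN_ROOT_NUM` and `MAX_CHILD_NUM`, the `Root` class, the `join` function, and the choice of the first hungry root are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

MIN_ROOT_NUM = 2    # assumed value for MinRootNum
MAX_CHILD_NUM = 3   # assumed value for MaxChildNum


@dataclass
class Root:
    name: str
    leaves: list = field(default_factory=list)

    def is_hungry(self) -> bool:
        # A root is hungry while it has fewer than MaxChildNum leaves.
        return len(self.leaves) < MAX_CHILD_NUM


def join(roots: list, node: str):
    """Assign a role to a joining node.

    Returns ("root", None) or ("leaf", name_of_its_root).
    """
    hungry = [r for r in roots if r.is_hungry()]
    if len(roots) < MIN_ROOT_NUM or not hungry:
        # Too few roots, or every root is saturated: the node becomes a root.
        roots.append(Root(node))
        return ("root", None)
    # Picking the first hungry root is an assumption; the slides do not say
    # which hungry root is chosen.
    hungry[0].leaves.append(node)
    return ("leaf", hungry[0].name)


roots = []
roles = [join(roots, f"n{i}") for i in range(1, 6)]
```

With these assumed limits, the first two nodes become roots and the next three become leaves of the first root, which is then saturated.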


Recovery protocol

When a leaf crashes, its root is notified and the leaf's jobs are reassigned.

When a root crashes, all its leaves perform a join request. If such a leaf gets the root role, it reassigns its pending jobs.

When both a leaf and its root crash and there is no information about the job left in the mesh, the web server resends the job after a timeout.
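The leaf-crash case can be sketched as follows. This is a Python model under stated assumptions, not the project's Erlang code: the function name, the leaf-to-jobs dictionary, and the policy of giving each orphaned job to the least loaded surviving leaf are hypothetical. The slides only say that the jobs are reassigned.

```python
def reassign_on_leaf_crash(assignments: dict, crashed: str):
    """Redistribute the jobs of a crashed leaf among the surviving leaves.

    Returns (new_assignments, unplaced_jobs). Jobs stay unplaced only when
    no leaf survives.
    """
    orphaned = list(assignments.get(crashed, []))
    survivors = {
        leaf: list(jobs) for leaf, jobs in assignments.items() if leaf != crashed
    }
    if not survivors:
        return survivors, orphaned
    for job in orphaned:
        # Assumed policy: give each orphaned job to the least loaded survivor.
        target = min(survivors, key=lambda leaf: len(survivors[leaf]))
        survivors[target].append(job)
    return survivors, []


before = {"leaf_a": ["j1", "j2"], "leaf_b": ["j3"], "leaf_c": []}
after = reassign_on_leaf_crash(before, "leaf_a")
```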


Sending job protocol

The web server randomly chooses one of the roots and sends the request to it. If that root has no leaves registered, the web server is notified and the request is resent. Otherwise the job is forwarded to the least occupied leaf of the root.

If all of the leaf's workers are occupied, the job is stored in the pending jobs list. When a worker becomes available, it is assigned one of the pending jobs.
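The forwarding and queueing steps above can be modelled as below. This is a Python sketch, not the project's Erlang code: the `Leaf` class, the `forward` function, measuring occupancy as running plus pending jobs, and serving pending jobs in FIFO order are all assumptions. The slides only say that a freed worker gets "one of" the pending jobs.

```python
from collections import deque


class Leaf:
    def __init__(self, name, worker_count):
        self.name = name
        self.free_workers = worker_count
        self.running = 0
        self.pending = deque()   # pending jobs list

    def load(self):
        # Assumed occupancy measure: running plus pending jobs.
        return self.running + len(self.pending)

    def submit(self, job):
        if self.free_workers > 0:
            self.free_workers -= 1
            self.running += 1
            return "started"
        self.pending.append(job)
        return "pending"

    def job_done(self):
        """A worker finished; give it the oldest pending job, if any."""
        self.running -= 1
        if self.pending:
            self.running += 1
            return self.pending.popleft()
        self.free_workers += 1
        return None


def forward(leaves, job):
    """Root-side step: forward a job to the least occupied leaf."""
    if not leaves:
        return ("no_leaves", None)   # the web server would resend
    leaf = min(leaves, key=Leaf.load)
    return (leaf.submit(job), leaf.name)


leaves = [Leaf("a", 1), Leaf("b", 1)]
results = [forward(leaves, j) for j in ["j1", "j2", "j3"]]
next_job = leaves[0].job_done()
```

Here the third job is queued on leaf `a` because both single-worker leaves are busy, and it is started as soon as `a` reports a finished job.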


Implementation details. Process groups

There are two registered pg2 groups: root_group and hungry_group.

When a node manager becomes a root, it joins root_group and hungry_group.

When a root reaches the maximum number of children, it leaves hungry_group.

When a saturated root loses a leaf, it rejoins hungry_group.

Access to the groups is synchronized by a global lock to avoid race conditions.
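The membership rules above amount to a small state update guarded by a lock. The following Python sketch models them with two sets and a `threading.Lock` standing in for the mesh-wide lock. It does not use pg2 or any Erlang API, and the names `become_root`, `children_changed`, and the value of `MAX_CHILD_NUM` are assumptions.

```python
import threading

MAX_CHILD_NUM = 2                 # assumed value, not from the slides
root_group = set()
hungry_group = set()
groups_lock = threading.Lock()    # stand-in for the global lock


def become_root(name):
    # A new root joins both groups.
    with groups_lock:
        root_group.add(name)
        hungry_group.add(name)


def children_changed(name, child_count):
    # Keep hungry_group consistent with the root's current child count.
    with groups_lock:
        if name not in root_group:
            return
        if child_count >= MAX_CHILD_NUM:
            hungry_group.discard(name)   # saturated: leave hungry_group
        else:
            hungry_group.add(name)       # lost a leaf: rejoin hungry_group


become_root("r1")
children_changed("r1", 2)
saturated_view = set(hungry_group)
children_changed("r1", 1)
hungry_view = set(hungry_group)
```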


Locking clarification

The Erlang way to synchronize access to a shared resource is to implement a “resource manager” process that receives access requests and executes them.

Because the mesh must have no single point of failure, we decided not to implement such a process to synchronize access to the root groups.

Hence we were forced to implement a locking primitive, which is not the Erlang way to solve the problem.


Implementation details. Monitoring

Each leaf monitors its root (erlang:monitor() function).

Each root monitors its leaves.


Implementation details. Node manager module

Implements the gen_server behavior.

The role is kept in the state. When the role changes, only the corresponding field in the state changes.

A node manager is capable of processing messages meant for other roles. This is useful when a leaf turns into a root and might still get job_done messages from its workers.
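The idea of keeping the role as one field of the state, while still handling messages that belong to another role, can be illustrated as follows. This is a Python model rather than a gen_server: the `State` fields, the `handle` function, and the `become_root` message are hypothetical. Only the job_done message name comes from the slides.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    role: str              # "leaf" or "root"
    busy_workers: int = 0


def handle(state: State, message: str) -> State:
    """Return the next state; no message is rejected because of the role."""
    if message == "become_root":
        # A role change touches only the role field.
        return replace(state, role="root")
    if message == "job_done":
        # Accepted in any role, so a former leaf can still account for
        # workers that finish after it has become a root.
        return replace(state, busy_workers=max(0, state.busy_workers - 1))
    return state


s0 = State(role="leaf", busy_workers=2)
s1 = handle(s0, "become_root")
s2 = handle(s1, "job_done")
```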


Implementation details. Worker module

Implements the gen_server behavior.

Its only function is to execute jobs.

Job execution itself is outside the scope of this project. It is simulated by sleeping for a certain amount of time.


Implementation details. Mesh module

Provides the interface for the sending job protocol.

Provides the interface for the join protocol.