Erlang routing mesh overview and implementation details.
Industrial project 234313
Sergey Semenko & Ivan Nesmeyanov
Under supervision of Eliezer Levy

Node structure. User processes
• NM – Node Manager: the process in charge of communicating with the mesh and dispatching jobs to Workers. An NM can play one of two roles in the mesh, Root or Leaf; both are described in the mesh topology section.
• W – Worker: a process that executes jobs, one job at a time. When it is done, it notifies the NM and the client.
[Diagram: a virtual machine hosting one NM and three W processes]

Node structure. Supervision tree
• S – Supervisor: a system process in charge of restarting crashed processes. It does not implement user logic.
• NM and W are user processes.
[Diagram: supervision tree with NM and three workers (w)]

Node manager's roles. Leaf
• A Leaf node manager dispatches received jobs to its local workers, receives their "done" notifications, and notifies its root.
• A worker sends the job result directly to the client and notifies its node manager.
[Diagram: NM and three W processes, with messages labeled job, done, and result]

Node manager's roles. Root
• Roots are responsible for getting jobs from the web server and forwarding them to the least occupied Leaf they have.
• Roots are also responsible for accepting join-mesh requests from node managers and assigning roles to them.
• Roots do not execute jobs on their local workers.
[Diagram: a root R with three leaves L]

Mesh topology overview
• Each HTTP request is forwarded by the web server to a randomly chosen Root.
• Each Root forwards jobs to the least occupied of the Leaves registered with it.
• Results are sent directly from the workers to the web server.
[Diagram: HTTP requests reach the Web Server, which sends job requests to the Roots (R) in the mesh]

Mesh topology. Join protocol
• Each root has at most MaxChildNum leaves. A root with the maximum number of children is called saturated; a root with fewer children is called hungry.
• An operating mesh always has at least MinRootNum roots. If there are fewer, each new node is assigned the root role.
• If all roots are saturated, a new node is assigned to be a Root. Otherwise it becomes a leaf of a hungry root.

Recovery protocol
• When a leaf crashes, its root is notified and the leaf's jobs are reassigned.
• When a root crashes, all of its leaves perform a join request. If such a leaf is given the root role, it reassigns its pending jobs.
• When both a leaf and its root crash and the mesh holds no information about the job, the web server resends the job after a timeout.

Sending job protocol
• The web server randomly chooses one of the roots and sends the request to it. If that root has no leaves registered, the web server is notified and the request is resent. Otherwise the job is forwarded to the root's least occupied leaf.
• If all of a leaf's workers are occupied, the job is stored in the pending jobs list. When a worker becomes available, it is assigned one of the pending jobs.

Implementation details. Process groups
• There are two registered pg2 groups: root_group and hungry_group.
• When a node manager becomes a root, it joins root_group and hungry_group.
• When a root reaches the maximum number of children, it leaves hungry_group.
• When a saturated root loses a leaf, it rejoins hungry_group.
• Access to the groups is synchronized by a global lock to avoid race conditions.

Locking clarification
• The Erlang way to synchronize access to a shared resource is to implement a "resource manager" process that receives access requests and executes them.
• Because of the requirement to have no single point of failure, we decided not to implement such a process to synchronize access to the root groups.
• Hence we were forced to implement a locking primitive, which is not the Erlang way to solve the problem.

Implementation details. Monitoring
• Each leaf monitors its root (erlang:monitor() function).
• Each root monitors its leaves.

Implementation details. Node manager module
• Implements gen_server behavior.
• The role is kept in the state. When the role changes, only the corresponding field in the state changes.
• The node manager can also process messages addressed to its other role (this is useful when a leaf turns into a root and might still get job_done messages from its workers).

Implementation details. Worker module
• Implements gen_server behavior.
• Its only function is to execute jobs.
• Job execution itself is not our concern here; it is simulated by sleeping for a certain amount of time.

Implementation details. Mesh module
• Provides the interface for the sending job protocol.
• Provides the interface for the join mesh protocol.
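
The decision rules behind the two protocols exposed by the mesh module can be modeled in a few lines. The following is a minimal Python sketch of those rules only, not the project's Erlang code. The names `join`, `send_job`, `Root`, `Leaf`, and `active_jobs` are hypothetical, and so are the values chosen for MaxChildNum and MinRootNum. The sketch also assumes that "least occupied" means the fewest jobs currently held, and that a new leaf is attached to the first hungry root found. The slides do not specify either point.

```python
from dataclasses import dataclass, field

# Hypothetical values: the slides name these parameters but give no numbers.
MAX_CHILD_NUM = 3
MIN_ROOT_NUM = 2


@dataclass
class Leaf:
    name: str
    active_jobs: int = 0  # assumed occupancy measure: jobs held by this leaf


@dataclass
class Root:
    name: str
    leaves: list = field(default_factory=list)

    def is_hungry(self):
        # A root with fewer than MAX_CHILD_NUM leaves is hungry, else saturated.
        return len(self.leaves) < MAX_CHILD_NUM


def join(roots, name):
    """Apply the join rule and return the role given to the new node."""
    hungry = [r for r in roots if r.is_hungry()]
    # Too few roots, or every root saturated: the newcomer becomes a root.
    if len(roots) < MIN_ROOT_NUM or not hungry:
        roots.append(Root(name))
        return "root"
    # Otherwise it becomes a leaf of a hungry root (assumption: the first one).
    hungry[0].leaves.append(Leaf(name))
    return "leaf"


def send_job(root):
    """Forward one job to the least occupied leaf; None signals a resend."""
    if not root.leaves:
        return None  # the web server would be notified and resend the request
    target = min(root.leaves, key=lambda leaf: leaf.active_jobs)
    target.active_jobs += 1
    return target.name
```

With these values, joining nine nodes in sequence yields two roots, then six leaves (three per root), and finally a third root once both existing roots are saturated. Repeated calls to `send_job` on one root spread jobs across its leaves in turn, because each forwarded job raises the chosen leaf's count.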