YARN (Yet Another Resource Negotiator) :
YARN was introduced to split resource management and job scheduling/monitoring into separate processes. The idea is to have a global ResourceManager (RM) and, for each application, an ApplicationMaster (AM). An application is either a single job or a DAG of jobs. Existing MapReduce jobs run unchanged on top of YARN after just a recompile.
Resource Manager :
The ResourceManager has two main components:
1. Scheduler
2. ApplicationsManager
Applications Manager :
The ApplicationsManager accepts submitted jobs, negotiates the first container for executing the application-specific ApplicationMaster, and provides the service for restarting the ApplicationMaster container on failure. The per-application ApplicationMaster then negotiates the appropriate resource containers from the Scheduler, tracks their status, and monitors their progress.
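The restart-on-failure behaviour can be illustrated with a minimal sketch. This is a toy model, not the Hadoop API: the class name mirrors the component, but `submit`, `launch_am`, `flaky_am`, and the `max_attempts` limit are all assumptions made up for the example.

```python
class ApplicationsManager:
    """Toy model: accepts a job, launches its ApplicationMaster, retries on failure."""

    def __init__(self, max_attempts=2):
        # max_attempts is an illustrative limit, not a value taken from the text.
        self.max_attempts = max_attempts
        self.attempts = {}  # application id -> number of AM launch attempts so far

    def submit(self, app_id, launch_am):
        """Launch the ApplicationMaster, retrying until it succeeds or attempts run out."""
        for attempt in range(1, self.max_attempts + 1):
            self.attempts[app_id] = attempt
            if launch_am(attempt):
                return "RUNNING"
        return "FAILED"


def flaky_am(attempt):
    # Hypothetical launcher: the first attempt fails, the second succeeds.
    return attempt >= 2


manager = ApplicationsManager(max_attempts=2)
state = manager.submit("app_0001", flaky_am)
print(state, manager.attempts["app_0001"])
```

With a limit of two attempts the application ends up running on its second ApplicationMaster attempt; with a limit of one it would be reported as failed.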
Scheduler :
The Scheduler allocates resources to the running applications, subject to the familiar constraints of capacities, queues, etc. It does not monitor or track the status of an application, and it offers no guarantee of restarting tasks that fail because of application or hardware failures. The Scheduler schedules based on the resource requirements of the applications. Scheduling is done using the abstract notion of a resource container, which includes elements such as memory, CPU, disk, and network.
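As a rough illustration of allocation under capacity and queue constraints, here is a small sketch. It is not how any real YARN scheduler is implemented: `ToyScheduler`, `Request`, the queue names, and the rule that a queue may use a fixed share of total memory are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Request:
    app_id: str
    queue: str
    memory_mb: int
    vcores: int


class ToyScheduler:
    """Grants a container only while the cluster and the queue's share have room."""

    def __init__(self, total_memory_mb, total_vcores, queue_shares):
        self.free_memory_mb = total_memory_mb
        self.free_vcores = total_vcores
        # Assumption: each queue may use at most its share of total memory.
        self.queue_limit_mb = {q: int(total_memory_mb * s) for q, s in queue_shares.items()}
        self.queue_used_mb = {q: 0 for q in queue_shares}

    def allocate(self, req):
        """Return True and reserve the resources if the request fits, else False."""
        fits_cluster = req.memory_mb <= self.free_memory_mb and req.vcores <= self.free_vcores
        fits_queue = self.queue_used_mb[req.queue] + req.memory_mb <= self.queue_limit_mb[req.queue]
        if not (fits_cluster and fits_queue):
            return False
        self.free_memory_mb -= req.memory_mb
        self.free_vcores -= req.vcores
        self.queue_used_mb[req.queue] += req.memory_mb
        return True


scheduler = ToyScheduler(8192, 8, {"prod": 0.75, "dev": 0.25})
results = [
    scheduler.allocate(Request("app_1", "prod", 4096, 2)),
    scheduler.allocate(Request("app_2", "dev", 4096, 2)),  # larger than dev's 2048 MB share
    scheduler.allocate(Request("app_3", "dev", 2048, 1)),
]
print(results)  # [True, False, True]
```

Note that the sketch only decides whether a request can be granted. In line with the text, it does no monitoring and never restarts anything.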
Node Manager :
The Node Manager is the slave; there are many of them per cluster. After starting, the Node Manager periodically sends a heartbeat signal to the Resource Manager. Each Node Manager offers some resources to the cluster for the execution of programs. Its resource capacity is the amount of memory and the number of vcores. At run time, the Scheduler decides how to use this capacity. A container is a fraction of the Node Manager's capacity and is used by the client for running the program.
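The periodic heartbeat can be pictured with a small sketch of how a master might track which nodes are alive. The text only states that heartbeats are sent periodically; the `HeartbeatMonitor` class, the expiry window, and the node names below are assumptions for illustration.

```python
class HeartbeatMonitor:
    """Toy liveness tracker: a node counts as live if it reported recently enough."""

    def __init__(self, expiry_seconds):
        # Assumed rule: a node is considered lost after this many silent seconds.
        self.expiry_seconds = expiry_seconds
        self.last_seen = {}  # node id -> time of the most recent heartbeat

    def heartbeat(self, node_id, now):
        """Record a heartbeat from a node at time `now` (in seconds)."""
        self.last_seen[node_id] = now

    def live_nodes(self, now):
        """Return the sorted ids of nodes whose last heartbeat is within the window."""
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.expiry_seconds)


monitor = HeartbeatMonitor(expiry_seconds=10)
monitor.heartbeat("node-a", now=0)
monitor.heartbeat("node-b", now=0)
monitor.heartbeat("node-a", now=8)   # node-a keeps reporting, node-b goes silent
print(monitor.live_nodes(now=15))    # ['node-a']
```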
Container :
A container is an allocated resource in the cluster. A set of system resources, such as CPU cores and RAM, is allocated to each container. Only the ResourceManager has the authority to allocate containers to applications.
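Since a container is a slice of a node's memory and vcores, a quick calculation shows how many equally sized containers fit on one node. The function name and the sample sizes are hypothetical, and real schedulers apply further rules (minimum and maximum allocation sizes, rounding) that this sketch ignores.

```python
def max_containers(node_memory_mb, node_vcores, container_memory_mb, container_vcores):
    """Number of identical containers that fit on a node, limited by the scarcer resource."""
    by_memory = node_memory_mb // container_memory_mb
    by_vcores = node_vcores // container_vcores
    return min(by_memory, by_vcores)


# A node offering 16 GB and 8 vcores, with 2 GB / 1 vcore containers.
print(max_containers(16384, 8, 2048, 1))  # 8
# The same memory but only 4 vcores: vcores become the limiting resource.
print(max_containers(16384, 4, 2048, 1))  # 4
```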
Application Master :
The Application Master is responsible for the execution of a single application. The Scheduler (in the Resource Manager) provides the required containers, in which the application's programs (e.g., the main of a Java class) are executed. The Application Master knows the application logic and is therefore framework-specific. The MapReduce framework provides its own implementation of an Application Master.
In YARN, there are three actors:
o The Job Submitter (the client)
o The Resource Manager (the master)
o The Node Manager (the slave)
YARN Execution Process :
The application startup process is the following:
o The client program submits the MapReduce application to the ResourceManager, along with the information required to launch the application-specific ApplicationMaster.
o The ResourceManager negotiates a container for the ApplicationMaster and launches the ApplicationMaster.
o The ApplicationMaster boots and registers itself with the ResourceManager, thereby allowing the original calling client to communicate directly with the ApplicationMaster.
o The ApplicationMaster negotiates resources (resource containers) for the client application.
o The ApplicationMaster provides the container launch specification to the NodeManager, which launches a container for the application.
o During execution, the client polls the ApplicationMaster for application status and progress.
o On completion, the ApplicationMaster deregisters from the ResourceManager, shuts down, and returns its containers to the resource pool.
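The startup sequence above can be traced with a small simulation. This is a toy model under stated assumptions, not the YARN client API: the classes reuse the component names, but every method, the event labels, and the fixed container pool are invented for the example, and all tasks are assumed to finish immediately.

```python
class ResourceManager:
    """Toy ResourceManager that hands out containers from a fixed pool."""

    def __init__(self, pool_size):
        self.free = pool_size
        self.registered = set()  # ids of applications with a registered AM

    def allocate(self, count):
        if count > self.free:
            raise RuntimeError("not enough free containers")
        self.free -= count
        return count

    def release(self, count):
        self.free += count


class ApplicationMaster:
    """Toy per-application master that requests containers and reports progress."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.containers = 0
        self.progress = 0.0

    def register(self):
        self.rm.registered.add(self.app_id)

    def run(self, tasks):
        # One container per task; the tasks are assumed to complete at once.
        self.containers = self.rm.allocate(tasks)
        self.progress = 1.0

    def finish(self):
        self.rm.registered.discard(self.app_id)
        self.rm.release(self.containers + 1)  # task containers plus the AM's own
        self.containers = 0


def run_application(rm, app_id, tasks):
    """Walk through the lifecycle and return the ordered list of events."""
    events = ["submitted"]
    rm.allocate(1)  # the container that hosts the ApplicationMaster
    am = ApplicationMaster(app_id, rm)
    events.append("am_launched")
    am.register()
    events.append("am_registered")
    am.run(tasks)
    events.append("containers_launched")
    events.append(f"progress={am.progress:.0%}")  # what a polling client would see
    am.finish()
    events.append("am_deregistered")
    return events


rm = ResourceManager(pool_size=5)
events = run_application(rm, "app_0001", tasks=3)
print(events)
print(rm.free)  # 5: every container is back in the pool
```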