YARN stands for Yet Another Resource Negotiator. Its goal is to let applications drive utilization of all resources on the physical cluster toward 100% while letting every application execute at its maximum potential.
A multi-node YARN cluster exposes an aggregate pool of compute resources (memory and CPU). YARN allocates these resources to applications according to a pluggable scheduler policy.
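As a minimal sketch, the scheduler policy is selected in yarn-site.xml via the yarn.resourcemanager.scheduler.class property; the CapacityScheduler shown here is the usual default, and the FairScheduler is the common alternative:

```xml
<!-- yarn-site.xml: choose the scheduler policy (CapacityScheduler is the typical default) -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```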
Compare Hadoop 1 & YARN.
Scalability: Hadoop 1 clusters were deployed on up to about 3,500 nodes. YARN has been successfully deployed on 35,000+ nodes.
Availability: Hadoop 1 used a single JobTracker; if the JobTracker failed, all running jobs failed, making it a single point of failure. YARN separates these concerns: the ResourceManager handles cluster resources, while per-application ApplicationMasters handle task management.
Hadoop 1 was designed for batch-processing scenarios, and MapReduce was the only programming paradigm available. YARN supports new programming models and services beyond MapReduce.
YARN applications are not limited to Java. Applications written in any language can run natively, as long as their binaries are installed on the cluster, while requesting resources from YARN and utilizing HDFS.
In MapReduce on Hadoop 2 (MRv2), each job has its own ApplicationMaster, and each job's resource requests are dynamically sized for its Map and Reduce processes.
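For example, an MRv2 job's per-container memory requests can be tuned with the standard mapreduce.*.memory.mb properties; the values below are illustrative, not defaults:

```xml
<!-- mapred-site.xml (or per-job -D overrides): illustrative container sizes -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value> <!-- each Map task requests a 2 GB container from YARN -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>4096</value> <!-- each Reduce task requests a 4 GB container from YARN -->
</property>
```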
How to configure YARN?
YARN has one core configuration file:
/etc/hadoop/conf/yarn-site.xml
What can we do in yarn-site.xml?
In yarn-site.xml we configure how resource allocation works. There are two types of resources. Physical: the total physical resources (memory) that can be allocated per container.
yarn.scheduler.maximum-allocation-mb: maximum memory per container (default 8192 MB, i.e. 8 GB)
yarn.scheduler.minimum-allocation-mb: minimum memory per container (default 1024 MB, i.e. 1 GB)
Virtual: the total virtual memory a container may use relative to its physical allocation, controlled by yarn.nodemanager.vmem-pmem-ratio (default 2.1).
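Putting these together, a minimal yarn-site.xml fragment for the memory settings above might look like this (the values shown restate the stock defaults, so copying it changes nothing by itself):

```xml
<!-- yarn-site.xml: container memory limits (values shown are the defaults) -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value> <!-- smallest container the scheduler will grant -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value> <!-- largest container the scheduler will grant -->
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value> <!-- virtual memory allowed per unit of physical memory -->
</property>
```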