What is YARN?

YARN stands for Yet Another Resource Negotiator. YARN's goal is to help applications achieve 100% utilization of all resources on the physical system while letting every application execute at its maximum potential.

A multi-node YARN cluster has an aggregate pool of compute resources: memory and CPU. YARN allocates these resources to applications according to a scheduler policy.
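To make the idea of an aggregate pool concrete, here is a minimal sketch that sums per-node memory and CPU into one cluster-wide total. The `Node` class, the node names, and the capacities are all made up for illustration; this is not YARN's API, only a toy model of the pooling idea.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of one worker node's capacity."""
    name: str
    memory_mb: int
    vcores: int


def cluster_pool(nodes):
    """Return (total_memory_mb, total_vcores) summed over all nodes."""
    total_memory_mb = sum(n.memory_mb for n in nodes)
    total_vcores = sum(n.vcores for n in nodes)
    return total_memory_mb, total_vcores


# Illustrative three-node cluster (assumed values).
nodes = [
    Node("worker-1", 16384, 8),
    Node("worker-2", 16384, 8),
    Node("worker-3", 32768, 16),
]

pool = cluster_pool(nodes)
print(pool)  # (65536, 32)
```

The scheduler policy then decides how this pooled capacity is divided among applications.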

Compare Hadoop 1 & YARN.

Hadoop 1 vs. Hadoop 2 (YARN)

Scalability: Hadoop 1 clusters were deployed on 3,500 nodes. YARN has been successfully deployed on 35,000+ nodes.

Availability: Hadoop 1 uses a JobTracker. If the JobTracker fails, all jobs fail, so it is a single point of failure. YARN separates the two concerns: the ResourceManager handles resource management, and task management is handled per application.

Programming model: Hadoop 1 was meant to solve batch-processing scenarios, and MapReduce was the only programming paradigm available. YARN supports new programming models and services.

HADOOP YARN

YARN applications are not limited to Java. Applications written in any language, as long as the binaries are installed on the cluster, can run natively, all while requesting resources from YARN and utilizing HDFS.

With MapReduce in Hadoop 2 (MRv2), each job has its own ApplicationMaster. Each MRv2 job’s resource requests are dynamically sized for its Map and Reduce processes.

How to configure YARN?

YARN has one core configuration file:

/etc/hadoop/conf/yarn-site.xml

What can we do in yarn-site.xml?

We configure how resource allocation works. There are two types of resources. Physical: the total physical resources (memory) allocated per container.

yarn.scheduler.maximum-allocation-mb: 8 GB per container by default

yarn.scheduler.minimum-allocation-mb: 1 GB per container by default
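As a rough illustration of how these two settings look in yarn-site.xml and how they bound a container request, here is a small sketch. The XML is an example fragment with the default values quoted above (in MB). The `normalize_request` helper is hypothetical and simplified: it assumes requests are rounded up to a multiple of the minimum and rejected above the maximum. The real behavior depends on the scheduler and Hadoop version in use.

```python
import math
import xml.etree.ElementTree as ET

# Example yarn-site.xml fragment; values are the defaults, expressed in MB.
YARN_SITE_XML = """<configuration>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Parse Hadoop-style <property> entries into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        props[prop.findtext("name")] = prop.findtext("value")
    return props


def normalize_request(requested_mb, min_mb, max_mb):
    """Hypothetical helper: simplified model of request sizing.

    Rounds the request up to a multiple of min_mb and raises if the
    request exceeds max_mb. Not YARN's actual implementation.
    """
    if requested_mb > max_mb:
        raise ValueError(f"request of {requested_mb} MB exceeds maximum {max_mb} MB")
    multiples = max(1, math.ceil(requested_mb / min_mb))
    return multiples * min_mb


props = load_properties(YARN_SITE_XML)
min_mb = int(props["yarn.scheduler.minimum-allocation-mb"])
max_mb = int(props["yarn.scheduler.maximum-allocation-mb"])

print(normalize_request(1500, min_mb, max_mb))  # 2048
print(normalize_request(512, min_mb, max_mb))   # 1024
```

Under this simplified model, a 1500 MB request occupies two minimum-sized units, and anything smaller than 1 GB still receives a full 1 GB container.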

Virtual: Total virtual resources (memory) that a container
