Let us understand how to submit a MapReduce job using YARN.
On our labs.sparkdatabox.com cluster, we can search for the appropriate Hadoop examples jar using the find command.
find /usr/hdp -name "*hadoop*examples*.jar"
Pick the latest version and use it as part of the hadoop jar command to submit the job.
- The jar file is a runnable jar, and we can invoke the programs bundled in it by passing the program name as an argument, such as wordcount.
- The wordcount program takes additional arguments: an input path and an output path.
- Here we have invoked wordcount with /public/randomtextwriter/part-m-0000* as the input path and /user/training/wordcount as the output path.
- It takes care of computing the count of each and every word across the 10 files matching the input path pattern.
hadoop jar \
/usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
wordcount \
/public/randomtextwriter/part-m-0000* \
/user/training/wordcount
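What wordcount computes can be sketched locally with standard shell tools. This is only a local analogy on a tiny hypothetical input; the actual job distributes the same tokenize-and-count work across mappers and reducers:

```shell
# Local analogy of wordcount: tokenize into one word per line, then count.
# The sample text here is purely illustrative.
printf 'hello world\nhello hadoop\n' \
  | tr -s ' \t' '\n' \
  | sort \
  | uniq -c
```

Each output line pairs a count with a word (for example, hello appears twice here), which mirrors the (word, count) pairs the reducer emits.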
- It creates 90 mappers and 1 reducer.
- There are 90 mappers because we have 10 files with 9 blocks each, and one mapper is launched per block (10 × 9 = 90).
- The number of reducers is 1 by default.
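The mapper count follows from input splits: by default, one mapper runs per HDFS block of the input. The arithmetic can be sketched as below; the 128 MB block size and ~1.1 GB file size are assumptions for illustration, chosen so that each file spans 9 blocks as in our example:

```shell
# Split arithmetic: one mapper per HDFS block (default: one split per block).
# Assumed values for illustration: 128 MB block size, ~1.1 GB per input file.
block_size=$((128 * 1024 * 1024))
file_size=$((1100 * 1024 * 1024))
# Splits per file = ceiling(file_size / block_size)
splits_per_file=$(( (file_size + block_size - 1) / block_size ))
files=10
total_mappers=$(( splits_per_file * files ))
echo "$splits_per_file blocks per file -> $total_mappers mappers"
```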
- To speed up the job, we can increase the number of reducers at run time by passing -Dmapreduce.job.reduces as part of the hadoop jar command.
- Once the job is submitted, capture the tracking URL and review the progress of the job.
- Go through the logs by navigating to the tasks that are either completed or currently running.
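Putting the reducer override together, the invocation might look like the following sketch. The reducer count of 8 is an arbitrary illustration, and note that the -D generic option goes after the program name, since the example driver parses generic options via ToolRunner:

```shell
# Same wordcount job, but with 8 reducers instead of the default 1.
# The value 8 is an arbitrary illustration; tune it to your cluster.
hadoop jar \
  /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  wordcount \
  -Dmapreduce.job.reduces=8 \
  /public/randomtextwriter/part-m-0000* \
  /user/training/wordcount
```

More reducers mean more parallel reduce tasks and more output part files under /user/training/wordcount.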