Let us understand how to submit a MapReduce job using YARN.
On our labs.sparkdatabox.com cluster, we can search for the appropriate Hadoop examples jar using the find command.
find /usr/hdp -name "*hadoop*examples*.jar"
Pick the latest version and use it as part of the hadoop jar command to submit the job.
- The jar file is a runnable jar, and we can invoke the programs bundled in it by passing the program name as an argument, such as wordcount.
- The wordcount program takes additional arguments: an input path and an output path.
- Here we have invoked wordcount with /public/randomtextwriter/part-m-0000* as the input path and /user/training/wordcount as the output path.
- It takes care of computing the count of each and every word across the 10 files matching the input path pattern.
hadoop jar \
/usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
wordcount \
/public/randomtextwriter/part-m-0000* \
/user/training/wordcount
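What wordcount computes can be sketched locally with standard shell tools. This is only a local analogy on a tiny hypothetical input; the actual job distributes the same tokenize-and-count work across mappers and reducers:

```shell
# Local analogy of wordcount: tokenize into one word per line, then count.
# The sample text here is purely illustrative.
printf 'hello world\nhello hadoop\n' \
  | tr -s ' \t' '\n' \
  | sort \
  | uniq -c
```

Each output line pairs a count with a word (for example, hello appears twice here), which mirrors the (word, count) pairs the reducer emits.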
- It creates 90 mappers and 1 reducer.
- There are 90 mappers because we have 10 files with 9 blocks each, and one mapper is launched per block (10 × 9 = 90).
- The number of reducers is 1 by default.
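The mapper count follows from input splits: by default, one mapper runs per HDFS block of the input. The arithmetic can be sketched as below; the 128 MB block size and ~1.1 GB file size are assumptions for illustration, chosen so that each file spans 9 blocks as in our example:

```shell
# Split arithmetic: one mapper per HDFS block (default: one split per block).
# Assumed values for illustration: 128 MB block size, ~1.1 GB per input file.
block_size=$((128 * 1024 * 1024))
file_size=$((1100 * 1024 * 1024))
# Splits per file = ceiling(file_size / block_size)
splits_per_file=$(( (file_size + block_size - 1) / block_size ))
files=10
total_mappers=$(( splits_per_file * files ))
echo "$splits_per_file blocks per file -> $total_mappers mappers"
```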
- To speed up the job, we can increase the number of reducers at run time by passing -Dmapreduce.job.reduces as part of the hadoop jar command.
- Once the job is submitted, capture the tracking URL and review the progress of the job.
- Go through the logs by navigating to the tasks that are either completed or currently running.
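Putting the reducer override together, the invocation might look like the following sketch. The reducer count of 8 is an arbitrary illustration, and note that the -D generic option goes after the program name, since the example driver parses generic options via ToolRunner:

```shell
# Same wordcount job, but with 8 reducers instead of the default 1.
# The value 8 is an arbitrary illustration; tune it to your cluster.
hadoop jar \
  /usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
  wordcount \
  -Dmapreduce.job.reduces=8 \
  /public/randomtextwriter/part-m-0000* \
  /user/training/wordcount
```

More reducers mean more parallel reduce tasks and more output part files under /user/training/wordcount.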