Let us understand how we can submit a MapReduce job using YARN.

On our lab environment, labs.sparkdatabox.com, we can search for the appropriate Hadoop examples jar using the find command.

find /usr/hdp -name "*hadoop*examples*.jar"

Pick the latest version and use it as part of the hadoop jar command to submit the job.
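The two steps above can be sketched as a single pipeline; `sort -V` (version-aware sort) is one way to pick the latest release when multiple HDP versions are installed (this layout is an assumption about your environment):

```shell
# Find every matching examples jar, sort by version, keep the newest.
find /usr/hdp -name "*hadoop*examples*.jar" | sort -V | tail -n 1
```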

  • The jar file is a runnable jar; we can invoke the programs bundled in it, such as wordcount, by passing the program name as an argument.
  • The wordcount program takes additional arguments: an input path and an output path.
  • Here we invoke wordcount with /public/randomtextwriter/part-m-0000* as the input path and /user/training/wordcount as the output path.
  • It computes the count of each and every word across the 10 files matching the input path pattern.
hadoop jar \
/usr/hdp/2.6.5.0-292/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
wordcount \
/public/randomtextwriter/part-m-0000* \
/user/training/wordcount
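Conceptually, wordcount's mappers emit (word, 1) pairs, the shuffle groups identical words together, and the reducer sums the counts per word. A minimal local sketch of the same map-shuffle-reduce flow with standard Unix tools (the sample text is made up for illustration):

```shell
# Emulate wordcount locally:
#   tr     - split the line into one word per line (map)
#   sort   - bring identical words together (shuffle)
#   uniq -c - count each distinct word (reduce)
printf 'to be or not to be\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c
```

Running this prints the count of each distinct word, which is exactly what the MapReduce job produces at cluster scale.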
  • It creates 90 mappers and 1 reducer.
  • There are 90 mappers because each of the 10 input files has 9 HDFS blocks, and one mapper is launched per block (10 × 9 = 90).
  • The number of reducers is 1 by default.
  • To speed up the job, we can increase the number of reducers at run time by passing -Dmapreduce.job.reduces as part of the hadoop jar command.
  • Once the job is submitted, copy the tracking URL printed on the console and use it to review the progress of the job.
  • Go through the logs by navigating to the tasks that are either completed or being executed.