Lab 01 - Design a MapReduce Application
Objective:
This is the first lab in the Hadoop MapReduce training series. In this lab, you will learn how to write and test a MapReduce program and run it both locally and on a cluster. To complete this lab, you need the VM, which can be obtained by emailing shujamughal@gmail.com.
Problem Statement: Word Count
In this lab, the program reads text files and counts how often each word occurs. The input is a set of text files and the output is also text files, each line of which contains a word and the number of times it occurred, separated by a tab.
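To make the input and output format concrete before touching Hadoop, here is a minimal plain-Java sketch of the same word count. It is only an illustration: the class name WordCountContract and its methods are made up for this example and are not part of the lab project, and it uses ordinary Java collections instead of Hadoop types.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Plain-Java illustration of the word count contract; not part of the lab project. */
final class WordCountContract {

    /** Counts whitespace-separated words across all lines, sorted by word. */
    static Map<String, Long> count(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Add 1 to the running count of this word.
                counts.merge(tokenizer.nextToken(), 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Renders one "word<TAB>count" line per entry. */
    static String render(Map<String, Long> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            out.append(entry.getKey()).append('\t').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two hypothetical input lines standing in for the input text files.
        System.out.print(render(count(List.of("cat cat dog", "dog bird"))));
    }
}
```

For the two sample lines, this prints bird with 1, cat with 2 and dog with 2, one word per line with a tab between the word and its count.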
Instructions:
This lab has been developed as a tutorial. Simply follow the instructions and analyze the results.
Outcome:
After completing this lab session, you will be able to do the following tasks.
- Write the Map Code
- Test the Map
- Write the Reduce Code
- Test the Reducer
- Test Map and Reduce Code
- Run Locally
- Run on Server
Please place the “wordcount” folder on the desktop. This folder is provided to you at the start of the lab.
Now open Eclipse by clicking the “Link to eclipse” icon, as shown below.
The next step is to import the project. Use the same steps as in the Hadoop tutorial; they are repeated below.
Select File -> Import... from the Eclipse menu. You will see the following screen.
Select “Existing Maven Projects” as shown above and press the Next button. You will see the following screen.
Click the Browse... button, select the wordcount project from the desktop as shown above, and click the Finish button.
At this point, the project has been imported successfully.
Now open the WordCountMapper.java file and write the map code in the designated area, as shown below. If you run into problems, see the solution provided below to complete the map task.
Solution to the above task:
// Split the incoming line into whitespace-separated tokens.
StringTokenizer tokenizer = new StringTokenizer(text.toString());
while (tokenizer.hasMoreTokens()) {
    // Emit (word, 1) for every token.
    tokenValue.set(tokenizer.nextToken());
    context.write(tokenValue, ONE);
}
The map task is now complete: for each word in the input, it emits that word with a count of 1. Now let’s test the map functionality.
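If you want to see exactly what the map step emits without running Hadoop, the following plain-Java stand-in may help. It is a sketch under assumptions: the class MapStepSketch is hypothetical, and it uses String and Long pairs in place of the Hadoop key and value types used in the project. It relies on the same StringTokenizer behaviour as the solution, which splits on whitespace only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

/** Stand-in for the map step using plain Java types instead of Hadoop types. */
final class MapStepSketch {

    /** Emits one (token, 1) pair per whitespace-separated token, in input order. */
    static List<Map.Entry<String, Long>> map(String line) {
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            emitted.add(Map.entry(tokenizer.nextToken(), 1L));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Tokens are emitted exactly as they appear: no lower-casing, no punctuation stripping.
        for (Map.Entry<String, Long> pair : map("Cat cat dog.")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Note that this tokenizer treats "Cat" and "cat" as different words and keeps trailing punctuation attached, so "dog." and "dog" would be counted separately. If that matters for your data, you would need to normalise the tokens before emitting them.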
Open the WordCountMapTest file as shown below.
In this test, the following input is defined for the map task.
Map Input
cat cat dog
The map output must be equal to the following:
cat 1
cat 1
dog 1
Running the test:
To run the test, right-click the test file opened in the previous step and select Run As -> JUnit Test, as shown below.
If everything goes well, you will see the following screen, which reports the test as successful.
At this stage, we have completed the map part and tested it. Now let’s move on to the reducer.
In the reducer, we need to sum the counts for each word. Open the WordCountReducer file and write the reducer code in the designated area, as shown below.
If you are not able to write the reducer part, see the solution below and complete it.
// Sum all the counts received for this word.
long n = 0;
for (LongWritable count : counts) {
    n += count.get();
}
total.set(n);
The reducer part is now complete. Now let’s test it. Open the file WordCountReducerTest as shown below.
Let’s focus on test code.
List<LongWritable> values = new ArrayList<LongWritable>();
values.add(new LongWritable(1));
values.add(new LongWritable(1));
reduceDriver.withInput(new Text("cat"), values);
reduceDriver.withOutput(new Text("cat"), new LongWritable(2));
reduceDriver.runTest();
The input and output of the reducer are as follows.
Input:
cat 1,1
Output:
cat 2
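The reducer receives all the counts for one word together because the framework groups the map output by key before calling reduce. The sketch below imitates that grouping and the summing in plain Java. It is an illustration only: ReduceStepSketch and its method names are hypothetical, and plain Long values stand in for the Hadoop value type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Stand-in for the grouping and reduce steps using plain Java types. */
final class ReduceStepSketch {

    /** Groups emitted (word, count) pairs by word, sorted by word. */
    static Map<String, List<Long>> group(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), key -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Mirrors the reducer body: sums every count received for one word. */
    static long reduce(Iterable<Long> counts) {
        long n = 0;
        for (long count : counts) {
            n += count;
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical map output for the line "cat cat dog".
        List<Map.Entry<String, Long>> mapOutput =
                List.of(Map.entry("cat", 1L), Map.entry("cat", 1L), Map.entry("dog", 1L));
        group(mapOutput).forEach((word, counts) ->
                System.out.println(word + "\t" + reduce(counts)));
    }
}
```

Here the two pairs for cat are grouped into one list, which is why the reducer test feeds the key cat with the values 1 and 1 and expects 2.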
Run the reducer test the same way you ran the map test. The result of the test run is shown below.
Great! We have written the map and reduce parts and tested them separately. Now it’s time to test them together.
Open the file MapReduceMapReduceTest and run it as a JUnit test, as we did for the map and reduce parts.
We have tested our code. Now it’s time to run it. Before running it on the cluster, we will run it locally on a small set of test data.
Open the input folder, where you will find a file named sample.txt. Write some text in this file to use for testing. I have written some text as shown below.
To run the job locally, execute the test defined in the WordCountDriverTest.java file by selecting Run As -> JUnit Test. The output will be as follows.
But where is the reducer output? There is a folder named output. Check this folder and you will see some new files created by the run, as shown below. The file part-r-00000 contains the reducer output.
Great, you have run the job locally and confirmed the output. It should match the expected output, which is already provided in the resources folder.
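If you would rather compare the two files programmatically than by eye, something along these lines could work. It is a rough sketch, not part of the lab project: the class OutputCheckSketch is hypothetical, both file paths are passed as arguments so nothing about the project layout is assumed, and it expects every non-empty line to have the word, a tab, and the count.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Compares two word count output files line format "word<TAB>count". */
final class OutputCheckSketch {

    /** Parses "word<TAB>count" lines into a sorted map, skipping empty lines. */
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\t", 2);
            if (fields.length != 2) {
                throw new IllegalArgumentException("Expected word<TAB>count but got: " + line);
            }
            counts.put(fields[0], Long.parseLong(fields[1].trim()));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: OutputCheckSketch <actual-file> <expected-file>");
            return;
        }
        Map<String, Long> actual = parse(Files.readAllLines(Path.of(args[0])));
        Map<String, Long> expected = parse(Files.readAllLines(Path.of(args[1])));
        System.out.println(actual.equals(expected) ? "output matches" : "output differs");
    }
}
```

Comparing parsed maps rather than raw text means a difference in line order or a trailing blank line does not cause a false mismatch.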
Now let’s run it on the cluster. To do so, we need the program’s jar file. To generate it, right-click the project and select Run As -> Maven Install.
This creates a jar file in the /root/Desktop/wordcount/target folder of your code, as shown above. Now open the terminal as shown below.
Type the following commands to see the files in the target directory, as shown above.
cd Desktop/wordcount/target/
ls
The next step is to provide the input files that this jar will process. To do this, let's create an input file using the following procedure.
Enter the following command to create a new, empty file in the same location, i.e. the target folder.
vim.tiny sample.txt
After typing the above command, press Enter. You will see the following screen.
Press “i” to enter insert mode and type some text, as shown below.
When you have finished writing, it is time to save the file. Press the Escape key and type :wq, as shown below.
The file is saved as sample.txt. To see the contents of a file on the server, you need to be logged in to the server.
You can view the contents of the file by entering the following command, as shown below.
cat sample.txt
The file we need to process is now ready. Before it can be processed, it must be copied to the Hadoop file system (HDFS). First, let’s create a directory on HDFS to hold the file.
To create the directory, use the following command.
hadoop fs -mkdir /user/root/example
To view the newly created directory, use the following command.
hadoop fs -ls | grep example
The screenshot is shown below.
Now copy the sample.txt file to this location using the following command.
hadoop fs -put sample.txt /user/root/example/input.txt
To confirm the operation, use the following command to check that the file exists on HDFS.
hadoop fs -ls /user/root/example/
The screenshot is shown below.
To view the contents of the file we just placed on HDFS, use the following command.
hadoop fs -cat /user/root/example/input.txt
The screenshot is given below.
Now everything is ready and it’s time to launch the job. To launch it, use the following command.
hadoop jar wordcount-0.0.1-SNAPSHOT.jar com.platalytics.wordcount.Driver /user/root/example/input.txt /user/root/example/output
Congratulations! You have completed the MapReduce cluster job. The last step is to verify the output.
The following command lists the output files generated by the job.
hadoop fs -ls /user/root/example/output
The following command prints the reducer output.
hadoop fs -cat /user/root/example/output/part-r-00000
This is the end of the lab.