Word Count hadoop Program(in Java & Python)

Rate this post

Today, We want to share with you wordcount program in hadoop.In this post we will show you mapreduce programming in java examples, hear for hadoop-mapreduce-examples.jar wordcount we will give you demo and example for implement.In this post, we will learn about Add Word Count to Single Posts in WordPress with an example.

how to run wordcount program in hadoop using eclipse?

Word Count Code:

package org.myorg;
        
import java.io.IOException;
import java.util.*;
        
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
        
public class WordCount {
        
 public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text alltxt = new Text();
        
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            alltxt.set(tokenizer.nextToken());
            context.write(alltxt, one);
        }
    }
 } 
        
 public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context) 
      throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
 }
        
 public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
        
        Job job = new Job(conf, "wordcount");
    
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
        
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
        
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
        
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
        
    job.waitForCompletion(true);
 }
        
}

To run the example, the command syntax is:

bin/hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>

bin/hadoop dfs -mkdir <hdfs-dir> //not required in hadoop 0.17.2 and later
bin/hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>

Free Live Chat for Any Issue

Word Count example in Python:

mapper.py

import sys 
for line in sys.stdin: 
    # remove leading and trailing whitespace 
    line = line.strip() 
    # split the line into alltxts 
    alltxts = line.split() 
    # increase counters 
    for alltxt in alltxts: 
        print '%s\t%s' % (alltxt, 1)

reducer.py

import sys
current_alltxt = None
current_count = 0
alltxt = None
for line in sys.stdin:
    # remove leading and trailing whitespaces
    line = line.strip()
    # parse the input we got from mapper.py
    alltxt, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue
    if current_alltxt == alltxt:
        current_count += count
    else:
        if current_alltxt:
            print '%s\t%s' % (current_alltxt, current_count)
        current_count = count
        current_alltxt = alltxt
if current_alltxt == alltxt:
    print '%s\t%s' % (current_alltxt, current_count)

The above program can be run using cat filename.txt | python mapper.py | sort -k1,1 | python reducer.py

Download

I hope you get an idea about word count program in hadoop using python.
I would like to have feedback on my infinityknow.com blog.
Your valuable feedback, question, or comments about this article are always welcome.
If you enjoyed and liked this post, don’t forget to share.