Word Count Hadoop Program (in Java & Python)

Today we want to share the word count program in Hadoop. This post shows a MapReduce programming example in Java, explains how to run wordcount from hadoop-mapreduce-examples.jar, and gives an equivalent example in Python.

How do you run the wordcount program in Hadoop using Eclipse?

Word Count Code:

package org.myorg;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
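public class WordCount {

 // Mapper: emits (word, 1) for every whitespace-separated token in a line.
 public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text alltxt = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // copy the next token into the reusable Text object before emitting it
            alltxt.set(tokenizer.nextToken());
            context.write(alltxt, one);
        }
    }
 }

 // Reducer: sums all the counts emitted for one word.
 public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
 }

 public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // On Hadoop 2 and later, Job.getInstance(conf, "wordcount") replaces this deprecated constructor.
    Job job = new Job(conf, "wordcount");
    job.setJarByClass(WordCount.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    // args[0] is the input path, args[1] is the output path (must not exist yet)
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
 }
}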
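
To make the data flow of the Java job easier to follow, here is a minimal Python sketch that imitates the three phases in memory: map emits (word, 1) pairs, a shuffle step groups the pairs by word, and reduce sums each group. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for this illustration and are not part of Hadoop; a real job spreads these phases across the cluster instead of running them in one process.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Sum the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Chain the three phases (hypothetical helper, in-memory only)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["hello hadoop", "hello world"]
    for word, total in sorted(word_count(sample).items()):
        print("%s\t%d" % (word, total))
```

Like the StringTokenizer in the Java mapper, line.split() breaks only on whitespace, so "Hello" and "hello," would be counted as different words.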

To run the example, the command syntax is:

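bin/hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>

If the input is on the local file system rather than in HDFS, copy it into HDFS first:

bin/hadoop dfs -mkdir <hdfs-dir>  //not required in hadoop 0.17.2 and later
bin/hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>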

Word Count example in Python:


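# mapper.py: reads lines from stdin and emits "word<TAB>1" for every word
import sys

for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into alltxts (words)
    alltxts = line.split()
    # emit a count of 1 for each word; the reducer does the summing
    for alltxt in alltxts:
        print('%s\t%s' % (alltxt, 1))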


# reducer.py: sums the counts for each word; expects input sorted by word
import sys

current_alltxt = None
current_count = 0
alltxt = None

for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # parse the input we got from mapper.py
    alltxt, count = line.split('\t', 1)
    # convert count (currently a string) to int
    try:
        count = int(count)
    except ValueError:
        # count was not a number, so silently
        # ignore/discard this line
        continue
    # this comparison only works because the input is sorted by word
    if current_alltxt == alltxt:
        current_count += count
    else:
        if current_alltxt:
            # a new word has started, so write out the previous one
            print('%s\t%s' % (current_alltxt, current_count))
        current_count = count
        current_alltxt = alltxt

# write out the last word, if there was any input at all
if current_alltxt:
    print('%s\t%s' % (current_alltxt, current_count))

The two scripts above can be run locally, without Hadoop, using:

cat filename.txt | python mapper.py | sort -k1,1 | python reducer.py
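
The sort -k1,1 step in that pipeline matters because reducer.py only compares each word with the previous one, so equal words have to arrive next to each other. Here is a small sketch of the same idea using itertools.groupby, which also groups only adjacent equal keys. The helper name reduce_adjacent is hypothetical and just for illustration.

```python
from itertools import groupby
from operator import itemgetter


def reduce_adjacent(pairs):
    """Sum counts over runs of adjacent equal words, like reducer.py does."""
    return [
        (word, sum(count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    ]


if __name__ == "__main__":
    pairs = [("b", 1), ("a", 1), ("b", 1)]  # mapper output, unsorted
    print(reduce_adjacent(pairs))           # "b" is reported twice
    print(reduce_adjacent(sorted(pairs)))   # one total per word
```

Without the sort, the word "b" is emitted twice with partial counts; with it, each word gets a single total. In a real Hadoop job the framework does this sorting between the map and reduce phases.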

I hope this gives you an idea of how to write a word count program in Hadoop using Python.
