Word Count Hadoop Program (in Java & Python)

Today we want to share a word count program for Hadoop. This post shows a MapReduce word count example in Java, explains how to run the wordcount job from the Hadoop examples jar, and then gives the same example in Python as a mapper script and a reducer script.

How to run the WordCount program in Hadoop using Eclipse?


Word Count Code:

package org.myorg;
        
import java.io.IOException;
import java.util.*;
        
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
        
public class WordCount {
        
 // Input: (byte offset, line of text). Output: (word, 1).
 public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text alltxt = new Text();
        
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            alltxt.set(tokenizer.nextToken());
            context.write(alltxt, one);
        }
    }
 } 
        
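 // Input: (word, list of counts). Output: (word, total count).
 public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context) 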
      throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
 }
        
 public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
        
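    // Job.getInstance replaces the deprecated Job constructor (Hadoop 2 and later)
    Job job = Job.getInstance(conf, "wordcount");
    // tells Hadoop which jar to ship to the cluster nodes
    job.setJarByClass(WordCount.class);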
    
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
        
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
        
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
        
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
        
    // exit with a non-zero status if the job fails
    System.exit(job.waitForCompletion(true) ? 0 : 1);
 }
        
}
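
The data flow of the job above can be followed without a cluster. The following is a minimal Python sketch, not part of the Hadoop API: the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical and only mirror the roles of the Map class, the framework's grouping step, and the Reduce class.

```python
from collections import defaultdict


def map_phase(lines):
    """Mimic Map.map(): emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle(pairs):
    """Mimic the framework's shuffle: collect all values per key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Mimic Reduce.reduce(): sum the values collected for each key."""
    return [(key, sum(values)) for key, values in grouped]


def word_count(lines):
    """Run the three phases in order on an iterable of text lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["hello hadoop", "hello world"]
    for word, count in word_count(sample):
        print("%s\t%d" % (word, count))
```

In the real job the shuffle step is done by Hadoop between the map and reduce tasks, which is why the Java code contains no grouping logic of its own.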

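To run the wordcount example bundled in the Hadoop examples jar, the command syntax is:

bin/hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>

All files in the input directory (<in-dir>) are read, and the word counts are written to the output directory (<out-dir>). Both directories are expected to be in HDFS. If the input is on the local file system, copy it into HDFS first:

bin/hadoop dfs -mkdir <hdfs-dir>  //not required in hadoop 0.17.2 and later
bin/hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>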

Word Count example in Python:

mapper.py

import sys

for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    alltxts = line.split()
    # emit one "word<TAB>1" line per word; the reducer sums these counts
    for alltxt in alltxts:
        print('%s\t%s' % (alltxt, 1))
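
The mapper above writes one line per word occurrence. One optional variation, which is an addition here and not part of the original script, is to sum the counts inside the mapper before emitting them, so that less data is passed to the reducer. The sketch below assumes the set of distinct words fits in memory; the helper name count_words is hypothetical. reducer.py converts the count field with int(), so it also accepts counts greater than 1.

```python
import sys
from collections import Counter


def count_words(lines):
    """Return a Counter of whitespace-separated words across all given lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def main():
    # Emit one "word<TAB>count" line per distinct word seen by this mapper.
    for word, count in count_words(sys.stdin).items():
        print("%s\t%d" % (word, count))


if __name__ == "__main__":
    main()
```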

reducer.py

import sys

current_alltxt = None
current_count = 0
alltxt = None

# Input must be sorted by word (Hadoop's shuffle or `sort` does this),
# so that all lines for the same word are adjacent.
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    try:
        # parse the input we got from mapper.py
        alltxt, count = line.split('\t', 1)
        # convert count (currently a string) to int
        count = int(count)
    except ValueError:
        # line had no tab or count was not a number, so silently
        # ignore/discard this line
        continue
    if current_alltxt == alltxt:
        current_count += count
    else:
        if current_alltxt:
            # the word changed: emit the total for the previous word
            print('%s\t%s' % (current_alltxt, current_count))
        current_count = count
        current_alltxt = alltxt

# emit the last word, unless no valid input was read
if current_alltxt is not None:
    print('%s\t%s' % (current_alltxt, current_count))

The above scripts can be tested locally, without Hadoop, using:

cat filename.txt | python mapper.py | sort -k1,1 | python reducer.py
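
The sort -k1,1 step in this pipeline is what makes reducer.py correct: the reducer only compares each word with the previous one, so equal words must arrive on adjacent lines. On a cluster, Hadoop's shuffle performs this sorting. The sketch below illustrates the effect under the assumption that the reducer logic is expressed with itertools.groupby, which likewise groups only adjacent equal keys. The helper reduce_lines is hypothetical and is not part of reducer.py; it expects well-formed "word<TAB>count" lines.

```python
from itertools import groupby


def reduce_lines(lines):
    """Sum counts for runs of adjacent lines that share the same word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    result = []
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        result.append((word, sum(int(count) for _, count in group)))
    return result


mapper_output = ["b\t1", "a\t1", "b\t1"]

# Unsorted input: the two "b" lines are not adjacent, so "b" is reported twice.
unsorted_result = reduce_lines(mapper_output)

# Sorted input, as produced by `sort -k1,1`: each word is reported once.
sorted_result = reduce_lines(sorted(mapper_output))

print(unsorted_result)
print(sorted_result)
```

With the unsorted list, "b" appears in two separate groups with a count of 1 each; after sorting, it appears once with a count of 2.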

I hope this gives you an idea of how to write a word count program in Hadoop using Java and Python.
I would like to have feedback on my infinityknow.com blog.
Your valuable feedback, question, or comments about this article are always welcome.
If you enjoyed and liked this post, don’t forget to share.
