Word count program in Pig

Saturday, January 16, 2016

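 -- Load each line of the input file as a single chararray field
 inputdata = LOAD 'Input-Big.txt' AS (line:chararray);
 -- Split each line into tokens and emit one record per token
 words = FOREACH inputdata GENERATE FLATTEN(TOKENIZE(line)) AS word;
 -- Keep only tokens made up entirely of word characters
 filtered_words = FILTER words BY word MATCHES '\\w+';
 -- Group identical words together
 word_groups = GROUP filtered_words BY word;
 -- Count the occurrences in each group
 word_count = FOREACH word_groups GENERATE group AS word, COUNT(filtered_words) AS count;
 -- Sort by count, highest first
 ordered_word_count = ORDER word_count BY count DESC;
 -- Write the result to the output directory
 STORE ordered_word_count INTO 'PigWordCount';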

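The above Pig script does the following:

In the first statement, the input file is loaded into the relation inputdata, with each line held as a single chararray field.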
In the second statement, each line is split into words using the TOKENIZE function, which creates a bag of words. The FLATTEN operator then un-nests the bag so that each word becomes a separate record.
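In the third statement, the words are filtered so that only tokens consisting entirely of word characters (letters, digits and underscores) are kept. Tokens containing anything else, such as trailing punctuation, are dropped.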
In the fourth statement, the filtered words are grouped by word so that the count can be computed in the fifth statement.
In the fifth statement, the occurrences of each word are counted.
In the sixth statement, the result is sorted by count in descending order.
Finally, the sorted list is stored in an output directory named 'PigWordCount'.
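
To make the data flow concrete, here is a rough Python sketch that mimics the same steps on a tiny in-memory sample. It is only an illustration, not Pig itself. The function name word_count and the sample lines are made up for this example. The delimiter set (space, double quote, comma, parentheses, star) is my understanding of TOKENIZE's defaults, and the alphabetical tie-break is an assumption added to make the output deterministic, since Pig does not promise any order among equal counts.

```python
import re
from collections import Counter


def word_count(lines):
    """Mimic the Pig word count script on a list of strings (illustrative only)."""
    # TOKENIZE + FLATTEN: split on the assumed default delimiters and
    # emit one record per non-empty token.
    words = [tok for line in lines
             for tok in re.split(r'[ ",()*]+', line) if tok]
    # FILTER ... MATCHES '\\w+': keep tokens made up entirely of
    # ASCII letters, digits and underscores.
    filtered = [w for w in words if re.fullmatch(r"\w+", w, flags=re.ASCII)]
    # GROUP ... BY word, then COUNT: tally each distinct word.
    counts = Counter(filtered)
    # ORDER ... BY count DESC (ties broken alphabetically; an assumption).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample input standing in for 'Input-Big.txt'.
sample = ["the quick brown fox", "the lazy dog, the end."]
result = word_count(sample)
for word, count in result:
    print(word, count)
```

Note that "end." never reaches the counting step in this sketch: the full stop is not a delimiter, so the token fails the \w+ match. The Pig script drops tokens like this in the same way, which is worth keeping in mind when the input text contains punctuation.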
