Saturday, April 21, 2018

My first Spark Program for Word count using Scala

package com.sample.wc

import org.apache.spark.sql.SparkSession
import org.apache.commons.io.FileUtils
import java.io.File

object WordCount {
  def main(args: Array[String]): Unit = {
    // Create the SparkSession; master("local") runs Spark locally with one worker thread
    val spark = SparkSession.builder().master("local").appName("Word Count").getOrCreate()
    
    // Read the text file (path given in args(0)) and convert it to an RDD of lines
    val data = spark.read.textFile(args(0)).rdd
    
    // Split each line on whitespace and drop empty tokens, so repeated spaces do not produce "" as a word
    val wordsSplits = data.flatMap(line => line.split("\\s+")).filter(_.nonEmpty)
    
    // Map each word to the pair (word, 1) so the counts can be summed per word
    val wordMaptoOne = wordsSplits.map(value => (value, 1))
    
    // Sum the 1s for each distinct word to get its count
    val count = wordMaptoOne.reduceByKey(_ + _)

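    // Delete the output directory if it already exists, because saveAsTextFile fails on an existing path
    // (java.io.File only covers the local filesystem)
    FileUtils.deleteDirectory(new File(args(1)))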

    // Save the (word, count) pairs as text files under the output directory (path given in args(1))
    count.saveAsTextFile(args(1))

    // Stop the SparkSession
    spark.stop()
  }
}
Command to execute the JAR file:

bin/spark-submit --class com.sample.wc.WordCount WordCounts.jar text.txt output
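
To see what the split, map-to-one and reduce-by-key steps produce without starting Spark, the same pipeline can be sketched with ordinary Scala collections. This is only an illustration: the object name `WordCountLocalSketch`, the helper `countWords` and the sample sentences are made up for this sketch and are not part of the program above, and `groupBy` followed by a sum stands in for Spark's `reduceByKey`.

```scala
object WordCountLocalSketch {
  // Hypothetical helper: mirrors flatMap -> map -> reduceByKey
  // using plain Scala collections instead of an RDD.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))        // split every line on whitespace
      .filter(_.nonEmpty)              // drop empty tokens
      .map(word => (word, 1))          // pair each word with 1
      .groupBy(_._1)                   // group the pairs by word
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the 1s

  def main(args: Array[String]): Unit = {
    // Made-up sample input, standing in for the lines of text.txt
    val sample = Seq("to be or not to be", "be quick")
    // Each pair prints in the form (word,count)
    countWords(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The Spark job writes its results in the same `(word,count)` form, one pair per line, into part files under the output directory.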

10 comments:

  1. This comment has been removed by the author.

  2. Great post! I was hoping to come across this information, and it is very helpful. The post is full of valuable information. Keep up the good work; you are doing well.


  3. The post is written in a very good manner and contains a lot of useful information for me. I am happy to find your distinguished way of writing the post. Now you make it easy for me to understand and implement the concept.

  4. I wanted to thank you for this great read!! I am definitely enjoying every little bit of it. I have you bookmarked to check out the new stuff you post.
  5. Great post!
    Thanks for posting; it was really helpful!
  6. Great article, thank you for sharing this awesome blog with us.

    Thank you so much, keep updating...

  7. A simple hello-world program in Spark, simply explained using the Scala language.
    Thanks, Pranav, for sharing your knowledge. Please share more examples; there has been no big data article here for the last three years.

    Thanks & Regards
    Venu