Tuesday, November 17, 2015

IO Operations in Hadoop


Consider the below diagram


Take the sample WordCount example, where most of the words are repeated half a million times or more.
In that case, after the Mapper phase, each mapper's output will contain word records numbering in the range of half a million.
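
To make the size of the mapper output concrete, here is a minimal Python sketch of what a WordCount mapper emits. It is not Hadoop code: the function name `map_words` and the tiny input split are assumptions made for illustration, and it assumes no combiner is configured, so one record is emitted per word occurrence.

```python
from collections import Counter


def map_words(line):
    """Emit one (word, 1) pair per word, the way a WordCount mapper does."""
    return [(word, 1) for word in line.split()]


# Hypothetical input split: three words, each repeated 1000 times.
split = ["hadoop hive pig"] * 1000

# Everything the mapper emits for this split.
mapper_output = [pair for line in split for pair in map_words(line)]

# Count how many records were emitted for each word.
occurrences = Counter(word for word, _ in mapper_output)

print(len(mapper_output))     # total records emitted by this mapper
print(occurrences["hadoop"])  # records emitted for a single word
```

Scaling the repeat count up to half a million shows why the data moved from the Mapper to S & S can become large.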

While the data is being transferred from the Mapper to Sort and Shuffle (S & S), network bandwidth limits or connection issues may prevent one or more mapper outputs (the Mapper 2 output in the above case) from reaching Sort and Shuffle. If nothing more were done, the complete MR job would fail.

To avoid this situation, Mapper output is always stored in the Local File System (LFS) until the MR job completes. If, because of any of the issues mentioned above, a mapper's output does not reach the Sort and Shuffle phase, the copy stored in the LFS is read and resent to the S & S phase.
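
The store-then-resend idea can be sketched in plain Python as follows. This is only a simulation under stated assumptions, not Hadoop's actual shuffle implementation: `write_map_output`, `fetch_with_retry`, and the `FlakyLink` class are hypothetical names, a temporary directory stands in for the LFS, and the "network" is a stub that drops the first transfer.

```python
import json
import os
import tempfile


def write_map_output(local_dir, mapper_id, pairs):
    """Persist one mapper's output on local disk (stand-in for the LFS)."""
    path = os.path.join(local_dir, f"mapper_{mapper_id}.json")
    with open(path, "w") as f:
        json.dump(pairs, f)
    return path


def fetch_with_retry(path, send, max_attempts=3):
    """Re-read the stored output and resend it until one transfer succeeds."""
    for attempt in range(1, max_attempts + 1):
        with open(path) as f:
            pairs = json.load(f)
        if send(pairs):
            return attempt
    raise RuntimeError("map output could not be delivered")


class FlakyLink:
    """Hypothetical network link that drops the first transfer."""

    def __init__(self):
        self.calls = 0
        self.received = None

    def send(self, pairs):
        self.calls += 1
        if self.calls == 1:
            return False  # simulated connection issue
        self.received = pairs
        return True


local_dir = tempfile.mkdtemp()
path = write_map_output(local_dir, 2, [["hadoop", 1], ["hive", 1]])
link = FlakyLink()
attempts = fetch_with_retry(path, link.send)
print(attempts)  # number of transfers needed, including the dropped one
```

Because the mapper output is still on local disk after the first failed transfer, the second attempt can resend it without rerunning the mapper.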

Points to remember
  • This storage step applies only to the Mapper in MapReduce.
  • It is NOT available for storing Sort and Shuffle output or Reducer output.
  • The Mapper output lives only until the job completes: whether the job ends in success or failure, the local copies of the mapper output are removed automatically.
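
The lifetime rule in the last point can be illustrated with a small Python sketch. It is an analogy rather than Hadoop's cleanup code: `run_job`, `ok_job`, and `failing_job` are hypothetical names, and a temporary directory plays the role of the local mapper output.

```python
import os
import shutil
import tempfile


def run_job(job_body):
    """Run job_body with a local scratch dir that is deleted on success or failure."""
    local_dir = tempfile.mkdtemp(prefix="map_output_")
    try:
        job_body(local_dir)
        status = "SUCCEEDED"
    except Exception:
        status = "FAILED"
    finally:
        # The local copies do not outlive the job, whatever its outcome.
        shutil.rmtree(local_dir, ignore_errors=True)
    return status, local_dir


def ok_job(local_dir):
    """Write some mapper output and finish normally."""
    with open(os.path.join(local_dir, "part-0"), "w") as f:
        f.write("hadoop\t1\n")


def failing_job(local_dir):
    """Write some mapper output, then fail."""
    ok_job(local_dir)
    raise RuntimeError("simulated job failure")


ok_status, ok_dir = run_job(ok_job)
bad_status, bad_dir = run_job(failing_job)
print(ok_status, os.path.exists(ok_dir))
print(bad_status, os.path.exists(bad_dir))
```

In both runs the scratch directory is gone once the job has finished, which mirrors the rule that mapper output is kept only for the duration of the job.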


Certification Note: Mapper output copies will always be stored in LFS.

