Wednesday, December 9, 2015

Fetching Twitter data using Flume



After installing Flume, as described in the other post on this blog, follow the steps below to fetch Twitter data using Flume.

Setup for Twitter























Sign in to your Twitter account.






















Click on Create New App



Fill in the details as shown below.

Note: Website must be a fully qualified URL,
e.g., http://www.google.com instead of google.com.



Tick "Yes, I agree" and click Create your Twitter application.























The application will be created as shown below.





















Click on the created application and select the Keys and Access Tokens tab.





















Scroll down and click Create my access token.





















If the access token is created successfully, you will see the message below as the Status.





















Scroll down and you will see the Access Token and Access Token Secret.






















The four highlighted values in the two screenshots above are needed in the Flume configuration file (shown in the screenshot below).

Create or copy the conf file into the conf folder of Apache Flume as shown below.
The name of the conf file is up to you; I used flume.conf.






















Add the details to flume.conf as below.

consumerKey == Consumer Key from Twitter
consumerSecret == Consumer Secret from Twitter
accessToken == Access Token from Twitter
accessTokenSecret == Access Token Secret from Twitter
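
For readers who cannot see the screenshot, here is a minimal sketch of what such a flume.conf might look like, written out from the shell. It is not the exact file used in this post: the agent name TwitterAgent, the component names Twitter, MemChannel and HDFS, the channel sizes, and the HDFS path are all assumptions, and the source type is the Twitter source bundled with Apache Flume. Replace the four placeholder values with your own keys.

```shell
# Sketch only: agent/component names, sizes and the HDFS path are assumptions.
# Note: this overwrites any existing flume.conf in the target folder.
FLUME_CONF_DIR="${FLUME_CONF_DIR:-./conf}"
mkdir -p "$FLUME_CONF_DIR"

cat > "$FLUME_CONF_DIR/flume.conf" <<'EOF'
# Name the components of this agent (names are hypothetical)
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Twitter source: paste the four values from the Keys and Access Tokens tab
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <Consumer Key from Twitter>
TwitterAgent.sources.Twitter.consumerSecret = <Consumer Secret from Twitter>
TwitterAgent.sources.Twitter.accessToken = <Access Token from Twitter>
TwitterAgent.sources.Twitter.accessTokenSecret = <Access Token Secret from Twitter>

# In-memory channel between source and sink
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000

# HDFS sink: the path below is a placeholder for your own cluster
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
EOF

echo "Wrote $FLUME_CONF_DIR/flume.conf"
```

The agent name used as the prefix on every line must match the name passed when starting Flume.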























Run the command below to fetch the data from Twitter.
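
In text form, starting the agent could look roughly like the sketch below. The install location in FLUME_HOME and the agent name TwitterAgent are assumptions; the agent name must match the prefix used inside your flume.conf.

```shell
# Assumptions: Flume lives under FLUME_HOME and the agent is called TwitterAgent.
FLUME_HOME="${FLUME_HOME:-/usr/local/flume}"
AGENT_NAME="TwitterAgent"

if [ -x "$FLUME_HOME/bin/flume-ng" ]; then
  # --conf      : folder holding flume-env.sh and the log4j settings
  # --conf-file : the agent configuration created earlier
  # --name      : agent name, same as the prefix in flume.conf
  "$FLUME_HOME/bin/flume-ng" agent \
    --conf "$FLUME_HOME/conf" \
    --conf-file "$FLUME_HOME/conf/flume.conf" \
    --name "$AGENT_NAME" \
    -Dflume.root.logger=INFO,console
else
  echo "flume-ng not found under $FLUME_HOME/bin; set FLUME_HOME first" >&2
fi
```

Logging to the console with -Dflume.root.logger=INFO,console makes authentication errors visible right away. Stop the agent with Ctrl+C.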






















If everything goes right, the output appears as below in the HDFS location specified in the flume.conf file.
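
One way to check the output from the command line is sketched here. The directory /user/flume/tweets is a hypothetical path; use whatever HDFS location your flume.conf points to.

```shell
# Assumption: the HDFS sink writes to /user/flume/tweets (hypothetical path).
TWEETS_DIR="${TWEETS_DIR:-/user/flume/tweets}"

if command -v hdfs >/dev/null 2>&1; then
  # List the files Flume has written so far
  hdfs dfs -ls "$TWEETS_DIR"
  # Show how much data has accumulated, in human-readable units
  hdfs dfs -du -h "$TWEETS_DIR"
else
  echo "hdfs command not found; run this on a machine with the Hadoop client" >&2
fi
```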























Error Rectifications

While executing the Flume command, you may get the following error.

The error says: ensure that you have set a valid consumer key/secret and access token/secret, and that the system clock is in sync.





















Resolution 1: First, check that the consumer key/secret and access token/secret in flume.conf match the values shown by Twitter.

Resolution 2: The host OS and the guest OS may have different time settings, leaving the guest clock out of sync,
e.g., my host OS (Windows 8) is set to India time (IST) while the guest OS is set to a US time zone.

To resolve the time problem, do the following.

Log in as the root user and run:
$ service ntp stop => This will stop the ntp service
$ ntpdate ntp.ubuntu.com => This will sync the Guest OS clock with the NTP server ntp.ubuntu.com
$ service ntp start => This will start the ntp service again
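
Twitter's OAuth requests carry a timestamp, so what matters is that the guest clock shows the correct current time, not which time zone it displays. A quick way to check, sketched below, is to print the clock in UTC and ask the NTP server how far off it is. The -q option only queries and does not change the clock.

```shell
# Print the guest clock in UTC and the configured time zone
NOW_UTC="$(date -u '+%Y-%m-%d %H:%M:%S UTC')"
echo "Guest clock: $NOW_UTC"
echo "Time zone:   $(date '+%Z %z')"

# Query the same NTP server used above; the reported offset should be close to zero
if command -v ntpdate >/dev/null 2>&1; then
  ntpdate -q ntp.ubuntu.com || echo "could not query the NTP server" >&2
else
  echo "ntpdate is not installed; compare the UTC time above with a trusted clock" >&2
fi
```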




Then re-execute the Flume command.
