After installing Flume (covered in another post on this blog), follow the steps below to fetch Twitter data using Flume.
Setup for Twitter
Sign in to your Twitter account and start creating a new application.
Note: The Website field must be a fully qualified URL,
e.g., http://www.google.com instead of google.com.
Tick "Yes, I agree" and click Create your Twitter application.
The application will then be created.
Click on the created application and select tab Keys and Access Tokens.
Scroll down and click Create my access token.
If the access token is created successfully, a confirmation message is shown as the Status.
Scroll down and you will see the Access Token and Access Token Secret.
The four values shown on this tab (Consumer Key, Consumer Secret, Access Token, and Access Token Secret) are needed in the Flume configuration file.
Create the conf file in, or copy it into, the conf folder of Apache Flume.
The name of the conf file is up to you. I used flume.conf in my case.
Add the details to flume.conf as follows.
consumerKey == Consumer Key from Twitter
consumerSecret == Consumer Secret from Twitter
accessToken == Access Token from Twitter
accessTokenSecret == Access Token Secret from Twitter
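As a rough sketch of what such a file can look like, the snippet below writes a minimal flume.conf from the shell. The agent name TwitterAgent, the component names Twitter, MemChannel and HDFS, the source class (the Twitter source that ships with Apache Flume), the channel sizes, and the HDFS path are all assumptions for illustration, so adjust them to your own setup. The four placeholders in angle brackets stand for the values copied from Twitter.

```shell
# Where to write the file; a hypothetical default, normally <flume dir>/conf/flume.conf
CONF_FILE="${CONF_FILE:-flume.conf}"

# Quoted 'EOF' keeps the shell from expanding anything inside the file body
cat > "$CONF_FILE" <<'EOF'
# Name the source, channel, and sink of the agent (names are assumptions)
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Twitter source: paste the four values from the Keys and Access Tokens tab
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <Consumer Key>
TwitterAgent.sources.Twitter.consumerSecret = <Consumer Secret>
TwitterAgent.sources.Twitter.accessToken = <Access Token>
TwitterAgent.sources.Twitter.accessTokenSecret = <Access Token Secret>

# In-memory channel between source and sink (sizes are assumptions)
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

# HDFS sink (the path is a hypothetical example)
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
EOF

echo "Wrote $CONF_FILE"
```

Keeping the real keys out of anything you share publicly is a good idea, since they give access to your Twitter application.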
Run the Flume agent with this conf file to fetch the data from Twitter.
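A typical invocation might look like the sketch below. The agent name TwitterAgent and the conf/flume.conf location are assumptions and must match your own file. The FLUME_HOME variable is also just a convention used here. The optional logger setting follows the quick-start in the Flume user guide and prints the log to the console, which makes errors easier to spot.

```shell
# Assumed names: must match the agent name and file used in your setup
AGENT_NAME="TwitterAgent"
CONF_DIR="${FLUME_HOME:-.}/conf"
CONF_FILE="$CONF_DIR/flume.conf"

if command -v flume-ng >/dev/null 2>&1; then
  # --conf: directory with Flume's own settings, --conf-file: the agent definition,
  # --name: which agent in that file to start
  flume-ng agent --conf "$CONF_DIR" --conf-file "$CONF_FILE" \
    --name "$AGENT_NAME" -Dflume.root.logger=INFO,console
else
  echo "flume-ng not found on PATH; add the Flume bin folder to PATH first" >&2
fi
```

The agent keeps running until you stop it (for example with Ctrl+C).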
If everything goes right, the output appears at the HDFS location specified in the flume.conf file.
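One way to check from the command line is to list that HDFS folder. The path below is only a hypothetical example. Use whatever you set as the HDFS path in your flume.conf.

```shell
# Hypothetical path; replace with the HDFS path from your flume.conf
TWEETS_DIR="${TWEETS_DIR:-/user/flume/tweets}"

if command -v hdfs >/dev/null 2>&1; then
  # List the files the Flume agent has written so far
  hdfs dfs -ls "$TWEETS_DIR" || echo "Could not list $TWEETS_DIR" >&2
else
  echo "hdfs command not found; run this where the Hadoop client is installed" >&2
fi
```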
Error Rectifications
While executing the Flume command, you may get the following error.
The error says to ensure that you have set a valid consumer key/secret and access token/secret, and that the system clock is in sync.
Resolution 1: First of all, check that the consumer key/secret and access token/secret in flume.conf match the values shown by Twitter.
Resolution 2: The host OS and the guest OS may be set to different times or time zones,
e.g., my host OS (Windows 8) is set to India time (IST) while the guest OS uses a US time zone.
To resolve the time problem, do the following.
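Log in as the root user and run the following (the ntp service is stopped first because ntpdate cannot run while it is active):
$ service ntp stop => This will stop the ntp service
$ ntpdate ntp.ubuntu.com => This will sync the guest OS clock with the NTP server ntp.ubuntu.com
$ service ntp start => This will start the ntp service again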
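To confirm that the clock is now right, it can help to print the guest time in UTC and compare it with a trusted clock. This is only a small sketch. The reasoning behind it is that OAuth requests carry a timestamp in seconds since the epoch, so what has to match is the absolute time, whatever time zone is displayed.

```shell
# Read the guest clock in UTC and as seconds since the epoch
now_utc="$(date -u '+%Y-%m-%d %H:%M:%S')"
now_epoch="$(date -u '+%s')"

# Compare these with the host or another trusted clock; they should agree closely
echo "Guest clock (UTC): $now_utc"
echo "Epoch seconds:     $now_epoch"
```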
Then re-execute the Flume command.