To put search and social media into perspective before the post:
Today's ACM TechNews has an article from the Eberly College of Science on the analysis of tweets about a particular vaccine. It is available at
http://science.psu.edu/news-and-events/2013-news/Salathe4-2013 Tweets were analysed for terms relating to the vaccine or vaccination, and user opinion was then deduced. This is an example of using social media to derive trends, responses or acceptance.
As another example of the impact of internet search: earlier in December, during an informal Big Data meetup involving the Ministry of Health, Visa and SingTel, the head of the MoH software division commented that Google could find out whether there would be an outbreak of a particular disease, say flu, in a particular region. This is because people would be searching Google for medicines, symptoms, prevention and so on. These searches could be analyzed to deduce which health issues are more prevalent in a particular region.
Thinking along that line, Google does not know whether you actually bought the medicine, or anything else, after a search. Amazon or eBay, on the other hand, knows what you bought but does not know whether you bought the item after a search, or your affinity for that item. Standing out is Facebook, which knows everything about you and your friends but does not have the above two factors.
That said, here is a small program that demonstrates a Python Twitter API. It retrieves tweets on an interesting subject and processes them. The tags in each tweet are counted and displayed in ascending order. For example, if #Tottenham appears 20 times and #OldTrafford appears 40 times in the tweet stream, both come up in the list with their counts. Basically, it is a measure of what people are discussing and how popular the topic is with the public.
Objective: Retrieve tweets containing "premierleague", then find the most popular tags. You can search for any keyword in the same way.
Language used: Python 2.7, with Twython as the Twitter interface. Operating system: Ubuntu 12.04.
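The tag counting described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it is written for Python 3 rather than the Python 2.7 used in the project, it pulls hashtags out of plain tweet text with a regular expression, and the names count_hashtags, ascending and sample_tweets are made up for the example.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def count_hashtags(tweet_texts):
    """Count hashtags across tweet texts, ignoring case."""
    counts = Counter()
    for text in tweet_texts:
        for tag in HASHTAG_RE.findall(text):
            counts[tag.lower()] += 1
    return counts


def ascending(counts):
    """Return (tag, count) pairs from least to most frequent (ties by tag name)."""
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical sample data standing in for a real tweet stream.
sample_tweets = [
    "Great win at #OldTrafford today #premierleague",
    "#Tottenham looking sharp #premierleague",
    "Back to #OldTrafford next week",
]

for tag, count in ascending(count_hashtags(sample_tweets)):
    print("#%s: %d" % (tag, count))
```

Lower-casing the tags is a design choice of this sketch, so that #PremierLeague and #premierleague are counted together.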
Twython API core code snapshot for Twitter
Sample screen run on the keyword "premierleague". (Tag counts are printed by minute rather than by hour so that output is available frequently.)
The complete Eclipse Python project is available here as a zip. You will need a PostgreSQL database and Twython installed. The database table format is available here. You can also run TaTwitterQn/TwitterSearcher.py without the database to get tweets and show the tag counts.
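Grouping tag counts by minute could look something like the sketch below. It assumes each tweet is a dict in the shape of a Twitter search result, with a "created_at" string and an "entities" / "hashtags" list. The function name tags_by_minute and the sample data are hypothetical, and the code is written for Python 3, not the Python 2.7 of the project.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumption: timestamps look like "Sat Apr 06 14:01:10 +0000 2013".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tags_by_minute(statuses):
    """Map each minute to a Counter of the hashtags seen in that minute."""
    buckets = defaultdict(Counter)
    for status in statuses:
        created = datetime.strptime(status["created_at"], CREATED_AT_FORMAT)
        minute = created.replace(second=0, microsecond=0)
        for hashtag in status.get("entities", {}).get("hashtags", []):
            buckets[minute][hashtag["text"].lower()] += 1
    return buckets


# Hypothetical sample data standing in for real search results.
sample_statuses = [
    {"created_at": "Sat Apr 06 14:01:10 +0000 2013",
     "entities": {"hashtags": [{"text": "premierleague"}, {"text": "Tottenham"}]}},
    {"created_at": "Sat Apr 06 14:01:55 +0000 2013",
     "entities": {"hashtags": [{"text": "premierleague"}]}},
    {"created_at": "Sat Apr 06 14:02:05 +0000 2013",
     "entities": {"hashtags": [{"text": "OldTrafford"}]}},
]

buckets = tags_by_minute(sample_statuses)
for minute in sorted(buckets):
    # Within each minute, list tags from least to most frequent.
    print(minute.strftime("%H:%M"), sorted(buckets[minute].items(), key=lambda kv: kv[1]))
```

With Twython installed and valid credentials, the "statuses" list returned by its search call should contain dicts of roughly this shape, though that depends on the Twitter API version in use.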
Project Link:
https://docs.google.com/file/d/0B-oeIeog2xb7WDRFLUdzM0w0cEE/edit?usp=sharing
Some issues/observations from developing this program:
1) Tweepy, a popular Twitter API for Python, returns a 401 failure for search / filter operations. This is on the latest version of Tweepy from Git. Tweepy does, however, have a streaming API.
2) python-twitter seems to be a good API, but it fails to authenticate at times and some of the documented parts do not seem to work. For example, the getHashTags part of the Tweets API does not work.
3) The same program can be coded in Java, but it would take more effort and more code. Python, on the other hand, was small and easy. If you already have a Java environment such as Hadoop running, then Java seems OK.
References:
1) https://github.com/ryanmcgrath/twython/tree/master/core_examples This is a good list of examples of using Twython, and you can easily get started on your own programs.