Analyzing Twitter with Excel, Part 4

24.04.2009
Over the last three weeks I've been trying to figure out how to analyze Twitter messages using Excel 2003, something I thought would be fairly easy but it turns out there are a number of "gotchas." The goal was to track the buzz about a specific product so I wanted to retrieve Tweets that included the product's name and my first thought was to look for the historical data ... which proved to be a headache.

While the Twitter search API can provide a news feed for a given date range you only get the newest 15 Tweets in that period. This isn't going to be of much use if you there was a real buzz about the product.

So, if you want all the Tweets for a given period the only choice is to get the data in HTML format from multiple requests of blocks of search results. Last week I came up with a way to retrieve the required data using two free open source utilities, to grab and save the search results in a file, and to parse the saved data.

I wrapped these utilities in a batch file (which I call tweets.bat) to which I have since added some extra error testing. I also created another batch file, domonth.bat, that calls tweets.bat for each day of a given month. Finally a simple program I wrote is called by tweets.bat and updates a comma separated variable (CSV) file that contains a line for each date with the number of Tweets found.

To perform the analysis in Excel a data source -- the CSV file -- is imported into an existing spreadsheet by refreshing the source, and a graph shows the number of Tweets per day for a given month. I admit it: This is ugly, ugly, ugly, but it works and requires minimal resources.

This system covers the historical data, but usually if you're interested in the public buzz on a specific topic you'll want to monitor that in real-time or thereabouts.