Can the Web predict the next president?

27.10.2008

Even casual Web users believe vote stuffing is common in simple online polls, as most simple Web polls can be defeated by users clearing cookies, purging Flash settings, coming from other IP addresses, crowdsourcing votes, and even using bots. Likewise, some of the more focused systems exhibit sampling bias. For example, we did not include Amazon data for book ratings or levels because of what appeared to be clear manipulation. We saw similar effects in promotion of articles across social networks. Most interesting is that Intrade, while mostly matching the observed data elsewhere, can be gamed: a single investor drove up McCain contract values at one point in the campaign.

Our point is that using the Web requires some careful consideration of the size of the sample and the ease of manipulation. We do not believe that rigging is a concern for the measures selected as the primary focus of the article. The numbers used for correlation with the larger-scale systems such as Alexa, Google, and Hitwise are just too large to be easily gamed.

Online vs. offline measurements

It's clear that the Web-based measurements of candidate popularity have a consistent story to tell, albeit with some variations depending on the type and source of the data. This raises the obvious question of how well what we can measure online matches up to what is going on in the world at large. Do Web-based measures, as a whole, contain a sampling bias? Does the Internet make its own waves, so to speak? Does it have its own ebb and flow of opinion, or does it more or less reflect what is happening in the broader society?

Answering those questions in a detailed way is beyond the scope of our article. We've given some indication of which online measures we consider perhaps more reliable because they are less likely to be tainted by demographic sampling bias. Much more definite conclusions than that we'll leave to analysts with more time to crunch the data.