One of the great mantras of our time is that “Big Data” will be able to tell us much more about what people think, desire and do than earlier research methods made possible. This is true to some extent, but not to as great an extent as many researchers, and the politicians who listen to them, seem to think. On which point there’s an interesting paper in Science discussing the problems with assuming that what turns up on Facebook and/or Twitter really is a valid guide to the interests and actions of the populace. On top of what this paper argues (entirely correctly) we need to add two more reasonably standard points, one from economics and the other from politics. Taken together, these should make us a lot more hesitant about taking the Twitterstorm du jour all that seriously as a guide to public policy.
The paper is here:
On 3 November 1948, the day after Harry Truman won the United States presidential elections, the Chicago Tribune published one of the most famous erroneous headlines in newspaper history: “Dewey Defeats Truman” (1, 2). The headline was informed by telephone surveys, which had inadvertently undersampled Truman supporters (1). Rather than permanently discrediting the practice of polling, this event led to the development of more sophisticated techniques and higher standards that produce the more accurate and statistically rigorous polls conducted today (3).
Fortunately for those of us who do not subscribe to Science, there’s a further discussion by the researchers here.
Their first and most obvious point is that social media users are not representative of the general population. Further, different platforms skew in very different directions: Pinterest, for example, is much more orientated towards young women than most of the other platforms. In other forms of public polling such skews are corrected for by weighting the sample (although that is always more of an art than a science), and surveys of social media should be using these same techniques. The researchers make their point thus:
Social scientists have honed their techniques and standards to deal with this sort of challenge before. “The infamous ‘Dewey Defeats Truman’ headline of 1948 stemmed from telephone surveys that under-sampled Truman supporters in the general population,” Ruths notes. “Rather than permanently discrediting the practice of polling, that glaring error led to today’s more sophisticated techniques, higher standards, and more accurate polls. Now, we’re poised at a similar technological inflection point. By tackling the issues we face, we’ll be able to realize the tremendous potential for good promised by social media-based research.”
Well, yes, obviously.
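To see what such weighting actually does, here is a minimal sketch of post-stratification, one standard technique pollsters use, written in Python. The group names, population shares and support rates are invented purely for illustration; they are not taken from the paper. Each group's answers are scaled by its population share divided by its sample share, so an over-represented group (say, young women on Pinterest) counts for less.

```python
# Minimal post-stratification sketch. All group names, shares and
# support rates are hypothetical, chosen only for illustration.

def poststratify(sample_counts, population_shares, support_rates):
    """Return (raw, weighted) estimates of overall support.

    sample_counts: respondents per group in the (skewed) sample
    population_shares: each group's share of the real population (sums to 1)
    support_rates: fraction of each group's respondents supporting a policy
    """
    n = sum(sample_counts.values())
    # Unweighted estimate: every respondent counts equally.
    raw = sum(sample_counts[g] * support_rates[g] for g in sample_counts) / n
    # Weight = population share / sample share, so over-represented groups
    # are scaled down and under-represented groups scaled up.
    weights = {
        g: population_shares[g] / (sample_counts[g] / n) for g in sample_counts
    }
    weighted = sum(
        sample_counts[g] * weights[g] * support_rates[g] for g in sample_counts
    ) / n
    return raw, weighted


if __name__ == "__main__":
    raw, weighted = poststratify(
        sample_counts={"young_women": 600, "everyone_else": 400},
        population_shares={"young_women": 0.15, "everyone_else": 0.85},
        support_rates={"young_women": 0.70, "everyone_else": 0.40},
    )
    print(f"raw: {raw:.3f}, weighted: {weighted:.3f}")  # raw: 0.580, weighted: 0.445
```

In this made-up case the skewed sample suggests 58% support, while the weighted figure is about 44.5%. Note that weighting only corrects for the characteristics you measure; it cannot tell you whether the people who post differ from those who don't within the same group.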
But we can also add two further points here when we consider what we might do with any information gleaned from even these better weighted and sampled surveys. The first is the economists’ point about revealed preferences. All of social media is people saying what they’d like. And the point of revealed preferences is that this, the mere expression of an idea or desire, is not quite the same thing as what people really do: preferences are better read from the choices people actually make. For example, a rather large number of us promise to love forever, never to betray and to stick together for life. That some 50% of such promises end in divorce does show that there’s a certain gap between even what people will solemnly promise to do and what they actually do. When it’s a matter of throwing around a few likes or retweets, the gap between true desires and expressed ones is likely to be larger.