What is big data? It’s a lot of data. There is no need to try to come up with some more sophisticated definition. That’s all there is to it. It is what it is.
Big data got big because so much more of the world’s knowledge is now stored digitally and because data scientists have developed increasingly sophisticated methods for capturing, calculating and analyzing it.
With this growth has come an enormous interest in how big data can be used to manage businesses, govern countries and control epidemics, not to mention sell everything under the sun to just the right folks.
What can big data accomplish?
- It is given some credit for the reelection of Barack Obama in 2012.
- By studying where and when most crimes occur, it can be used to determine the most effective deployment of law enforcement resources.
- If Moneyball is to be believed, it is the reason the Oakland A’s made the playoffs despite a minimalist payroll.
- Supposedly it can map out the best routes for package delivery systems, although the performance of the delivery companies during the last holiday season makes you scratch your head over this one.
- Most of us are familiar with how Amazon and Netflix recommend products based upon our purchase history.
- Target demonstrated (to the horror of some) that it could identify which of its customers were pregnant by analyzing their purchase history.
- Google used an analysis of search terms by region to predict the location of flu outbreaks (although later analyses found that its estimates substantially overstated actual flu activity).
And yet, according to Matt Asay writing in Information Week (“8 Reasons Big Data Projects Fail”), “for those (organizations) who kick off big data projects, most fail.” Asay goes on to describe some failures in the design and implementation of these projects. But I would suggest another, more overarching reason. In our enthusiasm to embrace the potential for intelligence to be gleaned from big data, we have become blind to its limitations.
Conclusions are drawn from data by identifying correlations. But big data cannot tell us anything about the reasons for those correlations. For example, we might analyze purchase patterns and find that women who buy a lot of shoes also buy a lot of handbags. The data has no clue why. Do they just buy a lot of everything, do they travel a lot, do they love leather? We have no idea. If we did know why, that correlation would have far broader implications. Data also cannot necessarily tell you which correlations are meaningful. Women who buy a lot of shoes may also drive automatic transmission vehicles, but that means nothing to anybody.
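To make the shoes-and-handbags point a little more concrete, here is a minimal sketch in Python using entirely made-up data. It assumes a hidden "budget" factor (my invention, purely for illustration) that drives both kinds of purchases. The names `simulate` and `pearson` are hypothetical helpers, not part of any real analytics tool.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Generate synthetic shoppers (illustrative assumption, not real data).

    Each shopper has a hidden 'budget'. Shoe and handbag purchases both
    depend on that budget plus independent noise; neither affects the other.
    """
    rng = random.Random(seed)
    budget = [rng.uniform(0, 10) for _ in range(n)]
    shoes = [b + rng.gauss(0, 1) for b in budget]
    handbags = [b + rng.gauss(0, 1) for b in budget]
    return budget, shoes, handbags


budget, shoes, handbags = simulate()

# What the purchase data alone shows: a strong correlation.
r_raw = pearson(shoes, handbags)

# What is left once the hidden factor is subtracted out: close to nothing.
shoes_left = [s - b for s, b in zip(shoes, budget)]
handbags_left = [h - b for h, b in zip(handbags, budget)]
r_adjusted = pearson(shoes_left, handbags_left)

print(f"shoes vs. handbags, raw:            {r_raw:.2f}")
print(f"shoes vs. handbags, budget removed: {r_adjusted:.2f}")
```

The catch is that the second number can only be computed because the simulation hands us the budget column. In real purchase data that column usually does not exist, so the strong correlation is all you get to see, and the "why" stays hidden.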
Writing in CIO, Jonathan Hassell elaborated on “3 Mistaken Assumptions About What Big Data Can Do For You.” In short, he argues:
- Big data cannot predict the future. It is all about past behavior.
- “Big data is a poor substitute for values – those mores and standards by which you live your life and your company endeavors to operate.”
- Big data tells you very little about individuals because you can’t quantify behavior. (Can you hear that one, HR people?)
I’ve often found that the tech guys who have the responsibility for capturing and managing big data are a lot more realistic about what it can and cannot do than the business managers who are looking to use it. A couple of years ago I attended a conference in Silicon Valley which included a session with search engineers. The audience for this session was mostly CEOs and marketers who were chomping at the bit to hear about how big data could be used to find exactly those customers who would buy their product.
One of the participants, a search guy from Yahoo whose name I can’t recall, commented that the main thing the data tells you is that “people are weird.” What he meant was that there really is no evidence that you can use data to predict what people are going to do, including what they are going to buy. Why? Because data can’t deal with individual differences, can’t deal with context, can’t understand why.
In my next post I’ll discuss some of my own experiences with the use and misuse of big data.