Samuel Arbesman, an applied mathematician and network scientist, is a senior scholar at the Ewing Marion Kauffman Foundation and the author of “The Half-Life of Facts.” This column is from The Washington Post.
Big data holds the promise of harnessing huge amounts of information to help us better understand the world. But when talking about big data, there’s a tendency to fall into hyperbole. It is what compels contrarians to write such tweets as “Big Data, n.: the belief that any sufficiently large pile of s— contains a pony.” Let’s deflate the hype.
1. “Big data” has a clear definition.
The term “big data” has been in circulation since at least the 1990s, when it is believed to have originated in Silicon Valley. IBM offers a seemingly simple definition: Big data is characterized by the four V’s of volume, variety, velocity and veracity. But the term is thrown around so often, in so many contexts — science, marketing, politics, sports — that its meaning has become vague and ambiguous.
Should narrowly focused industry efforts to glean consumer insight from large datasets be grouped under the same term used to describe the sophisticated and varied things scientists are trying to do? There’s a lot of confusion, and industry experts and scientists often end up talking past one another.
2. Big data is new.
By many accounts, big data exploded onto the scene quite recently. “If wonks were fashionistas, big data would be this season’s hot new color,” a Reuters report quipped last year. In a May 2011 report, the McKinsey Global Institute declared big data “the next frontier for innovation, competition, and productivity.”
It’s true that today we can mine massive amounts of data — textual, social, scientific and otherwise — using complex algorithms and computer power. But big data has been around for a long time. It’s just that exhaustive datasets were more exhausting to compile and study in the days when “computer” meant a person who performed calculations.
Vast linguistic datasets, for example, go back nearly 800 years. Early biblical concordances — alphabetical indexes of words in the Bible, along with their context — allowed for some of the same types of analyses found in modern-day textual data-crunching.
The sciences also have been using big data for some time. In the early 1600s, Johannes Kepler used Tycho Brahe’s detailed astronomical dataset to elucidate certain laws of planetary motion. Astronomy in the age of the Sloan Digital Sky Survey is certainly different and more awesome, but it’s still astronomy.
Ask statisticians, and they will tell you that they have been analyzing big data — or “data,” as they less redundantly call it — for centuries. As they like to argue, big data isn’t much more than a sexier version of statistics, with a few new tools that allow us to think more broadly about what data can be and how we generate it.
3. Big data is revolutionary.
In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schonberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.