As I wrote “Time to Build Your Big Data Muscles” for Fast Company, I discovered more fascinating bits of data about big data than I could include in the article. Here are some of the eye-popping details.
If all these numbers make you wish for a reference guide, James Huggins answers How Much Data Is That? and BroadbandHub offers a terrific video guide that helps with the relative size of internet data.
In 2012, 2.5 quintillion bytes of data are created every day (a quintillion is 1 followed by 18 zeros), and 90% of the world’s data was created in the last two years alone. As a society, we’re producing and capturing more data each day than was seen by everyone since the beginning of the earth.
This vast amount of digital data would fill a stack of DVDs reaching from the Earth to the moon and back. To put things in perspective, the entire works of William Shakespeare (in text form) represent about 5 MB of data. So, you could store about 1,000 copies of Shakespeare on a single DVD. The text in all the books in the Library of Congress would fit comfortably on a stack of DVDs the height of a single-story house.
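For anyone who wants to check the arithmetic, here is a minimal sketch of the Shakespeare-per-DVD estimate. It assumes a single-layer DVD holds 4.7 GB and uses decimal units (1 MB = 10^6 bytes); neither assumption comes from the sources quoted here, and the function names are only illustrative.

```python
# Assumptions (not from the article): a single-layer DVD holds 4.7 GB,
# and sizes use decimal units (1 MB = 10**6 bytes, 1 EB = 10**18 bytes).
MB = 10**6
EB = 10**18

SHAKESPEARE_BYTES = 5 * MB      # figure quoted in the article
DVD_BYTES = 4_700 * MB          # assumed single-layer DVD capacity
DAILY_BYTES = 25 * 10**17       # 2.5 quintillion bytes per day (article)


def copies_per_dvd(work_bytes: int, dvd_bytes: int = DVD_BYTES) -> int:
    """Number of whole copies of a work that fit on one disc."""
    return dvd_bytes // work_bytes


def to_exabytes(n_bytes: int) -> float:
    """Convert a byte count to exabytes (decimal)."""
    return n_bytes / EB


if __name__ == "__main__":
    print(copies_per_dvd(SHAKESPEARE_BYTES))  # 940
    print(to_exabytes(DAILY_BYTES))           # 2.5
```

Under those assumptions the result is 940 copies, which is where the "about 1,000" figure comes from, and the daily data volume works out to 2.5 exabytes.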
The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s, according to Martin Hilbert and Priscila López.
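A doubling time is easier to compare with other growth figures once it is converted to a yearly rate. The sketch below does that conversion, assuming smooth exponential growth (a simplification; the function names are hypothetical and not taken from Hilbert and López).

```python
def annual_growth_from_doubling(months_to_double: float) -> float:
    """Yearly growth rate implied by a doubling time given in months.

    Assumes smooth exponential growth, which is a simplification.
    """
    return 2 ** (12 / months_to_double) - 1


def growth_factor(years: float, months_to_double: float = 40) -> float:
    """How many times larger the capacity is after the given number of years."""
    return 2 ** (years * 12 / months_to_double)


if __name__ == "__main__":
    rate = annual_growth_from_doubling(40)
    print(f"{rate:.1%} per year")                  # 23.1% per year
    print(f"{growth_factor(10):.0f}x per decade")  # 8x per decade
```

In other words, doubling every 40 months is roughly 23% growth a year, or about an eightfold increase each decade.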
Unstructured data accounts for 80% of the data in the world, and because we know much of that comes from social media, social media gets special attention.
Recent estimates from the US Bureau of Labor Statistics project a 22 percent increase in demand for professionals with management analysis skills between now and 2020. This is faster than the average for all occupations. Demand for the services of these workers will grow as organizations continue to seek ways to improve efficiency and control costs.
Employment in the Massachusetts big data sector alone is expected to more than double over the next six years, according to a recent report by the Mass Technology Leadership Council. MassTLC projects that the state’s total big data employment could grow by 50,000 jobs, for a total of 120,000 jobs by 2018, making it one of the state’s key economic drivers.
Despite high unemployment rates, a lack of skilled workers means many vacancies remain unfilled. Even at the height of the crisis, the OECD reports, more than 40% of employers in the US, Australia, and Japan said they couldn’t find people with the right skills.
Schools participating in IBM’s Academic Initiative, which focuses specifically on expanding and strengthening analytics curricula, include Fordham, Yale School of Management, DePaul, Northwestern, University of West Scotland, Indian Institute of Management Calcutta, Xi’an Jiao Tong University, University of Ulster, IAE Aix-en-Provence, EDC business school in France, and Ottawa University Telfer School of Management.
Tim O’Reilly recently stated in a Google+ conversation: “Companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue.”
2.5 Petabytes: Data flowing through Walmart’s transaction databases. (The Economist)
40 Terabytes: Data generated every second from nuclear physics experiments at the Large Hadron Collider at CERN. (The Economist)
10 Terabytes: Sensor data produced by a jet every 30 minutes of flight time.
1.25 Terabytes: The amount of data the human brain can hold. It performs at roughly 100 teraflops. (Ray Kurzweil as cited by IBM’s Tony Pearson)
1 Terabyte: Structured trading data collected by the NYSE each day the market is open.
340,000: Projected number of people working specifically on big data in 2018 (McKinsey)
40%: Five-year compound annual growth rate (CAGR) for the worldwide big data market. Growth varies by segment, from 27.3% for servers and 34.2% for software to 61.4% for storage, which shows the strongest growth opportunity through 2015. Infrastructure technology for big data deployments is expected to grow slightly faster than the overall market, at 44% CAGR. (IDC)
6 Trillion: Big data cost. (IDC)
3.37 Billion: Worldwide email accounts, made up of 2.52 billion consumer accounts and 850 million corporate accounts. (Radicati Group/pdf)
$900 Billion/year: cost of lowered employee productivity and reduced innovation from information overload. Despite its heft, this is a fairly conservative number and reflects the loss of 25% of the knowledge worker’s day to the problem. The total could be as high as $1 trillion. (Basex)
3000%: increase in meter reading data captured from deploying smart meters for better energy management. (IBM/pdf)
15 (out of 17): Number of industry sectors in the U.S. that have more data stored, per company, than the U.S. Library of Congress (McKinsey)
Leslie Johnston at the Library of Congress wrote a full post, How many Library of Congress does it take?, listing all the reports that compare amounts of data to the holdings of the LoC. She also wrote Defining the “Big” in Big Data and Data is the New Black. Follow her @lljohnston.
Information overload was first mentioned in 1962, in an article entitled “Operation Basic: The Retrieval of Wasted Knowledge” by Bertram M. Gross. The problem was predicted by Alvin Toffler in Future Shock (1970), and in 1989, Richard Saul Wurman warned of it in his book, Information Anxiety.
Examples of new software tools include MapReduce and Hadoop. While Hadoop is more widely talked about and used, a recent article reported that MapReduce sorted a petabyte file of 100-byte records on a system of 8000 computers in 33 minutes, compared to the six hours it took to accomplish the same task on a cluster of 4000 machines in 2008 (see “Sorting Petabytes with MapReduce – The Next Episode,” September 2011). From the OECD.
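To see how much of that speedup came from simply adding machines, here is a rough back-of-the-envelope sketch. It assumes both runs sorted exactly one decimal petabyte (10^15 bytes) and that the work was spread evenly across machines; both are simplifications, and the helper names are hypothetical.

```python
PETABYTE = 10**15  # assumption: decimal petabyte, same input size for both runs


def per_machine_mb_per_s(total_bytes: int, machines: int, minutes: float) -> float:
    """Average sort throughput per machine in MB/s, assuming an even split."""
    seconds = minutes * 60
    return total_bytes / seconds / machines / 10**6


if __name__ == "__main__":
    run_2008 = per_machine_mb_per_s(PETABYTE, machines=4000, minutes=6 * 60)
    run_2011 = per_machine_mb_per_s(PETABYTE, machines=8000, minutes=33)

    print(f"2008: {run_2008:.1f} MB/s per machine")  # 2008: 11.6 MB/s per machine
    print(f"2011: {run_2011:.1f} MB/s per machine")  # 2011: 63.1 MB/s per machine
    print(f"Wall-clock speedup: {(6 * 60) / 33:.1f}x")       # Wall-clock speedup: 10.9x
    print(f"Per-machine speedup: {run_2011 / run_2008:.1f}x")  # Per-machine speedup: 5.5x
```

On these assumptions, doubling the number of machines accounts for only part of the roughly 11x improvement; each machine also did about five and a half times more work per second.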
“Interactive: Analyze your smile,” Forbes.com, March 3, 2011. A Web-based application to use your computer’s camera to track your emotions over time to identify what portions of an advertisement you found amusing.
While the tools keep getting faster, the data sets are growing larger. According to Chris Anderson, “Petabytes allow us to say: ‘Correlation is enough.’ We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.” (Anderson, 2008.)
Big data: Harnessing a game-changing asset (pdf). A report from the Economist Intelligence Unit, sponsored by SAS
Big Data, the Next Frontier for Innovation from McKinsey Global Institute, 2011
Measuring the Internet: The Data Challenge, OECD Digital Economy Papers, No. 194. 2012
Making Sense of Big Data, Technology Forecast from PWC (pdf), 2010
While working on the article I also asked my social network for suggestions on books that introduce people-centric practices to data-driven people. While none of the responses were precisely what I was looking for, I heard about many books I haven’t read and you might enjoy learning about too.
If you have a suggestion, please send it to me directly or include it in a comment after this post.
“Pleasantville shows people finding their ‘voice’, coming to life, frantic desire of town hierarchs to stop/control, etc. and all that w nary a hyperlink ;-) Best movie (simplistic) about the +ve side of linking, sharing & finding & using voice.”
If you’re curious how your workplace, your family, or your school can take steps now to increase their big data thinking: Learn more about computational thinking, a growing field that shows how science and math can apply to all aspects of modern work.
[Photo credit: soupautomat4, fdecomite]