Responding to REALLY Big Data: the 2010 Digital Universe Study
Submitted by Bernard Golden on Thu, 2010-05-06 16:34

I came across the IDC "2010 Digital Universe Study," which was released earlier this week. Its major theme: get ready for the data deluge.

According to the paper, which I highly recommend reading, 2010 will -- for the first time -- see over one zettabyte of data created. The year 2020 will see 44 times as much data created as was created in 2009. The paper goes on to note that by 2020 around 15% of this data will reside in external clouds.
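To get a feel for what "44 times as much" means year over year, here is a back-of-the-envelope sketch in Python. It assumes growth compounds at a constant annual rate between 2009 and 2020, which is my simplification and not something the paper states; the function names are made up for illustration.

```python
import math


def implied_annual_growth(multiple: float, years: int) -> float:
    """Constant yearly growth rate that turns 1 unit into `multiple` units after `years` years."""
    return multiple ** (1 / years) - 1


def doubling_time_years(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


# 44x growth between 2009 and 2020 (figures from the IDC study as quoted above).
rate = implied_annual_growth(44, 2020 - 2009)
print(f"Implied growth: {rate:.1%} per year")
print(f"Doubling time: {doubling_time_years(rate):.1f} years")
```

Under that assumption, the figures work out to growth of roughly 41% a year, or a doubling about every two years.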

My conclusions from reading the paper:

  • The growth of data is going to outstrip previous predictions. This is something we work with our clients on all the time. Our economy and society are shifting to all-computing, all-everywhere, all-the-time, so data is going to pour out of every device, every interaction, every transaction.
  • IDC underestimates what proportion of data will live in the cloud. Simply put, the economics of storage will force companies to move to lower-cost alternatives, and external specialist providers will deliver the lowest-cost option.

So, if these conclusions are correct, what steps should a CIO take today to prepare for the future world of the data deluge? Here are our recommendations:

  • Start planning -- stat. There is a huge wave of bits coming, and failing to put together a strategy to deal with it will mean far more pain later. With this kind of data growth, you'd need a conveyor belt delivering new disk arrays to keep up, which is not workable either technically or economically. Evaluate your roadmap to factor in a much larger standard deviation in storage growth forecasts.
  • Get better pipes. If a significant part of your data is going to reside outside your LAN, you need fast, low-latency pipes to wherever it is. That means better pipes, and probably means -- if you're a decent-sized enterprise -- fiber connectivity rather than a public carrier's standard offering.
  • Implement WAN acceleration. Recognize that every office -- and every data center -- is going to be sending a ton of data. Even if that data is going over fiber, you still want it to move efficiently, so look to make the traffic smart via WAN acceleration devices. The well-known players here are Citrix, F5, Riverbed, and others I'm undoubtedly leaving out.
  • Look at emerging technologies to further shape traffic. I've been really intrigued by a product that Riverbed is going to come out with that -- putatively -- will optimize and compress block device traffic across the Internet. If this works, it could be an enormous aid in managing big data in a cloud environment. Likewise, look into the new protocols for large file transfer to increase transfer performance.
  • Evaluate your application architecture standards to factor in larger storage and storage distributed across the Internet. You will need to implement architectures capable of gracefully working with storage near and far. While I'm dubious about the typical description of "cloudburst" applications, architecting to incorporate remote storage will be important.
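
To put rough numbers on the pipes and WAN acceleration recommendations above, here is a small Python sketch that estimates how long it takes to move a data set to remote storage. Everything in it is an assumption for illustration: the function name, the 70% link efficiency, and the 50% traffic reduction credited to WAN optimization are placeholders, not vendor figures or numbers from the IDC paper.

```python
def transfer_hours(
    data_terabytes: float,
    link_mbps: float,
    efficiency: float = 0.7,
    reduction: float = 0.0,
) -> float:
    """Estimate hours needed to push a data set over a WAN link.

    efficiency: fraction of nominal bandwidth actually usable (assumed value).
    reduction:  fraction of bytes removed by compression/deduplication before
                they hit the wire (assumed value, 0.0 means no optimization).
    """
    bits_on_wire = data_terabytes * 1e12 * 8 * (1 - reduction)
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits_on_wire / usable_bits_per_second / 3600


# Hypothetical scenario: a 10 TB data set headed for an external provider.
for label, mbps in [("100 Mbps line", 100), ("10 Gbps fiber", 10_000)]:
    plain = transfer_hours(10, mbps)
    optimized = transfer_hours(10, mbps, reduction=0.5)
    print(f"{label}: {plain:.1f} h unoptimized, {optimized:.1f} h with 50% reduction")
```

Plug in your own data volumes and link speeds; the point is that both the size of the pipe and the amount of traffic you can avoid sending show up directly in the transfer window.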

If you are responsible for managing an IT organization, running an infrastructure, or architecting applications, you owe it to yourself to seek out this report, read it, and ponder its implications. If you don't, you have only yourself to blame when the bits come flooding in.