Although cloud providers like Amazon are linking big data with the public cloud, enterprises rightfully don’t see it that way
Reposted from David Linthicum, InfoWorld, May 3, 2013
Amazon.com CTO Werner Vogels recently made the case for big data computing in the cloud. But what else would you expect him to say?
The points Vogels made are compelling, including his prediction that demand for big data analysis is spurring interest in real-time analytics. Enterprises therefore need capacity, which to Vogels means they need the public cloud, Amazon's public cloud in particular. Vogels also said we can expect infrastructure such as Hadoop to become invisible behind a cloud-provided analytics layer such as (of course) Amazon's Redshift.
Vogels is half right today, and he could be completely correct in five years.
The reality is that big data is, well, big. Most enterprises have some sort of big data project under way, and they see many of the same benefits Vogels does, such as the move to real-time analytics, including the predictive analytics that CIOs believe will greatly expand what enterprise IT can do.
Public cloud computing platforms are indeed compelling. Consider the instant scalability that auto- and self-provisioning services provide, along with built-in big data services such as Hadoop. In practice, however, most of what can be called big data today still sits in the enterprise data center, and it may remain there for some time.
The reasons are understandable. Keeping data on local storage systems makes integration with operational data stores far less of an issue than it would be over an Internet connection. In many instances, using public clouds to store huge amounts of enterprise data seems like a good idea until you have to ship USB drives to the cloud provider and hope the data loads correctly.
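Why would anyone ship drives instead of uploading? A quick back-of-envelope calculation makes the point. The sketch below is illustrative only: the 10 TB data set and 100 Mbps uplink are hypothetical figures, not from the article, and the `efficiency` factor is an assumed knob for protocol overhead or a shared link.

```python
def transfer_days(data_terabytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate the days needed to push data_terabytes over a link_mbps uplink.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits/s).
    efficiency < 1.0 is an assumed factor for protocol overhead or a shared link.
    """
    bits = data_terabytes * 10**12 * 8             # bytes -> bits
    bits_per_second = link_mbps * 10**6 * efficiency
    seconds = bits / bits_per_second
    return seconds / 86_400                        # seconds -> days

# Hypothetical example: 10 TB over a dedicated 100 Mbps uplink
print(f"{transfer_days(10, 100):.1f} days")  # prints "9.3 days"
```

Even under these idealized assumptions the upload takes more than a week, and any real-world overhead (lower `efficiency`) stretches it further. That is the gap a courier full of drives fills.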
Also, while I think security and compliance are typically solvable problems in public cloud computing, they are easier to manage when the data is local. Moreover, performance is better with local data because you're not dealing with the latency of sending requests and returning data sets over the open Internet. Finally, hardware and software are cheap these days, so the ROI case for running these systems on a public cloud rather than on internal servers is not as persuasive as you might imagine.
Should you discount public clouds as an option for building big data systems? No, but for the first generation of big data systems, most enterprises shouldn't choose the cloud. That calculus will change over time.