Monday, December 01, 2014

Why hybrid data models and open source: Cloud Law conference remarks


This blog post is adapted from my remarks during the Data Governance and Sovereignty – Challenges and Requirements panel at The Broad Group’s Cloud Law conference in London last week. 

The history of the IT industry is a history of cyclical reimaginings. Not repeated cycles exactly. But repeated themes reflected in new and different technologies and environments. One such cycle that’s upon us today is the reinvention of centralized computing under the “cloud” rubric. It’s much different from the mainframe of the 1960s, but it shares the movement of intelligence and state toward the core and away from the network edge.

Indeed, this centralization cycle is arguably even more intense than that of the past. Author Nick Carr calls it “The Big Switch” by analogy to the centralization of electrical power generation. And, while the ecosystem of cloud service providers is both large and varied, there are but a handful of true global service providers. One data point: the Amazon Web Services re:Invent conference drew about 14,000 attendees this year. Sold out. Just year three for the conference. Just year eight for the service.

Some other day, I’ll be happy to argue why this handful of global service providers isn’t the future of all computing—certainly not within an interesting planning horizon. But there is significant centralization going on for important swaths of computing. And that makes it important to have detailed and precise discussions about governance and sovereignty as they relate to these large entities storing and processing our data.

Need some more convincing? Consider “security,” which leads just about every survey of cloud adoption concerns and roadblocks. Except security in this context often doesn’t mean classic security concerns like unpatched software or misconfigured firewalls. As 451 Research VP William Fellows noted in his HCTS keynote in October, it’s actually jurisdiction that is the number one question. Perhaps not surprising given the headlines of the last year, but it reinforces that when people voice concerns about security, they are often talking about matters quite different from the traditional infosec headaches. Transparency, control over data, and data locality are the big “security” concerns in the context of public cloud providers.

When using public clouds, it’s important to understand where data is stored, how encryption is or can be used, what protections are available, the procedures for notification in the event of a breach or a judicial request, and many other aspects of due diligence. And, given appropriate vetting, public clouds can be entirely suitable for many classes of data. At the same time, it’s also important to recognize that there is an inherent sharing of responsibility when using public cloud providers. Reduced control and visibility are just part of the bargain in exchange for not having to run your own servers.
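To make a couple of those due-diligence items concrete, here’s a minimal sketch of what pinning data locality and requesting encryption can look like, assuming boto3 (the AWS SDK for Python); the bucket name and region are hypothetical.

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")

# Pin the bucket to a specific region so data locality is explicit.
s3.create_bucket(
    Bucket="example-records",  # hypothetical bucket name
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# Verify where the data actually lives.
location = s3.get_bucket_location(Bucket="example-records")
print(location["LocationConstraint"])  # e.g. 'eu-west-1'

# Request server-side encryption for each object at upload time.
s3.put_object(
    Bucket="example-records",
    Key="contract.pdf",
    Body=b"...",
    ServerSideEncryption="AES256",
)
```

Checks like these make locality and encryption explicit, but they reduce rather than eliminate the control you give up.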

This tradeoff is one reason for the increasing recognition that much IT will be hybrid. Public clouds remain attractive for many uses, whether for reasons of pricing or flexibility. But private clouds can give greater control over aspects of compute and data storage—as well as making it possible to tailor the environment to an organization’s specific requirements. (Of course, on-premise computing also makes it possible to create gratuitous customizations and complexity but that’s a topic for another day.) Furthermore, public clouds can act as golden handcuffs—especially above the base infrastructure level. The more cloud provider-specific features you use, the harder it will be to move your workloads on-premise or even to another public cloud provider. You may deem such inflexibility a reasonable tradeoff, but it is a tradeoff, just as proprietary vertical hardware/software stacks once were in the systems space.

Open source was one alternative then, and it's still an alternative to lock-in today. Control over technology. Control over formats. Control over use. Much of the impetus behind the ongoing development of OpenStack, for example, is that organizations of many types have a strategy to become in-house service providers. The central idea behind OpenStack is to let you build a software-defined datacenter for your own use.
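Here’s a minimal sketch of what being your own service provider looks like from the API side, assuming the openstacksdk Python library and a cloud entry named "mycloud" in clouds.yaml; the image and flavor names are hypothetical.

```python
import openstack

# Connect to the in-house cloud defined as "mycloud" in clouds.yaml.
conn = openstack.connect(cloud="mycloud")

image = conn.compute.find_image("fedora-21")    # hypothetical image name
flavor = conn.compute.find_flavor("m1.small")   # hypothetical flavor name

# Launch an instance in your own datacenter, just as you would
# request one from a public provider.
server = conn.compute.create_server(
    name="app-server-1",
    image_id=image.id,
    flavor_id=flavor.id,
)
server = conn.compute.wait_for_server(server)
print(server.status)  # "ACTIVE" once the instance is up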

The storage of data is central to this concept. Open source storage projects like Gluster and Ceph work on-premise, in a public cloud, or across both in a hybrid model. Ultimately, it's not about public cloud or private cloud being better or worse, but about which is best suited for a specific use and purpose. And that's leading to hybrid computing, which open source enables in important ways.
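One illustration of that hybrid flexibility: Ceph’s RADOS Gateway speaks the S3 API, so the same client code can target an on-premise Ceph cluster or a public provider just by changing the endpoint. A minimal sketch, again assuming boto3; the endpoint URL and credentials are hypothetical.

```python
import boto3

# Point the standard S3 client at an on-premise Ceph RADOS Gateway.
ceph = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.internal:7480",  # hypothetical gateway
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

ceph.create_bucket(Bucket="backups")
ceph.put_object(Bucket="backups", Key="db-dump.sql", Body=b"...")

# Drop endpoint_url (and use real credentials) and the identical
# calls run against a public provider instead.
```

That kind of portability, with the same code running against private or public infrastructure, is the practical payoff of the hybrid model.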
