Tuesday, May 15, 2012

Podcast: Complex adaptive systems and APIs with James Urquhart of enStratus

Cloud computing requires a mindset that approaches system architecture as a much more distributed, heterogeneous, and even self-organizing entity than was the historic norm in IT. VP of Product Strategy and GigaOm blogger James Urquhart shares his thoughts on the topic as he discusses:
  • Complex adaptive systems
  • What high availability means in the cloud
  • The role of standards
Listen to MP3 (0:14:22)
Listen to OGG (0:14:22)


Gordon Haff:  Hi everyone. This is Gordon Haff, Cloud Evangelist with Red Hat. I'm here at the Open Cloud Conference in the Bay Area. I'm sitting here with James Urquhart, who's the VP of product strategy for enStratus. Hi, James.

James Urquhart:  How are you, Gordon? Good to see you.

Gordon:  We've known each other for a while. You've had blogs in a number of places that I've also written on and currently on GigaOm, right?

James:  Yeah. I'm a regular contributor to the GigaOm cloud section. I should be blogging more often than I do. You'll see me about every two to three weeks on GigaOm.

Gordon:  I have the same problem, James, getting stuff written on a regular basis.

We've had some really interesting conversations about how the cloud is changing systems architecture. In fact, you had some really interesting thoughts about how to think about architectures with the cloud.

James:  For a long time I've had an interest in the subject of complex adaptive systems. There's an entire science around a world in which there are many, many different individual agents that each have their own behavioral decision making process, whether that's DNA or whether it's the economic space with buyers and sellers. And then, they interact in very arbitrary ways over a very large scale creating a system that ends up having its own emergent behavior as a system that comes with no central control of that behavior. That's just the way things work out as these agents work out.

If you look at cloud computing, what we're really beginning to do in a very large way is to step out of the silo world into much more of a heavily integrated world where the applications, the infrastructure, the services being delivered are all agents that are being very often decided by different people. A great example of that is I might have multiple agents as an enterprise running on Heroku, which in turn is running on Amazon Web Services.

And so, the behaviors of the systems are decided very independently. The subcomponents of the system are decided very independently. What you're beginning to see is that complex adaptive systems behavior slowly but surely starting to show up in IT in general, in computing in general, on the Internet in general. In part because cloud computing is an enabler of that.

What that means is, if you embrace and understand the complex systems piece of the puzzle, what you're really going to begin to see is a way to understand and to embrace the complexity of the system and to understand how to do your little pieces to make sure that your agents that you care the most about thrive and survive in that system.

I think that's really, to me, the critical shift in thinking. From trying to figure out how to build something that just works and will never break to building something that adapts to the environment and constantly is able to thrive within a changing environment.

Gordon:  I think one interesting way to think about that: at a conference a couple of weeks ago, someone got up and asked, "Is there a way to get five nines reliability in the cloud?" And of course, you're coming from, among other things, working in large systems in the past... The traditional thinking there was that you had some sort of failover clustering capability among large Unix systems or among large mainframes, where from a transactional perspective, stock transactions, whatnot, however many minutes of downtime a year equated to five nines.

And really, though, that's not the right question to ask in the cloud, is it?
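
For reference, the "however many minutes" behind five nines can be worked out directly. The sketch below is illustrative arithmetic only, assuming a 365-day year; the function name is made up for this example and does not come from the conversation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years (assumption)


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (5 -> 99.999%)."""
    unavailability = 10.0 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Under that assumption, five nines leaves a little over five minutes of downtime per year, which is the budget a traditional failover cluster was designed around.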

James:  No. In fact, there's a really, really interesting part of complex adaptive system science that's really just starting to come out now and be explored by academia in a large way. Now, there's actually a tradeoff between stability and resiliency. If you attempt to say "I want five nines by knowing exactly what my stack is and exactly how that stack works and that nothing is going to fail in that stack," or "If something fails, I know exactly how something else will come in and replace it. But I'm going to make sure that this thing is as stable as possible." The problem you have is there's a number of things that can come in from the environment that shift the ground underneath your designs so much that there's no way that your design can in fact adapt to that change and it will fail as a whole.

A resilient architecture is one much more where you say, "Look. The individual components each have to be able to not only survive the environment as it stands, but the individual components have to be designed in a way that as a horizontally scalable system, as a group of agents working together, that as changes happen in the environment that the system somehow keeps going. The subsystem somehow finds a way to at least meet a minimum set of capability that keeps the system going."

If you look at the way Amazon's designed, if you look at the way that Netflix is designed, this is exactly what they do. That front page of Amazon's not an application that’s made up of a whole bunch of pieces that are all designed to be stable. It's made up of a whole bunch of things where there's a whole bunch of failover and a whole bunch of different ways that data can be gathered.

So go to a cache. If the cache is gone, you go to the core data source. If that data source is gone, there's this other data source that will give you kind of a remotely good picture that you can adapt to. If that data source is gone, then you can say, "Well, I'm just not going to display that element of the page."

But the home page, that Amazon purchase page, is always there. When was the last time you went to Amazon and it was gone completely? That kind of resiliency...Right? Things fail all the time in Amazon, but that resiliency of the overall system gets you the appearance of five nines plus.

I think that that's the beginning shift of the mentality to say, "Rather than focusing on the component and making the component as stable as possible, focus on the relationships between components and how components work together and how can you build as much resiliency into the different relationships and the way things work together so that the system as a whole is in fact quite available, and quite resilient."
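
The fallback chain James describes (cache, then core data source, then an approximate source, then drop the page element) could be sketched roughly as follows. This is a minimal illustration, not how Amazon or Netflix actually implement it; every function and data source name here is a hypothetical stand-in.

```python
from typing import Callable, List, Optional, Sequence

# A source takes an element name and returns its content, or raises on failure.
Source = Callable[[str], str]


def fetch_with_fallback(element: str, sources: Sequence[Source]) -> Optional[str]:
    """Try each source in order; return None if every one of them fails."""
    for source in sources:
        try:
            return source(element)
        except Exception:
            continue  # this source is down; degrade to the next one
    return None


def render_page(elements: Sequence[str], sources: Sequence[Source]) -> List[str]:
    """Render whatever can be fetched; drop elements that have no working source."""
    page = []
    for element in elements:
        content = fetch_with_fallback(element, sources)
        if content is not None:
            page.append(content)
    return page


# Hypothetical stand-ins for a cache, a core data store, and an approximate source.
def cache(element: str) -> str:
    raise ConnectionError("cache is gone")


def core_store(element: str) -> str:
    if element in ("recommendations", "deals"):
        raise TimeoutError("core store timed out")
    return f"{element}: fresh data"


def approximate_store(element: str) -> str:
    if element == "recommendations":
        raise LookupError("no approximate data either")
    return f"{element}: roughly right data"


if __name__ == "__main__":
    print(render_page(["cart", "recommendations", "deals"],
                      [cache, core_store, approximate_store]))
```

In this sketch the page still renders with the cache down: "cart" comes from the core store, "deals" from the approximate source, and "recommendations" is simply left off. The resiliency lives in how the components relate to each other, not in any single component being stable.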

Gordon:  This sort of idea that we're just going to have these utterly standardized APIs that work together in lockstep and communicate that way, that's really not the future. What we're really talking about is architecting for this very heterogeneous environment where you need to sometimes translate from one thing to another, connect to things in a loosely coupled way.

James:  I don't think standard APIs are the problem. I think the way to look at it though is, you're trying to find the patterns and you're trying to make sure that you can build to the patterns that work and to adapt and evolve those patterns over time. But I think there's a place for standard APIs. I think there's a place...Frankly, provisioning a server as an action, there's very little highly differentiated ways that you can provision a server. I think it's very fair to say that we're getting to a point for a Linux system working on an X86 environment, there can be a very, very standard way of doing that basic task.

But that's not the application. That's not the thing that solves a business problem up above.

I believe that there are standards in places that you can come to, but the idea that there's one stack that solves the problem is...And I don't think I've heard anybody really argue that it's all going to be this one big stack and everybody's going to move to this or they're nonstandard.

I think what you have to realize is that there are different components in the different stacks, that they give you different value. I think it's fine to talk about open standards for interfaces and then for formats, but I think when you go farther than that, when you try to say the stack is locked down, you have to do it this way with these sets of components. I think that that's the point that you break the model. I think that's when the market says "That's a broken model," and they do something different.

A great example, really quickly, about that is just when ITIL took off and companies started identifying ITIL and really, really being hip on it. DevOps shows up. Because ITIL was broken for some aspects of what the business wanted to do, DevOps is much more flexible in terms of the agility when you need agility. So, in fact, it is disruptive to what we thought was the commodity way of doing IT. It's always going to be that way.

Gordon:  I think, really, if you look at the history of IT, big monolithic approaches really have not done as well as more nimble, more modular approaches.

James:  And that's exactly true. I think...There's a gentleman by the name of Simon Wardley who has great writing on this, where he talks about there are spectrums of business activities and there are times when you need to be highly, highly agile and there are times when you need to be locked down and to very closely control change and control adjustments. But what happens is you go through that cycle and get to the end of the cycle, to where things are little bit more like that. That enables a whole new set of innovation on top of that which, in the end, may trickle down and say, "Yep, we need to rethink the way that we're doing X."

It's being prepared and being able to understand that that constant churn is a fact of life. It's something you need to develop your processes with that concept in mind. And that the patterns and the toolsets and the infrastructure that we build out for the cloud is going to have to take that complex systems approach in mind, as well, and really begin to embrace this concept of focusing on the relationships between things more than focusing on the components themselves.

Gordon:  And we do certainly seem to be shifting to an API driven world in a lot of ways. I guess a lot of people tend to think in terms of the Amazon APIs, and the Flickr APIs, and in many cases the more consumer oriented services. But more and more businesses, credit card processors, banks, what have you are starting to expose APIs, if not for general public use then the use of their partners.

James:  Yeah, and I think what's really, really fascinating about that is why those APIs are being exposed. If it's a service where you give it some data and instructions and it returns something of value back to you, you're basically providing that service through an API instead of through human methods or whatever it might have been before. In other areas you're saying, "Well, the API is really about how you consume another resource downstream." The problem that you have there is the API isn't enough. And so, I think what you're going to see is giant success in terms of exposing business capability through APIs. I know of companies out there like a giant construction company that has this phenomenal API layer over all of their backend systems. They're writing mobile apps that will blow your mind at a rate that, in turn, would blow your mind as well because they just call to a standard REST kind of syntax and structure.

That stuff really works really, really well. But when it comes to saying, "Hey, I want to provision a service so here's my API and that's going to work all the time." That's not true today and it may never be fully true. It may be more true than it is today, but I think you have to understand that there's a lot more that has to be standardized than just APIs to get to that point. That's going to take more work and that's going to take more effort.

But projects like OpenStack, like CloudStack, like Eucalyptus, they have a great opportunity to, in fact, create mini ecosystems or even large ecosystems out there where that's more true than it would be for the cloud as a whole.

It excites me that the API story is taking off because for developers it's powerful. But I also...It's temporary when I say that. I'm sort of saying it's not enough to say APIs. We need common formats and additional common interfaces. That work still needs to be done.

Gordon:  Yeah, you really need hybrid cloud management to take care of some of that really at a level that's below what developers really ought to be worrying about.

James:  Yeah, and that's why...This is the reason why enStratus is focused on the application level of operations. We're about application operations in the cloud. How do you consume cloud services to deliver application capabilities? We're largely focused on infrastructure as a service today, but that's obviously an evolving picture. I think when you look at the problem of saying, "I'm going to...My tools for running my application are in a cloud service," that's very limiting in terms of how you do things.

Having tools that say, "Let's step back and abstract how we want to operate our applications in general," and then apply that to the different clouds we might want to consume in the way that we operate. Make sure we're applying consistent governance. Make sure we're applying consistent automation to the approach. Make sure we're applying that in a very independent way so not only are you independent from the clouds that you can choose but also in terms of the tools that you apply to operate.

So with DevOps tools, you want to use Chef/Puppet. What management tools do you want to use? Monitoring tools, those kinds of things, do you want to use in the environment? That's really what the enterprise needs, is that ability to begin to abstract application operations and begin to incorporate the things that they need to in that way.

Looking at application operations as separate from infrastructure and services operations. The delivery of a cloud service to the end customer. So building your private cloud is not an application operation problem. It's a service operations problem. Consuming that private cloud is an application operations problem.
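
One way to picture "abstracting application operations" with consistent governance across clouds is the sketch below. It is purely illustrative and is not the enStratus product or API; `CloudDriver`, `FakeCloud`, `ALLOWED_SIZES`, and `deploy` are all hypothetical names invented for this example, and no real cloud API is called.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class CloudDriver(Protocol):
    """Hypothetical minimal interface that each cloud adapter would implement."""
    name: str

    def launch(self, app: str, size: str) -> str: ...


@dataclass
class FakeCloud:
    """In-memory stand-in for a real provider; it only records what was launched."""
    name: str
    launched: List[str] = field(default_factory=list)

    def launch(self, app: str, size: str) -> str:
        server_id = f"{self.name}-{len(self.launched) + 1}"
        self.launched.append(f"{server_id}:{app}:{size}")
        return server_id


ALLOWED_SIZES = {"small", "medium"}  # hypothetical governance rule


def deploy(app: str, size: str, cloud: CloudDriver) -> str:
    """Apply the same governance check no matter which cloud is targeted."""
    if size not in ALLOWED_SIZES:
        raise PermissionError(f"size {size!r} is not approved for {app}")
    return cloud.launch(app, size)


if __name__ == "__main__":
    private, public = FakeCloud("private"), FakeCloud("public")
    print(deploy("billing", "small", private))
    print(deploy("billing", "medium", public))
```

The point of the sketch is that the policy lives in `deploy`, above any single provider, so swapping the cloud underneath does not change how the application is operated or governed.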

Gordon:  Great. Thanks very much, James. Anything else to add?

James:  No. Congratulations to Red Hat on their wonderful launch and with their OpenShift stuff. I'm very excited to see what's going on in that PaaS side of the market. I think that's a really exciting space to watch. And I'm very happy to have been here with you today and have a chance to talk to you.

Gordon:  Great. Thanks, James.
