Tuesday, December 16, 2014

Links for 12-16-2014

Monday, December 15, 2014

Photo: Start of winter hiking season

Led an AMC group hiking weekend up to Pinkham Notch in New Hampshire this weekend. Lots of snow!

Thursday, December 11, 2014

Links for 12-11-2014

Podcast: The layers of containers with Red Hat's Mark Lamourine


Mark Lamourine and I continue our ongoing containerization series. In this entry, we break down the containerization stack from the Linux kernel to container packaging to orchestration across multiple hosts and talk about where containers are going.
This will probably wrap up 2014 for the Cloudy Chat podcast. Among other goals for 2015, I'd like to continue this series with Mark on a biweekly basis or so. I also expect to start folding in DevOps as a way of discussing the ways to take advantage of containerized infrastructures.

Listen to MP3 (0:22:30)
Listen to OGG (0:22:30)

[Transcript]

Gordon Haff:  I'm sitting here with Mark Lamourine, who's a co‑worker of mine, again today. One of our plans for this coming year is I'm going to be inviting Mark onto this show more frequently because Mark's doing a lot of work around the integration of containers, the integration around microservices, or open hybrid cloud platforms. A lot of interest in these topics, and some of the other technologies and trends that intersect with them.
We're going to spend a fair bit of time next year diving down into some of the details. One of the things as we dive down in all these details is we're not going to get into the ABC basics every week, but I'm going to make sure to put some links in the show notes.
If you like what you hear, but want to learn a little bit more about some of the foundational aspects or some of the basics, just go to my blog and look at the show notes. That should have lots of good pointers for you.
Finally, last piece of housekeeping for today, we're going to be talking about the future of containers. There's been some, shall we say, interesting news around containers this week. But we're going to stay focused on this podcast from a customer, a user, a consumer of containers perspective, looking at where they're going to be going, where they might want to be paying attention over the next, let's say, 6 to 12 months type of time frame.
We don't want to get into a lot of inside baseball, inside the Beltway sort of politics about what's going on with different companies and personalities, and really we'll stay focused on things from a technology perspective. That's my intro. Welcome, Mark.
Mark Lamourine:  Thank you.
Gordon:  Mark, I think most of the listeners here appreciate essentially what containers are, at a high level. Operating system virtualization, the ability to run multiple workloads, multiple applications, within a single instance of an operating system, within a single kernel. But that's, if you would, the first layer of the onion.
What I'd like in this show, as we're talking about where we are today, and where we're going in the future, is to talk a bit more about the different pieces of containers, the different aspects of containers.
The first aspect I'd like to talk about is the foundational elements. What's in the kernel of the operating system. This is kernel space stuff. So could you talk about that?
Mark:  We've discussed before that the initial technology, the enabling technology, which in this case is kernel namespaces, that there have been things like this before in the past. Essentially what they do is allow someone to give a process a different view of the operating system.
They operate when a process asks, "Where am I in the file system?" The namespaces can say, "Oh, you're at slash," or, "You're at something," and the answer they're getting is a little bit different from what you'd see outside the container. That's really the core technology: the idea of an altered view of the operating system from the point of view of the contained process.
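[Show note: a minimal sketch of the "altered view" idea Mark describes, written as a toy path translation in Python. It is an illustration only and does not show how kernel namespaces are implemented. The function names and the /var/lib/containers/demo directory are hypothetical.]

```python
import posixpath


def host_path(container_root, path_in_container):
    """Translate a path as the contained process sees it into the host's path."""
    # Normalise against "/" so that ".." cannot climb above the container's root.
    inside = posixpath.normpath(posixpath.join("/", path_in_container))
    return posixpath.join(container_root, inside.lstrip("/"))


def container_view(container_root, path_on_host):
    """Return the path the contained process would see, or None if it is hidden."""
    root = container_root.rstrip("/")
    if path_on_host == root:
        return "/"  # the container's "slash" is just a directory on the host
    if path_on_host.startswith(root + "/"):
        return path_on_host[len(root):]
    return None  # outside the container's view entirely


# Hypothetical directory that the host uses as this container's root.
ROOT = "/var/lib/containers/demo"

seen_by_host = host_path(ROOT, "/etc/hosts")
seen_inside = container_view(ROOT, ROOT + "/etc/hosts")
hidden = container_view(ROOT, "/home/gordon")
```

The same file has two names here: the process inside asks for /etc/hosts, while the host sees it under the container's directory. Anything outside that directory has no name at all from the inside.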
Gordon:  Now there are some different philosophies out there about exactly how you go about doing this from a process perspective.
Mark:  Not so much the technology, but what do you do with it once you've got it? How do you mount file systems? What views are useful? How do you build up a special view for a process which is this thing inside a container? There are different ways of doing that and people have different goals. That informs how they want to build one.
Gordon:  I think although this part of containers, this aspect of containers is often hidden, I think it's important to note it's a pretty important part of the entire subsystem because everything else is resting on top of it.
We've seen some news stories recently, for example, about how, if you don't have this consistency among kernel levels, it's hard to have portability between environments of the applications in a container.
Mark:  How you look at that view, how you compose that view is one element that's interesting and can be different, but you want to make sure that they're provided uniformly so everybody knows what they're getting. One important aspect of that is that these views, they're different views. There's the view that the PIDs can see, that the processes can see.
What other processes are available? That's one possible view. There's a view of the file system that each process can see the file system from a different way or they can share one which gives two processes the same view of the file system, but maybe a different process.
This composition is something that people are still working out, how an application would want to see things that have multiple processes with different responsibilities and how do you build up the environment for each one?
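[Show note: one way to picture the composition Mark describes is to give each process a set of named views and compare them. This is a rough sketch with made-up labels such as "ns-app". It models the idea only and does not call any real namespace API.]

```python
def shared_views(proc_a, proc_b):
    """Return the kinds of view (pid, mount, ...) that two processes share."""
    return sorted(kind for kind in proc_a if proc_a[kind] == proc_b.get(kind))


# Hypothetical application with two processes that have different jobs.
# They share a file system view and a network view, but each has its own
# view of which other processes exist.
web = {"pid": "ns-web", "mount": "ns-app", "net": "ns-app"}
worker = {"pid": "ns-worker", "mount": "ns-app", "net": "ns-app"}

# An unrelated process on the host shares nothing with either of them.
host_shell = {"pid": "ns-host", "mount": "ns-host", "net": "ns-host"}

common = shared_views(web, worker)
unrelated = shared_views(web, host_shell)
```

Choosing which views to share and which to keep separate for each process is the part that, as Mark says, people are still working out.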
Gordon:  That's the foundational container part, which is part of the operating system, depends on a lot of operating system services. It depends on a lot of things the operating system does for security, for resource isolation, that type of stuff.
Now let's talk about the image that is going to make use of that container. As we were talking before this podcast, from your perspective, there are really two particular aspects of images ‑‑ the spec of the image and the instantiation, the actual deployed image.
Let's first talk about the spec of the image and what are some of the things, the characteristics that you see as being important there now and moving forward.
Mark:  Again, uniformity is probably the biggest one. The big container system right now is Docker and Docker has a very concise way of describing what should go into a container. The specification is very small and that's one of the things that Docker has brought and made people realize that this is important.
Prior to using something like Docker, describing a container was very difficult and very time‑consuming and it required expert knowledge. With the realization that you need some kind of concise specification and that you can make some assumptions, containers have become easier to build, and that's really what's instigated the rise of containers in the marketplace.
Gordon:  Let's talk about the other aspect of containers, the instantiation, the payload, the actual instance, if you would. What are some of the trends you see happening there?
Mark:  Again, Docker was kind of the inception. The assumption they made was that you can take this specification, create a series of reusable layers to build up the image. But they specified that they were a tar ball.
Mostly they established a standard, and once that standard is there, people can just stop thinking about it and they can just go on and start working with it. That uniformity of whatever the composed piece is going to be really important going forward.
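[Show note: a small sketch of the "reusable layers as tar balls" idea, using Python's standard tarfile module. It assumes the simplest possible rule, that later layers overwrite files from earlier ones, and leaves out everything else a real image format carries, such as metadata and file deletions. It is not Docker's actual format; the helper names and file paths are invented for the example.]

```python
import io
import tarfile


def make_layer(files):
    """Pack a {path: bytes} mapping into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compose_image(layers):
    """Apply layers in order; a later layer overwrites files from earlier ones."""
    filesystem = {}
    for layer in layers:
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
            for member in tar.getmembers():
                if member.isfile():
                    filesystem[member.name] = tar.extractfile(member).read()
    return filesystem


# A base layer that many images could reuse, plus a small layer on top of it.
base = make_layer({"etc/motd": b"base image\n", "bin/app": b"v1"})
update = make_layer({"bin/app": b"v2"})

image = compose_image([base, update])
```

Because each layer is an ordinary tar archive, anything that can read tar can inspect it, which is part of why an agreed format lets people stop thinking about packaging.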
Gordon:  However, that's not necessarily tied into all the other aspects of a container subsystem. That spec, that format can really exist independently of other pieces of technology, and that's probably going to be kind of a theme that we hit a few times in this podcast.
Mark:  At each place you want to have a uniformity, but like you said, that doesn't preclude having a different way of specifying what goes in ‑‑ just that once you've specified it it's got to have a form that other people can accept. The same thing is true with the image format itself.
Once that's there, how it gets instantiated on the far machine, as long as the behavior is the same. That really gets the job done. That allows people to focus on the job they need to do and not a lot of extra work putting everything together.
Gordon:  This always was the conflict with standards at some level. Standards are always great from the point of view of the customer and they really have enormous value in terms of portability, in terms of just not having to think about certain things.
On the other hand, they need to embody the functionality that's needed to get the job done. We don't use parallel printer cables any longer, thank God, because there are standards, certainly, but they're also not very useful in today's world.
Mark:  Yeah, I've said before that probably one of the biggest things that Docker did was to make a certain set of assumptions, and to live with those assumptions, those simplifying assumptions.
That allowed them to get on with the work of building something that was functional. I think that the assumptions are going to be challenged. There are going to be places where their assumptions are too tight for some kinds of uses.
I think the community is going to inform that and the community is going to say, "This is something we need to expand on it." Without a different assumption or without the ability to control those assumptions, we can't really move forward. There are a number of different responses in the market to that.
Gordon:  This is how successful open source projects work. You have a community. You have members of that community with specific needs. If the project as it exists doesn't meet those needs, they need to argue, they need to contribute, they need to convince other people that their ideas, the things they need are really important to the project as a whole.
Of course, there need to be mechanisms in place in that project to have that wide range of contributions.
Mark:  In any good open source project, you get that right from the beginning. The assumption by the authors is, we've got a good idea here or I think I've got a good idea here and I'm going to instantiate it. I'm going to create it and make it the way I think it needs to be.
Then I'm going to accept feedback, because people are going to want to do things with it. Once they see something's neat, they're going to want to say, "Yeah, that's exactly what I want. Only it would be better if I had this too."
Gordon:  Let's talk about the repositories, the ecosystems. You talked about this a little bit last time, but where are we now and what are the next steps? What needs to be done here?
Mark:  Again, returning to Docker, another one of their simplifying assumptions was the creation of this centralized repository of images. That allowed people to get started really quickly. One of the things that people found when they started looking at their enterprise, though, was that it was a public space.
What we need to go forward is we need the ability to know where images come from. Right now things are just thrown out into space, and when you pull something down you don't know where it came from. I don't think there's anybody who really thinks that that's the ideal in the end.
I think to go forward with it, the community needs to build mechanisms where someone who builds a new container image can sign it, can verify that it comes from the person who claims that they built it, and that it has only the parts that were specified and that it gets put out in a public place if it's intended to be public, so that people can be assured that it meets all their requirements and that it's not something malicious.
On the flip side you get companies where they're going to say, "No, I don't want to put this in a public space." There needs to be some private repository mechanism where a company can develop their own images, place them where they can reach them, and retrieve them and use them in ways that they want without exposing it to the public.
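[Show note: a minimal sketch of the sign-then-verify flow Mark describes, using only Python's standard library. As a loudly stated simplification, it uses an HMAC with a shared secret key as a stand-in for a real signature. A public repository would more plausibly need public-key signatures so that anyone can verify without being able to sign. The function names and the key are hypothetical.]

```python
import hashlib
import hmac


def sign_image(image_bytes, key):
    """Return a hex signature over the image contents (HMAC-SHA256 stand-in)."""
    return hmac.new(key, image_bytes, hashlib.sha256).hexdigest()


def verify_image(image_bytes, key, signature):
    """Check that the image is exactly what the builder signed."""
    expected = sign_image(image_bytes, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)


builder_key = b"hypothetical-shared-secret"
image = b"contents of a container image"

signature = sign_image(image, builder_key)
ok = verify_image(image, builder_key, signature)
tampered_ok = verify_image(image + b" plus something malicious", builder_key, signature)
```

The check covers both of Mark's requirements at once: a wrong key means the image did not come from the claimed builder, and any changed byte means it no longer has only the parts that were specified.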
Gordon:  Again, this is another example of, there's not just going to be just one way of doing things, because there's a lot of legitimate different requirements out there.
Mark:  There are different environments, although I think there's probably a limited number that we'll find over time. I don't think it's completely open. I think there are a limited number of environments and uses that will fall out over time as people explore how they want to use it.
Gordon:  Finally, let's talk about and again, you touched on some of this during our last podcast, but the orchestration and scheduling piece, which is another piece that I think we sometimes tend to think of as just part of this container subsystem.
In fact we're pretty early in the container concept and we're really still developing how these pieces fit with and complement the lower‑level container function.
Mark:  The whole technology started off with, "Let's build something that runs one." It's actually working out really nicely that as people start using containers, they're kind of naturally backing into bigger and bigger spaces.
They start off going, "Oh, this is really cool. I can run a container on my box that can either run a command I want or I can build a small application using a database and a web server container and I can just push my content into it and it goes."
And people are going, "That's great. Now, how do I do 12?" Or companies are looking at it and saying, "Here's an opportunity. If I can make it so other people can do this, I can sell that service, but I have to enable it for lots of people." I think we're backing into this growing environment that orchestration is going to fill.
I think there's still a lot of work to be done with the orchestration right now. The various orchestration mechanisms, they're not really finished. There are pieces that are still unclear ‑‑ how to manage storage between containers, and a big one is, in a container farm, in an orchestrated container farm, how do you provide network access from the outside?
A lot of work has gone into making it so the containers can communicate with each other, but they're not very useful for most cases until you can reach in from the outside and get information out of them. That requires a software‑defined network, which, if you follow the OpenStack model, they have these things.
That's actually still one of the most difficult problems within OpenStack. I think if you ask people about the three iterations of software‑defined networks within OpenStack, you're going to find that they're still working out the problems with that and OpenStack is four or five years older than any of the container systems are.
Gordon:  One of the things that strikes me when I go to events like LinuxCon and CloudOpen and other types of particularly open source‑oriented industry events is that there's a lot of different work, in many cases addressing different use cases, whether it's Twitter's use cases or Facebook's use cases or some enterprise use case or Google.
There're all these different projects that are being integrated together in different ways, and the thing that strikes me is, first of all, wow, there are a lot of smart people working on this stuff out there. But second, we're nowhere near ready to say, "This is the golden path to container orchestration now and forever."
Mark:  I would be really surprised if we found that there ever was one golden way. I suspect in the same way that we've got different environments for different uses, you'll find that there are small‑scale orchestration systems that are great for a small shop, and then you're going to get large enterprise systems.
I can guarantee that whatever Google uses in the next five years is going to be something that I probably wouldn't want to install in my house.
Gordon:  Or your phone.
Mark:  Or my phone, yeah. The different scales are going to have very different patterns for use and very different characteristics. I think that there's room in each of those environments to fill it.
Gordon:  Sort of a related theme ‑‑ what I'm going to simplistically call provisioning tools. I was just having a discussion yesterday. You've got Puppet, you've got Chef, you've got Ansible, you've got Salt.
Certainly there're adherents and detractors for all of them and they're at various different points in their maturity cycles, but the other thing that strikes me is there's also a very clear affinity between certain groups of users, like developers or sys admins towards one tool rather than another, because they're really not just the same thing.
Mark:  They're not, and I thought it was interesting that you used the term "provisioning tool" when talking about Puppet and Chef, because that is the way in which people are starting to use it now, where five years ago they would have called it a configuration management tool and the focus wouldn't have been on the initial setup, although that's important. It would have been on long‑term state management.
That's one of the places where containers are going to change how people think about this work, because I think the focus is going to be more on the initial setup and short‑term life of software rather than the traditional ‑‑ actually someone told me to use the word "conventional," although in this case "traditional" might make sense.
The traditional "Set it up and maintain it for a long period of time." Your point about people having different tools for different perspectives is true. I also want to point out that all of these things, even while they're under development, they have use. You might claim that Puppet and Chef and these various things, the configuration management or the provisioning or the container market are evolving.
But at the same time, they're in use. People are getting work out of them right now. People are getting work out of containers now, as much as we're talking about the long‑term aspects, people are using containers now for real work.
Gordon:  Gartner has this idea they call bimodal IT, and they have this traditional IT, conventional IT, whatever you want to call it, where you have these “pets” type systems. The system runs for a long time. If the application gets sick you try and nurse it back to health.
You do remediation in the running system for security patches, and other types of patches and the like. Then you have this fast IT and the idea there is I've got these relatively short lived systems. If something's wrong with it, it takes what, half a second to spin up a new container. Why on earth would I bother nursing it back to health?
Mark:  I think this is another case where perspective is going to be really important. If you're a PaaS or an IaaS shop, the individual pieces to you are cattle. You don't really care. You've got hundreds, thousands, hundreds of thousands of them out there, and one of them dropping off isn't all that big a deal.
But if you're a PaaS situation, your cattle is somebody else's pet, and it's going to be really important to either keep this cattle alive, the individual ones, because, to someone, it's really their most important thing. Or, to help them find ways so that they can treat it like a pet while you treat it like cattle.
Where they say, "I want my special application," and you spin up automatically two or three redundant systems so that you see the pieces dying, you kill them off, you restart them, but the user doesn't see that. They shouldn't have to manage it.
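[Show note: a toy sketch of the "kill them off, restart them, but the user doesn't see that" pattern. It keeps a fixed number of interchangeable replicas and replaces any that fail. The Pool class and its methods are invented for illustration and do not correspond to any particular orchestration system.]

```python
import itertools


class Pool:
    """Keep a desired number of interchangeable replicas running."""

    def __init__(self, desired):
        self.desired = desired
        self._ids = itertools.count(1)
        self.healthy = set()
        self.reconcile()

    def reconcile(self):
        """Start replacements until the pool is back at its desired size."""
        started = []
        while len(self.healthy) < self.desired:
            new_id = next(self._ids)
            self.healthy.add(new_id)
            started.append(new_id)
        return started

    def fail(self, replica_id):
        """Drop a dead replica; nobody tries to nurse it back to health."""
        self.healthy.discard(replica_id)


pool = Pool(desired=3)           # the user asked for "my special application"
pool.fail(2)                     # one replica dies
replacements = pool.reconcile()  # the provider quietly starts a new one
```

From the provider's side, replica 2 is cattle and replica 4 simply takes its place. From the user's side, there were always three copies of the application they care about.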
Gordon:  To pick Netflix as a much overused example. Obviously, Netflix delivering movies to you as a consumer, that's type the cattle at one level. You lose your ability to watch Orange is the New Black or whatever and you're going to be unhappy.
From Netflix's point of view, if you're unhappy, they're unhappy, but the individual microservices are very explicitly designed so that they can individually fail.
Mark:  This is what I was saying that they need to be able to treat it both ways. I don't know, but I suspect that when you're watching your movie, if the server which is feeding it dies, what Netflix sees is, "Oh, something died. Start a new one." What you see is maybe a few seconds' glitch in your movie, but it comes back.
Mostly, they're reliable. If that's true, then they've managed to do what I was saying. They've managed to make it so that they preserved the important pet information for you somehow. It might be on your client side, but the cattle part of it is still, "Get rid of it and start again."
Gordon:  Well, Mark, this has been a great conversation. We've probably gone on long enough today. But, as I said at the beginning, we're going to continue this as a series going into the New Year because there is a lot happening here, and nobody has all the answers today.
Mark:  That's for sure.

Gordon:  Hey, thanks everyone. Have great holidays.

Tuesday, December 02, 2014

Bath Christmas market

I spent Thanksgiving in Bath, England last week. The Christmas market was just getting started up.

Links for 12-02-2014

Monday, December 01, 2014

Why hybrid data models and open source: Cloud Law conference remarks


This blog post is adapted from my remarks during the Data Governance and Sovereignty – Challenges and Requirements panel at The Broad Group’s Cloud Law conference in London last week. 

The history of the IT industry is a history of cyclical reimaginings. Not repeated cycles exactly. But repeated themes reflected in new and different technologies and environments. One such cycle that’s upon us today is the reinvention of centralized computing under the “cloud” rubric. It’s much different from the mainframe of the 1960s but it shares the motion of intelligence and state to the core and away from the network edge.

Indeed, this centralization cycle is arguably even more intense than that of the past. Author Nick Carr calls it “The Big Switch” by analogy to the centralization of electrical power generation. And, while the ecosystem of cloud service providers is both large and varied, there are but a handful of true global service providers. One data point. The Amazon Web Services re:Invent conference scored about 14,000 attendees this year. Sold out. Just year three for the conference. Just year eight for the service.

Some other day, I’ll be happy to argue why this handful of global service providers isn’t the future of all computing—certainly not within an interesting planning horizon. But there is significant centralization going on for important swaths of computing. And that makes it important to have detailed and precise discussions about governance and sovereignty as they relate to these large entities storing and processing our data.

Need some more convincing? Consider “security,” which leads just about every survey about cloudy concerns or roadblocks. Except security in this context often doesn’t mean classic security concerns like unpatched software or misconfigured firewalls. As 451 Research VP William Fellows noted in his HCTS keynote in October, it’s actually jurisdiction which is the number one question. Perhaps not surprising really given the headlines of the last year but it reinforces that when people voice concerns about security, they are often talking about matters quite different from the traditional Infosec headaches. Transparency, control over data, and data locality are the big “security” concerns in the context of public cloud providers.

When using public clouds, it’s important to understand where data is stored, how encryption is or can be used, what protections are available, the procedures for notifications in the event of a breach or a judicial request, and many other aspects of due diligence. And, given appropriate vetting, public clouds can be entirely appropriate for many classes of data. At the same time, it’s also important to recognize that there is an inherent sharing of responsibility when using public cloud providers. Reduced control and visibility are just part of the bargain in exchange for not having to run your own servers. 

This tradeoff is one reason for the increasing recognition that much IT will be hybrid. Public clouds remain attractive for many uses whether for reasons of pricing or reasons of flexibility. But private clouds can give greater control over aspects of compute and data storage—as well as making it possible to tailor the environment to an organization’s specific requirements. (Of course, on-premise computing also makes it possible to create gratuitous customizations and complexity but that’s a topic for another day.) Furthermore, public clouds can be something of golden handcuffs—especially above the base infrastructure level. The more cloud provider-specific features you use, the harder it will be to move your workloads on-premise or even to another public cloud provider. You may deem such inflexibility a reasonable tradeoff but it is a tradeoff just as proprietary vertical hardware/software stacks once were in the systems space. 

Open source was one alternative then and it's still an alternative to lock-in today. Control over technology. Control over formats. Control over use. Much of the impetus behind ongoing development of OpenStack, for example, is that organizations of many types have a strategy to become an in-house service provider. The central idea behind OpenStack is to let you build a software defined datacenter for your own use.

The storage of data is central to this concept. Open source storage projects like Gluster and Ceph work on-premise, in a public cloud, or across both using a hybrid model. Ultimately, it's not about public cloud or private cloud being better or worse but about which is best suited for a specific use and purpose. And that's leading to hybrid computing, which open source enables in important ways.