Tuesday, November 04, 2014

Podcast: Docker, Kubernetes, and App Packaging with Red Hat's Mark Lamourine

My colleague Mark has been doing lots of work on Docker, managing Docker across multiple hosts, and integrating Docker with other functionality such as that provided through the Pulp content repository. Mark sat down with me to talk about some of his recent experiences working with these technologies.

Mark’s blog, which drills into many of the things we discuss here
The server virtualization landscape, circa 2007

Listen to MP3 (0:18:30)
Listen to OGG (0:18:30)

[Transcript]

Gordon Haff:  Hi everyone. This is Gordon Haff with Red Hat, and today I'm sitting here with one of my colleagues, Mark Lamourine, who's also in the cloud product strategy group. Mark's been doing a bunch of work with Docker, containers, Kubernetes, figuring out how all this stuff works together. I decided to sit Mark down, and we're going to talk about this. Welcome, Mark.
Mark Lamourine:  Hello.
Gordon:  As a way of kicking things off, one of the things that struck me as interesting about this whole Docker application packaging business is that you go back a few years...and I was looking at all these different virtualization types. I'll try to find some links to stick in the show notes. One of the interesting things when you look around the mid‑2000s or so is that there was all this work going on with what they were calling application virtualization.
This idea of being able to package up applications, within a copy of an operating system, in a way that really brought all the dependencies of an application together, so that you didn't have conflicts between applications or missing parts and that kind of thing. One of the things that I find interesting about Docker now is that it's, at its heart, using operating system containers as another form of virtualization.
What Docker's really done is add some capabilities that solve some of the same problems application virtualization was trying to address.
Mark:  One of the things about much of this technology that people have observed is that in many cases the technology itself isn't new, and in some cases the concepts aren't even new. If you look at something like Java, the origin of Java and virtual machines goes back to Berkeley Pascal, which is decades old, but it wasn't ready because the hardware wasn't ready. The ideas were there.
In the case of application virtualization, people were taking this new thing and saying, "What can I do with it?" What they found was you could, indeed, build complete applications into a virtual machine, and for a while people were passing around VMware images to run.
VMware actually had a Player whose job it was to take these images and provide them to people, but they turned out to be heavyweight. They tended not to be designed to interact with things outside themselves. It was a great place to put everything inside, but once you had it inside it didn't interact very well with anything else.
We've gone on with machine virtualization as the hardware's gotten better. We've used it in ways that we found appropriate. We weren't sure what the patterns would be back then. We found the patterns for machine virtualization, but they didn't include this kind of idea. It didn't seem to work out.
Containerization is, in a way, a reuse of the old multi‑tenant, log‑onto‑the‑machine‑and‑get‑an‑account style of computer use. With the creation of things like cgroups and the namespaces, and the pieces that came from Solaris Zones, we've suddenly got a new way of putting all of this stuff together, one which promises to give us what we were looking for back in 2000 with machine virtualization but which never seemed to be the best use of resources.
We're looking now at containerization that is much lighter weight. A container doesn't bring its own operating system, its own networking stack, or its own self‑contained disk storage; the containerization mechanism doesn't provide those. That poses problems, because we still have to do the isolation, which is what cgroups and the namespaces have given us.
It also provides opportunities, because you can now get much greater density with containers, assuming we can solve the access problems. If we can solve these other missing pieces, or make it so they're not necessary, we can achieve the goal people were chasing with machine virtualization a decade ago, in something that actually scales well.
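[To make the cgroups piece concrete, here is a minimal sketch using the cgroup v1 filesystem interface that was current at the time, confining a shell to a memory budget. The "demo" group name is purely illustrative.]

    # Create a cgroup and cap the memory available to anything placed in it.
    # Assumes the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory.
    sudo mkdir /sys/fs/cgroup/memory/demo
    echo 256M | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes

    # Move the current shell into the group; it and its children are now confined.
    echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs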
Gordon:  The other thing that has happened is the use case has changed, so to speak. Containers were actually reasonably successful in at least certain niches of the service provider world. The reason was that in the service provider world you have these very standardized types of machines that people use. You couldn't manage that scale any other way.
At the time, that was very different from the way most enterprises were set up, where you had all these unique operating system instances. In a sense, the service provider model has come to the enterprise. The service provider type use cases are now, increasingly, a better model, certainly for IaaS and cloud‑style workloads.
There are a couple of things you talked about that I'd like to dive into a little bit deeper, but first it would be useful to step back. We've been throwing around Docker, we've been throwing around containers. Sometimes those two concepts, those two technologies, are viewed as different names for the same thing, but that's not really true, is it?
Mark:  In a practical sense, right now it's fairly close to true. I try to make the distinction between containers, which are a conceptual object, and Docker, which is a current implementation we're working with, because the possibility exists that somebody's going to come along and create one that is different.
We were just talking a couple of minutes before we started about Docker within Microsoft, and I observed that some of the things that make Docker possible are, at least right now, very Linux‑centric features ‑ the cgroups and namespaces. I don't know the state of those technologies within the Microsoft operating systems.
There's certainly no barrier I can think of to creating them, or the features may already even be there if you know where to look. I can imagine that someone would create a Microsoft version, or even a Macintosh Darwin version, of what Docker does using slightly different technologies.
Containerization is certainly more general. It's putting processes on an operating system and then creating a new view for them so that they don't see the same thing that a general process does. Docker does that.
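[As a concrete illustration of that "new view": on a Linux host you can hand a process its own PID and mount namespaces from the shell with util-linux's unshare, no Docker required. A minimal sketch:]

    # Start a shell in fresh PID and mount namespaces, with /proc remounted
    # so process listings reflect the new namespace.
    sudo unshare --fork --pid --mount-proc bash

    # Inside, the shell sees only itself and its children:
    ps -ef    # roughly two entries: this bash (PID 1) and ps itself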
Gordon:  The other interesting aspect of Docker, of course, though, is it goes beyond the foundation infrastructure element and gets into how you package up applications and how you deploy applications. That's the other thing that has gotten people interested in containers broadly and Docker, in particular, these days.
Mark:  I can remember the first time I saw Docker. I had been working on OpenShift, on the isolation pieces. OpenShift is a platform as a service; it provides multi‑tenant services on a single box without traditional virtualization. I'd been working on the isolation so that one customer's processes couldn't see what was happening with another customer's processes and couldn't damage them.
I was doing lots of ad hoc work to try to make all of this stuff work. When I saw Docker I was like, 'OK, that's it. I've been going at it the hard way all along.' The novelty, as you said, wasn't the underlying technologies. The novelty was that they had made a good set of starting assumptions and then made those assumptions easy.
The problem with a lot of container systems ‑ we were struggling with OpenShift and people had problems with Solaris Zones ‑ was that it was difficult to create a new one because there were a lot of moving parts and knobs. One of the Docker novelties was that you've got this very concise, very clear mechanism for describing what goes into your images which form your containers, and creating new ones is very easy.
That's in contrast to everything that I've seen that went before, and that was the thing that a lot of people have jumped onto. It's not that this was necessarily new but all of a sudden it's easy to use at least in its simplest form.
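[The "concise, clear mechanism" Mark describes is the Dockerfile. Here is a minimal sketch of one, building a hypothetical Apache httpd image on a Fedora base; the my-httpd tag is purely illustrative.]

    # Dockerfile: the entire image description, in a few declarative steps.
    FROM fedora:20
    RUN yum install -y httpd && yum clean all
    EXPOSE 80
    CMD ["/usr/sbin/httpd", "-DFOREGROUND"]

[Building and running the image is then a couple of commands:]

    docker build -t my-httpd .
    docker run -d -p 8080:80 my-httpd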
Gordon:  We've been talking about the individual microservice or the individual application in the context of containerization and Docker. You've also been looking at some of the ways that you can orchestrate, manage, scale groups of Docker containers together.
Mark:  One of the things that I was interested in right away was the fact that Docker is designed to work on a single host. When you create a container, all of the commands that you use in Docker relate to that single host, with one exception, which is pulling down new images. Those can come from the Docker Hub or they can come from another repository.
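[To make the single‑host point concrete: each of these commands talks to the Docker daemon on one machine; only the pull reaches out to a registry such as the Docker Hub. The my-httpd image is the illustrative one sketched above.]

    docker pull fedora:20               # fetch an image from the Docker Hub (or another registry)
    docker run -d --name web my-httpd   # start a container on this host
    docker ps                           # list containers, again only on this host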
Once you get them started, they all run on one host. That's where Docker stops, but we're all getting very used to the idea of cloud services, where the location of something shouldn't matter and the connectivity between the pieces shouldn't matter. When you start thinking about complex applications using containers, the first thing you think is, 'OK, I'm going to put these everywhere and I don't care where they are.'
Docker doesn't address that. There are a number of projects people are developing to fill that gap. One that we're looking at seriously at Red Hat is Kubernetes. Kubernetes is a Google project which I believe comes out of their own internal containerization efforts. I believe they've started to re‑engineer their own container orchestration publicly to use Docker.
They're still just beginning that. It became evident very soon, in trying to build complex applications that span multiple hosts with Docker and Kubernetes, that there are still pieces that need to be ironed out. I picked a project called Pulp, which we use inside Red Hat for repository mirroring, because it had all the parts that I needed. It was a good model.
It has a database, it has a message broker, it has some persistent storage, and it has parts that are shared by multiple processes. Those are all use cases and usage models that I know people are going to want, and Pulp has them all in a single service.
I thought, 'If I can get Pulp working on Docker using Kubernetes, then that will expose all of these issues.' Hopefully I can get something that actually works, but in the process I should be able to show which things are easy and, more importantly, which things are still hard or still unresolved.
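[For a flavor of what that looks like: each piece of a service like Pulp, such as the MongoDB database, the message broker, and the Pulp server processes, gets described declaratively, and Kubernetes decides which host runs it. Below is a minimal sketch of a pod definition for the database piece, written against the later v1 API for readability since the schema was still in flux when this was recorded; the names are purely illustrative.]

    apiVersion: v1
    kind: Pod
    metadata:
      name: pulp-db
      labels:
        app: pulp
        role: database
    spec:
      containers:
      - name: mongodb
        image: mongo              # the database Pulp keeps its state in
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: db-data
          mountPath: /data/db     # where MongoDB expects its data files
      volumes:
      - name: db-data
        emptyDir: {}              # placeholder; a real deployment uses durable storage

[A Service object selecting those labels then gives the broker and the Pulp server processes a stable address for the database, whichever host it lands on.]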
Gordon:  How are potential listeners and customers most likely to encounter Docker, and how are they most likely to make use of it in the near term and mid term?
Mark:  The funny thing is that they won't know it. If all of this works out, people who are using web applications or cloud applications won't know and won't care what the underlying technology is.
When it comes to people developing applications, that's where things are going to get interesting. Those people are going to see something fairly new. We're used to doing web applications where we have a web server and maybe a database behind it and we have traditional multi‑tier applications. We're moving those into the cloud using traditional virtual machine virtualization.
One of the things that Docker and Kubernetes promise is to create a Lego system, a kind of building‑block system, for whole applications that hasn't been present before. When we deliver software at Red Hat, we traditionally deliver it as RPMs, which are just bundles of bits; they go on your box and then you're responsible for tying all the parts together. We may write some configuration scripts to help do that.
What Docker promises to do, if everything works out and we address all the problems, is that you would be able to go down a shelf and say, "OK, here's my database module, here's my storage module, here's my web service module."
I could imagine a graphical application where you drag these things off some kind of shelf, tie them together with some sort of buttons, and say, "Go." That thing goes out and comes into existence and tells you where it lives and you can point people at it and put code there. The developers are going to be the people who see this.
There's also going to be another layer of developers: the people who create these building blocks. That's still something that's up in the air, both the magical, unicorn world I just described and the dirty‑hands work of creating these things. Those are both works in progress.
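[The building‑block idea isn't entirely hypothetical even on a single host: Docker's container links already give a crude version of "tie them together and go." A sketch, with my-web-app as an invented image name:]

    # Start a database block, then a web block wired to it by name.
    docker run -d --name db postgres
    docker run -d --name web --link db:db -p 8080:80 my-web-app

[The link injects the database's address into the web container under the name db. Kubernetes aims to play the same role across hosts, with labels and services doing the wiring.]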
Gordon:  Sounds like object‑oriented programming.
Mark:  Except that object‑oriented programming quickly devolved into 'it's exactly what I want, only not.' We have to avoid that pitfall. We need to figure out ways of making good assumptions about what people are going to need, indicating what those assumptions are, and providing ways for people to extend things, but also incentive not to: to use things as they are when it's possible and to extend them only when it's necessary.
Gordon:  It's interesting. Going back to the beginning of this podcast, we talked about application virtualization. As you correctly said, the reason you're maybe not super familiar with it is that it never really took off as a mainstream set of technologies. On the client side, what did take off, because it solved real problems and was what people were looking for, is, essentially, things like the Android app store and the Apple app store.
That is client‑side application virtualization, in a sense. I think that same approach, solving actual problems that people have without turning it into some complex OASIS or ISO type of standard, is where you'll see the real benefit and you'll see the real win.
Mark:  I agree. That store was made possible by advances in network technology and storage technology. We're seeing advances which cause the same kind of disruption in the programming and software developing mechanism.
When Android and iOS were designed, they were designed specifically to set the boundaries on what developers could choose. There's an Android SDK which gives you an abstracted view of the underlying hardware, whether it's the phone or GPS or storage or CPU or whatever, and iOS has similar abstractions. When you develop an iOS app, you're still developing one self‑contained thing which uses these parts.
In a lot of cases, you create one which lives on the phone but communicates with some service out in the cloud, and that's a very common model now. It'll be very interesting to see a couple of different things that could develop.
One is that people will use these kinds of containers to create the web‑side applications that go with things like an Android app or an iOS app, but the possibility exists that you could actually create these things and run them on your phone. You could create something where your application is composed of a set of containers that run on your phone.
Those things still need to be worked out because right now ‑ as we discussed at another time ‑ Android runs Linux, and Linux, as of not too long ago, has cgroups in it, which is the enabling technology for Docker containers. It's already there. You could conceivably write an application in C that you could load onto an Android phone and that would run using Docker containers.
You need to port Docker. There are lots of things that need to be moved over, but all of those are software above the operating system. Android phones already have that operating system part there. I don't think it'll be very long before people are trying to add the parts, and it'll be interesting to see if Google has any strategy for containers, for Docker‑style containers, in the Android environment.
If they don't do it I suspect somebody else is going to try it, and we'll find out whether it works or not when they do.
Gordon:  Great, Mark. There's lots more things I'd like to discuss, but we're going to cut off this podcast and save up some of those other topics for next time. Thanks, Mark.
Mark:  You're welcome. Thank you very much.

Gordon:  Thank you everyone...
