Monday, April 06, 2015

Podcast: Microservices, Rocket, and Docker with Red Hat's Mark Lamourine

This podcast takes a look under the covers at today's different containerization approaches and the implications for developers, operators, and architects. We also talk microservices, the different ways they can be aggregated, and the challenges of figuring out the best service boundaries.

You can also point your browser at Containers on redhat.com to learn more about:

  • Container portability with deployment across physical hardware, hypervisors, private clouds, and public clouds
  • An integrated application delivery platform that spans from app container to deployment target—all built on open standards
  • Trusted access to digitally signed container images that are safe to use and have been verified to work on certified container hosts

Listen to MP3 (0:16:32)
Listen to OGG (0:16:32)

[Transcript]

Gordon Haff:  Welcome to another edition of "Cloudy Chat," Podcast. I'm Gordon Haff in the Cloud Product Strategy Group at Red Hat. I'm here with my colleague, Mark Lamourine. Today, we're going to talking about microservices, Rocket, and Docker, and how these various things relate to each other.
Before we get into technologies, one bit of housekeeping. We're going to be talking about the technical aspects of these things. Unless I say something explicit about commercialized product, none of this should be taken as a product roadmap, or product plans. With that said, Mark, let's take it away.
To start off with, maybe provide a little context. Where does Rocket fit in the container landscape?
Mark Lamourine:  Prior to Rocket, if you were trying to build containers, you really had two avenues. I'm leaving older technologies, like Solaris and stuff aside. We're talking just about Linux containers. It was LXC, which was a build‑your‑own, handcrafted, rather complex mechanism which has existed for years, but really didn't gain any traction outside of a couple of important areas,  which we’re not going to get into today.
Then, about two and half years ago, Docker emerged and made it much easier. By making a lot of assumptions for you about what a container should be, they made it a lot simpler to create containers to move images around to distribute them. They did a lot of tooling around it.
Since the advent of Docker, containers have really caught on, at least in mindshare as a new way to distribute software and distribute applications. Rocket came into that as a slight alternative. The community that was working with Docker and with LXC, they noticed a couple of things that were a little restrictive about what Docker does.
There are some assumptions in Docker that make it slightly more difficult than it might be to create containers that communicate with each other, or to create containers that have multiple processes or a number of different little things, that people have noticed only after they started trying to build complex applications. Rocket was designed specifically to fall into that middle space.
Gordon:  It's probably worth mentioning something here, which you touched on, which is that Docker is often equated with containers, the base level technology. Whereas, Docker really encompasses a lot more than that.
Mark:  You're right. You made two points there. One is that people think of containers, and they think Docker like that's the only way. There are a lot of ways to do containers, or there are now three ways to do containers.
But also Docker encompasses a lot more than just the containers. They've built an ecosystem. They've built a life cycle, or they're starting to build a life cycle. Rocket, again, falls somewhere in the middle of control spectrum. Where LXC, you have very little in the way of infrastructure, very little in the way of control and boundaries, and it's very free, where Docker provides a lot of infrastructure, and a lot of help in producing, and managing your containers and in running them, but it also imposes a certain level of restriction.
Gordon:  Both of these technologies are independent of a containerized operating system, such as RHEL Atomic Host.
Mark:  In fact, they're independent of the operating system in general. You can run both Docker and Rocket on ordinary hosts. You can embed both of them onto a container host, a stripped‑down host, like either Core OS, Project Atomic, or RHEL Atomic.
They're really orthogonal pieces. In fact, they can run side‑by‑side. You can run Rocket right next to Docker, and you can run Rocket containers right next to Docker containers.
Gordon:  I think there's been a tendency in the marketplace, deliberately by some folks to mash things together that don't need to stay together.
Mark:  In some senses, that makes sense because, in the end, the goal is to produce an environment where people can create containerized applications, and stop thinking about the underlying operating system. In that sense, the idea that somehow the two are equated, it makes a certain amount of sense.
But when you're actually developing for one or another, it doesn't really matter. In fact, you don't generally develop containers on a container host. You develop them on a real host, then you migrate them over, test them, and hopefully deploy them out onto a container host.
Gordon:  Mark, what is Rocket at a high level from the technical perspective?
Mark:  Rocket's a container system. What that means is that Rocket is a means of starting processes that have a different view of the host operating system, than an ordinary process does. It's different from other isolation mechanisms, like virtual machines, in that the containerized app is running on the host.
From the host, it looks like an ordinary app, but the containerized app has a different view. Rocket is one mechanism of creating one of these, and what Rocket does is it allows you to create a single image with a single process, a set of process binaries, and run it on the host in its little special environment.
Gordon:  Now, if you think about how Rocket is different from other approaches to implementing containers on a containerized operating system or in some other way, what are some of the most salient characteristics?
Mark:  The big difference between others is that when you're crafting a new image or a new container, if you use LXC, you have to handcraft all of the boundaries, all of the contents. If you're using Docker, you use the Docker file and a base image, which helps you define what goes inside.
With Rocket, you have to handcraft the image in the same way that you do with LXC or a similar way, but Rocket also offers you the advantage of the image spec, which LXC doesn't. It's a simpler image format. It's a simpler set of contents, but you do still have to handcraft what goes inside.
What that means is that you get a very lightweight container image. The container image doesn't have all of the package adding mechanisms. It has just what you need to do your application.
Gordon:  What are the pros and cons of those approaches?
Mark:  For Docker, the big advantage is that the application developers can work very quickly. They don't have to understand a lot about the specifics of how their application works. They can treat it like an ordinary RPM. The disadvantage is that, they tend to get a lot of bulk into their images that doesn't really need to be there at runtime.
The biggest example is whether it's an RPM or Debian‑based system, all that packaging infrastructure is included in the image and is used in composing the image at build time, but it's essentially unused afterwards.
People have to do some fairly interesting tricks to get around that. In Rocket, you just put the files you want in and go on.
Gordon:  We've really been talking about the impact on the developer. What are the differences at runtime?
Mark:  Again, we get back to comparisons with Docker. With Docker, you get one image and it has one process. The boundaries of the container are defined by the contents of the image. With Rocket, the image is a start and the runtime is specified. But when you create an actual container with Rocket, you can supply multiple images and it can run multiple processes.
It establishes one set of boundaries around all of those processes. With that means is that the processes can communicate with each other, but not with the outside world, except through the holes you push. When you're working with a Docker container, and you want multiple processes to communicate with each other, you can't easily have the two processes communicate without explicitly punching lots of little holes.
What it means is, again, that it's easier to compose a Rocket container from multiple images that cooperate, and still have the containment boundary.
Gordon:  Are there any implications for application architect or microservice type application architectures?
Mark:  There are a lot of implications for that. There are a number of places where what Rocket does is it recognizes that the processes running in a service have more or less affinity to each other. Some of them only connect via a network. Other ones need to share files.
They may need to share pipes. They need to be on the same host. Rocket gives you the flexibility to control the distance between the processes and to control more finely which ones are treated as close boundaries and which ones are treated as wide open ones.
Gordon:  Presumably, there are also implications at the orchestration layer, like Kubernetes, for example.
Mark:  One thing about Docker is it's taken what we've learned about orchestration so far and is starting to apply it to the containers themselves, the container design. I hadn't looked at it until very recently, but it turns out that the Rocket developers have actually adopted some Kubernetes technology.
They had started out calling a container this thing which has multiple processes in it. The documentation recently has started to refer to that as a pod, which is a term pulled from Kubernetes, which refers to multiple processes which run on the same host, and which share resources more tightly than merely ones that are communicating over a network.
Gordon:  This all very much corresponds to some of the terminology, different terminology of the ideas in microservices where you expose every microservice or do you have nested microservices because some things don't really need to be explicitly exposed to the outside world?
Mark:  I think there's still a lot of exploration that needs to go across the container community about how good application architectures will look. There are purists. Actually, it's interesting. There aren't that many purists in this area. Everyone's pretty much exploring. There are a few. But I think we'll find out from practical experience, over time, what works for what situations.
Gordon:  One of the things people adopting microservices have found to be very difficult is figuring out what these boundaries are to start with. In fact, I've even seen for some situations the recommendation that maybe you start out doing a more monolithic application, and then break that out into microservices, once you understand how the parts relate.
Mark:  This is one of the areas where you might call me a purist, because I take exactly the opposite tack. I think part of the reason we don't know, we find difficulty in finding those boundaries when we're trying to decompose an application, is because we've deliberately ignored them. They were unimportant in a host‑based environment.
If you put the files down in their own root and two different parts of an application share them, on a host it doesn't matter. There is no boundary to cross. As we start putting applications into containers, if you treat them as, what someone I found out last week calls, "Virt‑Light," if you treat them as if it's a virtual machine that you're stuffing things into, you actually continue to obscure the communications that you've been ignoring before.
I think it's still early, and I think people are going to probably take the monolithic path at first as a perceived easier first step. But I think it's really important to start uncovering the ways in which these covert communications can hide inside the applications and try to find ways both to identify them a priority, to figure out whether they need to be there in the first place, and then figure out how to flag them, split them out, establish proper communications boundaries.
In some senses, this mirrors the object‑oriented wave that took place in the '80s and '90s where one of the points of object‑oriented programming was to enforce the boundaries, the modularity, and the coherence. Some of the people took it. Some people decided that wasn't so important after all.
Gordon:  New global variables are really useful.
Mark:  At times, yes.
Gordon:  [laughs] As you were talking, I was thinking exactly the same thing, because I mean none of the things are perfect today. But I think it's absolutely true that when you first saw object‑oriented programming out there, there were a lot of people that couldn't be bothered with having to use methods to get data out of classes and that thing. You took a lot of shortcuts.
Mark:  It was perceived as a lot of these new waves are, it was perceived at the time as overhead, as extra work, as unnecessary work. I know what's there. I'll go get it. The changes have been informed by the recognition that software has a life cycle and, that by doing this little bit of overhead, it actually frees the developer of whatever the tool is to change the implementation without causing problems for the user on the other end; for the consumer, in the case of libraries or objects.
Because containers are this object where we want to treat them as a unit, and then we want to be able to push things in and get things out, these boundaries are going to become very important.
If we expect in the end to build this hardware store mentality, where someone can go in and get, "I need a database container, but I need one that's tuned this way," rather than having one container for the database, which you have this monstrous set of tuning parameters.
What I'd like to see is that we discover, "Gee, there are some people who can just use the default tuning parameters, who never touch them, and that's fine." Then you might find others where they have specific patterns, because I suspect that not every tuning parameter has equal value.
You're going to find specific situations where you want to tune specific things. The response to that rather than having one highly configurable thing is to have a small set of slightly tuned ones, and you can go and say, "Oh, I want the generic database thing. Or, I want the storage tuned database thing. Or, I want the throughputs tuned database thing," and then only passing parameters that are necessary to get your job done.
It puts some of the work back on the developer to try, and think through the common use cases, and figure out what the right tuning choices are, but it's going to make it much easier for the consumer in the long run, if we can shift that work.
Gordon:  Well, great. It's, as always, been a great discussion today, Mark. Thank you.

Mark:  Thank you.

No comments: