Tuesday, December 16, 2014

Links for 12-16-2014

Monday, December 15, 2014

Photo: Start of winter hiking season

Led an AMC group hiking trip up to Pinkham Notch in New Hampshire this weekend. Lots of snow!

Thursday, December 11, 2014

Links for 12-11-2014

Podcast: The layers of containers with Red Hat's Mark Lamourine


Mark Lamourine and I continue our ongoing containerization series. In this entry, we break down the containerization stack from the Linux kernel to container packaging to orchestration across multiple hosts and talk about where containers are going.
This will probably wrap up 2014 for the Cloudy Chat podcast. Among other goals for 2015, I'd like to continue this series with Mark on a roughly biweekly basis. I also expect to start folding in DevOps as a way of discussing how to take advantage of containerized infrastructures.

Listen to MP3 (0:22:30)
Listen to OGG (0:22:30)

[Transcript]

Gordon Haff:  I'm sitting here with Mark Lamourine, who's a co‑worker of mine, again today. One of our plans for this coming year is that I'll be inviting Mark onto this show more frequently, because Mark's doing a lot of work around the integration of containers and microservices with open hybrid cloud platforms. There's a lot of interest in these topics and in some of the other technologies and trends that intersect with them.
We're going to spend a fair bit of time next year diving down into some of the details. Because we'll be diving into those details, we're not going to get into the ABC basics every week, but I'm going to make sure to put some links in the show notes.
If you like what you hear, but want to learn a little bit more about some of the foundational aspects or some of the basics, just go to my blog and look at the show notes. That should have lots of good pointers for you.
Finally, the last piece of housekeeping for today: we're going to be talking about the future of containers. There's been some, shall we say, interesting news around containers this week. But in this podcast we're going to stay focused on the perspective of a customer, a user, a consumer of containers ‑‑ looking at where containers are going and where users might want to be paying attention over, let's say, the next 6 to 12 months.
We don't want to get into a lot of inside baseball, inside‑the‑Beltway politics about what's going on with different companies and personalities; we'll stay focused on things from a technology perspective. That's my intro. Welcome, Mark.
Mark Lamourine:  Thank you.
Gordon:  Mark, I think most of the listeners here appreciate essentially what containers are, at a high level. Operating system virtualization, the ability to run multiple workloads, multiple applications, within a single instance of an operating system, within a single kernel. But that's, if you would, the first layer of the onion.
What I'd like in this show, as we're talking about where we are today, and where we're going in the future, is to talk a bit more about the different pieces of containers, the different aspects of containers.
The first aspect I'd like to talk about is the foundational elements. What's in the kernel of the operating system. This is kernel space stuff. So could you talk about that?
Mark:  We've discussed before that the initial enabling technology here is kernel namespaces, and that there have been things like this in the past. Essentially, what namespaces do is allow someone to give a process a different view of the operating system.
They come into play when a process asks the kernel, "Where am I in the file system?" The namespaces can answer, "You're at slash," or "You're at some other path," and the answer the process gets is a little different from what you'd see outside the container. That's really the core technology: the idea of an altered view of the operating system from the point of view of the contained process.
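[Show note: To make the "altered view" idea concrete, here's a minimal Go sketch ‑‑ my own illustration, not code from Mark ‑‑ that launches a shell in its own PID, mount, and hostname namespaces using the same kernel features containers build on. Linux only, and it needs root.]

```go
package main

import (
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Start /bin/sh in new PID, mount, and UTS namespaces.
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWUTS,
	}
	// Inside the shell, `echo $$` prints 1: the process believes it is
	// PID 1, and hostname changes stay private -- a different view of
	// the same running kernel.
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```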
Gordon:  Now there are some different philosophies out there about exactly how you go about doing this from a process perspective.
Mark:  Not so much the technology, but what do you do with it once you've got it? How do you mount file systems? What views are useful? How do you build up a special view for a process that lives inside a container? There are different ways of doing that, and people have different goals. That informs how they want to build one.
Gordon:  Although this aspect of containers is often hidden, I think it's important to note that it's a critical part of the entire subsystem, because everything else rests on top of it.
We've seen some news stories recently, for example, about how, if you don't have consistency among kernel levels, it's hard to get portability of containerized applications between environments.
Mark:  How you compose that view is one element that's interesting and can differ, but you want to make sure the views are provided uniformly, so everybody knows what they're getting. One important aspect is that these are different views. There's the view of the process IDs ‑‑ what other processes a given process can see.
There's also the view of the file system: each process can see the file system in its own way, or two processes can share one view of the file system while a third sees something different.
This composition is something that people are still working out ‑‑ how an application with multiple processes, each with different responsibilities, would want to see things, and how you build up the environment for each one.
Gordon:  So that's the foundational container layer, which is part of the operating system and depends on a lot of operating system services ‑‑ a lot of things the operating system does for security, for resource isolation, that type of stuff.
Now let's talk about the image that is going to make use of that container. As we were talking before this podcast, from your perspective, there are really two particular aspects of images ‑‑ the spec of the image and the instantiation, the actual deployed image.
Let's first talk about the spec of the image and what are some of the things, the characteristics that you see as being important there now and moving forward.
Mark:  Again, uniformity is probably the biggest one. The big container system right now is Docker, and Docker has a very concise way of describing what should go into a container. The specification is very small, and that's one of the things Docker has brought ‑‑ it made people realize that this is important.
Prior to using something like Docker, describing a container was very difficult and very time‑consuming and it required expert knowledge. With the realization that you need some kind of concise specification and that you can make some assumptions, containers have become easier to build, and that's really what's instigated the rise of containers in the marketplace.
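[Show note: For readers who haven't seen one, the concise spec Mark describes is the Dockerfile. Here's a representative sketch; the base image, packages, and file names are my own illustrative choices, not anything from the podcast.]

```dockerfile
# Each instruction below produces a reusable image layer.
FROM fedora:20
# Install the Apache web server into the image.
RUN yum install -y httpd && yum clean all
# Add site content.
COPY index.html /var/www/html/index.html
EXPOSE 80
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]
```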
Gordon:  Let's talk about the other aspect of containers, the instantiation, the payload, the actual instance, if you would. What are some of the trends you see happening there?
Mark:  Again, Docker was kind of the inception. The assumption they made was that you can take this specification and create a series of reusable layers to build up the image ‑‑ and they specified that each layer is just a tarball.
Mostly they established a standard, and once that standard is there, people can stop thinking about it and just go on and start working with it. That uniformity of the composed piece is going to be really important going forward.
Gordon:  However, that's not necessarily tied into all the other aspects of a container subsystem. That spec, that format can really exist independently of other pieces of technology, and that's probably going to be kind of a theme that we hit a few times in this podcast.
Mark:  At each layer you want uniformity, but like you said, that doesn't preclude having a different way of specifying what goes in ‑‑ just that once you've specified it, it's got to have a form that other people can accept. The same thing is true of the image format itself.
Once that's there, it doesn't matter how the image gets instantiated on the far machine, as long as the behavior is the same. That gets the job done. It allows people to focus on the work they need to do rather than on a lot of extra effort putting everything together.
Gordon:  This has always been the tension with standards at some level. Standards are great from the point of view of the customer, and they have enormous value in terms of portability and in terms of just not having to think about certain things.
On the other hand, they need to embody the functionality that's needed to get the job done. We don't use parallel printer cables any longer, thank God. There were standards for them, certainly, but they're not very useful in today's world.
Mark:  Yeah, I've said before that probably one of the biggest things that Docker did was to make a certain set of assumptions, and to live with those assumptions, those simplifying assumptions.
That allowed them to get on with the work of building something that was functional. I think that the assumptions are going to be challenged. There are going to be places where their assumptions are too tight for some kinds of uses.
I think the community is going to inform that, and the community is going to say, "This is something we need to expand on." Without a different assumption, or without the ability to control those assumptions, we can't really move forward. There are a number of different responses in the market to that.
Gordon:  This is how successful open source projects work. You have a community. You have members of that community with specific needs. If the project as it exists doesn't meet those needs, they need to argue, they need to contribute, they need to convince other people that their ideas, the things they need are really important to the project as a whole.
Of course, there need to be mechanisms in place in that project to have that wide range of contributions.
Mark:  In any good open source project, you get that right from the beginning. The assumption by the authors is, we've got a good idea here or I think I've got a good idea here and I'm going to instantiate it. I'm going to create it and make it the way I think it needs to be.
Then I'm going to accept feedback, because people are going to want to do things with it. Once they see something's neat, they're going to want to say, "Yeah, that's exactly what I want. Only it would be better if I had this too."
Gordon:  Let's talk about the repositories, the ecosystems. You talked about this a little bit last time, but where are we now and what are the next steps? What needs to be done here?
Mark:  Again, returning to Docker, another one of their simplifying assumptions was the creation of this centralized repository of images. That allowed people to get started really quickly. One of the things that people found when they started looking at their enterprise, though, was that it was a public space.
What we need to go forward is we need the ability to know where images come from. Right now things are just thrown out into space, and when you pull something down you don't know where it came from. I don't think there's anybody who really thinks that that's the ideal in the end.
I think to go forward, the community needs to build mechanisms where someone who builds a new container image can sign it ‑‑ can verify that it comes from the person who claims to have built it and that it contains only the parts that were specified. If the image is intended to be public, it gets put out in a public place, so that people can be assured it meets all their requirements and isn't something malicious.
On the flip side you get companies where they're going to say, "No, I don't want to put this in a public space." There needs to be some private repository mechanism where a company can develop their own images, place them where they can reach them, and retrieve them and use them in ways that they want without exposing it to the public.
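[Show note: The signing and provenance mechanisms Mark describes were still being worked out at recording time. As a sketch of the flavor of verification involved, here's a hypothetical Go snippet that checks a downloaded image tarball against a digest its builder published out of band; the file name and digest value are placeholders, not a real Docker interface.]

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

func main() {
	// Digest the builder is assumed to have published (placeholder value).
	const published = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

	f, err := os.Open("myimage.tar") // hypothetical downloaded image tarball
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Hash the tarball and compare against the published digest.
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		panic(err)
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != published {
		fmt.Println("digest mismatch: image is not what its builder published")
		os.Exit(1)
	}
	fmt.Println("digest verified: image matches its published fingerprint")
}
```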
Gordon:  Again, this is another example of how there's not going to be just one way of doing things, because there are a lot of legitimately different requirements out there.
Mark:  There are different environments, although I think there's probably a limited number that we'll find over time. I don't think it's completely open‑ended. A limited number of environments and uses will shake out as people explore how they want to use containers.
Gordon:  Finally, let's talk about ‑‑ and again, you touched on some of this during our last podcast ‑‑ the orchestration and scheduling piece, which is another piece that I think we sometimes tend to think of as just part of this container subsystem.
In fact we're pretty early in the container concept and we're really still developing how these pieces fit with and complement the lower‑level container function.
Mark:  The whole technology started off with, "Let's build something that runs one container." It's actually working out really nicely that as people start using containers, they're naturally backing into bigger and bigger spaces.
They start off going, "Oh, this is really cool. I can run a container on my box that can either run a command I want or I can build a small application using a database and a web server container and I can just push my content into it and it goes."
And people are going, "That's great. Now, how do I do 12?" Or companies are looking at it and saying, "Here's an opportunity. If I can make it so other people can do this, I can sell that service, but I have to enable it for lots of people." I think we're backing into this growing environment that orchestration is going to fill.
I think there's still a lot of work to be done with the orchestration right now. The various orchestration mechanisms, they're not really finished. There are pieces that are still unclear ‑‑ how to manage storage between containers, and a big one is, in a container farm, in an orchestrated container farm, how do you provide network access from the outside?
A lot of work has gone into making it so the containers can communicate with each other, but they're not very useful for most cases until you can reach in from the outside and get information out of them. That requires something like a software‑defined network ‑‑ and if you follow the OpenStack model, they have these things.
That's actually still one of the most difficult problems within OpenStack. I think if you ask people about the three iterations of software‑defined networks within OpenStack, you're going to find that they're still working out the problems with that and OpenStack is four or five years older than any of the container systems are.
Gordon:  One of the things that strikes me when I go to events like LinuxCon and CloudOpen and other, particularly open source‑oriented, industry events is that there's a lot of different work going on, in many cases addressing different use cases ‑‑ whether it's Twitter's use cases or Facebook's or some enterprise's or Google's.
There are all these different projects being integrated together in different ways, and the thing that strikes me is, first of all, wow, there are a lot of smart people working on this stuff. And second, we're nowhere near ready to say, "This is the golden path to container orchestration, now and forever."
Mark:  I would be really surprised if we found that there ever was one golden way. I suspect in the same way that we've got different environments for different uses, you'll find that there are small‑scale orchestration systems that are great for a small shop, and then you're going to get large enterprise systems.
I can guarantee that whatever Google uses in the next five years is going to be something that I probably wouldn't want to install in my house.
Gordon:  Or your phone.
Mark:  Or my phone, yeah. The different scales are going to have very different patterns for use and very different characteristics. I think that there's room in each of those environments to fill it.
Gordon:  Sort of a related theme ‑‑ what I'm going to simplistically call provisioning tools. I was just having a discussion yesterday. You've got Puppet, you've got Chef, you've got Ansible, you've got Salt.
Certainly there are adherents and detractors for all of them, and they're at various points in their maturity cycles. But the other thing that strikes me is that there's a very clear affinity between certain groups of users ‑‑ developers or sysadmins, say ‑‑ and one tool rather than another, because these tools really aren't all the same thing.
Mark:  They're not, and I thought it was interesting that you used the term "provisioning tool" when talking about Puppet and Chef, because that is the way people are starting to use them now. Five years ago they would have been called configuration management tools, and the focus wouldn't have been on the initial setup, although that's important. It would have been on long‑term state management.
That's one of the places where containers are going to change how people think about this work, because I think the focus is going to be more on the initial setup and short‑term life of software rather than the traditional ‑‑ actually someone told me to use the word "conventional," although in this case "traditional" might make sense.
The traditional "Set it up and maintain it for a long period of time." Your point about people having different tools for different perspectives is true. I also want to point out that all of these things, even while they're under development, have use. You might claim that Puppet and Chef and these various things ‑‑ the configuration management tools, the provisioning tools, the container market ‑‑ are still evolving.
But at the same time, they're in use. People are getting work out of them right now. People are getting work out of containers now, as much as we're talking about the long‑term aspects, people are using containers now for real work.
Gordon:  Gartner has this idea they call bimodal IT. There's traditional IT, conventional IT, whatever you want to call it, where you have these "pets" type systems. The system runs for a long time. If the application gets sick, you try to nurse it back to health.
You do remediation in the running system for security patches and other types of patches and the like. Then you have this fast IT, where the idea is: I've got these relatively short‑lived systems. If something's wrong with one, it takes what, half a second to spin up a new container. Why on earth would I bother nursing it back to health?
Mark:  I think this is another case where perspective is going to be really important. If you're a PaaS or an IaaS shop, the individual pieces to you are cattle. You don't really care. You've got hundreds, thousands, hundreds of thousands of them out there, and one of them dropping off isn't all that big a deal.
But if you're in a PaaS situation, your cattle is somebody else's pet, and it's going to be really important either to keep the individual cattle alive, because to someone they're really the most important thing, or to help those users find ways to treat something like a pet while you treat it like cattle.
They say, "I want my special application," and you automatically spin up two or three redundant systems, so that when you see pieces dying you kill them off and restart them, but the user doesn't see that. They shouldn't have to manage it.
Gordon:  To pick Netflix as a much overused example: obviously, Netflix delivering movies to you as a consumer ‑‑ that's cattle at one level. But you lose your ability to watch Orange is the New Black or whatever, and you're going to be unhappy.
From Netflix's point of view, if you're unhappy, they're unhappy ‑‑ but the individual microservices are very explicitly designed so that they can individually fail.
Mark:  This is what I was saying about needing to treat it both ways. I don't know, but I suspect that when you're watching your movie, if the server that's feeding it dies, what Netflix sees is, "Oh, something died. Start a new one." What you see is maybe a few seconds' glitch in your movie, but it comes back.
Mostly, they're reliable. If that's true, then they've managed to do what I was saying. They've managed to make it so that they preserved the important pet information for you somehow. It might be on your client side, but the cattle part of it is still, "Get rid of it and start again."
Gordon:  Well, Mark, this has been a great conversation. We've probably gone on long enough today. But, as I said at the beginning, we're going to continue this as a series going into the New Year because there is a lot happening here, and nobody has all the answers today.
Mark:  That's for sure.

Gordon:  Hey, thanks everyone. Have great holidays.

Tuesday, December 02, 2014

Bath Christmas market

I spent Thanksgiving in Bath, England last week. The Christmas market was just getting started.

Links for 12-02-2014

Monday, December 01, 2014

Why hybrid data models and open source: Cloud Law conference remarks


This blog post is adapted from my remarks during the Data Governance and Sovereignty – Challenges and Requirements panel at The Broad Group’s Cloud Law conference in London last week. 

The history of the IT industry is a history of cyclical reimaginings. Not repeated cycles exactly, but repeated themes reflected in new and different technologies and environments. One such cycle that’s upon us today is the reinvention of centralized computing under the “cloud” rubric. It’s much different from the mainframe of the 1960s, but it shares the movement of intelligence and state toward the core and away from the network edge.

Indeed, this centralization cycle is arguably even more intense than that of the past. Author Nick Carr calls it “The Big Switch” by analogy to the centralization of electrical power generation. And, while the ecosystem of cloud service providers is both large and varied, there are but a handful of true global service providers. One data point. The Amazon Web Services re:Invent conference scored about 14,000 attendees this year. Sold out. Just year three for the conference. Just year eight for the service.

Some other day, I’ll be happy to argue why this handful of global service providers isn’t the future of all computing—certainly not within an interesting planning horizon. But there is significant centralization going on for important swaths of computing. And that makes it important to have detailed and precise discussions about governance and sovereignty as they relate to these large entities storing and processing our data.

Need some more convincing? Consider “security,” which leads just about every survey about cloudy concerns or roadblocks. Except security in this context often doesn’t mean classic security concerns like unpatched software or misconfigured firewalls. As 451 Research VP William Fellows noted in his HCTS keynote in October, it’s actually jurisdiction that is the number one question. Perhaps not surprising, really, given the headlines of the last year, but it reinforces that when people voice concerns about security, they are often talking about matters quite different from the traditional infosec headaches. Transparency, control over data, and data locality are the big “security” concerns in the context of public cloud providers.

When using public clouds, it’s important to understand where data is stored, how encryption is or can be used, what protections are available, the procedures for notifications in the event of a breach or a judicial request, and many other aspects of due diligence. And, given appropriate vetting, public clouds can be entirely appropriate for many classes of data. At the same time, it’s also important to recognize that there is an inherent sharing of responsibility when using public cloud providers. Reduced control and visibility are just part of the bargain in exchange for not having to run your own servers. 

This tradeoff is one reason for the increasing recognition that much IT will be hybrid. Public clouds remain attractive for many uses whether for reasons of pricing or reasons of flexibility. But private clouds can give greater control over aspects of compute and data storage—as well as making it possible to tailor the environment to an organization’s specific requirements. (Of course, on-premise computing also makes it possible to create gratuitous customizations and complexity but that’s a topic for another day.) Furthermore, public clouds can be something of golden handcuffs—especially above the base infrastructure level. The more cloud provider-specific features you use, the harder it will be to move your workloads on-premise or even to another public cloud provider. You may deem such inflexibility a reasonable tradeoff but it is a tradeoff just as proprietary vertical hardware/software stacks once were in the systems space. 

Open source was one alternative then and it's still an alternative to lock-in today. Control over technology. Control over formats. Control over use. Much of the impetus behind ongoing development of OpenStack, for example, is that organizations of many types have a strategy to become an in-house service provider. The central idea behind OpenStack is to let you build a software defined datacenter for your own use.

The storage of data is central to this concept. Open source storage projects like Gluster and Ceph work on-premise, in a public cloud, or across both in a hybrid model. Ultimately it's not about public cloud or private cloud being better or worse, but about which is best suited for a specific use and purpose. And that's leading to hybrid computing, which open source enables in important ways.

Thursday, November 20, 2014

Links for 11-20-2014

Open source accelerating the pace of software

When we talk about the innovation that communities bring to open source software, we often focus on how open source enables contributions and collaboration within communities. More contributors, collaborating with less friction.

However, as new computing architectures and approaches rapidly evolve for cloud computing, for big data, for the Internet of Things (IoT), it's also becoming evident that the open source development model is extremely powerful because of the manner in which it allows innovations from multiple sources to be recombined and remixed in powerful ways.

Read the rest on opensource.com...

Wednesday, November 05, 2014

Slides: What manufacturing can teach us about DevOps



Software development, like manufacturing, is a craft that requires the application of creative approaches to solve problems given a wide range of constraints. However, while engineering design may be craftwork, the production of most designed objects relies on a standardized and automated manufacturing process. By contrast, much of moving an application from prototype to production and, indeed, maintaining the application through its lifecycle has often remained craftwork. In this session, Gordon Haff discusses the many lessons and processes that DevOps can learn from manufacturing and the assembly line-like tools, such as Platform-as-a-Service, that provide the necessary abstraction and automation to make industrialized DevOps possible.

The core product matters

From a post-election comment.

[Ref:] 8. The Democratic ground game is losing ground. That was one I always doubted. There were lots of stories about how good the Democratic ground game was. Also a lot of related stories about how good the Democratic analytics folks are, and how good the Democrats were with social media.

But having worked on large analytics projects for retailers for most of the last 30 years, I find the Democratic stories sound really similar to those I've heard in the business world. When something works, lots of people want to claim credit for it working. Getting credit for a successful piece of a campaign gives a consultant the ability to charge higher rates for years. And it doesn't much matter if that campaign was to elect a candidate or to get people to visit a store.

The reality is that when there's a good product that people want, the marketing is easy. And analytic tweaks to the marketing message are at best of marginal value. When people don't want the product, the marketing and analytics won't save it. President Obama had lots of people who were proud of him being the first black president. They were easy to get to the polls. It didn't take a great ground game, great analytics people, or an inspired social media presence. It just worked.

A lot of effort goes into marginal things. Product names, lots of branding details, or focus on insane detail that isn’t even “on the screen.” It does add up. Or it’s an inherent part of an overall mindset or approach that can’t be divorced from what is on the screen. 

But blocking and tackling is usually most evident when it’s absent or deeply flawed. Suspicion is probably warranted when extraordinary claims are made for results stemming from optimizations made far outside the core product.

Reducing the Wisdom of Crowds

This is something we’ve studied a lot in constructing the FiveThirtyEight model, and it’s something we’ll take another look at before 2016. It may be that pollster “herding” — the tendency of polls to mirror one another’s results rather than being independent — has become a more pronounced problem. Polling aggregators, including FiveThirtyEight, may be contributing to it. A fly-by-night pollster using a dubious methodology can look up the FiveThirtyEight or Upshot or HuffPost Pollster or Real Clear Politics polling consensus and tweak their assumptions so as to match it — but sometimes the polling consensus is wrong.

I find this an interesting point from Nate Silver over at FiveThirtyEight. I think I’ve seen something similar in Oscar contest data that I’ve analyzed. It wasn’t unequivocal but:

The trend lines do seem to be getting closer over time. I suspect... we're seeing that carefully-considered predictions are increasingly informed by the general online wisdom. The result is that Consensus in the contest starts to closely parallel the wisdom of the Internet because that's the source so many people entering the contest use. And those people who do the best in the contest over time? They lean heavily on the same sources of information too. There's increasingly a sort of universal meta-consensus from which no one seriously trying to optimize their score can afford to stray too far.

There are some fancy statistical terms for some of this but fundamentally what’s happening is that information availability, aggregation, and (frankly) the demonstrated success of aggregating in many cases tend to drown out genuine individual insights. 
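To see why correlation undermines aggregation, here's a toy simulation (my own illustration, with made-up error magnitudes): fifty independent pollsters' errors largely cancel when averaged, while fifty herded pollsters share a common error that no amount of averaging can remove.

```go
package main

import (
	"fmt"
	"math/rand"
)

func abs(x float64) float64 {
	if x < 0 {
		return -x
	}
	return x
}

func main() {
	const truth, n, trials = 0.50, 50, 10000
	var errInd, errHerd float64
	for t := 0; t < trials; t++ {
		// Herded pollsters share one common error: the consensus they tweak toward.
		common := rand.NormFloat64() * 0.03
		var sumInd, sumHerd float64
		for i := 0; i < n; i++ {
			sumInd += truth + rand.NormFloat64()*0.03            // independent error
			sumHerd += truth + common + rand.NormFloat64()*0.005 // shared error, small tweak
		}
		errInd += abs(sumInd/n - truth)
		errHerd += abs(sumHerd/n - truth)
	}
	// Independent averaging washes errors out; herded averaging cannot.
	fmt.Printf("mean error, independent polls: %.4f\n", errInd/trials)
	fmt.Printf("mean error, herded polls:      %.4f\n", errHerd/trials)
}
```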

Tuesday, November 04, 2014

Links for 11-04-2014

Podcast: Docker, Kubernetes, and App Packaging with Red Hat's Mark Lamourine

My colleague Mark has been doing lots of work on Docker, managing Docker across multiple hosts, and integrating Docker with other functionality such as that provided through the Pulp content repository. Mark sat down with me to talk about some of his recent experiences working with these technologies.

Mark’s blog which drills into many of the things we discuss here
The server virtualization landscape, circa 2007

Listen to MP3 (0:18:30)
Listen to OGG (0:18:30)

[Transcript]

Gordon Haff:  Hi everyone. This is Gordon Haff with Red Hat, and today I'm sitting here with one of my colleagues, Mark Lamourine, who's also in the cloud product strategy group. Mark's been doing a bunch of work with Docker, containers, Kubernetes, figuring out how all this stuff works together. I decided to sit Mark down, and we're going to talk about this. Welcome, Mark.
Mark Lamourine:  Hello.
Gordon:  As a way of kicking things off, one of the things that struck me as interesting about this whole Docker application packaging business is that if you go back a few years...I was looking at all these different virtualization types back then. I'll try to find some links to stick in the show notes. One of the interesting things when you look around the mid‑2000s or so is that there was all this work going on with what they were calling application virtualization.
This idea of being able to package up applications within a copy of an operating system in a way that really brought all the dependencies of an application together so that you didn't have conflicts between applications or missing parts and that kind of thing. One of the things that I find interesting around Docker now is it's, at its heart, using operating system containers as another form of virtualization.
What Docker's really done is add some capabilities that solve some of the same problems application virtualization was trying to address.
Mark:  One of the things about much of this technology that people have observed is that in many cases the technology itself isn't new, and in some cases the concepts aren't even new. If you look at something like Java, the origin of Java and virtual machines goes back to Berkeley Pascal, which is decades old, but it wasn't ready because the hardware wasn't ready. The ideas were there.
In the case of application virtualization, people were taking this new thing and saying, "What can I do with it?" What they found was you could, indeed, build complete applications into a virtual machine, and for a while people were passing around VMware images to run.
VMware actually had a Player whose job it was to take these images and provide them to people, but they turned out to be heavy‑weight. They tended to not be designed to interact with lots of things outside. It was a great place to put everything inside, but once you had it inside it didn't interact very well with anything else.
We've gone on with machine virtualization as the hardware's gotten better. We've used it in ways that we found appropriate. We weren't sure what the patterns would be back then. We found the patterns for machine virtualization, but they didn't include this kind of idea. It didn't seem to work out.
The impetus for containerization ‑‑ which is a reuse of the old multi‑tenant, log‑onto‑the‑machine‑and‑get‑an‑account style of computer use ‑‑ came with the creation of things like cgroups and namespaces and the pieces that came from Solaris zones. We've suddenly got a new way of putting all of this stuff together which promises to give us what we were looking for back in 2000 with machine virtualization, but which didn't seem then to be the best use of resources.
We're looking now at containerization that is much lighter weight. A container doesn't provide its own operating system, it doesn't provide its own networking stack, and it doesn't provide its own self‑contained disk storage. Those gaps pose problems, because we have to do the isolation some other way ‑‑ which is what cgroups and namespaces have done.
They also provide opportunities, because you can now achieve much greater density with containers, assuming we can solve the access problems. If we can solve these other missing pieces, or make it so they're not necessary, we can achieve the goal people were aiming at with machine virtualization a decade ago, in something that actually scales well.
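[Show note: The cgroups isolation Mark mentions is exposed on Linux as an ordinary filesystem. This minimal Go sketch ‑‑ my own illustration, assuming the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory and you're running as root ‑‑ caps a process's memory the same general way container runtimes do.]

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	// Create a new cgroup by making a directory under the memory controller.
	cg := "/sys/fs/cgroup/memory/demo"
	if err := os.MkdirAll(cg, 0755); err != nil {
		panic(err)
	}
	// Cap the group at 64 MB of memory.
	if err := os.WriteFile(filepath.Join(cg, "memory.limit_in_bytes"), []byte("67108864"), 0644); err != nil {
		panic(err)
	}
	// Move this process into the group; its children inherit the limit.
	pid := []byte(fmt.Sprintf("%d", os.Getpid()))
	if err := os.WriteFile(filepath.Join(cg, "tasks"), pid, 0644); err != nil {
		panic(err)
	}
	fmt.Println("process", os.Getpid(), "is now limited to 64MB")
}
```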
Gordon:  The other thing that has happened is the use case has changed, so to speak. Containers were actually reasonably successful in at least certain niches of the service provider world. The reason was that in the service provider world you have these very standardized types of machines that people use. You couldn't manage that scale any other way.
At the time, that was very different from the way most enterprises were set up, where you had all these unique operating system instances. In a sense, the service provider model has come to the enterprise. The service provider type use cases are now, increasingly, a better model, certainly for IaaS and cloud‑style workloads.
There are a couple of things you talked about that I'd like to dive into a little deeper, but first it would be useful to step back. We've been throwing around "Docker," and we've been throwing around "containers." Sometimes those two concepts, those two technologies, are viewed as different names for the same thing, but that's not really true, is it?
Mark:  In a practical sense, right now it's fairly close to true. I try to make the distinction between containers, which are a conceptual object, and Docker, which is a current implementation we're working with, because the possibility exists that somebody's going to come along and create one that is different.
We were just talking a couple of minutes before we started about Docker within Microsoft, and I observed that some of the things that make Docker possible are, at least right now, very Linux‑centric features ‑ the cgroups and namespaces. I don't know the state of those technologies within the Microsoft operating systems.
There's certainly no barrier I can think of to creating them, or the features may already even be there if you know the right places. I can imagine that someone would create a Microsoft version or even a Macintosh Darwin version of what Docker does using slightly different technologies.
Containerization is certainly more general. It's putting processes on an operating system and then creating a new view for them so that they don't see the same thing that a general process does. Docker does that.
Gordon:  The other interesting aspect of Docker, of course, though, is it goes beyond the foundation infrastructure element and gets into how you package up applications and how you deploy applications. That's the other thing that has gotten people interested in containers broadly and Docker, in particular, these days.
Mark:  I can remember the first time I saw Docker. I had been working on OpenShift on the isolation pieces. OpenShift is a platform as a service. It provides multi‑tenant services on a single box without virtualization, without traditional virtualization. I'd been working on the isolation pieces so that one customer's processes couldn't see what was happening with another customer's processes and couldn't damage them or whatever.
I was doing lots of ad hoc work to try to make all of this stuff work. When I saw Docker I was like, 'OK, that's it. I've been going at it the hard way all along.' The novelty, as you said, wasn't the underlying technologies. The novelty was that they had made a good set of starting assumptions and then made those assumptions easy to work with.
The problem with a lot of container systems ‑ we were struggling with OpenShift and people had problems with Solaris Zones ‑ was that it was difficult to create a new one because there were a lot of moving parts and knobs. One of the Docker novelties was that you've got this very concise, very clear mechanism for describing what goes into your images which form your containers, and creating new ones is very easy.
That's in contrast to everything that I've seen that went before, and that was the thing that a lot of people have jumped onto. It's not that this was necessarily new but all of a sudden it's easy to use at least in its simplest form.
Gordon:  We've been talking about the individual microservice or the individual application in the context of containerization and Docker. You've also been looking at some of the ways that you can orchestrate, manage, scale groups of Docker containers together.
Mark:  One of the things that I was interested in right away was the fact that Docker is designed to work on a single host. When you create a container, all of the commands that you use in Docker relate to that single host, with one exception, which is pulling down new images. Those can come from the Docker hub or they can come from another repository.
Once you get them started, they all run on one host. That's where Docker stops, but we're all getting very used to the idea of cloud services where the location of something shouldn't matter and the connectivity between the pieces shouldn't matter. When you start thinking about complex applications using containers, the first thing you think is, 'OK, I'm going to put these everywhere and I don't care where they are.'
Docker doesn't address that. There are a number of projects people are developing to fill the gap. One that we're looking at strongly at Red Hat is Kubernetes. Kubernetes is a Google project which I believe comes out of their own internal containerization efforts. I believe they've started to re‑engineer their own container orchestration publicly to use Docker.
They're still just beginning that. It became evident very soon, trying to build complex applications that span multiple hosts with Docker and Kubernetes, that there are still pieces that need to be ironed out. I picked a project called Pulp, which we use inside Red Hat for repository mirroring, because it had all the parts that I needed. It was a good model.
It has a database, it has a message broker, it has some persistent storage, and it has parts that are shared by multiple processes. Those are all use cases and usage models that I know people are going to want, and Pulp has them all in a single service.
I thought, 'If I can get Pulp working on Docker using Kubernetes, then that would expose all of these issues.' Hopefully I can get something that actually works, but in the process I should be able to show which things are easy and, more importantly, which things are still hard or still unresolved.
Gordon:  How do you see potential listeners, customers, how are they most likely to encounter Docker and how are they most likely to make use of it in the near‑term and mid‑term?
Mark:  The funny thing is that they won't know it. If all of this works out, people who are using web applications or cloud applications won't know and won't care what the underlying technology is.
When it comes to people developing applications, that's where things are going to get interesting. Those people are going to see something fairly new. We're used to doing web applications where we have a web server and maybe a database behind it and we have traditional multi‑tier applications. We're moving those into the cloud using traditional virtual machine virtualization.
One of the things that Docker and Kubernetes promise is to create a Lego system, a kind of building block system, for whole applications that hasn't been present before. When we deliver software at Red Hat, we traditionally deliver it as RPMs, which are just bundles of bits; they go on your box, and then you're responsible for tying all the parts together. We may write some configuration scripts to help do that.
What Docker promises to do, if everything works out and we address all the problems, is that you would be able to go down a shelf and say, "OK, here's my database module, here's my storage module, here's my web service module."
I could imagine a graphical application where you drag these things off some kind of shelf, tie them together with some sort of buttons, and say, "Go." That thing goes out and comes into existence and tells you where it lives and you can point people at it and put code there. The developers are going to be the people who see this.
There's also going to be another layer of developers who are the people who create these things. That's still something that's up in the air, both the magical, unicorn world I just described and the dirty hands of creating these things. Those are both works in progress.
Gordon:  Sounds like object‑oriented programming.
Mark:  Except that object‑oriented programming quickly devolved into 'it's exactly what I want, only not.' We have to avoid that pitfall. We need to figure out ways of making good assumptions about what people are going to need, indicating what those assumptions are, and providing ways for people to extend things ‑‑ but also incentives not to. To use things as they are when it's possible and to extend them when it's necessary.
Gordon:  It's interesting. Going back to the beginning of this podcast, we talked about application virtualization. As you said, the reason you may not be super familiar with it is that it never really took off as a mainstream set of technologies. On the client side, what did take off, because it solved real problems and was what people were looking for, is, essentially, things like the Android app store and the Apple app store.
That is client‑side application virtualization, in a sense. I think that same approach of solving actual problems that people have, without turning it into some OASIS or ISO type of standards exercise, is where you'll see the real benefit and the real win.
Mark:  I agree. That store was made possible by advances in network technology and storage technology. We're seeing advances which cause the same kind of disruption in the programming and software developing mechanism.
When Android and iOS were designed, they were designed specifically to set the boundaries on what developers could choose. There's an Android SDK which gives you an abstracted view of the underlying hardware for whether it's phone or GPS or storage or CPU or whatever, and iOS has similar abstractions. When you develop an iOS app, you're still developing one self‑contained thing which uses these parts.
In a lot of cases, you create one which lives on the phone but communicates with some service out in the cloud, and that's a very common model now. It'll be very interesting to see a couple of different things that could develop.
One is that people will use these kinds of containers to create the web‑side applications that go with things like an Android app or an iOS app, but the possibility exists that you could actually create these things and run them on your phone. You could create something where your application is composed of a set of containers that run on your phone.
Those things still need to be worked out because right now ‑ as we discussed at another time ‑ Android runs Linux, and Linux, as of not too long ago, has cgroups in it, which is the enabling technology for Docker containers. It's already there. You could conceivably write an application in C that you could load onto an Android phone that would run using Docker containers.
You need to port Docker. There are lots of things that need to be moved over, but all of those are software above the operating system. Android phones already have that operating system part there. I don't think it'll be very long before people are trying to add the parts, and it'll be interesting to see if Google has any strategy for containers, for Docker‑style containers, in the Android environment.
If they don't do it I suspect somebody else is going to try it, and we'll find out whether it works or not when they do.
Gordon:  Great, Mark. There's lots more things I'd like to discuss, but we're going to cut off this podcast and save up some of those other topics for next time. Thanks, Mark.
Mark:  You're welcome. Thank you very much.

Gordon:  Thank you everyone...