Wednesday, September 25, 2013

Links for 09-25-2013

Thursday, September 19, 2013

What are containers and how did they come about?

Today's announced collaboration between Red Hat and dotCloud, the company behind Docker, is exciting for a lot of reasons. As the release notes: "Docker and OpenShift currently leverage the same building blocks to implement containers, such as Linux kernel namespaces and resource management with Control Groups (cGroups). Red Hat Enterprise Linux Gears in OpenShift use Security-Enhanced Linux (SELinux) access control policies to provide secure multi-tenancy and reduce the risk of malicious applications or kernel exploits."

Among other areas of collaboration, we'll be working with dotCloud "to integrate Docker with OpenShift’s cartridge model for application orchestration. This integration will combine the power of Docker containers with OpenShift's ability to describe and manage multi-container applications, enabling customers to build more sophisticated applications with enhanced portability."

(See also the blog post from Docker here.)

But what are containers exactly? I'm down at LinuxCon/CloudOpen in New Orleans this week and I've seen a lot of interest in the various sessions that touch on containers. I've also seen a fair bit of confusion and vagueness over what they are and what function they serve. In a way, I find this a bit surprising as the concept--and products based on that concept--has been around for almost a decade and its progenitors go back even further. But I think it reflects just how thoroughly hypervisor-based virtualization has come to dominate discussions about partitioning physical systems into smaller chunks. Today is a far cry from the mid-2000s when I was writing research notes as an industry analyst about the "partitioning bazaar," which saw the offering of all manner of hardware-based, software-based, and hybrid techniques--many implemented on large Unix servers.

See: The Partitioning Bazaar (2002), New Containments for New Times (2005), and The Server Virtualization Bazaar, Circa 2007. Some of this blog post is adapted from material in those earlier notes. The original notes get into more of the historical background and context for those who are interested.

Hypervisor-based Virtualization

First let's consider hypervisor-based virtualization, aka hardware virtualization. Feel free to skip or skim this part. But I think the context will be useful for some.

Virtual Machines (VMs) are software abstractions, a way to fool operating systems and their applications into thinking that they have access to a real (i.e. physical) server when, in fact, they have access to only a virtualized portion of one. Each VM then has its own independent OS and applications, and is not even aware of any other VMs that may be running on the same box, other than through the usual network interactions that systems have (and certain other communication mechanisms that have evolved over time). Thus, the operating systems and applications from VMs are isolated from each other in (almost) the same manner as if they were running on separate physical servers. They’re created by a virtual machine monitor (VMM), often called a “hypervisor,” that sits on top of the hardware, where it creates and manages one or more VMs sitting on top of the hypervisor, and interfaces with the underlying hardware. (The hypervisor thereby provides many of the services provided by an operating system and in the case of, for example, KVM that operating system can even be a general purpose OS like Linux. For the purposes of this discussion, we're only considering native virtualization as opposed to host-based virtualization such as provided by VirtualBox which isn't really relevant in the server space.)

The result is multiple independent operating system instances running on a single physical server, each of which communicates with the rest of the world, including other instances on the same physical server, through a hypervisor. Historically, the reason server virtualization was so interesting was that it enabled server consolidation--the amalgamation of several underutilized physical servers into one virtualized one while keeping workloads isolated from each other. This last point was important because Windows workloads in particular often couldn't be just crammed together into a single operating system instance because of DLL hell, version dependencies, and other issues. The VM approach was also just a good fit for enterprises which often maintained lots of different operating systems and versions thereof. VMs let them largely keep on doing things the way they were used to--just with virtual servers instead of physical ones.

Over time, server virtualization also introduced features such as live migration, which allowed moving running instances from machine to machine, as well as a variety of other features going beyond consolidation. These features too generally reflected enterprise needs and stateful "system of record" workloads.

Where did containers come from?

Containers build on the basic *nix process model, which provides the foundation for separation. Although a process is not truly an independent environment, it does provide basic isolation and consistent interfaces. For example, each process has its own identity and security attributes, address space, copies of registers, and independent references to common system resources. These various features standardize communications between processes and help reduce the degree to which wayward processes and applications can affect the system as a whole.
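That per-process isolation can be sketched in a few lines of Python on a *nix system. This is just a minimal illustration of the process model itself, not of any particular container implementation:

```python
import os

# After fork(), parent and child are separate processes: each has its
# own PID and its own copy of the address space, so the change the
# child makes to `counter` never appears in the parent. A pipe lets
# the parent read the PID the child reports for itself.
counter = 0
r, w = os.pipe()

pid = os.fork()
if pid == 0:
    counter = 42                              # child's private copy
    os.write(w, str(os.getpid()).encode())    # report the child's PID
    os._exit(0)

child_pid = int(os.read(r, 32))
os.waitpid(pid, 0)                            # reap the child

assert child_pid == pid          # fork() returned the child's PID...
assert child_pid != os.getpid()  # ...which differs from the parent's
assert counter == 0              # the child's change never leaked back
```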

*nix also builds in some basic resource management at the process level—including priority-based scheduling, augmented by mechanisms like the ulimit shell builtin (backed by the setrlimit() system call), which can cap resources such as CPU time, file descriptors, and locked memory used by a process and its descendants. More recently, Control Groups (cgroups) have significantly extended the resource management built into Linux--providing controls over CPU, memory, disk, and I/O use of the sort often offered through add-on management products in the past.
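The ulimit/setrlimit mechanism can be exercised directly from Python's standard resource module; here's a small sketch on Linux (the choice of 64 file descriptors is arbitrary, purely for illustration):

```python
import resource

# setrlimit() is the system call behind the ulimit shell builtin. Here
# we lower this process's soft cap on open file descriptors; like
# ulimit in a shell, the cap is inherited by any descendants.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# A process may lower its soft limit freely, but never above the hard limit.
new_soft = 64 if hard == resource.RLIM_INFINITY else min(64, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

assert resource.getrlimit(resource.RLIMIT_NOFILE)[0] == new_soft
```

cgroups extend this per-process idea to whole groups of processes, with controllers for CPU shares, memory limits, and block I/O configured through the filesystem (under /sys/fs/cgroup) rather than per-process system calls.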

The first example of isolating resource groups from each other probably dates to 1999 when the FreeBSD jail(2) system call reused the chroot implementation but blocked off the normal routes of escape from chroot confinement. However, two different implementations garnered a fair bit of attention in the 2000s: one from SWsoft's Virtuozzo (the company is now called Parallels) and another in Sun's Solaris. The Solaris 10 implementation is probably what most popularized the "containers" term, which was their marketing name for OS virtualization. (Their technical docs used "zones" for the same thing.) IBM also introduced containers in AIX, which were unique in that they allowed moving running containers between systems.

Despite a period of interest, though, containers never had a broad-based impact. SWsoft came with a history of success in the hosting space, where its flavor of containers (Virtual Private Servers) gained a fair degree of traction. That's because most hosting providers are highly cost sensitive and, as we'll see, containers enable very high density relative to hypervisor-style virtualization (among other differences). However, the push behind containers never moved them into new markets to any appreciable degree. In part, this was because of a poor technical match with enterprise requirements. It was probably as much because of an industry tendency to standardize on particular approaches--and that ended up being VMware for server consolidation during the 2000s.

What are containers?

Before getting into what's happening today, let's talk about what containers are from a technical perspective. I've tried to make this description relatively generic as opposed to getting into specific implementations such as OpenShift's.

Like partitions, a container presents the appearance of being a separate and independent OS image—a full system, really. But, like the workload groups that containers extend, there’s only one actual copy of an operating system running on a physical server. Are containers lightweight partitions or reinforced workload groups? That’s really a matter of definition and interpretation, because they have characteristics of each. It may help to think of them as “enhanced resource partitions” that effectively bridge the two categories.

Containers virtualize an OS; the applications running in each container believe that they have full, unshared access to their very own copy of that OS. This is analogous to what VMs do when they virtualize at a lower level, the hardware. In the case of containers, it’s the OS that does the virtualization and maintains the illusion.
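On Linux, the kernel namespaces that underpin this illusion are visible to any process under /proc. A quick unprivileged sketch (Linux-only; no container software assumed):

```python
import os

# Each process belongs to one namespace of each type (pid, mnt, uts,
# ipc, and so on). A container is essentially a process placed into
# fresh namespaces, so these identifiers are exactly what would differ
# between a containerized process and its host.
namespaces = {
    name: os.readlink(f'/proc/self/ns/{name}')
    for name in os.listdir('/proc/self/ns')
}

for name, ident in sorted(namespaces.items()):
    print(f'{name:10s} {ident}')  # e.g. "pid        pid:[4026531836]"

assert 'pid' in namespaces  # every process lives in some pid namespace
```

Two ordinary processes on the same host will show identical namespace IDs; a process inside a container will show different ones for the namespaces the container runtime unshared.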

Containers can be very low-overhead. Because they run atop a single copy of the operating system, they consume very few system resources such as memory and CPU cycles. In particular, they require far fewer resources than workload management approaches that require a full OS copy for each isolated instance. 

Containers tend to have lower management overhead, given that there’s but a single OS to be patched and kept current with security and bug fixes. Once a set of patches is applied and the system restarted, all containers automatically and immediately benefit. With other forms of partitioning such as hypervisor-based virtualization, each OS instance needs to be patched and updated separately, just as they would if they were on independent, physical servers. This is a critical benefit in hosting environments but it has often been seen as a negative in much more heterogeneous enterprise environments.

What containers don't do is provide much if any additional fault isolation for problems arising outside the process or group of processes being contained. If the operating system or underlying hardware goes, so go its containers—that is, every container running on the system. However, it's worth noting that, over the past decade, an enormous amount of work has gone into hardening the Linux kernel and its various subsystems. Furthermore, SELinux can be used to provide additional security isolation between processes. 

So, here are a few statements about containers that are (generally) true:

  • The containers on a single physical server (or virtual machine) run on a single OS kernel. The degree to which the contents of a given container can be customized is somewhat implementation dependent.
  • As a result, many patches will apply across the containers associated with an OS instance.
  • Management of resource allocation between containers is fast and low overhead because it's done by a single kernel managing its own process threads. 
  • Similarly the creation (and destruction) of containers is faster and lower overhead than booting a virtual machine.
  • Today, running containers cannot generally be moved from one running system to another.

Why today's interest?

In a nutshell: Because cloud computing--Platform-as-a-Service (PaaS) in particular--more closely resembles hosting providers than traditional enterprise IT.

A lot has to do with the nature of cloud workloads. I explored the differences between traditional "systems of record" and cloud-style "systems of engagement" in a new whitepaper. However, in brief, cloud-style workloads tend towards scale-out, stateless, and loosely coupled. They also tend to run on more homogeneous environments (alongside existing applications under hybrid cloud management) and use languages (Java, Python, Ruby, etc.) that are largely abstracted from the underlying operating system.

Some of the implications of these characteristics are that you don't generally need to protect the state of individual instances (using clustering or live migration). Nor do you typically have, or want to have, a highly disparate set of underlying OS images, given that makes management harder. You also tend to have a large number of smaller and shorter-lived application instances. These are all a good match for containers.

PaaS amplifies all this in that it explicitly abstracts away the underlying infrastructure and enables the rapid creation and deployment of applications with auto-scaling. This is a great match for containers, both because of the high densities and because of the rapid resource reallocation they enable (and, indeed, require).

In short, it's no coincidence that containers have re-entered the conversation so strongly. They're a match for cloud broadly and PaaS in particular. This isn't to say that they're going to replace virtual machines. I'd argue rather that they're a great complement that needn't (and probably shouldn't) try to replicate the VM capabilities that were designed with different use cases in mind. 

Tuesday, September 17, 2013

Links for 09-17-2013

Tuesday, September 10, 2013

Links for 09-10-2013

Hanging out about open clouds

James Maguire of Datamation moderated a Google+ Hangout panel yesterday with me, Devin Carlen (Nebula), Marten Mickos (Eucalyptus Systems), and Peder Ulander (Citrix) on the topic of open source clouds. Good discussion mixed in with just enough competitive jousting to keep things interesting.

Monday, September 09, 2013

Wrapping up VMworld 2013 in SF

Belated I know. But last week was spent balancing some high priority work with R&R and I wanted to give my first reactions a little time to settle anyway. At least that's my excuse and I'm sticking with it. So, without further ado, here were some things that either struck me or caught my eye at my seventh VMworld.

Disclaimer: I am Cloud Evangelist for Red Hat. Read what I write in that context. However, also please note that these opinions are mine alone and do not necessarily represent the position of my employer. I also note that I usually avoid getting into a lot of competitor specifics when I write. However, VMworld is a major event and I thought some may find my observations useful.


It was sober.

By which I mean it was more about product/initiative rationalization, integration, and extension than it was about bold claims and major new directions. Even that which was arguably most new, VMware NSX (network virtualization), was a largely anticipated consequence of VMware's Nicira acquisition. (Furthermore, based on a number of conversations I had at the show, it isn't clear to me that the typical attendee understands at this point what software-defined-networking means for them either technologically or organizationally. To be clear, I firmly believe that software-defined-everything is an important trend; I just think it will take time to develop.)

Sober isn't necessarily bad, by the way. In fact, the industry analyst who described this year's event to me that way meant it as something of a compliment. The opposite of hype-to-excess. But it made for a general sense of VMware securing its core virtualization business and framing every big trend in the context of that existing business even though cloud computing, for example, is hardly just virtualization with a bit of management tacked on. Note, for example, how deliberately the VMware NSX moniker evokes VMware ESX.

Exploiting existing beachheads can be a good business and technology strategy. It's also a strategy that can fail miserably. (See, for example, Microsoft's determination to bridge from its Windows dominance to mobile.)

Speaking of which. "Defy Convention." Really? 

I get that someone in VMware corporate marketing or some executive thought it would be a good idea to theme the show around the polar opposite of stodgy old VMware. The problem is that this brainstorm apparently began and ended with a slogan and some grunge graphics. All the actual content of the conference was about incrementally extending virtualization. A comforting message for the thousands of virtualization admins in attendance? Sure. A message about upending the status quo? Not so much.

(OK. VMware NSX--or at least what it represents--is threatening to network admins and to Cisco. But anything that whiffed too strongly of software-defined storage kept a pretty low profile, given that VMware parent EMC is still deeply in the business of selling expensive specialized storage hardware.)

The non-hybrid hybrid.

Just one example of not-defying-convention is VMware's approach to hybrid cloud. You can have any kind of hybrid cloud you want, private or public, so long as it's based on VMware technology. It has a certain logic from VMware's perspective. They've had no real success competing with non-VMware-based public cloud providers and are unlikely to change that using their proprietary software. What they can arguably do is provide a simple bridge for their installed base to a tightly circumscribed set of public cloud resources. (That these public cloud resources will be a combination of VMware data centers and those partners would seem to be a channel conflict issue but that's another topic.)

A useful incremental feature for a subset of their installed base perhaps. But hardly hybrid in any true meaning of the word. Chalk up another one for convention.

Pivotal MIA.

Finally, I note that Pivotal, the Paul Maritz-led EMC initiative that includes CloudFoundry and other VMware pieces involved with developing cloud applications, was largely absent from the show. It had a booth--plastered with various vague slogans around "making business faster" and such--and put out a short press release with VMware about expanding their "strategic partnership" but wasn't otherwise very visible.

I get the deliberate arms-length relationship that helps VMware focus on its core business while Pivotal works on cool new stuff. But one consequence is that VMware looks to be refocused on infrastructure and its historical comfort zone (and the comfort zone of many in attendance at VMworld) rather than the broader possibilities opened up by a complete cloud portfolio.

And that's about convention, not defying it.


Video: SolidFire's CEO David Wright talks flash storage and OpenStack

At VMworld in San Francisco at the end of August, SolidFire offered me the opportunity to sit down with their CEO, David Wright. SolidFire makes what it describes as "scale-out high performance storage systems built to deliver guaranteed performance for public and private clouds." We talked about why flash-based storage products--including SolidFire's--are coming on so strong. (There were a number of flash examples at VMworld in spite of this being a general category of product that didn't really go much of anywhere for a number of years.)

We also discussed SolidFire's integration with OpenStack. They joined the Red Hat OpenStack Partner Network back in June.

Flash-based storage is interesting in the context of cloud computing partly because overall performance is typically much higher than rotating media (which, while continuing to increase in capacity, has largely plateaued in performance). However, as the SolidFire graphic below shows, flash can also help deliver predictable performance in concert with appropriate software hooks and management.

(At Red Hat, we've also been working on both quality-of-service and security for multi-tenant environments. We deliver Linux features such as cgroups and SELinux through Red Hat Enterprise Linux, which allows Red Hat Enterprise Virtualization and Red Hat Enterprise Linux OpenStack Platform to take advantage of them. Higher up the stack, Red Hat CloudForms hybrid cloud management provides policy-based quotas and other controls that can span a heterogeneous infrastructure. Those are just a few examples.)