Thursday, June 16, 2016

The end of cattle vs. pets


Metaphors and models have finite lifespans. 

They usually outlive their usefulness for one of two reasons.

The first is that metaphors and models simplify and abstract a messy real world down to especially relevant or important points. Over time, these simplifications can come to be seen as too simple or not adequately capturing essential aspects of reality. (This seems to be what’s going on with the increasing pushback on “bimodal IT.” But that’s a topic for another day.)

The other reason is that the world changes in such a way that it drifts away from the one that was modeled.

Or it can be a bit of both. That’s the case with the pets and cattle analogy as it’s been applied to virtualized enterprise infrastructure and private clouds. 

The “pets vs. cattle” metaphor is usually attributed to Bill Baker, then of Microsoft. The idea is that traditional workloads are pets. If a pet gets sick, you take it to the vet and try to make it better. New-style, cloud-native workloads, on the other hand, are cattle. If the cow gets sick, well, you get a new cow.

Pets and cattle roughly corresponded to the Systems of Record and Systems of Engagement taxonomy proposed by consultant Geoffrey Moore (of Crossing the Chasm fame). The former were stateful, big, long-lived, scale-up, and managed/maintained at the individual machine level. The latter were assumed to be stateless, small, transitory, scale-out, and managed at the level of the entire application (with individual VM instances destroyed and recreated in the event of a problem).

As an initial pass at distinguishing between traditional transactional apps and those designed along more cloud-native lines, the metaphor isn’t a bad one. I've argued that “ants” is a better fit than “cattle” because it captures the idea that individual service instances are not only disposable but they work together cooperatively to perform tasks. However, the overall concept of long-running mutable instances vs. short-lived disposable ones would seem to capture an essential distinction.

It still does, but as we as an industry continue to evolve DevOps practices and service-oriented architectural patterns for software-defined infrastructure and orchestrated pools of containers, the metaphor is breaking down for several reasons. 

State matters

Many of the components/instances of a cloud-native application should be designed so that they are stateless. That is, they should use ephemeral storage—which is to say storage and data that only sticks around for the life of the instance itself. However, no one’s claiming that the data doesn’t need to live somewhere. For example, in twelve factor app parlance, there’s the concept of a backing service which "is any service the app consumes over the network as part of its normal operation. Examples include datastores (such as MySQL or CouchDB), messaging/queueing systems (such as RabbitMQ or Beanstalkd), SMTP services for outbound email (such as Postfix), and caching systems (such as Memcached)."
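To make the stateless-instance idea concrete, here is a minimal sketch. Everything in it is illustrative: `BackingStore`, `InMemoryStore`, and `VisitCounter` are hypothetical names, and the in-memory store merely stands in for a real backing service reached over the network. The point is only that the instance holds a reference to the store and nothing else, so it can be thrown away and replaced without losing data.

```python
from typing import Dict, Protocol


class BackingStore(Protocol):
    """Hypothetical interface for a backing service such as a datastore."""

    def get(self, key: str) -> int: ...

    def put(self, key: str, value: int) -> None: ...


class InMemoryStore:
    """Stand-in for a networked datastore; it lives outside any app instance."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str) -> int:
        # Unknown keys count as zero visits.
        return self._data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


class VisitCounter:
    """A disposable app instance: it keeps no data of its own."""

    def __init__(self, store: BackingStore) -> None:
        self._store = store

    def record_visit(self, page: str) -> int:
        # Read, increment, and write back through the backing service.
        count = self._store.get(page) + 1
        self._store.put(page, count)
        return count


store = InMemoryStore()

first = VisitCounter(store)
first.record_visit("/home")
first.record_visit("/home")
del first  # the instance is destroyed...

second = VisitCounter(store)  # ...and a replacement picks up where it left off
total = second.record_visit("/home")
```

A real deployment would also have to deal with concurrent writers and network failures, which this sketch deliberately ignores.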

As we move to containerized infrastructures, the stateful vs. stateless dichotomy becomes particularly important because containers are explicitly designed to be immutable. As Keith Tenzer describes in this post about OpenShift v3 persistent storage: “Docker images are immutable and it is not possible to simply store persistent data within containers. When applications write to the Docker union file system, that data is lost as soon as the container is stopped.”

However, it’s still possible to associate persistent data with containers so that an entire application can be containerized. Keith goes on to note that:

OpenShift v3 supports using persistent storage through Kubernetes storage plugins. Red Hat has contributed plugins for NFS, ISCSI, Ceph RBD and GlusterFS to Kubernetes. OpenShift v3 supports NFS, ISCSI, Ceph RBD or GlusterFS for persistent storage. Kubernetes deploys Docker containers within a pod and as such, is responsible for storage configuration. Details about the implementation of persistent storage in Kubernetes can be found here.

Kubernetes allows you to create a pool of persistent volumes. Each persistent volume is mapped to a external storage file system. When persistent storage is requested from a pod, Kubernetes will claim a persistent volume from the pool of available volumes. The Kubernetes scheduler decides where to deploy the pod. External storage is mounted on that node and presented to all containers within pod. If persistent storage is no longer needed, it can be reclaimed and made available to other pods.
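The claim-and-reclaim cycle described in that passage can be modeled roughly as follows. This is a toy model under stated assumptions, not the Kubernetes API: `PersistentVolume`, `VolumePool`, `claim`, and `release` are hypothetical names, and the "smallest volume that fits" selection rule is my own simplification rather than a description of how Kubernetes actually matches claims to volumes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersistentVolume:
    """Toy stand-in for a volume backed by external storage."""

    name: str
    capacity_gib: int
    claimed_by: Optional[str] = None  # name of the pod holding the volume, if any


@dataclass
class VolumePool:
    """A pool of volumes that pods can claim and later give back."""

    volumes: List[PersistentVolume] = field(default_factory=list)

    def claim(self, pod: str, min_gib: int) -> PersistentVolume:
        # Assumption: choose the smallest unclaimed volume that is big enough.
        candidates = [
            v for v in self.volumes
            if v.claimed_by is None and v.capacity_gib >= min_gib
        ]
        if not candidates:
            raise LookupError(f"no free volume of at least {min_gib} GiB")
        volume = min(candidates, key=lambda v: v.capacity_gib)
        volume.claimed_by = pod
        return volume

    def release(self, volume: PersistentVolume) -> None:
        # Returning the volume makes it available to other pods again.
        volume.claimed_by = None


pool = VolumePool([
    PersistentVolume("pv-small", capacity_gib=5),
    PersistentVolume("pv-large", capacity_gib=50),
])

db_volume = pool.claim("db-pod", min_gib=10)       # only pv-large is big enough
pool.release(db_volume)                            # storage no longer needed
cache_volume = pool.claim("cache-pod", min_gib=1)  # smallest fit is pv-small
```

What matters for the argument is that the volume outlives any one pod: the pod is still disposable, but the data it was using is not.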

Most cloud-native applications require there to be persistent storage somewhere. While one can assume that it’s provided through a service running somewhere else, a platform supporting the development of complete cloud applications needs to provide persistence mechanisms within that platform.

Virtualization and cloud/software-defined infrastructure convergence

Software-defined infrastructure technologies also made initial simplifying assumptions in other areas, such as networking architectures and the maintenance of running instances. Some of this was a sincere, if sometimes naive, desire to dump legacy encumbrances. However, it was also about getting an MVP out the door in a rapidly changing world. 

We’re seeing today the reintroduction of virtualization features required for “enterprise use cases” into projects such as OpenStack. Thus, Neutron (OpenStack networking) isn’t just about flat networking architectures, and current versions of OpenStack support live migration of instances, whether using shared storage or block-based storage associated with a single image. Many of the same technologies, such as the KVM hypervisor in Linux, underpin both enterprise virtualization and cloud, which simplifies the bridging of the two worlds. (Of course, it’s probably increased the complexity of OpenStack relative to what it would look like in a purist cloud-only world. Such a purist OpenStack would also likely not be very useful.)

The continued evolution of the new application platform

Perhaps most of all though, the metaphor is breaking down because the idea that there are two canonical application architectures seems increasingly simplistic. 

I’ve already covered how persistent storage is an important component of most modern cloud-native applications. 

It’s also the case that many new applications will indeed be decomposed into lightweight single-function microservices that expose public APIs. However, getting to that point will be an evolution. Martin Fowler, who helped popularize the microservices term, has even argued for a “Monolith First” strategy in some cases, including for projects that are big enough to justify a shift to microservices over time. As a result, blanket statements about horizontal app scalability and disposable services don’t apply universally—even for apps that are greenfield and reasonably considered cloud-native.

Applications also have substantially different patterns that relate to how instances are clustered together using technologies such as Kubernetes (which OpenShift uses as its orchestration layer). Some types of applications are batch oriented in the vein of traditional high performance computing/grid while others are composed from multiple layers of services communicating through APIs. There’s also considerable variety not only in the absolute scale of application components being scheduled and orchestrated, but also in the mix of components (large or small, frequently or infrequently scheduled, etc.) and in requirements related to quality of service, latency sensitivity, and so forth.

In short, while there are certain patterns that we tend to associate with cloud-native applications, there’s also much variety and even divergence in key aspects. Furthermore, it turns out that some traditional enterprise application characteristics such as persistent state and tightly-coupled components continue to play a role even for greenfield cloud apps. 

It’s not cattle and pets out there. It’s a whole menagerie!
