Friday, February 20, 2015

Podcast: Configuration Management with Red Hat's Mark Lamourine

We take things up a level from the prior podcasts about container management in this series to discuss the goals of configuration management, how things change (and don't) with containers, the meaning of state, promise theory, and containerized operating systems such as Project Atomic.

Listen to MP3 (0:16:52)
Listen to OGG (0:16:52)


Gordon Haff:  Hi, everyone. This is Gordon Haff with another episode of the Cloudy Chat podcast, here once again with my colleague, Mark Lamourine.
For today, we're going to take things up a level and talk about what configuration management is at a conceptual level, and some of the ways that configuration management is changing in a cloud and containerized world.
One of the reasons this is an interesting topic today, and it really is an interesting topic--I was at Configuration Management Camp in Ghent, Belgium a couple of weeks ago. The event sold out. It was absolutely packed.
All the different configuration management systems were there, and there's an enormous amount of interest in this space. The reason there is so much interest is that classic configuration management is changing in pretty fundamental ways.
Mark, you've got a lot of experience as a system admin, and you're very familiar with classic or conventional configuration management systems. Maybe a good way to start would be for you to talk, from your perspective as a sysadmin, about what configuration management classically is or was.
Mark Lamourine:  It's going to stay. I don't think that's changing; it's finding new places to be applicable. Most people, when they talk about configuration management, talk about managing the configuration of individual hosts as part of a bigger system.
It allows you to create either a partial or a complete enterprise specification for how all of your machines should be configured, and then to use the configuration management system to realize that specification.
You make it so that each machine, as it comes up, joins your configuration management system. Then the processes run on the box to make it fit, to make it configured like your definition, your specification of what that machine should be.
One of the elements of this, usually a controversial one, is whether there's an agent running on each host that listens for changes; there are discussions about whether this is a good or a bad thing and what to do about it.
But the other big thing is that there is some global state definition for what the larger system, the group of hosts should look like and how it should behave.
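That convergence process--compare each host against the specification and apply only what differs--can be sketched in a few lines of Python. This is a toy illustration of the idea, not any particular tool's implementation; the key names and values are hypothetical.

```python
# Toy convergence engine: compare observed host state to a desired
# specification and apply only the changes needed to close the gap.
# (Illustrative sketch; real CM tools add ordering, templating, reporting.)

def converge(observed: dict, desired: dict) -> dict:
    """Return the actions needed to bring `observed` in line with `desired`."""
    actions = {}
    for key, want in desired.items():
        have = observed.get(key)
        if have != want:
            actions[key] = {"from": have, "to": want}
            observed[key] = want  # "apply" the change in this toy model
    return actions

# Hypothetical specification and observed host state.
spec = {"hostname": "web01", "ntp": "running", "resolv.conf": "10.0.0.2"}
host = {"hostname": "localhost", "ntp": "running"}

changes = converge(host, spec)
print(changes)            # only the drifted or missing keys appear
print(host == spec)       # → True: the host now matches the spec
```

Running the same loop periodically is what catches drift: any value that wanders away from the specification shows up as a difference on the next pass and gets pulled back.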
Gordon:  This gets into a lot of the, again, classic thinking about systems in general, certainly in a pre‑cloud world.
This really applied, whether we're talking physical servers or virtualized servers: there is some correct state that everything is not only driven toward to start with, but constantly monitored to keep in tune with that correct truth, if you will.
Mark:  It started out when I was a young cub sysadmin. We had a set of manual procedures that started out as things in our heads: set the network, set resolv.conf, set the host name, make sure time services were running.
When you only had a short list of these things to do before handing the machine over, it wasn't really a big deal. You'd install the machine, spend 15 minutes making it fit into your network, and then hand it off to some developer or user.
Over time, we realized that we were doing an awful lot of this and hiring lots of people to do it, so we needed to write scripts. Eventually, people started writing configuration management systems, starting with Mark Burgess and CFEngine. That was the origin of it.
There were a number of them during that time. Then CFEngine and Puppet became the de facto standards for a while, though, as everyone knows, that's changing a lot now. The idea was that we were doing these tasks manually, and when we started automating them, we were automating them in a custom way.
These people recognized the patterns and said, "We can do this. There's a pattern here that we can automate, that we can take one step higher." That led to these various systems, which would make your machines work a certain way. The specifications we had, the settings we had, were fairly static. That made a lot of sense.
Gordon:  One of the things that's probably worth mentioning, and this gets into the "pets versus cattle" model of the state of systems, is that because these systems were pets, you didn't shut one down and stand it back up as a clean copy of the same thing.
Really, you tried to keep the running system running properly. One of the traditional jobs of configuration management was to take care of things like drift: as these systems changed, bringing them back to the correct state of truth.
Mark:  In some sense, the pets-versus-cattle way of thinking was enabled by the invention of configuration management systems. People look at it the other way now, but when things were pets, it was because they had to be.
The rate of change was slow enough that drift was less important than just not having to send someone to spend an hour bringing a new machine online.
The fact that you could use these things to prevent drift or to drive change over large groups of systems, I think that was a side effect and something that people realized after they started using the tools to stop doing manual labor.
The cattle-versus-pets distinction is one that was enabled when, all of a sudden, you realized...We used to measure the difficulty of working in an environment by the number of machines per administrator.
When I was first starting, 10 to 1 or 15 to 1 was a good ratio because of the amount of manual labor that went into it.
Then, with the start of CM systems, 100 to 1 or 200 to 1 in data center environments was a good ratio. Now, you don't even look at that anymore. Why would you? You've got thousands of VMs.
With a system like OpenStack or Amazon, you don't even look at the ratio of hosts to sysadmins anymore. It's become irrelevant, and it's become irrelevant because these systems made cattle versus pets possible.
Gordon:  You mentioned Mark Burgess, and you mentioned this idea of state. Let's talk a little bit more about that. How do you think about state as we move to these containerized, cloud-type systems?
Mark:  I'm a little conflicted. We're trying to figure out how this older idea, which made a lot of sense when the machines changed very slowly, or relatively slowly, fits when the machines are changing constantly.
In the case of a small enterprise, it might be tens or hundreds of machines started and stopped per day; for something like Amazon or OpenStack, it's thousands, maybe even thousands per minute. I don't know.
I've seen numbers from Google where they have thousands of machine starts and stops per minute over the entire world. Maybe even that's the wrong scale. The original idea was something where you had something that was essentially stable.
Your machines didn't change; when they changed, it was because you changed them. And you had users, who were these other people.
The idea of state made a lot of sense in that context. The idea of a state is static. That's the root of the word. Life has become much more dynamic. We expect change. We expect drift. We expect that our definition of what is correct changes. It changes faster than we can apply it to the machines we have.
We've gone from this idea where I could define a state, the machines would settle on that state using the configuration management system, and then we'd come along later and tweak the state.
We'd update some packages, change some specification, or add or remove a user--to the point where you almost never expect it to settle; you never expect to reach the state that you've defined as your correct state.
You change things gradually, aiming for eventual consistency. Things will eventually get there, but we're changing the state so fast now that, if you have this single central state, you're never going to achieve consistency across the entire system before you change the state again.
In that sense, I start wondering whether this state really makes sense.
Gordon:  What replaces it?
Mark:  This is where Mark Burgess, some of his work over the last couple of decades, is starting to come into its own. He's a proponent of something called promise theory.
Whether or not the theory holds, there is a kernel of an idea there that's really, really important, which is this: he says that reaching that state is impossible. His thinking is that things become so complex at so many different scales that reaching that state, or sometimes even defining it, doesn't make sense.
He wants to flip the state definition on its side, or upside down. He wants to say, "Let's treat all these things locally. Let's figure out what each little tiny piece is."
The old way would be to say, "I eventually reach some state." What he's saying is that you teach each new piece some promises: I promise I will be on the net. I promise that I will serve web pages. I promise that I will take files from a certain location.
You define the promises well down the scale. You try to define a system on the basis that, if all these pieces fulfill their promises, then some desired behavior will come about at a much higher scale. I'm not yet convinced that this is an engineering model.
This is one of the things that I've talked to you about it and I've talked to a couple of other people about it, that this is a great idea, I like this. What I don't know is how to do engineering with it yet.
We'll see whether or not there are people who are ignoring the state using...Some of the newer configuration management systems have state built in, like Salt does. Ansible really doesn't; Ansible is really more about applying changes to something than reaching a certain state.
There is fuzziness in all of this. People are starting to recognize that this is a problem, and people are starting to find ways to define the behaviors of the system without necessarily defining the low-level states one piece at a time.
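The local-promise idea can be sketched in Python. This is my own toy encoding of the concept, not Mark Burgess's actual formalism or any CFEngine code; the component names and checks are hypothetical stand-ins for real probes.

```python
# Toy promise-theory sketch: each component makes promises it can verify
# locally, with no global state consulted. System-level behavior is then
# inferred from all the local promises being kept.
# (Illustrative only; not Burgess's formalism or CFEngine's implementation.)

class Component:
    def __init__(self, name, promises):
        self.name = name
        # promises: mapping of description -> zero-argument check function
        self.promises = promises

    def kept(self):
        """Evaluate every promise locally and report which are kept."""
        return {desc: check() for desc, check in self.promises.items()}

# Hypothetical web-server component; lambdas stand in for real probes
# (pinging an interface, fetching a page, checking a mount).
web = Component("web01", {
    "I will be on the net": lambda: True,
    "I will serve web pages": lambda: True,
})

report = web.kept()
healthy = all(report.values())
print(report, healthy)
```

The point of the inversion is visible even in the toy: nothing here references a central specification; "health" is just the conjunction of promises each piece can check for itself.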
Gordon:  That's probably a pretty good segue to bring this particular podcast home. As I mentioned, I was at Config Management Camp a couple of weeks ago, and there was a huge amount of interest in Chef, in Puppet, in Salt, in Ansible, in Foreman, in CFEngine.
Maybe we could close this out with some comments about some of the different approaches being taken here and some of your thoughts on these different tools.
Mark:  The first thing I want to say with respect to that is that while I describe this fast‑moving dynamic environment, there are lots of companies that are still and will continue to run in a more conventional environment for a long time.
I'm not saying that these configuration management systems are, in any sense, obsolete. They still have a place, because the environments that they were designed for still exist.
That said, there are several different distinctions that seem to come up. One is the push-versus-pull model. You get systems like Puppet, which take a strong push model; you get something like CFEngine, which uses a strong pull model.
In both cases, they have had to create feedback mechanisms that really incorporate the other model, which leads me to believe that push versus pull is probably a straw man: there probably have to be feedback loops in both directions regardless of which emphasis you take.
Then you get the agent-versus-agentless discussion. There are people who would say, "Adding this new thing that runs on each host and listens for changes is overhead, which isn't really necessary." The strongest proponent of an agentless system that I've heard of is Ansible.
Ansible uses SSH, which is, in some senses, its agent. The SSH login triggers some Ansible behavior on the host. Again, I think this is a muddy distinction.
But it's fair to say that no additional agent runs in Ansible's case. Ansible, it seems to me, defines more the means of creating the state while ignoring the state engine itself. I'm probably going to get hate mail and corrections for that. Corrections are welcome; hate mail, not so much.
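The agentless, push-over-SSH shape can be sketched like this. It's a hypothetical illustration, not Ansible's actual transport or module system, and to keep it self-contained it builds the ssh command lines rather than executing them; the host names and task are made up.

```python
# Sketch of agentless "push" management: for each host in an inventory,
# build an ssh invocation that would run a change remotely. No daemon is
# installed on the managed host; ssh itself plays the transport role an
# agent would otherwise fill.
# (Hypothetical illustration; not Ansible's real implementation.)
import shlex

def push_command(host: str, remote_cmd: str, user: str = "admin") -> list:
    """Return the argv for pushing one change to one host over SSH."""
    return ["ssh", f"{user}@{host}", remote_cmd]

inventory = ["web01.example.com", "web02.example.com"]   # made-up hosts
task = "systemctl restart chronyd"                        # made-up task

for host in inventory:
    argv = push_command(host, task)
    print(shlex.join(argv))  # shown instead of executed in this sketch
```

Note the asymmetry this buys you: the controller needs credentials and an inventory, but the managed host needs nothing beyond a working sshd--which is the crux of the agentless argument.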
These are the distinctions that are there now. There are still people looking at the cloud environment and at these configuration management systems, trying to figure out how to use them. They're still trying to apply them in the same way, and I'm a little suspicious of that as well.
I'm interested in seeing how configuration management systems get used in an Atomic environment [Red Hat Enterprise Linux Atomic Host], a CoreOS environment, or a minimal operating system environment, where the whole point is to eliminate the need for this configuration management and to move the configuration out to the containers.
Put a container here, put another container there, make the containers work together--that's what the configuration management system would have done. Now we've got orchestration systems doing that, for the most part.
I'm interested in seeing how this evolves: how these conventional system administration tools fit, how people end up using them, and whether or not they turn out to be more or less useful than they would be in a conventional environment.
Gordon:  If someone wants to learn some more about this stuff, what do you recommend?
Mark:  First is to look at the various configuration management systems themselves and largely avoid the hype. There are people who are advocates rather than pundits; I'm skeptical of people who will say, "This is the right way. This is the best way." If you want to learn about promise theory, Mark Burgess's books certainly cover that. Mark's the only person I know who is publishing on it in an academic sense.

This is one of the things I'm personally interested in: treating system administration as something worthy of academic study. Mark is the only person I know who's doing that and publishing.
