Tuesday, August 25, 2020

Red Hat's William Henry on grid, containers, and orchestration

William Henry is a senior distinguished engineer at Red Hat who has been involved with open source since Slackware. Part of this involvement included the development of open source grid software such as Condor.

In this podcast, we discuss:

  • How supercomputing has evolved
  • What grid software was designed to do
  • How container orchestration and grid relate, then and now
  • Why containers for HPC

[Transcript]

Gordon Haff:  This is Gordon Haff, technology evangelist with Red Hat, for another episode of the "Innovate @Open" podcast. Today, we're going to talk about containers, grid, high performance computing, container orchestration and more.

I have with me my colleague, William Henry, who has a whole lot of experience in this area.

William, why don't you introduce yourself?

William Henry:  I am William Henry. I'm a distinguished engineer at Red Hat. I've had multiple roles at Red Hat over my 12 years here. One of the exciting times was when Red Hat was transitioning from essentially a Linux platform company to an expanded role within enterprises, adding value on top of Linux with other open source projects in the areas of messaging, real-time Linux, and grid.

Gordon:  Let me start this off with a little bit of context setting, because the term grid has been used throughout computing history for a number of different things, including, for instance, some of the early peer-to-peer computing.

To be clear, when I'm talking and we're talking about grid here, we're really referring to it in the sense of high performance computing, resource management, and scheduling.

William, can you take us through some of the history of this technology?

William:  From my perspective, it was a breakaway from the whole large, supercomputer-based approach, as very large organizations, whether in the defense field or the medical field or wherever, required huge amounts of computing.

As they looked at the cost of supercomputing, noticed the pricing of cheaper distributed alternatives, and saw the growth of distributed computing along with networking technologies, there was a shift in thinking: there are certain problems we can solve by putting a lot of different processors to work across a network.

That proved to be a very effective approach, particularly in areas like seismic analysis in the oil business, scientific analysis in the medical field, weather-related work, and other military and research areas. Then we saw it move into the movie animation and CGI world as well.

Grid was trying to solve a number of different problems. One was that distributed supercomputing problem. Then there was high-throughput computing and on-demand computing, which is more similar to what we're seeing now in the container world. There was also data-intensive computing, and collaborative computing with scientific teams.

That's where I see it coming from. It's an evolution out of supercomputing toward more cost-effective, larger-scale initiatives, and then a move from the scientific world into more commercial environments.

Gordon:  If you look at the lists of the largest supercomputers over the years, you see a shift from a world dominated by vector supercomputers from the likes of Cray to networked systems. Initially, these often used proprietary interconnects, or at least there was a real mass of different interconnects.

They were often large RISC systems. Increasingly, though, while you still do see RISC systems there, these are "commodity" x86 systems connected by a couple of standard interconnects, specifically Ethernet and InfiniBand.

William:  It was funny. I remember a visit, about 10 years ago, down to New Mexico to see a DreamWorks rack system they were using for animation, sitting in the same massive ex-Intel prefab room, in the corner beside the New Mexico supercomputer. The DreamWorks one was much larger.

Gordon:  Fast forward to, I guess, maybe five years or so ago, and containers are coming in. Containers, of course, had been around as a technology for a long time, but they were really becoming a very important technology for computing broadly.

Docker at the time was useful for developers on their laptops and small systems, but if you were going to have applications made up of hundreds or thousands of containers, there needed to be some job scheduling system, resource management system, or what have you. As we all know, Kubernetes has become the dominant orchestration technology.

When this whole thing was jumping off, though, it wasn't clear that there was just one layer there, because you had this orchestration layer, as we tended to call it, which was Kubernetes. Then there was also this idea of resource management, which wasn't really quite the same thing. Can you talk a bit about that?

William:  This was one of the things I noticed early on. Although there were a lot of parallels to grid, and those parallels are converging now that we're several years into container technology, originally you could see some differences.

The large grid computing efforts were focused more on having a well-defined cluster of resources that were quite often hardware based. Later on, we saw some virtualization coming into that, but for the most part the model was: I have a large cluster of hardware, and I can add machines to that cluster.
That might be difficult depending on the grid technology being used, but really the goal was to share those resources, at first within one organization and then more broadly across multiple organizations.
You had the idea of giving a certain amount of resources, let's say 20 percent, to this particular research field, and another 10 percent to that field. Of course, originally it would be just one group: I'm running this huge job, so I'm going to take over the supercomputer today and do it.

Then it was like, hold on, how do we divvy it out and provide fair shares between the different groups, while at the same time billing the different organizations as well? It started getting very complicated.

Most of the jobs, when you think of it, were very batch oriented. They were like, "I need to take over the grid today to run this particular large workload of batch computational work, or data-intensive analysis work." Then they evolved even further: "I could set up a workflow of this work."

It was getting very complex: resource management, fair shares, charging costs back to different departments, because these were expensive resources. There was the question of data-intensive versus compute-intensive work, and also how to get workflow into this.

That contrasted with what we saw originally in the container world, which was essentially taking a cluster of hardware, completely abstracting it away into concepts like CPU and memory percentages, and trying to keep the workload up.

It was often about, "I want to be able to run this application, which might be a website. I want to run it forever. By the way, can you scale it up a bit under high demand and scale it back down again?" It was a different kind of workload on a similar concept of a cluster.

Back on the grid, originally, it would be a very homogeneous set of machines. Then different university departments were saying, "I could add our department's computers onto this grid for use." It started getting complex in terms of describing, "This is a particular type of chipset. It's got this operating system running on it."

It also became very complex in terms of what types of workloads you were going to schedule where. In other words, "I have this huge pool of resources, but my workload needs to run on Linux, or it needs to run on Solaris, or it needs this particular type of hardware."

It became very complex in terms of the matchmaking efforts that went into getting things running. This machine has GPUs. This machine is a desktop. Are we adding our desktops to the grid at night?
During the day, the animation artists are working on those machines, but at night we don't want them idle. Are these Windows machines or Linux machines? There was a huge amount of effort going into the crazy amount of matchmaking that the resource manager had to do.
Beyond matchmaking, there was this idea of handing out claims, "I'm going to give you a claim on that particular machine at this sort of time," or concepts like booking machines in advance.
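
For readers more familiar with the container side, the closest present-day Kubernetes analogue to that matchmaking is node labels combined with selectors and affinity rules. Here's a minimal, illustrative sketch; the off-peak-desktop label and image are hypothetical, not taken from any real deployment:

```yaml
# Illustrative sketch: expressing matchmaking-style constraints in Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  name: render-task
spec:
  nodeSelector:
    kubernetes.io/os: linux                  # hard requirement: Linux nodes only
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: example.com/offpeak-desktop # hypothetical label for artists' idle desktops
            operator: In
            values: ["true"]
  containers:
  - name: renderer
    image: registry.example.com/renderer:latest   # placeholder image
```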

It became very complex. Originally, there were very expensive applications doing this, Platform LSF and things like that. Then there were efforts to do it in open source, taking an academic project like Condor from the University of Wisconsin.

They had the Condor project, and Red Hat helped open source that. The more we looked into it, the more we saw, "This is fantastic stuff, hugely complex matchmaking capabilities." It was pretty amazing. We learned a lot from that. Then, of course, you had the container revolution.

Gordon:  Yeah. I love that. There was this huge vision around grid computing around 2000 or so, where people imagined this computing-everywhere kind of thing, as you say, peer-to-peer computing. Intel was very big on this. The Intel CTO at the time suggested peer-to-peer might end up being bigger than the Internet.

However, to come back to our earlier discussion, a lot of this ended up consolidating into job scheduling across a large, fairly fixed cluster of computing resources. Fast forward back to containers.

William:  Probably 2013.

Gordon:  Containers are getting big. They're becoming more widespread. You had Kubernetes coming onto the scene. You had a variety of other open source resource management projects out there, some of which were basically competitors to Kubernetes, others of which were probably closer to next generation HPC resource managers of various sorts.

Yet, today, we seem to have largely decided we're just going to do Kubernetes.

William:  Yeah. I was in the middle of this transition and caught a little bit on the wrong side of it at one point. Let me explain.

Having come from this grid technology background with Condor, working with some very large clusters (we're talking 25,000 to 30,000 nodes or whatever, doing massive amounts of work across multiple different movies in the animation world, for example), container technology starts coming on. We already had our own scheduler.

Remember, when we started this with OpenShift at Red Hat, we had already started using containers, but they weren't the OCI format or the Docker format back then. They were just raw Linux containers used our way; we called them cartridges. Then we had a technology called GearD, which we used to schedule them.

It was all open source, because that's all we do at Red Hat. These were our own open source projects that we were hoping to get some backing for.

Very quickly, we saw the community jump on board with the Docker format at the time. Then Google introduced Kubernetes. We saw it, and we saw similarities between its approach and the GearD approach.

We saw some advantages: it has concepts like pods that seemed to make sense. We decided to jump on that.

Now, that technology was based on the use case I mentioned earlier: how do we get developers building very consumer-focused, web-based applications? How do we get those up and running, and keep them running, in clustered environments of Linux resources?
That was our approach. When we saw Kubernetes, we jumped on board with that.

There was another popular technology around at the time, an Apache project that's still around, called Mesos, which had a bunch of frameworks on top of it for scheduling workloads.

Also, at the same time, you had things like the Hadoop environment, which was doing data-intensive work.

There was a struggle at the start around which workloads we were going to support. Could we support, within the Kubernetes world, this almost batch-oriented work, the workloads we were talking about earlier on the grid? There was a lot of tension there.

The tension was, where do we want to put our resources? Remember too, at the same time Kubernetes is getting going, Docker themselves are introducing their own project called Swarm, so there's some other juggling around as well. That confused people; I don't think it helped a lot.
Kubernetes has started. Mesos is taking off. At that time, the Mesos conference, whatever it was called, was much bigger than anything going on in the Kubernetes world. People were super excited about it and about all the different frameworks on top of it for doing data-intensive workloads.
Mesos was also looking at what Kubernetes was doing around clustering. So there was this question: where do we want to put these resources? Where do we want to put our engineering resources?

Kubernetes is also looking at batch work. Remember, these batch workloads we talked about earlier are very complex. They're in what I would call workflows.

In other words, let's do this amount of analysis, a thousand CPUs doing this analysis. If it was all successful, then schedule the workloads needed to work on the output of that as the next step of the workflow, and so on, then consolidate some results at the end. That whole workflow stuff didn't have a place in Kubernetes.

Kubernetes had the concept of getting an application up, keeping it up, and scaling it up and down as needed, but essentially always having it running. That's the whole idea of the replication controller: making sure that we have this resource available.

Whereas, if you think of it, the Kubernetes concept of a Job was essentially saying, "Hey, let's run this, but it's OK if it dies. You don't have to keep it up." In other words, let it run to the end. If it dies, don't try to restart it, because it's a job and you expect it to die. That's about as complex as it got.
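
To make that contrast concrete, here is a minimal, illustrative Job manifest; the name, image, and command are placeholders rather than anything from the conversation. Unlike a replication controller or Deployment that keeps pods running indefinitely, the Job simply runs the pod to completion:

```yaml
# Illustrative batch Job: run once to completion, don't keep it alive afterward.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-analysis          # placeholder name
spec:
  completions: 1                  # one successful run is enough
  backoffLimit: 2                 # retries allowed before the Job is marked failed
  template:
    spec:
      restartPolicy: Never        # batch semantics: the pod is expected to exit
      containers:
      - name: worker
        image: registry.example.com/analysis:latest   # placeholder image
        command: ["./run-analysis"]
```
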
At that time, the Kubernetes community, and certainly Red Hat, decided we needed not to focus on the batch and data analytics world. We really needed to win the battle within the Kubernetes world. Kubernetes was going to be the future.

At the time, there were certain large customers we were looking at, and we were looking at the numbers in the Mesos world, proposing that maybe we should look at both. This is where I got caught on the wrong side of it. There was a certain amount of consensus that maybe we should support both Mesos and Kubernetes.

Thankfully, the folks on the OpenShift side of our house decided, "Nope. Kubernetes is it. Kubernetes is the future."

Kubernetes won at Red Hat and as we've seen, Kubernetes has won in the marketplace as well.
There's still a lot of workload from that grid side of the house, the old high-performance computing side or data-intensive computing side, whatever you want to call it, that needs to be supported or is beginning to be supported. There's still a lot of work to be done on that, but Kubernetes is where it's going to happen.

Gordon:  Why might you want to use containers in HPC?

William:  This is why I think you're going to see, or you already are seeing, a larger increase of this in the HPC world. You can look at it from the AI and machine learning perspective, which is where a lot of this is landing right now. When you think of it, that's really sometimes-complex algorithms running on data, and AI/ML is where those workloads are falling right now.
What happens with the container side of this is that I get something much more flexible. All the matchmaking that we had to do before becomes a lot simpler. It's all Linux for a start. It's all going to run in an abstracted, very homogeneous world of compute.

Let's look at this from a container perspective now, similar to what we did before. Remember, those scientists were writing an application for an environment they already knew. It was the job of the complex matchmaker to figure out how to schedule the jobs on the right machines. The application developer didn't care too much, because they were writing to their chosen platform.

In the container world, everyone's writing to the same platform; they're writing to Linux containers. There's a much larger pool of developers out there working on this. You can containerize your application and therefore get a lot more agility in terms of updates.
The scheduling becomes much simpler, because really all you're asking for now are slices of CPU and memory, and maybe GPUs, depending on the type of workload. There's been a lot of work in the Kubernetes community around GPUs.
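
As an illustrative sketch of what asking for those slices looks like (the names and image are placeholders, and the GPU line assumes a device plugin such as NVIDIA's is installed on the node), the request is expressed purely in resource terms rather than in terms of chipsets or operating systems:

```yaml
# Illustrative pod spec: the scheduler matches on requested CPU, memory,
# and GPU slices rather than on hardware specifics.
apiVersion: v1
kind: Pod
metadata:
  name: training-worker           # placeholder name
spec:
  containers:
  - name: trainer
    image: registry.example.com/ml-trainer:latest   # placeholder image
    resources:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        nvidia.com/gpu: 1         # extended resource exposed by a GPU device plugin
```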

Containerizing gives the developer a lot of freedom, and it provides a much more homogeneous world for scheduling that workload. You're then seeing things like Open Data Hub, which puts machine learning and AI algorithms inside containers to run on a Kubernetes cluster.

Gordon:  One of the things we're seeing here is really part of a long, ongoing story, which is that enterprise computing and HPC, which used to be totally different worlds, have been coming closer and closer together for the last two or three decades.

William:  It's fair to say that the world we're living in now, where you bring hybrid cloud infrastructure together with containers, open source AI/ML projects, and all that goodness, is like bringing the supercomputer to the masses.

Before, this was the realm of large governments, massive well-funded research labs, biotechnology labs, or whatever it might be. Now, if I'm a small company somewhere, I can bring up a number of resources on a public or private cloud and run a workload on open source AI/ML tooling. Suddenly, I have a supercomputer at my fingertips, consumable on an hourly basis or whatever.

Now, any department in an enterprise can quickly spin it up.

What I think is amazing is the speed. Think of the type of effort and work it took years ago to get your workload up on a grid computer; it was a large project and a huge effort. Now you're hearing about almost a campaign-website approach to supercomputing problems.

I'm being a little bit facetious there, but the idea is that a small, modestly funded project can spin up an AI/ML effort on this. We're seeing it right now with the COVID-19 crisis, where everyone seems to be bringing grid or Kubernetes concepts to bear on the problem at very low cost.

Gordon:  Maybe to close out, what do you see as the next steps for Kubernetes orchestration in order to continue to accommodate these other kinds of workloads, which came from a different direction?

William:  All of the pieces are there, and certainly the knowledge is there across the communities involved in this. You're seeing it in the AI/ML space. We know about distributed computing; we know about all the different parts of this. The computer science is already there in various areas.

What I see as the next step is streamlining. We're not yet at an automatic workflow from beginning to end with this. We're still pulling pieces together.

Now, some enterprises have done this already, but think in terms of the mass model for this problem: data out on IoT devices connecting into your edge devices, being consumed on the enterprise platform across hybrid cloud infrastructure, with all the scheduling and the billing and all that good stuff. All the pieces are there, but we're still doing these as projects that are different in every deployment at the moment, though it's becoming more similar.

What's needed is an open source initiative, and I personally think it'll be in the Kubernetes community, that brings that whole streamlining together in a cohesive manner, as opposed to, let's pick this project here and that project there, and then let's all try to connect them.
It's great that it's open source, and it's great that there are open APIs. We can do it, and it's going to be low cost. Great, but wouldn't it be nice to have that tied down for everybody right now?

Is that where you're seeing it too, maybe, Gordon?

Gordon:  Yeah, and you mentioned Open Data Hub earlier, and that's a good example of pulling all of this open source goodness really together, and trying to make it more consumable.

William:  I still think that workflow tools are probably another area of the framework that needs to become more mainstream as well. Not just running the AI/ML workload, but the idea of being able to plan out the different layers in the workflow. I'm sure there are projects I've lost track of in that area too, probably in Open Data Hub.

That, plus a seamless console that allows you to see where data is coming into the process. Instead of the gathering and the analytics being done as a, you know, "Let's gather, and now let's bring it over to this," that workflow would be streamlined too, where the data gathering is almost as CI/CD- or DevOps-like as the backend enterprise side is at the moment.

We have the applications in CI/CD, but it's almost like we need to connect it all together from the data gathering onward. It's scary, but that higher-level view of it all in one pattern would be really useful.







