Thursday, August 29, 2019

Podcast: OpenDataHub brings together the open source tools needed by data scientists



OpenDataHub is a reference architecture that brings together tools for AI and data work. In this podcast, I talk with Red Hat's Steve Huels, Sherard Griffin, and Pete MacKinnon about the motivations behind OpenDataHub, the open source projects that make it up, how OpenDataHub works in concert with Kubernetes and Operators on OpenShift, and how it reflects a general trend toward open source communities working together in new ways.

Show notes:
opendatahub.io - An AI and Data Platform for the Hybrid Cloud

Podcast:
MP3 - 22:11

Transcript:


Steven Huels:   I am responsible for the AI Center of Excellence at Red Hat.
Sherard Griffin:   I am in the AI Center of Excellence as well, responsible for the OpenDataHub and the internal running of the data hub at Red Hat.
Pete MacKinnon:   Red Hat AI Center of Excellence, helping customers and ISVs deploy their machine‑learning workloads on top of OpenShift and Kubernetes.
Gordon:  We heard "OpenDataHub." Let's tell our listeners what it is.
Sherard:  OpenDataHub, that's an interesting project. It's actually a metaproject of other community projects that are out there. It's basically responsible for orchestrating different components that help data engineers, data scientists, as well as business analysts and DevOps personnel manage their infrastructure and actually deploy, train, and create models, and push them out, for machine learning and AI purposes.
What we tried to do is take things that are being done in the community through open-source technologies, wrap them up in something that can be deployed onto Red Hat technologies such as OpenShift, and leverage some of the other technologies like Ceph, JupyterHub, and Spark, to make it more easily accessible for data scientists and data engineers to do their work.
Gordon:  I'm going to put an architecture diagram in the show notes. Could you, at high‑level, describe what some of the more interesting components are, what the basic architecture is for OpenDataHub?
Sherard:  One of the key things that you'll see when you deploy OpenDataHub inside of OpenShift is that it solves your storage needs. We collaborated with the storage business unit at Red Hat.
We allow users to deploy Ceph object storage, so they can store massive amounts of unstructured data and start to do data engineering and machine learning on top of it. Then, you'll see other popular components like Spark, where you're able to query the data that's sitting inside of Ceph.
You can also use Jupyter notebooks for your machine‑learning tools and be able to interact with the data from those perspectives.
That's three high‑level components but there are many more that you can interact with that allow you to do things like monitoring your infrastructure, being able to get alerts from things that are going wrong, and then also doing things like pushing out models to production, testing your models, validating them, and then doing some business intelligence on top of your data.
Gordon:  We've been talking at an abstract level here. What are some of the things that our listeners might be interested in that we've used OpenDataHub for?
Pete:  There's a variety of use cases. It is a reference architecture. We take it out into the field and interact with our customers and ISVs and explain the various features.
Typically, the use cases are machine learning where you have a data scientist who is working from a Python notebook and developing and training a model.
ETL is an important part of machine-learning pipelines. This part can be used to pull that data out of data lakes, perhaps stored in HDFS and Hadoop, and put it into the model development process.
Sherard:  More broadly, the OpenDataHub internally is used as a platform where we allow users to import data of interest themselves and just experiment and explore data.
Whether they want to build models and publish those models, it's really an open platform for them to play with, an environment where they don't have to worry about standing up their own infrastructure and monitoring components.
More specifically, we use it for use cases that ultimately make their way into some of our product focus. We're experimenting with things like how do we look at telemetry data for running systems and predict future capacity needs. How do we detect the anomalies and then drill down to the root cause analysis and try to remedy those things automatically?
These are all some of the use cases that we're using the OpenDataHub for. We feel a lot of these, obviously, have resonated with our customers, and they mirror the use cases they're trying to solve. Those are just a couple of the ones we're doing internally.
Gordon:  We've been talking about the internal Red Hat aspect of OpenDataHub. Of course, this isn't just an internal Red Hat thing.
Sherard:  Correct. As Pete mentioned, OpenDataHub is a reference architecture. It is open source, and we use it as a framework for how we talk with customers around implementing AI on OpenShift.
There's a lot about the way OpenDataHub has been broken apart and architected that hits on the key types of workloads and activities customers are doing.
There's data ingestion. There's data exploration. There's analysis. There's publishing of models. There's operation of models. There's operating the cluster. There's security concerns. There's privacy concerns. All of those are commoditized within OpenDataHub.
Because it's an open source reference architecture, it gives us the freedom then to engage with customers and talk about the tool sets that they are using to manage their use cases.
Instead of just having a box of technology that's maybe loosely coupled and loosely integrated, we can gear the conversation toward, "What's your use case, what tools are you using today?" Some may be open source, some may be Red Hat, some may be ISV partner provided. We can work that into a solution for the customer.
They may not even touch on all of those levels that I discussed there. What we've tried to do is give an all-encompassing vision, so we can build out the full picture of what's possible and then solve customer problems where they have specific needs.
Gordon:  Again, as listeners can see when they look at the show notes, there is an architecture diagram there. For example, there are a number of different storage options, a number of different streaming and event-type models, and a number of other types of tools that they can use. Of course, they can add things that aren't even in there.
Steven:  If they add them, we'd love for them to contribute them back, because that's the entire open source model. We use the OpenDataHub community as that collection point for whether you're an ISV, whether you're an open source community. If you want to be part of this tightly integrated community, that's where we want to do that integration work.
Gordon:  That's where the real value in open source, and in OpenDataHub, is. It makes it possible to combine these technologies from different places, have outside contributors, and get outside input in a way that I don't think was ever really possible with proprietary products.
Pete:  That's where it really resonates with Red Hat customers. They finally see the power of open source in terms of actually solving real use cases that are important to their businesses. All the components are open source, the entire project is open source, OpenShift itself is open source. It's an ethos that infuses everything about OpenDataHub.
Sherard:  I would add to that. One of the interesting points that both Steven and Pete brought up, is how customers have gravitated towards that. A lot of that is because we're also showing them that, hey, you've invested in Red Hat already or you've invested in RHEL, you've invested in containers.
In order for you to get that same experience that you may see from AWS SageMaker or Microsoft's cognitive tools, you don't have to go and reinvest somewhere else.
We can show you through this reference architecture how you can take some of Red Hat's more popular technologies and use those same things like OpenShift, like RHEL, like containers and be able to have that same experience that you may see in AWS, but in your own infrastructure.
Whether that's something that's on‑prem or whether that's something in the hybrid cloud or even an OpenShift deployed in the cloud, you're able to move those workloads freely between clouds and not feel like you have to reinvest in something brand new.
Gordon:  Well, the interesting thing about OpenDataHub, and I think also about OpenShift, for example, is that over the last few years we've really started to see this power of different communities and different projects and different technological pieces coming together.
Certainly, open source has a long history. But with cloud-native in the case of OpenShift, and with the AI tools coming together in something like OpenDataHub, we're seeing more and more this strength of open source bringing all these different pieces together.
Sherard:  Yeah, absolutely. OpenDataHub first started out as an internal proof point. How can we, with open source technologies, solve some AI needs at Red Hat? What we quickly understood is, there's more of a life cycle that machine learning models have to go through. First starting with collecting data all the way through to a business analyst and showing the value of a model.
That allowed us to map out all the different parts of the life cycle and then start to figure out, "How can we introduce open source technologies at each stage of that to help the process along?" As we've discovered what those processes are, we've internally deployed those into our own systems.
We're working towards getting a more robust system that solves our own problems. As we do that, we share that with the broader community saying, "Hey, here are the open source tools we did for each part of the life cycle of a machine learning model and here's how you can do the same thing."
Gordon:  And it even really goes beyond that. We're here at Boston University at the DevConf.US conference. For example, there's a lot of work being done on the research side with Red Hat and BU, for instance, on privacy-preserving AI techniques. That's too much to get into in this podcast, but that's part of the whole mix too.
Steven:  Privacy, obviously, with a lot of the trends that people have seen with some of the major companies out there, is a very hot topic. There's a lot that Red Hat has already, historically been doing in the space. Whether it was looking into things like multi‑party computing to preserve the anonymity of certain data sets, that's something we've been doing for quite some time.
There's other things we're looking into too, things like differential privacy. How do we allow access to data for analysis from multiple parties while still preserving that anonymous component? Then even beyond that, we're starting to look into things like data governance.
What exists in the open source world for data governance? How do I adhere to and maintain my GDPR compliance? These standards are only going to continue to emerge as more and more data gets collected on people. They're very hot topics. They are things that Red Hat is actively involved in and has a voice in going forward.
Gordon:  Outside of Red Hat, what are we seeing in terms of interest and adoption of not just the individual technologies but OpenDataHub more broadly?
Steven:  If I even pull that back a step, the reason why a lot of this technology is now taking off the way it is, is because industry has readily adopted a lot of the open source standards. They started to expect the open source frameworks to support their use cases.
It's not enough that you simply have a single component that can deliver one piece of value. You want an integrated suite that solves a whole myriad of your problems. In doing so, there's a natural correlation and integration that has to occur.
That's being done in pockets in different areas. They're solving maybe niche use cases. When you look at something like OpenDataHub, it's actually crossing the spectrum and the boundaries of what it takes to operationalize the solutions to these problems.
Historically, a lot of these problems were solved by individuals who could do something on a very high-powered workstation on their desk, but they never made their way into production. The value companies were able to get from them was very, very limited. They made great PowerPoint slides, but they never really delivered any value.
Companies now expect that value to be delivered. The OpenDataHub and that type of framework is what allows something to be put into operations and then maintained, like you would any other sort of critical application.
Gordon:  I think the other thing that happened is if we looked at this space a few years ago, it almost looked like people were looking for that magic bullet, like Hadoop for example, "Oh, this is going to solve everybody's data problems."
What we're seeing is you need a toolkit. You don't necessarily want the toolkit that you have to go all over the vast Internet and assemble from scratch, figuring out which projects do the job and which ones are works in progress. OpenDataHub kind of addresses that, it would seem.
Pete:  In recent talks, I've started off by talking about the AI Winter, which has been this cycle of enthusiasm and investment by companies and other institutions in artificial intelligence and machine learning, only to ultimately see it fall by the wayside, fall apart.
Everything has changed now. I think open source is a big component of that change because, as Steven was saying, these various individual component projects like JupyterHub, they're their own communities, but those communities are starting to interact with each other in various ways.
What's been missing is an integrated suite like Steve was talking about. That's what we're trying to do with the OpenDataHub: something that provides a comprehensive AI and machine-learning platform to defeat the cycle of AI winters that come and go.
Sherard:  I would also say, what we've seen when we talk to customers is that they're all at so many different stages of their AI journey. Some of them are at the very beginning. They just want to know what AI is and what that means to their organization. Some of them are at the tail end where they've developed models, and they don't know how to productionalize it.
One of the things we're able to do is use the OpenDataHub as a grounding moment for us all to have the same basic conversation. Then we can start to talk to them and say, "Hey, if you're just starting out, here's something you can start out with. Or if you're ready to productionalize, the OpenDataHub can help you conceptualize it and gives you a reference architecture for how to productionalize it."
It allows us to just have conversations with our customers to let them, first, understand that we know the problem because we're doing it internally, ourselves. But then also, as we work with other customers and get more information about what they may want to do with governance, with security, with auditing, all these other things that happen before you go push it to production.
How can we go about it in a broader community sense where we're tackling all of this together, and we're really pushing the envelope, where everybody's starting to contribute and we're getting feedback from the customers, but we're also providing guidance?
Gordon:  How does a new person or organization typically get involved, whether it's OpenDataHub specifically, but these types of programs, what's your general recommendation?
Pete:  Having served in various open source communities, the best practices for getting involved become readily apparent. There's typically a lot of enthusiasm: somebody identifies a project that they think is going to be great to work on.
It's very important to approach the community and understand how the community conducts itself. Typically, open source communities these days have exactly that, a code of conduct. There are best practices around things like GitHub, in terms of forming a pull request if you have a new feature or bug fix to help the community.
Also, there are modern chat applications like Slack and Google Chat that these communities form around. It's always good to come into those arenas hat in hand, as it were, and be humble about what it is you're trying to do. Ask questions, listen to the conversations, build up value, and present that value back to the community.
Gordon:  You're saying that if I want to get involved, I shouldn't just join the mailing list and tell everyone they don't know what they're doing?
Pete:  I wouldn't recommend that, no.
Gordon:  How would you describe, very high level, the state of OpenDataHub today and what we should expect to see for next steps, what should our expectations be?
Sherard:  The state of OpenDataHub is ever-evolving. We're meeting with customers, understanding what their use cases are, trying to see how we can solve those use cases with a reference architecture like the OpenDataHub, and seeing where the gaps are.
If you look at when we first started this earlier this year, and we first put out our first Operator that deployed the OpenDataHub...
Gordon:  Operator?
Pete:  Talking about OpenShift and Kubernetes, an Operator is a powerful new paradigm where you basically encapsulate the application lifecycle for a particular component. That component interacts with the Kubernetes and OpenShift APIs to do full lifecycle management of that component. That's the 50,000-foot view.
Gordon:  Yeah. To add a little bit to that, having seen a workshop yesterday on this topic, the idea is that the Operator can install your Kafka eventing system, your Spark cluster, your Jupyter notebooks, and so on, for something that has a lot of components like OpenDataHub.
The idea is, it's as if you had an expert on the system come in and spend a couple hours installing things for you.
Sherard:  Yeah, that's exactly right. That's why we gravitated towards Operators pretty early. The OpenDataHub is an operator, a meta-operator. It even deploys other operators, like Strimzi for Kafka. We're also working with the Seldon team. We're going to be looking at integrating some of our other partners into that ecosystem.
What I was getting at is, where we were earlier this year was, the OpenDataHub was really focused on the data scientist use case trying to replace the experience of all of your data scientists across an organization doing work on their laptops.
Certain people may have different components installed. Everyone's doing pip installs with different versions. You have all kinds of dependencies that are very specific to that data scientist's laptop or workstation.
What we tried to do is solve that by introducing this into Kubernetes so that we have a multi‑user environment in OpenShift, so that everybody has the same playing field, every user's using the same suite of tools. They're using the same suite of dependencies and same versions of packages so it makes it easier to collaborate.
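To make that versioning problem concrete, here is a minimal sketch of the kind of drift check a shared platform makes unnecessary. The package names, versions, and the `find_drift` helper are all invented for illustration; with everyone on the same image, installed versions can't silently diverge from the team's pins.

```python
def find_drift(pinned, installed):
    """Return packages whose installed version differs from the pinned
    one (None means the package is missing entirely)."""
    drift = {}
    for name, wanted in pinned.items():
        have = installed.get(name)
        if have != wanted:
            drift[name] = (wanted, have)
    return drift

# Invented example: two laptops that ran "pip install" at different times.
pins = {"numpy": "1.17.0", "pandas": "0.25.0"}
laptop = {"numpy": "1.16.4", "pandas": "0.25.0"}
print(find_drift(pins, laptop))  # {'numpy': ('1.17.0', '1.16.4')}
```

On a shared OpenShift environment this check would always come back empty, which is the point Sherard is making.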
Once we did that, the next step was to start to introduce more of a management of your machine learning models. Now, we've introduced Seldon where you can actually deploy your model as a REST API. Then, we also introduced Kafka for data ingestion down into your object storage. We also had the ability to query the data using Spark.
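As a sketch of what "deploy your model as a REST API" looks like from the client side: Seldon-style prediction endpoints accept a JSON payload carrying the feature values. The URL, path, and payload shape below are illustrative assumptions, not taken from the podcast; the exact wire format varies by Seldon version.

```python
import json
from urllib import request

def build_prediction_request(url, features):
    """Build a POST request with features wrapped in a JSON payload.
    The {"data": {"ndarray": ...}} shape is an illustrative assumption."""
    payload = {"data": {"ndarray": features}}
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder endpoint; a real deployment's URL would come from its
# OpenShift route. Sending it would be request.urlopen(req).
req = build_prediction_request(
    "http://model.example.com/api/v0.1/predictions", [[1.0, 2.0, 3.0]]
)
```

The value of the pattern is that the data scientist's model becomes just another HTTP service the rest of the platform can call.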
Coming down the pipeline in the next month here, we're going to be introducing tools for the data engineer. What we're doing is looking at how you catalog the data that's stored in the object storage. This is Hive Metastore, but we're also introducing technologies on top of that, such as Hue, which will allow you to manipulate the data before the data scientists even get there.
The reason that we decided on that is because we all know that before you do machine learning, data just doesn't come in cleanly. It's not perfect right out the gate. We knew that there was a step missing in enabling data engineers to massage and clean that data before the data scientists got ahold of it.
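To make that missing step concrete, here is a toy sketch of the kind of cleanup a data engineer might do before a data scientist ever sees the data. The records, field names, and the `clean_records` helper are all invented for illustration.

```python
def clean_records(records):
    """Drop incomplete rows and normalize fields in raw telemetry-like
    records (invented schema: "host" and "cpu_pct")."""
    cleaned = []
    for row in records:
        # Skip rows missing required fields; raw data rarely arrives
        # complete right out of the gate.
        if row.get("host") is None or row.get("cpu_pct") is None:
            continue
        cleaned.append({
            "host": row["host"].strip().lower(),   # normalize hostnames
            "cpu_pct": float(row["cpu_pct"]),       # coerce to numeric
        })
    return cleaned
```

In the OpenDataHub picture, this massaging would happen with tools like Hue and Spark against data in object storage rather than in ad hoc scripts.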
Then, down the pipeline after that, we're looking at BI tools but then also, there's going to be more governance. We're looking at tools that might help out such as Apache Ranger, Apache Atlas. We have a number of people that are contributing in that space.
We're looking at how can we introduce more cohesive end‑to‑end management of the platform. You'll see more of that as we move along in the next few months here.
Gordon:  Where could someone go to learn more?
Steven:  Opendatahub.io is the community site. You'll find a number of mailing lists if you want to stay in the loop. If you want to get involved, you can sign up and we can pull you into the various workstreams.

Friday, August 23, 2019

William Henry on open source innovation, the role of standards, and consuming software

William Henry is a senior distinguished engineer at Red Hat who has been involved with open source since Slackware. In this podcast, William talks about some of the tensions in the open source world, including those between innovation and standardization, which have taken many forms over the years.

Show notes:


Podcast:

Transcript:

Gordon Haff:   I'm very happy today to have with me a co‑author of mine, a frequent collaborator in many things, open source. That is, William Henry, senior distinguished engineer at Red Hat. Why don't we start with some background in your part? How did you get involved with open source?
William Henry:  It was probably two different areas going on at the time. It would have been late '90s and then early 2000 when I was working in the whole distributed computing space and CORBA and JEE.
I was on one side obviously experimenting with Linux. I had downloaded Slackware probably in '97 and was playing with this at home on an old IBM Aptiva.
At the same time at work, I was seeing a lot of open source initiatives in the CORBA space for example and also in JEE or J2EE at the time or JBoss. Then of course there was a lot of other projects like Fuse and Camel that were coming out on the SOA side for things like messaging and Web services.
I was getting involved somewhat with those, but maybe not as much hands-on; more as a user and a person who was advising folks in the upstream communities from the companies I worked with.
Then of course, I joined Red Hat in 2008, and that was a completely different level of experience with open source, a complete explosion. It was like drinking from the firehose. Because I'd been working so far up the stack, on the SOA side, you almost assumed that a lot of things were already done in the operating system.
We had lots of cool things like Solaris, IBM AIX, and HP-UX on the Unix side. Surely, all the real innovation was kind of done.
Perhaps Linux was trying to catch up, in some ways, to some of the innovations on those platforms, but there wasn't any real innovation going on.
Of course, I discovered very quickly there was massive innovation, in terms of real time, and then containers came along. We still see that we're innovating quite a lot on the Linux platform, while there's an explosion of technologies on top of that as well.
Gordon:  That seems to be one of the watershed changes that's happened with open source over the last maybe 10, 15 years, but even more so recently. With open source software, Linux, various message buses, and things like that, really, what their "innovation" was initially was much lower cost. That was really the disruptive factor of open source.
The big change that we've been seeing over the last 10 years, and I'd argue it's accelerated in the last 5, is open source has become where much of the innovation in the tech industry is happening.
William:  Yeah, I think it's still a combination, though. In other words, yes, there's an explosion of innovation, particularly if you look at what's gravitating around the Linux platform. In many ways, some of the innovation has already been invented before, and we've just come up with different flavors of it. Less expensive flavors of it.
The whole Linux-based platform, be that everything from the Linux kernel to things like containers or virtualization to the cloud, is almost like a reinvention of technology we were doing in the '60s and '70s, just at a much greater scale. We've reinvented the mainframe.
At the same time, you're right. Some of the innovation is around the scale. It is a massive scale that we can do now because of the cheapness, and because of the availability of infrastructure as a service, where the access to the time sharing of a platform is much broader. It's very simple for me to go to the cloud today, just for a simple demo. I don't have to go out and acquire anything or do anything.
I can get that resource in Sydney or in Ireland on the cloud there. It doesn't really matter. The scale and the amount of technology out there provides a huge, layered platform for innovation on top of that. At the same time, we're still doing things we did 20 or 30 years ago, just cooler in color with video and streaming and everything else.
Gordon:  In a way, I think the meme that everything has been done before is a little bit tired. Of course, there's many echoes and many instantiations of concepts which may go back decades and decades. For instance, to say, "Oh, public Clouds are just like timesharing." Well, they're really not.
William:  Right. We've got a different type of multi-tenancy on clouds today, and certainly, to the user, the opaqueness of the geographical distribution of those assets is incredible. The types of tools, innovation, and availability of software and services on top of that are a lot different.
Gordon:  We just talked about, I'm not sure if it's a tension, but certainly there's these two faces of open source about easy to acquire, low cost, very accessible on the one hand, and this engine for innovation on the other hand.
I'd like to take us into some of the other dual aspects that we see around open source today. We just touched on one of them, which is also relevant to your work with container tools like Podman, Buildah, and so forth.
That is, when do you standardize and when do you innovate, and how do standardization and innovation play against each other?
William:  This is a tough one because you can see certain innovations out there thriving because of standards. Yet, you can see other innovations out there dying despite standards. Not so much dying but perhaps not delivering where people thought they would deliver.
Then there are the de facto standards. For example, you could look at the container area for a start, where that was driven by an open source, but very non-standard, technology called Docker. Docker then moved, with the rest of the community that was developing it, into an open source standard, OCI.
That has obviously become very successful. You can see other things like Kafka which comes out of the Apache Software Foundation. They came out with essentially another messaging type system which was living in Apache alongside things like Qpid.
Qpid became perhaps less interesting despite the fact that it was based on the AMQP standard. Less interesting from a commercial perspective than something like Kafka which, as we know, exploded.
Just because you build a standard doesn't mean it will succeed; it needs a huge community behind it. In some ways, a large community can drive the standards. You have projects like Kubernetes, which obviously came out of Google as open source. It wasn't a standard, but companies like Red Hat and others saw how powerful it was going to be. We jumped in on that project. Now Kubernetes has become the de facto standard. Of course, the whole Cloud Native Computing Foundation, which is part of the Linux Foundation, has essentially grown up out of that.
It's a complex area where you have communities, you have commercial interests, and you have standards. It's about trying to find the perfect storm of bringing those three things together that really drives the popularity or the success of an open source project.
Gordon:  It's not universal, and there certainly are areas that are standards first. I think you are highlighting an important way in which standardization has evolved in many cases. You talked about some of your work in the middleware, in the eventing space.
If you look at the first iteration of that, SOA 1.0 if you would, that was very much standardized in principle, but it was very heavyweight, very big-vendor-driven standardization. Whereas with the OCI container specs, and obviously with Kubernetes,
and with many, many of the projects in the cloud-native space, some company or some individual is going out and writing software that scratches some itch.
Other people are using that. Other people are participating in the community. People are going, "Hey, this works. Let's make it a standard now." There's a code-first approach to standardization.
William:  It's really fascinating because when you look at the whole... When you talk about SOA 1.0 and you talk about things like CORBA, for example, there was a massive consortium, a huge standards effort with CORBA, with massive commercial backing from banks and telecommunication companies, etc.
Perhaps it slowed down because of that. It didn't have a clear vision and a scalable approach to it. Also, you can look at the W3C around standards as well. That was almost dead on arrival. Lots of lessons learned.
I was involved with the WS-Policy work. You can see, again, massive collaboration, huge organizations, massive money behind it, folks like IBM and Microsoft and others, but it never really took off.
What happened instead was that the free market and innovation in the industry solved the problems first. For example, REST-based approaches won. Standards don't always solve the problem. What I'll tell you a standard will help figure out, though, is whether something will last.
When you look at the lessons learned from things like CORBA or things like W3C, it's very easy to get a cool "hello world" demonstration up and going in the space. When it comes to massive scale and transactions and security and all those other things, you pull in from all these standards bodies and industry experts and all that.
That's where the drag comes in, the lag comes. In many ways, the innovation today benefits from those early standards not because they were successful, [laughs] but in many ways because they were successful for a time and there were these lessons learned from them.
When you look at some of the hugely scalable architectures we have today, they are ‑‑ as you say ‑‑ built on the backs of the knowledge we gained from those quite frankly successful standards from the past.
It's not that they were all on the closed-source side, because you had open source things in the CORBA space and other areas there. Obviously, we've still got things like JBoss around today in the JEE space.
Essentially, lots of lessons were learned there, but not exactly a lot of open source products that you would regard as massively successful in today's enterprise computing deployments.
Gordon:  OSI [Open Systems Interconnection, in this context] was another one I was involved with back in the day somewhat. The network model, people use it. It's a pretty good model, but the products that came out of it… There was an awful lot of money wasted on that.
William:  We still learned the lessons, though. We still talk about L3, layer 3, or whatever layer, today when we talk about networking.
We still use that as a standard way of talking about how we're going to communicate in a distributed way, and whether a piece of open source software you use today is speaking at one of those layers, etc. They're not OSI projects, as you say.
Gordon:  Let's talk about some of the other trade‑offs or challenges or conflicts that we see in the space. You've mentioned community a number of times. What are some of the challenges you see with communities around factors like standardization, innovation, and stability, and trading off all of those things?
William:  Again, it's a tough one, because one of the areas that a lot of community people don't fully understand is the marketability of the technology they're building and how it's being brought to market.
An example I would use is on the Qpid side. Here you had an open source project with huge backing from many of the banks out there because of AMQP, the standard.
You had the AMQP standard, and you had an open source project called Qpid. When you took that to market, it was a very long sales cycle. There were people who understood it: the low-latency nature, all that coolness. It's open source, super fast, scalable, with fault tolerance and all that stuff built in.
But unless you know how to sell that, and sell it at scale, it becomes hard to take it to market. You have a sales force that's selling lots of different products, and some of them are easier to sell than others, or maybe have a bigger price point.
Maybe your Qpid-based product is less interesting. Then something like Kafka comes out, geared not so much towards a massive commercial product sale but more towards being consumed as a service, when it comes to things like cloud, or how to build it into a platform.
Suddenly it becomes a hugely more popular way of doing things, because the way it was brought to market was different.
Gordon:  Maybe this is a good segue to talk a bit about business models and some of the trade-offs there. I'll sometimes have an argument with some of my colleagues about whether open source is a business model or not.
It's certainly fair to say that whether or not software is open source, or more broadly how software is licensed, enables and forecloses certain paths in a business model.
That said, I still find it useful to separate the open-sourceness and the business model, because while they interact with each other, they are not the same thing. Open source is not a business model by itself.
William:  No, it is not. The other thing I would say is that just because your open source project can't be directly marketed doesn't mean there's not a market for it within something else. Qpid, for example, still has a market as something that's deployed within other, larger platforms.
The Proton project there, for example, can be used extensively for a non-brokered, more distributed messaging pattern. It's very good. What you have to understand is that when you're taking things to market in the open source world, you're competing with a lot of other different stories. In particular, as a community, in terms of how commercially successful this will be, you're very dependent on the people who are taking it to market for you.
A lot of communities will build really wonderful technology. But you're sending a guy off to the bazaar. He's taking his rugs, his baskets, or whatever it is, to sell. How is he presenting those in the bazaar at his stall? How is he showing them?
Is your product sitting on a back shelf because he looks at it and goes, "I don't know how to sell it. If somebody asks me for it, I'll sell it to them, but really I don't want it up front on my countertop. What I want up front on my countertop are the things that sell easily or bring me a lot of markup," whatever it might be?
How you take your open source project from the community, and how you expect it to be delivered in the market, is super important. Sometimes it may be that you're not selling it direct. You're selling it as part of a platform or something else that you're doing.
Messaging is an example of that, where people really don't want to handle the complexity of setting up and managing complex messaging systems, but they may certainly consume a messaging service, because it's easy and they don't have to worry about it.
Gordon:  We've been seeing this play out in the Hadoop space recently, for example. That seems to be a difficult stand-alone sale. In a way, this doesn't even have that much to do directly with open source, because I can look at other areas of software, developer tools for example, that have historically been very difficult to sell for the most part.
Another trade-off that we frequently see is between the speed of innovation -- rapid change, come out with a new incompatible build every day -- and stability, particularly for enterprise customers who just want something that works.
It doesn't necessarily need to be the latest and greatest. Fairly or not, open source communities and projects have sometimes gotten a bit of a bad rap for being, maybe, a little too focused on the "release early, release often" style of development. What are some of your experiences with those trade-offs?
William:  Obviously, there's a thirst, particularly at the bleeding edge of that innovation curve, where people want newer features faster. They want to be able to turn around and consume them: we want fault tolerance, it needs to be multi-tenant, it needs to be more secure.
They're hungry for these new features, but they're also struggling with how they're going to consume them now. There are two aspects to that, too. There's the cloud-native world, and there are your traditional apps, where you want more stability.
You still want more stability, but at the same time you want to innovate. You want to be able to consume these newer technologies faster. One of the things that's changed is that, in the past, communities would want people to catch up, and the consumer would say, "No, we want to slow down."
That's why companies like Red Hat were so successful: we were able to provide stability for 10 years -- on Linux, for example, when Red Hat launched Red Hat Enterprise Linux. We were able to provide that stability for them.
At the same time, you had innovation in the cloud space, which accelerated, including DevOps and, as you say, the break-it-early, fix-it, deploy-often approach -- all that good stuff. Anyway, what I'm seeing now is a very different trend.
We've gone through three stages. First was the stability side, with a, "We want the innovation, but we can't really handle this in our infrastructure." Then came the world of DevOps and CI/CD, where it was more of a, "Yes, we want to consume it and we can consume it. Give it to us faster."
Now it's almost become, "Yes, we want to consume it faster, but we don't want to own it anymore. We just want to consume it as a service."
We're almost at this third phase of... It's like a joke from this comedian I heard -- his name is Gary Coleman, I believe. He's talking about how this generation will say, "I want all of my music on my phone now," and they're asked, "What do you mean, all of the music on your phone?"
He says, "Sorry, what I meant was all the music on my phone, right now, this moment." "How much are you willing to pay for it?" "Pay for it? Nothing."
That's the joke: my final offer is nothing. People want everything now, consumable as a service, for free. It's great in some ways for the upstream communities, but it also means they have to be innovating faster. It's interesting for companies like Red Hat, as they have to try to pivot to perhaps more of a services-based model.
Software as a service -- providing messaging as a service, for example, or container builds as a service, or whatever that might be. It obviously fits nicely with some of the cloud vendors, but it also puts a challenge on the consumers: "How do we want to standardize all this?" If we want to innovate fast and consume these things as a service, what are the tradeoffs?
Do we expect a standardized consumption model across all of this, or do we expect that we're going to have different pipelines into these different deployment platforms?
Gordon:  That's one of the critical tradeoffs you have here, which we went into near the end of our book -- I'll put a free download link in the show notes. You get this ease of consumption in public clouds and software as a service, but in order to get that ease of consumption you are, in many ways, locking yourself into that single provider.
One of the great opportunities, and challenges, for open source broadly is how you deliver those attractive experiences to customers in a way that has a sustainable business model, while allowing end users the ability to move their workloads, their data, and their intellectual property to wherever they want to run them.
William:  I think that's going to be a challenge for a while. The industry is in a waiting mode a little bit. Obviously, they're not waiting in terms of business -- they're expecting someone to solve that for them.
Whether that's their open source vendor, someone like Red Hat, or the cloud provider themselves. But as they rush to the cloud and embrace open hybrid cloud and multi-cloud approaches, they're hitting that whole question of, just how portable is this container stuff, for example? How easy is it for me to move my instances across these various clouds and to bring them back on-prem?
We've done a very good job, people like you and me, of talking about the benefits of open hybrid cloud and multi-cloud, but the industry itself is still trying to catch up with that model. Things like OpenShift can provide a single, consolidated platform across public clouds and private clouds, etc. The question is how developers, operators, and essentially the CIO's office will respond to that.
They're trying to work out a consumption model. Things like OpenShift Dedicated and Azure Red Hat OpenShift also help with that model. I still think it's a, "Hurry up with my business." "Keep going, everybody." "Go fast." "Do things quickly." But really, are you all making the right decision here?
Because there seems to be this, "I don't want to own all this stuff anymore; I want somebody else to own it." Are we doing it right? A lot of the people I've been talking to are struggling with that decision.
Gordon:  Well, there's really been this fundamental shift with cloud, with software as a service, to a different way of consuming software and, for that matter, consuming services -- to your music example earlier. That changed environment... I'm not sure... in fact, I am sure it really hasn't been internalized by everyone.
We run software differently than we used to, even when we do it on-premises, and a lot of those implications are still being thrashed out.
William, anything that you'd like to add before we close? There are a couple of topics I would love to dive into, but I think those are entire podcasts by themselves, so let's hit those another day.
William:  I would just say that what's exciting is that open source is running strong. Never before have we seen such validation of open source as we do today. In terms of the innovation, there are enterprise-grade projects for almost everything, and they continue to improve. You're seeing things like federation getting added to things like Kubernetes.
We're seeing some good collaboration between companies around these open source projects, with a view to moving more and more of these community-led projects into a standards approach. All that has been wonderful.
On the other hand, we have this consumption model that there's some hesitation around, because despite the appetite for all of this innovation, there are still struggles -- as there always have been in the open source world -- over how people are going to consume this, how they want to consume it, and who they want to own the problem.
Before, it was, "Can you own this from an upstream management perspective?" Now it's, "Yes, can I have it on-prem, but can you also own it off-prem in the cloud, so I don't have to do anything up there to deploy it, because I really don't want to be in this messy, complicated IT business?"
At the same time, you've got small practitioners and consumers out there who are quite happy to roll up their sleeves, deploy a bunch of servers, and build all the cool stuff on top of them. It's really an exciting place to be. There are all sorts of different people in the marketplace consuming this, and there are some super smart people in the upstream communities.
Of course, there are some excellent standards being driven out of these projects that everyone's going to benefit from and continue to make money on in the market.