Wednesday, December 18, 2019

Podcast: Martin Mao of Chronosphere talks open sourcing internal projects

Chronosphere provides a monitoring platform, built on M3, for large-scale environments. Martin Mao is the co-founder and CEO. In this podcast, he talks about the journey of taking M3 from an internal Uber project to a product offered by an independent company, Chronosphere. What do you do when a project outgrows its original role as an internal company project written for its own purposes?

He discusses topics such as:
  • Why it can be important to start in open source from the beginning
  • How to ramp up contributors outside of the core team
  • Aligning business goals of those governing the project with the goals of the community
  • The challenges of open sourcing an internal project.
Listen to the podcast [MP3 - 12:14]

Monday, December 16, 2019

Podcast: Idit Levine, founder of, at Kubecon

In this podcast, we covered API management and service meshes for microservices--and why moving from a monolith to microservices can be challenging. We also got into the business side to talk business models around open source and the creation of communities.

Listen to podcast [MP3 9:33]

Tuesday, December 10, 2019

Podcast: Open source sustainability with Manifold

In this podcast, recorded at Kubecon in November, Manifold co-founder Matt Creager and VP of product Leah Rivers sat down with me to talk about bringing down the barriers to open source commercialization. Manifold powers marketplace infrastructure to connect developers with APIs, tools, and services. We talked about how we might make it easier for developers to build sustainable businesses, even small ones, and some of the ways we could make it easier to package and sell software.
Link to podcast [MP3 24:15]

Monday, December 09, 2019

Podcast: Open Governance with Chris Aniszczyk of the Linux Foundation

Chris Aniszczyk heads developer relations for the Linux Foundation. In this podcast, recorded at Kubecon in San Diego in November 2019, Chris takes us through:

  • What open governance means
  • Approaches to open source sustainability (and the problem with donations)
  • Governance best practices such as naming
  • Why projects should have neutral homes
  • When you should (and shouldn't) use a contributor license agreement

Listen to the podcast: [MP3 23:52]

Monday, October 21, 2019

Podcast: Matt Broberg talks Developer Relations

Matt Broberg is technical editor for In this episode, he takes us through what Developer Relations is, how to measure its value, the "soul" between the data points, and some of the ways in which DevRel roles can vary from company to company.

Listen to the podcast [MP3, 17:37]

Subscribe to Innovate@Open on your favorite podcast app.


Gordon Haff:  Today, I'm here down at All Things Open in Raleigh with Matt Broberg, who's the technical editor of We're here today to talk about Developer Relations.
Let's start with something pretty basic. What the heck is Developer Relations?
Matt Broberg:  I wish that was as basic as you asked. To attempt to put it into a quick summary, Developer Relations is an organizational unit, a business unit that is tasked with relating to developers. I think that's the quickest way to summarize it. You might hear job titles of people in DevRel as we call it that are developer advocates or developer experience engineers.
There's a number of nuanced roles that end up falling into it. It is a bit of a new definition of an organizational unit. There's a lot of discussion on whether it rolls into one of the traditional ones or if it lives on its own. Yeah, DevRel's a thing.
You'll see job titles for it all over, but exactly how it fits is very much up for discussion.
Gordon:  How is this different from where we were historically because obviously companies like Red Hat, Microsoft, and others, have certainly had developer programs for a very long time. Are things different today?
Matt:  I think they're wildly different. My understanding of the history of this is we have this sort of rise of the developer evangelist in the technology space. These people that were on stage speaking and other companies started to notice the impact these people had to the association with their brand, the excitement around the open‑source projects around the brand.
With time, evangelism became less of the goal. It was more advocacy around participation in either the open‑source or even closed source communities of a given project.
With respect to developer relations, the need for it has risen out of the inability for the traditional marketing and traditional project manager to relate to a developer and be able to communicate in their language the value they'll get from using certain technologies. I think of it, kind of similar to how DevOps has risen out of the need for Dev and Ops to communicate differently.
I do think of DevRel as something that the project management side of the house and the marketing side of the house are trying to find this new need and DevRel's filling it.
Gordon:  If somebody is doing DevRel, or if somebody's thinking about what Matt just talked about, that sounds sort of like something I might want to do, how should they expect to spend their days?
Matt:  The day will depend on talking to the team that's hiring you about what it is and looking at the job description closely. DevRel does seem to be a catch‑all for a lot of different things in the developer community.
For some organizations that means hitting the talking circuit and being at a lot of the most innovative and highly networked events around the industry and whatever vertical you're in. A lot of DevRel folk in the open‑source community are here at All Things Open.
Writing articles and other kinds of content. Creating podcasts, like we're doing now. Those are some of the outputs that tend to be measured.
Others, it's about code contribution and being a shepherd of a particular community. Let's say you're a subject matter expert in Python and our project requires a Python SDK. We're looking to get more Python adopters. You, being the person that speaks to that and contributes to the code and helps curate that contribution, is more of the core facet.
I wish I could give you a solid single definition. I can tell you for sure you need to talk to somebody about what it is. You'll do some weird, but wonderful, combination of speaking and listening and coding and not coding [laughs] .
Gordon:  One of the things you said does talk to some of the background of people who might go into DevRel. You're suggesting that for some they might be doing quite a bit of coding, which obviously implies a certain interest in and some degree of technical background. Whereas there's other people at DevRel who really don't consider themselves technical.
Matt:  That's a good note. I tend to look at it as something where coding, that is the core part of your responsibility when I design a DevRel organization, which I've done a couple of times. It comes to finding that right mix of coding and talking.
For many organizations they just need somebody who can talk to somebody who is a developer. That has a lot more to do with your knowledge of the community, knowledge of the ecosystem, and less to do with how much time you spend pushing code to GitHub or to GitLab.
I appreciate that. There's no gate‑keeping here. Regardless of your background, you can be in DevRel. You may need to learn some of that software background to participate. It's really: Are you able to participate and communicate to the degree that aligns to whatever your business needs out of a developer relations organization?
Gordon:  You mentioned measurement in passing a minute or two back. You just gave a talk here about measurements and metrics.
I'm in an evangelist role myself. Fairly different from a developer relations role. I have struggled with that same issue of metrics and measurement. Part of it is that the stuff that is easy to measure is often the stuff that isn't very important; you hope it is a proxy for something that's important, but is really challenging.
Whereas a metric of how much do developers respect us as a company, you can do surveys and the like, but that's still kind of a hard thing to really suss out.
Matt:  Yeah, finding the soul between the numbers is always this thing. I keep trying to struggle with it because it's fun, and it's hard, but the way I approach it these days is I think of the different things we could measure, the tally marks and peanuts that we're counting.
That's raw data that we're going to feed into something, but when we're communicating our value, that's actually a business motion. We have to talk about what metric, which tends to be an aggregate metric, something that is providing more value than just, say, the number of talks I've given.
I think the number one thing for somebody in a DevRel‑ or evangelist‑type role as well, it's the stories you can tell after. Who have you influenced, what did they do with that information, what is something that happened that wouldn't have happened if you hadn't participated?
Telling that angle, which is not measurement. The core measurements, there's a couple that we can talk through that are really fascinating, but I think at the basis of it, there isn't a standardization. There's a need to know what you're being measured against, and whether that is how many people show up to your talks or how many people are contributing code to your open‑source project.
I've seen both, and they're both valid DevRel, they're just wildly different needs.
Gordon:  I think historically, there has been this tension or conundrum in, perhaps particularly open‑source ‑‑ though it's certainly not exclusively oriented towards open‑source ‑‑ towards looking at things like how many times something has been downloaded, for example.
Which has been super easy to measure, and probably indicates something. If the number is zero, that's kind of bad no matter how you slice it, but it also doesn't really tell you, for example, how much it's being used, how engaged the users are.
Matt:  I really love this point, Gordon, because I don't think data has an opinion. We imply a lot of opinion from data.
Downloads, for example, you're like, "Is downloads good?" Probably yes, to some degree. Maybe it's bad if people have to keep re‑downloading the same thing in order to get it to run.
It might be a problem with your uptake for whatever project you're working on. I tend to think of downloads in some sort of time series. It's like downloads per day or downloads per week, downloads per individual, how many unique downloads can we have.
Now we're getting to data and parsing it in a way where we can tell a story behind it, because data on its own is just the raw bits. The stories can actually be really harmful if we don't use it in the right way.
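As a rough illustration of the slicing Matt describes here (downloads per week and unique downloaders rather than one raw total), the following is a minimal Python sketch. The log records, the user identifiers, and the function names are all hypothetical; nothing in the conversation specifies how download data is actually stored.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw download log: (day, user_id). Illustrative data only.
DOWNLOADS = [
    (date(2019, 10, 7), "alice"),
    (date(2019, 10, 7), "alice"),   # the same person re-downloading
    (date(2019, 10, 8), "bob"),
    (date(2019, 10, 14), "alice"),
    (date(2019, 10, 15), "carol"),
    (date(2019, 10, 15), "carol"),  # another repeat download
]


def weekly_counts(log):
    """Return {(iso_year, iso_week): (raw_downloads, unique_users)}."""
    raw = defaultdict(int)
    users = defaultdict(set)
    for day, user in log:
        iso = day.isocalendar()
        key = (iso[0], iso[1])      # ISO year and ISO week number
        raw[key] += 1
        users[key].add(user)
    return {key: (raw[key], len(users[key])) for key in sorted(raw)}


def downloads_per_user(log):
    """Average number of downloads per distinct user across the whole log."""
    unique = {user for _, user in log}
    return len(log) / len(unique)


if __name__ == "__main__":
    for week, (raw, unique) in weekly_counts(DOWNLOADS).items():
        print(week, "raw:", raw, "unique:", unique)
    print("downloads per user:", downloads_per_user(DOWNLOADS))
```

A gap between raw and unique counts is only a prompt for a question (are people re-downloading because installs fail?), not an answer in itself.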
A more common example, people talk about page views, because a lot of us create content. How many page views are you getting for what you're doing? Page views feels really vain, it's a vanity metric at its heart, but it can be a really good standard if other organizations you work with use page views in a way.
I can say when I write an article on, I can get 10,000 page views to the right audience of developers, but we may have to spend $5,000 to get that syndicated to an expensive platform that we're paying for.
My free path is meeting that same measurement. Now I'm talking about business value. I'm adding value by writing here as opposed to money funneling there.
It's when we can find that comparison point that we go from a raw and maybe uncomfortable, maybe seemingly useless metric, to something that's like, "That data is transforming into a metric that will add value."
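The comparison Matt makes can be written out as simple arithmetic. This is a minimal sketch using the figures he quotes (10,000 page views, $5,000 for paid syndication); it assumes the paid placement would reach the same number of views, and the function names are hypothetical.

```python
def cost_per_view(spend_usd, views):
    """Dollars spent per page view on a paid channel."""
    if views <= 0:
        raise ValueError("views must be positive")
    return spend_usd / views


def equivalent_paid_value(organic_views, paid_spend_usd, paid_views):
    """What the organic views would have cost at the paid channel's rate."""
    return organic_views * cost_per_view(paid_spend_usd, paid_views)


# Figures from the conversation; equal reach on both paths is an assumption.
paid_rate = cost_per_view(5_000, 10_000)
value = equivalent_paid_value(10_000, 5_000, 10_000)

if __name__ == "__main__":
    print(f"paid rate: ${paid_rate:.2f} per view")
    print(f"value of the free article at that rate: ${value:,.0f}")
```

The raw page-view count only becomes a business metric once it has a comparison point like this one.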
Gordon:  As you mentioned, time series, changes over time, can also be very valuable. Whatever you think of page views, for example, as a metric, if it is going up steadily month to month, quarter to quarter, year‑to‑year, that is probably a good thing.
Matt:  Yeah, exactly. It's fitting into that larger narrative of what's changing, why is it changing, can we find that if we push something, if we poke this that it goes up or down, so that we can test our causality with that change. That's fun too, but that's pretty advanced.
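The "going up steadily" check Gordon mentions can be sketched as a period-over-period change. The monthly page-view numbers below are invented for illustration, and the function names are hypothetical.

```python
def percent_changes(series):
    """Percent change between each consecutive pair of values."""
    changes = []
    for prev, cur in zip(series, series[1:]):
        if prev == 0:
            raise ValueError("cannot compute a percent change from zero")
        changes.append((cur - prev) * 100 / prev)
    return changes


def is_steadily_rising(series):
    """True when every period is higher than the one before it."""
    return all(change > 0 for change in percent_changes(series))


# Invented monthly page views, for illustration only.
MONTHLY_VIEWS = [8000, 9000, 9900]

if __name__ == "__main__":
    print(percent_changes(MONTHLY_VIEWS))
    print(is_steadily_rising(MONTHLY_VIEWS))
```

This shows a trend, not a cause; testing causality, as Matt notes, takes deliberately changing something and watching what moves.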
It really does come down to, can you measure the things that are worth measuring? Then can you find the metric that people are actually going to care about? A pitch, they don't pay me to do this, but Bitergia is the place to get your data in there for everything from Slack and Discourse to GitHub, GitLab.
It all aggregates it and starts to identify individuals and see their path through your community, so seeing the correlation of data is easier than ever with the tooling.
The storytelling is just as hard as it's always been. We have to keep doubling down there and that's why I keep talking to people like yourself about it.
Gordon:  It does seem in general, we have the data points now and we know how to correlate them and things like that. As you say, you can still risk over applying certain types of vanity metrics, as you put it, GitHub stars and that kind of thing.
But we do at least have that data, so we at least have a baseline where we can be thinking about how to measure effectiveness and how to really direct people so they are more effective and take actions in that way.
Matt:  There's a wonderful phrase of, "Just be careful of what you measure because if you incentivize it, you will achieve it." Which means that even though these metrics, if these data points are a proxy for some point of value. GitHub Stars, my understanding for many is that it's a proxy for popularity and popularity's a proxy for monetization. That I can make some money on this if there's a lot of people that star it.
But if you start focusing just on stars, it's not going to necessarily correlate with the monetization. You still have to have the understanding of what's the business flow to make money, because open‑source is not a business model, as we're well aware.
So understanding like the flow all the way through from the raw data you're using to the outcome you're presuming, we all have a responsibility to inspect that and take the time to understand the impact.
Gordon:  Well, going back to downloads, for example. If you really focus in on downloads and you certainly saw that a number of years back in a lot of cases. If you make that your metric of success then the answer is, well, we advertise and let people know and encourage them to download and give them gift cards if they download and everything, and then our job is done.
Matt:  You nailed it. Now you're done, so exactly. There's a corollary to that. The bad of that is that you can miss the soul of it. You can miss that people want to be happy about what they're downloading, they're not downloading it because they're pissed off. There's a nuance there, of their emotional state as they do so that you can't really grab there. The question is, how do we pivot our metrics?
You start representing happiness as opposed to just raw downloads. The other thing is maybe you're right, maybe if downloads are a great metric, maybe you go to fewer conferences and save the money and give out some gift cards. That's a valid strategic move based on data informing it. That's why data can be very informative but it can't choose for you.
The data doesn't have an opinion on your strategy. You have to have that. There's a lot of work to be done in our space to inform and share common models that are working, because all models are wrong, but some of them are quite useful. It helps us communicate value in the same way as sales can talk about their sales qualified leads leading to closes. We need that in community work that leads to happiness as well.
Gordon:  There is the idea of the funnel, if you would, with developer relations as well. If everybody knows about your product, project, software, well, maybe building awareness isn't something you need to focus on. Maybe they are aware of it but they're also aware of how horribly hard it is to use. Maybe in that case you should be focusing on documentation, training seminars, that kind of thing.
Matt:  It's funny, because anyone who has done this for a bit, we all know the right things to do. The question is, "Can you justify it with your model?" Absolutely, I think, applying a sort of marketing funnel strategy to this, Mary Thengvall who wrote the book about DevRel as a business model is super brilliant. She's talking about DevRel Qualified Leads or Community Qualified Leads, same idea of having a single unit of measure of handing off individual contacts throughout the business.
I think it's a really cool metaphor that we might want to use strategically to be able to say, "Hey, this is who we bring in." Sometimes we hand them over to recruiting to get hired and sometimes they go to sales, sometimes they go to products.
That's the unit of measure so maybe that's part of the answer. I'm not totally sure right now but I do know it's going to be unique to your business.
Gordon:  Sometimes there's a discomfort with these kind of metrics because I think, for many of us, our first reaction is, "Oh, look, we're talking at a conference. We're writing a lot." Just trust us.
We're doing good things out there. The fact of the matter is if we don't all focus too much on the page views instead, so finding that, I think you called it the soul in between those really is challenging.
Matt:  The soul between the data points is really hard to capture. If I could, I would completely drop all this data hunting stuff that I like to do even though it's kind of fun for me and just focus on the soul stuff like the things where somebody feels included and loved and cared for and that it's part of their identity to be part of the community.
But I've seen time and time again when you focus just on that, you end up losing funding. It's just a fact of business, and capitalism which we can't reject.
We have to accept where we are in our system and build the system on top that will justify it, the models on top that will justify it. I'm with you, I want to find the soul in it. I think that we still need to get a strategy to quantify it so that we justify it ourselves.
Gordon:  Thank you, anything else you'd like to add?
Matt:  I really enjoyed talking to you. If anyone is interested in how they can learn to write about their technology skills, maybe if you are an engineer and you've never done this before, I coach people as part of my job at Reach out, I'm happy to work through learning how to write your open‑source stories.
Gordon:  And even if you aren't a full‑time writer like I often seem to be, I would really encourage people to do this thing. It's a great platform for you to get better known. It's fun to do, at least I find it fun to do. It's really a way to share, to let others know about your experience as always to get involved in new things.
Matt:  Definitely. We try to make it a very exciting community to be a part of so we got everything from cool swag to really great people we'll connect you with. Yeah, reach out.

Thursday, August 29, 2019

Podcast: OpenDataHub brings together the open source tools needed by data scientists


OpenDataHub is a reference architecture that brings together tools for AI and data work. In this podcast, I talk with Red Hat's Steve Huels, Sherard Griffin, and Pete MacKinnon about the motivations behind OpenDataHub, the open source projects that make it up, how OpenDataHub works in concert with Kubernetes and Operators on OpenShift, and how it reflects a general trend towards open source communities working together in new ways.

Show notes: - An AI and Data Platform for the Hybrid Cloud

MP3 - 22:11


Steven Huels:   I am responsible for the AI Center of Excellence at Red Hat.
Sherard Griffin:   I am in the AI Center of Excellence as well, responsible for the OpenDataHub and the internal running of the data hub at Red Hat.
Pete MacKinnon:   Red Hat AI Center of Excellence, helping customers and ISVs deploy their machine‑learning workloads on top of OpenShift and Kubernetes.
Gordon:  We heard "OpenDataHub." Let's tell our listeners what it is.
Sherard:  OpenDataHub, that's an interesting project. It's actually a metaproject of other community projects that are out there. It basically is responsible for orchestrating different components that will help data engineers, data scientists, as well as business analysts and DevOps personnel to manage their infrastructure and actually deploy, train, create models, and push them out for machine learning and AI purposes.
What we tried to do is take things that are being done in the community through open‑source technologies, wrap them up in something that can be deployed into Red Hat technologies such as OpenShift, and leverage some of the other technologies like Ceph, JupyterHub, and Spark, to make it easier and more accessible for data scientists and data engineers to do their work.
Gordon:  I'm going to put an architecture diagram in the show notes. Could you, at high‑level, describe what some of the more interesting components are, what the basic architecture is for OpenDataHub?
Sherard:  One of the key things that you'll see when you deploy OpenDataHub inside of OpenShift is that it solves your storage needs. We collaborated with the storage business unit at Red Hat.
We allow users to deploy Ceph object storage that allows for users to be able to store their massive amounts of unstructured data, and start to do data engineering and machine learning on top of that. Then, you'll see other popular components like Spark where you're able to then query the data that's sitting inside of Ceph.
You can also use Jupyter notebooks for your machine‑learning tools and be able to interact with the data from those perspectives.
That's three high‑level components but there are many more that you can interact with that allow you to do things like monitoring your infrastructure, being able to get alerts from things that are going wrong, and then also doing things like pushing out models to production, testing your models, validating them, and then doing some business intelligence on top of your data.
Gordon:  We've been talking at an abstract level here. What are some of the things that our listeners might be interested in that we've used OpenDataHub for?
Pete:  There's a variety of use cases. It is a reference architecture. We take it out into the field and interact with our customers and ISVs and explain the various features.
Typically, the use cases are machine learning where you have a data scientist who is working from a Python notebook and developing and training a model.
ETL is an important part of machine learning pipelines. This part can be used to basically pull that data out of data lakes that perhaps are stored in HDFS and Hadoop and put into the model development process.
Sherard:  More broadly, the OpenDataHub internally is used as a platform where we allow users to import data of interest themselves and just experiment and explore data.
Whether they want to build models and publish those models, it's really an open platform for them to play with an environment without having to worry about standing up their own infrastructure and monitoring components.
More specifically, we use it for use cases that ultimately make their way into some of our product focus. We're experimenting with things like how do we look at telemetry data for running systems and predict future capacity needs. How do we detect the anomalies and then drill down to the root cause analysis and try to remedy those things automatically?
These are all some of the use cases that we're using the OpenDataHub for. We feel a lot of these, obviously, have resonated with our customers, and they mirror the use cases they're trying to solve. Those are just a couple of the ones we're doing internally.
Gordon:  We've been talking about the internal Red Hat aspect of OpenDataHub. Of course, this isn't just an internal Red Hat thing.
Sherard:  Correct. As Pete mentioned, OpenDataHub is a reference architecture. It is open source, and we use it as a framework for how we talk with customers around implementing AI on OpenShift.
There's a lot about OpenDataHub in the way it's been broken apart and architected that it hits on a lot of the key points of the types of workloads and activities all customers are doing.
There's data ingestion. There's data exploration. There's analysis. There's publishing of models. There's operation of models. There's operating the cluster. There's security concerns. There's privacy concerns. All of those are commoditized within OpenDataHub.
Because it's an open source reference architecture, it gives us the freedom then to engage with customers and talk about the tool sets that they are using to manage their use cases.
Instead of just having a box of technology that's maybe loosely coupled and loosely integrated, we can gear the conversation toward, "What's your use case, what tools are you using today?" Some may be open source, some may be Red Hat, some may be ISV partner provided. We can work that into a solution for the customer.
They may not even touch on all of those levels that I discussed there. What we've tried to do is give an all‑encompassing vision, so we can build out the full picture of what's possible and then solve customer problems where they have specific needs.
Gordon:  Again, as listeners can see, when they look at the Show Notes, there is an architecture there. For example, there's a number of different storage options, there's a number of different streaming and event type models, there's a number of other types of tools that they can use. Of course, they can add things that aren't even in there.
Steven:  If they add them, we'd love for them to contribute them back, because that's the entire open source model. We use the OpenDataHub community as that collection point for whether you're an ISV, whether you're an open source community. If you want to be part of this tightly integrated community, that's where we want to do that integration work.
Gordon:  That's what the real value in open source and OpenDataHub is. It does make it possible to combine these technologies from different places together, have outside contributors, get outside input in a way that I don't think was ever really possible with proprietary products.
Pete:  That's where it really resonates with Red Hat customers. They finally see the power of open source in terms of actually solving real use cases that are important to their businesses. All the components are open source, the entire project is open source, OpenShift itself is open source. It's an ethos that infuses everything about OpenDataHub.
Sherard:  I would add to that. One of the interesting points that both Steven and Pete brought up, is how customers have gravitated towards that. A lot of that is because we're also showing them that, hey, you've invested in Red Hat already or you've invested in RHEL, you've invested in containers.
In order for you to get that same experience that you may see from AWS SageMaker or some of their cognitive tools, or Microsoft's cognitive tools, you don't have to go in and reinvest somewhere else.
We can show you through this reference architecture how you can take some of Red Hat's more popular technologies and use those same things like OpenShift, like RHEL, like containers and be able to have that same experience that you may see in AWS, but in your own infrastructure.
Whether that's something that's on‑prem or whether that's something in the hybrid cloud or even an OpenShift deployed in the cloud, you're able to move those workloads freely between clouds and not feel like you have to reinvest in something brand new.
Gordon:  Well, one of the interesting things about OpenDataHub, and I think it's also an interesting thing about OpenShift, for example, is that over the last few years, maybe, we've really started to see this power of different communities and different projects and different technological pieces coming together.
Certainly, open source has a long history. But with cloud‑native in the case of OpenShift, with the AI tools coming together in something like OpenDataHub, we're seeing more and more this strength of open bringing all these different pieces together.
Sherard:  Yeah, absolutely. OpenDataHub first started out as an internal proof point. How can we, with open source technologies, solve some AI needs at Red Hat? What we quickly understood is, there's more of a life cycle that machine learning models have to go through. First starting with collecting data all the way through to a business analyst and showing the value of a model.
That allowed us to map out all the different parts of the life cycle and then start to figure out, "How can we introduce Open Source Technologies at each stage of that to help the process along?" As we've discovered what those processes are, we've internally deployed those into our own systems.
We're working towards getting a more robust system that solves our own problems. As we do that, we share that with the broader community saying, "Hey, here are the open source tools we did for each part of the life cycle of a machine learning model and here's how you can do the same thing."
Gordon:  And it even goes beyond that. We're here at Boston University at a conference. For example, there's a lot of work being done on the research side with Red Hat and BU, for instance, on privacy‑preserving AI techniques. That's too much to get into in this podcast, but that's part of the whole mix too.
Steven:  Privacy, obviously, with a lot of the trends that people have seen with some of the major companies out there, is a very hot topic. There's a lot that Red Hat has already, historically been doing in the space. Whether it was looking into things like multi‑party computing to preserve the anonymity of certain data sets, that's something we've been doing for quite some time.
There's other things we're looking into too, things like differential privacy. How do we allow access to data for analysis from multiple parties while still preserving that anonymous component? Then even beyond that, we're starting to look into things like data governance.
What exists in the open source world for data governance? How do I adhere to and maintain my GDPR compliance? These standards are only going to continue to emerge as more and more data gets collected on people. They're very hot topics. They are things that Red Hat is actively involved in and has a voice in going forward.
Gordon:  Outside of Red Hat, what are we seeing in terms of interest and adoption of not just the individual technologies but OpenDataHub more broadly?
Steven:  If I even pull that a step back, the reason why a lot of this technology now is taking off the way it is, is because industry has readily adopted a lot of the open source standards. They started to expect the open source frameworks to support their use cases.
It's not enough that you simply have a single component that can deliver one piece of value. You want an integrated suite that solves a whole myriad of your problems. In doing so, there's a natural correlation and integration that has to occur.
That's being done in pockets in different areas. They're solving maybe niche use cases. When you look at something like OpenDataHub, it's actually crossing the spectrum and the boundaries of what it takes to operationalize the solutions to these problems.
Historically, a lot of these problems were solved by individuals who could do something on a very high‑powered work station on their desk, but they never made their way into production. The value companies were to get from them was very, very limited. They made great PowerPoint slides, but they never really delivered any value.
Companies now expect that value to be delivered. The OpenDataHub and that type of framework is what allows something to be put into operation and then maintained, like you would any other sort of critical application.
Gordon:  I think the other thing that happened is if we looked at this space a few years ago, it almost looked like people were looking for that magic bullet, like Hadoop for example, "Oh, this is going to solve everybody's data problems."
What we're seeing is you need a toolkit. You don't necessarily want the toolkit that you have to go all over the vast Internet and assemble from scratch and figure out what projects do the job and which ones are works in progress. OpenDataHub kind of reaches that, it would seem.
Pete:  In recent talks I've started off the talk talking about the AI Winter which has been this notion of this cycle of enthusiasm and investment by companies and other institutions in artificial intelligence and machine learning, only to ultimately see it fall by the wayside, fall apart.
Everything has changed now. I think open source is a big component of that change because, as Steven was saying, these various individual component projects like JupyterHub, they're their own communities, but those communities are starting to interact with each other in various ways.
What's been missing is an integrated suite like Steve was talking about. That's what we're trying to do with the OpenDataHub, something that provides a comprehensive AI and machine learning platform to defeat the cycle of AI and machine learning winters that come and go.
Sherard:  I would also say, what we've seen when we talk to customers is that they're all at so many different stages of their AI journey. Some of them are at the very beginning. They just want to know what AI is and what that means to their organization. Some of them are at the tail end where they've developed models, and they don't know how to productionalize it.
One of the things we're able to do is take the OpenDataHub as a grounding moment for us to all have the same basic conversation. Then we can start to talk to them and say, "Hey, if you're just starting out, here's something you can start out with, or if you're ready to productionalize it, the OpenDataHub can help you conceptualize and have a reference architecture for how to productionalize it."
It allows us to just have conversations with our customers to let them, first, understand that we know the problem because we're doing it internally, ourselves. But then also, as we work with other customers and get more information about what they may want to do with governance, with security, with auditing, all these other things that happen before you go push it to production.
How can we go about it in a broader community sense where we're tackling all of this together, and we're really pushing the envelope, where everybody's starting to contribute and we're getting feedback from the customers, but we're also providing guidance?
Gordon:  How does a new person or organization typically get involved, whether it's OpenDataHub specifically, but these types of programs, what's your general recommendation?
Pete:  Having served in various open source communities, the best practices for getting involved become readily apparent. There's typically a lot of enthusiasm; somebody identifies a project that they think is going to be great to work on.
It's very important to approach the community and understand how the community conducts itself. Typically, open source communities these days have exactly that, a code of conduct. There's best practices around things like GitHub in terms of forming a pull request if you have a new feature or bug request to help the community.
Also, there's modern chat applications like Slack, and Google Chat, and things like that where these communities form around. It's always good to come into those arenas hat in hand, as it were, and be humble about what it is you're trying to do. Ask questions, listen to the conversations, and build up value and present that value back to the community.
Gordon:  You're saying that if I want to get involved, I shouldn't just join the mailing list and tell everyone they don't know what they're doing?
Pete:  I wouldn't recommend that, no.
Gordon:  How would you describe, very high level, the state of OpenDataHub today and what we should expect to see for next steps, what should our expectations be?
Sherard:  The state of OpenDataHub, it's ever‑evolving. We're meeting with customers, understanding what their use cases are, trying to see how we solve those use cases with a reference architecture like the OpenDataHub, and seeing where the gaps are.
If you look at when we first started this earlier this year, and we first put out our first Operator that deployed the OpenDataHub...
Gordon:  Operator?
Pete:  Talking about OpenShift and Kubernetes, Operator is a powerful new paradigm where you basically encapsulate application lifecycle for particular components. That component interacts with the Kubernetes and OpenShift API to do full lifecycle management of that component. That's the 50,000‑foot view.
Gordon:  Yeah. To add a little bit to that, having seen a workshop yesterday on this topic, the idea is that you give it to the operator. It can install your Kafka eventing system, your Spark cluster, your Jupyter Notebooks, and so on, for something that has a lot of components like OpenDataHub.
The idea is, it's as if you had an expert on the system come in and spend a couple hours installing things for you.
Sherard:  Yeah, that's exactly right. That's why we gravitated towards operators pretty early. The OpenDataHub is an operator, a meta‑operator. It even deploys other operators like Kafka Strimzi. We're also working with the Seldon team. We're going to be looking at integrating some of our other partners into that ecosystem.
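The operator idea described above can be sketched as a reconcile loop: compare the components you asked for with what is actually running, then install, upgrade, or remove to close the gap. This is a simplified illustration under stated assumptions, not the OpenDataHub operator's code. The `Cluster` class is a hypothetical in-memory stand-in for the Kubernetes and OpenShift API, and the component names and versions are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the Kubernetes/OpenShift API."""
    running: dict = field(default_factory=dict)  # component name -> version


def reconcile(desired: dict, cluster: Cluster) -> list:
    """Move the cluster toward the desired state and report what was done."""
    actions = []
    for name, version in desired.items():
        current = cluster.running.get(name)
        if current is None:
            # Component is wanted but missing: install it.
            cluster.running[name] = version
            actions.append(f"install {name} {version}")
        elif current != version:
            # Component exists at the wrong version: upgrade it.
            cluster.running[name] = version
            actions.append(f"upgrade {name} {current} -> {version}")
    for name in list(cluster.running):
        if name not in desired:
            # Component is running but no longer wanted: remove it.
            del cluster.running[name]
            actions.append(f"remove {name}")
    return actions


if __name__ == "__main__":
    cluster = Cluster()
    desired = {"kafka": "2.3", "spark": "2.4", "jupyterhub": "1.0"}
    print(reconcile(desired, cluster))  # first pass installs everything
    print(reconcile(desired, cluster))  # second pass has nothing to do
```

A real operator runs this kind of loop continuously against the cluster API, which is what lets it cover the full lifecycle of a component. A meta-operator applies the same pattern one level up, where the "components" it manages are themselves operators.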
What I was getting at is, where we were earlier this year was, the OpenDataHub was really focused on the data scientist use case trying to replace the experience of all of your data scientists across an organization doing work on their laptops.
Certain people may have different components installed. Everyone's doing pip installs with different versions. You have all kinds of dependencies that are very specific to that data scientist's laptop or workstation.
What we tried to do is solve that by introducing this into Kubernetes so that we have a multi‑user environment in OpenShift, so that everybody has the same playing field, every user's using the same suite of tools. They're using the same suite of dependencies and same versions of packages so it makes it easier to collaborate.
Once we did that, the next step was to start to introduce more of a management of your machine learning models. Now, we've introduced Seldon where you can actually deploy your model as a REST API. Then, we also introduced Kafka for data ingestion down into your object storage. We also had the ability to query the data using Spark.
Coming down the pipeline in the next month here, we're going to be introducing tools for the data engineer. What we're doing is looking at how you catalog your data that's stored in the object storage. This is Hive Metastore, but we're also introducing technologies on top of that such as Hue, which will allow you to manipulate the data before the data scientists even get there.
The reason that we decided on that is because we all know that before you do machine learning, data just doesn't come in cleanly. It's not perfect right out the gate. We knew that there was a step missing in enabling data engineers to massage and clean that data before the data scientists got ahold of it.
Then, down the pipeline after that, we're looking at BI tools but then also, there's going to be more governance. We're looking at tools that might help out such as Apache Ranger, Apache Atlas. We have a number of people that are contributing in that space.
We're looking at how can we introduce more cohesive end‑to‑end management of the platform. You'll see more of that as we move along in the next few months here.
Gordon:  Where could someone go to learn more?
Steven: is the community site. You'll find a number of listserves if you want to stay in the loop. If you want to get involved, you can sign up and we can pull you into the various workstreams.