Tuesday, November 15, 2016

Data and DevOps with Splunk's Andi Mann


Andi Mann is Chief Technology Advocate at Splunk. In this podcast, he discusses some of the ways in which data plays an important role in DevOps. I’ve known Andi for ages since we were both IT industry analysts and we had a chance to sit down at CloudExpo/DevOps Summit/IoT Summit in Santa Clara where Andi was chairing a DevOps track and I was one of the speakers. (We also did a data and DevOps panel together on the main stage but that video doesn’t seem to be up yet. I’ll post once it is.)

Among the topics we tackle are choosing appropriate metrics that align with the business rather than just technical measures, creating feedback loops, using data to promote accountability, and DevSecOps.

Show notes:

Listen to MP3 (22:13)

Listen to OGG (22:13)


Gordon Haff:  I'm sitting down here with an old analyst mate of mine, Andi Mann. Also formerly of CA, also the author of some books, and now he is the chief technology advocate with Splunk. What we're going to talk about today is data in DevOps. Welcome, Andi.

Andi Mann:  A lot of the customers I talk to, who are doing DevOps in various versions using Splunk...It boils down to three key areas that they really want to know about, the metrics that matter for them.

The first is really about how fast are they? What's their cycle time? How long does it take for an idea to get in front of a customer? How long does it take someone in business to come up with something and then basically make money from it, or, in government, service their citizens with it? That cycle time is really important, the velocity of delivery.

The second key area that people look at is around the quality of what they're delivering. Are they doing good? Are they delivering good applications? Are they creating downtime? Are they having availability issues? Is one release better than another?

The third area is really around what sort of impact do they have? Measuring real business goals, MBOs, things like revenue and customer sign‑up rates, and cart fulfillment, and cart abandonment. These sorts of things. Those are the metrics that my customers, the people I talk to are interested in for DevOps, closing those feedback loops in those three areas.
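The cycle-time metric described above, the time from an idea (or, more measurably, a commit) to being in front of a customer, can be sketched in a few lines. This is an illustrative sketch only; the commit-to-deploy framing and the timestamps are invented, not anything from the podcast or from Splunk:

```python
from datetime import datetime

def cycle_times(events):
    """Given (commit_time, deploy_time) pairs as ISO 8601 strings,
    return the lead time of each release in hours."""
    hours = []
    for committed, deployed in events:
        delta = datetime.fromisoformat(deployed) - datetime.fromisoformat(committed)
        hours.append(delta.total_seconds() / 3600)
    return hours

# Two hypothetical releases: one took a day, one took six hours.
releases = [
    ("2016-11-01T09:00:00", "2016-11-02T09:00:00"),
    ("2016-11-03T12:00:00", "2016-11-03T18:00:00"),
]
print(cycle_times(releases))  # [24.0, 6.0]
```

Tracking the trend of this number per release, rather than any single value, is what makes it a velocity metric.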

Gordon:  One of the things I find interesting, what you just said, Andi, is that you read these DevOps surveys, DevOps reports, and often the metrics, or at least what they're calling metrics, are framed in much more technical terms. How many releases do we have per year, or per week, per hour?

What's the failure rate? How quickly can we do builds? How quickly can we integrate? Which, I think to your point, are probably worth measuring, but they're really...The ultimate goal of DevOps is not to release software faster.

Andi:  Exactly. It's interesting because you do look at these metrics in isolation, and they matter. All of this matters. 10 deploys a day, we all know that from Velocity in 2009. That matters, but 10 deploys a day is no good if they're all bad deploys. You need to measure quality in that.

But even if it's a good quality deploy and you do it quickly, if it's not moving the needle on what your business wants you to be doing, then again, it doesn't matter. I think it's actually really important to connect these together so you really are getting metrics, correlating metrics, that matter across the whole range to really understand whether you're doing good or not.

Gordon:  One of my favorite Dilbert cartoons, I don't remember the exact wording, but it's to the effect of...Pointy Hair says, "We're now going to measure you on the number of lines of code you write," and Wally says, "I'm going off to write myself a new car today."

Andi:  [laughs] Yeah, exactly. That's one of the things that I actually do measure. We measure it internally. A bunch of our customers do actually measure code volume. There's a couple of interesting reasons for that. Especially in a DevOps and Agile mode, actually delivering too much code can be a signifier that you're doing things badly.

You're writing too much code, you're doing too much in one release rather than doing small, iterative releases. It can also signify that one person has too much of a workload. When you think about DevOps and the concepts around empathy and wanting to make sure that life doesn't suck for everyone, when one person is doing all the work, that sucks for them.

There are actually good things that come out of measuring code volume [laughs] but saying that more code equals better code, equals a bonus? That's a really bad thing. [laughs]

Gordon:   I think a lot of people tend to lump data metrics into this one big bucket. As we've had discussions before, there are these business metrics which have to be somehow connected to things.

It's not clear that overall company revenue is necessarily a good DevOps metric. Some of the other things you mentioned certainly are. In many cases, it does make sense to collect a lot of underlying data for data analytics and things like that. Then, you also have alerts.

Andi:  Yeah, the business stuff is really interesting. I know one of our customers delivers their software as a service. They're a SaaS company, cloud native and all that. Their developers actually do care about who uses specific features.

They'll implement a feature. They do canary releases. They'll implement a feature on 10 out of 1,000 servers, or whatever. A certain volume or percentage of their customers will get access to it. Then they'll measure, using Splunk, the way that those features are being used or not. They also measure the satisfaction of those customers.

They've got these nice smileys, and tick marks, and stuff that say, "Yes, I enjoyed using this feature." They can correlate that together, and it actually means that the next day after doing a commit, after doing a release, they actually know whether the business use case is being satisfied, which is very cool.
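The canary-plus-satisfaction correlation being described could be sketched roughly as follows. The cohort labels, event shape, and numbers are made up for illustration; the real pipeline would be driven by queries over usage and feedback events rather than an in-memory list:

```python
def canary_report(events):
    """events: dicts with a 'cohort' ('canary' or 'control') and a
    'satisfied' flag (e.g. from a smiley/tick-mark widget).
    Returns the satisfaction rate per cohort, so the team can compare
    the canary population against everyone else the day after a release."""
    totals, happy = {}, {}
    for e in events:
        c = e["cohort"]
        totals[c] = totals.get(c, 0) + 1
        happy[c] = happy.get(c, 0) + (1 if e["satisfied"] else 0)
    return {c: happy[c] / totals[c] for c in totals}

# Hypothetical feedback events from a 10-out-of-1,000-server canary.
clicks = [
    {"cohort": "canary", "satisfied": True},
    {"cohort": "canary", "satisfied": False},
    {"cohort": "control", "satisfied": True},
]
print(canary_report(clicks))  # {'canary': 0.5, 'control': 1.0}
```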

I know a television company in the UK that we work with. They actually send reports on a weekly basis, I think it is, to their marketing department, based on whether users are using the website, what they're doing on the website, whether they're clicking through on competitions.

That's actually really important, but obviously mostly what people are doing in using data and the feedback...Closing the feedback loops is what I'm talking about here at DevOps Summit.

They're closing the feedback loop around those technical measurements.

Am I creating more bugs? Am I creating availability issues? Am I creating problems with uptime? Am I closing out the feature set that is in the story or in the epic that I was promising to do? Partially, it's also around this accountability to each other. Am I doing what I'd promised I'd do?

Gordon:  Talk a little more about accountability.

Andi:  Yeah, that's one of my soapboxes at the moment. I see a lot of the empowerment that DevOps gives developers to make decisions. I think that's great, especially in companies where you've got systems thinking and they understand their role in the organization and what it means to deliver good outputs for their customers.

You give them a lot of responsibility. Their manager is the leader. You give your developers, and your operations team, and those DevOps professionals a lot of responsibility and lot of empowerment to do the right thing.

Also, I think that there's a need for them to be accountable for doing the right thing as well. Especially as DevOps grows in larger organizations and there are more and more people involved. Also, with the concept that DevOps is about helping and making sure that each other is having a good experience at their life and their work.

As a developer, you're making sure that operators aren't getting called out late at night, and all this sort of stuff. If DevOps is about helping to work with each other, to collaborate, to communicate better, to make sure each other's lives get better as Dev and Ops professionals, then I think you need to be accountable in two ways.

You need to be accountable to your business, which often means being accountable to your manager for doing the work that you're meant to do, and doing the work you promised you would, within the bounds of the responsibility you've been given.

It's also being accountable to each other, for doing good work, and doing the right work in ways that help your whole team move forward and make everyone else's life positive. I think we talk a lot about empowerment and enablement. We don't really talk much about the flip side of that, which I think is that accountability.

Gordon:  There's a lot of culture talk around DevOps, and we have had lots of discussions around culture and some of the ways that it can be overextended and over‑applied. Yeah, it can turn into this "don't fear failure," empathy, transparency, etc. Unicorns farting rainbows. This very touchy-feely, everyone's happy and sings "Kumbaya," but you are, at the end of the day, being paid to produce business outcomes.

There does need to be some accountability there. If you crash the SQL server three weekends in a row, and call in Ops, somebody's going to have to talk with you, as they should.

Andi:  Exactly. Especially when you talk about the DevOps toolchain and the life cycle of software. It's a very complex and opaque thing to try to see what's going on at every stage, especially if you're a manager who's not necessarily fully fluent in specific tools. They can't dig into the specific tools to have a look at that.

I think it's about reporting up to your management and reporting to each other and saying, "I introduced these bugs, and I'm sorry for it. I won't do it again." By the same token, "I introduced these new features, and they were really successful. We should all celebrate that as a team."

I think that accountability is actually really important. You'll see this in manufacturing as well where we get a lot of our examples from. You'll see that if one person makes the same mistake several times, then they'll get into a training program, or they'll get different mentoring.

Maybe they'll move into a different part of the line where they're better suited, and their skills are better suited. You don't know how to make your team better if you're not being accountable to each other, and to your management.

That's, I think, something we've got to step up to as DevOps professionals, for want of a better term: how do we be accountable to each other, and to the company that pays us, as you said, to do the job?

Gordon:  You just talked about manufacturing. You just mentioned quality, and I think that's a pretty good segue, because we often think about DevOps primarily, well, through the lens of the developer, for one thing, but that's another topic for another day.

We also tend to view DevOps, first and foremost, through the lens of this velocity, business agility, and so forth, but there is a very important quality component there as well. What are some of the ways that data can help to surface that quality component?

Andi:  Absolutely. Some of the things we're looking at ‑‑ and our customers are doing a lot of this at the moment ‑‑ are areas like code coverage and tests, the number of defects, and defect rates per release. Aggregating and correlating the quality metrics out of multiple test and scanning tools.

Doing static analysis and looking at the defect rates, doing dynamic analysis, and then also looking at the defect rates, as well as application performance and health scores. Looking at the performance in terms of resource utilization, response time, availability, execution failures, and so forth.

Comparing current release in production with next release just about to come forward, and being able to run that over time, so you can see whether you're making quality improvements over time.

If you're able to actually give your application a health score, and then you can measure that not just in production, but also in staging, or pre‑prod, whatever you want to call it, then you can start to make sure that you're getting better with every release. Your quality is going up with every release.

You can do that with actual data, real measurements coming out of these testing tools, as well as out of actually running the application in a stood‑up environment. There are lots of feedback loops you can close there.

Once you start to find problems, especially in production but also in pre‑prod and staging, you feed those things back into the test cycle so that you never find the same mistake twice, because the first time you find it, the next time you test for it.
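The health score described above, one number per environment that can be compared release over release, could be sketched as a simple weighted aggregate. The metric names, weights, and values below are invented for illustration; a real score would be fed by the testing and monitoring tools Andi mentions:

```python
def health_score(metrics, weights):
    """Weighted application health score on a 0-100 scale.
    metrics: name -> value in [0, 1], where 1.0 is fully healthy
    (e.g. availability, test pass rate, normalized response-time score).
    The same score computed in staging and production can be compared
    to check that quality is going up with every release."""
    total_weight = sum(weights.values())
    return 100 * sum(weights[k] * metrics[k] for k in weights) / total_weight

# Hypothetical numbers for a release candidate in staging.
staging = {"availability": 0.999, "test_pass_rate": 0.97, "perf": 0.9}
weights = {"availability": 3, "test_pass_rate": 2, "perf": 1}
print(round(health_score(staging, weights), 1))  # 97.3
```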

Gordon:  This idea of doing things incrementally in stages, before they hit production, is really important from a security perspective as well. I was just having a conversation with one of my colleagues, or actually several of my colleagues, about this kind of tension between the traditional security guy who is sort of, "Stop. Stop. Don't push it out there," and this idea of whether you like the term or not, DevSecOps, where security gets baked in, and added incrementally.

What was really coming out as we were having this discussion was that the reason for this tension, or maybe disconnect, is that from the security guys' point of view, serious security flaws get pushed out into production, and that is something that simply needs to be stopped.

To the degree that you can tolerate failures and errors in security because they don't hit the actual production environment, because you found them through automated testing or whatever, then it makes more sense as this incremental, and sometimes breaking-things, sort of process.

Andi:  Yeah. Absolutely. This is actually something I've done a little bit of work on, and most of the work is being done by someone that you probably know well, Ed Haletky of TVP, @Texiwill on Twitter.

He's done a bunch of work, and I've put in my two cents' worth, and it's probably worth maybe one. Looking at security, and security testing, pen testing, code quality testing, finding things like potential SQL injection, these sorts of things.

Also using some of those tools, like Fortify, which will do quality-of-code scanning for security purposes. You can start to shift left in that respect, but also continue to get inputs from security testing even post release. There's no reason why security testing can't keep going even after you've released.

You can get to a certain coverage rate. This is where data helps. You get to a 90 percent, or a 92 percent, or a 95 percent coverage rate, or confidence level if you will. You go, "OK, I'm ready to release. I know that the remaining five percent is potentially low impact, or low risk. I'll put it out there anyway, but continuing to test."
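A release gate like the coverage and confidence threshold described above might look, very roughly, like the sketch below. The function name, finding shape, severities, and the 90 percent threshold are all illustrative assumptions, not anything Splunk or Ed Haletky ships:

```python
def ready_to_release(coverage, open_findings, threshold=0.90):
    """Gate a release on security-scan coverage.
    coverage: fraction of the codebase covered by security testing.
    open_findings: list of dicts with a 'severity' field.
    Low-risk leftovers are allowed through (and keep being tested
    post-release); high/critical findings always block."""
    blocking = [f for f in open_findings if f["severity"] in ("high", "critical")]
    return coverage >= threshold and not blocking

# A hypothetical scan: 95 percent coverage, one low-risk finding open.
findings = [{"id": "SQLI-12", "severity": "low"}]
print(ready_to_release(0.95, findings))  # True
```

The point of the data is the threshold: you release knowing exactly what the remaining few percent of risk looks like, and testing continues after the release rather than stopping at it.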

There's some really interesting work out there that Ed's published about cloud, cloud promotion, and cloud delivery that actually really focuses on using these metrics from security testing, both pre and post release, which I think is actually really important.

Gordon:  We're going to be hearing a lot more about this whole security angle everywhere. This is partly an IoT show. We've heard a lot about security. I'm not sure we've heard a lot of solutions, but we've heard a lot about security.

Obviously, it is a big part of the DevOps discussion. It's a big, scary world out there, and it's pretty universally recognized that having an auditor sign off once a year, and then you don't think about security for that application for another six months or whatever, really doesn't work today.

Andi:  Yeah. It's not my joke. I saw someone post it the other day: "What did you get owned by? Your toaster or your fridge?" It's so true, especially in IoT. But in a DevOps perspective, or DevOps context, being able to do that continuous security testing, I think, is really important, and it brings a shift left to security.

We talk about a shift left in all sorts of other areas, and we're doing it with QA, which I think is awesome. We need to start doing it more with security, I believe. At Splunk, we do have a whole security practice around incident and event monitoring and user behavior analytics. Being able to start to apply some of that in the test, pre‑prod, and staging environments I think is really important.

Being able to do some automated audit reporting around what is happening, penetrations, security violations, passwords, PII exposure, potential hard-coded passwords, stuff like that. There's a bunch of stuff that developers could be, and should be, responsible for that actually makes security pros' lives easier, not harder. I think there's a lot of work yet to be done on that.

Gordon:  Absolutely. I'd go back to DevSecOps. I think there's this school of thought that, well, if you read The Phoenix Project properly, you wouldn't have to be having this discussion; security was baked in. Meanwhile, in the real world, security has tended to be this separate profession.

We were both at DevOpsDays London. I still remember a security professional, I guess in his 40s, standing up in an open space and going, "I'm one of those security guys who's been getting in your way. You know, this is the first time I've ever been to an IT conference that wasn't purely a security conference."

I love that story. Certainly not to pick on that guy. It was quite brave of him, getting up like that. I think that's such a perfect illustration of how security has operated in its own world as this gatekeeper to releasing applications.

Andi:  Yeah. People joke about IT being the department of no. Security has that moniker, fair or not. Obviously, security teams are just looking out to protect the business. That's their job. Having them in the tent, I think, is a better option, and we've started to bring other teams into the tent of DevOps.

I actually gave a presentation. You can find it online at the Splunk user conference, that was titled something along the lines of "Biz PMO Dev Sec QA Biz Ops," or something crazy like that, about broadening the tent of DevOps.

Security's got to come into this tent. Bringing a security pro into your team, into your scrum, that's got to be a good start, doesn't it?

Gordon:  Right, even if they're not in the meeting, in the stand‑up, every week or every day, at least having them be part of the team, just like there used to be a business analyst who was part of the team. Our products and technologies operations team has their own DevOps story.

I call it the "Banana Pickle Story," because they would get asked for a banana, and, as Katrinka describes it, six months later they'd deliver this pickle. Really, their DevOps story is, again, at the business level, because that's what matters to me.

They used a lot of technology, like the OpenShift Platform-as-a-Service and Ansible for automation, things like that. But again, they were really focused on the business story of how do we get the stakeholders iterating with us. "Oops, that banana's looking a little green. Let's dial that back to yellow and get on with the other things."

Andi:  Yeah, and this is the agile model for development: getting someone from the business...You're creating an MVP and getting someone from the business to evaluate it, and you continue to iterate with their advice.

You know that you're creating the right thing as you're creating it, rather than finding out in six months' time that you've created a pickle instead of a banana. [laughs] I love that analogy.

We should be doing that more and more with security. If security is saying no to you all the time, then maybe you're not inviting them to the party as much as you should, so that they can say yes iteratively, rather than one big no at the end.

Gordon:  Right. Just to cap off this podcast: in order to prove things to security, much less to external auditors and these other stakeholders, you need data.

Andi:  Absolutely. Exactly right. This is fundamental to what I believe. We cannot continue making decisions based on "I feel that this is the right thing to do. I think we're going to have good results here." We're living in a society that's driven by data and facts. Especially as developers or IT professionals, we need to have these feedback loops based on real data.

Not just people coming back and saying "I don't feel like you did the right thing. I don't think that this was good. I think our release worked and helped our customers." We need to come back and stop having these back‑and‑forths over opinions.

There are some very crude statements about "everyone's got opinions," right? I like to say, "In God we trust. All others bring data." That's how we get these real feedback loops in a systems mode: getting feedback from production systems, from customer interaction, from the security violations and the passes that we do make.

From the coverage, to know if we are doing the right thing in terms of speed, in terms of quality, in terms of impacting our business, that's where data has a huge role to play. It’s those feedback loops that DevOps depends on.
