Etsy Ops has made production-facing application changes; they're few but real (and sometimes quite deep). Etsy Dev makes Chef changes; they're few but real too. If there's so much overlap in responsibilities, why the difference, you might ask? Domain expertise and background. Not many Devs have deep knowledge of how TCP slow start works, but Ops does. Not many Ops have a comprehensive knowledge of sorting or relevancy algorithms, but Dev does. Ops has years of experience in forecasting resource usage quickly with acceptable accuracy; Dev doesn't. Dev might not be aware of the pros and cons of the options for distributing workload across all of layers 1-7, maybe only at layer 7; Ops is. Entity-relationship modeling may come naturally to a developer; it may not to Ops. In the end, they both discover solutions to various forms of Byzantine failure scenarios and resilience patterns, at all tiers and layers.
As a result, Etsy doesn't have to endure a drama-filled situation (like you allude to) with arguments between the two groups over stability, availability, risk, shipping new features, and making changes. Why is this? Because these (sometimes differing) perspectives are heralded as important and inform each other, as the two groups take equal responsibility for allowing Etsy to work as effectively and efficiently as it needs to in our market.
These differences in domain expertise turn out to be important in practice, and we have both because it's beneficial for Etsy. If it weren't, we wouldn't have both. They constantly influence and educate each other, informing the decisions we make with different and complementary perspectives. As we continue (as Netflix does, it sounds like) to evolve our processes and tooling, it's my job (as well as that of the CTO and VP of Engineering) to keep this flow strong and balanced.
It's oftentimes useful to coin new terms. For example, DevOps seems to speak to a legitimate breaking down of the walls between development and operations people. Even if the walls weren't always as high and impervious as the contrast suggests. It's not like developers never needed to concern themselves with issues of scale and redundancy. Nor like operations people were wholly divorced from how the code they ran came into being.
I'm not yet convinced that NoOps brings a meaningful distinction to the fore, however. (Though I'm open to being convinced.) One of my colleagues noted to me recently that, at a recent event, "the notion of the needs for different management/operational models was well received but there was a bit of pushback on the 'NoOps' term. Questions like 'What is it, magic, then?'"
I think this is where the problem lies. As I read through all of Allspaw's and Cockcroft's thoughtful posts, what I take away is that operations is changing and, yes, operational concerns are increasingly embedded in code and made a joint responsibility of a variety of groups.
In other words, NoOps means something akin to "Not Traditional Ops."
A hosted approach, such as Red Hat's OpenShift Platform-as-a-Service offering or Amazon Web Services for Infrastructure-as-a-Service, may also take certain day-to-day operational concerns out of the hands of the user of the service. But this isn't fundamentally anything new; it's just been moved up a level in the stack by cloud computing. (See this video featuring Matt Hicks describing some of the low-level features that transparently help OpenShift performance and security.)
But none of this means that operations aren't present--somewhere. It's not magic.