Monday, May 05, 2014

Automation and autonomy

[Photo: BMW Spartanburg plant]

I’ve been thinking and reading about autonomous systems of late—both autonomous IT systems and autonomous systems of other types such as vehicles. I also read a lot of misconceptions about automation—whether it’s in the arguments against or in misunderstanding what automation really means. I’ll be writing further on the topic but here are five points to get started. Comments welcome.

Computers are good at things that can be automated

Back in my earlier life at Data General, we were selling some of the earlier symmetrical multiprocessor (SMP) servers to large enterprises, including Wall Street. SMP introduced a new wrinkle: where to place individual processes so that the system as a whole, with its multiple processors, ran most efficiently. One approach was to place them manually, which is precisely what a number of our big customers wanted to do; we even wrote and sold them class software to help them do so. But know what? The operating system scheduler could actually do this job pretty well in the aggregate, as all these customers eventually recognized.
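To make the placement problem concrete, here is a toy sketch of one way software can spread processes across processors: hand each process, heaviest first, to whichever CPU currently has the least work. This is a textbook greedy heuristic, not how the Data General scheduler (or any production scheduler) actually worked; real schedulers also weigh things like cache affinity and priorities. The process names and load numbers are made up for illustration.

```python
import heapq


def place_processes(loads, num_cpus):
    """Greedily assign each process to the currently least-loaded CPU.

    `loads` maps a (hypothetical) process name to its estimated CPU demand.
    Returns a dict mapping CPU index -> list of process names.
    """
    # Min-heap of (total_load, cpu_index); every CPU starts idle.
    heap = [(0.0, cpu) for cpu in range(num_cpus)]
    heapq.heapify(heap)
    placement = {cpu: [] for cpu in range(num_cpus)}

    # Placing the heaviest processes first tends to give a more even split.
    for name, demand in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        total, cpu = heapq.heappop(heap)
        placement[cpu].append(name)
        heapq.heappush(heap, (total + demand, cpu))
    return placement


def cpu_totals(placement, loads):
    """Sum the estimated demand that ended up on each CPU."""
    return {cpu: sum(loads[n] for n in names) for cpu, names in placement.items()}


if __name__ == "__main__":
    demo_loads = {"a": 4, "b": 3, "c": 2, "d": 1}  # illustrative numbers only
    demo = place_processes(demo_loads, num_cpus=2)
    print(demo, cpu_totals(demo, demo_loads))
```

Even a rule this simple balances the example evenly, which hints at why hand placement rarely beats the machine in the aggregate.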

There are legitimate questions about which tasks can be readily handled by computers and which can’t. With respect to self-driving cars specifically, computer AI interacts with the physical world much differently from a human. It’s fair to say that computers will be able to do many things much better than even a good driver can, while other situations will prove very difficult for them to handle. With datacenter computing, though, it’s clear that many tasks will eventually have to be automated and that exceptions should be relatively rare.

Assistance can precede automation

Yet, even when complete automation isn’t (yet) achievable, partial automation can still significantly reduce how many activities people need to handle themselves. We’re already seeing this in automobiles with technologies like adaptive cruise control, which can adjust a car’s speed to maintain a safe distance from any vehicles ahead. Such systems are mostly in luxury cars today but I expect they’ll become both more widespread and more sophisticated. And judiciously applied assistive systems can be rolled out far more incrementally than anything taking over full control.
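As a rough illustration of what "adjust a car's speed to maintain a safe distance" can mean, here is a minimal sketch of a gap-keeping rule. It is an assumption-laden toy, not any manufacturer's algorithm: the function name, the 2-second time gap, and the gain are all invented for the example, and a real system would also handle sensor noise, braking limits, and comfort.

```python
def acc_target_speed(own_speed, set_speed, gap_m, lead_speed=None,
                     time_gap_s=2.0, gain=0.5):
    """Return a commanded speed in m/s for a toy adaptive cruise control.

    own_speed   current speed of our car (m/s)
    set_speed   speed the driver dialed in (m/s)
    gap_m       measured distance to the vehicle ahead (m)
    lead_speed  speed of the vehicle ahead (m/s), or None if the road is clear
    time_gap_s  desired following gap in seconds (illustrative default)
    gain        how strongly to correct a gap error (illustrative default)
    """
    if lead_speed is None:
        return set_speed  # nothing ahead: behave like plain cruise control

    desired_gap = own_speed * time_gap_s
    gap_error = gap_m - desired_gap  # negative means we are too close

    # Match the lead car's speed, nudged up or down to close the gap error.
    command = lead_speed + gain * gap_error / time_gap_s

    # Never exceed what the driver asked for, and never command reverse.
    return max(0.0, min(set_speed, command))


if __name__ == "__main__":
    # Following too closely behind a slower car: the command drops below its speed.
    print(acc_target_speed(own_speed=30.0, set_speed=30.0, gap_m=40.0, lead_speed=25.0))
```

Note that the driver still chooses the set speed and steers; the system only takes over one narrow, well-defined job.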

The same is true with cloud computing. One example that I like to use is around the idea of cloudbursting—typically used to mean the dynamic movement of workloads from private to public clouds in response to an increase in demand. As I’ve written previously, this strong form of cloudbursting—much less the idea of workload movement in response to changes in public cloud spot pricing—gets into a lot of complications. However, hybrid cloud management software and operating systems that can run in different environments make it possible to move applications around as needed (e.g. to switch cloud vendors) even if the process isn’t necessarily completely autonomous and hands-off. 
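The difference between assisted and hands-off cloudbursting can be sketched in a few lines. The function below is purely hypothetical (it calls no real cloud API and the field names are invented): it computes how much demand overflows private capacity, then either places that overflow on a public cloud automatically or flags it for an operator to approve.

```python
def plan_burst(demand, private_capacity, auto_approve=False):
    """Plan where workload instances should run (toy model).

    demand            number of instances needed right now
    private_capacity  number of instances the private cloud can host
    auto_approve      True for hands-off bursting, False for operator-assisted
    """
    overflow = max(0, demand - private_capacity)
    return {
        "private": min(demand, private_capacity),
        # Only the autonomous mode actually places work on the public cloud.
        "public": overflow if auto_approve else 0,
        "overflow": overflow,
        # In assisted mode the tool recommends; a person makes the call.
        "needs_operator": overflow > 0 and not auto_approve,
    }


if __name__ == "__main__":
    print(plan_burst(demand=12, private_capacity=10))
    print(plan_burst(demand=12, private_capacity=10, auto_approve=True))
```

The sketch deliberately ignores the hard parts that make the strong form complicated in practice, such as moving data, networking, and cost.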

Automation isn’t all or nothing

Even when hands-off automation works well and is appropriate for some tasks, it may not be used elsewhere, or may be used only under a more rigorous set of controls. With respect to self-driving cars, I can easily imagine an interim stage where they can drive autonomously on designated sections of limited access highways, and not elsewhere. For anyone who commutes on the highway or does long Interstate drives, this should be an obvious win even if it’s not the nirvana of a robo-Uber.

Similarly, while “automate more” should be IT’s mantra, most companies aren’t starting from scratch. It won’t always make as much sense to aggressively automate stable legacy systems as it will to automate through a new OpenStack infrastructure that’s running primarily new cloud-enabled workloads. Standardizing and automating are effective at cutting costs and reducing errors just about everywhere—but the bang for the buck will be bigger in some places than others.  

But autonomy requires a defined control handoff

The above said, the handoff between manual (even if assisted) and autonomous needs to be clearly defined. Once you hand off control, you had better trust the autonomous system to do the right thing (within whatever margin of error you deem acceptable). You can’t wrest back control on the fly; it’s probably too late.

In so many autonomous car discussions, I hear statements to the effect of: “If there’s an emergency, the driver can just take over.” Well, actually he can’t. He’s playing a game on his iPad and he probably needs a good 30 seconds to evaluate the situation and take any corrective action. OK for some situations, not for others. If the car’s in control, it has to deal with things itself—at least anything urgent.
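One way to picture a clearly defined handoff is as an explicit rule about who responds to an event, given how much time is left. The sketch below is a thought experiment, not a real vehicle design: the mode names and return values are invented, and the 30-second takeover time is simply the rough figure used above for a distracted driver.

```python
from enum import Enum


class Mode(Enum):
    MANUAL = "manual"          # the human is driving (possibly with assistance)
    AUTONOMOUS = "autonomous"  # the system has been handed control


# Rough figure from the discussion above; an assumption, not a measured value.
DRIVER_TAKEOVER_S = 30.0


def respond(mode, seconds_available):
    """Decide who must deal with an event that needs action within
    `seconds_available` seconds."""
    if mode is Mode.MANUAL:
        return "driver_handles"
    if seconds_available > DRIVER_TAKEOVER_S:
        # There is time for an orderly handoff, so the system may ask for one.
        return "request_handoff"
    # Too urgent for a distracted human: the system has to cope on its own.
    return "system_handles"


if __name__ == "__main__":
    print(respond(Mode.AUTONOMOUS, seconds_available=3.0))
    print(respond(Mode.AUTONOMOUS, seconds_available=120.0))
```

The point of writing it down this way is that there is no branch where an urgent event in autonomous mode falls back to the driver.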

With complex distributed IT systems, which increasingly characterize cloud environments, it’s certainly important to understand what’s going on. But events happen and cascade at incredibly short time scales by human standards. Check out this presentation by Adrian Cockcroft of Battery Ventures in which he talks about some of the challenges associated with monitoring large-scale architectures.

Autonomy can require new approaches/workflows

Finally, the best way to automate is likely not to just automate the old thing, certainly not if the old thing is a mess. To be sure, a clean sheet approach may be constrained by having to coexist with what’s already in place. The infrastructure that we’d build for 100% self-driving cars is much different from what we would build (and have built) for 100% human-driven ones. However, even given a mixed environment, I suspect that over time we’ll add some infrastructure to help autonomous cars do things that they’d have trouble doing otherwise.

In the case of IT, we’re seeing new classes of tools oriented to large-scale cloud workloads and DevOps processes. One big difference between these tools and those of the past is that they’re mostly open source. Donnie Berkholz of RedMonk discusses some of them in “OpenDevOps: Transparency and open source in the modern era.” These include configuration management tools like Puppet and Chef as well as monitoring and analysis tools like Nagios and Splunk. DevOps itself, whatever your precise definition, is very much tied into the idea that much of the manual, routine ops work of the traditional system admin is increasingly automated. This automation is the only thing enabling a developer to take over so many ops tasks.

Automation done right is a huge positive. But we need to understand what it is, how to use it, and how to interact with it. 

[Photo credit: BMW. BMW Spartanburg, SC assembly plant.]


