Wednesday, May 25, 2016

Issue #4 of my newsletter is live

This issue has links to an article I recently had published on public cloud security as well as to discussions around using Ansible with docker-compose and why it's important to orchestrate containers using tools such as Kubernetes.

Links for 05-25-2016

Thursday, May 19, 2016

Data, security, and IoT at MIT Sloan CIO Symposium 2016

As always, the MIT Sloan CIO Symposium covered a lot of ground. Going back through my notes, I think it’s worth highlighting a couple sessions in particular—in addition to the IoT birds of a feather that I led at lunchtime. They all end up relating to each other through data, data security, and trust.

Big Data 2.0: Next-Gen Privacy, Security, and Analytics moderated by Sandy Pentland of the MIT Media Lab

There were two major themes in this panel.

Sandy Pentland

The first was that it’s not about the size of the data but the insights you get from it. This is perhaps an obvious point but it’s fair to say that there’s probably been too much focus on how data gets stored and processed. These are important technical questions to be sure. But they’re technical details and not the end in itself.

I might be more forgiving had I not lived through the prior data warehousing enthusiasm of the mid- to late-1990s. As I wrote five years ago: "There are many reasons that traditional data warehousing and business intelligence has been, in the main, a disappointment. However, I'd argue that one big reason is that most companies never figured out what sort of answers would lead to actionable, valuable business results. After all, while there is a kernel of truth to the oft-repeated data warehousing fable about diapers and beer sales, that data never led to any shelves being rearranged."

However, the other theme is newer—or at least amplified. And that’s ensuring the security of data and the privacy of those whose data is being stored. One idea that Sandy Pentland discussed is the idea of sharing answers (especially aggregated answers) rather than raw data. See enigma.mit.edu as an example of a system that's designed to make it possible for parties to use and maintain data without having full access to that data. Pentland also noted that because systems such as this make it possible to securely ask questions across jurisdictional boundaries, they could help address some of the often conflicting laws about the treatment of personally identifiable information.
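To make the "share answers, not raw data" idea concrete, here is a minimal Python sketch. It is not how Enigma works, and nothing here comes from the panel. The `AggregateOnlyStore` class, its `min_group_size` threshold, and the sample records are all hypothetical, made up purely for illustration.

```python
from statistics import mean


class AggregateOnlyStore:
    """Hypothetical data holder that answers aggregate questions only."""

    def __init__(self, records, min_group_size=3):
        self._records = list(records)          # raw rows never leave this object
        self._min_group_size = min_group_size  # assumed threshold, for illustration

    def average(self, field, where=lambda row: True):
        """Return the mean of `field` over matching rows, or None if too few match."""
        values = [row[field] for row in self._records if where(row)]
        if len(values) < self._min_group_size:
            return None  # refuse: a tiny group could expose an individual
        return mean(values)


# Made-up sample data.
store = AggregateOnlyStore([
    {"region": "EU", "spend": 120},
    {"region": "EU", "spend": 80},
    {"region": "EU", "spend": 100},
    {"region": "US", "spend": 200},
])

eu_average = store.average("spend", where=lambda row: row["region"] == "EU")
us_average = store.average("spend", where=lambda row: row["region"] == "US")
```

Here the caller gets an average for the three EU records but no answer for the single US record, and at no point sees a raw row.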

Getting Value from IoT

At my luncheon BoF table, we had folks with a diverse set of IoT experiences including Ester Pescio and Andrea Ridi of Rulex Analytics, Nirmal Parikh of Digital Wavefront, and Ron Pepin, a consultant and former Otis Elevator CIO. The conversation kept coming back to value from data. What data can you gather? What can you learn from it? And, critically, can you do anything with that data to create business value?

Per my earlier comment about data warehouses, gathering the data is relatively straightforward. It may not be easy, especially when you’re dealing with sensors that aren’t on your own property and therefore need dedicated networks of some sort. But the problems are mostly understood. It’s “just” a case of engineering cost-effective solutions.

But what data and what questions? Ron Pepin shared his experiences from Otis. Maintenance is a big deal for elevators. It’s also the main revenue stream; the elevators themselves are often a loss leader. Yet proactive elevator maintenance mostly consists of preventative maintenance on a fixed schedule. 

Anders Brownworth, Principal Engineer, Circle, on Blockchain panel

It seems like a problem tailor-made for IoT. Surely, one can measure some things and predict impending failures. But it’s not obvious what combination of events (if any) are reliable signals for needed maintenance. There’s a potential for more intelligent and efficient maintenance but this isn’t a case where you can cost effectively just instrument everything—someone else owns the building—and the right measurements aren’t obvious. Is it number of hours, number of elevator door reversals, temperature, load, particular patterns of use, something else, or none of the above?

The Blockchain

Given the level of hype around blockchain, perhaps the most interesting thing about this panel by Christian Catalini of MIT Sloan was the lack of such hype.

Interest, yes. Catalini described how blockchain is an interesting intersection of computer science, economics & market design and law. He also argued that it can not only make things today more efficient (which could potentially redefine the boundary of firms by reducing transaction costs) but also create new types of platforms.

That said, there was considerable skepticism about how broadly applicable the technology is. Anders Brownworth of Circle (which has a peer-to-peer payment application making use of blockchain) said that the benefits of blockchain are broadly in the area of time-based transactions, with interoperability, and with many able to audit those transactions. However, with respect to private blockchains outside of finance, “we trust all the people around the table anyway” and, therefore, the auditability that’s inherent to blockchain doesn’t buy you much.

In the same vein, Simon Peffers of Intel agreed that it’s “hard to let thousands of users have the same view of data with a traditional database. But some blockchain use cases would fit with traditional database.” He added that “There is a space for smaller consortiums of organizations that know who the parties are with other requirements that can be implemented in a private blockchain. Maybe you know who everyone is but don't fully trust them.”

To sum up the panel: You’re usually going to be giving up some features relative to a more traditional database if you use blockchain. If you’re not making use of blockchain features such as providing visibility to potentially untrusted users, it may not be a good fit.

Photos (from top to bottom):

Sandy Pentland, MIT Media Lab

Anders Brownworth, Principal Engineer, Circle

Tuesday, May 10, 2016

Links for 05-10-2016

My newsletter experiment

There’s a certain range of materials–curated links to comment upon, updates, and short fragments–that to me have never felt particularly comfortable as blog posts or on Twitter. Tumblr never quite did it for me and I’ve little interest in shoving content into yet another walled garden anyway. I’ve been thinking about trying a newsletter for a while and, when Stephen O'Grady joined the newsletter brigade, I figured it was time to give it a run. We’ll see how it goes.

Here’s a link to the first issue: https://www.getrevue.co/profile/ghaff/archive/19505

It includes some DevOps related links and short commentary, links to a couple of new papers I’ve written on security and deploying to public clouds, and upcoming events including Red Hat Summit in San Francisco at the end of June. (Regcode INcrowd16 saves $500 on a full conference pass!)

You can also subscribe directly to this newsletter here.

The need for precise and accurate data


Death by GPS (Ars Technica):

What happened to the Chretiens is so common in some places that it has a name. The park rangers at Death Valley National Park in California call it “death by GPS.” It describes what happens when your GPS fails you, not by being wrong, exactly, but often by being too right. It does such a good job of computing the most direct route from Point A to Point B that it takes you down roads which barely exist, or were used at one time and abandoned, or are not suitable for your car, or which require all kinds of local knowledge that would make you aware that making that turn is bad news.

It's a longish piece that's worth a read. However, it seems that a lot of these GPS horror stories--many from the US West--are as much about visitor expectations of what constitutes a "road" as anything else. It's both about the quality of the underlying data and its interpretation, things that apply to many automated systems. 

According to Hacker News commentator Doctor_Fegg:

This is clearly traceable to TIGER, the US Census data that most map providers use as the bedrock of their map data in the rural US, yet was never meant for automotive navigation.

TIGER classes pretty much any rural "road" uniformly - class A41, if you're interested. That might be a paved two-lane road, it might be a forest track. Just as often, it's a drainage ditch or a non-existent path or other such nonsense. It's wholly unreliable.

But lest you think data problems are in any way unique to electronic GPS systems, read this lengthy investigation into a 1990s Death Valley tragedy.

For what it’s worth, I did some cursory examination into what Google Maps would do if I tried to entice it into taking me on a “shortcut” through the Panamint Mountains in western Death Valley. My conclusion was that it seemed robust about not taking the bait; it kept me on relatively major roads. However, if I gave it a final destination that required taking sketchy roads to get there (e.g. driving to Skidoo), it would go ahead and map the route.

After writing this, it occurs to me that for situations such as this, we need data that is both accurate (represents the current physical reality) and precise (describes that physical reality with sufficient precision to be able to make appropriate decisions).
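A small Python sketch of that distinction, under loud assumptions: the `RoadSegment` fields `surface` and `verified_passable` and the `safe_for_passenger_car` rule are hypothetical, invented for illustration, and do not describe how TIGER or any real navigation product models roads. Only the "A41" class comes from the comment quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoadSegment:
    name: str
    road_class: str           # coarse classification, e.g. "A41" from the quoted comment
    surface: Optional[str]    # hypothetical finer-grained attribute; None means unknown
    verified_passable: bool   # hypothetical flag: does the record match current reality?


def safe_for_passenger_car(segment: RoadSegment) -> bool:
    """Route over a segment only when the data is both precise and accurate enough."""
    precise_enough = segment.surface is not None   # we know what kind of "road" this is
    accurate_enough = segment.verified_passable    # and the record reflects reality
    return precise_enough and accurate_enough and segment.surface == "paved"


# Three made-up segments that share one coarse class but differ in what we know.
two_lane_road = RoadSegment("two-lane road", "A41", "paved", True)
forest_track = RoadSegment("forest track", "A41", None, True)
abandoned_road = RoadSegment("abandoned road", "A41", "paved", False)

results = [safe_for_passenger_car(s) for s in (two_lane_road, forest_track, abandoned_road)]
```

All three segments carry the same class, so the class alone can't drive the decision. The forest track fails on precision (we don't know the surface) and the abandoned road fails on accuracy (the record no longer matches the ground).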

Monday, May 09, 2016

Interop 2016: The New Distributed Application Infrastructure


The platform for developing and running modern workloads has changed. This new platform brings together the open source innovation being driven in containers and container packaging, in distributed resource management and orchestration, and in DevOps toolchains and processes to deploy infrastructure and management optimized for the new class of distributed application that is becoming the norm.

In this session, Red Hat's Gordon Haff discusses the key trends coming together to change IT infrastructure and the applications that will run on it. These include:

  • Container-based platforms designed for modern application development and deployment 
  • The ability to design microservices-based applications using modular and reusable parts 
  • The orchestration of distributed components 
  • Data integration with mobile and Internet-of-Things services 
  • Iterative development, testing, and deployment using Platform-as-a-Service and integrated continuous delivery systems