Monday, December 16, 2013

Relearning past lessons about databases (& etc.)

There's an interesting interview with database guru Michael Stonebraker over at GigaOm. One of his points in particular caught my attention.
“My prediction is that NoSQL will come to mean not yet SQL,” he said. “… Cassandra and Mongo have both announced what looks like, unless you squint, a high-level language that’s basically SQL.”
With the perceived value of a purely low-level language all but gone, Stonebraker thinks NoSQL systems will also come to embrace ACID capabilities. It might already be happening.
“I think the biggest NoSQL proponent of non-ACID has been historically a guy named Jeff Dean at Google, who’s responsible for, essentially, most to all of their database offerings. And he recently … wrote a system called Spanner,” Stonebraker explained. “Spanner is a pure ACID system. So Google is moving to ACID and I think the NoSQL market will move away from eventual consistency and toward ACID.”
I suppose every new technology generation has to relearn the lessons of the prior one. I took a data science course earlier this year which, among other topics, spent some time going over NoSQL and "NewSQL" database technology. One of the clear trends was that a lot of the supposed baggage, such as ACID, that was ripped out of databases in service of performance and simplicity is now starting to get added back in many cases.
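To make concrete what that "baggage" buys you, here's a minimal sketch of an ACID transaction using Python's built-in sqlite3 module (the account names and balances are just illustrative): a multi-statement transfer either commits as a whole or rolls back as a whole, so the database never exposes a half-applied update.

```python
import sqlite3

# In-memory database with two toy accounts (hypothetical example data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    # "with conn" opens a transaction: it commits if the block succeeds
    # and rolls back every statement if an exception escapes -- atomicity.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 150 "
                     "WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 150 "
                     "WHERE name = 'bob'")
        # Enforce an invariant (no overdrafts) before the commit -- consistency.
        (bal,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
except ValueError:
    pass  # the whole transfer was rolled back, not just the failing statement

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # both updates were undone together: alice 100, bob 0
```

In an eventually consistent store, each of those two updates would be an independent write, and it falls to the application to clean up if the second one fails.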
In Map-Reduce land, there are analogous trends. For example, Apache Pig "is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets."
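The structural property Pig's description is getting at can be sketched in a few lines of Python (a hypothetical word-count example, not Pig itself): because each map step touches only its own chunk of input, the chunks could be farmed out to separate workers, and the results fold together in a final reduce.

```python
from collections import Counter
from functools import reduce

def count_words(chunk):
    """Map step: count words in one independent chunk of lines."""
    return Counter(word for line in chunk for word in line.split())

def merge(a, b):
    """Reduce step: fold two partial counts into one."""
    a.update(b)
    return a

lines = ["big data is big", "data pipelines move data"]
chunks = [lines[:1], lines[1:]]  # split the input into independent pieces

# Each call is independent of the others, so this list comprehension
# could just as well be a multiprocessing Pool.map across many workers.
partials = [count_words(c) for c in chunks]

totals = reduce(merge, partials, Counter())
print(totals["data"])  # -> 3
```

The parallelism lives in the *structure* of the program, which is exactly why a platform like Pig can exploit it without the programmer spelling it out.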
I guess it wasn't such a bad idea after all to build a lot of the optimization and parallelization into the DBMS layer rather than forcing application programmers to handle it. On a side note, as someone who was following processor tech quite closely when multi-core hit the scene, I suspect that one of the reasons the "parallel programming problem" didn't develop into as big a problem as some thought it would is that databases and other middleware (to use the term broadly) largely abstract away parallelization.
I see this relearning of past lessons throughout cloud computing more broadly. Although, perhaps, reimagining is a better term. When we see the pervasive use of RESTful APIs, we're not really seeing SOA 2.0, although that makes a convenient shorthand. We are seeing a services-oriented approach to delivering IT, but one that's much lighter weight and doesn't carry nearly the same amount of baggage as, say, a mid-nineties SOAP-based implementation. It's useful to understand why we did things, or tried to do them, in the past. It's also useful to understand why they might have been suboptimal or even ultimately failed--and why the environment (whether tech, ecosystem, or need) might be different now.