- Thanks, NSA, you're killing the cloud | Cloud Computing - InfoWorld - "Personally, I don't see much of a connection between the NSA and cloud computing, but those on the fence regarding cloud computing will cite this as another reason to kick the can further down the road. Thanks for nothing, NSA."
- Prepare for change! This is not your father’s database industry — Tech News and Analysis - This overstates Hadoop vis-a-vis other options, but it's still a good read.
- 50 Top Sources Of Free eLearning Courses
- Google Maps Help Find City Quizzes, Dylan Trivia, Garage Sales - Games with Google Maps
- 18 Useful Internet Marketing Statistics that You Can't Ignore - RT @samanthastone: This just in...nurtured leads make 47% larger purchases than non-nurtured leads. That stat and more can be found at
- Photos: MIT Class of 2017 Hacks Harvard Class of 2017 Website | BostInno - MIT’s incoming class hacked the Harvard 2017 website Sunday night by replacing all the pictures with Mitt Romney’s mug, according to a Reddit thread.
Thursday, June 27, 2013
Because it mirrors a point I often feel compelled to make when discussing security and cloud computing, I wanted to highlight a couple of paragraphs from Cloud Computing: Assessing the Risks by Jared Carstensen, Bernard Golden and J.P. Morgenthal as excerpted in Tech Target.
It's important to understand that this risk limitation [whereby service providers shift the primary responsibility for risk to consumers] is not unique to Cloud Computing. Outsource providers (e.g. firms that take over operating a company's IT data centre) also limit their financial responsibility in the event of an outage. Therefore, it is important not to regard this risk limitation as a complete restriction on using a Cloud provider, unless, that is, a company regards any risk limitation by a service provider as unacceptable. In that case, the company should continue to operate its own computing environment and forego use of an external Cloud provider.
The important point from this discussion is that when Cloud Computing security is raised as an issue, other issues are often being addressed. It's important to distinguish what type of issue is of concern, as that will change the method of evaluating the issue, the demarcation of the trust boundary and the appropriate actions to be taken by the Cloud user.
One of the reasons why I think this point is important is that framing overall IT governance discussions solely in terms of security (whether we're talking public clouds, private clouds, or--increasingly--some manner of hybrid IT) is far too narrow. This narrow framing, in turn, often leads to thinking about the issue in narrow technical terms such as multi-tenant security features, encryption and key management, and physical facility protection.
These are important matters, certainly. But they're also matters that public cloud providers (like other types of outsourcers) can reasonably argue they have well in hand using well-established procedures and processes. The more difficult answers about where workloads should run come down to broader questions--and those answers may well change over time.
(I covered some of these broader issues in a presentation at the Red Hat Summit in June. I'm hoping to get a version of that presentation Beyond Safety: Controlling Clouds posted over the next month or so.)
Wednesday, June 26, 2013
- Has the Time Arrived for Cloud Insurance? - Leverhawk
- Coursera Fantasy: Teacher Authority and Student Initiative in a MOOC
- Coursera Fantasy: Peer Feedback: The Good, the Bad and the Ugly :-)
- udacity and coursera – python editor, peer reviewing and complaining | Down Home Country Coding With Scott Selikoff and Jeanne Boyarsky
- The Problems with Peer Grading in Coursera | Inside Higher Ed
- How the Design of Soda Cans Have Changed Over Time
- 66 job interview questions for data scientists - Data Science Central
- OpenShift Origin Community Day (Boston) Deep Dive into Cartridges with Jhon Honce - YouTube - RT @pythondj: Just posted: @OpenShift Origin Community Day (Boston) "Deep Dive into Cartridges" by Jhon Honce via @redhat
- What were the most ridiculous startup ideas that eventually became successful? - Quora - RT @tsimonite: Anti-pitches for tech startups: "Google: World's 20th search engine"; "iphone: It won't have cut and paste"
- From Red Shoes To Red Hat | Rishidot Research - RT @krishnan: I am joining Red Hat in July as Director, OpenShift Strategy
Monday, June 24, 2013
- David Simon | We are shocked, shocked…
- Aho/Ullman Foundations of Computer Science
- Heirs of Infocom: Where interactive fiction authors and games stand today | Ars Technica
- You Don't Have To Like Edward Snowden - “@dbfarber: You Don’t Have To Like Edward Snowden via @buzzfeedpol” << Smart piece.
- Cloud Platform Blog: Enabling Google App Engine to run in the Private Cloud with CapeDwarf - RT @DanJuengst: Code on! Google and Red Hat collaborating to help folks run GAE apps on #OpenShift PaaS and JBoss. #RedHat #gae
- Photo by ghaff • Instagram - Bloody Mary time at SFO United Club.
- ThinkGeek :: Tac Bac - Tactical Canned Bacon - RT @mccrory: Greatest invention EVER! Tactical #Bacon !!! Yes, that's not a mistake, TACTICAL BACON!
- PKN Gibsons #6 Mueller - YouTube - RT @pythondj: Sweet! my @discovertotems geospatial app presentation from @pechakucha Is up on YouTube hosted @openshift
- Photo by pythondj • Instagram - RT @pythondj: Rockin @Mongodb in the @openshift booth @ #mongonyc today with fotois @ New York Marriott Marquis
- PayPal now builds products on its internal PaaS — Tech News and Analysis - RT @krishnan: PayPal now builds products on its internal PaaS by @gigaom <- Anyone still thinking PaaS is for hobbyists? :-)
- Twitter / adrianco: Just shared a photo #throughglass ... - RT @adrianco: Just shared a photo #throughglass
- John McAfee releases NSFW video on how to uninstall security code • The Register - RT @valleyhack: This McAfee dude is turning drug use into performance art
- NVD3.js :: re-usable charts for d3.js
- Vega: A Visualization Grammar
- REST web services with Python, MongoDB, and Spatial data in the Cloud - Part 2 | OpenShift by Red Hat
- Generalized Set of Color Schemes
Friday, June 21, 2013
I like GigaOm Structure. I find it gives a good sense of the current zeitgeist in cloud computing and related areas. What's being talked about and what isn't? What new or reimagined techs are emerging as memes?
GigaOm's own writers (among others) covered the event in considerable depth and I won't attempt to recreate their reporting here. Rather, I wanted to hit on some general themes I noticed and a few points that particularly struck me. So with no further ado and in no particular order, here we go.
OpenStack was omnipresent. Other on-premise IaaS? Not so much.
Mind you, there was still a bit of commentary about how OpenStack was in relatively early days. Maybe. Ryan Granard of PayPal told the audience that his company runs 20 percent of its production infrastructure on OpenStack. As readers probably know, there was much ado a while back about PayPal's adoption of OpenStack--they were and are heavy VMware users. One of Granard's points, though, was that PayPal has a strategy of deliberately making several bets as a way of getting velocity while still having a robust infrastructure.
Notably absent from this Structure was the once ubiquitous Marten Mickos of Eucalyptus. Nor did I hear much mention of CloudStack--though Citrix did have a sponsor workshop, which I didn't attend.
Speaking of PayPal. A nice endorsement for PaaS and OpenShift.
Granard also articulated, as well as I've heard it from anyone, why PaaS is such a big deal for organizations. As reported by GigaOm's Jordan Novet:
Companies big and little have been jumping aboard the concept of on-premise PaaS, to some degree because security, regulatory compliance and cloud vendor lock-in fears remain part of the conversation about running on public infrastructure.
How is PayPal going about this? It’s been running Red Hat’s OpenShift on-premise PaaS to build out products such as PayPal Here — the company’s answer to Square — as well as a developer sandbox.
With that tool, Granard said, a developer chooses a product to work on “and in minutes, we have you up and running in a fully connected container” with infrastructure resources immediately allocated.
The real money quote for me though was that PaaS lets PayPal "enable developers and get out of the way."
x86 vs. ARM. Come back next year.
By which I mean that there were a few threads on this topic. Especially in the vein of whether ARM will make a meaningful dent in the server world. But no clear resolution.
To the degree that there was something of a consensus among folks I spoke with, it largely parallels my opinion and goes something like the following: x86 is the clear incumbent on the server. ARM is the clear incumbent in new-style mobile (tablets, cell phones, etc.). There's considerable inertia to that default condition for reasons of ecosystem and other things. For either architecture to make a major dent (narrow use cases aside) outside of its home base will require it to develop a 10x advantage--which most people don't think is going to happen.
What's that SDN stuff anyway?
There was some skepticism. For example, Arne Josefsberg, CTO of ServiceNow, said that “The conversations today sound almost exactly like the conversations we had three to four years ago.” Indeed, the "hot or not" panel he sat on declared SDN a loser technology.
But that seemed to be a minority opinion. Session after session returned to the idea that the networking component of infrastructures needs the same sort of rewiring in software that you can do with compute and storage if the whole dynamic IT process is going to be realized. While I think it's fair to observe that there were still a lot of open questions about how we're going to get there (and what exactly it will look like), the consensus was squarely behind SDN--at least as a concept.
Private, private, hybrid
This topic really deserves a separate post, especially given that I was on a panel about private clouds at the ODCA Forecast event preceding Structure. Suffice it to say that it's a complicated topic for a variety of reasons:
- Depending upon the specific requirements, there are strong economic reasons to choose private over public or vice versa.
- Different organizations have strong predispositions for in-sourcing vs. out-sourcing.
- Existing applications can't be ignored.
- Regulation is a factor that may or may not be "fixed" (from the perspective of public clouds).
The bottom line is that there are plenty of arguments and cherry-picked examples available to bolster "your" side. That said, there was widespread agreement that, for at least the next n years (where n is much less agreed upon), the cloud world will be hybrid.
Here are links to some of the posts and other topics covered in the podcast:
Spatial MongoDB in OpenShift, be the next FourSquare - Part 1 | OpenShift by Red Hat
REST web services with Python, MongoDB, and Spatial data in the Cloud - Part 2 | OpenShift
Shameless plug: I'll be on a panel with GigaOm analysts talking about PaaS Tuesday, June 25: Red Hat-Flexibility with PaaS: How to Keep Your Options Open — GigaOM Pro.
Listen to MP3 (0:17:57)
Listen to OGG (0:17:57)
Monday, June 17, 2013
- Layman's Introduction to Random Forests - Edwin Chen's Blog
- Red Hat Summit: Open source trends, cloud outlook, innovation and more
- High-Performance Blending: Nutrition
- Lawsuit Filed To Prove Happy Birthday Is In The Public Domain; Demands Warner Pay Back Millions Of License Fees | Techdirt - RT @webmink: Lawsuit Filed To Prove Happy Birthday Is In The Public Domain; Demands Warner Pay Back Millions Of License Fees:
- Spatial MongoDB in OpenShift, be the next FourSquare - Part 1 | OpenShift by Red Hat
- Using Node.JS, MongoDB, Express for your spatial Web Service - and it's Free! | OpenShift by Red Hat
- Twitter / fbijlsma: @ghaff a lot of demand for ... - “@fbijlsma: @ghaff a lot of demand for Gordons book ” << We have a few copies left at #redhat booth
- Red Hat opens OpenShift PaaS cloud for business | ZDNet - RT @sjvn: Red Hat opens OpenShift PaaS cloud for business #RedHat #Linux #cloud by @sjvn
- Red Hat | Red Hat Unveils Fully-Supported Public PaaS Offering, OpenShift Online - OpenShift Online PaaS now generally available w paid support. (Free tier still available too.)
- OpenShift Origin Community Day (Boston) Writing Cartridges V2 by Jh... - RT @pythondj: Slides are up from @OpenShift Origin Community Day (Boston) Check Out: Writing Cartridges @slideshare #cloud #paas
Listen to MP3 (0:24:09)
Listen to OGG (0:24:09)
Wednesday, June 05, 2013
Data silos and integration challenges, more than security, are the biggest barriers to cloud adoption. Siloed data in cloud apps and data centers is costing companies millions annually due to inconsistency, inaccuracy and inefficiency across the business. And the enterprise software market is crossing the threshold of another transformation, now that cloud computing has shifted the center of gravity for data.
The MIT Media Lab's Sandy Pentland noted something similar at the MIT Sloan CIO Symposium a few weeks back when he said that "About 20% of big data is getting the data out of the silos and transforming it."
That's why at Red Hat we're so interested in the idea of open hybrid cloud--which includes open hybrid storage. (Starting with Red Hat Storage based on GlusterFS.) This isn't to minimize the important, and really difficult, role of organizational change in breaking down the silos. But there's at least technology available to help break down silos rather than aid in their creation.
- As Data Floods In, Massive Open Online Courses Evolve | MIT Technology Review - "It’s unclear whether the laundry lists of refinements that result from A/B testing will add up to a grand theory of learning and teaching that challenges tradition. Ng says he doesn’t think a grand theory is needed for MOOCs to succeed. “I read Piaget and Montessori, and they both seem compelling, but educators generally have no way to choose what really works,” he says. “Today, education is an anecdotal science, but I think we can turn education into a data-driven science, where you do what you know works.”"
- A comparison of approaches to large-scale data analysis
- New Government Documents Show the Sean Parker Wedding Is the Perfect Parable for Silicon Valley Excess - Alexis C. Madrigal - The Atlantic - Just... wow.
- Refactoring Coursera | Mike Caulfield
- The Ed Techie: You can stop worrying about MOOCs now
- MOOC as Courseware: Coursera's Big Announcement in Context |e-Literate
- Coursera Jumps the Shark | HESA
- Business as usual | Music for Deckchairs
- Economic Effects of State Bans on Direct Manufacturer Sales to Car Buyers
- Zipf, Power-law, Pareto - a ranking tutorial
- Your favorite tech companies/products get sassy new slogans courtesy of Reddit thread
- Why Big Data Is Not Truth - NYTimes.com
- The Lorenz Cipher and how Bletchley Park broke it
Data is hugely important. It long has been, of course. It's often argued that applications are the longest-lived IT asset. Arguably the data that those long-lived applications create and access sticks around for at least as long. Data is only going to become more important.
Is there a lot of hype around data today? Sure. Raw data isn't information. And information doesn't necessarily lead to useful action if organizations (or their customers and users) aren't willing to change behavior based on new information. Just because pricing mechanisms can be used to reduce traffic congestion based on sensor data--just one of the ideas discussed under the "Smart Cities" term--doesn't mean that even basic congestion pricing is necessarily politically viable.
Furthermore, I strongly suspect that the "lots of data" hammer will turn out to be a rather unsuitable tool for certain (perhaps many) classes of problems--the enthusiasms of the "End of Theory" crowd notwithstanding. For example, it's unclear to what degree more data will really help companies design better products or target their ads better. (To be clear, there's long been lots of market research in the consumer space. It's just that it hasn't been terribly effective in creating the killer ad or the killer product.)
But it would be a mistake to think that today's data trends are 1990s-style Data Warehousing with just a fresh coat of paint. Whatever the jokes about the local weatherman, weather forecasting has improved. Self-driving cars will happen, though they may take a while to come into mainstream use. DNA sequencing is now commonplace--although, in a common theme, we're still in the early days of figuring out what we can (and should) do with the information obtained. And we're well on the way to sensors of all sorts becoming pervasive.
Which makes the "Big Data" term somewhat unfortunate in my view. I realize that may seem a bit of a contradiction given what I wrote above. Let me explain.
My first problem is that "Big Data" is too narrow. This is true even if we use the term in the broader sense of data that is atypical in some respect--not necessarily in its volume. The four Vs is a common shorthand. (For a less precise, but possibly more accurate description, I like the "Difficult Data" term that I heard from the University of Washington's Bill Howe.)
But an emerging culture of data doesn't have to be about big or even difficult. Discussions about data at the MIT Sloan CIO Symposium last month included big data examples, but they were also, in no small part, about cultures of data and the breaking down of silos. Just as with IT broadly and cloud computing, data and storage increasingly have to be based on a hybrid model in which data can be accessed when and where it is needed and not locked up in one department or even one organization. Governments and others are increasingly making open data available as well.
It's also worth remembering Nate Silver made headlines for calling the last US presidential election correctly, not because he did big data stuff or even because he applied particularly innovative or sophisticated analysis to polling data, but mostly because he used data and not his gut.
The second issue I have with "Big Data" isn't really that term's fault. Rather, it's that "Big Data," today, is so frequently conflated with Hadoop.
Based on Google's MapReduce concept, Hadoop divides work on a data set into many small fragments, each of which may be executed or re-executed on any node in a cluster of servers. Per Wikipedia: "A MapReduce program comprises a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies)."
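To make the student-name example concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, name_frequencies) and the sample student names are made up for illustration, and the assumption is that each record is a "First Last" string. In a real cluster, the map and reduce steps would run in parallel on many nodes and the framework would handle the grouping in between.

```python
from collections import defaultdict


def map_phase(students):
    """Map step: emit a (first_name, 1) pair for each student record."""
    for full_name in students:
        first_name = full_name.split()[0]
        yield first_name, 1


def shuffle(pairs):
    """Group emitted values by key: one queue per first name."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return queues


def reduce_phase(queues):
    """Reduce step: summarize each queue by counting its entries."""
    return {name: sum(values) for name, values in queues.items()}


def name_frequencies(students):
    """Run map, shuffle, and reduce in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(students)))


# Hypothetical sample data, not from any real data set.
students = ["Ada Lovelace", "Alan Turing", "Ada Yonath", "Grace Hopper"]
print(name_frequencies(students))  # {'Ada': 2, 'Alan': 1, 'Grace': 1}
```

The point of the split is that map_phase looks at one record at a time and reduce_phase looks at one queue at a time, which is what lets the work be spread across, or re-run on, any node.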
Hadoop also provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. (The standard file system is HDFS, but other filesystems, such as Gluster, can be substituted for higher scalability or other desirable characteristics.)
Hadoop is often a useful tool. If you can split up data sets and work on them with some degree of autonomy, Hadoop workloads can scale very large. It also allows data to be operated on in situ without being loaded and transformed into a database, which can greatly decrease overhead for certain types of jobs. (This presentation by Sam Madden at MIT CSAIL offers some benchmarks as well as some pros and cons of Hadoop relative to RDBMS systems.)
However, data can be processed and analyzed using a wide variety of tools, including NoSQL databases of various kinds, "NewSQL" databases, and even traditional RDBMSs like PostgreSQL (which can still scale sufficiently to handle a great many types of data analysis and transformation tasks). In fact, we even see something of a trend with some of the new-style databases adding back in traditional RDBMS features that had been assumed to be unnecessary.
Even high volume data doesn't begin and end with Hadoop. As Dan Woods writes for CITO Research: "The Obama campaign did have Hadoop running in the background, doing the noble work of aggregating huge amounts of data, but the biggest win came from good old SQL on a Vertica data warehouse and from providing access to data to dozens of analytics staffers who could follow their own curiosity and distill and analyze data as they needed."
Hadoop is an important tool in the kit when the amount of data is large. But there are lots of other options for that kit bag too. And never forget that it's not just about the bigness of the data but whether you share it with those who need it and whether you do anything with it.
Monday, June 03, 2013
- A valley of ashes by Emily Esfahani Smith - The New Criterion - "In The Living Moment: Modernism in a Broken World, the literary critic Jeffrey Hart traces the efforts of a small but influential group of poets and novelists who sought to create a new cultural order following the chaotic aftermath of World War I. Their efforts came together in a new movement whose legacy is still with us today—literary modernism. The cultural fallout of the war—its devastation—was immense. The traditional order of nineteenth-century Europe had been blown to bits. “The First World War inaugurated the manufacture of mass death that the Second brought to a pitiless consummation,” in the words of the historian John Keegan."
- Fraser Cain - Google+ - Tips and Tricks for Hangouts on Air updated Sept. 13, 2012 …
- The Flawed Ambitions of Better Place | MIT Technology Review - “@mlamonica: Flawed ambitions - how EV battery-swapping startup Better Place misjudged consumers and automakers #EVs”