I used to work for EMC. I was hired into EMC to build a backup product for VMware’s attempt at a cloud - vCloud. After I had time to take a close look, things got a bit awkward. Simply put - vCloud was crap. The architecture was crap. The implementation was embarrassing.
Once long ago, as a developer, I was an early VMware customer and found their products useful. A few years later, their developer products became unusable, and I switched to using VirtualBox. Naturally I did not mention this in the EMC interview.
In a larger sense, I could understand VMware choosing “enterprise” customers over developers. The enterprise customers pay more.
At EMC, I came to understand that VMware had followed the Microsoft Windows model. In the late 1990s, after the first overwhelming success of Windows, Microsoft tried repeatedly to re-invent the world … and repeatedly failed. VMware is on the same trajectory.
VMware as the first virtualization wave
VMware had adopted a model of regular software releases with major new features. The problem was these new releases were richly laced with bugs and obscure behaviors.
This model worked well for IBM, for rather a long time. Until lately.
This model worked for Microsoft, for a time. Not so much now.
For VMware, the model is expiring quickly. Very quickly.
I am sure there are customers who - deeply invested in VMware - will continue buying VMware products. A small fraction.
I stepped away from using VMware for a couple of years (doing OpenStack work). When I stepped back and tried to use our local VMware infrastructure, I lost more than a day running into a forest of VMware bugs.
Talk to folk who are using VMware in production, and you find they are (mostly) all on different paths. Once through the VMware forest of bugs, each was able to deliver value to their employer, but by a different route. Some want to preserve the value of that investment. More want less noise in their work.
At present, VMware has piles of money, and can afford to invest in the space needed by private clouds - network virtualization (OpenFlow) - without which they would become irrelevant even faster. This investment buys them a bit of time, but they will still become irrelevant.
VMware was at the first wave of virtualization, and did well. We are well past that.
The second wave was Amazon’s AWS - which remains amazingly competitive. But there will always be use-cases that require cost/performance/control measures that public clouds cannot meet.
Amazon as the second virtualization wave
Amazon built a cloud for their own purposes, and chose to sell excess capacity. They did well. Aggressive prices and smart APIs made their offering attractive. They collected rich metrics, and were able to build out their clouds in a cost-effective manner.
Be fully aware of the value of metrics. The hardware you put into a cloud should depend entirely on the pattern of use you are trying to meet. You find the pattern of use through metrics. Good metrics allow you to be efficient. AWS did very well building out their clouds based on collected metrics.
But the most-efficient global case is not the most-efficient local case.
Private clouds as the third virtualization wave
Public clouds cannot overcome Physics. Local compute resources can be both cheaper and more performant than remote public clouds.
This is not easy. For local clouds you need to determine baseline loads (better served locally), and compute what fraction of transient load should run locally versus on public clouds.
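As a toy illustration of that computation - with entirely hypothetical rates and a deliberately simplified linear cost model - the break-even arithmetic might be sketched like this:

```python
# Toy model: pick how much always-on capacity to run locally, and burst
# the overflow to a public cloud. All rates and loads are hypothetical.

def total_cost(hourly_load, local_capacity, local_rate, cloud_rate):
    """Cost of serving an hourly load profile (in compute-units).

    local_rate: amortized cost per unit-hour of owned hardware,
                paid whether the unit is busy or idle.
    cloud_rate: on-demand price per unit-hour in the public cloud.
    """
    hours = len(hourly_load)
    local = local_rate * local_capacity * hours          # always-on local units
    burst = cloud_rate * sum(max(0, load - local_capacity)  # overflow to cloud
                             for load in hourly_load)
    return local + burst

def best_local_capacity(hourly_load, local_rate, cloud_rate):
    """Local capacity (in whole units) that minimizes total cost."""
    return min(range(max(hourly_load) + 1),
               key=lambda c: total_cost(hourly_load, c,
                                        local_rate, cloud_rate))

if __name__ == "__main__":
    # One week of hourly demand from (hypothetical) metrics: a steady
    # baseline around 10 units, with daytime spikes up to 25.
    load = ([10] * 8 + [20] * 8 + [25] * 4 + [12] * 4) * 7
    cap = best_local_capacity(load, local_rate=0.04, cloud_rate=0.10)
    print("local capacity:", cap)
    print("all-cloud cost:", total_cost(load, 0, 0.04, 0.10))
    print("hybrid cost:   ", total_cost(load, cap, 0.04, 0.10))
```

With these made-up numbers, the answer lands where you would expect: the steady baseline and the regular daytime load are cheapest on owned hardware, while only the short spikes overflow to the public cloud. The whole exercise depends on having the hourly load profile in the first place - which is exactly the metrics problem.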
To make that computation, you need metrics. The planet-scale public clouds have rich metrics. The private clouds are still in gestation, and not as focused on metrics.
A vendor whose offerings span a universe of private clouds could develop a razor focus on actual patterns of use. Lacking metrics, vendors are reduced to wild guessing.
Private clouds can (and will) become optimal, but require a change in model by both customers and vendors. In the end, Physics wins.
Vendors to private clouds are still in the wild-guessing stage.
The optimal solution
Most variable load will end up on remote/public clouds. Most baseline load will end up on local/private clouds.
Note there is a huge … gigantic wildcard here.
In theory, public cloud vendors could identify load that should be local, and deploy local hardware. So far, they have failed.
In theory, hardware vendors could identify load that should be local, but so far have failed. (Metrics are not in their DNA.)
This is the story of the next decade. Variable load will land in the public cloud. Baseline load will land in private/local clouds.
The mega-scale public clouds demand and acquire custom hardware, based on collected metrics and predicted usage. The vendors across private clouds need similar metrics to drive their designs. This change in behavior is going to filter out old-school hardware vendors. The ability to collect and act on common metrics is key.
Hardware in private clouds will eventually be driven by well-collected metrics, but this transition will take time. Our present hardware needs to be re-factored for the cloud. Once again, our present/common choices do not make sense. Time to move forward.
Private clouds can and will be built from custom hardware based on horizontal-metrics, collected across customers. Vendors and sites that fail this exercise will do poorly. Site/vendors that meet this new model will do well. The model is changing.