The Virtualization and Cloud Efficiency Myth

At the beginning of the 20th century in the US, life was difficult for all but the upper class.  While 80% of families in the US had a stay-at-home mother, the hours were grueling for both parents in the family.  Technology-driven innovation promised to change all that by making the life of a homemaker much easier and more efficient.


One can argue, though, that life didn’t become easier. While new technologies like refrigerators, vacuum cleaners, dishwashers and washing machines may have made life better, people seem to be working just as hard as in ages past, if not harder.  Technology increased the capacity for work, but instead of increasing leisure time, that excess capacity just shifted to other tasks. Case in point: the share of single-earner families dropped from 80% in 1900 to 24% in 1999.

Is this a good thing? Who knows, but if our lives aren’t better, it certainly isn’t the technology’s fault.

What does this have to do with virtualization and cloud?

Way back in 1998 when VMware was founded, virtualization presented a similar promise of ease and efficiency.  By allowing administrators to partition underutilized physical servers into ‘virtual’ machines, it was supposed to increase utilization and free up capital.  Unfortunately, for the most part, that hasn’t happened.  It’s a poorly kept secret that server utilization in enterprise datacenters is much lower than most people think, even as virtualization reaches saturation with about 75% of x86 servers now virtualized.


Conversation between Alex Benik, Battery Ventures and a Wall Street technologist:

Alex: Do you track server and CPU utilization?
Wall Street IT Guru: Yes
Alex: So it’s a metric you report on with other infrastructure KPIs?
Wall Street IT Guru: No way, we don’t put it in reports. If people knew how low it really is, we’d all get fired.


Cloud isn’t any better.  A cloud services provider I recently worked with found that over 70% of the virtual machines its customers provisioned were simply turned on and left on permanently, with utilization under 20%.  Google employees published a book with similar data.  Not very ‘cloudy’, is it?
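To make that pattern concrete, here is a minimal sketch of how one might flag VMs that are powered on for essentially a whole month while averaging under 20% CPU. Everything in it is an assumption for illustration: the `VmSample` record, the `idle_always_on` helper, the 720-hour window, and the sample fleet are all hypothetical, not data from the provider mentioned above.

```python
from dataclasses import dataclass


@dataclass
class VmSample:
    name: str
    hours_on: float     # hours powered on during the reporting window
    avg_cpu_pct: float  # average CPU utilization while powered on


def idle_always_on(vms, window_hours=720.0, on_ratio=0.99, cpu_threshold=20.0):
    """Return names of VMs left on for (nearly) the whole window with low CPU use."""
    return [
        vm.name
        for vm in vms
        if vm.hours_on / window_hours >= on_ratio and vm.avg_cpu_pct < cpu_threshold
    ]


# Hypothetical fleet, made up purely to exercise the function.
fleet = [
    VmSample("web-01", 720, 12.5),
    VmSample("batch-07", 96, 85.0),
    VmSample("db-02", 720, 47.0),
    VmSample("test-13", 720, 3.0),
]

flagged = idle_always_on(fleet)
share = len(flagged) / len(fleet)
print(flagged, f"{share:.0%} of fleet")
```

The thresholds are the interesting part: a VM that runs hot for a few hours and is then shut down (like `batch-07`) is what elastic usage is supposed to look like, while the flagged ones are just rented servers.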

At this point, most cloud pundits would suggest a technological solution, like stacking containers or something…

Four reasons why virtualization and cloud don’t drive significantly better utilization


Jevons Paradox

Jevons paradox holds that as a technology increases the efficiency of using a resource, the rate of consumption of the resource accelerates. Virtualization and cloud make access to resources (e.g. servers, storage) easier.  The key resource metric to think about here is the one the user sees: the number of workloads, not CPU cycles or GB of storage.  Easier access means more workloads, which leads to server sprawl, VM sprawl, storage array sprawl and so on, and hurts utilization because the growing number of nodes makes environments more complex to manage.
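A bit of back-of-the-envelope arithmetic shows how this plays out. The sketch below is only an illustration under made-up assumptions: the `fleet_utilization` helper and every number in it (an 8:1 consolidation ratio, a tenfold growth in VMs, a doubling of real work) are hypothetical, chosen to show the shape of the effect rather than to report measurements.

```python
def fleet_utilization(useful_work, vm_count, vms_per_host):
    """Return (hosts needed, average host utilization).

    useful_work is measured in fully busy host-equivalents.
    """
    hosts = -(-vm_count // vms_per_host)  # ceiling division
    return hosts, useful_work / hosts


# Before virtualization: 100 workloads, one per physical server,
# together doing 10 hosts' worth of real work.
before = fleet_utilization(useful_work=10, vm_count=100, vms_per_host=1)

# After: 8 VMs per host, but easy provisioning grows the VM count 10x
# while the real work only doubles.
after = fleet_utilization(useful_work=20, vm_count=1000, vms_per_host=8)

print(f"before: {before[0]} hosts at {before[1]:.0%}")
print(f"after:  {after[0]} hosts at {after[1]:.0%}")
```

Under these assumptions the host count goes up, not down (100 to 125), and average utilization only creeps from 10% to 16%, even though each host is nominally eight times more "efficient" at carrying workloads.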


Virtualization vendors and cloud service providers don’t want you to be efficient

Take a look at licensing agreements from virtualization vendors. Whether it’s a perpetual or utility license, priced per physical processor, per host, per VM, per GB, or by RAM consumed, it doesn’t matter: the less efficient you are, the more money they make.  Sure, companies like VMware, Amazon and Microsoft provide capacity management and optimization tools, and they may even make them part of standard bundles, but your account team has a negative incentive for you to use them. Is that why they didn’t help with deploying the tool?  And let’s be honest, if better usability reduces revenue, how much investment do you think the vendors are putting into user experience?  Cloud is no better. If you leave all your VMs on and leave multiple copies of your data sitting around unused, does Amazon make more or less money? That’s why 3rd party software from vendors like Stratacloud and SolarWinds is important.  Beware of capacity management solutions from the hardware, virtualization, and service providers. Unless the provider has a financial incentive to make you more efficient, chances are they’re bloatware.


IT organizations don’t reward higher utilization

Okay, maybe this has been acceptable in the past, but that’s changing.  In an era of flat or declining IT budgets and migration of IT spending authority to other lines of business, spending valuable resources and time on capacity optimization has been pushed way down on the list of priorities.  While meeting budget is an important KPI, utilization is typically not. Leadership has also become leery of ROI/TCO analysis, and rightly so, with IT project failure rates resulting in organizations losing an average of US$109 million for every US$1 billion spent on projects.  It’s not just about buying a tool to improve efficiency. Application architectures and processes also need rework, and all of this creates risk from an IT perspective.



Application architectures and processes need rework

As in the early 20th century example above, technology-driven efficiency doesn’t necessarily help people achieve their objectives. Without improvements in processes (and organizations), better technology can lead to unintended effects (e.g. virtualization sprawl).  As organizations acquire new skills, such as building application architectures that take advantage of cloud services and microservices, this will change over time. But the pace of change will still be governed by organizational and process change, not technology change.


Software defined ‘X’

Many have heard about software defined networking (SDN), software defined datacenters (SDDC), network functions virtualization (NFV), and so on.  At their core, these technologies are all about automation and ease of deployment.  What we’ve found so far is that, for the reasons above, this greater efficiency in provisioning new environments is likely to increase entropy, not decrease it.  Only by making the needed changes in an organization’s structure and processes will that complexity be manageable.  And this type of change will be much slower in coming than the technology itself.

Do you agree or disagree with any of the points I've made?  Let's have that discussion in the comments below.