Free disk space > 15% = Wasted money

Your enterprise’s mileage may vary, but everywhere I’ve worked, I’ve taken a pretty dogmatic approach to disk space utilization on VMs, especially ones hosting specialty workloads such as Engineering or financial applications.

And that dogma is: no workload is special enough that it needs more than 15% free disk space on its attached, non-boot volume.

This causes no end of consternation and panic among technicians who deploy & support software products.

“Don’t fence me in!” they shout via email. “Oh, give me space, lots of space on your stack of spindles, don’t fence me in. Let me write my .isos and .baks till the free space dwindles! Please, don’t fence me in,” they cry.

“I can’t stand your fences. Let my IO wander over yonder,” they bleat, escalating now to their manager and mine.

Look, I get it. Seeing that the D: drive is down to 18% free space makes such techs feel a bit claustrophobic. And I mean no disrespect to my IT colleagues who deploy/support these applications. I know they are finicky, moody things, usually ported from a *nix world into Windows. I get it. You are, in a sense, advocating for your customer (the Engineering department, or Finance) and you think I’m getting in your way, making your job harder and your deployment less than optimal.

But from my seat, if you’ve got more than 15% free space on your attached volume in production, you’re wasting my business’s money. I know disk space is cheap, but if I gave all the specialty software vendors what they asked for when deploying their product in my stack, my enterprise would:

  • Still have a bunch of physical servers each doing one workload, consuming electricity and generating heat instead of being hyper-rationalized on a few powerful hosts
  • Have lots of wasted RAM & disk resources: 400GB free on this one, 500GB free on that one, and pretty soon we’re talking about real storage space
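To put a number on that kind of waste, here is a minimal Python sketch of the arithmetic. The volume names and sizes are invented for illustration, and `excess_free_gb` is a hypothetical helper, not part of any real tool. It simply measures how much free space on each volume sits above the 15% ceiling.

```python
MAX_FREE_FRACTION = 0.15  # the 15% ceiling argued for in this post


def excess_free_gb(size_gb, free_gb, max_free=MAX_FREE_FRACTION):
    """Return the free space (GB) above the allowed free fraction, never negative."""
    allowed_free = size_gb * max_free
    return max(0.0, free_gb - allowed_free)


# Hypothetical inventory: (volume name, size in GB, free in GB)
volumes = [
    ("eng-app-d", 1000, 400),
    ("fin-app-d", 1000, 500),
    ("web-app-d", 200, 25),
]

# Sum the excess over every volume; volumes under the ceiling contribute 0.
total_excess = sum(excess_free_gb(size, free) for _, size, free in volumes)
print(f"Reclaimable across {len(volumes)} volumes: {total_excess:.0f} GB")
```

A volume that is already below the ceiling contributes nothing, so the total only counts space that could be handed back.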

One of the great things about the success of virtualization is that it killed off the sacred cows in your 42U rack. It gave those of us on the Infrastructure side of the house the ability to economize, to study the inputs to our stack and adjust the outputs based not on what the vendor wanted, or even what we in IT wanted, but on what the business required of us.

And so, as we enter an age in which virtualization is the standard (indeed, some would argue we passed that mark a year or two ago), we’ve seen various software vendors remove the “must be physical server” requirement from their product literature. Which is a great thing, because I got tired of fighting that battle.

But they still ask for too much space. If you need more than 15% free on any of the attached, MPIO-based, highly-available, high-performing LUNs I’ve given you, you didn’t plan something correctly. Here’s a hint: in modern IT, discrimination is not only allowed, but encouraged. I’m not going to provision you space on the best disk I have for backups, for instance. That workload will get a secondary LUN on my slow array!
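For anyone who wants to check a volume against the 15% rule, the following is a minimal sketch using Python’s standard `shutil.disk_usage`. The function names and the choice of `"."` as the path are assumptions for illustration; on a Windows VM you would pass something like `"D:\\"` instead.

```python
import shutil

MAX_FREE_FRACTION = 0.15  # the 15% ceiling argued for in this post


def free_fraction(path):
    """Return the fraction (0.0 to 1.0) of the volume holding `path` that is free."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    return usage.free / usage.total


def is_overprovisioned(path, max_free=MAX_FREE_FRACTION):
    """True when the volume has more free space than the ceiling allows."""
    return free_fraction(path) > max_free


# Check the volume that holds the current directory (illustrative path only).
path = "."
verdict = "over the ceiling" if is_overprovisioned(path) else "within the ceiling"
print(f"{path}: {free_fraction(path):.0%} free, {verdict}")
```

This only reports what the guest operating system sees. It says nothing about how the underlying LUN is provisioned on the array.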

Author: Jeff Wilson

20 yr Enterprise IT Pro | Master of Public Admin | BA in History | GSEC #42816 | Blogging on technology & trust topics at our workplaces, at our homes, and the spaces in between.

2 thoughts on “Free disk space > 15% = Wasted money”

  1. Hi JR-

    Some good points. Guess I did come across as the IT obstructionist that I work hard to avoid. But a few responses:

    “Unless you can accurately predict the usage patterns and how disk is used by each of your applications (and you can’t because you’re not the application specialist)”

    Actually in my role I am the application owner too. Not an expert in it, but I own it, deployed it, it’s in my charter if you will. I recognize that’s not usual but sometimes you have to wear storage guy hat & application guy hat.

    “Not only is disk space something that can’t be accurately predicted, but once a filesystem gets to over a certain percentage full, fragmentation can have a negative effect on performance. For example, with ZFS, the recommendation is to not go over 80%.”

    I’m talking about a LUN mapped through to a Windows virtual machine. The LUN itself is parked inside a large volume, next to several other LUNs with specific snapshot & dedupe rules applied to it. The volume never gets to 85% committed, nor does the agg containing it. ZFS or a NetApp would have no visibility into that LUN, and its used/free capacity only matters to the aggregate insofar as the sum of all LUNs & volumes = % committed in the agg. Right?
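    That accounting can be sketched roughly as follows in Python. The names and sizes are invented, and this is a deliberate simplification rather than how any particular array reports capacity: committed % is treated as the sum of what has been carved out of the aggregate divided by the aggregate’s size.

```python
def committed_pct(allocations_gb, aggregate_gb):
    """Percent of the aggregate committed to the volumes carved out of it."""
    if aggregate_gb <= 0:
        raise ValueError("aggregate size must be positive")
    return 100.0 * sum(allocations_gb) / aggregate_gb


# Hypothetical aggregate with two volumes (GB). The LUNs live inside vol_luns;
# how full NTFS is inside any one LUN is not an input to this calculation.
allocations = {"vol_luns": 4000, "vol_files": 2500}
aggregate_size_gb = 10000

pct = committed_pct(allocations.values(), aggregate_size_gb)
print(f"Aggregate committed: {pct:.1f}%")
```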

    “A better approach is to a) charge for disk allocated, and b) use thin provisioning.”

    Agreed on point A, but that supposes solid IT governance and business buy-in. But yes I love the charge-back idea.

    “Using thin provisioning allows us to allocate the extra disk without actually consuming the disk. Customer is happy because they aren’t running out of space, SAN admin is happy because disk isn’t unnecessarily wasted.”

    A thin-provisioned LUN mapped through to a Windows Server 2008 machine will format and show the maximum LUN size. So if your LUN is thin provisioned at 100G, the Windows server will, unless you manually partition it differently, show 100G. And thin provisioning on the LUN side can cause more headaches too (see your point on fragmentation: NTFS will write stuff where it wants to write). We mostly use thin provisioning at the volume level, with auto-grow policies on LUN & volume, but the customer, of course, can’t see that.

    Server 2012 is more attuned to living as a virtual machine with raw disks mapped to it, but sadly I don’t have that everywhere yet.

    “There is a reason why your users are escalating this to their managers and your manager and it’s not unjustified. Sorry.”

    My larger point still applies though. In this case, 10% of the LUN was being used for the storage of static, old backups.

    Had there been a requirement at the outset for attached disk for the purpose of housing static backups, I would have made it.

    If the business puts you in charge of managing a resource, you have to manage it. This would be even more problematic if the resource was really scarce and/or precious, wouldn’t it? Say you’ve got 4TB of ultra-fast SLC SSD storage, and 200TB of old SAS 10k spinners. You provision 750G of your SSD storage for a Priority A workload, only to find that six months later, 10-20% of it is being used to store backups, or something trivial (related to the workload, but trivial), something that is better stored elsewhere.

    Isn’t that waste?


  2. Sorry, I can’t agree with this approach.

    Your users may be able to tell you ahead of time how much disk they need, but mine can’t (and they’re responsible for developing the application(!)).

    I’ve personally witnessed a situation in the last week where one user (a customer) decided to run a report on a particularly large SQL Server dataset which had the side effect of gobbling up all the disk space for tempdb operations. Unless you can accurately predict the usage patterns and how disk is used by each of your applications (and you can’t because you’re not the application specialist), then setting a hard limit of 15% free is ludicrous.

    Not only is disk space something that can’t be accurately predicted, but once a filesystem gets to over a certain percentage full, fragmentation can have a negative effect on performance. For example, with ZFS, the recommendation is to not go over 80%.

    Yes, disk costs money, but so does downtime due to disk full errors and the need to fix it because “the SAN guy wouldn’t give us enough space”. All that does is make IT (or the storage team) come across as a bunch of know-it-all obstructionists.

    A better approach is to a) charge for disk allocated, and b) use thin provisioning.

    Charging for disk allocation is a way to make the business change its attitude to disk because while it’s cheap, it’s no longer free.

    Using thin provisioning allows us to allocate the extra disk without actually consuming the disk. Customer is happy because they aren’t running out of space, SAN admin is happy because disk isn’t unnecessarily wasted.

    There is a reason why your users are escalating this to their managers and your manager and it’s not unjustified. Sorry.

