Labworks 1:1-2 : I Heart the ARC & Let’s Pull Some Drives!

Last week on Labworks’ debut, Labworks #1 : Building a Durable & Performance Oriented ZFS Box for Hyper-V & VMware, I discussed & shared a few tips, observations & excellent resources for building out a storage layer for your home IT lab. It runs on Sun’s, er, Oracle’s, er, the open source community’s (via Illumos) awesome Zettabyte File System, courtesy of the excellent NAS4Free crew and FreeBSD.

The post has gotten quite a bit of traffic, and I hope it’s been helpful to folks. I intended to do the follow-up posts soon after that, but boy, have I had a tough week in technology.

Let’s hop to it, shall we?


When we left Labworks 1, I assigned myself some homework. Here’s an update on each of those tasks, the grade I’d give myself (I went to Catholic school so I’m characteristically harsh) and some notes on the status:

[table]
Next Step, Completed, Grade, Notes
Find out why my writes suck, Kind of, B-, Replaced switch & deep-dived the ZIL
Test NAS4Free’s NFS performance, No, F, One Pink Screen of Death too many
Test SMB 3.0 from a VM inside the ZFS box, No, F, Block vs. file bake-off plans
Sell some stuff, No, C, Other priorities
Rebuild rookie standard switch as a distributed switch, No, F, Can’t build a vSwitch without a VMware host
[/table]

I have updates on all of these items, so if you’re curious, stick around; they’ll be posted in subsequent Labworks. Suffice it to say, there have been some infrastructure changes in the Daisetta Lab, so here’s an updated physical layout with a skull & crossbones over my VMware host, which I put out of its misery last week.

Lab 1a - Daisetta Labs

In the meantime, I wanted to share some of the benefits of ZFS for your Hyper-V or VMware lab.

1:1 – I Heart the ARC

So I covered some of the benchmark results in Labworks 1, but I wanted to get practical this week. Graphs & benchmarks are great, but how does ZFS storage architecture benefit your virtualization lab?

Dramatically.

At least in the case of Hyper-V, Cluster Shared Volumes (CSVs) and dynamic .vhdx files on iSCSI.

To really show how it works, I had to zero out my ARC, empty the L2ARC, and reset the read/write counters on each physical volume. And to do that, I had to reboot SAN2. My three virtual machines (a Windows 7 VM, a SQL 2014 VM, and a Virtual Machine Management server) had to be shut down, and just to do it right and by the book, I placed both the CSV & the LUN mapped to Node-1 into maintenance mode.

And then I started the whole thing back up. Here are the results, followed by two animated GIFs. Remember, the ARC lives in your system RAM and the L2ARC lives on your SSDs, so watch how both grow as ZFS starts caching my VMs’ data:

[table]
ARC Size (Cold Boot), ARC Size (After VM Boot), ARC Size (+5 h), L2ARC Size (Cold Boot), L2ARC Size (After VM Boot), L2ARC Size (+5 h)
7 MB, 10 GB, 14 GB, 900 KB, 4.59 GB, 6.8 GB
[/table]
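If you’d rather watch these numbers on your own box than in GIFs, FreeBSD exposes the ARC counters through sysctl. The sketch below assumes the `kstat.zfs.misc.arcstats` node and its `size`, `l2_size`, `hits`, `misses`, `l2_hits` and `l2_misses` entries, which is where FreeBSD-based ZFS builds keep them as far as I know; run `sysctl kstat.zfs.misc.arcstats` on your NAS4Free version to confirm before relying on it. The helper names are my own:

```python
import subprocess


def parse_arcstats(text):
    """Parse `sysctl kstat.zfs.misc.arcstats` output ("name: value" per line)."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        key = name.strip().rsplit(".", 1)[-1]  # keep the last component, e.g. "size"
        value = value.strip()
        if value.isdigit():
            stats[key] = int(value)
    return stats


def read_arcstats():
    """Query the live counters; only works on a FreeBSD/NAS4Free box with ZFS loaded."""
    out = subprocess.run(["sysctl", "kstat.zfs.misc.arcstats"],
                         capture_output=True, text=True, check=True).stdout
    return parse_arcstats(out)


def summarize(stats):
    """One-line summary of ARC/L2ARC size (GiB) and hit ratios."""
    gib = 1024 ** 3
    arc_total = stats.get("hits", 0) + stats.get("misses", 0)
    l2_total = stats.get("l2_hits", 0) + stats.get("l2_misses", 0)
    arc_ratio = stats.get("hits", 0) / arc_total if arc_total else 0.0
    l2_ratio = stats.get("l2_hits", 0) / l2_total if l2_total else 0.0
    return (f"ARC {stats.get('size', 0) / gib:.2f} GiB (hit ratio {arc_ratio:.1%}), "
            f"L2ARC {stats.get('l2_size', 0) / gib:.2f} GiB (hit ratio {l2_ratio:.1%})")
```

On the ZFS box itself, `print(summarize(read_arcstats()))` in a loop (or from cron) would give you the same cold-boot-to-+5h growth story as the table, plus hit ratios.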

So for you in your lab, or if you’re pondering similar tech at work, what’s this mean?

Boot speed of your VM fleet is the easiest to quantify and the greatest to behold, but that’s just for starters.

ZFS’ ARC & L2ARC shaved over 80% off my VM’s boot times and massively reduced load on rotational disks on the second boot.

Awesome stuff:

[table]
Win7 Cold Boot to Login, Highest ZVol %busy (Cold Boot), SSD Read/Write Ops (Cold Boot), Win7 2nd Boot to Login, Highest ZVol %busy (2nd Boot), SSD Read/Write Ops (2nd Boot)
121 s, 103%, 8/44, 19.9 s, 13%, 4/100k
[/table]

The gains here are enormous and hint at why SSDs & caching are so attractive. Done right, the effect is multiplicative: you’re not just adding IOPS when you add an SSD, you’re multiplying storage performance many times over in certain scenarios. VM boot is one such scenario, and the effect is very dramatic:

[table]
% Improvement in Boot Time, % ZVol %Busy Decrease, % ARC Growth, % L2ARC Growth
84%, -87%, 43%, 410%
[/table]
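The two boot-related columns can be reproduced from the raw Win7 numbers in the previous table with a plain percent-change calculation. The ARC and L2ARC growth columns are left out here because the exact measurement points behind them aren’t shown; `pct_change` is just an illustrative helper name:

```python
def pct_change(before, after):
    """Percent change from `before` to `after`; negative means a decrease."""
    return (after - before) / before * 100


# Win7 figures from the boot table (cold boot vs. second boot).
boot_cold_s, boot_warm_s = 121.0, 19.9
busy_cold, busy_warm = 103.0, 13.0

boot_change = pct_change(boot_cold_s, boot_warm_s)  # about -83.6
busy_change = pct_change(busy_cold, busy_warm)      # about -87.4

print(f"Boot time improvement: {-boot_change:.0f}%")
print(f"ZVol %busy change: {busy_change:.0f}%")
```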

This is great news if you’re building lab storage because, as I said in Labworks 1, if you’re going to have to use an entire physical box for storage, best to use every last bit of that box for storage, including RAM. ZFS is the only non-commercial system I know of to give you that ability, and though the investment is steep, the payoff is impressive. 
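Why does caching multiply performance rather than just add to it? A simplified way to see it is the standard effective-access-time formula: average read latency is the hit-ratio-weighted mix of cache latency and disk latency. The latencies below are illustrative assumptions, not numbers measured on SAN2, and the model ignores the L2ARC tier, prefetch and queueing:

```python
def effective_latency_ms(hit_ratio, cache_ms, disk_ms):
    """Average read latency when a fraction `hit_ratio` of reads is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


# Hypothetical latencies: ~0.1 ms for a cached read, ~10 ms for a random
# read from a 7,200 RPM disk.
for h in (0.0, 0.5, 0.9, 0.99):
    print(f"hit ratio {h:.0%}: {effective_latency_ms(h, 0.1, 10.0):.2f} ms")
```

Note that the last few points of hit ratio matter most: going from 90% to 99% cuts average latency roughly fivefold in this model, because the slow disk term dominates until nearly every read is a hit.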

Now, at work, imagine you have a fleet of 50 virtual machines, or 100 or more, and you have to boot or reboot them on a Saturday during your maintenance window. Without some sort of caching mechanism, be it a ZFS ARC & its MRU/MFU algorithms or some of the newer stuff we saw at #VFD3, including Coho’s system & Atlantis’ ILIO USX, you’re screwed.

Kiss your Saturday goodbye, because on old rotational arrays you’re going to have to stagger your boots, spread them over two Saturdays, or suffer the hockey-stick curve of filer entropy & death as more IO begets more IO delay, a vicious cycle of decay that ends with you banging your fists bloody on the filer, begging the storage gods for mercy and relief.

Oh man that was a painful Saturday four years ago.
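That "more IO begets more IO delay" spiral is, in simplified form, basic queueing theory. The sketch below uses the textbook M/M/1 model (random arrivals, a single server), where mean response time is service time divided by one minus utilization. A real filer with many spindles and controller cache behaves differently, and the 8 ms service time is an assumption, but the shape of the curve is the point: latency stays tame until utilization nears 100%, then shoots up.

```python
def mm1_response_time(service_ms, utilization):
    """Mean response time of an M/M/1 queue: R = S / (1 - rho)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1); at 1.0 the queue grows without bound")
    return service_ms / (1.0 - utilization)


# Hypothetical 8 ms average service time per IO on a busy spindle.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"{rho:.0%} busy -> {mm1_response_time(8.0, rho):.0f} ms per IO")
```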

I wish I could break down these results even further: what percentage of that 19.9 s boot time is due to my .vhdx being stored in SAN2’s ARC, and what percentage, if any, is due to ZFS compression on the volume, with the CPU working on the IO ‘stream’ itself, since I’ve got that particular box ticked on CSV1 as well?

That’s important to understand for lab work or your real job, because SSDs & caching are only half the reason the stodgy storage sector has been turned on its head. Step back and survey the new players vs. the old, and I think you’ll find that many of the new players are reading & writing data to/from their arrays in more intelligent (or risky, depending on your perspective) ways: leveraging the CPU to compress inbound IO, de-duping on the front end rather than the back end, or, in Coho’s case, just handing the switch & Layer 2 over to the array itself in wild yet amazing & extensible ways.

My humble NAS4Free box isn’t near those levels of sophistication, yet I don’t think it’s improper to draw an academic-family-tree-style dotted line between my ZFS lab storage & some of the great new storage products on the market. They too use sophisticated caching algorithms & compression/processing to deliver high-performance storage downmarket, so far downmarket that I’ve got fast storage in my garage!

Perhaps a future Labworks will explore compression vs. caching, but for now, let’s take a look at what ZFS is doing during the cold & warm boots of my VMs.

Single Pane O’GifGlass animated shot of the cold boot (truncated):

In the PuTTY window, ada0-5 are HDDs, ada6 & ada7 are SSDs, and ada8 is the boot drive. gstat de-abstracts ZFS & shows you what your disks are doing. Check out how ZFS alternates writes between the two SSDs. Neat stuff.
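If you’d rather log that disk activity than squint at a GIF, gstat’s batch mode (`gstat -b`, which prints one snapshot and exits) can be captured and parsed. A rough sketch, assuming the default column layout where each device row ends in `%busy| name`; extra flags may change the columns, so double-check against your own output. The function names are my own:

```python
def parse_gstat(text):
    """Return {device: %busy} from the text output of FreeBSD `gstat -b`.

    Assumes the default layout, where each device row ends with
    "<%busy>| <name>"; the "dT:" line and the column header have no "|".
    """
    busy = {}
    for line in text.splitlines():
        stats, sep, name = line.rpartition("|")
        if not sep:
            continue  # not a device row
        fields = stats.split()
        try:
            busy[name.strip()] = float(fields[-1])  # last column before "|" is %busy
        except (IndexError, ValueError):
            continue  # malformed row; skip it
    return busy


def busiest(busy, n=3):
    """Top-n devices by %busy, highest first."""
    return sorted(busy.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Capturing `gstat -b` every few seconds during a cold boot and feeding it to `busiest()` would show the same story as the GIF: spindles pegged on the first boot, SSDs doing the work on the second.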

And the near-#StorageGlory GifCam shot of the entire 19.9 s second boot cycle, after my ARC & L2ARC are sufficiently populated:

80% decrease in boot times thanks to the ARC & L2ARC. Now ZFS has some idea of what my most frequently used & most recently used data is, and its caching algorithm uses that to populate the ARC & L2ARC.
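For the curious, here’s a toy version of the idea behind that MRU/MFU split: the Adaptive Replacement Cache algorithm published by Megiddo & Modha, which the ZFS ARC is derived from. This is the textbook algorithm over block IDs, not ZFS’s implementation, which differs substantially (variable-size buffers, the L2ARC feed, memory-pressure handling and more). `t1`/`t2` hold recently vs. frequently used blocks, `b1`/`b2` are "ghost" lists remembering recent evictions, and `p` is the adaptive target size of `t1`; the class name is illustrative:

```python
from collections import OrderedDict


class ARCSketch:
    """Textbook ARC (Megiddo & Modha) over block IDs; illustrative only, not ZFS code."""

    def __init__(self, c):
        if c < 1:
            raise ValueError("capacity must be at least 1")
        self.c = c               # cache capacity in blocks
        self.p = 0.0             # adaptive target size for t1
        self.t1 = OrderedDict()  # resident, seen once recently (MRU side)
        self.t2 = OrderedDict()  # resident, seen at least twice (MFU side)
        self.b1 = OrderedDict()  # ghosts recently evicted from t1
        self.b2 = OrderedDict()  # ghosts recently evicted from t2

    def _replace(self, key):
        # Evict from t1 or t2 depending on the target p; the victim becomes a ghost.
        if self.t1 and ((key in self.b2 and len(self.t1) == self.p) or len(self.t1) > self.p):
            victim, _ = self.t1.popitem(last=False)  # LRU end of t1
            self.b1[victim] = None
        else:
            victim, _ = self.t2.popitem(last=False)  # LRU end of t2
            self.b2[victim] = None

    def access(self, key):
        """Reference a block; return True on a cache hit, False on a miss."""
        if key in self.t1:  # second hit: promote to the frequent list
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:  # repeat hit: refresh recency within t2
            self.t2.move_to_end(key)
            return True
        if key in self.b1:  # ghost hit: recency deserves more room
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:  # ghost hit: frequency deserves more room
            self.p = max(0.0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = None
            return False
        # Complete miss: make room if needed, then insert at the MRU end of t1.
        l1 = len(self.t1) + len(self.b1)
        total = l1 + len(self.t2) + len(self.b2)
        if l1 == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(key)
            else:
                self.t1.popitem(last=False)
        elif total >= self.c:
            if total == 2 * self.c:
                self.b2.popitem(last=False)
            self._replace(key)
        self.t1[key] = None
        return False
```

A block touched once lands in `t1`; touch it again and it’s promoted to `t2`, which is roughly why a second boot of the same VM finds its hot blocks already waiting in cache.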

Of course, how often are we rebooting VMs anyway? Fair point.

One could argue the results above, while interesting, have limited applicability in a lab, a small enterprise or even a large one, but consider this: if you deliver applications via session virtualization technologies (XenApp or RDS come to mind) on top of a hypervisor (virtualization within virtualization for the win!), then ZFS and other caching systems will likely ease your pain and get your users to their applications faster than you could ever achieve with rotational storage alone. So in my book, it’s something you should master and understand.

Durability Testing

So all this is great. ZFS performs very well for Hyper-V, the ARC/L2ARC paradigm works, and it’s all rather convincing, isn’t it? I’ll save some thoughts on writes for a subsequent Labworks, but so far, things are looking up.

Of course, you can’t be in IT and not worry about the durability & integrity of your data. As storage guys say, all else is plumbing; when it comes to data and storage, an array has to guarantee integrity.

This is probably the most enjoyable test of all IT testing regimes, if only because it’s so physical, so dramatic, so violent, and so rare. I’m talking about drive pulls & storage failure simulations, the kind of test you only get to do during a PoC at work, and even then, for SMB guys like me, perhaps only once every few years.

As I put it back in January when I was testing a Nimble array at work, “Wreck that array.”

At home, of course, I can’t afford true N+1 everywhere, let alone spend disks on something approaching the reliability of NetApp’s RAID-DP, but I can at least test RAIDZ2, ZFS’ equivalent to RAID 6.
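What lets a double-parity layout survive two pulled drives? RAIDZ2, like RAID 6, keeps two independent parity values per stripe: P, a plain XOR, and Q, a Reed-Solomon-style syndrome computed in the finite field GF(2^8). Two equations let you solve for two unknowns. The sketch below is the classic RAID 6 P+Q math (generator 2, polynomial 0x11d, as described in H. Peter Anvin’s Linux RAID 6 paper). ZFS’s RAIDZ2 uses the same kind of P/Q parity, but its on-disk layout, column ordering and variable stripe width differ, so treat this as an illustration of the principle, not of ZFS’s code. Each "disk" here is just a short byte string, and the function names are my own:

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    # The nonzero elements form a group of order 255, so a^254 is a's inverse.
    return gf_pow(a, 254)


def make_parity(disks):
    """Compute P (XOR) and Q (XOR-sum of 2^i * D_i) over equal-length data disks."""
    p = bytearray(len(disks[0]))
    q = bytearray(len(disks[0]))
    for i, disk in enumerate(disks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(disk):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(disks, x, y, p, q):
    """Rebuild missing data disks x and y (entries may be None) from survivors, P and Q."""
    if x == y:
        raise ValueError("x and y must differ")
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        pxy, qxy = p[j], q[j]
        for i, disk in enumerate(disks):
            if i in (x, y):
                continue
            pxy ^= disk[j]                        # strip survivors: leaves Dx ^ Dy
            qxy ^= gf_mul(gf_pow(2, i), disk[j])  # leaves gx*Dx ^ gy*Dy
        dx[j] = gf_mul(qxy ^ gf_mul(gy, pxy), inv)  # (gx ^ gy) * Dx = qxy ^ gy*pxy
        dy[j] = pxy ^ dx[j]
    return bytes(dx), bytes(dy)
```

Pull any two of the data "disks" and `recover_two()` rebuilds them from the survivors plus P and Q; with three missing there are three unknowns and only two equations, which is why RAIDZ2 tolerates two failed drives per vdev and not three.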

Drive Pull test below. Will my CSVs stay online? Click play.

More Labworks results tomorrow!
