Labworks #1: Building a durable, performance-oriented ZFS box for Hyper-V, VMware
Primary Goal: To build a durable and performance-oriented storage array using Sun’s fantastic, 128-bit, high-integrity Zettabyte File System (ZFS) for use with Lab Hyper-V CSVs & Windows clusters, VMware ESXi 5.5, and other hypervisors.
The ARC: My RAM makes your SSD look like a couple of old, wheezing 15k drives
Secondary Goal: Leverage consumer-grade SSDs to multiply performance by using them as a ZFS Intent Log (ZIL) write cache and an L2ARC read cache (a rough zpool sketch follows the technologies list below)
Bonus: The Windows 7 PC in the living room running Windows Media Center with CableCARD & HDHomeRun is running out of DVR disk space; Media Center can’t record to SMB shares, but it can record to iSCSI LUNs.
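For what it’s worth, Windows 7 doesn’t ship the newer iSCSI PowerShell cmdlets, but the built-in iscsicli.exe is enough to attach a LUN from the command line; the portal address and target IQN below are placeholders, not the actual lab values.

```
rem Register the ZFS box's iSCSI portal with the Windows 7 initiator (placeholder IP)
iscsicli QAddTargetPortal 172.16.0.5

rem See which targets the portal advertises, then log in to the DVR target (placeholder IQN)
iscsicli ListTargets
iscsicli QLoginTarget iqn.2014-03.lab.san2:dvr

rem Once logged in, bring the new disk online and format it in Disk Management (diskmgmt.msc)
```

From there, Windows Media Center treats the LUN like any other local disk.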
Technologies used: iSCSI, MPIO, LACP, Jumbo Frames, IOMETER, SQLIO, ATTO, Robocopy, CrystalDiskMark, FreeBSD, NAS4Free, Windows Server 2012 R2, Hyper-V 3.0, converged switch, VMware standard switch, Cisco SG300
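As for the secondary goal above, attaching consumer SSDs as ZIL and L2ARC comes down to a couple of zpool one-liners; the pool and device names here are placeholders for whatever SAN2 actually uses.

```
# Attach two SSD partitions as a mirrored ZFS Intent Log / SLOG (placeholder device names)
zpool add tank log mirror gpt/slog0 gpt/slog1

# Attach another SSD as an L2ARC read cache device
zpool add tank cache gpt/l2arc0

# Confirm the new log and cache vdevs show up healthy
zpool status tank
```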
I picked the Gigabyte board above because it’s got an outstanding eight SATA 6Gb/s ports, all running on the native AMD A88X Bolton-D4 chipset, which, it turns out, isn’t well supported in Illumos (see Lab Notes below).
I added to that a cheap $20 Marvell 88SE9128 two-port SATA 6Gb/s PCIe card, which hosts the boot volume & the SanDisk SSD.
| Disk Type | Quantity | Size | Format | Speed | Function |
|---|---|---|---|---|---|
| WD Red 2.5″ with NASware | 6 | 1TB | 4KB AF | SATA 3, 5400 RPM | Zpool members |
I’m not finished with all the benchmarking, which is notoriously difficult to get right, but here’s a taste. Expect a followup soon.
All shots below involved lz4 compression on SAN2
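Enabling that compression is a one-liner per dataset; the dataset name below is a placeholder, and on older NAS4Free/FreeBSD builds without lz4 support, lzjb is the fallback.

```
# Turn on lz4 compression for the dataset backing the test LUNs (placeholder name)
zfs set compression=lz4 tank/csv1

# See how much you're actually getting back
zfs get compression,compressratio tank/csv1
```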
SQLIO Short Test:
Obviously seeing the benefit of ZFS compression & the ARC at the front end. IOPS become more realistic toward the middle and right as the read cache is exhausted. Consistently around 150-240MB/s though, which is the limit of two 1GbE cables.
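For anyone who wants to run a comparable short test, a SQLIO invocation looks roughly like this; the thread count, block size and test file path are illustrative, not necessarily the exact parameters behind these screenshots.

```
rem Two-minute random-read pass: 4 threads, 8 outstanding IOs, 64KB blocks, latency stats
sqlio -kR -t4 -o8 -b64 -frandom -s120 -LS -BN T:\sqlio-test.dat

rem The matching write pass, for comparing against the read numbers
sqlio -kW -t4 -o8 -b64 -frandom -s120 -LS -BN T:\sqlio-test.dat
```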
ATTO standard run:
I’ve got a big write problem somewhere. Is it the ZIL SSDs, which don’t seem to be performing under BSD as they did under Nexenta? Something else? It could also be related to the Test Volume being formatted NTFS with a 64KB allocation unit. Still trying to figure it out.
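While I chase that down, the quickest checks I know of are to watch per-vdev activity during a write run and to confirm what the dataset is doing with sync writes; the pool and dataset names below are placeholders.

```
# Watch per-vdev throughput (including the log devices) every 5 seconds during a write benchmark
zpool iostat -v tank 5

# Check whether sync writes are actually flowing through the ZIL for this dataset
zfs get sync,logbias tank/csv1
```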
NFS Tests:
None so far. From a VMware perspective, I want to rebuild the standard switch as a distributed switch now that I’ve got a vCenter appliance running, but that’s not my priority at the moment.
Durability Tests:
Pulled two drives (the limit for RAIDZ2) under normal conditions. Put them back in, saw some alerts about the “administrator pulling drives” and the Zpool being in a degraded state. My CSVs remained online, however. Following a short zpool online command, both drives rejoined the pool and the degraded error went away.
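For the record, the recovery amounted to something like the following, with placeholder pool and device names:

```
# Pool reports DEGRADED with the two pulled members flagged after reinsertion
zpool status tank

# Bring the reinserted disks back into the pool
zpool online tank da4 da5

# Optional: scrub afterward to verify everything is consistent
zpool scrub tank
```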
Fun shots:
Because it’s not all about repeatable lab experiments. Here’s a GifCam shot from Node-1 as it completely saturates both of its 1GbE Intel NICs:
and some pretty blinking lights from the six 2.5″ drives:
Lab notes & Lessons Learned:
First off, I’d like to buy a beer for the unknown technology enthusiast/lab guy who uttered these sage words of wisdom, which I failed to heed:
You buy cheap, you buy twice
Listen to that man, would you? Because going consumer, while tempting, is not smart. Learn from my mistakes: if you have to buy, buy server boards.
Secondly, I prefer NexentaStor to NAS4Free with ZFS, but like others, I worry about and have been stung by OpenSolaris/Illumos hardware support. Most of that is my own fault (cf. the note above), but still: does Illumos have a future? I’m hopeful: NexentaStor is going to appear at next month’s Storage Field Day 5, which is a good sign, and version 4.0 is due out any time.
The Illumos/Nexenta command structure is much more intuitive to me than FreeBSD’s. In place of your favorite *nix commands, Nexenta employs some great verb-noun show commands, and DTrace, the excellent diagnostic/performance tool included in Solaris, is baked right into Nexenta. In NAS4Free/FreeBSD 9.1, you’ve got to add a few packages to get the equivalent stats for the ARC, L2ARC and ZFS, and adding DTrace involves a make & kernel modification, something I haven’t been brave enough to try yet.
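For the curious, the raw ARC counters are already exposed on FreeBSD via sysctl; the sysutils/zfs-stats port just wraps them in a friendlier report. A rough sketch:

```
# Raw ARC counters from the FreeBSD ZFS kstat bridge
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

# Friendlier summaries once the zfs-stats port/package is installed
zfs-stats -A   # ARC summary
zfs-stats -L   # L2ARC statistics
```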
Next: Jumbo frames for the win. From Node-1, the desktop in my office, my Core i5-4670K CPU would regularly hit 35-50% utilization during my standard SQLIO benchmark before I configured jumbo frames end-to-end. Now, after enabling jumbo frames on the Intel NICs, the Hyper-V converged switch, the SG-300 and the ZFS box, utilization peaks at 15-20% during the same SQLIO test, and the benchmarks have shown an increase as well. Unfortunately, in the FreeBSD world, jumbo frames are something you have to set on the interface & in the routing table, and for me the setting doesn’t persist across reboots, though that may be due to a driver issue on the Broadcom card (rough sketch below).
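For reference, here’s roughly what the FreeBSD side of that looks like; the interface name, subnet and addresses are placeholders, and on NAS4Free the MTU is better set through the WebGUI so its config regeneration doesn’t undo it at reboot.

```
# Set a 9000-byte MTU on the storage NIC for the current session (placeholder interface)
ifconfig bce0 mtu 9000

# Nudge the route to the storage subnet to advertise the larger MTU too
route change -net 172.16.0.0/24 -mtu 9000

# On plain FreeBSD, persist it in /etc/rc.conf:
#   ifconfig_bce0="inet 172.16.0.20 netmask 255.255.255.0 mtu 9000"

# Verify end-to-end from a Windows node with a don't-fragment ping:
#   ping -f -l 8500 172.16.0.20
```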
The Western Digital 2.5″ drives aren’t stellar performers and they aren’t cheap, but boy are they quiet and well-built, and they run cool, asking politely for only 1 watt under load. I’ve returned the hot, loud & failure-prone HGST 3.5″ 2TB drives I borrowed from work; it’s too hard to fit them in a short-depth chassis.
Lastly, ZFS’ adaptive replacement cache, which I’ve enthused over a lot in recent weeks, is quite the value & performance multiplier. I’ve tested Windows Server 2012 R2’s tiered Storage Spaces model, and while I was impressed with its responsiveness, ReFS, and ability to pool storage in interesting ways, nothing can compete with ZFS’ ARC model. It’s simply awesome; deceptively simple, but awesome.
The lesson is that if you’re going to lose an entire box to storage in your lab, your chosen storage system had better use every last ounce of that box, including its RAM, to serve storage up to you. 2012 R2 doesn’t, but I’m hopeful that it soon may (Update 1, perhaps?).
Here’s a cool screenshot from Nexenta, my last build before I re-did everything, showing ARC hits following a cold boot of the array (top), and a few days later, when things are really cooking for my stored Hyper-V VMs, which are getting tagged with ZFS’ “Most Frequently Used” category and thus getting the benefit of fast RAM & L2ARC:
Next Steps:
Find out why my writes suck so bad.
Test NAS4Free’s NFS performance
Test SMB 3.0 from a virtual machine inside the ZFS box
Sell some stuff so I can buy a proper SLC SSD for the ZIL
Re-build the rookie Standard Switch into a true Distributed Switch in ESXi
Links/Knowledge/Required Reading Used in this Post: