It’s been a tough year for those of us in IT who engineer, deploy, support & maintain Microsoft technology products.
First, Windows 8 happened, which, as I’ve written about before, sent me into a downward spiral of confusion and despair. Shortly after that but before Windows 8.1, Microsoft killed off Technet subscriptions in the summer of 2013, telling Technet fans they should get used to the idea of MSDN subscriptions. As the fall arrived, Windows 8.1 and 2012 R2 cured my Chrome fever just as Ballmer & Crew were heading out the door.
Next, Microsoft took Satya Nadella out of his office in the Azure-plex and sat him behind the big mahogany CEO desk at One Microsoft Way. I like Nadella, but his selection spelled more gloom for Microsoft Infrastructure IT guys; remember it was Nadella who told the New York Times that Microsoft’s on-prem infrastructure products are old & tired and don’t make money for Microsoft anymore.
And then, this spring…first at BUILD, then TechEd, Microsoft did the unthinkable. They invited the Linux & Open source guys into the tent, sat them in the front row next to the developers and handed them drinks and party favors, while more or less making us on-prem Infrastructure guys feel like we were crashing the party.
No new products were announced for us at BUILD or at TechEd, ostensibly the event built for us. Instead, the TechEdders got Azured on until they were blue in the face, leading Ars’ @DrPizza to observe:
So basically, not a single on-prem announcement. I wonder what IT people think about that….
We think it feels pretty shitty, Dr. Pizza, that’s how. It feels like we’re about to be made obsolete, that we on the infrastructure side of the IT house are about to be disrupted out of existence by Jeffrey Snover’s cmdlets, Satya’s business sense, and something menacingly named the Azure Pack.
And the guys who will replace us are all insufferable devs, Visual Studio jockeys who couldn’t tell you the difference between a spindle and a port-channel, even when threatened with a C#.
Which makes it hurt even more Dr. Pizza, if that is your real name.
But it also feels like a wake-up call and a challenge. A call to end the cynicism and embrace this cloud thing because it’s not going away. In fact, it’s only getting bigger, encroaching more and more each day into the DMZ and onto the LAN, forcing us to reckon with it.
The writing’s on the wall fellow Microsofties. BPOS uptime jokes were funny in 2011 and Azure doesn’t go down anymore because of expired certs. The stack is mature, scalable, and actually pretty awesome (even if they’re still using .vhd for VMs, which is crazy). It’s time we step up, adopt the language & manners of the dev, embrace the cloud vision, and take charge & ownership of our own futures.
I’d argue that learning Microsoft’s cloud is so urgent you should be exploring it and getting experience with it even if your employer is cloud-shy and can’t commit. Don’t wait on them if that’s the case; do it yourself!
Because, if you don’t, you’ll get left behind. Think of the cloud as an operating system or technology platform and now imagine your resume in two, five, or seven years without any Office 365 or Azure experience on it. Now think of yourself actually scoring an interview, sitting down before the guy you want to work for in 2017 or 2018, and awkwardly telling him you have zero or very little experience in the cloud.
Would you hire that guy? I wouldn’t.
That guy will end up where all failed IT Pros end up: at Geek Squad, repairing consumer laptops & wifi routers and up-selling anti-virus subscriptions until he dies, sad, lonely & wondering where he went wrong.
Don’t be that guy. Aim for #InfrastructureGlory on-prem, hybrid, or in the cloud.
Over the coming days, I’ll show you how I did this on my own in a series of posts titled Cloud Praxis.
Link, On-prem/Hybrid/Cloud?, Notes
Cloud Praxis #2, On Prem, General guidance on building an AD lab to get started
Cloud Praxis #3, Cloud, Wherein I think about on-prem email and purchase an O365 E1 sub
Greetings to you Labworks readers, consumers, and conversationalists. Welcome to the last verse of Labworks Chapter 1, which has been all about building a durable and performance-oriented ZFS storage array for Hyper-V and/or VMware.
Today we’re going to circle back to the very end of Labworks 1:1, where I assigned myself some homework: find out why my writes suck so bad. We’re going to talk about a man named ZIL and his sidekick the SLOG and then we’re going to check out some Excel charts and finish by considering ZFS’ sync models.
But first, some housekeeping: SAN2, the ZFS box, has undergone minor modification. You can find the current array setup below. Also, I have a new switch in the Daisetta Lab, and as switching is intimately tied to storage networking & performance, it’s important I detail a little bit about it.
Labworks 1:4 – Small Business SG300 vs Catalyst 2960S
Cisco’s SG-300 & SG-500 series switches are getting some pretty good reviews, especially in a home lab context. I’ve got an SG-300 and really like it, as it offers a solid spectrum of switching options at Layer 2 as well as a nice Layer 3-lite mode, all for a tick under $200. It even has a real web interface if you’re CLI-shy, which I’m not, but some folks are.
Sadly for me & the Daisetta Lab, I need more ports than my little SG-300 has to offer. So I’ve removed it from my rack and swapped it for a 2960S-48TS-L from the office, but not just any 2960S.
No, I have spiritual & emotional ties to this 2960S, this exact one. It’s the same 2960S I used in my January storage bakeoff of a Nimble array, the same 2960S on which I broke my Hyper-V & VMware cherry in those painful early days of virtualization. Yes, this five-year-old switch is now in my lab:
Sure, it’s not a storage switch; in fact, it’s meant for IDFs and end-users. And if the guys on that great storage networking podcast from a few weeks back knew I was using this as a storage switch, I’d be finished in this industry for good.
But I love this switch and I’m glad it’s at the top of my rack. I saved 1U, the energy costs of this switch vs two smaller ones are probably a wash, and though I lost Layer 3 Lite, I gained so much more: 48 x 1GbE ports and full LAN-licensed Cisco IOS v 15.2, which, agnostic computing goals aside for a moment, just feels so right and so good.
And with the increased amount of full-featured switch ports available to me, I’ve now got LACP teams of three on agnostic_node_1 & 2, jumbo frames from end to end, and the same VLAN layout.
Here’s the updated Labworks schematic and the disk layout for SAN2:
Disk Type, Quantity, Size, Format, Speed, Function
WD Red 2.5″ with NASWARE, 6, 1TB, 4KB AF, SATA 3 5400RPM, Zpool Members
Samsung 840 EVO SSD, 1, 128GB, 512byte, SATA 3, L2ARC Read Cache
Labworks 1:5 – A Man named ZIL and his sidekick, the SLOG
Labworks 1:1 was all about building durable & performance-oriented storage for Hyper-V & VMware. And one of the unresolved questions I aimed to solve out of that post was my poor write performance.
Review the hardware table and you’ll feel like I felt. I got me some SSD and some RAM, I provisioned a ZIL, so write-cache that inbound IO already, ZFS, amiright? Show me the IOPS money, Jerry!
Well, about that. I mischaracterized the ZIL and I apologize to readers for the error. Let’s just get this out of the way: The ZFS Intent Log (ZIL) is not a write-cache device as I implied in Labworks 1:1.
The ZIL, whether spread out among your rotational disks by ZFS design or placed on a Separate Log Device (a SLOG), is simply a synchronous-write mechanism: a log designed to ensure data integrity and report back to the application layer (an IO ACK) that its writes are safe. The ZIL & SLOG are also a disaster-recovery mechanism; in the event of power loss, the ZIL, whether in-pool or on a SLOG device, ensures that the writes it logged prior to the event are written to your spinners when your disks are back online.
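To make that distinction concrete, here's a toy Python model of the flow described above. This is my own sketch, not real ZFS code: the sync write is logged so it can be acknowledged quickly, the data still reaches the spinners later via the normal RAM-buffered transaction flush, and the log is only ever read back after a crash.

```python
# Toy model of ZFS sync-write handling. Not real ZFS code -- just a
# sketch of the flow: log first, ACK, flush to the pool later, and
# replay the log only if we crashed before the flush completed.

class ToyZfs:
    def __init__(self):
        self.slog = []      # ZIL records on the log device
        self.ram = []       # dirty data buffered in RAM (the txg)
        self.pool = []      # data safely on the rotational disks

    def sync_write(self, data):
        self.slog.append(data)   # 1. log it (fast, sequential)
        self.ram.append(data)    # 2. buffer it like any other write
        return "ACK"             # 3. tell the app it's safe

    def txg_flush(self):
        # Periodic transaction-group commit: RAM -> spinners.
        self.pool.extend(self.ram)
        self.ram.clear()
        self.slog.clear()        # logged records no longer needed

    def crash_and_recover(self):
        # Power loss: RAM is gone; replay the ZIL into the pool.
        self.ram.clear()
        self.pool.extend(self.slog)
        self.slog.clear()

zfs = ToyZfs()
zfs.sync_write("row1")
zfs.txg_flush()                  # normal path: ZIL is never read
zfs.sync_write("row2")
zfs.crash_and_recover()          # crash path: ZIL replayed
print(zfs.pool)                  # ['row1', 'row2'] -- nothing lost
```

Note the key point: in the normal path the ZIL is written and thrown away, never read, which is why it isn't a write-cache and why it doesn't inherently make your writes faster.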
Now there seem to be some differences in how the various implementations of ZFS look at the ZIL/SLOG mechanism.
Nexenta Community Edition, based on Illumos (the open-source descendant of Sun’s Solaris), says your SLOG should just be a write-optimized SSD, but even that’s more best practice than hard & fast requirement. Nexenta touts the ZIL/SLOG as a performance multiplier, and their excellent documentation has helpful charts and graphics reinforcing that.
In contrast, the documentation for the most popular FreeBSD ZFS implementation paints the ZIL as likely more trouble than it’s worth. FreeNAS actively discourages you from provisioning a SLOG unless it’s enterprise-grade, accurately pointing out that the ZIL & a SLOG device aren’t write-cache and probably won’t make your writes faster anyway, unless you’re NFS-focused (which I’m proudly, defiantly even, not) or operating a large database at scale.
What accounts for the difference in documentation & best-practice guides? I’m not sure; some of it’s probably related to *BSD vs Illumos implementations of ZFS, and some of it’s probably related to the different audiences & users of the free tiers of these storage systems.
The question for us here is this: will you benefit from provisioning a SLOG device if you build a ZFS box serving iSCSI storage for Hyper-V and VMware?
I hate sounding like a waffling storage VAR here, but I will: it depends. I’ve run both Nexenta and NAS4Free; when I ran Nexenta, I saw my SLOG being used during random & synchronous write operations. In NAS4Free, the SSD I had dedicated as a SLOG never showed any activity in zfs-stats, gstat, or any other disk IO tool I could find.
One could spend weeks of valuable lab time verifying under which conditions a dedicated SLOG device adds performance to your storage array, but I decided to cut bait. Check out some of the links at the bottom for more color on this, but in the meantime, let me leave you with this advice: if you have $80 to spend on your FreeBSD-based ZFS storage, buy an extra 8GB of RAM rather than a tiny, used SLC or MLC device to function as your SLOG. You will almost certainly get more performance out of a larger ARC than by dedicating a disk as your SLOG.
Labworks 1:6 – Great…so, again, why do my writes suck?
Recall this SQLIO test from Labworks 1:1:
As you can see, read or write, I was hitting a wall at around 235-240 megabytes per second during much of “Short Test”, which is pretty close to the theoretical limit of an LACP team with two GigE NICs.
But as I said above, we don’t have that limit anymore. Whereas there were once 2x1GbE Teams, there are now 3x1GbE. Let’s see what the same test on the same 4KB block/4KB NTFS volume yields now.
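The old wall is easy to sanity-check with some back-of-the-envelope arithmetic. A sketch; the ~6% protocol-overhead figure is my own rough assumption, and real efficiency varies with frame size:

```python
# Back-of-the-envelope ceiling for an N x 1GbE LACP team.
# The 6% overhead guess (Ethernet/IP/TCP/iSCSI framing) is an
# assumption, not a measurement.

def lacp_ceiling_mbps(nics, overhead=0.06):
    raw = nics * 1000 / 8          # 1GbE = 125 MB/s per link, raw
    return raw * (1 - overhead)    # usable payload, megabytes/sec

print(round(lacp_ceiling_mbps(2)))  # ~235 -- the old 2x1GbE wall
print(round(lacp_ceiling_mbps(3)))  # ~352 -- the new 3x1GbE ceiling

# Caveat: LACP hashes each flow onto one member link, so a single
# stream still tops out around one NIC's worth; it takes multiple
# iSCSI sessions (MPIO) to actually fill the team.
```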
SQLIO short test, take two, sort by Random vs Sequential writes & reads:
By Jove, what’s going on here? This graph was built off the same SQLIO recipe but looks completely different from Labworks 1. For one, the writes look much better and the reads look much worse. Yet step back and the patterns are largely the same.
It’s data like this that makes benchmarking, validating & ultimately purchasing storage so tricky. Some would argue with my reliance on SQLIO and those arguments have merit, but I feel SQLIO, which is easy to script/run and automate, can give you some valuable hints into the characteristics of an array you’re considering.
Let’s look at the writes question specifically.
Am I really writing 350MB/s to SAN2?
On the one hand, everything I’m looking at says YES: I am a Storage God and I have achieved #StorageGlory inside the humble Daisetta Lab HQ on consumer-level hardware:
SAN2 is showing about 115MB/s to each Broadcom interface during the 32KB & 64KB samples
Agnostic_Node_1 perfmon shows about the same amount of traffic egressing the three vEthernet adapters
The 2960S is reflecting all that traffic; I’m definitely pushing about 350 megabytes per second to SAN2. Interface Port-channel 3 shows TX load at 219 out of 255, nearly maxing out my LACP team
On the other hand, I am just an IT Mortal and something bothers me:
CPU is very high on SAN2 during the 32KB & 64KB runs…so busy that it seems like the little AMD CPU is responsible for some of the good performance marks
While I’m a fan of the itsy-bitsy 2.5″ Western Digital RED 1TB drives in SAN2, under no theoretical IOPS model is it likely that six of them, in RAIDZ2 (RAID 6 equivalent), can achieve 5,000-10,000 IOPS under traditional storage principles. Each drive by itself is capable of only 75-90 IOPS
If something is too good to be true, it probably is
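The skepticism above is easy to put in numbers. A simplified sketch, using the common rule of thumb that a single RAIDZ vdev delivers roughly one drive's worth of small random IOPS (every drive participates in each full-stripe write); the per-drive figures are the post's own 75-90 IOPS:

```python
# Why 5,000-10,000 IOPS from six NAS drives smells wrong.
# Simplified model, not a benchmark.

def raidz_vdev_random_iops(per_drive_iops):
    # One RAIDZ vdev ~= one drive for small random IO
    # (a common rule of thumb, not a law)
    return per_drive_iops

def raw_aggregate_iops(drives, per_drive_iops):
    # Theoretical best case if every spindle seeked independently
    return drives * per_drive_iops

print(raidz_vdev_random_iops(90))   # ~90 IOPS from the vdev
print(raw_aggregate_iops(6, 90))    # 540 IOPS even in the dream case
# Either way, nowhere near 5,000+; the gap is caching & compression.
```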
Sr. Storage Engineer Neo feels really frustrated at this point; he can’t figure out why his writes suck, or even if they suck, and so he wanders up to the Oracle to get her take on the situation and comes across this strange Buddha Storage kid.
Labworks 1:7 – The Essence of ZFS & the New Storage Model
In effect, what we see here is just a sample of the technology & techniques that have been disrupting the storage market for several years now: compression & caching multiply the performance of storage systems beyond what they should be capable of, in certain scenarios.
As the chart above shows, the test2 volume is compressed by SAN2 using lzjb. On top of that, we’ve got the ZFS ARC, L2ARC, and the ZIL in the mix. And then, to make things even more complicated, we have some sync policies ZFS allows us to toggle. They look like this:
The sync toggle documentation is out there and you should understand it; it is crucial to understanding ZFS. But I want to demonstrate the choices as well.
I’ve got three choices + the compression options. Which one of these combinations is going to give me the best performance & durability for my Hyper-V VMs?
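Before the results, here's a toy Python model of what the three sync settings mean for acknowledgment latency. The microsecond figures are purely illustrative assumptions, not measurements; note that on a box like SAN2, where the SLOG sits idle, honoring sync effectively means waiting on the spinners.

```python
# Toy latency model of ZFS's three sync settings.
# RAM/SLOG/spinner costs below are illustrative assumptions only.

RAM_US, SLOG_US, SPINNER_US = 1, 60, 8000   # assumed per-write costs

def ack_latency_us(sync, app_requested_sync, has_slog=True):
    if sync == "disabled":
        return RAM_US                 # lie to the app; fastest, riskiest
    if sync == "always" or (sync == "standard" and app_requested_sync):
        # Honor the sync: wait for the log (SLOG if present, else the
        # in-pool ZIL on the spinners) before acknowledging.
        return SLOG_US if has_slog else SPINNER_US
    return RAM_US                     # async write under sync=standard

for mode in ("standard", "always", "disabled"):
    print(mode, ack_latency_us(mode, app_requested_sync=False))
```

The durability ordering is the inverse of the speed ordering, which is exactly the no-free-lunch tradeoff the charts below show.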
SQLIO Short Test Runs 3-6, all PivotTabled up for your enjoyment and ease of digestion:
As is usually the case in storage, IT, and hell, life in general, there are no free lunches here, people. This graph tells you what you already know in your heart: the safest storage policy in ZFS-land (Always Sync, that is to say, commit writes to the rotationals post haste as if it were the last day on earth) is also the slowest. Nearly 20 seconds of latency as I force ZFS to commit everything I send it immediately (vs flushing it later), which it struggles to do at a measly average speed of 4.4 megabytes/second.
Compression-wise, I thought I’d see a big difference between the various compression schemes, but I don’t. Lzjb, lz4, and the ultra-space-saving/high-CPU-cost gzip-9 all turn in about equal results from an IOPS & performance perspective. It’s almost a wash, really, and that’s likely because of the predictable nature of the IO SQLIO is generating.
Last point: ZFS, as Chris Wahl pointed out, is a sort of virtualization layer atop your storage. Now if you’re a virtualization guy like me or Wahl, that’s easy to grasp; Windows 2012 R2’s Storage Spaces concept is similar in function.
But sometimes in virtualization, it’s good to peel away the abstraction onion and watch what that looks like in practice. ZFS has a number of tools and monitors that look at your Zpool IO, but to really see how ZFS works, I advise you to run gstat. GStat shows what your disks are doing and if you’re carefully setting up your environment, you ought to be able to see the effects of your settings on each individual spindle.
Now look at this gstat sample. Under SQLIO-load, the zvol is showing 10,000 IOPS, 300+MB/s. But ada0-5, the physical drives, aren’t doing squat for several seconds at a time as SAN2 absorbs & processes all the IO coming at it.
That, friends, is the essence of ZFS.
Links/Knowledge/Required Reading Used in this Post:
Hello Labworks fans, detractors and partisans alike, hope you had a nice Easter / Resurrection / Agnostic Spring Celebration weekend.
Last time on Labworks 2:1-4, we looked at some of the awesome teaming options Microsoft gave us with Server 2012 via its multiplexor driver. We also made the required configuration adjustments on our switch for jumbo frames & VLAN trunking, then we built ourselves some port channel interfaces flavored with LACP.
I think the multiplexor driver/protocol is one of the great (unsung?) enhancements of Server 2012/R2 because it’s a sort of pre-virtualization abstraction layer (That is to say, your NICs are abstracted & standardized via this driver before we build our important virtual switches) and because it’s a value & performance multiplier you can use on just about any modern NIC, from the humble RealTek to the Mighty Intel Server 10GbE.
But I’m getting too excited here; let’s get back to the curriculum and get started shall we?
5. Understand what Microsoft’s multiplexor driver/LBFO has done to our NICs
6. Build our Virtual Machine Switch for maximum flexibility & performance
7. The vEthernets are Coming
8. Next Steps: Jumbo frames from End-to-end and performance tuning
2:5 Understand what Microsoft’s Multiplexor driver/LBFO has done to our NICs
So as I said above, the best way to think about the multiplexor driver & Microsoft’s Load Balancing/Failover tech is by viewing it as a pre-virtualization abstraction layer for your NICs. Let’s take a look.
Our Network Connections screen doesn’t look much different yet, save for one new decked-out icon labeled “Daisetta-Team:”
Meanwhile, this screen is still showing the four NICs we joined into a team in Labworks 2:3, so what gives?
A click on the properties of any of those NICs (save for the RealTek) reveals what’s happened:
The LBFO process unbinds many (though not all) settings, configurations, protocols and certain driver elements from your physical NICs, then binds the fabulous Multiplexor driver/protocol to the NIC as you see in the screenshot above.
In the dark days of 2008 R2 & Windows Core, when we had to walk uphill to school both ways in the snow, I had to download and run a command-line tool called nvspbind to get this kind of information.
Fortunately for us in 2012 & R2, we have some simple cmdlets:
So notice Microsoft has essentially stripped “Ethernet 4” of all that would have made it special & unique amongst my 4x1GbE NICs; where I might have thought to tag a VLAN onto that Intel GbE, the multiplexor has stripped that option out. And if I had statically assigned an IP address to this interface, that’s moot now: TCP/IP v4 & v6 are no longer bound to the NIC itself, so the NIC is incapable of holding an IP address.
And the awesome thing is you can do this across NICs, even NICs made by separate vendors. I could, for example, mix the sacred NICs (Intel) with the profane NICs (RealTek)…it don’t matter, all NICs are invited to the LBFO party.
No extra licensing costs here either; if you own a Server 2012 or 2012 R2 license, you get this for free, which is all kinds of kick ass as this bit of tech has allowed me in many situations to delay hardware spend. Why go for 10GbE NICs & Switches when I can combine some old Broadcom NICs, leverage LACP on the switch, and build 6×1 or 8x1GbE Converged LACP teams?
LBFO even adds up all the NICs you’ve given it and teases you with a calculated LinkSpeed figure, which we’re going to hold it to in the next step:
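That teased LinkSpeed figure is just the sum of the member NICs' speeds. A trivial sketch; the member list below is a hypothetical 4 x 1GbE team like Daisetta-Team:

```python
# The "LinkSpeed" an LBFO team advertises is the sum of its members.
# Hypothetical 4 x 1GbE team (Intel x3 + a Broadcom, say).

members_gbps = [1, 1, 1, 1]

team_linkspeed_gbps = sum(members_gbps)
print(f"{team_linkspeed_gbps} Gbps")   # what the team reports

# Same caveat as with LACP on the switch side: one TCP flow still
# rides one member NIC, so the aggregate only shows up across
# multiple flows or sessions.
```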
2:6 Build our Virtual Machine Switch for maximum flexibility & performance
If we just had the multiplexor protocol & LBFO available to us, it’d be great for physical server performance & durability. But if you’re deploying Hyper-V, you get to have your LBFO cake and eat it too, by putting a virtual switch atop the team.
This is all very easy to do in Hyper-V manager. Simply right click your server, select Virtual Switch Manager, make sure the Multiplexor driver is selected as the NIC, and press OK.
Bob’s your Uncle:
But let’s go a bit deeper and do this via powershell, where we get some extra options & control:
New-VMSwitch : the cmdlet we’re invoking to build the switch. Run Get-Help New-VMSwitch for a rundown of the cmdlet’s structure & options
-NetAdapterInterfaceDescription : here we’re telling Windows which NIC to build the VM Switch on top of. Get the precise name from Get-NetAdapter and enclose it in quotes
-AllowManagementOS 1 : Recall the diagram above. This boolean switch (1 yes, 0 no) tells Windows to create the VM Switch & plug the Host/Management Operating System into said switch. You may or may not want this; in the lab I say yes, at work I’ve used no.
-MinimumBandwidthMode Weight : We lay out the rules for how the switch will apportion some of the 4Gb/s bandwidth available to it. By using “Weight,” we’re telling the switch we’ll assign some values later
-Name : Name your switch
A few seconds later, and congrats Mr. Hyper-V admin, you have built a converged virtual switch!
2:7 The vEthernets are Coming
Now that we’ve built our converged virtual switch, we need to plug some things into it. And that starts on the physical host.
If you’re building a Hyper-V cluster or stand-alone Hyper-V host with VMs on networked storage, you’ll approach vEthernet adapters differently than if you’re building Hyper-V for VMs on attached/internal storage or on SMB 3.0 share storage. In the former, you’re going to need storage vEthernet adapters; in the latter you won’t need as many vEthernets unless you’re going multi-channel SMB 3.0, which we’ll cover in another Labworks session.
I’m going to show you the iSCSI + Failover Clustering model.
In traditional Microsoft Failover Clustering for Virtual Machines, we need a minimum of five discrete networks. Here’s how that shakes out in the Daisetta Lab:
Network Name, VLAN ID, Purpose, Notes
Management, 1, Host & VM management network, You can separate the two if you like
CSV, 14, Host cluster communication and coordination, Important for clustering Hyper-V hosts
LM, 15, Live Migration network, When you must send VMs from the broke host to the host with the most, LM is there for you
iSCSI 1-3, 11-13, Storage, Somewhat controversial but supported
Now you should be connecting the dots: remember in Labworks 2:1, we built a trunked port-channel on our Cisco 2960S for the sole purpose of these vEthernet adapters & our converged switch.
So, we’re going to attach tagged vEthernet adapters to our host via PowerShell. Pay attention here to the “-ManagementOS” switch; though our converged switch is for virtual machines, we’re using it for our physical host as well.
You can script this out of course (and VMM does that for you), but if you just want to copy-paste, do it in this order:
Notice we didn’t include a gateway in the New-NetIPAddress cmdlet; that’s because when we built our virtual switch with the “-AllowManagementOS 1” switch attached, Windows automatically provisioned a vEthernet adapter for us, which either got an IP via DHCP or took an APIPA address.
So now we have our vEthernets and their appropriate VLAN tags:
2:8 Next Steps: Jumbo Frames from End-to-End & Performance Tuning
So if you’ve made it this far, congrats. If you do nothing else, you now have a converged Hyper-V virtual switch, tagged vEthernets on your host, and a virtualized infrastructure that’s ready for VMs.
But there’s more you can do; stay tuned for the next labworks post where we’ll get into jumbo frames & performance tuning this baby so she can run with all the bandwidth we’ve given her.
Links/Knowledge/Required Reading Used in this Post:
The post has gotten quite a bit of traffic and I hope it’s been helpful to folks. I intended to do the followup posts soon after that, but boy, have I had a tough week in technology.
Let’s hop to it, shall we?
Labworks 1.1-2 : I Heart the ARC and Let’s Pull Some Drives!
When we left Labworks 1, I assigned myself some homework. Here’s an update on each of those tasks, the grade I’d give myself (I went to Catholic school so I’m characteristically harsh) and some notes on the status:
Next Step, Completed, Grade, Notes
Find out why my writes suck, Kind of, B-, Replaced Switch & Deep dived the ZIL
Test NAS4Free’s NFS Performance, No, F, One Pink Screen of Death too many
Test SMB 3.0 from a VM inside ZFS box, No, F, Block vs File Bakeoff plans
Sell some stuff, No, C, Other priorities
Rebuild rookie standard switch into distributed, No, F, Can’t build a vSwitch without a VMware host
I have updates on all of these items, so if you’re curious stick around as they’ll be posted in subsequent Labworks. Suffice it to say, there’s been some infrastructure changes in the Daisetta Lab, and so here’s an updated physical layout with Skull & Crossbones over my VMware host, which I put out of its misery last week.
In the meantime, I wanted to share some of the benefits of ZFS for your Hyper-V or VMware lab.
1:1 – I Heart the ARC
So I covered some of the benchmark results in Labworks 1, but I wanted to get practical this week. Graphs & benchmarks are great, but how does ZFS storage architecture benefit your virtualization lab?
At least in the case of Hyper-V: Cluster Shared Volumes and dynamic .vhdxs on iSCSI.
To really show how it works, I had to zero out my ARC, empty the L2ARC, and wipe the read/write counters on each physical volume. And to do that, I had to reboot SAN2. My three virtual machines (a Windows 7 VM, a SQL 2014 VM, and a Virtual Machine Management server) had to be shut down, and just to do it right and by the book, I placed both the CSV & LUN mapped to Node-1 into maintenance mode.
And then I started the whole thing back up. Here are the results, followed by two animated GIFs. Remember, the ARC lives in your system RAM, so watch how it grows as ZFS starts putting VMs into RAM and into the L2ARC, my SSD:
ARC Size (Cold Boot), ARC Size after VM Boot, ARC Size +5h, L2ARC Size (Cold Boot), L2ARC Size after VM Boot, L2ARC Size + 5h
7MB, 10GB, 14GB, 900KB, 4.59GB, 6.8GB
So for you in your lab, or if you’re pondering similar tech at work, what’s this mean?
Boot speed of your VM fleet is the easiest to quantify and the greatest to behold, but that’s just for starters.
ZFS’ ARC & L2ARC shaved over 80% off my VM’s boot times and massively reduced load on rotational disks on the second boot.
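Working backwards from the numbers in this post (the ~19-second warm boot cycle shown in the GIF below, and the "over 80%" reduction claimed above), some quick arithmetic gives the implied cold-boot time. A sketch; the 80% figure is treated as exact here:

```python
# Implied cold-boot time, derived from the post's own figures:
# warm (ARC/L2ARC-populated) boot ~19 s, called an 80%+ reduction.

warm_s = 19
reduction = 0.80                 # "shaved over 80% off"

implied_cold_s = warm_s / (1 - reduction)
print(round(implied_cold_s))     # ~95 s cold boot implied, or worse
```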
The gains here are enormous and hint at the reasons why SSD & caching are so attractive. Done right, the effect is multiplicative in nature; you’re not just adding IOPS when you add an SSD, you’re multiplying storage performance by several orders of magnitude in certain scenarios. And VM boot times are such a scenario where the effect is very dramatic:
This is great news if you’re building lab storage because, as I said in Labworks 1, if you’re going to have to use an entire physical box for storage, best to use every last bit of that box for storage, including RAM. ZFS is the only non-commercial system I know of to give you that ability, and though the investment is steep, the payoff is impressive.
Now at work, imagine you have a fleet of 50 virtual machines, or 100 or more, and you have to boot or reboot them on a Saturday during your maintenance window. Without some sort of caching mechanism, be it a ZFS ARC & its MRU/MFU algorithms, or some of the newer stuff we saw at #VFD3 including Coho’s system & Atlantis’ ILIO USX, you’re screwed.
Kiss your Saturday goodbye because on old rotational arrays, you’re going to have to stagger your boots, spread it over two Saturdays, or suffer the logarithmic curve of filer entropy & death as more IO begets more IO delay in a vicious cycle of decay that will result in you banging your fists bloody on the filer, begging the storage gods for mercy and relief.
Oh man that was a painful Saturday four years ago.
I wish I could break down these results even further: what percentage of that 19s boot time is due to my .vhdx being stored in SAN2’s ARC, and what percentage is due, if any, to ZFS compression on the volume, or to the CPU working the IO ‘stream’ itself, as I’ve got that particular box ticked on CSV1 as well?
That’s important to understand for lab work or your real job, because SSD & caching are only half of the reason why the stodgy storage sector has been turned on its head. Step back and survey the new players vs the old, and I think you’ll find that many of the new players are reading & writing data to/from their arrays in more intelligent (or risky, depending on your perspective) ways: leveraging the CPU to compress inbound IO, de-duping on the front-end rather than the back-end, or, in the case of Coho, just handing the switch & Layer 2 over to the array itself in wild yet amazing & extensible ways.
My humble NAS4Free box isn’t near those levels of sophistication, yet I don’t think it’s improper to draw an academic-family-tree-style dotted line between my ZFS lab storage & some of the great new storage products on the market that are using sophisticated caching algorithms & compression/processing to deliver high-performance storage downmarket, so far downmarket that I’ve got fast storage in my garage!
Perhaps a future labworks will explore compression vs caching, but for now, let’s take a look at what ZFS is doing during the cold & warm boots of my VMs.
Single Pane O’GifGlass animated shot of the cold boot (truncated):
And the near #StorageGlory Gifcam shot of the entire 19s 2nd boot cycle after my ARC & L2ARC are sufficiently populated:
Of course, how often are we rebooting VMs anyway? Fair point.
One could argue the results above, while interesting, have limited applicability in a lab, a small enterprise, or even a large one. But consider this: if you deliver applications via session virtualization technologies (XenApp or RDS come to mind) on top of a hypervisor (virtualization within virtualization for the win!), then ZFS and other caching systems will likely ease your pain and get your users to their applications faster than you ever could with rotational storage alone. So in my book, it’s something you should master and understand.
So all this is great. ZFS performs very well for Hyper-V, the ARC/L2ARC paradigm works, and it’s all rather convincing isn’t it? I’ll save some thoughts on writes for a subsequent Labworks, but so far, things are looking up.
Of course you can’t be in IT and not worry about the durability & integrity of your data. As storage guys say, all else is plumbing; when it comes to data and storage, an array has to guarantee integrity.
This is probably the most enjoyable test of all IT testing regimes, if only because it’s so physical, so dramatic, so violent, and so rare. I’m talking about drive pulls & storage-failure simulations, the kind of test you only get to do when you’re engaging in a PoC at work, and then, perhaps for SMB guys like me, only once every few years.
As I put it back in January when I was testing a Nimble array at work, “Wreck that array.”
At home, of course, I can’t afford true n+1 everywhere, let alone waste disks on something approaching the reliability of RAID-DP, but I can at least test RAIDZ2, ZFS’ equivalent to RAID 6.
Drive Pull test below. Will my CSVs stay online? Click play.
Labworks #1: Building a durable, performance-oriented ZFS box for Hyper-V, VMware
Primary Goal: To build a durable and performance-oriented storage array using Sun’s fantastic, 128-bit, high-integrity Zettabyte File System, for use with lab Hyper-V CSVs & Windows clusters, VMware ESXi 5.5, and other hypervisors.
Secondary Goal: Leverage consumer-grade SSDs to increase/multiply performance by using them as ZFS Intent Log (ZIL) write-cache and L2ARC read cache
Bonus: The Windows 7 PC in the living room that’s running Windows Media Center with CableCARD & HD Home Run was running out of DVR disk space and can’t record to SMB shares but can record to iSCSI LUNs.
Technologies used: iSCSI, MPIO, LACP, Jumbo Frames, IOMETER, SQLIO, ATTO, Robocopy, CrystalDiskMark, FreeBSD, NAS4Free, Windows Server 2012 R2, Hyper-V 3.0, Converged switch, VMware, standard switch, Cisco SG300
I picked the Gigabyte board above because it’s got an outstanding eight SATA 6Gbit ports, all running on the native AMD A88x Bolton-D4 chipset, which, it turns out, isn’t supported well in Illumos (see Lab Notes below).
I added to that a cheap $20 Marvell 9128se two-port SATA 6Gbit PCIe card, which hosts the boot volume & the SanDisk SSD.
Disk Type                  Quantity   Size   Format   Speed               Function
WD Red 2.5″ with NASware   6          1 TB   4KB AF   SATA 3, 5400 RPM    Zpool members
I'm not finished with all the benchmarking, which is notoriously difficult to get right, but here's a taste. Expect a follow-up soon.
All screenshots below were taken with lzp2 compression enabled on SAN2.
SQLIO Short Test:
ATTO standard run:
None so far. From a VMware perspective, I want to rebuild the standard switch as a distributed switch now that I've got a vCenter appliance running, but that's not my priority at the moment.
Pulled two drives (the limit for RAIDZ2) under normal operating conditions. Put them back in and saw some alerts about the "administrator pulling drives" and the Zpool being in a degraded state. My CSVs remained online, however. Following a short zpool online command, both drives rejoined the pool and the degraded error cleared.
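Roughly, the recovery sequence looked like this (pool and device names are stand-ins for my setup):

```shell
# With the pulled drives re-inserted, the pool reports DEGRADED but stays available
zpool status san2

# Bring each re-inserted disk back online; ZFS resilvers whatever writes it missed
zpool online san2 ada3
zpool online san2 ada5

# -x prints only unhealthy pools, so a clean report here means we're back to ONLINE
zpool status -x
```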
Because it's not all about repeatable lab experiments. Here's a GifCam shot from Node-1 as it completely saturates its two 1GbE Intel NICs:
and some pretty blinking lights from the six 2.5″ drives:
Lab notes & Lessons Learned:
First off, I’d like to buy a beer for the unknown technology enthusiast/lab guy who uttered these sage words of wisdom, which I failed to heed:
You buy cheap, you buy twice
Listen to that man, would you? Because going consumer, while tempting, is not smart. Learn from my mistakes: if you have to buy, buy server boards.
Secondly, I prefer NexentaStor to NAS4Free with ZFS, but like others, I worry about and have been stung by OpenSolaris/Illumos hardware support. Most of that is my own fault (cf. the note above), but still: does Illumos have a future? I'm hopeful: NexentaStor is going to appear at next month's Storage Field Day 5, which is a good sign, and version 4.0 is due out any time now.
The Illumos/Nexenta command structure is much more intuitive to me than FreeBSD's. In place of your favorite *nix commands, Nexenta employs some great verb-noun show commands, and dtrace, the excellent diagnostic/performance tool included in Solaris, is baked right into Nexenta. In NAS4Free/FreeBSD 9.1, you've got to add a few packages to get the equivalent stats for the ARC, L2ARC and ZFS, and adding dtrace involves a make & kernel modification, something I haven't been brave enough to try yet.
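That said, even stock FreeBSD exposes the raw ARC counters through sysctl, so you can eyeball cache behavior without installing anything; a quick check might look like:

```shell
# ARC hit and miss counters since boot; the ratio is a rough cache-efficiency gauge
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

# Current ARC size and its ceiling, in bytes
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
```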
Next: Jumbo Frames for the win. From Node-1, the desktop in my office, my Core i5-4670K CPU would regularly hit 35-50% utilization during my standard SQLIO benchmark before I configured jumbo frames end-to-end. Now, after enabling jumbo frames on the Intel NICs, the Hyper-V converged switch, the SG300 and the ZFS box, utilization peaks at 15-20% during the same SQLIO test, and the benchmarks have shown an increase as well. Unfortunately, in FreeBSD land, adding jumbo frames is something you have to do on the interface & routing table, and it doesn't persist across reboots for me, though that may be due to a driver issue on the Broadcom card.
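The non-persistent FreeBSD version is just an ifconfig; to make it survive reboots, the MTU belongs on the interface's line in /etc/rc.conf. The bge0 interface name and addressing below are guesses for a Broadcom NIC, not my actual config:

```shell
# One-off: raise the MTU to 9000 on the Broadcom interface (name is illustrative)
ifconfig bge0 mtu 9000

# Persistent: bake the MTU into the interface's rc.conf entry (address is a placeholder)
echo 'ifconfig_bge0="inet 192.168.1.50 netmask 255.255.255.0 mtu 9000"' >> /etc/rc.conf
```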
The Western Digital 2.5″ drives aren't stellar performers and they aren't cheap, but boy are they quiet and well-built, and they run cool, politely asking for only 1 watt under load. I've returned the hot, loud & failure-prone HGST 3.5″ 2TB drives I borrowed from work; they're too hard to fit in a short-depth chassis.
Lastly, ZFS' adaptive replacement cache (ARC), which I've enthused over a lot in recent weeks, is quite the value & performance multiplier. I've tested Windows Server 2012 R2 Storage Spaces' tiered storage model, and while I was impressed with its responsiveness, ReFS, and ability to pool storage in interesting ways, nothing can compete with ZFS' ARC model. It's simply awesome: deceptively simple, but awesome.
The lesson is that if you're going to lose an entire box to storage in your lab, your chosen storage system had better use every last ounce of that box, including its RAM, to serve storage up to you. 2012 R2 doesn't, but I'm hopeful it soon may (Update 1, perhaps?).
Here's a cool screenshot from Nexenta, my last build before I re-did everything, showing ARC hits following a cold boot of the array (top), and a few days later, when things are really cooking for the Hyper-V VMs stored there, whose blocks are getting tagged with ZFS' "Most Frequently Used" category and thus getting the benefit of fast RAM & L2ARC:
Find out why my writes suck so bad.
Test Nas4Free’s NFS performance
Test SMB 3.0 from a virtual machine inside the ZFS box
Sell some stuff so I can buy a proper SLC SSD drive for the ZIL
Re-build the rookie Standard Switch into a true Distributed Switch in ESXi
Links/Knowledge/Required Reading Used in this Post: