Greg Ferro, Philosopher King of networking and prolific tech blogger/personality, had me in stitches during the latest Coffee Break episode on the Packet Pushers podcast.
Coffee Breaks are relatively short podcasts focused on networking vendor & industry news, moves, and initiatives. Ferro usually hosts these episodes and chats about the state of the industry with two other rotating experts in the “time it takes to have a coffee break.”
Some Coffee Breaks are great, some I skip, and then some are vintage Greg Ferro: encapsulated IT wisdom with some .co.uk attitude.
During the public cloud services discussion, the conversation turned toward on-prem expertise in firewalls, which, somehow, touched Ferro’s IT Infrastructure Library (ITIL) nerve.
ITIL, if you’re not familiar with it, is sort of a set of standards & processes for IT organizations, or, as Ferro sees it:
ITIL is an emotional poison that sucks the inspiration and joy from technology and reduces us to grey people who can evaluate their lives in terms of “didn’t fail”. I have spent two decades of my professional life in a grey zone of never winning and never failing.
Death to bloody ITIL. I want to win.
Anyway, on the podcast, Ferro got animated discussing a theoretical on-prem firewall guy operating under an ITIL framework:
“Oh give me a break. It’s all because of ITIL. Everybody’s in ITIL. So when you say you’re going to change your firewall, these people have a change management problem, a self change management problem, because ITIL prevents them from being clever enough. You’re not allowed to be a compute guy & a firewall guy [in an ITIL framework]. When you move to the public cloud, you throw away all those skills because you don’t need them.”
Ferro’s point (I think) was that ITIL serves as a kind of retardant for IT organizations looking to move parts of their infrastructure to the public cloud, and not just in the obvious ways you might think (i.e., it would be an arduous process to redo the Change Management Database & Configuration Items involved in putting some of your stack in the cloud).
It seems Ferro is saying that specialized knowledge (i.e., the firewall guy & his bespoke firewall config) is threatened by the ease of deploying public cloud infrastructure, and that to get to the cloud, some organizations will have to break through ITIL orthodoxy, as it tends to elevate and protect complexity.
But that wasn’t all. Ferro also helped me understand the real difference between declarative & imperative programming. Whereas before I just nodded my head and thought, “Hell yeah, Desired State Configuration & Declarative Programming. That’s where I want to be,” now I actually comprehend it.
It’s all about sausage rolls, you see:
Let’s say you want a sausage roll. And it’s a long way to the shop for a sausage roll. If you’re going to send a six year old down to the shop to get you a sausage roll, you’re going to say, Right. Here is $2, here’s the way to the shop. You go down the street, turn right, then left. You go into the shop you ask the man for a sausage roll. Then you carry the sausage roll home very carefully because you don’t want the sausage roll to get cold.
That’s imperative programming. Precise instructions for every step of the process. And you get a nice sausage roll at the end.
Declarative programming (or promise theory as Ferro called it), is more like:
You have a teenager of 13 or 14, old enough to know how to walk to the shop, but not intelligent enough to fetch a sausage roll without some instructions. Here’s $10, go and fetch me a sausage roll. The teenager can go to the shop, fetch you a sausage roll and return with change.
See the distinction? The teenager gets some loose instructions & rules within which to operate, yet you still get a sausage roll at the end.
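The sausage-roll split translates pretty directly into code. Here’s a minimal sketch of the two styles; the function names, the toy `converge` engine, and the state keys are all mine, invented for illustration, not from any real DSC tool:

```python
# Imperative: spell out every step, like instructing the six-year-old.
def imperative_sausage_roll():
    return [
        "take $2",
        "walk down the street, turn right, then left",
        "ask the man for a sausage roll",
        "carry it home carefully so it stays warm",
    ]

# Declarative: state the desired end state and let the engine
# (the teenager) work out the steps, DSC/Puppet-style.
DESIRED_STATE = {"sausage_roll": "present", "change_returned": True}

def converge(current_state, desired_state):
    """Return the actions needed to reconcile current with desired."""
    return [
        f"ensure {key} -> {want}"
        for key, want in desired_state.items()
        if current_state.get(key) != want
    ]

print(converge({"sausage_roll": "absent"}, DESIRED_STATE))
```

The imperative version breaks if the shop moves; the declarative version only cares that you end up holding a sausage roll.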
Jeffrey Snover, Microsoft Senior Technical Fellow (or maybe he’s higher in the Knights of Columbus-like Microsoft order), likened declarative programming to Captain Picard simply saying, “Make it so!” I was happy with that framework, but I think sausage rolls & children work better.
By the way, for Americans, a Sausage Roll looks like the image above and appears to be what I would think of as a Pig in a Blanket, with my midwestern & west coast American roots. How awesome is it that you can buy Pigs in a Blanket in UK shops?
Some blatant fanboyism here, but what’s not to like?
- Child partition dancing
- High quality optics
- Fun camera app
- Outputs to animated gif, the only truly agnostic video format out there
Building something like this used to involve about 15 hours on the PC, a fast camera, a copy of Paint Shop Pro, and some serious patience and tolerance for folder sprawl.
Now I’m doing it on a phone in seconds, and I don’t even have to think about it. I can even do what were once difficult color modifications, holding the phone with one hand, and the dog’s leash with the other.
Of course, as a new dad, I’m a sucker for gimmicky pics, almost going out of my way to get them. Credit/blame to Google+ here; they sort of reintroduced the animated gif genre and burned probably thousands of compute hours just to try and impress me with Auto Awesome. Some have been gems, most have been duds. Thanks for the free compute G+!
In contrast, Nokia’s Cinemagraph is hands-on. It’s not hard to use, but you have to visualize your shot, hold your phone steady, and frame the action. Then you press the screen, and a few seconds later, you can introduce some neat & fun little effects.
And then you hit the save button, and the little Snapdragon inside gets busy and builds what you see above.
Gimmicky? Sure. But this is my gimmick, damnit, not a G+ cron job.
Final note: when are Facebook & Twitter going to man up and allow us to post animated gifs? I’m tired of having to use that other old yet fully agnostic communications protocol (SMTP) to share my cheesy dad pics with the fam.
Google lets me post these to G+ (if only someone were there to see them). My rinky-dink blog sitting on a shared LAMP server in some God-forsaken GoDaddy datacenter has no problem hosting these inefficient but highly useful animated gifs.
Hell, I bet you can post animated gif shots to Yammer.
Greetings to you Labworks readers, consumers, and conversationalists. Welcome to the last verse of Labworks Chapter 1, which has been all about building a durable and performance-oriented ZFS storage array for Hyper-V and/or VMware.
Today we’re going to circle back to the very end of Labworks 1:1, where I assigned myself some homework: find out why my writes suck so bad. We’re going to talk about a man named ZIL and his sidekick the SLOG and then we’re going to check out some Excel charts and finish by considering ZFS’ sync models.
But first, some housekeeping: SAN2, the ZFS box, has undergone minor modification. You can find the current array setup below. Also, I have a new switch in the Daisetta Lab, and as switching is intimately tied to storage networking & performance, it’s important I detail a little bit about it.
Labworks 1:4 – Small Business SG300 vs Catalyst 2960S
Cisco’s SG-300 & SG-500 series switches are getting some pretty good reviews, especially in a home lab context. I’ve got an SG-300 and really like it, as it offers a solid spectrum of switching options at Layer 2 as well as a nice Layer 3-lite mode, all for a tick under $200. It even has a real web interface if you’re CLI-shy, which I’m not, but some folks are.
Sadly for me & the Daisetta Lab, I need more ports than my little SG-300 has to offer. So I’ve removed it from my rack and swapped it for a 2960S-48TS-L from the office, but not just any 2960S.
No, I have spiritual & emotional ties to this 2960S, this exact one. It’s the same 2960S I used in my January storage bakeoff of a Nimble array, the same 2960S on which I broke my Hyper-V & VMware cherry in those painful early days of virtualization. Yes, this five-year-old switch is now in my lab:
Sure, it’s not a storage switch; in fact, it’s meant for IDFs and end users, and if the guys on that great storage networking podcast from a few weeks back knew I was using this as a storage switch, I’d be finished in this industry for good.
But I love this switch and I’m glad it’s at the top of my rack. I saved 1U, the energy costs of this switch vs. two smaller ones are probably a wash, and though I lost Layer 3 Lite, I gained so much more: 48 x 1GbE ports and full LAN-licensed Cisco IOS 15.2, which, agnostic computing goals aside for a moment, just feels so right and so good.
And with the increased amount of full-featured switch ports available to me, I’ve now got LACP teams of three on agnostic_node_1 & 2, jumbo frames from end to end, and the same VLAN layout.
Here’s the updated Labworks schematic and the disk layout for SAN2:
| Disk Type | Quantity | Size | Format | Speed | Function |
|---|---|---|---|---|---|
| WD Red 2.5″ with NASWARE | 6 | 1TB | 4KB AF | SATA 3, 5400RPM | Zpool Members |
| Samsung 840 EVO SSD | 1 | 128GB | 512 byte | SATA 3 | L2ARC Read Cache |
Labworks 1:5 – A Man named ZIL and his sidekick, the SLOG
Labworks 1:1 was all about building durable & performance-oriented storage for Hyper-V & VMware. And one of the unresolved questions I aimed to solve out of that post was my poor write performance.
Review the hardware table and you’ll feel like I felt. I got me some SSD and some RAM, I provisioned a ZIL, so ZFS is already write-caching that inbound IO, amiright? Show me the IOPS money, Jerry!
Well, about that. I mischaracterized the ZIL and I apologize to readers for the error. Let’s just get this out of the way: The ZFS Intent Log (ZIL) is not a write-cache device as I implied in Labworks 1:1.
The ZIL, whether spread out among your rotational disks by ZFS design or placed on a Separate Log Device (a SLOG), is simply a synchronous-write mechanism: a log designed to ensure data integrity and report back to the application layer (the IO ACK) that writes are safe somewhere on your rotational media. The ZIL & SLOG are also disaster recovery mechanisms; in the event of power loss, the ZIL, whether on the pool or on a SLOG device, will ensure that the writes it logged prior to the event are written to your spinners when your disks are back online.
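If you want to experiment with a dedicated SLOG yourself, attaching one is a one-liner. A sketch, using hypothetical pool and device names (my pool is called `tank` and the SSDs `ada6`/`ada7` here purely for illustration), with syntax per zpool(8):

```shell
# Attach a dedicated SLOG device to an existing pool:
zpool add tank log ada6

# Mirror the SLOG if you care about losing in-flight writes
# should the log device itself die:
zpool add tank log mirror ada6 ada7

# Verify where the log now lives:
zpool status tank
```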
Now there seem to be some differences in how the various implementations of ZFS look at the ZIL/SLOG mechanism.
Nexenta Community Edition, based on Illumos (the open-source descendant of Sun’s Solaris), says your SLOG should just be a write-optimized SSD, but even that’s more best practice than hard & fast requirement. Nexenta touts the ZIL/SLOG as a performance multiplier, and their excellent documentation has helpful charts and graphics reinforcing that.
In contrast, the documentation for the most popular FreeBSD ZFS implementation paints the ZIL as likely more trouble than it’s worth. FreeNAS actively discourages you from provisioning a SLOG unless it’s enterprise-grade, accurately pointing out that the ZIL & a SLOG device aren’t a write-cache and probably won’t make your writes faster anyway, unless you’re NFS-focused (which I’m proudly, defiantly even, not) or operating a large database at scale.
What accounts for the difference in documentation & best practice guides? I’m not sure; some of it’s probably related to the *BSD vs. Illumos implementations of ZFS, and some of it’s probably related to the different audiences & users of the free tiers of these storage systems.
The question for us here is this: will you benefit from provisioning a SLOG device if you build a ZFS box serving iSCSI storage to Hyper-V and VMware?
I hate sounding like a waffling storage VAR here, but I will: it depends. I’ve run both Nexenta and NAS4Free; when I ran Nexenta, I saw my SLOG being used during random & synchronous write operations. In NAS4Free, the SSD I had dedicated as a SLOG never showed any activity in zfs-stats, gstat, or any other disk IO tool I could find.
One could spend weeks of valuable lab time verifying under which conditions a dedicated SLOG device adds performance to your storage array, but I decided to cut bait. Check out some of the links at the bottom for more color on this, but in the meantime, let me leave you with this advice: if you have $80 to spend on your FreeBSD-based ZFS storage, buy an extra 8GB of RAM rather than a tiny, used SLC or MLC device to function as your SLOG. You will almost certainly get more performance out of a larger ARC than by dedicating a disk as your SLOG.
Labworks 1:6 – Great…so, again, why do my writes suck?
Recall this SQLIO test from Labworks 1:1:
As you can see, read or write, I was hitting a wall at around 235-240 megabytes per second during much of the “Short Test,” which is pretty close to the theoretical limit of an LACP team of two GigE NICs.
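The back-of-the-envelope math behind that ceiling looks like this; note the ~95% efficiency figure is my own rough allowance for Ethernet/IP/TCP overhead, not a measured number:

```python
# Rough usable-throughput math for an LACP team of 1GbE links.
def team_throughput_mb_s(nics, efficiency=0.95):
    """Approximate usable MB/s for an LACP team of 1GbE NICs.

    1 Gbit/s = 125 MB/s raw; roughly 5% is lost to framing and
    protocol overhead (the efficiency figure is an assumption).
    """
    return nics * 125 * efficiency

print(team_throughput_mb_s(2))  # ~237 MB/s: right about where the wall was
print(team_throughput_mb_s(3))  # ~356 MB/s: the new ceiling with 3x1GbE
```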
But as I said above, we don’t have that limit anymore. Whereas there were once 2x1GbE Teams, there are now 3x1GbE. Let’s see what the same test on the same 4KB block/4KB NTFS volume yields now.
SQLIO short test, take two, sort by Random vs Sequential writes & reads:
By Jove, what’s going on here? This graph was built off the same SQLIO recipe, but it looks completely different than the one in Labworks 1. For one, the writes look much better and the reads look much worse. Yet step back and the patterns are largely the same.
It’s data like this that makes benchmarking, validating & ultimately purchasing storage so tricky. Some would argue with my reliance on SQLIO and those arguments have merit, but I feel SQLIO, which is easy to script/run and automate, can give you some valuable hints into the characteristics of an array you’re considering.
Let’s look at the writes question specifically.
Am I really writing 350MB/s to SAN2?
On the one hand, everything I’m looking at says YES: I am a Storage God and I have achieved #StorageGlory inside the humble Daisetta Lab HQ on consumer-level hardware:
- SAN2 is showing about 115MB/s to each Broadcom interface during the 32KB & 64KB samples
- Agnostic_Node_1 perfmon shows about the same amount of traffic egressing the three vEthernet adapters
- The 2960S is reflecting all that traffic; I’m definitely pushing about 350 megabytes per second to SAN2; interface port channel 3 shows TX load at 219 out of 255, maxing out my LACP team
On the other hand, I am just an IT mortal, and a few things bother me:

- CPU is very high on SAN2 during the 32KB & 64KB runs…so busy it seems like the little AMD CPU is responsible for some of the good performance marks
- While I’m a fan of the itsy-bitsy 2.5″ Western Digital RED 1TB drives in SAN2, under no theoretical IOPS model is it likely that six of them, in RAIDZ-2 (the RAID 6 equivalent), can achieve 5,000-10,000 IOPS under traditional storage principles; each drive by itself is capable of only 75-90 IOPS
- If something is too good to be true, it probably is
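The napkin math behind that skepticism, using the common rule of thumb that a RAIDZ vdev delivers roughly one drive’s worth of random-write IOPS (a widely cited approximation, not a spec):

```python
# Why 5,000-10,000 IOPS from six 5400RPM spinners doesn't add up.
def aggregate_read_iops(drives, iops_per_drive):
    """Generous best case: every drive independently serving reads."""
    return drives * iops_per_drive

def raidz_random_write_iops(iops_per_drive):
    """A RAIDZ vdev writes full stripes, so for random writes the
    whole vdev behaves roughly like a single drive (rule of thumb)."""
    return iops_per_drive

print(aggregate_read_iops(6, 90))      # 540 IOPS, best case
print(raidz_random_write_iops(90))     # ~90 IOPS for random writes
```

Either way, the spinners are an order of magnitude short of what the benchmark reports, so something else is doing the work.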
Sr. Storage Engineer Neo feels really frustrated at this point; he can’t figure out why his writes suck, or even if they suck, and so he wanders up to the Oracle to get her take on the situation and comes across this strange Buddha Storage kid.
Labworks 1:7 – The Essence of ZFS & the New Storage Model
In effect, what we see here is just a sample of the technology & techniques that have been disrupting the storage market for several years now: compression & caching multiply the performance of storage systems beyond what they should be capable of, in certain scenarios.
As the chart above shows, the test2 volume is compressed by SAN2 using lzjb. On top of that, we’ve got the ZFS ARC, L2ARC, and the ZIL in the mix. And then, to make things even more complicated, we have some sync policies ZFS allows us to toggle. They look like this:
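On the command line, those choices are the `sync` and `compression` properties on the dataset. A sketch, using a hypothetical pool/dataset name (`tank/test2`), with values per zfs(8):

```shell
# The three sync policies ZFS exposes:
zfs set sync=standard tank/test2   # default: honor the app's sync requests
zfs set sync=always tank/test2     # treat every write as synchronous
zfs set sync=disabled tank/test2   # ignore sync requests entirely (fast, risky)

# And the compression knob used in these test runs:
zfs set compression=lzjb tank/test2

# Confirm the current settings:
zfs get sync,compression tank/test2
```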
The sync toggle documentation is out there and you should understand it; it is crucial to understanding ZFS. But I want to demonstrate the choices as well.
I’ve got three choices + the compression options. Which one of these combinations is going to give me the best performance & durability for my Hyper-V VMs?
SQLIO Short Test Runs 3-6, all PivotTabled up for your enjoyment and ease of digestion:
As is usually the case in storage, IT, and hell, life in general, there are no free lunches here, people. This graph tells you what you already know in your heart: the safest storage policy in ZFS-land (Always Sync, that is to say, commit writes to the rotationals post-haste as if it were the last day on earth) is also the slowest. Nearly 20 seconds of latency as I force ZFS to commit everything I send it immediately (vs. flushing it later), which it struggles to do at a measly average speed of 4.4 megabytes/second.
Compression-wise, I thought I’d see a big difference between the various compression schemes, but I don’t. Lzjb, lz4, and the ultra-space-saving/high-CPU-cost gzip-9 all turn in about equal results from an IOPS & performance perspective. It’s almost a wash, really, and that’s likely because of the predictable nature of the IO SQLIO is generating.
Last point: ZFS, as Chris Wahl pointed out, is a sort of virtualization layer atop your storage. Now if you’re a virtualization guy like me or Wahl, that’s easy to grasp; Windows 2012 R2’s Storage Spaces concept is similar in function.
But sometimes in virtualization, it’s good to peel away the abstraction onion and watch what that looks like in practice. ZFS has a number of tools and monitors that look at your Zpool IO, but to really see how ZFS works, I advise you to run gstat. gstat shows what your disks are doing, and if you’ve set up your environment carefully, you ought to be able to see the effects of your settings on each individual spindle.
Now look at this gstat sample. Under SQLIO-load, the zvol is showing 10,000 IOPS, 300+MB/s. But ada0-5, the physical drives, aren’t doing squat for several seconds at a time as SAN2 absorbs & processes all the IO coming at it.
That, friends, is the essence of ZFS.
Links/Knowledge/Required Reading Used in this Post: