So the family car (a tiny 2012 Mazda 3) lease is up in February which means it’s time to get a new Agnosto-ride for the Supe Module spouse, the Child Partition and -as dads everywhere know- all the heavy, awkwardly-shaped stuff that’s required to go everywhere the Child Partition goes.
It’s 2015, I’m nearing 40 and so I’m thinking Agnosto-ride 2.0 will be something bigger, safer, and because gas is so cheap and will never, ever, ever go up again, suitably powerful & commanding. Something established, something that says “Look upon and fear me,” yet is soft, friendly and maneuverable enough that my wife and I can park it without effort.
Or hell, maybe it can park itself.
That’s right. Time to go car shopping, baby.
I love shopping for cars, almost as much as I love shopping for storage arrays. When you step back and think about it, the two industries (cars & storage arrays) are so similar I’m convinced a skilled salesman could make a great living selling cars in the morning and slinging shelves in the afternoon. ((Or perhaps NetApp could merge with Ford and the same guy who sells you a Taurus could sell you a filer out of the same dealership))
Think about it. Glen works for a dealer selling Camrys in the morning, and he’s really good at bumping his commission up by convincing his mark to buy something that really should be included: a spare tire. By late afternoon, he’s pitching the exact same thing (High Availability via Active/Passive controllers) in expensive recurring license form to some poor storage schlub who just needs a few more TBs so he can sleep at night without worrying about his backups.
What’s more, the customer victim can’t just go and purchase the car/array from the manufacturer himself, he’s got to have some value added to that transaction by way of a VAR or a dealer, you see, else what reason is there for Glen? The customer must have Glen’s guidance; he literally is incapable of picking the right car or array for himself, even if the mark produces his own storage podcast or subscribes to Auto Week & Consumer Reports. The mark’s hands are held until such time that he selects the right car/array, which is always either the car/array closest to Glen, or the car/array that offers Glen’s employer the most margin.
For this is the way of things, except during quarter or year end.
And in both industries, the true cost of the product is either really hard to find or it’s been hidden in plain sight, or it only applies in certain use cases, all of which makes determining a car/array’s value very hard to quantify. Yes, you can take all the variables, drop them in Excel, but pivot tables only go so far: the electric gets you an invaluable HOV sticker for 2x the cost of the range-anxiety-free hybrid, while the all-flash array that dedupes & compresses inline and goes like a bat out of hell costs twice as much as the compress-only hybrid array which has honest-to-God cheep ‘n deeps that you know and trust.
Lastly, no buyer of metal boxes with rotating round things ((usually)) is as biased & opinionated as car & storage buyers. “You’ll regret that POS Kia in a few years, it’ll let you down!” says the Honda snob to the dad trying to save a buck or two. “No one ever got fired for buying EMC!” shouts the storage traditionalist at his colleague who just wants a bunch of disks & software.
And in the end, all this …analysis if you can call it that…. is utterly worthless if your family doesn’t like the way the car handles or your DBA can’t quite grasp the concept of mounting a cloned snapshot of his prod LUN and insists on doing SQL backups the way he learned to do them in 19-diggity-7.
Don’t hate the player, hate the game, Jeff you’re thinking.
But I don’t! I love the player and the game. I just like winning and if that means Glen loses a point or two on his commission, so be it.
Which is why before I buy a car or a storage array, I arm myself as best I can. In the case of storage, it’s imperfect spreadsheets with complex formulas, some Greybeards on Storage, some SQLIO & IOMETER, and some caffeine. In the case of cars, it’s perfect spreadsheets + Clark Howard + myFico.com credit report ((Incidentally, it won’t be myFico.com this time around since Fair Isaac apparently refuses to encrypt their entire site like a real bank would)) + bank check just to let Glen know that I’m the real deal, that I could bolt and buy that other car he’s trash-talking if he doesn’t toss in the spare tire gratis.
Just as I was wrapping up my time at my last employer, Nimble Storage delivered a great big Christmas gift, seemingly prepared just for me. It was a gift that brought a bit of joy to my blackened, wounded heart, which has suffered so much at the hands of storage vendors in years gone by.
What was this amazing gift that warmed my soul in the bleak, cold Southern California winter? Something called SMI-S, or Smizz as I think of it. SMI-S is an open standard management framework for storage. But before I get into that, some background.
You may recall Nimble Storage from such posts as “#StorageGlory at 30,000 IOPS,” and “Nimble Storage Review: 30 Days at Ludicrous Speed.” It’s fair to say I’m a fan of Nimble, having deployed two of their mid-level arrays this year into separate production datacenter environments I was responsible for as an employee, not as a consultant. From designing the storage network & virtualization components, to racking & stacking the Nimble, to entrusting it with my VMs, my SQL volumes, and Exchange, I got to see and experience the whole product, warts and all, and came away damned impressed with its time-to-deploy, its flexibility, snapshotting, and speed.
But one of the warts really stood out, festered, itched and nagged at me. While there has been support for VMware infrastructure inside a Nimble array since day one, there was no integration or support for Microsoft’s System Center Virtual Machine Manager, or VMM as us ‘softies call it. What’s a Hyper-V & System Center fanboy to do?
Enter SMI-S, the Storage Management Initiative – Specification, a somewhat awkwardly-named but comprehensive storage management spec allowing you to provision/destroy volumes, create snapshots or clones, and classify your tiers via 3rd party tools, just the way $Deity intended it.
SMI-S is a product of the Storage Networking Industry Association and there’s a ton of in-depth, technical PDFs up on their site, but what you need to know is the specification has been maturing for a decade or longer, and it’s been adopted by a modest but growing number of storage vendors. The big blue N has it, for instance, as does HP and Hitachi Data Systems.
The neat thing about SMI-S is that it’s built atop yet another open management model, the Common Information Model, which, as MS engineers know, is baked right into Windows Server (both as a listener and provider).
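If you want to see that plumbing for yourself, a couple of in-box cmdlets will do it; nothing exotic here, just garden-variety CIM queries against classes I happen to like:

```powershell
# CIM is just there on a modern Windows Server; no agent, no extra license
Get-CimInstance -ClassName Win32_OperatingSystem | Select-Object Caption, Version

# the storage bits ride the same object model
Get-CimInstance -Namespace root/microsoft/windows/storage -ClassName MSFT_Disk |
    Select-Object FriendlyName, Size, HealthStatus
```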
And that has made all the difference.
I love SMI-S and CIM (as well as WBEM) because they’re a great example of agnostic computing theory working out to my benefit in practice. SMI-S and CIM are open standards that save time, money & complexity, abstracting (in this case) the particulars of your storage array and giving you the freedom to purchase & manage multiple different arrays from one software interface, System Center, via that other great agnostic system, https.
Or, to put it another way, SMI-S and CIM help keep your butt where it should be, in your chair, doing great IT engineering work, not in the CIO’s office meekly asking, “Please sir, may I have another storage system API license?”
Fantastic. No proprietary or secret or expensive API here, no extra licensing costs on the compute side, no new SKUs, no gotchas.*
And now Nimble Storage has it.
Nimble’s implementation of SMI-S is based on the Open Pegasus project**, the Linux/Unix world’s implementation of CIM/WBEM. All Nimble had to do to make me feel happy & warm inside was download the tarball, make it, and stuff it into NimbleOS version 2.2, which is the release candidate OS posted last week.
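Hooking a provider like that into VMM amounts to a couple of lines of PowerShell. Here’s a sketch, not gospel: the hostname and Run As account below are hypothetical, and 5989 is just the customary CIM-XML-over-https port.

```powershell
# From the VMM server: register the array's SMI-S (CIM-XML) provider
$runas = Get-SCRunAsAccount -Name "NimbleAdmin"       # hypothetical Run As account
Add-SCStorageProvider -Name "nimble01.daisettalabs.net" -RunAsAccount $runas `
    -NetworkDeviceName "https://nimble01.daisettalabs.net" -TCPPort 5989

# then let VMM walk the provider and surface its pools
Get-SCStorageProvider -Name "nimble01.daisettalabs.net" | Read-SCStorageProvider
```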
For IT organizations looking to reduce complexity & consolidate vendors, a Nimble Array that can be managed via System Center is a good play. For Nimble, that may only be a small slice of the market, but in that slice and among IT pros who focus on value-engineering just as much as they focus on convergence, System Center support enhances the Nimble story and puts them in league with the bigger, more established players, like the big blue N.
Which is just where they want to be, it appears.
Nimble’s on a roll and closing out 2014 strong, with fiber channel support, new all-flash shelves, faster models, a more mature OS (in fact, I believe it’s mostly re-written from the 1.4x days), stable DSMs for my Microsoft servers, and now, like icing on the cake, an agnostic standards-based management layer that plugs right into my System Center.
* Well, one gotcha. As the release notes say: “Note: SCVMM can only discover volumes that have the agent_type smis attribute. When logical units are created using SCVMM, the SMI-S provider ensures the agent_type smis attribute is added to the volumes. However, volumes created from the array do not automatically have the attribute. You must add the attribute when you create the volume; otherwise, SCVMM will not be able to discover it. For more information about the agent_type smis attribute, see Create a Starter Volume.” So existing volumes won’t show in your VMM, but it’s not too big of a headache as you can storage live migrate your VMs to volumes you’ve provisioned via VMM.
Also, as a footnote, I believe NetApp charges for SMI-S support.
** Open Pegasus is itself affiliated with the Open Group, an unsexy but in my view exciting & important IT standards organization that 1) is legit as the official certifying body of the UNIX trademark, 2) is not ITIL-affiliated as best I can tell and 3) aligns very well with Microsoft’s servers & systems. SMI-S is just one piece of the puzzle; another is instrumentation & other infrastructure items. To that end, the Open Group oversees work being done on Open Management Infrastructure, which Microsoft supports and can utilize via WSMAN and WMI. Cisco, Arista and others are on board with this, and though I haven’t programmed a Nexus switch with PowerShell yet, it is a real option and offers a compelling vision for infrastructurists like me: best-in-class storage, network, compute hardware, all managed & instrumented via System Center or whatever https front-end is suitable. Jeff Snover detailed the relationship over two years ago in this blog.

*** Incidentally, without SMI-S & CIM, there’d be no way for me to build a simulation SAN in the Daisetta Lab (#StorageGlory Achieved : 30 Days on a Windows SAN) and manage it via VMM. But as I detailed earlier this summer, you can: stand up a Windows file server box, turn on the feature “Standards Based Storage Management,” point VMM at it and provision.
VXLAN, NVGRE & Network Controller, courtesy of Azure: This is something I’ve hoped for in the next version of Windows Server: a more compelling SDN story, something more than Network Function Virtualization & NVGRE encapsulation. If bringing some of the best -and widely supported- bits of the VMware ecosystem to on-prem Hyper-V & System Center isn’t a virtualization engineer’s wet dream, I don’t know what is.
VMware meet Azure Site Recovery: Coming soon to a datacenter near you, failover your VMware infrastructure via Azure Site Recovery, the same way Hyper-V shops can
In-place/rolling upgrades for Hyper-V Clusters: This feature was announced with the release of Windows Server Technical Preview (of course, I only read about it after I wiped out my lab 2012 R2 cluster) but there’s a lot more detail on it from TechEd via Finn: rebuild physical nodes without evicting them first. You keep the same Cluster Name Object, simply live migrating your VMs off your targeted hosts. Killer.
Single cluster node failure: In the old days, I used to lose sleep over clusres.dll, or clussvc.exe, two important pieces in Microsoft Clustering technology. Sure, your VMs will failover & restart on a new host, but that’s no fun. Ben Armstrong demonstrated how vNext handles node failure by killing the cluster service live during his presentation. Finn says the VMs didn’t failover, but the host was isolated by the other nodes and the cluster simply paused and waited for the node to recover (up to 4 minutes). Awesome!
Azure Witness: Also for clustering fans who are torn (as I am) between selecting file or disk witness for clusters: you will soon be able to add mighty Azure as a witness to your on-prem cluster. Split brain fears no more!
More enhancements for Storage QoS: Ensure that your tenant doesn’t rob IOPS from everyone else.
The Windows SAN, for real: Yes, we can soon do offsite block-level replication from our on-prem Tiered Storage Spaces servers.
New System Center coming next year: So much to unpack here, but I’ll keep it brief. You may love System Center, you may hate it, but it’s not dead. I’m a fan of the big two: VMM, and ConfigMan. OpsMan I’ve had a love/hate relationship with. Well, the news out of TechEd Europe is that System Center is still alive, but more integration with Azure + a substantial new release will debut next summer. So the VMM Technical Preview I’m running in the Daisetta Lab (which installs to C:\Program Files\VMM 2012 R2, btw) is not the VMM I was looking for.
Other incredible announcements:
Docker, CoreOS & Azure: Integration of the market-leading container technology with Azure is apparently further along than I believed. A demo was shown that hurts my brain to think about: Azure + Docker + CoreOS, the Linux OS that has two OS partitions and is fault-tolerant. Wow.
Azure Operational Insights: I’m a fan of the Splunk model (point your firehose of data/logs/events at a server, and let it make sense of it) and it appears Azure Operational Insights is a product that will jump into that space. Screen cap from Finn
This is really exciting stuff.
Looking back on the last few years in Microsoft’s history, one thing stands out: the painful change from the old Server 2008 R2 model to the new 2012 model was worth it. All of the things I’ve raved about on this blog in Hyper-V (converged networking, Storage Spaces, etc.) were just teasers -but also important architectural elements- that made the things we see announced today possible.
The overhaul* of Windows Server is paying huge dividends for Microsoft and for IT pros who can adapt & master it. Exciting times.
* unlike the Windows mobile > Windows Phone transition, which was not worth it
So if you work in IT, and even better, if you’re in the virtualization space of IT as I am, you have to know that VMworld is happening this week.
VMworld is just about the biggest vCelebration of vTechnologies there is. Part trade-show, part pilgrimage, part vLollapalooza, VMworld is where all the sexy new vProducts are announced by VMware, makers of ESXi, vSphere, vCenter, and so many other vThings.
It’s an awesome show…think MacWorld at the height of Steve Jobs but with fewer hipsters and way more virtualization engineers. Awesome.
And I’ve never been :sadface:
And 2014’s VMworld was a doozy. You see, the vGiant announced a new 2U, four node vSphere & vSAN cluster-in-a-box hardware device called EVO:RAIL. I’ve been reading all about EVO:RAIL for the last two days and here’s what I think as your loyal Hyper-V blogger:
What’s in a name? Right off the bat, I was struck by the name for this appliance. EVO:RAIL…say what? What’s VMware trying to get across here? Am I to associate EVO with the fast Mitsubishi Lancers of my youth, or is this EVO in the more Manga/Anime sense of the word? Taken together, EVO:RAIL also calls to mind sci-fi, does it not? You could picture Lt. Cmdr Data talking about an EVO:RAIL to Cmdr Riker, as in “The Romulan bird of prey is outfitted with four EVO:RAIL phase cannons, against which the Enterprise’s shields stand no chance.” Speaking of guns: I also thought of the US Navy’s Railguns; long range kinetic weapons designed to destroy the Nutanix/Simplivity enemy.
If you’re selling an appliance, do you need vExperts? One thing that struck me about VMware’s introduction of EVO:RAIL was their emphasis on how simple it is to rack, stack, install, deploy and virtualize. They claim the “hyper-converged” 2U box can be up and running in about 15 minutes; a full rack of these babies could be computing for you in less than 2 hours. They’ve built a sexy HTML 5 GUI to manage the thing, no vSphere console or PowerCLI in sight. It’s all pre-baked, pre-configured, and pre-built for you, the small-to-medium enterprise. It’s so simple a help desk guy could set it up. So with all that said, do I still need to hire vExperts and VCDX pros to build out my virtualization infrastructure? It would appear not. Is that the message VMware is trying to convey here?
One SKU for the Win: I can’t be the only one that thinks buying the VMware stack is a complicated & time-consuming affair. Chris Wahl points out that EVO:RAIL is one SKU, one invoice, one price to pay, and VMware’s product page confirms that, saying you can buy a Dell EVO:RAIL or a Fujitsu EVO:RAIL, but whatever you buy, it’ll be one SKU. This is really nice. But why? VMware is famous for licensing its best-in-class features…why mess with something that’s worked so well for them?
One could argue that EVO:RAIL is a reaction to simplified pricing structures on rival systems…let’s be honest with ourselves. What’s more complicated: buying a full vSphere and/or vHorizon suite for a new four node cluster, or purchasing the equivalent amount of computing units in Azure/AWS/Google Compute? What model is faster to deploy, from sales call to purchasing to receiving to service? What model probably requires consulting help?
Don’t get me wrong, I think it’s great. I like simple menus, and whereas buying VMware stuff before was like choosing from a complicated, multi-page, multi-entree menu, now it’s like buying burgers at In ‘n Out. That’s very cool, but it means something has changed in vLand.
I love the density: As someone who’s putting the finishing touches on my own new virtualization infrastructure, I love the density in EVO:RAIL. 2 Rack Units with E5-26xx class Xeons packing 6 cores each means you can pack about 48 cores into 2U! Not bad, not bad at all. The product page also says you can have up to 16TB of storage in those same 2U (courtesy of VSAN) and while you still need a ToR switch to jack into, each node has 2x10GbE SFP+ or Copper. Which is excellent. RAM is the only thing that’s a bit constrained; each node in an EVO:RAIL can only hold 192GB of RAM, a total of 768GB per EVO:RAIL.

In comparison, my beloved 2U pizza boxes offer more density in some places, but less overall, given that 1 Pizza Box = one node. In the Supermicros I’m racking up later this week, I can match the core count (4×12 Core E5-46xx), improve upon the RAM (up to 1TB per node) and easily surpass the 16TB of storage. That’s all in 2U and all for about $15-18k.

Where the EVO:RAIL appears to really shine is in VM/VDI density. VMware claims a single EVO:RAIL is built to support 100 General Purpose VMs or to support up to 250 VDI sessions, which is f*(*U#$ outstanding.
I wonder if I can run Hyper-V on that: Of course I thought that. Because that would really kick ass if I could.
Overall, a mighty impressive showing from VMware this week. Like my VMware colleagues, I pine for an EVO:RAIL in my lab.
I think EVO:RAIL points to something bigger though…This product marks a shift in VMware’s thinking, a strategic reaction to the changes in the marketplace. This is not just a play against Nutanix and other hyper-converged vendors, but against the simplicity and non-specialist nature of cloud Infrastructure as a Service. This is a play against complexity in other words…this is VMware telling the marketplace that you can have best-in-class virtualization without worst-in-class licensing pain and without hiring vExperts to help you deploy it.
My idea was to combine all types of disks -rotational 3.5″ & 2.5″ drives, SSDs, mSATAs, hell, I considered USB- into one tight, well-built storage box for my lab and home data needs. A sort of Storage Ark, if you will; all media types were welcome, but only if they came in twos (for mirroring & parity’s sake, of course) and only if they rotated at exactly 7200 RPM and/or leveled their wear evenly across the silica.
And onto this unholy motley crue of hard disks I slapped a software architecture that promised to abstract all the typical storage driver, interface, and controller nonsense away, far, far away in fact, to a land where the storage can be mixed, the controllers diverse, and by virtue of the software-definition bits, network & hypervisor agnostic. In short, I wanted to build an agnostic #StorageGlory box in the Daisetta Lab.
Right. So what did I use to achieve this? ZFS and Zpools?
Hell no, that’s so January.
VSAN? Ha! I’m no Chris Wahl.
I used Windows, naturally.
That’s right. Windows. Server 2012 R2 to be specific, running Core + Infrastructure GUI with 8GB of RAM, and some 17TB of raw disk space available to it. And a little technique developed by the ace Microsoft server team called Tiered Storage Spaces.
Was a #StorageGlory Achievement Unlocked, or was it a dud?
Here’s my review after 30 days on my Windows SAN: san.daisettalabs.net.
It doesn’t make you pick a side in either storage or storage-networking: Do you like abstracted pools of storage, managed entirely by software? Put another way, do you hate your RAID controller and crush on your old-school NetApp filer, which seemingly could do everything but object storage?
When I say block, do you instinctively say file? Or vice-versa?
Well then my friend, have I got a storage system for your lab (and maybe production!) environment: Windows Storage Spaces (now with Tiering!) offers just about everything guys like you or me need in a storage system for lab & home media environments. I love it not just because it’s Microsoft, but also because it doesn’t make me choose between storage & storage-networking paradigms. It’s perhaps the ultimate agnostic storage technology, and I say that as someone who thinks about agnosticism and storage.
You know what I’m talking about. Maybe today, you’ll need some block storage for this VM or that particular job. Maybe you’re in a *nix state of mind and want to fiddle with NFS. Or perhaps you’re feeling bold & courageous and decide to try out VMware again, building some datastores on both iSCSI LUNs and NFS shares. Then again, maybe you want to see what SMB 3.0 is all about, the MS fanboys sure seem to be talking it up.
The point is this: I don’t care what your storage fancy is, but for lab-work (which makes for excellence in work-work) you need a storage platform that’s flexible and supportive of as many technologies as possible and is, hopefully, software-defined.
And that storage system is -hard to believe I’ll grant you- Windows Server 2012 R2.
I love storage and I can’t think of one other storage system -save for maybe NetApp- that lets me do crazy things like store .vmdks inside of .vhdxs (oh the vIrony!), use SMB 3 multichannel over the same NICs I’m using for iSCSI traffic, create snapshots & clones just like big filers all while giving me the performance-multiplier benefits of SSDs and caching and a reasonable level of resiliency.
File this one under WackWackStorageGloryAchievedWindows boys and girls.
I can do it all with Storage Spaces in 2012 R2.
As I was thinking about how to write about Storage Spaces, I decided to make a chart, if only to help me keep it straight. It’s rough but maybe you’ll find it useful as you think about storage abstraction/virtualization tech:
It’s easy to build and supports your disks & controllers: This is a Microsoft product. Which means it’s easy to deploy & build for your average server guy. Mine’s running on a very skinny, re-re-purposed SanDisk Ready Cache SSD. With Windows 2012 R2 server running the Infrastructure Management GUI (no explorer.exe, just Server Manager + your favorite snap-ins), it’s using about 6GB of space on the boot drive.
And drivers for the Intel C226 SATA controller, the LSI 9218si SAS card, and the extra ASMedia 1061 controller were all installed automagically by Windows during the build.
The only other system that came close to being this easy to install -as a server product- was Oracle Solaris 11.2 Beta. It found, installed drivers for, and exposed all controllers & disks, so I was well on my way to going the ZFS route again, but figured I’d give Windows a chance this time around.
Nexenta 4, in contrast, never loaded past the Install Community Edition screen.
It’s improved a lot over 2012: Storage Spaces debuted almost two years ago now, and I remember playing with it at work a bit. I found it to be a mind-f*** as it was a radically different approach to storage within the Windows server context.
I also found it to be slow, dreadfully slow even, and not very survivable. Though it did accept any disk I gave it, it didn’t exactly like it when I removed a USB drive during an extended write test. And it didn’t take the disk back at the conclusion of the test either.
Like everything else in Microsoft’s current generation, Storage Spaces in 2012 R2 is much better, more configurable, easier to monitor, and more tolerant of disk failures.
It also has something for the IOPS speedfreak inside all of us.
Tiered Storage Spaces & Adjustable write cache: Coming from ZFS & the Adaptive Replacement Cache, the ZFS Intent Log, the SLOG, and L2ARC, I was kind of hooked on the idea of using massive amounts of my ECC RAM to function as a sort of poor-man’s NVRAM.
Windows can’t do that, but with Tiered Storage Spaces, you can at least drop a few SSDs in your array (in my case three x 256GB 840 EVO & one 128GB Samsung 830), mix them into your disk pool, and voila! Fast read-cache, with a Microsoft-flavored MRU/LFU algorithm of some type keeping your hottest data on the fastest disks and your old data on the cheep ‘n deep rotationals.
Which I naturally did while building this guy out. I mean, who wouldn’t want more write-cache?
But there’s a huge gotcha buried in the Technet and blogposts I found about this. I wanted to pool all my disks together into as large of a single virtual disk as possible, then pack iSCSI-connected .vhdxs, SMB 3 shares, and more inside that single, durable & tiered virtual disk. What I didn’t want was several virtual disks (it helped me to think of virtual disks as a sort of Aggregate) with SMB 3 shares and vhdx files stored haphazardly between them.
Which is what you get when you adjust the write-cache size. Recall that I have a capacity of about 17TB raw among all my disks. Building a storage pool, then a virtual disk with a 10GB write cache gave me a tiered virtual disk with a maximum size of about 965GB. More on that below.
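For reference, here’s roughly what that build looks like in PowerShell; a sketch only, with my own pool & disk names and placeholder tier sizes you’d adjust to your hardware:

```powershell
# Pool every eligible disk, define an SSD tier and an HDD tier, then carve one
# mirrored, tiered virtual disk with a 10GB write-back cache
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "DaisettaPool" -StorageSubSystemFriendlyName "Storage Spaces*" -PhysicalDisks $disks

$ssdTier = New-StorageTier -StoragePoolFriendlyName "DaisettaPool" -FriendlyName "SSDTier" -MediaType SSD
$hddTier = New-StorageTier -StoragePoolFriendlyName "DaisettaPool" -FriendlyName "HDDTier" -MediaType HDD

# tier sizes below are placeholders
New-VirtualDisk -StoragePoolFriendlyName "DaisettaPool" -FriendlyName "TieredVD" `
    -StorageTiers $ssdTier, $hddTier -StorageTierSizes 200GB, 750GB `
    -ResiliencySettingName Mirror -WriteCacheSize 10GB
```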
It can be wicked fast, but so is RAID 0: Check out my standard SQLIO benchmark routine, which I run against all storage technologies that come my way. The 1.5 hour test is by no means comprehensive -and I’m not saying the IOPS counter is accurate at all (showing max values across all tests by the way)- but I like this test because it lets me kick the tires on my array, take her out for a spin, and see how she handles.
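If you want to follow along at home, the recipe is roughly this shape (a sketch from memory, not my exact parameter file; the test file path is a placeholder):

```powershell
# sqlio.exe ships in the old SQLIO download; 8 threads, 8 outstanding IOs,
# 120 seconds per pass, unbuffered, latency stats on
.\sqlio.exe -kW -frandom     -t8 -o8 -b8  -s120 -LS -BN E:\sqlio_testfile.dat
.\sqlio.exe -kW -fsequential -t8 -o8 -b64 -s120 -LS -BN E:\sqlio_testfile.dat
.\sqlio.exe -kR -frandom     -t8 -o8 -b8  -s120 -LS -BN E:\sqlio_testfile.dat
.\sqlio.exe -kR -fsequential -t8 -o8 -b64 -s120 -LS -BN E:\sqlio_testfile.dat
```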
And with a “Simple” layout (no redundancy, probably equivalent to RAID 0), she handles pretty damn well, but even I’m not crazy enough to run tiered storage spaces in a simple layout config:
What’s odd is how poorly the array performed with 10GB of “Write Cache.” Not sure what happened here, but as you can see, latency spiked higher during the 10GB write cache write phase of the test than just about every other test segment.
Something to do with parity no doubt.
For my lab & home storage needs, I settled on a Mirror 2-way parity setup that gives me moderate performance with durability in mind, though not much as you’ll see below.
Making the most of my lab/home network and my NICs: Recall that I have six GbE NICs on this box. Two are built into the Supermicro board itself (Intel), and the other four come by way of a quad-port Intel I350-T4 server NIC.
Anytime you’re planning to do a Microsoft cluster in the 1GbE world, you need lots of NICs. It’s a bit of a crutch in some respects, especially in iSCSI. Typically you VLAN off each iSCSI NIC for your Hyper-V hosts and those NICs do one thing and one thing only: iSCSI, or Live Migration, or CSV etc. Feels wasteful.
But on my new storage box at home, I can use them for double-duty: iSCSI (or LM/CSV) as well as SMB 3. Yes!
Usually I turn off Client for Microsoft Networks (the SMB file sharing toggle in NIC properties) on each dedicated NIC (or vEthernet), but since I want my file cake & my block cake at the same time, I decided to turn SMB on across all iSCSI vEthernet adapters (from the physical & virtual hosts) and leave SMB on the iSCSI NICs on san.daisettalabs.net as well.
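Flipping those bindings is quicker in PowerShell than clicking through NIC properties; a sketch, with adapter names from my lab:

```powershell
# ms_msclient = "Client for Microsoft Networks", ms_server = "File and Printer Sharing"
Set-NetAdapterBinding -Name "vEthernet (iSCSI-10)" -ComponentID ms_msclient, ms_server -Enabled $true

# verify what's bound where
Get-NetAdapterBinding | Where-Object { $_.ComponentID -in 'ms_msclient', 'ms_server' }
```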
The end result? This:
Storage Networking: All of the Above Approach

NIC, Name, VLAN, IP, Function
1, MGMT, 100, 192.168.100.15, MGMT & SMB3
2, CLNT, 102, 192.168.102.15, Home net & SMB3
3, iSCSI-10, 10, 172.16.10.x, iSCSI & SMB3
4, iSCSI-11, 10, 172.16.11.x, iSCSI & SMB3
5, iSCSI-12, 10, 172.16.12.x, iSCSI & SMB3
That’s five, count ’em five NICs (or discrete channels, more specifically) I can use to fully soak in the goodness that is SMB 3 multichannel, with the cost of only a slightly unsettling epistemological question about whether iSCSI NICs are truly iSCSI if they’re doing file storage protocols.
Now SMB 3 is so transparent (on by default) you almost forget that you can configure it, but there’s quite a few ways to adjust file share performance. Aidan Finn argues for constraining SMB 3 to certain NICs, while Jose Barreto details how multichannel works on standalone physical NICs, a pair in a team, and multiple teams of NICs.
I haven’t decided which model to follow (though on san.daisettalabs.net, I’m not going to change anything or use Converged switching…it’s just storage), but SMB 3 is really exciting and it’s great that with Storage Spaces, you can have high performance file & block storage. I’ve hit 420MB/sec on synchronous file copies from san to host and back again. Outstanding!
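If I do eventually go the Finn route and pin SMB 3 to particular interfaces, the constraint itself is a one-liner (a sketch; server & interface names are from my lab):

```powershell
# On the client side: only use these interfaces when talking SMB to san.daisettalabs.net
New-SmbMultichannelConstraint -ServerName "san.daisettalabs.net" -InterfaceAlias "MGMT", "CLNT"

# then check which NICs are actually carrying SMB 3 traffic
Get-SmbMultichannelConnection
```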
I Finally got iSNS to work and it’s…meh: One nice thing about san.daisettalabs.net is that that’s all you need to know…the FQDN is now the resident iSCSI Name Server, meaning it’s all I need to set on an MS iSCSI Initiator. It’s a nice feature to have, but probably wasn’t worth the 30 minutes I spent getting it to work (hint: run set-wmiinstance before you run iSNS cmdlets in powershell!) as iSNS isn’t so great when you have…
SMI-S, which is awesome for Virtual Machine Manager fans: SMI-S, you’re thinking, what the hell is that? Well, it’s a standardized framework for communicating block storage information between your storage array and whatever interface you use to manage & deploy resources on your array. Developed by no less an august body than the Storage Networking Industry Association (SNIA), it’s one of those “standards” that seem like a good idea, but you can’t find it much in the wild as it were. I’ve used SMI-S against a NetApp Filer (in the Classic DoT days, not sure if it works against cDoT) but your Nimbles, your Pures, and other new players in the market get the same funny look on their face when you ask them if they support SMI-S.
“Is that a vCenter thing?” they ask.
Microsoft, to its credit, does. Right on Windows Server. It’s a simple feature you install and two or three powershell commands later, you can point Virtual Machine Manager at it and voila! Provision, delete, resize, and classify iSCSI LUNs on your Windows SAN, just like the big boys do (probably) in Azure, only here, we’re totally enjoying the use of our corpulent .vhdx drives, whereas in Azure, for some reason, they’re still stuck on .vhds like rookies. Haha!
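Here’s the gist of those commands on the Windows SAN side; a sketch that assumes the Server 2012 R2 feature names, so double-check Get-WindowsFeature on your build:

```powershell
# On san.daisettalabs.net: the iSCSI Target role plus the standards-based storage management bits
Install-WindowsFeature -Name FS-iSCSITarget-Server, WindowsStorageManagementService -IncludeManagementTools

# Sanity check: the storage subsystems this box now exposes to management tools like VMM
Get-StorageSubSystem
```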
It’s a very stable storage platform for Microsoft Clustering: I’ve built a lot of Microsoft Hyper-V clusters. A lot. More than half a dozen in production, and probably three times that in dev or lab environments, so it’s like second nature to me. Stable storage & networking are not just important factors in Microsoft clusters, they are the only factors.
So how is it building out a Hyper-V cluster atop a Windows SAN? It’s the same, and different at the same time, but, unlike so many other cluster builds, I passed the validation test on the first attempt with green check marks everywhere. And weeks have gone by without a single error in the Failover Clustering snap-in; it’s great.
It’s expensive and seemingly not as redundant as other storage tech: When you build your storage pool out of offlined disks, your first choice is going to involve (just like other storage abstraction platforms) disk redundancy. Microsoft makes it simple, but doesn’t really tell you the cost of that redundancy until later in the process.
Recall that I have 17TB of raw storage on san.daisettalabs.net, organized as follows:
Disk Type, Quantity, Size, Format, Speed, Function
WD Red 2.5″ with NASWARE, 6, 1TB, 4KB AF, SATA 3 5400RPM, Cheep ‘n deep
Samsung 840 EVO SSD, 3, 256GB, 512byte, 250MB/read, Tiers not fears
Samsung 830 SSD, 1, 128GB, 512byte, 250MB/read, Tiers not fears
HGST 3.5″ Momentus, 6, 2TB, 512byte, 105MB/r/w, Cheep ‘n deep
Now, according to my trusty IOPS Excel calculator, if I were to use traditional RAID 5 or RAID 6 on that set of spinners, I’d get about 16.5TB usable in the former, 15TB usable in the latter (assuming RAID penalty of 5 & 6, respectively)
For much of the last year, I’ve been using ZFS & RAIDZ2 on the set of six WD Red 2.5″ drives. Those have a raw capacity of 6TB. In RAIDZ2 (roughly analogous to RAID 6), I recall getting about 4.2TB usable.
All in all, traditional RAID & ZFS’ RAIDZ cost me between 12% and 35% of my capacity respectively.
So how much does Windows Storage Spaces resiliency model (Mirrored, 2-way parity) cost me? A lot. We’re in RAID-DP territory here people:
Ack! With 17TB of raw storage, I get about 5.7TB usable, a cost of about 66%!
And for that, what kind of resiliency do I get?
I sure as hell can’t pull two disks simultaneously, as I did live during prod in my ZFS box. I can suffer the loss of only a single disk. And even then, other Windows bloggers point to some pain as the array tries to adjust.
Now, I’m not the brightest on RAID & parity and such, so perhaps there’s a more resilient, less costly way to use Storage Spaces with Tiering, but wow…this strikes me as a lot of wasted disk.
Not as easy to de-abstract the storage: When a disk array is under load, one of my favorite things to do is watch how the IO hits the physical elements in the array. Modern disk arrays make what your disks are doing abstract, almost invisible, but to truly understand how these things work, sometimes you just want the modern equivalent of lun stats.
In ZFS, I loved just letting gstat run, which showed me the load my IO was placing on the ARC, the L2ARC and finally, the disks. Awesome stuff:
As best as I can tell, there’s no live powershell equivalent to gstat for Storage Spaces. There are teases though; you can query your disks, get their SMART vitals, and more, but peeling away the onion layers and actually watching how Windows handles your IO would make Storage Spaces the total package.
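The teases I mean, for reference; these are real in-box cmdlets, but they’re point-in-time queries, nothing like a rolling gstat:

```powershell
# SMART-ish vitals per physical disk
Get-PhysicalDisk | Get-StorageReliabilityCounter |
    Select-Object DeviceId, Temperature, ReadErrorsTotal, Wear

# any repair/regeneration jobs in flight
Get-StorageJob

# map a virtual disk back to the spindles underneath it
Get-VirtualDisk | Get-PhysicalDisk
```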
So that’s about it: this is the best storage box I’ve built in the Daisetta Lab. No regrets going with Windows. The platform is mature, stable, offers very good performance, and decent resiliency, if at a high disk cost.
I’m so impressed I’ve checked my Windows SAN skepticism at the door and would run this in a production environment at a small/medium business (clustered, in the Scaled Out File Server role). Cost-wise, it’s a bargain. Check out this array: it’s the same exact Hardware a certain upstart Storage vendor I like (that rhymes with Gymbal Porridge) sells, but for a lot less!
So the three of you who read this blog might be wondering why I haven’t been posting much lately.
Where’s Jeff, the cloud praxis guy & Hyper-V fanboy, who says IT pros should practice their cloud skills? you might have asked.
Well, I’ll tell you where I’ve been. One, I’ve been working my tail off at my new job where Cloud Praxis is Cloud Game Time, and two, the Child Partition, as adorable and fun as he is, is now 19 months old, and when he’s not gone down for a maintenance cycle in the crib, he’s running Parent Partition and Supervisor Module spouse ragged, consuming all CPU resources in the cluster. Wow that kid has some energy!
Yet despite that (or perhaps because of that), I found some time to re-think my storage strategy for the Daisetta Lab.
Recall that for months I’ve been running a ZFS array atop a simple NAS4Free instance, using the AMD-powered box as a multi-path iSCSI target for Cluster Shared Volumes. But continuing kernel-on-iscsi-target-service homicides, a desire to combine all my spare drives & resources into a new array, and a vacation-time cash-infusion following my exit from the last job led me to build this for only about $600 all-in:
Here are some superlatives and other interesting curios about this new box:
It was born on the 4th of July, just like ‘Merica and is as big, loud, ostentatious and overbearing as ‘Merica itself
I would name it ‘Merica.daisettalabs.net if the OS would accept it
It’s a real server. With a real Supermicro X10SAT server/workstation board. No more hacking Intel .inf files to get server-quality drivers
It has a real server SAS card, an LSI 9218i something or other with SAS-SATA breakout cables
It doesn’t make me choose between file or block storage, and is object-storage curious. It can even do NFS or SMB 3…at the same time.
It does ex post facto dedupe -the old model- rather than the new hot model of inline dedupe and/or compression, which makes me resent it, but only a little bit
It’s combining three storage chipsets -the LSI card, the Supermicro’s Intel C226, and ASMedia 1061- into one software-defined logical system. It’s abstracting all that hardware away using pools, similar to ZFS, but in a different, more sublime & elegant way.
It doesn’t have the ARC –ie RAM AS STORAGE– which makes me really resent it, but on the plus side, I’m only giving it 12GB of RAM and now have 16GB left for other uses.
It has 16 data disks: 12 rotational drives (6x1TB 5400 RPM & 6x2TB 7200RPM) and four SSDs (3x256GB Samsung 840 EVO & 1x128GB Samsung 830), plus one boot drive (1x32GB SanDisk ReadyCache drive re-purposed as a general SSD)
Total capacity RAW: nearly 19TB. Usable? I’ll let you know. Asking “Do I need that much?” is like asking “Does ‘Merica need to stretch from Sea to Shining Sea?” No I don’t, but yes ‘Merica does. But I had these drives in stock, as it were, so why not?
It uses so much energy & power that it has, in just a few days, erased any greenhouse gas savings I’ve made driving a hybrid for one year. Sorry Mother Earth, looks like I’m in your debt again
But seriously, under load, it’s hitting about 310 watts. At idle, 150w. Not bad all things considered. Haswell + full C states & PCIe power management work.
It’s built as a veritable wind-tunnel as it lives in the garage. In Southern California. And it’s summer. Under load, the CPU is hitting about 65C and the south-bridge flirts with 80C, but it’s stable.
It has six, yes, six, 1GbE Intel NICs. Two are on the motherboard, and I’m using a 4 port PCIe 2 card. And of course, I’ve enabled Jumbo Frames. I mean do you have to even ask at this point?
It uses virtual disks. Into which you can put other virtual disks. And even more virtual disks inside those virtual disks. It’s like Christopher Nolan designed this storage archetype while he wrote Inception…virtual disk within virtual disk within virtual disk. Sounds dangerous, but in the Daisetta Lab, Who Dares Wins!
So yeah. That’s what I’ve been up to. Geeking out a little bit like a gamer, but simultaneously taking the next step in my understanding, mastery & skilled manipulation of a critical next-gen storage technology I’ll be using at work soon.
Can you guess what that is?
Stay tuned. Full reveal & some benchmarks/thoughts tomorrow.
Greetings to you Labworks readers, consumers, and conversationalists. Welcome to the last verse of Labworks Chapter 1, which has been all about building a durable and performance-oriented ZFS storage array for Hyper-V and/or VMware.
Today we’re going to circle back to the very end of Labworks 1:1, where I assigned myself some homework: find out why my writes suck so bad. We’re going to talk about a man named ZIL and his sidekick the SLOG and then we’re going to check out some Excel charts and finish by considering ZFS’ sync models.
But first, some housekeeping: SAN2, the ZFS box, has undergone minor modification. You can find the current array setup below. Also, I have a new switch in the Daisetta Lab, and as switching is intimately tied to storage networking & performance, it’s important I detail a little bit about it.
Labworks 1:4 – Small Business SG300 vs Catalyst 2960S
Cisco’s SG-300 & SG-500 series switches are getting some pretty good reviews, especially in a home lab context. I’ve got an SG-300 and really like it as it offers a solid spectrum of switching options at Layer 2 as well as a nice Layer 3-lite mode, all for a tick under $200. It even has a real web-interface if you’re CLI-shy, which I’m not, but some folks are.
Sadly for me & the Daisetta Lab, I need more ports than my little SG-300 has to offer. So I’ve removed it from my rack and swapped it for a 2960S-48TS-L from the office, but not just any 2960S.
No, I have spiritual & emotional ties to this 2960s, this exact one. It’s the same 2960s I used in my January storage bakeoff of a Nimble array, the same 2960s on which I broke my Hyper-V & VMware cherry in those painful early days of virtualization, yes, this five year old switch is now in my lab:
Sure it’s not a storage switch, in fact it’s meant for IDFs and end-users and if the guys on that great storage networking podcast from a few weeks back knew I was using this as a storage switch, I’d be finished in this industry for good.
But I love this switch and I’m glad it’s at the top of my rack. I saved 1U, the energy costs of this switch vs two smaller ones are probably a wash, and though I lost Layer 3 Lite, I gained so much more: 48 x 1GbE ports and full LAN-licensed Cisco IOS v 15.2, which, agnostic computing goals aside for a moment, just feels so right and so good.
And with the increased amount of full-featured switch ports available to me, I’ve now got LACP teams of three on agnostic_node_1 & 2, jumbo frames from end to end, and the same VLAN layout.
Here’s the updated Labworks schematic and the disk layout for SAN2:
Disk Type, Quantity, Size, Format, Speed, Function
WD Red 2.5″ with NASWARE, 6, 1TB, 4KB AF, SATA 3 5400RPM, Zpool Members
Samsung 840 EVO SSD, 1, 128GB, 512byte, SATA 3, L2ARC Read Cache
Labworks 1:5 – A Man named ZIL and his sidekick, the SLOG
Labworks 1:1 was all about building durable & performance-oriented storage for Hyper-V & VMware. And one of the unresolved questions I aimed to solve out of that post was my poor write performance.
Review the hardware table and you’ll feel like I felt. I got me some SSD and some RAM, I provisioned a ZIL, so write-cache that inbound IO already, ZFS, amiright? Show me the IOPS Money, Jerry!
Well, about that. I mischaracterized the ZIL and I apologize to readers for the error. Let’s just get this out of the way: The ZFS Intent Log (ZIL) is not a write-cache device as I implied in Labworks 1:1.
The ZIL, whether spread out among your rotational disks by ZFS design, or applied to a Separate Log Device (a SLOG), is simply a synchronous writes mechanism, a log designed to ensure data integrity and report (IO ACK) back to the application layer that writes are safe somewhere on your rotational media. The ZIL & SLOG are also disaster recovery mechanisms/devices; in the event of power-loss, the ZIL, or the ZIL functioning on a SLOG device, will ensure that the writes it logged prior to the event are written to your spinners when your disks are back online.
Now there seem to be some differences in how the various implementations of ZFS look at the ZIL/SLOG mechanism.
Nexenta Community Edition, based off Illumos which is the open source descendant of Sun’s Solaris, says your SLOG should just be a write-optimized SSD, but even that’s more best practice than hard & fast requirement. Nexenta touts the ZIL/SLOG as a performance multiplier, and their excellent documentation has helpful charts and graphics reinforcing that.
In contrast, the most popular FreeBSD ZFS implementation’s documentation paints the ZIL as likely more trouble than it’s worth. FreeNAS actively discourages you from provisioning a SLOG unless it’s enterprise-grade, accurately pointing out that the ZIL & a SLOG device aren’t write-cache and probably won’t make your writes faster anyway, unless you’re NFS-focused (which I’m proudly, defiantly even, not) or operating a large database at scale.
What’s to account for the difference in documentation & best practice guides? I’m not sure; some of it’s probably related to *BSD vs Illumos implementations of ZFS, some of it’s probably related to different audiences & users of the free tier of these storage systems.
The question for us here is this: Will you benefit from provisioning a SLOG device if you build a ZFS box for Hyper-V and VMWare storage for iSCSI?
I hate sounding like a waffling storage VAR here, but I will: it depends. I’ve run both Nexenta and NAS4Free; when I ran Nexenta, I saw my SLOG being used during random & synchronous write operations. In NAS4Free, the SSD I had dedicated as a SLOG never showed any activity in zfs-stats, gstat or any other IO disk tool I could find.
One could spend weeks of valuable lab time verifying under which conditions a dedicated SLOG device adds performance to your storage array, but I decided to cut bait. Check out some of the links at the bottom for more color on this, but in the meantime, let me leave you with this advice: if you have $80 to spend on your FreeBSD-based ZFS storage, buy an extra 8GB of RAM rather than a tiny, used SLC or MLC device to function as your SLOG. You will almost certainly get more performance out of a larger ARC than by dedicating a disk as your SLOG.
Labworks 1:6 – Great…so, again, why do my writes suck?
Recall this SQLIO test from Labworks 1:1:
As you can see, read or write, I was hitting a wall at around 235-240 megabytes per second during much of “Short Test”, which is pretty close to the theoretical limit of an LACP team with two GigE NICs.
But as I said above, we don’t have that limit anymore. Whereas there were once 2x1GbE Teams, there are now 3x1GbE. Let’s see what the same test on the same 4KB block/4KB NTFS volume yields now.
SQLIO short test, take two, sort by Random vs Sequential writes & reads:
By jove, what’s going on here? This graph was built off the same SQLIO recipe, but looks completely different than Labworks 1. For one, the writes look much better, and reads look much worse. Yet step back and the patterns are largely the same.
It’s data like this that makes benchmarking, validating & ultimately purchasing storage so tricky. Some would argue with my reliance on SQLIO and those arguments have merit, but I feel SQLIO, which is easy to script/run and automate, can give you some valuable hints into the characteristics of an array you’re considering.
Let’s look at the writes question specifically.
Am I really writing 350MB/s to SAN2?
On the one hand, everything I’m looking at says YES: I am a Storage God and I have achieved #StorageGlory inside the humble Daisetta Lab HQ on consumer-level hardware:
SAN2 is showing about 115MB/s to each Broadcom interface during the 32KB & 64KB samples
Agnostic_Node_1 perfmon shows about the same amount of traffic egressing the three vEthernet adapters
The 2960S is reflecting all that traffic; I’m definitely pushing about 350 megabytes per second to SAN2; interface port channel 3 shows TX load at 219 out of 255 and maxing out my LACP team
On the other hand, I am just an IT Mortal and something bothers me:
CPU is very high on SAN2 during the 32KB & 64KB runs…so busy it seems like the little AMD CPU is responsible for some of the good performance marks
While I’m a fan of the itsy-bitsy 2.5″ Western Digital RED 1TB drives in SAN2, under no theoretical IOPS model is it likely that six of them, in RAIDZ-2 (RAID 6 equivalent), can achieve 5,000-10,000 IOPS under traditional storage principles. Each drive by itself is capable of only 75-90 IOPS
If something is too good to be true, it probably is
Sr. Storage Engineer Neo feels really frustrated at this point; he can’t figure out why his writes suck, or even if they suck, and so he wanders up to the Oracle to get her take on the situation and comes across this strange Buddha Storage kid.
Labworks 1:7 – The Essence of ZFS & New Storage model
In effect, what we see here is just a sample of the technology & techniques that have been disrupting the storage market for several years now: compression & caching multiply performance of storage systems beyond what they should be capable of, in certain scenarios.
As the chart above shows, the test2 volume is compressed by SAN2 using lzjb. On top of that, we’ve got the ZFS ARC, L2ARC, and the ZIL in the mix. And then, to make things even more complicated, we have some sync policies ZFS allows us to toggle. They look like this:
The sync toggle documentation is out there and you should understand it; it is crucial to understanding ZFS, but I want to demonstrate the choices as well.
I’ve got three choices + the compression options. Which one of these combinations is going to give me the best performance & durability for my Hyper-V VMs?
SQLIO Short Test Runs 3-6, all PivotTabled up for your enjoyment and ease of digestion:
As is usually the case in storage, IT, and hell, life in general, there are no free lunches here people. This graph tells you what you already know in your heart: the safest storage policy in ZFS-land (Always Sync, that is to say, commit writes to the rotationals post haste as if it was the last day on earth) is also the slowest. Nearly 20 seconds of latency as I force ZFS to commit everything I send it immediately (vs flush it later), which it struggles to do at a measly average speed of 4.4 megabytes/second.
Compression-wise, I thought I’d see a big difference between the various compression schemes, but I don’t. Lzjb, lz4, and the ultra-space-saving/high-cpu-cost gzip-9 all turn in about equal results from an IOPS & performance perspective. It’s almost a wash, really, and that’s likely because of the predictable nature of the IO SQLIO is generating.
Last point: ZFS, as Chris Wahl pointed out, is a sort of virtualization layer atop your storage. Now if you’re a virtualization guy like me or Wahl, that’s easy to grasp; Windows 2012 R2’s Storage Spaces concept is similar in function.
But sometimes in virtualization, it’s good to peel away the abstraction onion and watch what that looks like in practice. ZFS has a number of tools and monitors that look at your Zpool IO, but to really see how ZFS works, I advise you to run gstat. GStat shows what your disks are doing and if you’re carefully setting up your environment, you ought to be able to see the effects of your settings on each individual spindle.
Now look at this gstat sample. Under SQLIO-load, the zvol is showing 10,000 IOPS, 300+MB/s. But ada0-5, the physical drives, aren’t doing squat for several seconds at a time as SAN2 absorbs & processes all the IO coming at it.
That, friends, is the essence of ZFS.
Links/Knowledge/Required Reading Used in this Post:
Hello Labworks fans, detractors and partisans alike, hope you had a nice Easter / Resurrection / Agnostic Spring Celebration weekend.
Last time on Labworks 2:1-4, we looked at some of the awesome teaming options Microsoft gave us with Server 2012 via its multiplexor driver. We also made the required configuration adjustments on our switch for jumbo frames & VLAN trunking, then we built ourselves some port channel interfaces flavored with LACP.
I think the multiplexor driver/protocol is one of the great (unsung?) enhancements of Server 2012/R2 because it’s a sort of pre-virtualization abstraction layer (That is to say, your NICs are abstracted & standardized via this driver before we build our important virtual switches) and because it’s a value & performance multiplier you can use on just about any modern NIC, from the humble RealTek to the Mighty Intel Server 10GbE.
But I’m getting too excited here; let’s get back to the curriculum and get started shall we?
5. Understand what Microsoft’s multiplexor driver/LBFO has done to our NICs
6. Build our Virtual Machine Switch for maximum flexibility & performance
7. The vEthernets are Coming
8. Next Steps: Jumbo frames from End-to-end and performance tuning
2:5 Understand what Microsoft’s Multiplexor driver/LBFO has done to our NICs
So as I said above, the best way to think about the multiplexor driver & Microsoft’s Load Balancing/Failover tech is by viewing it as a pre-virtualization abstraction layer for your NICs. Let’s take a look.
Our Network Connections screen doesn’t look much different yet, save for one new decked-out icon labeled “Daisetta-Team:”
Meanwhile, this screen is still showing the four NICs we joined into a team in Labworks 2:3, so what gives?
A click on the properties of any of those NICs (save for the RealTek) reveals what’s happened:
The LBFO process unbinds many (though not all) settings, configurations, protocols and certain driver elements from your physical NICs, then binds the fabulous Multiplexor driver/protocol to the NIC as you see in the screenshot above.
In the dark days of 2008 R2 & Windows core, when we had to walk uphill to school both ways in the snow, I had to download and run a cmd tool called nvspbind to get this kind of information.
Fortunately for us in 2012 & R2, we have some simple cmdlets:
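Something like this gets you the same picture without the clicking (adapter & team names are from my lab):

```powershell
# what's still bound to a teamed NIC vs. what the multiplexor has stripped away
Get-NetAdapterBinding -Name "Ethernet 4" | Sort-Object Enabled -Descending

# the team itself and its members
Get-NetLbfoTeam
Get-NetLbfoTeamMember -Team "Daisetta-Team"
```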
So notice Microsoft has essentially stripped “Ethernet 4” of all that would have made it special & unique amongst my 4x1GbE NICs; where I might have thought to tag a VLAN onto that Intel GbE, the multiplexor has stripped that option out. If I had statically assigned an IP address to this interface, TCP/IP v4 & v6 are now no longer bound to the NIC itself and thus are incapable of having an IP address.
And the awesome thing is you can do this across NICs, even NICs made by separate vendors. I could, for example, mix the sacred NICs (Intel) with the profane NICs (RealTek)…it don’t matter, all NICs are invited to the LBFO party.
No extra licensing costs here either; if you own a Server 2012 or 2012 R2 license, you get this for free, which is all kinds of kick ass as this bit of tech has allowed me in many situations to delay hardware spend. Why go for 10GbE NICs & Switches when I can combine some old Broadcom NICs, leverage LACP on the switch, and build 6×1 or 8x1GbE Converged LACP teams?
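For reference, building the team back in Labworks 2:3 was a single cmdlet; a sketch, with member names being whatever Get-NetAdapter calls yours:

```powershell
# four 1GbE NICs, LACP on the switch side, Dynamic load balancing (2012 R2)
New-NetLbfoTeam -Name "Daisetta-Team" -TeamMembers "Ethernet 2", "Ethernet 3", "Ethernet 4", "Ethernet 5" `
    -TeamingMode Lacp -LoadBalancingAlgorithm Dynamic
```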
LBFO even adds up all the NICs you’ve given it and teases you with a calculated LinkSpeed figure, which we’re going to hold it to in the next step:
2:6 Build our Virtual Machine Switch for maximum flexibility & performance
If we just had the multiplexor protocol & LBFO available to us, it’d be great for physical server performance & durability. But if you’re deploying Hyper-V, you get to have your LBFO cake and eat it too, by putting a virtual switch atop the team.
This is all very easy to do in Hyper-V manager. Simply right click your server, select Virtual Switch Manager, make sure the Multiplexor driver is selected as the NIC, and press OK.
Bob’s your Uncle:
But let’s go a bit deeper and do this via powershell, where we get some extra options & control:
New-vmswitch : the cmdlet we’re invoking to build the switch. Run get-help new-vmswitch for a rundown of the cmdlet’s structure & options
-NetAdapterInterfaceDescription : here we’re telling Windows which NIC to build the VM Switch on top of. Get the precise name from Get-NetAdapter and enclose it in quotes
-AllowManagementOS 1 : Recall the diagram above. This boolean switch (1 yes, 0 no) tells Windows to create the VM Switch & plug the Host/Management Operating System into said Switch. You may or may not want this; in the lab I say yes; at work I’ve used No.
-MinimumBandwidthMode Weight : We lay out the rules for how the switch will apportion some of the 4Gb/s bandwidth available to it. By using “Weight,” we’re telling the switch we’ll assign some values later
-Name : Name your switch
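Assembled, the whole thing looks something like this (the switch name is mine; pull your exact adapter description from Get-NetAdapter):

```powershell
New-VMSwitch -Name "Daisetta-Converged" `
    -NetAdapterInterfaceDescription "Microsoft Network Adapter Multiplexor Driver" `
    -AllowManagementOS $true `
    -MinimumBandwidthMode Weight
```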
A few seconds later, and congrats Mr. Hyper-V admin, you have built a converged virtual switch!
2:7 The vEthernets are Coming
Now that we’ve built our converged virtual switch, we need to plug some things into it. And that starts on the physical host.
If you’re building a Hyper-V cluster or a stand-alone Hyper-V host with VMs on networked storage, you’ll approach vEthernet adapters differently than if you’re building Hyper-V for VMs on attached/internal storage or on SMB 3.0 share storage. In the former, you’re going to need storage vEthernet adapters; in the latter you won’t need as many vEthernets unless you’re going multi-channel SMB 3.0, which we’ll cover in another labworks session.
I’m going to show you the iSCSI + Failover Clustering model.
In traditional Microsoft Failover Clustering for Virtual Machines, we need a minimum of five discrete networks. Here’s how that shakes out in the Daisetta Lab:
Network Name, VLAN ID, Purpose, Notes
Management, 1, Host & VM management network, You can separate the two if you like
CSV, 14, Host cluster communication & coordination, Important for clustering Hyper-V hosts
LM, 15, Live Migration network, When you must send VMs from the broke host to the host with the most, LM is there for you
iSCSI 1-3, 11-13, Storage, Somewhat controversial but supported
Now you should be connecting the dots: remember in Labworks 2:1, we built a trunked port-channel on our Cisco 2960S for the sole purpose of these vEthernet adapters & our converged switch.
So, we’re going to attach tagged vEthernet adapters to our host via powershell. Pay attention here to the “-ManagementOS” switch; though our converged switch is for virtual machines, we’re using it for our physical host as well.
You can script this out of course (and VMM does that for you), but if you just want to copy paste, do it in this order:
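Here’s a sketch of that order, using the switch name from above and the VLAN IDs from our table; the IP addresses are placeholders, so substitute your own subnets:

# 1. Tag the management vEthernet Windows created when we built the switch (skip if VLAN 1 is your native/untagged VLAN)
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Daisetta-Converged" -Access -VlanId 1
# 2. Add & tag the CSV, Live Migration and iSCSI vEthernets
Add-VMNetworkAdapter -ManagementOS -SwitchName "Daisetta-Converged" -Name "CSV"
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "CSV" -Access -VlanId 14
Add-VMNetworkAdapter -ManagementOS -SwitchName "Daisetta-Converged" -Name "LM"
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "LM" -Access -VlanId 15
Add-VMNetworkAdapter -ManagementOS -SwitchName "Daisetta-Converged" -Name "iSCSI-1"
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "iSCSI-1" -Access -VlanId 11
# ...repeat for iSCSI-2 (VLAN 12) & iSCSI-3 (VLAN 13)
# 3. IP the new vEthernets - note there's no -DefaultGateway on these
New-NetIPAddress -InterfaceAlias "vEthernet (CSV)" -IPAddress 172.16.14.10 -PrefixLength 24
New-NetIPAddress -InterfaceAlias "vEthernet (LM)" -IPAddress 172.16.15.10 -PrefixLength 24
New-NetIPAddress -InterfaceAlias "vEthernet (iSCSI-1)" -IPAddress 172.16.11.10 -PrefixLength 24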
Notice we didn’t include a Gateway in the New-NetIPAddress cmdlet; that’s because when we built our Virtual Switch with the “-AllowManagementOS 1” switch attached, Windows automatically provisioned a vEthernet adapter for us, which either got an IP via DHCP or took an APIPA address.
So now we have our vEthernets and their appropriate VLAN tags:
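If you want the same sanity check in text rather than a screenshot:

# Confirm each host vEthernet got the VLAN tag we intended
Get-VMNetworkAdapterVlan -ManagementOS
# And eyeball the addressing
Get-NetIPAddress -AddressFamily IPv4 | Where-Object { $_.InterfaceAlias -like "vEthernet*" } | Format-Table InterfaceAlias, IPAddress, PrefixLength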
2:8 Next Steps: Jumbo Frames end-to-end & Performance Tuning
So if you’ve made it this far, congrats. If you do nothing else, you now have a converged Hyper-V virtual switch, tagged vEthernets on your host, and a virtualized infrastructure that’s ready for VMs.
But there’s more you can do; stay tuned for the next labworks post where we’ll get into jumbo frames & performance tuning this baby so she can run with all the bandwidth we’ve given her.
Aidan Finn, upstanding Irishman, apparent bear-cub puncher, hobbyist photog, MVP all-star and one of my favorite Hyper-V bloggers (seriously, he’s good, and along with DidierV & the Hyper-Dutchman has probably saved my vAss more times than I can vCount) appeared on one of my favorite podcasts last week, RunAs Radio with Canuck Richard Campbell.
Which is all sorts of awesome as these are a few of my favorite things piled on top of each other (Finn on RunAs).
The subject? Hyper-V, scale out file servers (SoFS) in 2012 R2, SMB 3.0 multichannel and Microsoft storage networking, which are just about my favoritest subjects in the whole wide world. I mean what are the odds that one of my favorite Hyper-V bloggers would appear on one of my favorite tech podcasts? Remote. And talk about storage networking tech, Redmond-style, during that podcast?
All that and an adorable Irish brogue?
This is instant nerdgasm territory here, people; if you’re into these black arts as I am, it’s a must-listen.
Anyway, Finn reminded me of his famous powershell demos in which he demonstrates all the options we Hyper-V admins have at our disposal now when it comes to Live Migrating VMs from host to host.
And believe me, we have so many now it’s almost embarrassing, especially if you cut your teeth on Hyper-V 2.0 in 2008 R2, where successfully Live Migrating VMs off a host (or draining one during production) involved a few right clicks, chicken sacrifice, Earth-Jupiter-Moon alignment, a reliable Geiger counter by your side and a tolerance for Pucker Factor Values greater than 10* **.
Nowadays, we can:
Live Migrate VMs between hosts in a cluster (.vhdx parked in a Cluster Shared Volume, VM config, RAM & CPU on a host….block storage, the Coke Classic option)
Live Migrate VMs parked on SMB 3.0 shares, just like you NFS jockeys do
Shared-nothing Live Migration, either storage + VM, just storage, or just VM!
A for instance: from my Dell Latitude i7 ultrabook with Windows 8.1 and client Hyper-V installed (natch), I can storage Live Migrate a .vhdx off my skinny-but-fast 256GB SSD to a spacious SMB share at work, then drop it back on my laptop at the end of the day, all via Scheduled Task or powershell with no downtime for the VM (a rough powershell sketch follows this list)
With Server 2012/2012 R2 you get all those options + SMB 3.0 multichannel
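That laptop-to-SMB-share shuffle boils down to a couple of cmdlets; the VM name & paths below are placeholders, not my actual lab names:

# Shift a running VM's storage to an SMB 3.0 share - no downtime for the guest
Move-VMStorage -VMName "TestVM" -DestinationStoragePath "\\fileserver\vmshare\TestVM"
# And bring it home to the local SSD at the end of the day
Move-VMStorage -VMName "TestVM" -DestinationStoragePath "C:\VMs\TestVM"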
Not only that, but we have some cool new toys with which to make the cost of Live Migrating a VM to the host with the most a little less painful (a quick sketch of how you pick between them follows this list):
Standard TCP/IP : I like this because I’m old school and anything that stresses the network and LACP is fun because it makes the network guy sweat
Compression: Borrow spare cycles from the host CPU, compress the VM’s RAM, and Live Migrate your way out of a tight spot
SMB via Remote Direct Memory Access : the holy of holies in Live Migration. As Finn points out, this bit of tech can scale beyond the bandwidth capabilities of the PCIe 3 bus. SMB 3.0 + RDMA makes you hate your Northbridge
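Picking between those three is a per-host setting in 2012 R2; a minimal sketch:

# Turn on Live Migration, then choose the performance option: TCPIP, Compression or SMB
Enable-VMMigration
Set-VMHost -VirtualMachineMigrationPerformanceOption Compression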
Finn*** of course provided some Live Migration start:finish times resulting from the various methods above, which I then, of course, interpreted as Finn daring me personally over the radio to try and beat those times in my humble Daisetta Lab.
Now this is just for fun people; not a Labworks-style list of repeatable results, so let’s not nerd-out on how my testing methodology isn’t sound & I’m a stupidhead, ok?
Anyway, Sysinternals has a nice little tool to redline the RAM in your Windows VM. I don’t know how Finn does it, but I don’t have workloads (yet!) in the Lab that would fill 4GB of RAM with non-random data on a VM, so off to the cmd we go:
You type this (haven’t played with all the switches yet) in this navy blue screen:
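For the curious, the Sysinternals tool in question is presumably Testlimit, and the invocation is something along these lines; I’m going from memory on the switches, so check testlimit64 /? before trusting me:

REM Leak & touch memory in 256MB chunks, 14 times over, to redline a 4GB guest (switch behavior assumed - verify with /?)
testlimit64.exe -d 256 -c 14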
And then this happens, and the somewhat-pink graph goes full pink:
Then we press this button to test Live Migration w/ compression, as the Daisetta Lab doesn’t have fancy RDMA NICs like certain well-connected Irish Hyper-V bloggers:
Which makes this blue/celeste/denim/Azure-colored line get all spiky:
All of which results in wicked-fast Live Migrations & really cool orange-colored charts in my totally non-random, non-scientific, but highly enjoyable laboratory experiment
Still, in the end, I like my TCP/IP uncompressed Live Migrations because 1) sackcloth & ashes, and 2) I didn’t go to the trouble of building a multiplexed LACP team -with a virtual switch on top!- just to let the Cat5es in my attic have an easy day at the office:
But at work: yes. I love this compression stuff and echo Finn’s observations on how Hyper-V doesn’t slam your host CPUs beyond what the host & its VM fleet could bear.
Anyway, did I beat Finn’s Live Migration times in this fun little test? Will the Irish MVP have to admit he’s not so esteemed after all and surrender his Hyper_V_MVP_badge.gif to me?
Of course I did and yes he will.
But not really.
[table caption=”Daisetta Lab LM vs Finn’s Powershell LM Scripts – 4GB VM” width=”500″ colwidth=”20|100|50″ colalign=”left|left|left|left|left”]
Who,TCP/IP LM,Compressed LM,RDMA & SMB 3 LM,Notes
Finn,78 seconds,15 seconds,6.8 seconds,“Mr. I once moved a VM with 56GB of RAM in 35 seconds probably has a few Xeons”
D-Lab,38 seconds,Like 12 or something,Whose ass do I need to kiss to get RDMA/iWARP?,But seriously my VM RAM was probably not random
Finn notes in his posts that he’s dedicating an entire 1GbE NIC for his Live Migration demos, whereas I’m embracing the converged switch model and haven’t even played with bandwidth or QoS settings on my Hyper-V switch.
How do my VMware colleagues & friends measure this stuff & think about vMotion performance & reliability? I know NFS can scale & perform, but I’m ignorant of the nuances of v3 vs v4, how it works on the host and the Distributed vSwitch, and your “shared nothing” storage vMotion. And what’s this I hear that vSphere won’t begin a vMotion unless it knows it will complete? How’s that determined?
I mean I could spend an hour or two googling it, or you could, I don’t know, post a comment and save me the time and spread some of your knowledge 😀
I’m jazzed about SMB 3.0, but there are only a handful of storage vendors who have support for the new stack, and among them, as Finn points out, Microsoft is the #1 storage vendor for SMB 3 fans, with NetApp probably in 2nd place.
* Just kidding, it wasn’t that bad. Most days.
** Pucker Factor Value can be measured by querying obscure wmi class win32_pfv
*** Finn is a consultant. So you can hire him. I have no relationship with him other than admiration for his scripting skillz
I’ve been going on, insufferably at times, about my new Nimble storage array at work. Back in January, it passed my home-grown bakeoff with flying colors, in February I wrote about how it was inbound to my datacenter, in March I fretted over iSCSI traffic, .vhdx parades, and my 6509-E.
Well it’s been just about a month since it was racked up and jacked into my Hyper-V fabric and I thought maybe the storage nerds among my readers would like an update on how it’s performing.
Fast: It’s been strange getting compliments, kudos and thank yous rather than complaints and ALL CAPS emails punctuated by Exclamation Marks. I have a couple of very critical SQL databases, the performance of which can make or break my job, and after some deliberation, we took the risk and moved the biggest of them to the Nimble about three weeks ago.
Here’s a slightly edited email from one power user 72 hours later:
Did I say THANK YOU for the extra zip yet?
STILL LOVING IT!!
I’m taken aback by all the affection coming my way…no longer under user-siege, I feel like maybe I should dress better at work, shave every day, turn on some lights in the office perhaps. Even the dev team was shocked, with one of them invoking Spaceballs and saying his storage-dependent process was moving at “Ludicrous speed.”
It’s Easy: I can’t underscore this enough. If you’re a mid-sized enterprise with vanilla/commodity workloads and you can tolerate an array that’s just iSCSI (you can still use NFS or SMB 3, just from inside clustered VMs!), Nimble’s a good fit, especially if your staff is more generalist in nature, or you don’t have time to engineer a new SAN from scratch.
This was a Do It Yourself storage project for me; I didn’t have the luxury or time to hire storage engineers or VARs to come in and engineer it for me. Nimble will try to sell you on professional services, but you can decline and hook it up yourself, as I did. There are best practice guides a-plenty, and if you understand your stack & workload, your switching & compute, you’ll do fine.
Buying it was easy: Nimble’s lineup is simple and from a customer standpoint, it was a radically different experience to buy a Nimble than a traditional SAN.
Purchasing a big SAN is like trying to decide what to eat at a French restaurant in Chinatown…you recognize the letters & the pictures on the menu look familiar, but you don’t know what that SKU is exactly or how you’ll feel in the morning after buying & eating it. And while the restaurant has provided a helpful & knowledgeable garçon to explain French cuisine & etiquette to you, you know the garçon & his assistants moonlight at the Italian, German and Sushi places down the road, where they are equally knowledgeable & enthusiastic about those cuisines. But they can’t talk about the Italian place because they have something called agency with the French restaurant; so with you, they are only French cuisine experts, and their professional opinion is that Italian, German and Sushi are horrible food choices. Also, your spend with the restaurant is too small to get the chef’s attention…you have to go through this obnoxious garçon system.
Buying from Nimble, meanwhile, is like picking a burger at In ‘n Out. You have three options, all of them containing meat, and from left to right, the choices are simply Good, Better, Best. You can stack shelves onto controller-shelves, just like a Double-Double, and you know what you’ll get in the end. Oh sure, there’s probably an Animal Style option somewhere, but you don’t need Animal Style to enjoy In ‘n Out, do you?
Lesson is this: Maybe your organization needs a real full-featured SAN & VAR-expertise. But maybe you just need fast, reliable iSCSI that you can hook up yourself.
It’s nice that we in customer-land have that option now.
ASUP & Community: The Autosupport from Nimble has left nothing to be desired; in fact, I think they nag too much. But I’ll take that over a downed array.
I’ve grown to enjoy Connect.Nimble.com, the company’s forum where guys like me can compare notes. Shout out to one awesome Nimble SE named Adam Herbert who built a perfect signed MPIO Powershell script that maps your initiators to your targets in no time at all.
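I won’t reproduce Adam’s script here, but the bones of wiring a Windows initiator up to the array with MPIO look roughly like this; the portal address is a placeholder, and the feature install may want a reboot:

# One-time per host: install MPIO & let the Microsoft DSM claim iSCSI devices
Install-WindowsFeature Multipath-IO
Enable-MSDSMAutomaticClaim -BusType iSCSI
# Point the initiator at the array's discovery portal, then log in persistently with multipath enabled
New-IscsiTargetPortal -TargetPortalAddress 172.16.11.50
Get-IscsiTarget | ForEach-Object { Connect-IscsiTarget -NodeAddress $_.NodeAddress -IsPersistent $true -IsMultipathEnabled $true }
# Add -InitiatorPortalAddress / -TargetPortalAddress pairs if you want to pin sessions to specific iSCSI vEthernets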
And then you get to sit back and watch as MPIO does its thang across all your iSCSI HBAs, producing symmetrical & balanced utilization charts which, in turn, release pleasing little bursts of storage-dopamine in your brain.
It works fine with Hyper-V, CSVs, and Converged Fabric vEthernets: What a mouthful, but it’s true. Zero issues fitting this array into System Center Virtual Machine Manager storage (though it doesn’t have SMI-S support, a “standard” which few seem to have adopted), failing CSVs from one Hyper-V node to another, and resizing CSVs or RDMs live.
And for the convergence fans: I pretty much lost my fear of using vEthernet adapters for iSCSI traffic during the bakeoff and in the Daisetta Lab at home, but in case you needed further convincing that Hyper-V’s converged fabric architecture kicks ass, here it is: each Hyper-V node in my datacenter has 12 gigabit NICs. Eight of them per host are teamed (that is to say they get the Microsoft Multiplexor driver treatment, LACP-flavor) and then a converged virtual switch is built atop the multiplexor driver. From that converged v-switch, I’m dangling six virtual Ethernet adapters per host, two of which are tagged for the Nimble VLAN I built in the 6509.
That’s a really long and complicated way of saying that in a modest-sized production environment, I’m using LACP teaming on the hosts, up to 4x1GbE vNics on the VIP guests, and MPIO to the storage, which conventional storage networking wisdom says is a bit like kissing your sister and bragging about it. Maybe it’s harmless (even enjoyable?) once or twice, but sooner or later, you’ll live to regret it. And hey the Department of Redundancy Department called, they want one of their protocols back.
I’ve read a lot of thoughtful pieces from VMware engineers & colleagues about doing this, but from a Hyper-V perspective, this is supported, and from a Nimble array perspective, I’m sure they’d point the finger at this if something went wrong, but it hasn’t. And from my perspective: one converged virtual switch = easy to deploy/templatize, easy to manage & monitor. Case closed.
LACP + MPIO in Hyper-V works so well that in three weeks of recording iSCSI stats, I’ve yet to record a single TCP error/re-transmit or anything that would make me think the old model was better. And I haven’t even applied bandwidth policies on the converged switches yet; that tool is still in my box and right now iSCSI is getting the Hyper-V equivalent of best effort.
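When I do pull that tool out of the box, the policy is one cmdlet per vEthernet; a sketch, with weights & adapter names that are mine to make up, not gospel:

# Guarantee the storage & Live Migration vEthernets a minimum share of the converged switch under contention
Set-VMNetworkAdapter -ManagementOS -Name "iSCSI-1" -MinimumBandwidthWeight 30
Set-VMNetworkAdapter -ManagementOS -Name "LM" -MinimumBandwidthWeight 20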
It’s getting faster: Caching is legit. All my monitors and measurements prove it out. Implement your Nimble correctly, and you may see it get faster as time goes on.
And by that I mean don’t tick the “caching” box for every volume. Conserve your resources, develop a strategy and watch it bloom and grow as your iSCSI packets find their way home faster and faster.
The DBA is noticing it too in his latency timers & long running query measurements, but this graph suffices to show caching in action over three weeks in a more exciting way than a select * from slow-ass-tables query:
Least Frequently Used, Most Recently Used, Most Frequently Used….who frequently/recently cares what caching algorithm the CASL architecture is using? A thousand whiteboard sessions conducted by the world’s greatest SE with the world’s greatest schwag gifts couldn’t sell me on this the way my own charts and my precious perfmons do.
My cached Nimble volumes are getting faster baby.
Compression wise, I’m seeing some things I didn’t expect. Some volumes are compressing up to 40x. Others are barely hitting 1.2x. The performance impact of this is hard to quantify, but from a conservation standpoint, I’m not having to grow volumes very often. It’s a wash with the old dedupe model, save for one thing: I don’t have to schedule compression. That’s the CPU’s job, and for all I know, the Nehalems inside my CS260 are, or should be, redlining as they lz4 my furious iSCSI traffic.
BusyBox & CLI: The Nimble command line in version 1.4x felt familiar to me the first time I used it. I recognized the command structure, the help files and more, and thought it looked like BusyBox.
What’s BusyBox? How to put this without making enemies of The Guys Who Say Vi…BusyBox is a compact bundle of stripped-down Unix tools & utilities rolled into a single binary, developed nearly 20 years ago by an amazing Unix engineer. It’s very popular, it’s everywhere, and it’s reliable, and I have no complaints about it other than the fact that it’s disconcerting that my Nimble has the same package of tools I once installed on my Android handset.
But that’s just the Windows guy talking, a Windows guy who was really fond of his WAFL and misses it but will adapt and holds out hope that OneGet & PowerShell, one day, will emerge victorious over all.
The SSL cert situation is embarrassing and I’m glad my former boss hasn’t seen it. Namely, the situation is this: you can’t replace the stock SSL cert, which, frankly, looks like something I would do while tooling around with OpenSSL in the lab.
I understand this is fixed in the new 2.x OS version but holy shit what a fail.
Other than that, I’m very pleased -and the organization is very pleased***- with our Nimble array.
It feels like, at last, I’m enjoying the fruits of my labor: I’m riding a high-performance storage array that was cost-effective, easy to install, and is performing at/above expectations. I’m like Major Kong; my array is literally the bomb, man and his machine are in harmony, and there’s some joy & euphoria up in the datacenter as my task is complete.
*Remember this lesson, #StorageGlory seekers: no one knows your workload like you. The above screenshot of cache hits is of a 400GB SQL transaction log volume of a larger SQL DB that’s in use 24/6. Your mileage may vary.
*** I do not speak for the organization even though I just did.