It’s been awhile since I posted about my home lab, Daisettalabs.net, but rest assured, though I’ve been largely radio silent on it, I’ve been busy.

If 2013 saw the birth of Daisetta Labs.net, 2014 was akin to the terrible twos, with some joy & victories mixed together with teething pains and bruising.

So what’s 2015 shaping up to be?

Well, if I had to characterize it, I’d say it’s #LabGlory, through and through. Honestly. Why?

I’ve assembled a home lab that’s capable of simulating just about anything I run into in the ‘wild’ as a professional. And that’s always been the goal with my lab: practicing technology at home so that I can excel at work.

Let’s have a look at the state of the lab, shall we?

Hardware & Software

Daisetta Labs.net 2015 is comprised of the following:

  • Five (5) physical servers
  • 136 GB RAM
  • Sixteen (16) non-HT Cores
  • One (1) wireless access point
  • One (1) zone-based Firewall
  • Two (2) multilayer gigabit switches
  • One (1) Cable modem in bridge mode
  • Two (2) Public IPs (DHCP)
  • One (1) Silicon Dust HD
  • Ten (10) VLANs
  • Thirteen (13) VMs
  • Five (5) Port-Channels
  • One (1) Windows Media Center PC

That’s quite a bit of kit, as a former British colleague used to say. What’s it all do? Let’s dive in:

Physical Layout

The bulk of my lab gear is in my garage on a wooden workbench.

Nodes 2-4, the core switch, my Zywall edge device, modem, TV tuner, Silicon Dust device and Ooma phone all reside in a secured 12U, two post rack I picked up on ebay about two years ago for $40. One other server, core.daisettalabs.net, sits inside a mid-tower case stuffed with nine 2TB Hitachi HDDs and five 256GB SSDs below the rack.

Placing my lab in the garage has a few benefits, chief among them: I don’t hear (as many) complaints from the family cluster about noise. Also, because it’s largely in the garage, it’s isolated & out of reach of the Child Partition’s curious fingers, which, as every parent knows, are attracted to buttons of all types.

Power & Thermal

Of course you can’t build a lab at home without reliable power, so I’ve got one rack-mounted APC UPS, and one consumer-grade Cyberpower UPS for core.daisettalabs.net and all the internet gear.

On average, the lab gear in the garage consumes about 346 watts, or about 3 amps. That’s significant, no doubt, costing me about $38/month to power, or about 2/3rds the cost of a subscription to IT Pro TV or Pluralsight. 🙂

Thermals are a big challenge. My house was built in 1967, has decent insulation and holds temperature fairly well in the habitable parts of the space. But none of that is true about the garage, where my USB lab thermometer has recorded temps as low as 3C last winter and as high as 39c in Summer 2014. That’s air-temperature at the top of the rack, mind you, not at the CPU.

One of my goals for this year is to automate the shutdown/powerup of all node servers in the Garage based on the temperature reading of the USB thermometer. The $25 thermometer is something I picked up on Amazon awhile ago; it outputs to .csv but I haven’t figured out how to automate its software interface with powershell….yet.

Anyway, here’s my stack, all stickered up and ready for review:

IMG_20150329_214535914

Beyond the garage, the Daisetta Lab extends to my home’s main hallway, the living room, and of course, my home office.

Here’s the layout:

homelab2015

Compute

On the compute side of things, it’s almost all Haswell with the exception of core and node3:

[table]

Server, Architecture, CPU, Cores, RAM, Function, OS, Motherboard

Core, AMD A-series, A8-5500, 2, 8GB, Tiered Storage Spaces & DC/DHCP/DNS, Server 2012 R2, Gigabyte D4

Node1, Haswell, i7-4770k, 4, 32GB, Main PC/Office/VM host/storage, 2012R2, Supermicro X10SAT

Node2, Haswell, Xeon E3-1241, 4, 32GB, Cluster node, 2012r2 core, Supermicro X10SAF

Node3, Ivy Bridge, i7-2600, 4, 32GB, Cluster node, 2012r2 core, Biostar

Node4, Haswell, i5-4670, 4, 32GB, Cluster node/storage, 2012r2 core, Asus

[/table]

I love Haswell for its speed, thermal properties and affordability, but damn! That’s a lot of boxes, isn’t it? Unfortunately, you just can’t get very VM dense when 32GB is the max amount of RAM Haswell E3/i7 chipsets support. I love dynamic RAM on a VM as much as the next guy, but even with Windows core, it’s been hard to squeeze more than 8-10 VMs on a single host. With Hyper-V Containers coming, who knows, maybe that will change?

Node1, the pride of the fleet and my main productivity machine, boasting 2x850 Pro SSDs in RAID 0, an AMD FirePro, and Tiered Storage Spaces

Node1, the pride of the fleet and my main productivity machine, boasting 2×850 Pro SSDs in RAID 0, an AMD FirePro, and Tiered Storage Spaces

While I included it in the diagram, TVPC3 is not really a lab machine. It’s a cheap Ivy Bridge Pentium with 8GB of RAM and 3TB of local storage. It’s sole function in life is to decrypt the HD stream it receives from the Silicon Dust tuner and display HGTV for my mother-in-law with as little friction as possible. Running Windows 8.1 with Media Center, it’s the only PC in the house without battery backup.

Physical Network
About 18 months ago, I poured gallons of sweat equity into cabling my house. I ran at least a dozen CAT-5e cables from the garage to my home office, bedrooms, living room and to some external parts of the house for video surveillance.
I don’t regret it in the least; nothing like having a reliable, physical backbone to connect up your home network/lab environment!

Meet my underlay

Meet my underlay

At the core of the physical network lies my venerable Cisco 2960S-48TS-L switch. Switch1 may be a humble access-layer switch, but in my lab, the 2960S bundles 17 ports into five port channels, serves as my DG, routes with some rudimentary Layer 3 functions ((Up to 16 static routes, no dynamic route features are available)) and segments 9 VLANs and one port-security VLAN, a feature that’s akin to PVLAN.

Switch2 is a 10 port Cisco Small Business SG-300 running at Layer 3 and connected to Switch1 via a 2-port port-channel. I use a few ports on switch2 for the TV and an IP cam.

On the edge is redzed.daisettalabs.net, the Zyxel USG-50, which I wrote about last month.

Connecting this kit up to the internet is my Motorola Surfboard router/modem/switch/AP, which I run in bridge mode. The great thing about this device and my cable service is that for some reason, up to two LAN ports can be active at any given time. This means that CableCo gives me two public, DHCP addresses, simultaneously. One of these goes into a WAN port on the Zyxel, and the other goes into a downed switchport

Love Meraki's RF Spectrum chart!

Love Meraki’s RF Spectrum chart!

Lastly, there’s my Meraki MR-16, an access point a friend and Ubiquity networks fan gave me. Though it’s a bit underpowered for my tastes, I love this device. The MR-16 is trunked to switch1 and connects via an 802.3af power injector. I announce two SSIDs off the Meraki, both secured with WPA2 Personal ((WPA2 Enterprise is on the agenda this year)). Depending on which SSID you connect to, you’ll end up on the Device or VM VLANs.

Virtual Network

The virtual network was built entirely in System Center VMM 2012 R2. Nothing too fancy here, with multiple Gigabit adapters per physical host, one converged logical vSwitch and a separate NIC on each host fronting for the DMZ network:

Nodes 1, 2 & 4 are all Haswell, and are clustered. Node3 is standalone.

Thanks to VMM, building this out is largely a breeze, once you’ve settled on an architecture. I like to run the cmdlets to build the virtual & logical networks myself, but there’s also a great script available that will build a converged network for you.

A physical host typically looks like this (I say typically because I don’t have an equal number of adapters in all hosts):

I trust VLANs and VMM's segmentation abilities, but chose to build what is in effect air-gapped vSwitch for the DMZ/DIA networks

I trust VLANs and VMM’s segmentation abilities, but chose to build what is in effect air-gapped vSwitch for the DMZ/DIA networks

We’re already several levels deep in my personal abstraction cave, why stop here? Here’s the layout of VM Networks, which are distinguished from but related to logical networks in VMM:

labnet13

I get a lot of questions on this blog about jumbo frames and Hyper-V switching, and I just want to reiterate that it’s not that hard to do, and look, here’s proof:

jumbopacket

Good stuff!

Storage

And last, and certainly most-interestingly, we arrive at Daisetta Lab’s storage resources.

My lab journey began with storage testing, in particular ZFS via NexentaCore (Illumos), NAS4Free and Solaris 11. But that’s ancient history; since last summer, I’ve been all Windows, all the time in my lab, starting with SAN.Daisettalabs.net ((cf #StorageGlory : 30 Days on a Windows SAN)).

Now?

Well, I had so much fun -and importantly so few failures/pains- with Microsoft’s Tiered Storage Spaces that I’ve decided to deploy not one, or even two, but three Tiered Storage Spaces. Here’s the layout:

[table]Server, #HDD, #SSD, StoragePool Capacity, StoragePool Free, #vDisks, Function

Core, 9, 6, 16.7TB, 12.7TB, 6 So far, SMB3/iSCSI target for entire lab

Node1,2, 2, 2.05TB, 1.15TB,2, SMB3 target for Hyper-V replication

Node4,3,1, 2.86TB, 1.97TB,2, SMB3 target for Hyper-V replication

[/table]

I have to say, I continue to be very impressed with Tiered Storage Spaces. It’s super-flexible, the cmdlets are well-documented, and Microsoft is iterating on it rapidly. More on the performance of Tiered Storage Spaces in a subsequent post.

Thanks for reading!

Whitebox lab server

Node1.daisettalabs.net, my primary PC and the best-equipped server in the homelab, has received an upgrade.

A whitebox upgrade. Literally:

IMG_20150303_052455318

 

I’m a fan of metaphors and whitebox everything is a powerful one in our line of work, so I figured why not roll my own whitebox server in the lab?

Node1 vitals:

  • Motherboard: Supermicro X10SAT with all the PCIe 3.0 slots you’d need, Thunderbolt port, and integrated Haswell graphics plus a pair of Intel NICs
  • CPU: Intel Core i7-4770K (Haswell), quad core with hyperthreading
  • RAM: 4x8GB Kingston Hyper-X non-ECC
  • Storage (Boot): 2xSamsung 850 SSD (240GB) in RAID 0 because I like to live dangerously  I’ve just about automated the buildout of this server and most of my data is in One Drive for Business
  • Storage (Tiered Storage Spaces): 2x 128GB SanDisk Extreme + 2x1TB WD Red 2.5″
  • Graphics: AMD FirePro W4100 w/ 2GB RAM makes my Visio buttery smooth.
  • Networking:  The Supermicro has a pair of Intels, an I-210 and a 217V, both of which connect up to my Cisco 2960S in the garage. To that I’ve also added a Pro1000 PCIe 2.0 card with dual ports, one of which also connects to the 2960S (I only ran 3 cables from the garage to my home office)
  • OS: Server 2012 R2 Standard, naturally, with full Desktop GUI and Windows Management Framework February 2015 preview so that I can tinker with DSC
  • Case: NZXT 340 something or other. Very nice case for $70. I’ve never wanted to exhibit the inside of a PC I’ve built, but this case makes it so simple to hide the nasty PC underlay (power, SATA etc)

#WhiteboxGlory shot of the innards that make the child partition go “wooooow!!”

IMG_20150303_052331601

 

 

Tales from the Hot Lane

A few brief updates & random thoughts from the last few days on all the stuff I’ve been working on.

Refreshing the Core at work: Summer’s ending, but at work, a new season is advancing, one rack unit at a time. I am gradually racking up & configuring new compute, storage, and network as it arrives; It Is Not About the Hardware™, but since you were wondering: 64 Ivy Bridge cores and about 512GB RAM, 30TB of storage, and Nexus 3k switching.

Cisco_logoAhh, the Nexus line. Never had the privilege to work on such fine switching infrastructure. Long time admirer, first-time NX-OS user. I have a pair of them plus a Layer 3 license so the long-term thinking involves not just connecting my compute to my storage, but connecting this dense stack northbound & out via OSPF or static routes over a fault-tolerant HSRP or VRRP config.

To do that, I need to get familiar with some Nexus-flavored acronyms that aren’t familiar to me: virtual port channels (VPC), Control Plane policy (COPP), VRF, and oh-so-many-more. I’ll also be attempting to answer the question once and for all: what spanning tree mode does one use to connect a Nexus switch to a virtualization host running Hyper-V’s converged switching architecture? I’ve used portfast in the lab on my Catalyst, but the lab switch is five years old, whereas this Nexus is brand new. And portfast never struck me as the right answer, just the easy one.

To answer those questions and more, I have TAC and this excellent tome provided gratis by the awesome VAR who sold us much of the equipment.

Into the vCPU Blender goes Lync: Last Friday, I got a call from my former boss & friend who now heads up a fast-growing IT department on the coast. He’s been busy refreshing & rationalizing much of his infrastructure as well, but as is typical for him, he wants more. He wants total IT transformation, so as he’s built out his infrastructure, he laid the groundwork to go 100% Microsoft Lync 2013 for voice.

Yeah baby. Lync 2013 as your PBX, delivering dial tone to your endpoints, whether they are Bluetooth-connected PC headsets, desk phones, or apps on a mobile.

Forget software-defined networking. This is software-defined voice & video, with no special server hardware, cloud services, or any other the other typical expensive nonsense you’d see in a VoIP implementation.

If Lync 2013 as PBX is not on your IT Bucket List, it should be. It was something my former boss & I never managed to accomplish at our previous employer on Hyper-V.

Now he was doing it alone. On a fast VMware/Nexus/NetApp stack with distributed vSwitches. And he wanted to run something by me.

So you can imagine how pleased I was to have a chat with him about it.

He was facing one problem which threatened his Go Live date: Mean Opinion Score, or MOS, a simple 0-5 score Lync provides to its administrators that summarizes call quality. MOS is a subset of a hugely detailed Media Quality Summary Report, detailed here at TechNet.

thMy friend was scoring a .6 on his MOS. He wanted it to be at 4 or above prior to go-live.

So at first we suspected QoS tags were being stripped somewhere between his endpoint device and the Lync Mediation VM. Sure enough, Wireshark proved that out; a Distributed vSwitch (or was it a Nexus?) wasn’t respecting the tag, resulting in a sort of half-duplex QoS if you will.

He fixed that, ran the test again, and still: .6. Yikes! Two days to go live. He called again.

That’s when I remembered the last time we tried to tackle this together. You see, the Lync Mediation Server is sort of the real PBX component in Lync Enterprise Voice architecture. It handles signalling to your endpoints, interfaces with the PSTN or a SIP trunk, and is the one server workload that, even in 2014, I’d hesitate making virtual.

My boss had three of them. All VMs on three different VMware hosts across two sites.

I dug up a Microsoft whitepaper on virtualizing Lync, something we didn’t have the last time we tried this. While Redmond says Lync Enterprise Voice on top of VMs can work, it’s damned expensive from a virtualization host perspective. MS advises:

  • You should disable hyperthreading on all hosts.
  • Do not use processor oversubscription; maintain a 1:1 ratio of virtual CPU to physical CPU.
  • Make sure your host servers support nested page tables (NPT) and extended page tables (EPT).
  • Disable non-uniform memory access (NUMA) spanning on the hypervisor, as this can reduce guest performance.

Talk about Harshing your vBuzz. Essentially, building Lync out virtually with Enterprise Voice forces you to go sparse on your hosts, which is akin to buying physical servers for Lync. If you don’t, into the vCPU blender goes Lync, and out comes poor voice quality, angry users, bitterness, regret and self-punishment.

Anyway, he did as advised, put some additional vCPU & memory reservations in place on his hosts, and yesterday, whilst I was toiling in the Hot Lane, he called me from Lync via his mobile.

He’s a married man just like me, but I must say his voice sounded damn sexy as it was sliced up into packets, sent over the wire, and converted back to analog on my mobile’s speaker. A virtual chest bump over the phone was next, then we said goodbye.

Another Go Live Victory (by proxy). Sweet.

Azure Outage: Yesterday’s bruising hours-long global Azure outage affected Virtual Machines, storage blobs, web services, database services and HD Insight, Microsoft’s service for big data crunching. As it unfolded, I navel-gazed, when I felt like helping. There was literally nothing I could do. Had I some crucial IaaS or PaaS in the Azure stack, I’d be shit out of luck, just like the rest. I felt quite helpless; refreshing Mary Jo’s pageyellow-exclamation-mark-in-triangle-md and the Azure dashboard didn’t help. I wondered what the problem was; it’s been a difficult week for Microsofties whether on-prem or in Azure. Had to be related to the update cycle, I thought.

On the plus side, Azure Active Directory services never went down, nor did several other services. Office 365 stayed up as well, though it is built atop separate-but-related infrastructure in my understanding.

Lastly, I pondered two thoughts: if you’re thinking of reducing your OpEx by replacing your DR strategy with an Azure Site Recovery strategy, does this change your mind? And if you’re building out Azure as your primary IaaS or PaaS, do you just accept such outages or do you plan a failback strategy?

Labworks : Towards a 100% Windows-defined Daisetta Lab: What’s next for the Daisetta Lab? Well, I have me an AMD Duron CPU, a suitable motherboard, a 1U enclosure with PSU, and three Keepin’ it RealTek NICs. Oh, I also have a case of the envies, envies for the VMware crowd and their VXLAN and NSX and of course VMworld next week. So I’m thinking of building a Network Virtualization Gateway appliance. For those keeping score at home, that would mean from Storage to Compute to Network Edge, I’d have a 100% Windows lab environment, infused with NVGRE which has more use cases than just multi-tenancy as I had thought.

Labworks 1:4-7 – The Last Word in ZFS Labworks

Greetings to you Labworks readers, consumers, and conversationalists. Welcome to the last  verse of Labworks Chapter 1, which has been all about building a durable and performance-oriented ZFS storage array for Hyper-V and/or VMware.

Let’s review where we’ve been:

[table]

Labworks Chapter, Verse, Subject, Title & URL

Labworks 1:, 1, Storage, Building a Durable and Performance-Oriented ZFS Box for Hyper-V & VMware

,2-3, StorageI Heart the ARC & Let’s Pull Some Drives!

[/table]

Today we’re going to circle back to the very end of Labworks 1:1, where I assigned myself some homework: find out why my writes suck so bad. We’re going to talk about a man named ZIL and his sidekick the SLOG and then we’re going to check out some Excel charts and finish by considering ZFS’ sync models.

But first, some housekeeping: SAN2, the ZFS box, has undergone minor modification. You can find the current array setup below. Also, I have a new switch in the Daisetta Lab, and as switching is intimately tied to storage networking & performance, it’s important I detail a little bit about it.

Labworks 1:4 – Small Business SG300 vs Catalyst 2960S

Cisco’s SG-300 & SG-500 series switches are getting some pretty good reviews, especially in a home lab context. I’ve got an SG-300 and really like it as it offers a solid spectrum of switching options at Layer 2 as well as a nice Layer 3-lite mode all for a tick under $200. It even has a real web-interface if your CLI-shy, which

Small Business Cisco != Linksys

Small Business Cisco != Linksys

I’m not but some folks are.

Sadly for me & the Daisetta Lab, I need more ports than my little SG-300 has to offer. So I’ve removed it from my rack and swapped it for a 2960S-48TS-L from the office, but not just any 2960S.

No, I have spiritual & emotional ties to this 2960s, this exact one. It’s the same 2960s I used in my January storage bakeoff of a Nimble array, the same 2960s on which I broke my Hyper-V & VMware cherry in those painful early days of virtualization, yes, this five year old switch is now in my lab:

The pride of Cisco's 2009 Desktop Switching series, the 2960s

The pride of Cisco’s 2009 Desktop Switching series, the 2960s

Sure it’s not a storage switch, in fact it’s meant for IDFs and end-users and if the guys on that great storage networking podcast from a few weeks back knew I was using this as a storage switch, I’d be finished in this industry for good.

But I love this switch and I’m glad its at the top of my rack. I saved 1U, the energy costs of this switch vs two smaller ones are probably a wash, and though I lost Layer 3 Lite, I gained so much more: 48 x 1GbE ports and full LAN-licensed Cisco IOS v 15.2, which, agnostic computing goals aside for a moment, just feels so right and so good.

And with the increased amount of full-featured switch ports available to me, I’ve now got LACP teams of three on agnostic_node_1 & 2, jumbo frames from end to end, and the same VLAN layout.

Here’s the updated Labworks schematic and the disk layout for SAN2:

Lab 1-4-5 - Daisetta Labs

[table]

Disk Type, Quantity, Size, Format, Speed, Function

WD Red 2.5″ with NASWARE, 6, 1TB, 4KB AF, SATA 3 5400RPM, Zpool Members

Samsung 840 EVO SSD, 1, 128GB, 512byte, SATA 3, L2ARC Read Cache

Samsung 830 SSD, 1, 128GB, 512byte, SATA 3, L2ARC Read Cache

Seagate 2.5″ Momentus, 1, 500GB, 512byte, 80MB/r/w, Boot/swap/system

[/table]

Labworks 1:5 – A Man named ZIL and his sidekick, the SLOG

Labworks 1:1 was all about building durable & performance-oriented storage for Hyper-V & VMware. And one of the unresolved questions I aimed to solve out of that post was my poor write performance.

Review the hardware table and you’ll feel like I felt. I got me some SSD and some RAM, I provisioned a ZIL so write-cache that inbound IO already ZFS, amiright? Show me the IOPSMoney Jerry!

Well, about that. I mischaracterized the ZIL and I apologize to readers for the error. Let’s just get this out of the way: The ZFS Intent Log (ZIL) is not a write-cache device as I implied in Labworks 1:1.

ZFS storage in excellent Good/Better/Best format

ZFS storage layout in excellent Good/Better/Best format courtesy of Nexenta, which has some outstanding documentation & guides

The ZIL, whether spread out among your rotational disks by ZFS design, or applied to a Separate Log Device (a SLOG), is simply a synchronous writes mechanism, a log designed to ensure data integrity and report (IO ACK) back to the application layer that writes are safe somewhere on your rotational media. The ZIL & SLOG are also a disaster recovery mechanisms/devices ; in the event of power-loss, the ZIL, or the ZIL functioning on a SLOG device, will ensure that the writes it logged prior to the event are written to your spinners when your disks are back online.

Now there seem to be some differences in how the various implementations of ZFS look at the ZIL/SLOG mechanism.

Nexenta Community Edition, based off Illumos which is the open source descendant of Sun’s Solaris, says your SLOG should just be a write-optimized SSD, but even that’s more best practice than hard & fast requirement. Nexenta touts the ZIL/SLOG as a performance multiplier, and their excellent documentation has helpful charts and graphics reinforcing that.

In contrast, the most popular FreeBSD ZFS implementations documentation paints the ZIL as likely more trouble than its worth. FreeNAS actively discourages you from provisioning a SLOG unless it’s enterprise-grade, accurately pointing out that the ZIL & a SLOG device aren’t write-cache and probably won’t make your writes faster anyway, unless you’re NFS-focused (which I’m proudly, defiantly even, not) or operating a large database at scale.

ZIL me

What’s to account for the difference in documentation & best practice guides? I’m not sure; some of it’s probably related to *BSD vs Illumos implementations of ZFS, some of it’s probably related to different audiences & users of the free tier of these storage systems.

The question for us here is this: Will you benefit from provisioning a SLOG device if you build a ZFS box for Hyper-V and VMWare storage for iSCSI?

I hate sounding like a waffling storage VAR here, but I will: it depends. I’ve run both Nexenta and NAS4Free; when I ran Nexenta, I saw my SLOG being used during random & synchronous write operations. In NAS4Free, the SSD I had dedicated as a SLOG never showed any activity in zfs-stats, gstat or any other IO disk tool I could find.

One could spend weeks of valuable lab time verifying under which conditions a dedicated SLOG device adds performance to your storage array, but I decided to cut bait. Check out some of the links at the bottom for more color on this, but in the meantime, let me leave you with this advice: if you have $80 to spend on your FreeBSD-based ZFS storage, buy an extra 8GB of RAM rather than a tiny, used SLC or MLC device to function as your SLOG. You will almost certainly get more performance out of a larger ARC than by dedicating a disk as your SLOG.

Labworks 1:6 – Great…so, again, why do my writes suck? 

Recall this SQLIO test from Labworks 1:1:

sqlio lab 1 short test

As you can see, read or write, I was hitting a wall at around 235-240 megabytes per second during much of “Short Test”, which is pretty close to the theoretical limit of an LACP team with two GigE NICs.

But as I said above, we don’t have that limit anymore. Whereas there were once 2x1GbE Teams, there are now 3x1GbE. Let’s see what the same test on the same 4KB block/4KB NTFS volume yields now.

SQLIO short test, take two, sort by Random vs Sequential writes & reads:

labworks147

By jove, what’s going on here? This graph was built off the same SQLIO recipe, but looks completely different than Labworks 1. For one, the writes look much better, and reads look much worse. Yet step back and the patterns are largely the same.

It’s data like this that makes benchmarking, validating & ultimately purchasing storage so tricky. Some would argue with my reliance on SQLIO and those arguments have merit, but I feel SQLIO, which is easy to script/run and automate, can give you some valuable hints into the characteristics of an array you’re considering.

Let’s look at the writes question specifically.

Am I really writing 350MB/s to SAN2?

storagenetworkingforthewinOn the one hand, everything I’m looking at says YES: I am a Storage God and I have achieved #StorageGlory inside the humble Daisetta Lab HQ on consumer-level hardware:

  • SAN2 is showing about 115MB/s to each Broadcom interface during the 32KB & 64KB samples
  • Agnostic_Node_1 perfmon shows about the same amount of traffic eggressing the three vEthernet adapters
  • The 2960S is reflecting all that traffic; I’m definitely pushing about 350 megabytes per second to SAN2; interface port channel 3 shows TX load at 219 out of 255 and maxing out my LACP team

On the other hand, I am just an IT Mortal and something bothers:

  • CPU is very high on SAN2 during the 32KB & 64KB runs…so busy it seems like the little AMD CPU is responsible for some of the good performance marks
  • While I’m a fan of the itsy-bitsy 2.5″ Western Digitial RED 1TB drives in SAN2, under no theoretical IOPS model is it likely that six of them, in RAIDZ-2 (RAID 6 equivalent) can achieve 5,000-10,000 IOPS under traditional storage principles. Each drive by itself is capable of only 75-90 IOPS
  • If something is too good to be true, it probably is

49286241Sr. Storage Engineer Neo feels really frustrated at this point; he can’t figure out why his writes suck, or even if they suck, and so he wanders up to the Oracle to get her take on the situation and comes across this strange Buddha Storage kid.

Labworks 1:7 – The Essence of ZFS & New Storage model

In effect, what we see here is is just a sample of the technology & techniques that have been disrupting the storage market for several years now: compression & caching multiply performance of storage systems beyond what they should be capable of, in certain scenarios.

As the chart above shows, the test2 volume is compressed by SAN2 using lzjb. On top of that, we’ve got the ZFS ARC, L2ARC, and the ZIL in the mix. And then, to make things even more complicated, we have some sync policies ZFS allows us to toggle. They look like this:

sync policy

The sync toggle documentation is out there and you should understand it it is crucial to understanding ZFS, but I want to demonstrate the choices as well.

I’ve got three choices + the compression options. Which one of these combinations is going to give me the best performance & durability for my Hyper-V VMs?

SQLIO Short Test Runs 3-6, all PivotTabled up for your enjoyment and ease of digestion:

compressionsync

As is usually the case in storage, IT, and hell, life in general, there are no free lunches here people. This graph tells you what you already know in your heart: the safest storage policy in ZFS-land (Always Sync, that is to say, commit writes to the rotationals post haste as if it was the last day on earth) is also the slowest. Nearly 20 seconds of latency as I force ZFS to commit everything I send it immediately (vs flush it later), which it struggles to do at a measly average speed of 4.4 megabytes/second.

Compression-wise, I thought I’d see a big difference between the various compression schemes, but I don’t. Lzgb, lz4, and the ultra-space-saving/high-cpu-cost gzip-9 all turn in about equal results from an IOPS & performance perspective. It’s almost a wash, really, and that’s likely because of the predictable nature of the IO SQLIO is generating.

Labworks 1:Epilogue

Last point: ZFS, as Chris Wahl pointed out, is a sort of virtualization layer atop your storage. Now if you’re a virtualization guy like me or Wahl, that’s easy to grasp; Windows 2012 R2’s Storage Spaces concept is similar in function.

But sometimes in virtualization, it’s good to peel away the abstraction onion and watch what that looks like in practice. ZFS has a number of tools and monitors that look at your Zpool IO, but to really see how ZFS works, I advise you to run gstat. GStat shows what your disks are doing and if you’re carefully setting up your environment, you ought to be able to see the effects of your settings on each individual spindle.

In this Gifcam, watch ada0-6 as they struggle under load with the "Always Sync" option enabled.

In this Gifcam, watch ada0-5 (the western digitals)as they struggle under load with the “Always Sync” option enabled. Notice that the zvol/Alpha-Pool/Test2 volume (The logical volume construct) is at 100% busy and the ops/s are not very stellar.

Now look at this gstat sample. Under SQLIO-load, the zvol is showing 10,000 IOPS, 300+MB/s. But ada0-5, the physical drives, aren't doing squat.

Now look at this gstat sample. Under SQLIO-load, the zvol is showing 10,000 IOPS, 300+MB/s. But ada0-5, the physical drives, aren’t doing squat for several seconds at a time as SAN2 absorbs & processes all the IO coming at it.

That, friends, is the essence of ZFS.

 Links/Knowledge/Required Reading Used in this Post:

[table]
Resource, Author, Summary

Nexenta’s awesome whitepapers and guides, Nexenta, Find ’em and collect ’em good stuff on MPIO config and ZFS performance

Comparing SSD vs NoSSD in Nexenta w/NFS, Larry Smith, A fellow ZFS fan with more focus on NFS & VMware

Get the Most out of ZFS SSD, Sebastian “vBagpipes” Laubscher, Sebastian finds a different way to provision the ZIL/SLOG

Nexenta & Scale, Hans DeLeenHeer, Fellow #TFD delegate looks at ZFS tiers in superhero context

SLOG/ZIL Insight, FreeNAS forum, Great forum-focused post on SLOG/ZIL in BSD ZFS

SLOG Blog, Oracle, 2007 post about the ZIL & SLOG heralding storage di

 Zpool and ZIL management, Magnus Strahlert, Excellent how-to guide for ZIL/L2ARC provisioning

[/table]

 

Fail File : SAN down! SAN down! All Nodes respond

Introducing Fail File #1, where I admit to screwing something up and reflect on what I’ve learned

SAN2.daisettalabs.net, the NAS4Free server I built to simulate some of the functions I perform at work with big boy SANs, crashed last night.

Or, to put it another way, I pushed that little AMD-powered, FreeBSD-running, Broadcom-connected, ZFS-flavored franken-array to the breaking point:

Untitled picture

Love the directness of BSD. The iSCSI Target process was killed in cold blood, resulting in the death of several child partitions. What’s more, in just a few words, I have the suspect (Kernel) the motive (swap space) & the victim (iSCSI). Windows would have said, “The service terminated unexpectedly…error 0x081942ad-SOL”

 

Such are the perils of concentrated block storage, amiright? Instantly my Hyper-V Cluster Shared Volumes + the 8 or 9 VMs inside them dropped:

csvs

So what happened here?

I failed to grok the grub or fsck the fdisk or something and gave BSD an inadequate amount of swap space on the root 10GB partition slice. Then I lobbed some iSCSI packets its way from multiple sources and the kernel, starved for resources (because I’m using about 95% of my RAM for the ARC), decided to kill istgt, the iSCSI target service.

Thinking back to the winter, when I ran Nexenta -derived from Sun’s Solaris, not BSD-based- the failure sequence was different, but I’m not sure it was better.

When I was pounding the Nexenta SAN2 back in the winter, volleying 175,000+ iSCSI packets per second its way onto hardware that was even more ghetto, Nexenta did what any good human engineer does: compensate for the operator’s errors & abuses.

It was kind of neat to see. Whether I was running SQLIO simulations, an iometer run, robocopy or eseutil, or just turning on a bunch of VMs simultaneously, one by one, Nexenta services would start to drop as resources were exhausted.

First the gui (NMV it’s called). Then SSH. And finally, sometimes the console itself would lock up (NMC).

But never iSCSI, the disk subsystem, the ARC or L2ARC…those pieces never dropped.

Now to be fair, the GUI, SSH & console services never really turned back on either….you might end up with a durable storage system you couldn’t interact with at all until hard reset, but at least the LUNs stayed online.

This BSD box, in contrast, kills the most important service I’m running on it, but has the courtesy to admit to it and doesn’t make me get up out of my seat: GUI/SSH all other processes are running fine and I’ve instantly identified the problem and will engineer against it.

One model is resilient, bending but not breaking; the other is durable up to a point, and then it just snaps.

Which model is better for a given application?

Fail File Lesson #1: It’s just as important to understand how things fail as it is to understand why they fail, so that you can properly engineer against it. I never thought inadequate swap space would result in a homicidal kernel gunning for the most important service on the box…now I know.

Labworks 2:5-8 – Get-Me -ConvergedSwitching -For “Hyper-V” | Now-Please

Hello Labworks fans, detractors and partisans alike, hope you had a nice Easter / Resurrection / Agnostic Spring Celebration weekend.

Last time on Labworks 2:1-4, we looked at some of the awesome teaming options Microsoft gave us with Server 2012 via its multiplexor driver. We also made the required configuration adjustments on our switch for jumbo frames & VLAN trunking, then we built ourselves some port channel interfaces flavored with LACP.

I think the multiplexor driver/protocol is one of the great (unsung?) enhancements of Server 2012/R2 because it’s a sort of pre-virtualization abstraction layer (That is to say, your NICs are abstracted & standardized via this driver before we build our important virtual switches) and because it’s a value & performance multiplier you can use on just about any modern NIC, from the humble RealTek to the Mighty Intel Server 10GbE.

But I’m getting too excited here; let’s get back to the curriculum and get started shall we?

Goals

5.  Understand what Microsoft’s multiplexor driver/LBFO has done to our NICs

6. Build our Virtual Machine Switch for maximum flexibility & performance

7. The vEthernets are Coming

8. Next Steps: Jumbo frames from End-to-end and performance tuning

Schematic:

Lab 2 - Daisetta Labs overview

2:5 Understand what Microsoft’s Multiplexor driver/LBFO has done to our NICs

So as I said above, the best way to think about the multiplexor driver & Microsoft’s Load Balancing/Failover tech is by viewing it as a pre-virtualization abstraction layer for your NICs. Let’s take a look.

Our Network Connections screen doesn’t look much different yet, save for one new decked-out icon labeled “Daisetta-Team:”

daisettateam

Meanwhile, this screen is still showing the four NICs we joined into a team in Labworks 2:3, so what gives?

A click on the properties of any of those NICs (save for the RealTek) reveals what’s happened:

Egads! My Intel NIC has been neutered by LBFO

Egads! My Intel NIC has been neutered by LBFO

The LBFO process unbinds many (though not all) settings, configurations, protocols and certain driver elements from your physical NICs, then binds the fabulous Multiplexor driver/protocol to the NIC as you see in the screenshot above.

In the dark days of 2008 R2 & Windows core, when we had to walk up hill to school both ways in the snow I had to download and run a cmd tool called nvspbind to get this kind of information.

Fortunately for us in 2012 & R2, we have some simple cmdlets:

daisettateam3

So notice Microsoft has essentially stripped “Ethernet 4” of all that would have made it special & unique amongst my 4x1GbE NICs; where I might have thought to tag a VLAN onto that Intel GbE, the multiplexor has stripped that option out. If I had statically assigned an IP address to this interface, TCP/IP v4 & v6 are now no longer bound to the NIC itself and thus are incapable of having an IP address.

And the awesome thing is you can do this across NICs, even NICs made by separate vendors. I could, for example, mix the sacred NICs (Intel) with the profane NICs (RealTek)…it don’t matter, all NICs are invited to the LBFO party.

No extra licensing costs here either; if you own a Server 2012 or 2012 R2 license, you get this for free, which is all kinds of kick ass as this bit of tech has allowed me in many situations to delay hardware spend. Why go for 10GbE NICs & Switches when I can combine some old Broadcom NICs, leverage LACP on the switch, and build 6×1 or 8x1GbE Converged LACP teams?

LBFO even adds up all the NICs you’ve given it and teases you with a calculated LinkSpeed figure, which we’re going to hold it to in the next step:

4GbS LACP team sounds great, but is it really 4Gb/s?

4GbS LACP team sounds great, but is it really 4Gb/s?

2:6 Build our Virtual Machine Switch for maximum flexibility & performance

If we just had the multiplexor protocol & LBFO available to us, it’d be great for physical server performance & durability. But if you’re deploying Hyper-V, you get to have your LBFO cake and eat it too, by putting a virtual switch atop the team.

This is all very easy to do in Hyper-V manager. Simply right click your server, select Virtual Switch Manager, make sure the Multiplexor driver is selected as the NIC, and press OK.

Bob’s your Uncle:

daisettaconverged1

But let’s go a bit deeper and do this via powershell, where we get some extra options & control:

PS C:usersjeff.DAISETTALABS> new-vmswitch -NetAdapterInterfaceDescription “Microsoft Network Adapter Multiplexor Driver” -AllowManagementOS 1 -MinimumBandwidthMode Weight -name “Daisetta-Converged”

Let’s go through each of these:

  • New-vmswitch : the cmdlet we’re invoking to build the switch. Run get-help new-vmswitch for a rundown of the cmdlet’s structure & options
  • -NetAdapterInterfaceDescription : here we’re telling Windows which NIC to build the VM Switch on top of. Get the precise name from Get-NetAdapter and enclose it in quotes
  • -Allow ManagementOS 1 : Recall the diagram above. This boolean switch (1 yes, 0 no) tells Windows to create the VM Switch & plug the Host/Management Operating System into said Switch. You may or may not want this; in the lab I say yes; at work I’ve used No.
  • -Minimum Bandwidth Mode Weight: We lay out the rules for how the switch will apportion some of the 4Gb/s bandwidth available to it. By using “Weight,” we’re telling the switch we’ll assign some values later
  • Name: Name your switch

A few seconds later, and congrats Mr. Hyper-V admin, you have built a converged virtual switch!

2:7 The vEthernets are Coming

Now that we’ve built our converged virtual switch, we need to plug some things into it. And that starts on the physical host.

If you’re building a Hyper-V cluster or stand-alone Hyper-V host with VMs on networked storage, you’ll approach vEthernet adpaters differently than if you’re building Hyper-V for VMs on attached/internal storage or on SMB 3.0 share storage. In the former, you’re going to need storage vEthernet adpters; in the latter you won’t need as many vEthernets unless you’re going multi-channel SMB 3.0, which we’ll cover in another labworks session.

I’m going to show you the iSCSI + Failover Clustering model.

In traditional Microsoft Failover Clustering for Virtual Machines, we need a minimum of five discrete networks. Here’s how that shakes out in the Daisetta Lab:

[table]

Network Name, VLAN ID, Purpose, Notes

Management, 1, Host & VM management network, You can separate the two if you like

CSV, 14, Host Cluster & communication and coordination, Important for clustering Hyper-V hosts

LM, 15, Live Migration network, When you must send VMs from broke host to host with the most LM is there for you

iSCSI 1-3, 11-13, Storage, Soemwhat controversial but supported

[/table]

Now you should be connecting that dots: remember in Labworks 2:1, we built a trunked port-channel on our Cisco 2960S for the sole purpose of these vEthernet adapters & our converged switch.

So, we’re going to attach tagged vethernet adapters to our host via powershell. Pay attention here to the “-managementOS” tag; though our Converged switch is for virtual machines, we’re using it for our physical host as well.

You can script his out of course (and VMM does that for you), but if you just want to copy paste, do it in this order:

  • Add the vEthernets
add-vmnetworkadapter -managementos -name CSV -switchname Daisetta-converged
add-vmnetworkadapter -managementos -name iSCSI-1 -switchname Daisetta-converged add-vmnetworkadapter -managementos -name iSCSI-2 -switchname Daisetta-converged
add-vmnetworkadapter -managementos -name iSCSI-3 -switchname Daisetta-converged
add-vmnetworkadapter -managementos -name LM -switchname Daisetta-converged
  • Tag those vEthernets!
Set-VMNetworkAdapterVlan -ManagementOS -Access -VlanId 15 -VMNetworkAdapterName LM
Set-VMNetworkAdapterVlan -ManagementOS -Access -VlanId 14 -VMNetworkAdapterName CSV
Set-VMNetworkAdapterVlan -ManagementOS -Access -VlanId 13 -VMNetworkAdapterName iSCSI-3
Set-VMNetworkAdapterVlan -ManagementOS -Access -VlanId 12 -VMNetworkAdapterName iSCSI-2
Set-VMNetworkAdapterVlan -ManagementOS -Access -VlanId 11 -VMNetworkAdapterName iSCSI-1
  • Now set IPs
New-NetIPAddress -IPAddress 172.16.14.12 -InterfaceAlias "vEthernet (CSV)" -AddressFamily IPv4 -PrefixLength 24
 
New-NetIPAddress -IPAddress 172.16.15.12 -InterfaceAlias “vEthernet (LM)” -AddressFamily IPv4 -PrefixLength 24
New-NetIPAddress -IPAddress 172.16.13.12 -InterfaceAlias "vEthernet (iSCSI-3)" -AddressFamily IPv4 -PrefixLength 24
New-NetIPAddress -IPAddress 172.16.12.12 -InterfaceAlias "vEthernet (iSCSI-2)" -AddressFamily IPv4 -PrefixLength 24
New-NetIPAddress -IPAddress 172.16.11.12 -InterfaceAlias "vEthernet (iSCSI-1)" -AddressFamily IPv4 -PrefixLength 24
 

Notice we didn’t include a Gateway in the New-NetIPAddress cmdlet; that’s because when we built our Virtual Switch with the “-managementOS 1” switch attached, Windows automatically provisioned a vEthernet adapter for us, which either got an IP via DHCP or took an apipa address.

So now we have our vEthernets and their appropriate VLAN tags:

daisettaconverged2

Ignore the DMZ vEthernet for now. Notice Daisetta-Converged, our VM Switch, is seen as a VMNetworkAdapter and is untagged. In my lab, this interface functions as my Host Management interface. In a production scenario, you’ll probably use separate vEthernet adapters for Host Management and not expose the switch itself to the management OS

 

 

 

 

 

 

 

2:8: Next Steps : Jumbo Frames from end-to-end & Performance Tuning

So if you’ve made it this far, congrats. If you do nothing else, you now have a converged Hyper-V virtual switch, tagged vEthernets on your host, and a virtualized infrastructure that’s ready for VMs.

But there’s more you can do; stay tuned for the next labworks post where we’ll get into jumbo frames & performance tuning this baby so she can run with all the bandwidth we’ve given her.

Links/Knowledge/Required Reading Used in this Post:

[table]
Resource, Author, Summary
New-VMSwitch Technet, Microsoft, Always good to have Technet reference
Building a Converged Fabric with Server 2012, Hans “The Hyper-Dutchman” Vredevoort, A 2012 post which helped me when I was struggling through 2008 R2 to 2012 Hyper-V migration

Hyper-V 3.0 Converged Networks with Force 10 and DCB, Dell, Neat Wiki & diagram with iSCSI as separate virtual switch but with DCB

[/table]

 

 

Live Migration Performance in Theory & Practice -or- In Which I take on Aidan Finn

Aidan Finn, upstanding Irishman, apparent bear-cub puncher, hobbyist photog, MVP all-star  and one of my favorite Hyper-V bloggers (seriously, he’s good, and along with DidierV & the Hyper-Dutchman has probably saved my vAss more times than I can vCount) appeared on one of my favorite podcasts last week, RunAs Radio with Canuck Richard Campbell.

Which is all sorts of awesome as these are a few of my favorite things piled on top of each other (Finn on RunAs).

The subject? Hyper-V, scale out file servers (SoFS) in 2012 R2, SMB 3.0 multichannel and Microsoft storage networking, which are just about my favoritest subjects in the whole wide world. I mean what are the odds that one of my favorite Hyper-V bloggers would appear on one of my favorite tech podcasts? Remote. And talk about storage networking tech, Redmond-style, during that podcast?

Where Perfmon is king, you will find Hyper-V bloggers like DidierV, who gets to play with 10G RDMA NICs

Where Perfmon is king, you will find Hyper-V bloggers like DidierV, who gets to play with 10G RDMA NICs

All that and an adorable Irish brogue?

This is Instant nerdgasm territory here people; if you’re into these black arts as I am, it’s a must-listen.

Anyway, Finn reminded me of his famous powershell demos in which he demonstrates all the options we Hyper-V admins have at our disposal now when it comes to Live Migrating VMs from host to host.

And believe me, we have so many now it’s almost embarrassing, especially if you cut your teeth on Hyper-V 2.0 in 2008 R2, where successfully Live Migrating VMs off a host (or draining one during production) involved a few right clicks, chicken sacrifice, Earth-Jupiter-Moon alignment, a reliable Geiger counter by your side and a tolerance for Pucker Factor Values greater than 10* **.

Nowadays, we can:

  • Live Migrate VMs between hosts in a cluster (.vhdx parked in a Cluster Shared Volume, VM config, RAM & CPU on a host….block storage, the Coke Classic option)
  • Live Migrate VMs parked on SMB 3.0 shares, just like you NFS jockeys do
  • Shared-nothing Live Migration, either storage + VM, just storage, or just VM!
    • A for instance:  from my Dell Latitude i7 ultrabook with Windows 8.1 and client hyper-v installed (natch), I can storage Live Migrate a .vhdx off my skinny but fast 256GB SSD to a spacious SMB share at work, then drop it back on my laptop at the end of the day, all via Scheduled Task or powershell with no downtime for the VM
    •  With Server 2012/2012 R2 you get all those options + SMB 3.0 multichannel

Not only that, but we have some cool new toys with which to make the cost of Live Migration a VM to the host with the most a little less painful:

  • Standard TCP/IP : I like this because I’m old school and anything that stresses the network and LACP is fun because it makes the network guy sweat
  • Compression: Borrow spare cycles from the host CPU, compress the VM’s RAM, and Live Migrate your way out of a tight spot
  • SMB via Remote Direct Memory Access : the holy of holies in Live Migration. As Finn points out, this bit of tech can scale beyond the bandwidth capabilities of the PCIe 3 bus. SMB 3.0 + RDMA makes you hate your Northbridge

Finn*** of course provided some Live Migration start:finish times resulting from the various methods above, which I then, of course, interpreted as Finn daring me personally over the radio to try and beat those times in my humble Daisetta Lab.

Now this is just for fun people; not a Labworks-style list of repeatable results, so let’s not nerd-out on how my testing methodology isn’t sound & I’m a stupidhead, ok?

Anyway, Sysinternals has a nice little tool to redline the RAM in your Windows VM. I don’t know how Finn does it, but I don’t have workloads (yet!) in the Lab that would fill 4GB of RAM with non-random data on a VM, so off to the cmd we go:

You type this (haven’t played with all the switches yet) in this navy blue screen:

ramtest

And then this happen and the somewhat pink graph goes full pink:

ramthevm

Then we press this button to test Live Migration w/ compression, as the Daisetta Lab doesn’t have fancy RDMA NICs like certain well-connected Irish Hyper-V bloggers:

Wish I had some RDMA NICs :sadface:

Wish I had some RDMA NICs :sadface:

Which makes this blue celeste denim Azure colored line get all spikey:

Oh...my NUMAs, they're spikey. Second test was more dramatic than first. Why?

Oh! My NUMA! Second spike somewhat higher than first. Why?

all of which results in a wicked-fast Live Migrations & really cool orange-colored charts in my totally non-random, non-scientific but highly enjoyable laboratory experiment

ramthevm2

Still, in the end, I like my TCP/IP uncompressed Live Migrations because 1) sackcloth & ashes, and 2) I didn’t go to the trouble of building a multiplexed LACP team -with a virtual switch on top!- just to let the Cat5es in my attic have an easy day at the office:

livemigration1gbe

But at work: yes. I love this compression stuff and echo Finn’s observations on how Hyper-V doesn’t slam your host CPUs beyond what the host & its VM fleet could bear.

Anyway, did I beat Finn’s Live Migration times in this fun little test? Will the Irish MVP have to admit he’s not so esteemed after all and surrender his Hyper_V_MVP_badge.gif to me?

Of course I did and yes he will.

But not really.

[table caption=”Daisetta Lab LM vs Finn’s Powershell LM Scripts – 4GB VM” width=”500″ colwidth=”20|100|50″ colalign=”left|left|left|left|left”]
Who,TCP/IP LM,Compressed LM,RDMA & SMB 3 LM,Notes
Finn,78 seconds, 15 seconds,6.8 seconds, “Mr. I once moved a VM with 56GB of RAM in 35 seconds probably has a few Xeons”
D-Lab,38 seconds,Like 12 or something,Who’s ass do I need to kiss to get RDMA/iWarp?, But seriously my VM RAM was probably not random
[/table]

Finn notes in his posts that he’s dedicating an entire 1GbE NIC for his Live Migration Demos, wheras I’m embracing the converged switch model and haven’t even played with bandwidth or QOS settings on my Hyper-V switch.

How do my VMware colleagues & friends measure this stuff & think about vMotion performance & reliability? I know NFS can scale & perform, but am ignorant on the nuances of v3 vs v4, how it works on the host and Distributed vSwitch and your “Shared nothing” storage vMotion. And what’s this I hear that vSphere won’t begin a vMotion without knowing it will complete? How’s that determined?

I mean I could spend an hour or two googling it, or you could, I don’t know, post a comment and save me the time and spread some of your knowledge 😀

I’m jazzed about SMB 3.0, but there are only a handful of storage vendors who have support for the new stack, and among them, as Finn points out, Microsoft is #1 storage vendor for SMB 3 fans, with NetApp probably in 2nd place.

 

 

* Just kidding, it wasn’t that bad. Most days. 

** Pucker Factor Value can be measured by querying obscure wmi class win32_pfv

*** Finn is a consultant. So you can hire him. I have no relationship with him other than admiration for his scripting skillz

Labworks 2:1-4 : Converged Hyper-V Switching like a boss

Greetings Labworks fans, today we’re going to learn how to build converged Hyper-V switches, switches so cool they’re nearly identical to the ones available to enterprise users with their fancy System Center licenses.

If you’re coming from a VMware mindset, a Hyper-V converged switch is probably most similar to Distributed vSwitches, though admittedly I’m a total n00b on VMware, so take that statement with a grain of salt. The idea here is to build an advanced switching fabric on your Hyper-V hosts that is fault-tolerant & performance-oriented, and like a Distributed vSwitch, common among your physical hosts and your guests. 

This is one of my favorite topics because I have a serious & problematic love-affair with LACP and a Terrets-like urge to team things up & jumbo, but you don’t need an LACP-capable switch or jumbo frame to enjoy Converged Switching goodness.

Let’s dive in, shall we?

Goals

  1. Prepare the physical switch for Jumbo Frames
  2. Understand LBFO: Microsoft’s Load Balancing/Fail Over teaming technology introduced in Server 2012
  3. Enable LACP on the Switch and on the Server
  4. Build the Switch on the Team & Next Steps

Required Tools ‘n Tech:

  • Server 2012 or 2012 R2…sorry Windows 8.1 Professional/Enterprise fans…LBFO is not available for 8.1. I know, I feel your pain. But the naked Hyper-V 3.0 Hypervisor (Core only) is free, so what are you waiting for?
  • A switch, preferably gigabit. LACP not required but a huge performance multiplier
  • NICs: As in plural. You need at least two. Yes, you can use your Keepin’ it RealTek NICs..Hyper-V doesn’t care that your NICs aren’t server-grade, but I advise against consumer-NICs for production!!

Schematic

State of the Lab as of today. Ag_node_1 is new, with a core i7 Haswell (Yay!), ag_node_2 is the same, still running CSVs off my ZFS box, and check it out, bottom right: a new host, SMB1:

Lab 2 - Daisetta Labs overview

SMB1 Detail:

 

labworks 2

2:1 Prepare the Physical Switch for Jumbo Frames

You can skip this section if all you have at your disposal is a dumb switch.

Commands below are off of a Cisco 2960s. Commands are similar on the new SG300 & 500 series Cisco switches. PowerConnect 5548 switches from Dell aren’t terribly different either, though I seem to recall you have you enable jumbo mtu on each port as well as the switch.

First we’re going to want to turn on Jumbo Frames, system-wide, which usually requires a reload of your switch, so schedule for a maintenance window!

daisettalabs.net(config)#system mtu jumbo 9198

You can run a show system mtu after the reload to be sure the switch is ready for the corpulent frames you will soon send its way:

daisettalabs.net#show system mtu

System MTU size is 1514 bytes
System Jumbo MTU size is 9198 bytes
System Alternate MTU size is 1514 bytes
Routing MTU size is 1514 bytes

2:2 Load Balancing & Failover

Load Balancing & Failover, or LBFO as it’s known, was the #1 feature I was looking forward to in Server 2012.

And boy did Microsoft deliver.

LBFO is a driver/framework that takes whatever NICs you have, “teams” them, applies a mature & resilient multiplexor driver to them, and gives you redundancy & performance in just a few clicks or powershell cmdlets. Let’s do GUI for the team, and later on, we’ll use Powershell to build a switch on that team.

Sidenote: Don’t bother applying IP addresses, VLANs to your LBFO-destined physical NICs at this point. Do bother installing your manufacturer’s latest driver, or hacking one on as I’ve had to do with my new ag_node_1 Intel NIC. (SideSideNote: as this blogger states, Intel can eat a bag of d**** for dropping so many NICs from Server 2012 support. Broadcom, for all the hassles I’ve had with them, still updates drivers on four year old cards!)

On SMB1 from the above schematic, I’ve got five gigabit NICs. One is a RealTek on the motherboard, and the other four are Intel; 1-4 on a PCIe Quad Gigabit network card, i350 x4 I believe.

nics1

The RealTek NIC has a static IP and is my management interface for the purposes of this labworks. We’ll only be teaming the four Intel NICs here. Be sure to leave at least one of your NICs out of the LBFO team unless you are sitting in front of your server console; you can always add it in later.

Launch Server Manager in the GUI and click on “All Servers,” then right click on SMB1 and select Configure NIC Teaming:

nics2

A new window will emerge,titled, NIC Teaming.

In the NIC Teaming window, notice on the right the five GbE adapters you have and their status (Green Arrow). Click on “Tasks” and select “New Team” (Red Arrow):

nics3

The New Team window is where all the magic happens. Let’s pause for a moment and go to our switch.

On my old 2960s, we’re building LACP-flavored port channels by using the “channel group _ mode active” command, which tells the switch to use the genuine-article LACP/802.11ax protocol rather than the older Cisco proprietary Port Aggregation Protocol (PAgp) system, which is activated by running “channel group _ mode auto.”

However, if you have a newer switch, perhaps a nice little SG 300 or something similar, PAgp is dead and not available to you, but the process for LACP is like the old PAgp command: “channel group _ mode auto”  will turn on LACP.

Here’s the 2960s process. Note that my Intel NICs are plugged into Gig 1/0/20-23, with spanning-tree portfast enabled (which we’ll change once our Converged virtual switch is built):

daisettalabs.net#show run int gig 1/0/20
Building configuration...

Current configuration : 63 bytes
!
interface GigabitEthernet1/0/20
spanning-tree portfast

daisettalabs.net#conf t
Enter configuration commands, one per line. End with CNTL/Z.
daisettalabs.net(config)#int range gig 1/0/20-23
daisettalabs.net(config-if-range)#description SMB1 TEAM
daisettalabs.net(config-if-range)#speed 1000
daisettalabs.net(config-if-range)#duplex full
daisettalabs.net(config-if-range)#channel-group 3 mode active
daisettalabs.net(config-if-range)#switchport mode trunk
daisettalabs.net(config-if-range)#
daisettalabs.net(config-if-range)#do wr
Building configuration...
[OK]

Presto! That wasn’t so hard was it?

Note that I’ve trunked all four interfaces; that’s important in Hyper-V Converged switching. We’ll need to trunk po3 as well. 

Let’s take a look at our new port channel:

daisettalabs.net(config-if-range)#do show run int po3
Building configuration…

Current configuration : 54 bytes
!
interface Port-channel3
switchport mode trunk
end

daisettalabs.net(config-if-range)#

Now let’s check the state of the port channel:

daisettalabs.net#show etherchannel summary
Flags: D - down P - bundled in port-channel
I - stand-alone s - suspended
H - Hot-standby (LACP only)
R - Layer3 S - Layer2
U - in use f - failed to allocate aggregator

M - not in use, minimum links not met u - unsuitable for bundling w - waiting to be aggregated d - default port Number of channel-groups in use: 3 Number of aggregators: 3 Group Port-channel Protocol Ports ------+-------------+-----------+----------------------------------------------- 
1 Po1(SU) LACP Gi1/0/1(P) Gi1/0/2(P) Gi1/0/3(P)
2 Po2(SU) LACP Gi1/0/11(D) Gi1/0/13(P) Gi1/0/14(P) Gi1/0/15(P) Gi1/0/16(P) 
3 Po3(SD) LACP Gi1/0/19(s) Gi1/0/20(D) Gi1/0/21(s) Gi1/0/22(s) Gi1/0/23(D)

po3 is in total disarray, but not for long. Back on SMB1, it’s time to team those NICs:

nic5

I’m a fan of naming-conventions even if this screenshot doesn’t show it; All teams on all hosts have the same “Daisetta-Team” name, and I usually rename NICs as well, but honestly, you could go mad trying to understand why Windows names NICs the way it does (Seriously. It’s a Thing). There’s no /dev/eth0 for us in MIcroosft-land, it’s always something obscure and strange and out-of-sequence, which is part of the reason why Converged Switching & LBFO kick ass; who cares what your interfaces are named so long as they are identically configured?

If you don’t have an LACP-capable switch, you’ll select “Switch Independent” here.

As for Load Balancing modes: in server 2012, you get Address Hash (Source/Dest MAC or IP in Layer 3 LACP), or Hyper-V Port, which is sort of a round-robin approach (VM1 goes to one port in the team, VM2 to the other).

I prefer the new (with 2012 R2) Dynamic mode which negotiates with the physical switch. More color on those choices & what they mean for you in the References section at the bottom.

Press ok, sit back, and watch my gifcam shot:

Mmmm, taste the convergence.

2:4 Build a Switch on top of that team & Next Steps

If you’ve ever built a switch for Hyper-V, you’ll find building the converged switch immediately familiar, save for one technicality: you’re going to build a switch on top of that multiplexor driver you just created!

Sounds scary? Perhaps. I’ll go into some of the intricacies and gotchas and show some cool powershell bits ‘n bobs on the next episode of Labworks.

Eventually we’re going to dangle all sorts of things off this virtual switch-atop-a-multiplexor-driver!

nic6

 

Links/Knowledge/Required Reading Used in this Post:

[table]
Resource, Author, Summary
Windows Server 2012 LBFO Whitepaper, Microsoft, Must-have though a bit dated at this point
Etherchannel Considerations, Jeremy Stretch at Packetlife.net, Great overview on Cisco aggregation tech including LACP and PAgp

VLAN Tricks with NICS – Teaming & Hyper-V, Keith Mayer, LBFO + VLANs – Hyper-V = still a win

[/table]

Nimble Storage review : 3 Weeks at Ludicrous Speed

I’ve been going on, insufferably at times, about my new Nimble storage array at work. Back in January, it passed my home-grown bakeoff with flying colors, in February I wrote about how it was inbound to my datacenter, in March I fretted over iSCSI traffic, .vhdx parades, and my 6509-E.

Well it’s been just about a month since it was racked up and jacked into my Hyper-V fabric and I thought maybe the storage nerds among my readers would like an update on how its performing.

The Good

Fast: It’s been strange getting compliments, kudos and thank yous rather than complaints and ALL CAPS emails punctuated by Exclamation Marks. I have a couple of very critical SQL databases, the performance of which can make or break my job, and after some deliberation, we took the risk and moved the biggest of them to the Nimble about three weeks ago.

Here’s a slightly edited email from one power user 72 hours later:

Did I say THANK YOU for the extra zip yet?

STILL LOVING IT!!

I’m taken aback by all the affection coming my way…no longer under user-siege, I feel like maybe I should dress better at work, shave every day, turn on some lights in the office perhaps.  Even the dev team was shocked, with one of them invoking Spaceballs and saying his storage-dependent process was moving at “Ludicrous speed.”

It’s Easy: I can’t underscore this enough. If you’re a mid-sized enterprise with vanilla/commodity workloads and you can tolerate an array that’s just iSCSI (you can still use NFS or SMB 3, just from inside clustered VMs!), Nimble’s a good fit, especially if your staff is more generalist in nature. or you don’t have time to engineer a new SAN from scratch.

This was a Do It Yourself storage project for me; I didn’t have the luxury or time to hire storage engineers or VARs to come in and engineer it for me. Nimble will try to sell you on professional services, but you can decline and hook it up yourself, as I did. There are best practice guides a-plenty, and if you understand your stack & workload, your switching & compute, you’ll do fine.

Buying it was easy: Nimble’s lineup is simple and from a customer standpoint, it was a radically different experience to buy a Nimble than a traditional SAN.

Purchasing a big SAN is like trying to decide what to eat at French restaurant in Chinatown…you recognize the letters and & the pictures on the menu look familiar, but you don’t know what that SKU is exactly or how you’ll feel in the morning after buying & eating it. And while the restaurant has provided a helpful & knowledgeable garçon to explain French cuisine & etiquette to you, you know the garçon & his assistants moonlight at the Italian, German and Sushi place down the road, where they are equally knowledgeable & enthusiastic about those cuisines. But they can’t talk about the Italian place because they have something called agency with the french restaurant; so with you, they are only French cuisine experts, and their professional opinion is that Italian, German and Sushi are horrible food choices. Also, your spend with the restaurant is too small to get the chef’s attention..you have to go through this obnoxious garçon system. 

Buying from Nimble, meanwhile, is like picking a burger at In ‘n Out. You have three options, all of them containing meat, and from left to right, the choices are simply Good, Better, Best. You can stack shelves onto controller-shelves, just like a Double-Double, and you know what you’ll get in the end. Oh sure, there’s probably an Animal Style option somewhere, but you don’t need Animal Style to enjoy In ‘n Out, do you?

Lesson is this: Maybe your organization needs a real full-featured SAN & VAR-expertise. But maybe you just need fast, reliable iSCSI that you can hook up yourself.

It’s nice that we in customer-land have that option now.

MPIO Module.gifASUP & Community: The Autosupport from Nimble has left nothing to be desired, in fact, I think they nag too much. But I’ll take that over a downed array.

I’ve grown to enjoy Connect.Nimble.com, the company’s forum where guys like me can compare notes. Shout out to one awesome Nimble SE named Adam Herbert who built a perfect signed MPIO Powershell script that maps your initiators to your targets in no time at all.

And then you get to sit back and watch as MPIO does its thang across all your iSCSI HBAs, producing symmetrical & balanced utilization charts which, in turn, release pleasing little bursts of storage-dopamine in your brain.

It works fine with Hyper-V, CSVs, and Converged Fabric vEthernets: What a mouthful, but it’s true. Zero issues fitting this array into System Center Virtual Machine Manager storage (though it doesn’t have SMI-S support a “standard” which few seem to have adopted), failing CSVs from one Hyper-V node to another, and resizing CSVs or RDMs live.

Taking a big bite out of Storage Network's forbidden fruit: LACP + MPIO

Taking a big bite out of Storage Network’s forbidden fruit: LACP + MPIO

And for the convergence fans: I pretty much lost my fear of using vEthernet adapters for iSCSI traffic during the bakeoff and in the Daisetta Lab at home, but in case you needed further convincing that Hyper-V’s converged fabric architecture kicks ass, here it is: Each Hyper-V node in my datacenter has 12 gigabit NICs. Eight of them per host are teamed (that is to say they get the Microsoft Multiplexor driver treatment, LACP-flavor) and then a Converged Virtual switch is built atop the multiplexor driver. From that converged v-switch, I’m dangling six virtual Ethernet adapters per host, two of which, are tagged for the Nimble VLAN I built in the 6509.

That’s a really long and complicated way of saying that in a modest-sized production environment, I’m using LACP teaming on the hosts, up to 4x1GbE vNics on the VIP guests, and MPIO to the storage, which conventional storage networking wisdom says is a bit like kissing your sister and bragging about it. Maybe it’s harmless (even enjoyable?) once or twice, but sooner or later, you’ll live to regret it. And hey the Department of Redundancy Department called, they want one of their protocols back.

I’ve read a lot of thoughtful pieces from VMware engineers & colleagues about doing this, but from a Hyper-V perspective, this is supported, and from a Nimble array perspective, I’m sure they’d point the finger at this if something went wrong, but it hasn’t, and from my perspective : one converged virtual switch = easy to deploy/templatize, easy to manage & monitor. Case closed. 

LACP + MPIO in Hyper-V works so well that in three weeks of recording iSCSI stats, I’ve yet to record a single TCP error/re-transmit or anything that would make me think the old model was better. And I haven’t even applied bandwidth policies on the converged switches yet; that tool is still in my box and right now iSCSI is getting the Hyper-V equivalent of best effort. 

It’s getting faster: Caching is legit. All my monitors and measurements prove it out. Implement your Nimble correctly, and you may see it get faster as time goes on.

And by that I mean don’t tick the “caching” box for every volume. Conserve your resources, develop a strategy and watch it bloom and grow as your iSCSI packets find their way home faster and faster.

The DBA is noticing it too in his latency timers & long running query measurements, but this graph suffices to show caching in action over three weeks in a more exciting way than a select * from slow-ass-tables query:

Up is good. We like up because it means fewer trips to the rotational disks.

Up is good. We like up because it means fewer trips to the rotationals*.

Least Frequently Used, Most Recently Used, Most Frequently Used….who frequently/recently cares what caching algorithm the CASL architecture is using? A thousand whiteboard sessions conducted by the world’s greatest SE with the world’s greatest schwag gifts couldn’t sell me on this the way my own charts and my precious perfmons do.

My cached Nimble volumes are getting faster baby.

Compression wise, I’m seeing some things I didn’t expect. Some volumes are compressing up to 40x. Others are barely hitting 1.2x. The performance impact of this is hard to quantify, but from a conservation standpoint, I’m not having to grow volumes very often. It’s a wash with the old dedupe model, save for one thing: I don’t have to schedule compression. That’s the CPUs job, and for all I know, the Nehalems inside my CS260 are, or should be, redlining as they lz4 my furious iSCSI traffic.

The Bad

Busy Box & CLI: The Nimble command line in version 1.4x felt familiar to me the first time I used it. I recognized the command structure, the help files and more, and thought it looked like Busy Box.

Hey wait a minute. Where's my breakfast confection CLI? This looks like....ANDROID

Hey wait a minute. Where’s my breakfast confection CLI? This looks like….ANDROID

What’s Busy Box? How to put this without making enemies of The Guys Who Say Vi…Busy Box is a collection of packages, tools, servers and scripts for the unix world developed about 25 years ago by an amazing Unix engineer. It’s very popular, it’s everywhere, and it’s reliable and I have no complaints about it other than the fact that it’s disconcerting that my Nimble has the same package of tools I once installed on my Android handset.

But that’s just the Windows guy talking, a Windows guy who was really fond of his WAFL and misses it but will adapt and holds out hope that OneGet & PowerShell, one day, will emerge victorious over all.

The Ugly

The SSL cert situation is embarrassing and I’m glad my former boss hasn’t seen it. Namely that situation is this: you can’t replace the stock SSL cert, which, frankly looks like something I would do while tooling around with OpenSSL in the lab.

Who is Jetty Mortbay and why does he want inside my root CA store?

Who is Jetty Mortbay and why does he want inside my root CA store?

I understand this is fixed in the new 2.x OS version but holy shit what a fail. 

Other than that, I’m very pleased -and the organization is very pleased***- with our Nimble array.

It feels like at last,  I’m enjoying the fruits of my labor, I’m riding a high-performance storage array that was cost-effective, easy to install, and is performing at/above expectations. I’m like Major Kong, my array is literally the bomb, man and his machine are in harmony and there’s some joy & euphoria up in the datacenter as my task is complete.

ride

*Remember this lesson #StorageGlory seekers: no one knows your workload like you. The above screenshot of cache hits is of a 400GB SQL transaction log volume of a larger SQL DB that’s in use 24/6. Your mileage may vary. 

*** I do not speak for the organization even though I just did. 

Fresh ZFS on Hyper-V : Nexenta 4.01 CE or I have my evening planned

nexenta4

There’s been some changes in the Daisetta Lab.

Details soon, but sometimes, the fun just can’t wait on my ability to blog it.

Here’s a hint:

  • Nexenta 4.01 Community Edition LIVE
  • 2x Samsung 256GB SSD in RAID 0
  • 3x HGST 2TB in RAID 0
  • Core i5-4670k
  • 16GB RAM on the VM
  • Hyper-V 3.0 & a Legacy virtual NIC because, sadly, Nexenta doesn’t see the Hyper-V Synthetic NIC

The Sammy SSDs put in this outstanding effort last night as I was breaking down into fits of maniacal laughter:

2ssdraid0

1GB/second writes is impressive, but what has ZFS taught us through the course of Labworks 1?

The ARC is still the king.