In his famous essay The Myth of Sisyphus, French absurdist Albert Camus argued that though life is absurd & meaningless, man can achieve relative happiness by acknowledging the true nature of his existence, revolting against it, and enjoying his freedom.
Camus then discussed some examples of men who achieve happiness despite the absurdity, but the greatest Absurd Man of them all, Camus reckons, is Sisyphus, the Greek mythology figure who was doomed to push a rock up a mountain every day, only to watch it roll back down and have to repeat the same task the next day, on and on for eternity.
The trick to life, Camus famously said, is to “imagine Sisyphus happy.”
Sorry Camus, but that’s a load of bull. It really sucks being Sisyphus and there’s no way he’s happy pushing that boulder up the mountain day after day.
Especially if Sisyphus is a mid-career IT guy on the Infrastructure side of the house. IT guys are supposed to hate repetitive tasks, and if we're pushing boulders up the mountain again and again, it's an automatic #ITFail as far as I'm concerned. Button-pushing monkey work drains the soul & harms the career.
So we automate the boulder push via a script or cron job or scheduled task and then we put some reporting & metrics around boulder performance, and then, just like that, IT Sisyphus can chill out at the bottom of the mountain and feel relatively happy & satisfied.
Yet the risk here for IT Sisyphus is that the care & feeding of the script or cron job becomes the new boulder.
And that’s where I’ve been at these last few difficult weeks at work. The time-saving techniques of yesteryear are the new boulder I’m pushing up the mountain everyday. I see the Absurdity in this, but no one wants to join me in a revolt; the organization is content with the new-boulder-same-as-the-old-boulder strategy.
But I’m not. I’m a Systems Engineer, I’m called to be more than a script-watching Systems Administrator. I’m supposed to hate boulder-pushing, but I aim higher.
I want to defeat gravity.
As long as I’m pushing out five cent IT allegories & metaphors, I might as well mention this one.
Parent Partition has his Sandbox at home:
and soon, hopefully this weekend, the Child Partition will finally have his Sandbox as well:
I have a whole bunch of good blog posts on the back burner, but lately my free time has gone to terraforming the side yard and extracting cartloads of dirt to create a large play area & sandbox for Child Partition.
You might think IT guys hate manual labor like this, but to be honest, getting outside and literally toiling in the soil with spades, hoes, rakes and my own sweat & muscle has been regenerative.
Ora et labora, certain Catholic monastic traditions say: pray and work.
Whatever your muse is, it’s good to occasionally step away from the keyboard and reflect on things while you struggle against the elements, as I have been for the last several weeks on Child Partition’s Sandbox. I’d like to blame the Supervisor Module Spouse and her tendency to move the goalposts, but really, this is my first landscape architecture project and to be frank, I’m not very good at it. I can’t even make the ground level.
But it’s still fun.
Today, Microsoft’s purchase of Nokia closes. Nokia, as we know it, kind of ceases to exist. Or does it?
Paul Thurrott, ace Microsoft blogger/reporter/pundit at WinSuperSite.com, is worried. What makes Nokia special & interesting, he reckoned on Windows Weekly, is the fact that it’s Finnish, it’s old, and, as Leo Laporte pointed out, Nokia owns its own supply chain & manufacturing force. From rare-earth mineral extraction that’s safe & socially responsible to device design & construction, Nokia is a classic, vertically-integrated device manufacturer. Everyone else uses Foxconn; Nokia is the exception.
And now they’re owned by Nadella & Microsoft.
As Microsoft ingests Nokia, what’s going to happen to our beloved Finnish phone maker?
I’m a longtime Nokia fan…most guys my age were exposed to smartphones in the late-90s/early-2000s era. Some went for Blackberry and its legendary keyboard. Others went Palm & Treo with either PalmOS or Windows Mobile. I was always in the Nokia/Symbian S60 camp until Android arrived on the scene.
And I miss it. I miss my old Nokia E51, its fast, secure, and unique S60 operating system, and yeah, sometimes I miss the keyboard. And it was made in Finland! By Finns.
So I hope Thurrott’s worries are misplaced, but I fear he might be right. Bland Pacific Northwest design sense & standardized Asian-outsourced product management are going to supplant the unique Finnish ethos that made Nokia, Nokia.
Lastly: today marks one week since I ditched Google & the Nexus 5 and went full Redmond with an Office 365 Enterprise E1 subscription + a Nokia Lumia Icon running Windows Phone 8.1.
And I’m hooked. The ideal is closer today than it was a year ago, or even a month ago: agnostic computing. I loved Google for so long because I could get what I wanted on whatever device I had on me at the time; today I can do the same with Windows and it’s not so clunky.
Also, the Icon’s camera & optics are incredible. The Nokia camera software & effects produce some really stunning shots.
I finally have a phone with a camera that is superior to my wife’s iPhone 5. It’s great.
More on the transition to O365 next week; have a good weekend!
Introducing Fail File #1, where I admit to screwing something up and reflect on what I’ve learned
SAN2.daisettalabs.net, the NAS4Free server I built to simulate some of the functions I perform at work with big boy SANs, crashed last night.
Or, to put it another way, I pushed that little AMD-powered, FreeBSD-running, Broadcom-connected, ZFS-flavored franken-array to the breaking point:
Such are the perils of concentrated block storage, amiright? Instantly my Hyper-V Cluster Shared Volumes + the 8 or 9 VMs inside them dropped:
So what happened here?
I failed to grok the grub or fsck the fdisk or something and gave BSD an inadequate amount of swap space on the root 10GB partition slice. Then I lobbed some iSCSI packets its way from multiple sources and the kernel, starved for resources (because I’m using about 95% of my RAM for the ARC), decided to kill istgt, the iSCSI target service.
Thinking back to the winter, when I ran Nexenta (which derives from Sun’s Solaris, not BSD), the failure sequence was different, but I’m not sure it was better.
When I was pounding the Nexenta SAN2 back in the winter, volleying 175,000+ iSCSI packets per second at hardware that was even more ghetto, Nexenta did what any good human engineer does: compensate for the operator’s errors & abuses.
It was kind of neat to see. Whether I was running SQLIO simulations, an iometer run, robocopy or eseutil, or just turning on a bunch of VMs simultaneously, Nexenta services would start to drop one by one as resources were exhausted.
First the GUI (NMV, it’s called). Then SSH. And finally, sometimes the console itself (NMC) would lock up.
But never iSCSI, the disk subsystem, the ARC or L2ARC…those pieces never dropped.
Now to be fair, the GUI, SSH & console services never really turned back on either…you might end up with a durable storage system you couldn’t interact with at all until a hard reset, but at least the LUNs stayed online.
This BSD box, in contrast, kills the most important service I’m running on it, but has the courtesy to admit to it and doesn’t make me get up out of my seat: the GUI, SSH & all other processes keep running fine, so I instantly identified the problem and can engineer against it.
One model is resilient, bending but not breaking; the other is durable up to a point, and then it just snaps.
Which model is better for a given application?
Fail File Lesson #1: It’s just as important to understand how things fail as it is to understand why they fail, so that you can properly engineer against it. I never thought inadequate swap space would result in a homicidal kernel gunning for the most important service on the box…now I know.
Last week Amazon announced a new device called the Amazon Fire TV or whatnot.
Basically, it’s a little black box. With some CPU, RAM & GFX processor. And an HDMI output, remote, and a slick GUI. You plug it into your TV and with your Internet connection, you can stream things to it. It’s received mediocre reviews, and by now we can all understand why. It’s just a warmed-over Roku with some forked Android bits on it & some gaming capability. At $99 it isn’t cheap either. It’s essentially a vector for Prime into your living room, as if you couldn’t get Prime on anything else.
So I’m watching a review of this device last weekend on screen 1 of agnostic_node_1.daisettalabs.net, and on screen 2, I’m writing up my Labworks post about Hyper-V converged switching. As you would expect, I experienced an acute Cognitive Dissonance Kernel Panic: my brain was simultaneously contemplating irrational TV systems, switching inputs from HDMI 1 to HDMI 2 manually, whilst writing about Hyper-V’s fabulously converged virtual switches in which all inputs are trunked and everything just works.
And that kernel panic reminded me of an Amazon review I wrote about a remote control, Windows Media Center, and my last attempt to rationalize the TV beast in my living room. The piece is full of #TechnologyDespair and is titled “Thanks Saxony, for Ruining My Life.”
Pertino is essentially an NVGRE, IPv4 & IPv6 overlay technology that works with your existing on-prem equipment and easily & securely extends your Layer 2/Layer 3 LAN assets to the place where your users are: the internet.
Well I own my mistakes here on AC so let me correct that characterization.
Pertino, as the firm’s marketing VP Todd Krautkremer pointed out to me, uses:
an NVGRE-like encapsulation method that we combine with network virtualization into a solution we call Overpass. The proprietary extensions allow us to support advanced features in the future, like using different encapsulation methods for different traffic types (e.g. VoIP, streaming, etc.)
The post has been updated. My regrets on the error; I see or hear NVGRE and my Microsoft whiskers stand on end, and it’s off to the keyboard races I go.
Neat to see that Pertino is thinking about encapsulation of different traffic types in the future too. I’ve reached out for more comment on that.
Also, Pertino still offers a free personal plan for other IT guys to test with:
We still do offer the FREE, 3-device Personal Plan. We believe that many IT pros are experiential users who just want to get their hands on a product in order to understand its capabilities.
So now you really have no excuse not to try it (disclosure: they let me use up to 9 devices for testing, but I have no one to play with. Join me on my Pertino Network, and let’s link our Home Labs up in a sort of grand Home Lab Apollo-Soyuz Moment).
Hello Labworks fans, detractors and partisans alike, hope you had a nice Easter / Resurrection / Agnostic Spring Celebration weekend.
Last time on Labworks 2:1-4, we looked at some of the awesome teaming options Microsoft gave us with Server 2012 via its multiplexor driver. We also made the required configuration adjustments on our switch for jumbo frames & VLAN trunking, then we built ourselves some port channel interfaces flavored with LACP.
I think the multiplexor driver/protocol is one of the great (unsung?) enhancements of Server 2012/R2 because it’s a sort of pre-virtualization abstraction layer (That is to say, your NICs are abstracted & standardized via this driver before we build our important virtual switches) and because it’s a value & performance multiplier you can use on just about any modern NIC, from the humble RealTek to the Mighty Intel Server 10GbE.
But I’m getting too excited here; let’s get back to the curriculum and get started shall we?
5. Understand what Microsoft’s multiplexor driver/LBFO has done to our NICs
6. Build our Virtual Machine Switch for maximum flexibility & performance
7. The vEthernets are Coming
8. Next Steps: Jumbo frames from End-to-end and performance tuning
2:5 Understand what Microsoft’s Multiplexor driver/LBFO has done to our NICs
So as I said above, the best way to think about the multiplexor driver & Microsoft’s Load Balancing/Failover tech is by viewing it as a pre-virtualization abstraction layer for your NICs. Let’s take a look.
Our Network Connections screen doesn’t look much different yet, save for one new decked-out icon labeled “Daisetta-Team:”
Meanwhile, this screen is still showing the four NICs we joined into a team in Labworks 2:3, so what gives?
A click on the properties of any of those NICs (save for the RealTek) reveals what’s happened:
The LBFO process unbinds many (though not all) settings, configurations, protocols and certain driver elements from your physical NICs, then binds the fabulous Multiplexor driver/protocol to the NIC as you see in the screenshot above.
In the dark days of 2008 R2 & Windows core, when we had to walk uphill to school both ways in the snow, I had to download and run a cmd tool called nvspbind to get this kind of information.
Fortunately for us in 2012 & R2, we have some simple cmdlets:
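If you want to see exactly what got unbound, Get-NetAdapterBinding tells the story. A minimal sketch; the adapter name is from my lab, so substitute your own:

# On an LBFO member NIC, only ms_implat (the Multiplexor protocol) should report Enabled : True
Get-NetAdapterBinding -Name "Ethernet 4" | Sort-Object Enabled -Descending | Format-Table ComponentID, DisplayName, Enabled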
So notice Microsoft has essentially stripped “Ethernet 4” of all that would have made it special & unique amongst my 4x1GbE NICs; where I might have thought to tag a VLAN onto that Intel GbE, the multiplexor has stripped that option out. And even if I had statically assigned an IP address to this interface, TCP/IP v4 & v6 are no longer bound to the NIC itself, so it’s incapable of holding an IP address.
And the awesome thing is you can do this across NICs, even NICs made by separate vendors. I could, for example, mix the sacred NICs (Intel) with the profane NICs (RealTek)…it don’t matter, all NICs are invited to the LBFO party.
No extra licensing costs here either; if you own a Server 2012 or 2012 R2 license, you get this for free, which is all kinds of kick ass, as this bit of tech has allowed me in many situations to delay hardware spend. Why go for 10GbE NICs & switches when I can combine some old Broadcom NICs, leverage LACP on the switch, and build 6x1GbE or 8x1GbE converged LACP teams?
LBFO even adds up all the NICs you’ve given it and teases you with a calculated LinkSpeed figure, which we’re going to hold it to in the next step:
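You can pull that same figure via PowerShell, too; a quick sketch using my lab’s team name:

# The team NIC reports the aggregate LinkSpeed: 4 x 1GbE members = 4 Gbps
Get-NetAdapter -Name "Daisetta-Team" | Format-Table Name, InterfaceDescription, LinkSpeed
# And the team itself, with members & modes
Get-NetLbfoTeam | Format-Table Name, Members, TeamingMode, LoadBalancingAlgorithm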
2:6 Build our Virtual Machine Switch for maximum flexibility & performance
If we just had the multiplexor protocol & LBFO available to us, it’d be great for physical server performance & durability. But if you’re deploying Hyper-V, you get to have your LBFO cake and eat it too, by putting a virtual switch atop the team.
This is all very easy to do in Hyper-V manager. Simply right click your server, select Virtual Switch Manager, make sure the Multiplexor driver is selected as the NIC, and press OK.
Bob’s your Uncle:
But let’s go a bit deeper and do this via powershell, where we get some extra options & control:
New-VMSwitch : the cmdlet we’re invoking to build the switch. Run Get-Help New-VMSwitch for a rundown of the cmdlet’s structure & options
-NetAdapterInterfaceDescription : here we’re telling Windows which NIC to build the VM Switch on top of. Get the precise name from Get-NetAdapter and enclose it in quotes
-AllowManagementOS 1 : Recall the diagram above. This boolean switch (1 yes, 0 no) tells Windows to create the VM Switch & plug the Host/Management Operating System into said Switch. You may or may not want this; in the lab I say yes; at work I’ve used no.
-MinimumBandwidthMode Weight : We lay out the rules for how the switch will apportion some of the 4Gb/s bandwidth available to it. By using “Weight,” we’re telling the switch we’ll assign some values later
-Name : Name your switch
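Putting those pieces together, here’s a minimal sketch of the finished cmdlet (the switch name is my own invention, and your interface description may differ, so check Get-NetAdapter first):

# Build the converged virtual switch atop the LBFO team's multiplexor interface
New-VMSwitch -Name "Daisetta-Converged" -NetAdapterInterfaceDescription "Microsoft Network Adapter Multiplexor Driver" -AllowManagementOS 1 -MinimumBandwidthMode Weight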
A few seconds later, and congrats Mr. Hyper-V admin, you have built a converged virtual switch!
2:7 The vEthernets are Coming
Now that we’ve built our converged virtual switch, we need to plug some things into it. And that starts on the physical host.
If you’re building a Hyper-V cluster or a stand-alone Hyper-V host with VMs on networked storage, you’ll approach vEthernet adapters differently than if you’re building Hyper-V for VMs on attached/internal storage or on SMB 3.0 share storage. In the former, you’re going to need storage vEthernet adapters; in the latter, you won’t need as many vEthernets unless you’re going multi-channel SMB 3.0, which we’ll cover in another Labworks session.
I’m going to show you the iSCSI + Failover Clustering model.
In traditional Microsoft Failover Clustering for Virtual Machines, we need a minimum of five discrete networks. Here’s how that shakes out in the Daisetta Lab:
Network Name, VLAN ID, Purpose, Notes
Management, 1, Host & VM management network, You can separate the two if you like
CSV, 14, Host cluster communication & coordination, Important for clustering Hyper-V hosts
LM, 15, Live Migration network, When you must send VMs from the broke host to the host with the most, LM is there for you
iSCSI 1-3, 11-13, Storage, Somewhat controversial but supported
Now you should be connecting the dots: remember in Labworks 2:1, we built a trunked port-channel on our Cisco 2960S for the sole purpose of these vEthernet adapters & our converged switch.
So, we’re going to attach tagged vEthernet adapters to our host via PowerShell. Pay attention here to the “-ManagementOS” switch; though our converged switch is for virtual machines, we’re using it for our physical host as well.
You can script this out of course (and VMM does that for you), but if you just want to copy & paste, do it in this order:
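Here’s the pattern for one vEthernet, CSV, as a sketch; the VLAN ID comes from the table above, while the switch name & IP are placeholders for my lab’s scheme. Repeat for Management, LM and the iSCSI trio:

# 1. Create a host-facing vEthernet on the converged switch
Add-VMNetworkAdapter -ManagementOS -Name "CSV" -SwitchName "Daisetta-Converged"
# 2. Tag it with its VLAN
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "CSV" -Access -VlanId 14
# 3. Address it (no gateway on cluster-only networks)
New-NetIPAddress -InterfaceAlias "vEthernet (CSV)" -IPAddress 172.16.14.11 -PrefixLength 24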
Notice we didn’t include a gateway in the New-NetIPAddress cmdlet; that’s because when we built our virtual switch with the “-AllowManagementOS 1” switch attached, Windows automatically provisioned a vEthernet adapter for us, which either got an IP via DHCP or took an APIPA address.
So now we have our vEthernets and their appropriate VLAN tags:
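You can verify the whole arrangement in one shot:

# Lists every host vEthernet & its VLAN tag
Get-VMNetworkAdapterVlan -ManagementOS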
2:8 Next Steps: Jumbo Frames from End-to-End & Performance Tuning
So if you’ve made it this far, congrats. If you do nothing else, you now have a converged Hyper-V virtual switch, tagged vEthernets on your host, and a virtualized infrastructure that’s ready for VMs.
But there’s more you can do; stay tuned for the next labworks post where we’ll get into jumbo frames & performance tuning this baby so she can run with all the bandwidth we’ve given her.
Links/Knowledge/Required Reading Used in this Post:
Aidan Finn, upstanding Irishman, apparent bear-cub puncher, hobbyist photog, MVP all-star and one of my favorite Hyper-V bloggers (seriously, he’s good, and along with DidierV & the Hyper-Dutchman has probably saved my vAss more times than I can vCount) appeared on one of my favorite podcasts last week, RunAs Radio with Canuck Richard Campbell.
Which is all sorts of awesome as these are a few of my favorite things piled on top of each other (Finn on RunAs).
The subject? Hyper-V, scale out file servers (SoFS) in 2012 R2, SMB 3.0 multichannel and Microsoft storage networking, which are just about my favoritest subjects in the whole wide world. I mean what are the odds that one of my favorite Hyper-V bloggers would appear on one of my favorite tech podcasts? Remote. And talk about storage networking tech, Redmond-style, during that podcast?
All that and an adorable Irish brogue?
This is Instant nerdgasm territory here people; if you’re into these black arts as I am, it’s a must-listen.
Anyway, Finn reminded me of his famous powershell demos in which he demonstrates all the options we Hyper-V admins have at our disposal now when it comes to Live Migrating VMs from host to host.
And believe me, we have so many now it’s almost embarrassing, especially if you cut your teeth on Hyper-V 2.0 in 2008 R2, where successfully Live Migrating VMs off a host (or draining one during production) involved a few right clicks, chicken sacrifice, Earth-Jupiter-Moon alignment, a reliable Geiger counter by your side and a tolerance for Pucker Factor Values greater than 10* **.
Nowadays, we can:
Live Migrate VMs between hosts in a cluster (.vhdx parked in a Cluster Shared Volume, VM config, RAM & CPU on a host….block storage, the Coke Classic option)
Live Migrate VMs parked on SMB 3.0 shares, just like you NFS jockeys do
Shared-nothing Live Migration, either storage + VM, just storage, or just VM!
A for instance: from my Dell Latitude i7 ultrabook with Windows 8.1 and client hyper-v installed (natch), I can storage Live Migrate a .vhdx off my skinny but fast 256GB SSD to a spacious SMB share at work, then drop it back on my laptop at the end of the day, all via Scheduled Task or powershell with no downtime for the VM
With Server 2012/2012 R2 you get all those options + SMB 3.0 multichannel
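That laptop shuffle, by the way, is a one-liner; a sketch with stand-in names & paths:

# Shared-nothing storage Live Migration: the VM keeps running while its .vhdx travels
Move-VMStorage -VMName "TestVM" -DestinationStoragePath "\\smb1\VMs\TestVM"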
Not only that, but we have some cool new toys with which to make the cost of Live Migration a VM to the host with the most a little less painful:
Standard TCP/IP : I like this because I’m old school and anything that stresses the network and LACP is fun because it makes the network guy sweat
Compression: Borrow spare cycles from the host CPU, compress the VM’s RAM, and Live Migrate your way out of a tight spot
SMB via Remote Direct Memory Access : the holy of holies in Live Migration. As Finn points out, this bit of tech can scale beyond the bandwidth capabilities of the PCIe 3 bus. SMB 3.0 + RDMA makes you hate your Northbridge
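Picking between those three is a single host-level setting on 2012 R2; a minimal sketch:

# Options are TCPIP, Compression (the R2 default), or SMB for the RDMA-blessed
Set-VMHost -VirtualMachineMigrationPerformanceOption Compression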
Finn*** of course provided some Live Migration start-to-finish times resulting from the various methods above, which I then, of course, interpreted as Finn daring me personally over the radio to try and beat those times in my humble Daisetta Lab.
Now this is just for fun people; not a Labworks-style list of repeatable results, so let’s not nerd-out on how my testing methodology isn’t sound & I’m a stupidhead, ok?
Anyway, Sysinternals has a nice little tool to redline the RAM in your Windows VM. I don’t know how Finn does it, but I don’t have workloads (yet!) in the Lab that would fill 4GB of RAM with non-random data on a VM, so off to the cmd we go:
You type this (haven’t played with all the switches yet) in this navy blue screen:
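Something along these lines; Testlimit’s -d switch leaks & touches memory in MB-sized chunks, and the exact figures here are a guess at what I typed:

# Four 1024MB allocations = 4GB of RAM filled with touched, non-idle pages
.\Testlimit64.exe -d 1024 -c 4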
And then this happens and the somewhat pink graph goes full pink:
Then we press this button to test Live Migration w/ compression, as the Daisetta Lab doesn’t have fancy RDMA NICs like certain well-connected Irish Hyper-V bloggers:
Which makes this blue/celeste/denim/Azure-colored line get all spikey:
all of which results in wicked-fast Live Migrations & really cool orange-colored charts in my totally non-random, non-scientific but highly enjoyable laboratory experiment
Still, in the end, I like my TCP/IP uncompressed Live Migrations because 1) sackcloth & ashes, and 2) I didn’t go to the trouble of building a multiplexed LACP team (with a virtual switch on top!) just to let the Cat5es in my attic have an easy day at the office:
But at work: yes. I love this compression stuff and echo Finn’s observations on how Hyper-V doesn’t slam your host CPUs beyond what the host & its VM fleet could bear.
Anyway, did I beat Finn’s Live Migration times in this fun little test? Will the Irish MVP have to admit he’s not so esteemed after all and surrender his Hyper_V_MVP_badge.gif to me?
Of course I did and yes he will.
But not really.
[table caption=”Daisetta Lab LM vs Finn’s Powershell LM Scripts – 4GB VM” width=”500″ colwidth=”20|100|50″ colalign=”left|left|left|left|left”]
Who,TCP/IP LM,Compressed LM,RDMA & SMB 3 LM,Notes
Finn,78 seconds, 15 seconds,6.8 seconds, “Mr. I once moved a VM with 56GB of RAM in 35 seconds probably has a few Xeons”
D-Lab,38 seconds,Like 12 or something,Whose ass do I need to kiss to get RDMA/iWARP?,But seriously my VM RAM was probably not random
Finn notes in his posts that he’s dedicating an entire 1GbE NIC to his Live Migration demos, whereas I’m embracing the converged switch model and haven’t even played with bandwidth or QoS settings on my Hyper-V switch.
How do my VMware colleagues & friends measure this stuff & think about vMotion performance & reliability? I know NFS can scale & perform, but I’m ignorant of the nuances of v3 vs v4, how it works on the host and the Distributed vSwitch, and your “shared-nothing” storage vMotion. And what’s this I hear that vSphere won’t begin a vMotion without knowing it will complete? How’s that determined?
I mean I could spend an hour or two googling it, or you could, I don’t know, post a comment and save me the time and spread some of your knowledge 😀
I’m jazzed about SMB 3.0, but there are only a handful of storage vendors who support the new stack, and among them, as Finn points out, Microsoft is the #1 storage vendor for SMB 3 fans, with NetApp probably in 2nd place.
* Just kidding, it wasn’t that bad. Most days.
** Pucker Factor Value can be measured by querying obscure wmi class win32_pfv
*** Finn is a consultant. So you can hire him. I have no relationship with him other than admiration for his scripting skillz
Greetings Labworks fans, today we’re going to learn how to build converged Hyper-V switches, switches so cool they’re nearly identical to the ones available to enterprise users with their fancy System Center licenses.
If you’re coming from a VMware mindset, a Hyper-V converged switch is probably most similar to Distributed vSwitches, though admittedly I’m a total n00b on VMware, so take that statement with a grain of salt. The idea here is to build an advanced switching fabric on your Hyper-V hosts that is fault-tolerant & performance-oriented, and like a Distributed vSwitch, common among your physical hosts and your guests.
This is one of my favorite topics because I have a serious & problematic love affair with LACP and a Tourette’s-like urge to team things up & jumbo them, but you don’t need an LACP-capable switch or jumbo frames to enjoy Converged Switching goodness.
Let’s dive in, shall we?
Prepare the physical switch for Jumbo Frames
Understand LBFO: Microsoft’s Load Balancing/Fail Over teaming technology introduced in Server 2012
Enable LACP on the Switch and on the Server
Build the Switch on the Team & Next Steps
Required Tools ‘n Tech:
Server 2012 or 2012 R2…sorry Windows 8.1 Professional/Enterprise fans…LBFO is not available for 8.1. I know, I feel your pain. But the naked Hyper-V 3.0 Hypervisor (Core only) is free, so what are you waiting for?
A switch, preferably gigabit. LACP not required but a huge performance multiplier
NICs: As in plural. You need at least two. Yes, you can use your Keepin’ it RealTek NICs…Hyper-V doesn’t care that your NICs aren’t server-grade, but I advise against consumer NICs for production!!
State of the Lab as of today. Ag_node_1 is new, with a core i7 Haswell (Yay!), ag_node_2 is the same, still running CSVs off my ZFS box, and check it out, bottom right: a new host, SMB1:
2:1 Prepare the Physical Switch for Jumbo Frames
You can skip this section if all you have at your disposal is a dumb switch.
Commands below are off of a Cisco 2960S. Commands are similar on the new SG300 & 500 series Cisco switches. PowerConnect 5548 switches from Dell aren’t terribly different either, though I seem to recall you have to enable jumbo MTU on each port as well as the switch.
First we’re going to want to turn on jumbo frames system-wide, which usually requires a reload of your switch, so schedule a maintenance window!
daisettalabs.net(config)#system mtu jumbo 9198
You can run a show system mtu after the reload to be sure the switch is ready for the corpulent frames you will soon send its way:
daisettalabs.net#show system mtu
System MTU size is 1514 bytes
System Jumbo MTU size is 9198 bytes
System Alternate MTU size is 1514 bytes
Routing MTU size is 1514 bytes
2:2 Load Balancing & Failover
Load Balancing & Failover, or LBFO as it’s known, was the #1 feature I was looking forward to in Server 2012.
And boy did Microsoft deliver.
LBFO is a driver/framework that takes whatever NICs you have, “teams” them, applies a mature & resilient multiplexor driver to them, and gives you redundancy & performance in just a few clicks or powershell cmdlets. Let’s do GUI for the team, and later on, we’ll use Powershell to build a switch on that team.
Sidenote: Don’t bother applying IP addresses or VLANs to your LBFO-destined physical NICs at this point. Do bother installing your manufacturer’s latest driver, or hacking one on as I’ve had to do with my new ag_node_1 Intel NIC. (SideSideNote: as this blogger states, Intel can eat a bag of d**** for dropping so many NICs from Server 2012 support. Broadcom, for all the hassles I’ve had with them, still updates drivers on four-year-old cards!)
On SMB1 from the above schematic, I’ve got five gigabit NICs. One is a RealTek on the motherboard, and the other four are Intel, ports 1-4 on a PCIe quad-gigabit network card (an i350 x4, I believe).
The RealTek NIC has a static IP and is my management interface for the purposes of this labworks. We’ll only be teaming the four Intel NICs here. Be sure to leave at least one of your NICs out of the LBFO team unless you are sitting in front of your server console; you can always add it in later.
Launch Server Manager in the GUI and click on “All Servers,” then right click on SMB1 and select Configure NIC Teaming:
A new window will emerge, titled NIC Teaming.
In the NIC Teaming window, notice on the right the five GbE adapters you have and their status (Green Arrow). Click on “Tasks” and select “New Team” (Red Arrow):
2:3 Enable LACP on the Switch and on the Server

The New Team window is where all the magic happens. Let’s pause for a moment and go to our switch.
On my old 2960S, we’re building LACP-flavored port channels by using the “channel-group _ mode active” command, which tells the switch to use the genuine-article LACP/802.3ad protocol rather than the older Cisco-proprietary Port Aggregation Protocol (PAgP) system, which is activated by running “channel-group _ mode auto.”
However, if you have a newer switch, perhaps a nice little SG300 or something similar, PAgP is dead and not available to you, but the process for LACP looks like the old PAgP command: “channel-group _ mode auto” will turn on LACP.
Here’s the 2960s process. Note that my Intel NICs are plugged into Gig 1/0/20-23, with spanning-tree portfast enabled (which we’ll change once our Converged virtual switch is built):
daisettalabs.net#show run int gig 1/0/20
Current configuration : 63 bytes
Enter configuration commands, one per line. End with CNTL/Z.
daisettalabs.net(config)#int range gig 1/0/20-23
daisettalabs.net(config-if-range)#description SMB1 TEAM
daisettalabs.net(config-if-range)#channel-group 3 mode active
daisettalabs.net(config-if-range)#switchport mode trunk
Presto! That wasn’t so hard was it?
Note that I’ve trunked all four interfaces; that’s important in Hyper-V Converged switching. We’ll need to trunk po3 as well.
Let’s take a look at our new port channel:
daisettalabs.net(config-if-range)#do show run int po3
Current configuration : 54 bytes
switchport mode trunk
Now let’s check the state of the port channel:
daisettalabs.net#show etherchannel summary
Flags: D - down P - bundled in port-channel
I - stand-alone s - suspended
H - Hot-standby (LACP only)
R - Layer3 S - Layer2
U - in use f - failed to allocate aggregator
M - not in use, minimum links not met
u - unsuitable for bundling
w - waiting to be aggregated
d - default port

Number of channel-groups in use: 3
Number of aggregators: 3

Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
1 Po1(SU) LACP Gi1/0/1(P) Gi1/0/2(P) Gi1/0/3(P)
2 Po2(SU) LACP Gi1/0/11(D) Gi1/0/13(P) Gi1/0/14(P) Gi1/0/15(P) Gi1/0/16(P)
3 Po3(SD) LACP Gi1/0/19(s) Gi1/0/20(D) Gi1/0/21(s) Gi1/0/22(s) Gi1/0/23(D)
po3 is in total disarray, but not for long. Back on SMB1, it’s time to team those NICs:
I’m a fan of naming conventions even if this screenshot doesn’t show it; all teams on all hosts get the same “Daisetta-Team” name, and I usually rename NICs as well. But honestly, you could go mad trying to understand why Windows names NICs the way it does (seriously, it’s a Thing). There’s no /dev/eth0 for us in Microsoft-land; it’s always something obscure, strange and out-of-sequence, which is part of the reason why Converged Switching & LBFO kick ass: who cares what your interfaces are named so long as they are identically configured?
If you don’t have an LACP-capable switch, you’ll select “Switch Independent” here.
As for load balancing modes: in Server 2012, you get Address Hash (hashing on source/destination MACs, IPs, or TCP/UDP ports) or Hyper-V Port, which is sort of a round-robin approach (VM1 goes to one port in the team, VM2 to the other).
I prefer the new (with 2012 R2) Dynamic mode, which blends the two: outbound flows are hashed & rebalanced on the fly, while inbound traffic is distributed Hyper-V Port-style. More color on those choices & what they mean for you in the References section at the bottom.
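If you’d rather skip the clicking entirely, the whole team is also one cmdlet; a sketch using my lab’s team name & NICs (your adapter names will differ):

# Build the LACP team with 2012 R2's Dynamic load balancing in one shot
New-NetLbfoTeam -Name "Daisetta-Team" -TeamMembers "Ethernet 1","Ethernet 2","Ethernet 3","Ethernet 4" -TeamingMode Lacp -LoadBalancingAlgorithm Dynamic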
Press ok, sit back, and watch my gifcam shot:
Mmmm, taste the convergence.
2:4 Build a Switch on top of that team & Next Steps
If you’ve ever built a switch for Hyper-V, you’ll find building the converged switch immediately familiar, save for one technicality: you’re going to build a switch on top of that multiplexor driver you just created!
Sounds scary? Perhaps. I’ll go into some of the intricacies and gotchas and show some cool powershell bits ‘n bobs on the next episode of Labworks.
Eventually we’re going to dangle all sorts of things off this virtual switch-atop-a-multiplexor-driver!
Links/Knowledge/Required Reading Used in this Post:
A few Microsoft bloggers (some prominent, some less so, none that I know of are employed by MS) are doing a bit of crowing today…OpenSSL, VMware, AWS…all #Heartbleed-vulnerable while Azure & Windows & Hyper-V are secure! <Nelson>Ha Ha!</Nelson>
I’m new to IT blogging, but one thing I’ve noticed is that it’s dominated by consultants who are selling something other than just software: their skills & knowledge. That goes for Hyper-V bloggers or VMware bloggers, SQL bloggers or Oracle bloggers. And that’s just fine: we all have to find a way to put food on the table, and let’s face facts: blogging IT doesn’t exactly bring in the pageviews, does it? However, making sport out of the other products’ flaws can bring in the hits, and it’s fun.
Me? I’m what you call a “customer” who has always supported Microsoft products, had a love/hate/love relationship with them, a curiosity about the other camps, and a desire to just make it all work together, on time & on budget in service to my employer and my users.
So I blog from that perspective.
And so while it’s tempting to join some of my Win32 colleagues (after all, the BSOD & DLL-hell jokes are getting old 20 years on) as they take joy in other engineers’ suffering, I say no!
I remind the reader of that great engineer of words, John Donne, who wrote:
No man is an island,
Entire of itself,
Every man is a piece of the continent,
A part of the main.
If a clod be washed away by the sea,
Europe is the less.
As well as if a promontory were.
As well as if a manor of thy friend’s
Or of thine own were:
Any man’s death diminishes me,
Because I am involved in mankind,
And therefore never send to know for whom the bell tolls;
It tolls for thee.
This poem gets me every time; Donne knows his stuff.
No :443 is an island entire of itself, especially in the internet age. And every network is a part of the great /0.
If one datacenter falls, our infrastructure is the less.
Any engineer’s pain diminishes me, because I have been in his shoes*, RDPd or SSHd into the device at 3am, worried about my data and my job, just as he or she is right now.
So to my friends & colleagues in the open source world trying to stem the blood loss, I ask: do you need a hand?
I’m working from home today and would be happy to help, and I know my way around PuTTY.
*Chinese hackers, the NSA, and other malefactors are of course exempted here
I’ve been going on, insufferably at times, about my new Nimble storage array at work. Back in January, it passed my home-grown bakeoff with flying colors, in February I wrote about how it was inbound to my datacenter, in March I fretted over iSCSI traffic, .vhdx parades, and my 6509-E.
Well, it’s been just about a month since it was racked up and jacked into my Hyper-V fabric, and I thought maybe the storage nerds among my readers would like an update on how it’s performing.
Fast: It’s been strange getting compliments, kudos and thank yous rather than complaints and ALL CAPS emails punctuated by Exclamation Marks. I have a couple of very critical SQL databases, the performance of which can make or break my job, and after some deliberation, we took the risk and moved the biggest of them to the Nimble about three weeks ago.
Here’s a slightly edited email from one power user 72 hours later:
Did I say THANK YOU for the extra zip yet?
STILL LOVING IT!!
I’m taken aback by all the affection coming my way…no longer under user-siege, I feel like maybe I should dress better at work, shave every day, turn on some lights in the office perhaps. Even the dev team was shocked, with one of them invoking Spaceballs and saying his storage-dependent process was moving at “Ludicrous speed.”
It’s Easy: I can’t underscore this enough. If you’re a mid-sized enterprise with vanilla/commodity workloads and you can tolerate an array that’s just iSCSI (you can still use NFS or SMB 3, just from inside clustered VMs!), Nimble’s a good fit, especially if your staff is more generalist in nature, or you don’t have time to engineer a new SAN from scratch.
This was a Do It Yourself storage project for me; I didn’t have the luxury or time to hire storage engineers or VARs to come in and engineer it for me. Nimble will try to sell you on professional services, but you can decline and hook it up yourself, as I did. There are best practice guides a-plenty, and if you understand your stack & workload, your switching & compute, you’ll do fine.
Buying it was easy: Nimble’s lineup is simple and from a customer standpoint, it was a radically different experience to buy a Nimble than a traditional SAN.
Purchasing a big SAN is like trying to decide what to eat at a French restaurant in Chinatown…you recognize the letters, & the pictures on the menu look familiar, but you don’t know what that SKU is exactly or how you’ll feel in the morning after buying & eating it. And while the restaurant has provided a helpful & knowledgeable garçon to explain French cuisine & etiquette to you, you know the garçon & his assistants moonlight at the Italian, German and sushi places down the road, where they are equally knowledgeable & enthusiastic about those cuisines. But they can’t talk about the Italian place because they have something called agency with the French restaurant; so with you, they are only French cuisine experts, and their professional opinion is that Italian, German and sushi are horrible food choices. Also, your spend with the restaurant is too small to get the chef’s attention…you have to go through this obnoxious garçon system.
Buying from Nimble, meanwhile, is like picking a burger at In ‘n Out. You have three options, all of them containing meat, and from left to right, the choices are simply Good, Better, Best. You can stack shelves onto controller-shelves, just like a Double-Double, and you know what you’ll get in the end. Oh sure, there’s probably an Animal Style option somewhere, but you don’t need Animal Style to enjoy In ‘n Out, do you?
Lesson is this: Maybe your organization needs a real full-featured SAN & VAR-expertise. But maybe you just need fast, reliable iSCSI that you can hook up yourself.
It’s nice that we in customer-land have that option now.
ASUP & Community: The Autosupport from Nimble has left nothing to be desired, in fact, I think they nag too much. But I’ll take that over a downed array.
I’ve grown to enjoy Connect.Nimble.com, the company’s forum where guys like me can compare notes. Shout out to one awesome Nimble SE named Adam Herbert who built a perfect signed MPIO Powershell script that maps your initiators to your targets in no time at all.
And then you get to sit back and watch as MPIO does its thang across all your iSCSI HBAs, producing symmetrical & balanced utilization charts which, in turn, release pleasing little bursts of storage-dopamine in your brain.
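I won’t reproduce Adam’s script here, but the bones of mapping initiators to targets look something like this sketch (addresses are placeholders; repeat per initiator/target pair):

# One persistent, MPIO-enabled session per iSCSI vEthernet
New-IscsiTargetPortal -TargetPortalAddress 10.0.11.50 -InitiatorPortalAddress 10.0.11.21
Get-IscsiTarget | Connect-IscsiTarget -IsMultipathEnabled $true -IsPersistent $true -TargetPortalAddress 10.0.11.50 -InitiatorPortalAddress 10.0.11.21
# Then let MPIO round-robin across every path it finds
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR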
It works fine with Hyper-V, CSVs, and Converged Fabric vEthernets: What a mouthful, but it’s true. Zero issues fitting this array into System Center Virtual Machine Manager storage (though it doesn’t have SMI-S support, a “standard” which few seem to have adopted), failing CSVs from one Hyper-V node to another, and resizing CSVs or RDMs live.
And for the convergence fans: I pretty much lost my fear of using vEthernet adapters for iSCSI traffic during the bakeoff and in the Daisetta Lab at home, but in case you needed further convincing that Hyper-V’s converged fabric architecture kicks ass, here it is: each Hyper-V node in my datacenter has 12 gigabit NICs. Eight of them per host are teamed (that is to say, they get the Microsoft Multiplexor driver treatment, LACP-flavor), and then a converged virtual switch is built atop the multiplexor driver. From that converged v-switch, I’m dangling six virtual Ethernet adapters per host, two of which are tagged for the Nimble VLAN I built in the 6509.
That’s a really long and complicated way of saying that in a modest-sized production environment, I’m using LACP teaming on the hosts, up to 4x1GbE vNics on the VIP guests, and MPIO to the storage, which conventional storage networking wisdom says is a bit like kissing your sister and bragging about it. Maybe it’s harmless (even enjoyable?) once or twice, but sooner or later, you’ll live to regret it. And hey the Department of Redundancy Department called, they want one of their protocols back.
I’ve read a lot of thoughtful pieces from VMware engineers & colleagues about doing this, but from a Hyper-V perspective, this is supported, and from a Nimble array perspective, I’m sure they’d point the finger at this if something went wrong, but it hasn’t. And from my perspective: one converged virtual switch = easy to deploy/templatize, easy to manage & monitor. Case closed.
LACP + MPIO in Hyper-V works so well that in three weeks of recording iSCSI stats, I’ve yet to record a single TCP error/re-transmit or anything that would make me think the old model was better. And I haven’t even applied bandwidth policies on the converged switches yet; that tool is still in my box and right now iSCSI is getting the Hyper-V equivalent of best effort.
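When I do pull it out of the box, it’s one cmdlet per vEthernet; a sketch, with a weight invented purely for illustration:

# Guarantee the iSCSI vEthernet a healthy slice of the converged switch under contention
Set-VMNetworkAdapter -ManagementOS -Name "iSCSI1" -MinimumBandwidthWeight 40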
It’s getting faster: Caching is legit. All my monitors and measurements prove it out. Implement your Nimble correctly, and you may see it get faster as time goes on.
And by that I mean don’t tick the “caching” box for every volume. Conserve your resources, develop a strategy and watch it bloom and grow as your iSCSI packets find their way home faster and faster.
The DBA is noticing it too in his latency timers & long running query measurements, but this graph suffices to show caching in action over three weeks in a more exciting way than a select * from slow-ass-tables query:
Least Frequently Used, Most Recently Used, Most Frequently Used….who frequently/recently cares what caching algorithm the CASL architecture is using? A thousand whiteboard sessions conducted by the world’s greatest SE with the world’s greatest schwag gifts couldn’t sell me on this the way my own charts and my precious perfmons do.
My cached Nimble volumes are getting faster baby.
Compression wise, I’m seeing some things I didn’t expect. Some volumes are compressing up to 40x. Others are barely hitting 1.2x. The performance impact of this is hard to quantify, but from a conservation standpoint, I’m not having to grow volumes very often. It’s a wash with the old dedupe model, save for one thing: I don’t have to schedule compression. That’s the CPUs job, and for all I know, the Nehalems inside my CS260 are, or should be, redlining as they lz4 my furious iSCSI traffic.
Busy Box & CLI: The Nimble command line in version 1.4x felt familiar to me the first time I used it. I recognized the command structure, the help files and more, and thought it looked like Busy Box.
What’s Busy Box? How to put this without making enemies of The Guys Who Say Vi…Busy Box is a collection of packages, tools, servers and scripts for the unix world, developed back in the mid-90s by an amazing Unix engineer. It’s very popular, it’s everywhere, it’s reliable, and I have no complaints about it other than the fact that it’s disconcerting that my Nimble has the same package of tools I once installed on my Android handset.
But that’s just the Windows guy talking, a Windows guy who was really fond of his WAFL and misses it but will adapt and holds out hope that OneGet & PowerShell, one day, will emerge victorious over all.
The SSL cert situation is embarrassing, and I’m glad my former boss hasn’t seen it. Namely: you can’t replace the stock SSL cert, which frankly looks like something I would do while tooling around with OpenSSL in the lab.
I understand this is fixed in the new 2.x OS version but holy shit what a fail.
Other than that, I’m very pleased -and the organization is very pleased***- with our Nimble array.
It feels like at last, I’m enjoying the fruits of my labor: I’m riding a high-performance storage array that was cost-effective, easy to install, and is performing at/above expectations. I’m like Major Kong: my array is literally the bomb, man and machine are in harmony, and there’s some joy & euphoria up in the datacenter as my task is complete.
*Remember this lesson #StorageGlory seekers: no one knows your workload like you. The above screenshot of cache hits is of a 400GB SQL transaction log volume of a larger SQL DB that’s in use 24/6. Your mileage may vary.
*** I do not speak for the organization even though I just did.
Last night, whilst tooling around the Daisetta Lab and playing with Nexenta v4.01, I had occasion to pause for a moment, toss a few choice shots back, and reminisce & reflect on the end of Windows XP support, which is today, April 8, 2014.
Ten to twelve years ago, I was a mid-level/tier 2 helpdesk type at a typical small-to-medium enterprise in California. The Iraq war had just started, George Bush was in office, and Palm, maker of the Treo, was king of the bulky, nerds-only smartphone segment. 802.11g was hot stuff, and your 2.4GHz spectrum, while still a junk band in the eyes of the FCC, was actually quite usable. HDMI had just been introduced, plasmas were all the rage, and you didn’t need a bunch of dongle adapters in the boardroom to connect a projector to a laptop. EVDO data was OMG FAST, and Apple was still called Apple Computer. In place of guest WiFi, we had guest Ethernet (“Tell him to use the Red cable. The Red cable!!” I remember shouting to junior guys. The red cable went to the guest internet switch).
At work, life was simpler in IT. I don’t think we even used VLANs at that old job.
Basically, we had physical servers, a DS3 circuit, Windows XP on the clients, and Optiplex GX280s.
Lots of them.
XP, of course, was new then. Only 2-3 years old, it was Microsoft’s most successful operating system in like forever. It united at last the NT kernel & the consumer desktop of Windows 98, and had a nice GUI front-end, soft-touch buttons, a color scheme by Crayola, and a font/typography system that still, to this day, provides fodder for the refined Mac font/typography snobs.
But you could join it to the Domain, use Group Policy against it, and mass-deploy those things like it was going out of style. This was a Big Deal in the enterprise. Ghost & RIS baby!
Hardware-wise, we didn’t worry about iPhone apps, Android vulnerabilities, or the cloud taking our jobs. No, all we had were Optiplexes. Acres of them, seemingly.
Small form factor & desktop-style Optiplex GX280s to be exact. Plastic and ugly, but you could open them up without tools. They were light enough you could carry them around without grabbing a bulky cart, and they offered plenty of surface area for the users to stick pictures of their cat or whatnot on them. Great little machines.
If I’m sounding nostalgic, I am. Getting a bit weepy here.
But then I recall the two straight years of pain. Three years maybe even.
The rise of XP came during the rise of the Internet in the post-dotcom bubble era. Want to get on something called the Internet and do some ecommerce shopping? Have I got an OS & Browser for you: XP + IE 6, or what one might call a Hacker’s Delight.
Oh how many hours were lost learning about, then trying to fix, then throwing up our hands collectively in frustration and saying, “Fuck it. Just RIS/Ghost the damn thing.” in reaction to horror shows like this in the pre-Service Pack 2 days of XP:
And then, coincidentally at the same time, the Optiplex GX280s started failing en masse. Reason? Bad or cheap motherboard capacitors. I shit you not. The capacitors around the old CPU socket just started failing en masse across Dell’s entire GX280 fleet. It was epic: years later, the Times reported some 22% of the 21 million Optiplex machines sold by Dell in 2003-2005 had failed capacitors.
The fix wasn’t difficult; just swap the motherboards. Any help desk monkey could do that. But I remember distinctly how shocked I was that we bought Dell-badged computers but got Packard Bell reliability instead. And I remember boiling resentment and rage against Michael Dell as I walked the halls of that old job, arms stuffed with replacement motherboards.
These were the first episodes of #VendorFail in my IT career. There are many stories like these in IT, but this one is mine. XP Spyware & Optiplex Capacitors were two solid years of my life in IT. I heart Microsoft, but damnnnnn those were some tough days in IT.
Of course, all that being said, today’s desktop is a lot more secure, but our back-end stuff has holes so deep & profound that even experts are shocked. Witness the new Heartbleed OpenSSL vulnerability!