Part 2: Wargaming Mass Storage migration with a 6509e & Hyper-V

As you’ll recall from part 1, much of my time at work lately has been consumed by planning, testing and executing mass Storage Live Migration of 65+ .vhdx files from our old filer (built by a company that rhymes with PetTap) & its end-of-life 7200 RPM shelves to our new hotness, a Nimble CS260.

Essentially I’ve been a sort of IT Moses, planning the Exodus of my .vhdxs out of harsh, intolerably slow conditions, through some hazards & risks (Storage Migration during production), and into the promised land of speed, 1-5ms latency (rather than 20ms+!!), and user happiness.

Now VMware guys have a ton of really awesome tools to vMotion their .vmdks around their vCenters & vSpheres and their ESXis and now their VSANs too. They can tune their NFS for ludicrous speed and their Distributed Switches, now with Extra Power LACP support, can speak CDP & even tell you what physical port the SFP+ 10GbE adapter is plugged into.

Do I sound envious? Green with it even? Well I am. 

In comparison, I got me some System Center Virtual Machine Manager (2012 SP1), the Microsoft Failover Cluster mmc snap-in, a 6509e with two awesome x6748 performance blades (February 2008’s centerpiece switch mod in the popular “Hot Switch Racks” pin-up calendar), Hyper-V’s converged fabric design, 8x1GbE LACP teams, ManageEngine’s NetFlow Analyzer, boat-loads of enthusiasm and a git ‘r done attitude.

And this is what I’ve got to git done:

[Diagram: Filer Exodus, Nimble Promised Land]

And it has to be done with zero downtime because we have a 24/6 operational tempo, and I like my Saturdays.

One of my main worries as I’ve tried to quarterback this transition has been the switch. Recall from part 1 how I’m oversubscribed to hell & back on my two 6748s:

[Diagram: the two 6748 blades & Hyper-V converged fabric, from part 1]

I fear the harsh judgment of my networking peers (You’re doing that with that?!?!) so let me just get it out there: yes, I’m essentially using my 6509 & these two blades as a bus for storage IO. In fact, iSCSI traffic accounts for about 90% of all traffic on the switch in any given 24 hour period:

[Image: You can choose any storage paradigm you like as long as it’s iSCSI]

Perhaps I’m doing things with this switch & with iSCSI that no sane engineer would, but I have to say, this has proven to be pretty durable and adequate as far as performance goes. Would I like some refreshing multi-channel SMB 3 file storage, some relief from the block IO blues & Microsoft clustering headaches? Yes of course, but I’ve got to shepherd the VMs to their new home first.

And to do that, I’ve got to master what this switch is doing on an hour by hour basis as my users log in around the clock.

So I pulled some Netflow data together, cracked my knuckles, and got busy with Excel and Pivot tables.
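
If you'd rather skip the pivot tables, the same hour-by-hour rollup is a few lines of PowerShell. This is only a sketch: it assumes the collector's data has been exported to a CSV with hypothetical TimeStamp, VLAN and Bytes columns, and the VLAN IDs in the filter are made up.

```powershell
# Sketch: roll an exported NetFlow CSV up into per-hour totals.
# Column names (TimeStamp, VLAN, Bytes) are assumptions -- match them
# to whatever your NetFlow collector actually exports.
$flows = Import-Csv .\netflow-export.csv

$flows |
    Where-Object { $_.VLAN -in '20','21' } |              # example iSCSI & Live Migration VLAN IDs
    Group-Object { ([datetime]$_.TimeStamp).Hour } |
    ForEach-Object {
        $bytes = ($_.Group | ForEach-Object { [double]$_.Bytes } | Measure-Object -Sum).Sum
        [pscustomobject]@{
            Hour      = [int]$_.Name
            GBytes    = [math]::Round($bytes / 1GB, 1)
            FlowCount = $_.Count
        }
    } |
    Sort-Object Hour |
    Format-Table -AutoSize
```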

I’m glad I went through this exercise because it changed the .vhdx parade route & timing. What I thought was the busiest part of my infrastructure’s day was wrong by a wide margin. Here’s 8 days’ worth of NetFlow traffic on the iSCSI & Live Migration VLANs, averaged out. Few live migrations were made during this period:

[Chart: Eight days of NetFlow traffic on the iSCSI & Live Migration VLANs. Some of the numbers aren’t scaling properly because it’s a pain in the ass to get Excel to properly display bytes, bits, and millions/billions of packets]

What you see here are the three login storms (times on the graph are MST; they start early down under) as my EU, North America & Australia/New Zealand users log in to their session virtualization VMs, hit the production SQL databases, or run their reports.

I always thought EU punched my stack the hardest; our offices there have as many employees as North America, but they span only one or two time zones rather than North America’s three.

But EU & North America and Australia combined don’t hit my switch fabric as hard as I do. Yes, the monkey on my back is…me. Well, me & the DBA and his incurable devotion to SQL backups in the evening. My crutch is DPM.

I won’t go into too much detail here, but this was pretty surprising. At times over the eight days, NetFlow would record more than 1 billion packets traversing the switch in a single evening hour; the peak “payload” was north of 1 terabyte of iSCSI bytes/hour on some days! (A terabyte an hour works out to roughly 2.2 Gbit/s of sustained throughput.)

Now I’m not a networking guy (though I do love WiFi experimenting), but what I saw here concerned me, gave me pause. Between each switch blade and my Supervisor 720 modules, I’ve supposedly got a 40 Gigabit/s backplane connection, but is that real 40Gbit/s or marketing 40Gbit/s?

The key question: Am I stressing this 6509e, or does it have more to give?

“Show fabric utilization detail” said I was consuming only 20% of the switch fabric during my exploratory storage migrations, and that was at peak: about 4 gigabit/second per port-group channel.

Is that all you got? the 6509e seemingly taunted me.

But oh my stars, better check the buffers:

[Screenshot: buffer statistics]

ACK! One dropped buffer call or whatever you call it, 7 weeks ago, way before I had the Nimble in place. Still….that’s one drop too many for me.

Stop pushing me so hard, the 6509e pleaded with me.

So I did what any self-respecting & fearful admin would do: call TAC. Show me the way home, TAC; get me out of the fix I’m in or at least soothe my worry. Tell me I have nothing to worry about, or tell me I need to buy a Sup 2T to do what I want to do, just give me some certainty!

A few “show tech-support” dumps, one WebEx session and one phone call with a nice engineer from Costa Rica later, I had some certainty. The config was sound, and the single buffer drop was concerning but wasn’t repeating, even when I thought I was stressing the switch.

And I didn’t need to buy a Sup 2T.

On to the Exodus/.vhdx parade.

In all my fretting about the switch, I was forgetting one thing: the feeble filer is old and slow and can’t push that much IO to the Nimble in the first place.

As best I can figure it, I can do about five storage live migrations simultaneously. Beyond that, I fear that LUNs will drop on the filer.
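
Hyper-V will happily enforce that cap for me rather than trusting me to count. Here’s a minimal sketch of how I’d throttle and queue the parade; the VM names, the Nimble CSV path and the limit of five are placeholders, not my actual config.

```powershell
# Cap simultaneous storage migrations so the old filer doesn't fall over.
Set-VMHost -MaximumStorageMigrations 5

# Kick off the .vhdx parade as background jobs; Hyper-V queues anything
# beyond the limit above. VM names and the Nimble CSV path are examples.
$destination = 'C:\ClusterStorage\Nimble-CSV1'

Get-VM -Name 'FS01','SQL02','RDS*' | ForEach-Object {
    Move-VMStorage -VMName $_.Name -DestinationStoragePath $destination -AsJob
}

Get-Job | Wait-Job   # block until the whole batch has landed on the Nimble
```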

To the Nimble, it’s no sweat:

[Image: Floor it!!]

Netflow’s view from the same period:

[Chart: Raw, naked, iSCSI-flavored packet aggression… I like it]

Love it when a plan comes together! I should have this complete within a few days, then I can safely upgrade my filer.

Branch office surveillance in a box: Ubiquiti AirCam, Ubuntu Linux & Hyper-V

I pause today from migrating .vhdxs to this:

[Photo]

and stressing this:

[Photo: the WS-C6509-E]

to deploying six of these to a new small branch office:

[Photo: Ubiquiti AirCam three-pack]

In my industry, we’re constantly standing up new branch offices, tearing down old ones, and so sometimes I have to take off the virtualization/storage guy hat and put on the project management, facilities & security hat, something I admit to enjoying.

And one of my focuses in this area is on rapid deployment of branch offices. I want to be able to deploy a branch office from an IT, security & infrastructure perspective as quickly as overnight and at or below budgeted cost. Tearing down branch offices is more leisurely, but building new ones? I aim for excellence; I want to be the Amazon Prime of branch office rollouts.

[Photo: Lack of 802.3af PoE support makes the standards guy cry, but for the price, I’ll tolerate and use a dongle]

So I’ve tried to templatize the branch office stack as much as possible. Ideally, I’d have a hardened, secure rolling 19″ 12U or 16U rack, complete with an 8- or 16-port switch (an SG300, maybe?), a patch panel, a Dell R210 II server with 16GB RAM and 1 terabyte in RAID 1 as a Hyper-V host, a short-depth but sufficient-capacity UPS, and a router of some type: it should offer 4G LTE & 1000Base-T as WAN-connectivity options, VPN ability (to connect to our MPLS), NAT ability (IPv6 dreams aside for now), and, of course, the one thing that will never become virtualized or software-defined: a workgroup printer.

Give me that in a rolling rack, and I can drop-ship it anywhere in CONUS overnight. Boom, Instant Branch Office in a Box (structured cabling comes later).

But one of the things that’s gotten in the way of this dream (besides getting the $ to spend ahead of time, which is also a big issue) has been provisioning camera security. We need to watch our valuables, but how?

[Photo: Weather resistant, I suppose, though I’ve read the little door that covers this hatch can let moisture in]

Usually that means contracting with some slow-moving local security company, going through a lengthy scoping process, choosing between cheap CCTV & DVR vs. IP cameras & DVR, then going through a separate structured cabling process, and finally, validating it all. Major pain, and it can get pricey very quickly: the last office I built required six 720p non-IR cameras + an IP DVR + mobile access to camera feeds. Price: $10k, 1.5x the cost of all the equipment I purchased for the 12U rolling rack!!

Meanwhile, you’ve got the business’ stakeholders wondering why it’s all so complicated. At home, they can connect a $100 720p IP camera up to their wifi, and stream video of their son/dog/whatever to their iPhone while they’re out and about, all without hiring anyone. I know it’s not as hardened or reliable as a real security camera system, but in IT, if you’re explaining, you’re losing.

And they do have a point.

This is a space begging for some good old fashioned disruption, especially if you don’t want the monthly OpEx for the security company & your goal is only to get adequate surveillance (two big Ifs, I recognize).

Enter Ubiquiti Networks, an unusual but interesting wireless company that targets enterprise, carrier and pro-sumer markets with some neat solutions (60GHz point-to-point wifi for the win!). After selling the boss on the vision & showing him the security company quote, I was able to get approval for six Ubiquiti Networks AirVision cameras and a dome camera, all for about $850 off Amazon, via the magical procurement powers of the corporate credit card.

The potential for my pet Branch Office in a Box project is huge and the cost was low. Here’s the vision:

  • A structured cabling contractor can now hang my cameras and run Cat5e to them, especially since I’m familiar with the aperture & focal length characteristics of the cameras and can estimate placement without being on site.
  • The DVR unit is an Ubuntu virtual machine in Hyper-V 3, recording to local storage which is backed up off-site via normal processes (it’s just a *.vhdx, after all). That alone is huge; it’s been very painful to off-site footage from proprietary DVR systems.
  • Reserve IPs for the cameras prior to deployment via MAC address and the normal process (a rough PowerShell sketch of this follows the list).
  • It’s a simple affair to secure the Linux appliance via HTTPS/SSH, NAT it out to the internet, then send out a URL for the Ubiquiti-compatible camera apps on the App Store & Play Store, of which there seem to be several.
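
For the curious, the reservation step from the list above looks roughly like this against a Windows DHCP server. It’s only a sketch: the server name, scope, IPs and MAC addresses are all made up.

```powershell
# Reserve predictable addresses for the cameras before they ever ship.
# Scope, IPs and MACs below are placeholders -- substitute your own.
$cameras = @(
    @{ Name = 'BR-CAM-01'; IP = '10.50.1.61'; MAC = '00-15-6D-AA-BB-01' }
    @{ Name = 'BR-CAM-02'; IP = '10.50.1.62'; MAC = '00-15-6D-AA-BB-02' }
)

foreach ($cam in $cameras) {
    Add-DhcpServerv4Reservation -ComputerName 'branch-dhcp01' `
        -ScopeId 10.50.1.0 `
        -IPAddress $cam.IP `
        -ClientId $cam.MAC `
        -Name $cam.Name `
        -Description 'Ubiquiti AirCam'
}
```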

Fantastic. I mean, all that’s missing to make BiB into something stupid-proof and ready today is fire & alarm systems (yes, I’ve looked at Nest, but regulations made me run for traditional vendors).

Demerits on this package so far:

  • The cameras feel a little cheap. They offer minimal weather-resistance, but the plastic casing feels like it was recycled from a 1995 CRT monitor: this thing’s going to turn yellow & brittle.
    [Photo: Feels a bit cheap, but not complaining too much. However, it won’t survive an attack]
  • No vandal-resistance. Maybe I missed the SKU for that add-on. May need to improvise here; these won’t survive a single lucky strike from a hoodlum and his Louisville Slugger.
  • Passive PoE: So much for standards, right? These cameras, sadly, require passive PoE dongle-injectors. And no, 802.3af active PoE, the kind in your switch, won’t work. You need the dongle-injector.

Other than that, color me impressed.

Out of the box, the cameras are set for DHCP, but if you reserve the MAC on your DHCP server, you can neatly provision them in your chosen range without going through the usual pain.

Building the Ubuntu virtual machine -our DIY IP cam DVR system- on the Hyper-V host couldn’t be simpler. I followed Willie Howe’s steps here and recorded a few Gifcam shots to show you how easy it was.
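
Willie Howe’s walkthrough covers the AirVision install itself; the Hyper-V side of standing up the guest is just a couple of cmdlets. A rough sketch, with the VM name, sizes, switch name and ISO path all assumed:

```powershell
# Provision the Ubuntu guest that will act as the AirVision DVR.
# Name, memory, disk size, switch name and ISO path are assumptions.
New-VM -Name 'BR-AIRVISION01' `
       -MemoryStartupBytes 2GB `
       -NewVHDPath 'D:\VMs\BR-AIRVISION01\airvision.vhdx' `
       -NewVHDSizeBytes 500GB `
       -SwitchName 'BranchSwitch'

# Attach the Ubuntu installer ISO and power it on.
Set-VMDvdDrive -VMName 'BR-AIRVISION01' -Path 'D:\ISO\ubuntu-server.iso'
Start-VM -Name 'BR-AIRVISION01'
```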

As for the management interface and DVR system: well, I’ll say it feels much more integrated, thoughtful and enterprise-grade than any of the small IP DVR systems I’ve struggled with at branch offices to date.

[Screenshot: AirVision management interface]

The big questions are performance, reliability, and how well it records when there’s movement in the zones I need monitored. And whether the stakeholder at the remote office will like the app.

But so far, I have to say: I’m impressed. I just did in 90 minutes what would have taken a local security company contractor about two weeks, at a cost roughly 90% less than what they quoted me.

That’s good value even if these cheap $99 cameras don’t last for more than a year or two.

[Screenshot: The AirVision HTTPS interface allows you to post a floor plan, schedule and manage cameras, and set recording settings]

Wargaming a mass Storage Live Migration with a 6509e, part 1

Storage Live Migration is something we Hyper-V guys only got in Server 2012, and it was one of the features I wanted most after watching jealously as VMware guys have been storage vMotioning their .vmdks since George Bush was in office (or was it Clinton?).

I use Live Migration all the time during maintenance cycles and such, but pushing .vhdx hard drives around is more of a rare event for me.

Until now. See, I’ve got a new, moderately-performing array, a Nimble CS260 + an EL-125 add-on SAS shelf. It’s the same Nimble I abused in my popular January bakeoff blog post, and I’m thrilled to finally have some decent hybrid storage up in my datacenter.

[Photo: WS-C6509-E. Big Iron Switching, baby]

However before I can push the button or press enter at the end of a cmdlet and begin the .vhdx parade from the land of slow to the promised land of speed, I’ve gotta worry about my switch.

You see, I’ve got another dinosaur in the rack just below the Nimble: a Cisco 6509e with three awful desktop-class blades, two Sup-720 mods running layer 3 with HSRP, and then the crown jewels of the fleet: two WS-X6748-GE-TX blades, where all my Hyper-V hosts & two SAN arrays are plugged in, each blade with two port-groups, each with 20Gb/s of fabric capacity.

Ahhh, the 6509: love it or hate it, you can’t help but respect it. I love it because it’s everything the fancy-pants switches of today are not: huge, heavy, with shitty cable management, extremely expensive to maintain (TAC…why do you hurt me even as I love you?), and hungry for your datacenter’s amperage.

I mean look at this thing, it gobbles amps like my filer gobbles spindles for RAID-DP:

[Screenshot: show power output from the 6509]

325 watts per 6748, or just about 12 watts less than my entire home lab consumes with four PCs, two switches, and a pfSense box. The X6748s are like a big-block V8 belching out smoke & dripping oil in an age of Teslas & Priuses… just getting these blades into the chassis forced me to buy a 220v circuit, & achieving PSU redundancy required heavy & loud 3,000 watt supplies.

The efficiency nerd in me despises this switch for its cost & its Rush Limbaugh attitude toward the environment, yet I love it because even though it’s seven or eight years old, it’s only just now (perhaps) hitting the downward slope on my cost/performance bell curve. Even with those spendy power supplies and with increasing TAC costs, it still gives me enough performance thanks to this design & Hyper-V’s converged switching model:

[Diagram: Hyper-V converged fabric on the 6748s. Errr, sorry for the colors. The Visio looks much better, but I had to create a diagram that even networking people could understand]

Now Mike Laverick, all-star VMware blogger & employee, has had a great series of posts lately on switching and virtualization. I suggest you download them to your brain stat if you’re a VMware shop; especially the ones on enabling NetFlow on your vSwitch & installing the Scrutinizer vApp, the new distributed switch features offered in ESXi 5.5, and migrating from standard to distributed switches. Great stuff there.

But if you’re at all interested in Hyper-V and/or haven’t gone to 10/40Gig yet and want to wring some more out of your old Cat5e patch cables, Hyper-V’s converged switching model is a damned fine option. Essentially, a Hyper-V converged switch is an L2/L3 virtual switch fabricated on top of the Microsoft multiplexor driver, which teams the GigE NICs on the parent partition of your Hyper-V host.
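
To make that concrete, here’s roughly what building a converged switch looks like in PowerShell on Server 2012. It’s a sketch of the pattern rather than my production config: the team name, NIC names, switch name, vNIC names and VLAN IDs are all placeholders.

```powershell
# 1. Team the physical GigE NICs with the Microsoft multiplexor driver.
#    LACP assumes the 6509 port-channel is configured to match;
#    SwitchIndependent works fine if your switch can't do LACP.
New-NetLbfoTeam -Name 'ConvergedTeam' `
    -TeamMembers 'NIC1','NIC2','NIC3','NIC4' `
    -TeamingMode LACP `
    -LoadBalancingAlgorithm HyperVPort `
    -Confirm:$false

# 2. Build the Hyper-V virtual switch on top of the team, with weight-based QoS.
New-VMSwitch -Name 'ConvergedSwitch' `
    -NetAdapterName 'ConvergedTeam' `
    -MinimumBandwidthMode Weight `
    -AllowManagementOS $false

# 3. Carve host vNICs out of the same switch for each traffic class.
foreach ($vnic in 'Management','LiveMigration','Cluster') {
    Add-VMNetworkAdapter -ManagementOS -Name $vnic -SwitchName 'ConvergedSwitch'
}

# 4. Tag each host vNIC into its VLAN (IDs here are examples).
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName 'Management'    -Access -VlanId 10
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName 'LiveMigration' -Access -VlanId 20
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName 'Cluster'       -Access -VlanId 30
```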

This is something of a cross between a physical and logical diagram and it’s a bit silly and cartoonish, but a fair representation of the setup:

[Diagram: converged fabric. The red highlight is where the magic happens]

So this is the setup I’ve adopted in all my Hyper-V instances… it’s the setup that changed the way we do things in Hyper-V 3.0, the setup that allows you to add/subtract physical NIC adapters or shut down Cisco interfaces on the fly, without any effect on the vNICs on the host or the vNICs on the guests. It is one of the chief drivers of my continuing love affair with LACP, but you don’t need an LACP-capable switch to use this model; that’s what’s great about the multiplexor driver.

It’s awesome and durable and scalable and, oh yeah, you can make it run like a Supercharged V-6. This setup is tunable!
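
The tuning lives in the weight-based QoS on that virtual switch: each host vNIC gets a minimum bandwidth weight, so Live Migration, cluster and management traffic each keep a guaranteed slice of the team when things get busy. Another sketch, using the same placeholder vNIC names as above and purely illustrative weights:

```powershell
# Guarantee each traffic class a minimum share of the converged team.
# Weights are relative shares, not percentages; values are examples only.
Set-VMNetworkAdapter -ManagementOS -Name 'Management'    -MinimumBandwidthWeight 10
Set-VMNetworkAdapter -ManagementOS -Name 'Cluster'       -MinimumBandwidthWeight 10
Set-VMNetworkAdapter -ManagementOS -Name 'LiveMigration' -MinimumBandwidthWeight 40
```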

Distributed Switching & Enhanced LACP got nothing on converged Hyper-V switching, and that is all the smack I shall talk.

Now sharp readers will notice two things: #1, I’ve oversubscribed the 6748 blades (the white spaces on the switch diagram are standard virtual switches & iSCSI HBAs for hosts/guests; these switches function just like the unsexy Standard switch in ESXi), and #2, just because you team it doesn’t mean you can magically make eight 1GbE ports into a single 8Gb interface.

Correct on both counts, which is why I have to at least give the beastly old 6509 some consideration. It’s only got 20Gb/s of fabric bandwidth per 24-port port-group. Best to test before I move my .vhdxs.

In part 2, I’ll show you some of those tests in detail. In the meantime, here are some of my NetFlow graphs & results from tests I’m running ahead of the moves this weekend.

[Screenshot: Hitting nearly 2 Gb/s on each of the Nimble iSCSI vEthernets]

What? These 6748s have been holding out on me and still have 80% of their fabric left to give? So give it. I will not settle for 20%; I want at least 50% utilization to make the moves fast and smooth. How do I get there?

 

[Chart: My NetFlow’s not as sexy as Scrutinizer, but the spike on the right shows one of my move/stress tests went way past the 95th percentile. More fun this weekend!]