A few brief updates & random thoughts from the last few days on all the stuff I’ve been working on.
Refreshing the Core at work: Summer’s ending, but at work, a new season is advancing, one rack unit at a time. I am gradually racking up & configuring new compute, storage, and network gear as it arrives; It Is Not About the Hardware™, but since you were wondering: 64 Ivy Bridge cores, about 512GB of RAM, 30TB of storage, and Nexus 3k switching.
Ahh, the Nexus line. I’d never had the privilege to work on such fine switching infrastructure. Long-time admirer, first-time NX-OS user. I have a pair of them plus a Layer 3 license, so the long-term thinking involves not just connecting my compute to my storage, but connecting this dense stack northbound & out via OSPF or static routes over a fault-tolerant HSRP or VRRP config. A rough sketch of what I have in mind is below.
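Something like this, to give you the flavor; a minimal NX-OS sketch, with hypothetical VLANs, addresses, and priorities, and HSRP picked over VRRP purely for illustration:

```
! Minimal NX-OS sketch; VLAN, addresses, and OSPF process are hypothetical.
! The standby peer mirrors this config with a lower HSRP priority.
feature interface-vlan
feature hsrp
feature ospf

router ospf 1
  router-id 1.1.1.1

interface Vlan10
  no shutdown
  ip address 10.10.10.2/24
  ip router ospf 1 area 0.0.0.0
  hsrp 10
    preempt
    priority 110
    ip 10.10.10.1
```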
To do that, I need to get familiar with some Nexus-flavored acronyms that are new to me: virtual port channels (vPC), Control Plane Policing (CoPP), VRF, and oh-so-many-more. I’ll also be attempting to answer the question once and for all: what spanning tree mode does one use to connect a Nexus switch to a virtualization host running Hyper-V’s converged switching architecture? I’ve used portfast in the lab on my Catalyst, but the lab switch is five years old, whereas this Nexus is brand new. And portfast never struck me as the right answer, just the easy one.
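For the record, my current working guess, and it is only a guess until TAC or the lab says otherwise, is the NX-OS analogue of portfast for a trunked, host-facing port; the interface and VLANs here are hypothetical:

```
! Hypothetical uplink to a Hyper-V host running a converged virtual switch.
! "edge trunk" is NX-OS's portfast equivalent for a trunk port.
interface Ethernet1/10
  description Hyper-V converged-switch uplink
  switchport mode trunk
  switchport trunk allowed vlan 10,20,30
  spanning-tree port type edge trunk
```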
To answer those questions and more, I have TAC and this excellent tome provided gratis by the awesome VAR who sold us much of the equipment.
Into the vCPU Blender goes Lync: Last Friday, I got a call from my former boss & friend who now heads up a fast-growing IT department on the coast. He’s been busy refreshing & rationalizing much of his infrastructure as well, but as is typical for him, he wants more. He wants total IT transformation, so as he’s built out his infrastructure, he’s laid the groundwork to go 100% Microsoft Lync 2013 for voice.
Yeah baby. Lync 2013 as your PBX, delivering dial tone to your endpoints, whether they are Bluetooth-connected PC headsets, desk phones, or apps on a mobile.
Forget software-defined networking. This is software-defined voice & video, with no special server hardware, cloud services, or any of the other typical expensive nonsense you’d see in a VoIP implementation.
If Lync 2013 as PBX is not on your IT Bucket List, it should be. It was something my former boss & I never managed to accomplish at our previous employer on Hyper-V.
Now he was doing it alone. On a fast VMware/Nexus/NetApp stack with distributed vSwitches. And he wanted to run something by me.
So you can imagine how pleased I was to have a chat with him about it.
He was facing one problem which threatened his Go Live date: Mean Opinion Score, or MOS, a simple 0-5 score Lync provides to its administrators that summarizes call quality. MOS is one piece of a hugely detailed Media Quality Summary Report, documented at TechNet.
My friend was scoring a 0.6 on his MOS. He wanted it at 4 or above prior to go-live.
So at first we suspected QoS tags were being stripped somewhere between his endpoint device and the Lync Mediation VM. Sure enough, Wireshark proved that out: a Distributed vSwitch (or was it a Nexus?) wasn’t respecting the tag, resulting in a sort of half-duplex QoS, if you will.
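If you want to check this yourself, capture on both sides of the suspect hop and filter on the DSCP field; Lync audio is commonly marked EF (DSCP 46), though 46 is an assumption here and your marking scheme may differ. A Wireshark display filter like this shows which packets still carry the tag:

```
ip.dsfield.dscp == 46
```

If the near-side capture is full of EF-marked packets and the far side comes back empty, something in between is rewriting the tag.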
He fixed that, ran the test again, and still: 0.6. Yikes! Two days to go live. He called again.
That’s when I remembered the last time we tried to tackle this together. You see, the Lync Mediation Server is sort of the real PBX component in Lync Enterprise Voice architecture. It handles signalling to your endpoints, interfaces with the PSTN or a SIP trunk, and is the one server workload that, even in 2014, I’d hesitate to virtualize.
My boss had three of them. All VMs on three different VMware hosts across two sites.
I dug up a Microsoft whitepaper on virtualizing Lync, something we didn’t have the last time we tried this. While Redmond says Lync Enterprise Voice on top of VMs can work, it’s damned expensive from a virtualization host perspective. MS advises (a rough PowerCLI sketch of the per-VM pieces follows the list):
- Disable hyperthreading on all hosts.
- Do not oversubscribe processors; maintain a 1:1 ratio of virtual CPU to physical CPU.
- Make sure your host servers support nested page tables (NPT) or extended page tables (EPT).
- Disable non-uniform memory access (NUMA) spanning on the hypervisor, as it can reduce guest performance.
Talk about Harshing your vBuzz. Essentially, building Lync out virtually with Enterprise Voice forces you to go sparse on your hosts, which is akin to buying physical servers for Lync. If you don’t, into the vCPU blender goes Lync, and out comes poor voice quality, angry users, bitterness, regret, and self-punishment.
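On a VMware stack like his, the per-VM half of that guidance maps to PowerCLI roughly like this; a hedged sketch with hypothetical VM names and reservation sizes (hyperthreading and NUMA spanning are host BIOS/hypervisor settings, not per-VM knobs):

```powershell
# Hedged sketch: pin CPU & memory for the Lync Mediation Server VMs.
# VM names and reservation sizes are hypothetical; size them to your hosts.
Connect-VIServer vcenter.example.com

foreach ($name in 'LYNC-MED-01','LYNC-MED-02','LYNC-MED-03') {
    Get-VM -Name $name |
        Get-VMResourceConfiguration |
        Set-VMResourceConfiguration -CpuReservationMhz 8000 -MemReservationMB 16384
}
```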
Anyway, he did as advised, put some additional vCPU & memory reservations in place on his hosts, and yesterday, whilst I was toiling in the Hot Lane, he called me from Lync via his mobile.
He’s a married man just like me, but I must say his voice sounded damn sexy as it was sliced up into packets, sent over the wire, and converted back to analog on my mobile’s speaker. A virtual chest bump over the phone was next, then we said goodbye.
Another Go Live Victory (by proxy). Sweet.
Azure Outage: Yesterday’s bruising, hours-long global Azure outage affected Virtual Machines, storage blobs, web services, database services, and HDInsight, Microsoft’s service for big data crunching. As it unfolded, I could only navel-gaze, even though I felt like helping; there was literally nothing I could do. Had I some crucial IaaS or PaaS in the Azure stack, I’d be shit out of luck, just like the rest. I felt quite helpless; refreshing Mary Jo’s page and the Azure dashboard didn’t help. I wondered what the problem was; it’s been a difficult week for Microsofties, whether on-prem or in Azure. Had to be related to the update cycle, I thought.
On the plus side, Azure Active Directory services never went down, nor did several other services. Office 365 stayed up as well, though, as I understand it, it’s built atop separate-but-related infrastructure.
Lastly, I pondered two questions: if you’re thinking of reducing your OpEx by replacing your DR strategy with an Azure Site Recovery strategy, does this change your mind? And if you’re building out Azure as your primary IaaS or PaaS, do you just accept such outages, or do you plan a failback strategy?
Labworks: Towards a 100% Windows-defined Daisetta Lab: What’s next for the Daisetta Lab? Well, I have me an AMD Duron CPU, a suitable motherboard, a 1U enclosure with PSU, and three Keepin’ it RealTek NICs. Oh, I also have a case of the envies: envies for the VMware crowd and their VXLAN and NSX and of course VMworld next week. So I’m thinking of building a Network Virtualization Gateway appliance. For those keeping score at home, that would mean from Storage to Compute to Network Edge, I’d have a 100% Windows lab environment, infused with NVGRE, which turns out to have more use cases than just the multi-tenancy I had assumed it was for. A sketch of the host-side plumbing I’d be signing up for is below.
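For flavor, here’s a minimal sketch of what Hyper-V Network Virtualization plumbing looks like on a Server 2012 R2 host, using the in-box NetWNV and Hyper-V cmdlets; every address, VSID, MAC, and name below is hypothetical:

```powershell
# Hedged sketch: map a VM's customer address onto the NVGRE provider network.
# Addresses, VSID, MAC, routing domain, and VM name are all hypothetical.

# The provider address (PA) is what actually traverses the physical network.
New-NetVirtualizationProviderAddress -InterfaceIndex 3 `
    -ProviderAddress 192.168.40.10 -PrefixLength 24

# Lookup record: customer address (CA) 10.0.0.5 in virtual subnet 5001
# lives behind PA 192.168.40.10 and gets NVGRE-encapsulated.
New-NetVirtualizationLookupRecord -CustomerAddress 10.0.0.5 `
    -VirtualSubnetID 5001 -MACAddress "00155D010105" `
    -ProviderAddress 192.168.40.10 -Rule TranslationMethodEncap

# A customer route for the virtual subnet, keyed to a routing domain.
New-NetVirtualizationCustomerRoute -RoutingDomainID "{11111111-2222-3333-4444-000000005001}" `
    -VirtualSubnetID 5001 -DestinationPrefix "10.0.0.0/24" -NextHop "0.0.0.0"

# Finally, stamp the VM's vNIC with the virtual subnet ID.
Get-VMNetworkAdapter -VMName "TENANT-VM-01" | Set-VMNetworkAdapter -VirtualSubnetId 5001
```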