A few Microsoft bloggers (some prominent, some less so, none that I know of are employed by MS) are doing a bit of crowing today…OpenSSL, VMware, AWS…all #Heartbleed-vulnerable while Azure & Windows & Hyper-V are secure! <Nelson>Ha Ha!</Nelson>
I’m new to IT blogging, but one thing I’ve noticed is that it’s dominated by consultants who are selling something other than just software: their skills & knowledge. That goes for Hyper-V bloggers or VMware bloggers, SQL bloggers or Oracle bloggers. And that’s just fine: we all have to find a way to put food on the table, and let’s face facts: blogging IT doesn’t exactly bring in the pageviews, does it? However, making sport out of the other products’ flaws can bring in the hits, and it’s fun.
Me? I’m what you call a “customer” who has always supported Microsoft products, had a love/hate/love relationship with them, a curiosity about the other camps, and a desire to just make it all work together, on time & on budget in service to my employer and my users.
So I blog from that perspective.
And so while it’s tempting to join some of my Win32 colleagues (after all, the BSOD & DLL-hell jokes are getting old 20 years on) as they take joy in other engineers’ suffering, I say no!
I remind the reader of that great engineer of words, John Donne, who wrote:
No man is an island,
Entire of itself,
Every man is a piece of the continent,
A part of the main.
If a clod be washed away by the sea,
Europe is the less.
As well as if a promontory were.
As well as if a manor of thy friend’s
Or of thine own were:
Any man’s death diminishes me,
Because I am involved in mankind,
And therefore never send to know for whom the bell tolls;
It tolls for thee.
This poem gets me every time; Donne knows his stuff.
No :443 is an island entire of itself, especially in the internet age. And every network is a part of the great /0.
If one datacenter falls, our infrastructure is the less.
Any engineer’s pain diminishes me, because I have been in his shoes*, RDPd or SSHd into the device at 3am, worried about my data and my job, just as he or she is right now.
So to my friends & colleagues in the open source world trying to stem the bloodloss, I ask: do you need a hand?
I’m working from home today and would be happy to help; I know my way around PuTTY.
*Chinese hackers, the NSA, and other malefactors are of course exempted here
I’ve been going on, insufferably at times, about my new Nimble storage array at work. Back in January, it passed my home-grown bakeoff with flying colors, in February I wrote about how it was inbound to my datacenter, in March I fretted over iSCSI traffic, .vhdx parades, and my 6509-E.
Well, it’s been just about a month since it was racked up and jacked into my Hyper-V fabric, and I thought maybe the storage nerds among my readers would like an update on how it’s performing.
Fast: It’s been strange getting compliments, kudos and thank yous rather than complaints and ALL CAPS emails punctuated by Exclamation Marks. I have a couple of very critical SQL databases, the performance of which can make or break my job, and after some deliberation, we took the risk and moved the biggest of them to the Nimble about three weeks ago.
Here’s a slightly edited email from one power user 72 hours later:
Did I say THANK YOU for the extra zip yet?
STILL LOVING IT!!
I’m taken aback by all the affection coming my way…no longer under user-siege, I feel like maybe I should dress better at work, shave every day, turn on some lights in the office perhaps. Even the dev team was shocked, with one of them invoking Spaceballs and saying his storage-dependent process was moving at “Ludicrous speed.”
It’s Easy: I can’t underscore this enough. If you’re a mid-sized enterprise with vanilla/commodity workloads and you can tolerate an array that’s just iSCSI (you can still use NFS or SMB 3, just from inside clustered VMs!), Nimble’s a good fit, especially if your staff is more generalist in nature, or you don’t have time to engineer a new SAN from scratch.
This was a Do It Yourself storage project for me; I didn’t have the luxury or time to hire storage engineers or VARs to come in and engineer it for me. Nimble will try to sell you on professional services, but you can decline and hook it up yourself, as I did. There are best practice guides a-plenty, and if you understand your stack & workload, your switching & compute, you’ll do fine.
Buying it was easy: Nimble’s lineup is simple and from a customer standpoint, it was a radically different experience to buy a Nimble than a traditional SAN.
Purchasing a big SAN is like trying to decide what to eat at a French restaurant in Chinatown…you recognize the letters & the pictures on the menu look familiar, but you don’t know what that SKU is exactly or how you’ll feel in the morning after buying & eating it. And while the restaurant has provided a helpful & knowledgeable garçon to explain French cuisine & etiquette to you, you know the garçon & his assistants moonlight at the Italian, German and Sushi places down the road, where they are equally knowledgeable & enthusiastic about those cuisines. But they can’t talk about the Italian place because they have something called agency with the French restaurant; so with you, they are only French cuisine experts, and their professional opinion is that Italian, German and Sushi are horrible food choices. Also, your spend with the restaurant is too small to get the chef’s attention…you have to go through this obnoxious garçon system.
Buying from Nimble, meanwhile, is like picking a burger at In ‘n Out. You have three options, all of them containing meat, and from left to right, the choices are simply Good, Better, Best. You can stack shelves onto controller-shelves, just like a Double-Double, and you know what you’ll get in the end. Oh sure, there’s probably an Animal Style option somewhere, but you don’t need Animal Style to enjoy In ‘n Out, do you?
Lesson is this: Maybe your organization needs a real full-featured SAN & VAR-expertise. But maybe you just need fast, reliable iSCSI that you can hook up yourself.
It’s nice that we in customer-land have that option now.
ASUP & Community: The Autosupport from Nimble has left nothing to be desired; in fact, I think they nag too much. But I’ll take that over a downed array.
I’ve grown to enjoy Connect.Nimble.com, the company’s forum where guys like me can compare notes. Shout out to one awesome Nimble SE named Adam Herbert, who built a perfect signed MPIO PowerShell script that maps your initiators to your targets in no time at all.
And then you get to sit back and watch as MPIO does its thang across all your iSCSI HBAs, producing symmetrical & balanced utilization charts which, in turn, release pleasing little bursts of storage-dopamine in your brain.
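For readers who want the gist without the script: this is not Adam Herbert’s script, just a minimal hedged sketch of what an initiator-to-target MPIO mapping involves on Server 2012 R2, with made-up placeholder IPs for the array portal and the host’s iSCSI vNICs.

```powershell
# Install the MPIO feature and register the array's discovery portal
# (all IP addresses below are placeholders, not real production values)
Install-WindowsFeature Multipath-IO

New-IscsiTargetPortal -TargetPortalAddress '10.0.100.10'

# Connect each discovered target once per initiator vNIC, with MPIO enabled,
# so every path is claimed and persists across reboots
foreach ($initiator in '10.0.100.21', '10.0.100.22') {
    Get-IscsiTarget | Connect-IscsiTarget -IsMultipathEnabled $true `
        -InitiatorPortalAddress $initiator `
        -TargetPortalAddress '10.0.100.10' `
        -IsPersistent $true
}
```

One connection per initiator address is what produces those symmetrical utilization charts: MPIO sees multiple sessions to the same LUN and balances across them.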
It works fine with Hyper-V, CSVs, and Converged Fabric vEthernets: What a mouthful, but it’s true. Zero issues fitting this array into System Center Virtual Machine Manager storage (though it doesn’t have SMI-S support, a “standard” which few seem to have adopted), failing CSVs from one Hyper-V node to another, and resizing CSVs or RDMs live.
And for the convergence fans: I pretty much lost my fear of using vEthernet adapters for iSCSI traffic during the bakeoff and in the Daisetta Lab at home, but in case you needed further convincing that Hyper-V’s converged fabric architecture kicks ass, here it is: Each Hyper-V node in my datacenter has 12 gigabit NICs. Eight of them per host are teamed (that is to say they get the Microsoft Multiplexor driver treatment, LACP-flavor) and then a Converged Virtual switch is built atop the multiplexor driver. From that converged v-switch, I’m dangling six virtual Ethernet adapters per host, two of which are tagged for the Nimble VLAN I built in the 6509.
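If the topology is easier to read as commands than as prose, here’s a hedged sketch of that build; the adapter, team, and switch names plus the VLAN ID are illustrative assumptions, not my actual values.

```powershell
# Team eight gigabit NICs with LACP (the Multiplexor driver treatment)
New-NetLbfoTeam -Name 'ConvergedTeam' `
    -TeamMembers 'NIC1','NIC2','NIC3','NIC4','NIC5','NIC6','NIC7','NIC8' `
    -TeamingMode Lacp -LoadBalancingAlgorithm Dynamic

# Build the converged virtual switch atop the team, with weight-based QoS available
New-VMSwitch -Name 'ConvergedSwitch' -NetAdapterName 'ConvergedTeam' `
    -AllowManagementOS $false -MinimumBandwidthMode Weight

# Dangle host vEthernet adapters off the switch; tag two for the Nimble iSCSI VLAN
foreach ($i in 1..2) {
    Add-VMNetworkAdapter -ManagementOS -Name "iSCSI$i" -SwitchName 'ConvergedSwitch'
    Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "iSCSI$i" -Access -VlanId 100
}
```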
That’s a really long and complicated way of saying that in a modest-sized production environment, I’m using LACP teaming on the hosts, up to 4x1GbE vNics on the VIP guests, and MPIO to the storage, which conventional storage networking wisdom says is a bit like kissing your sister and bragging about it. Maybe it’s harmless (even enjoyable?) once or twice, but sooner or later, you’ll live to regret it. And hey, the Department of Redundancy Department called; they want one of their protocols back.
I’ve read a lot of thoughtful pieces from VMware engineers & colleagues about doing this, but from a Hyper-V perspective, this is supported; from a Nimble array perspective, I’m sure they’d point the finger at this if something went wrong (but it hasn’t); and from my perspective: one converged virtual switch = easy to deploy/templatize, easy to manage & monitor. Case closed.
LACP + MPIO in Hyper-V works so well that in three weeks of recording iSCSI stats, I’ve yet to record a single TCP error/re-transmit or anything that would make me think the old model was better. And I haven’t even applied bandwidth policies on the converged switches yet; that tool is still in my box and right now iSCSI is getting the Hyper-V equivalent of best effort.
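For the curious, the tool still in the box is Hyper-V’s minimum-bandwidth-weight policy. A hedged sketch (adapter names and weights are hypothetical, and it assumes the v-switch was created with -MinimumBandwidthMode Weight):

```powershell
# Guarantee the iSCSI vEthernets a larger share of the converged switch
# under contention; with no contention, they can still burst to full speed
Set-VMNetworkAdapter -ManagementOS -Name 'iSCSI1' -MinimumBandwidthWeight 30
Set-VMNetworkAdapter -ManagementOS -Name 'iSCSI2' -MinimumBandwidthWeight 30
```

Weights are relative shares, not hard caps, which is why leaving it at best effort has been tolerable so far.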
It’s getting faster: Caching is legit. All my monitors and measurements prove it out. Implement your Nimble correctly, and you may see it get faster as time goes on.
And by that I mean don’t tick the “caching” box for every volume. Conserve your resources, develop a strategy and watch it bloom and grow as your iSCSI packets find their way home faster and faster.
The DBA is noticing it too in his latency timers & long running query measurements, but this graph suffices to show caching in action over three weeks in a more exciting way than a select * from slow-ass-tables query:
Least Frequently Used, Most Recently Used, Most Frequently Used….who frequently/recently cares what caching algorithm the CASL architecture is using? A thousand whiteboard sessions conducted by the world’s greatest SE with the world’s greatest schwag gifts couldn’t sell me on this the way my own charts and my precious perfmons do.
My cached Nimble volumes are getting faster baby.
Compression-wise, I’m seeing some things I didn’t expect. Some volumes are compressing up to 40x. Others are barely hitting 1.2x. The performance impact of this is hard to quantify, but from a conservation standpoint, I’m not having to grow volumes very often. It’s a wash with the old dedupe model, save for one thing: I don’t have to schedule compression. That’s the CPU’s job, and for all I know, the Nehalems inside my CS260 are, or should be, redlining as they lz4 my furious iSCSI traffic.
Busy Box & CLI: The Nimble command line in version 1.4x felt familiar to me the first time I used it. I recognized the command structure, the help files and more, and thought it looked like Busy Box.
What’s Busy Box? How to put this without making enemies of The Guys Who Say Vi…Busy Box is a collection of packages, tools, servers and scripts for the unix world, developed back in the mid-1990s by an amazing Unix engineer. It’s very popular, it’s everywhere, and it’s reliable; my only complaint is that it’s a little disconcerting that my Nimble ships with the same package of tools I once installed on my Android handset.
But that’s just the Windows guy talking, a Windows guy who was really fond of his WAFL and misses it but will adapt and holds out hope that OneGet & PowerShell, one day, will emerge victorious over all.
The SSL cert situation is embarrassing, and I’m glad my former boss hasn’t seen it. The situation is this: you can’t replace the stock SSL cert, which, frankly, looks like something I would generate while tooling around with OpenSSL in the lab.
I understand this is fixed in the new 2.x OS version but holy shit what a fail.
Other than that, I’m very pleased -and the organization is very pleased***- with our Nimble array.
It feels like, at last, I’m enjoying the fruits of my labor: I’m riding a high-performance storage array that was cost-effective, easy to install, and is performing at/above expectations. I’m like Major Kong: my array is literally the bomb, man and machine are in harmony, and there’s some joy & euphoria up in the datacenter as my task is complete.
*Remember this lesson #StorageGlory seekers: no one knows your workload like you. The above screenshot of cache hits is of a 400GB SQL transaction log volume of a larger SQL DB that’s in use 24/6. Your mileage may vary.
*** I do not speak for the organization even though I just did.
Last night, whilst tooling around the Daisetta Lab and playing with Nexenta v4.01, I had occasion to pause for a moment, toss a few choice shots back, and reminisce & reflect on the end of Windows XP support, which is today, April 8, 2014.
Ten to twelve years ago, I was a mid-level/tier 2 helpdesk type at a typical Small to Medium Enterprise in California. The Iraq war had just started, George Bush was in office and Palm, maker of the Treo, was king of the bulky, nerds-only smartphone segment. 802.11g was hot stuff, and your 2.4GHz spectrum, while still a junk band in the eyes of the FCC, was actually quite usable. HDMI had just been introduced, plasmas were all the rage, and you didn’t need a bunch of dongle adapters in the boardroom to connect a projector to a laptop. EVDO data was OMG FAST, and Apple was still called Apple Computer. In place of guest Wifi, we had guest Ethernet (“Tell him to use the Red cable. The Red cable!!” I remember shouting to junior guys. The red cable went to the guest internet switch).
At work, life was simpler in IT. I don’t think we even used VLANs at that old job.
Basically, we had physical servers, a DS3 circuit, Windows XP on the clients, and Optiplex GX280s.
Lots of them.
XP, of course, was new then. Only 2-3 years old, it was Microsoft’s most successful operating system in like forever. It united at last the NT kernel & the desktop lineage of Windows 98, and had a nice GUI front-end, soft-touch buttons, a color scheme by Crayola, and a font/typography system that still, to this day, provides fodder for the refined Mac font/typography snobs.
But you could join it to the Domain, use Group Policy against it, and mass-deploy those things like it was going out of style. This was a Big Deal in the enterprise. Ghost & RIS baby!
Hardware-wise, we didn’t worry about iPhone apps, Android vulnerabilities, or the cloud taking our jobs. No, all we had were Optiplexes. Acres of them, seemingly.
Small form factor & desktop-style Optiplex GX280s to be exact. Plastic and ugly, but you could open them up without tools. They were light enough you could carry them around without grabbing a bulky cart, and they offered plenty of surface area for the users to stick pictures of their cat or whatnot on them. Great little machines.
If I’m sounding nostalgic, I am. Getting a bit weepy here.
But then I recall the two straight years of pain. Three years maybe even.
The rise of XP came during the rise of the Internet in the post-dotcom bubble era. Want to get on something called the Internet and do some ecommerce shopping? Have I got an OS & Browser for you: XP + IE 6, or what one might call a Hacker’s Delight.
Oh how many hours were lost learning about, then trying to fix, then throwing up our hands collectively in frustration and saying, “Fuck it. Just RIS/Ghost the damn thing.” in reaction to horror shows like this in the pre-Service Pack 2 days of XP:
And then, coincidentally at the same time, the Optiplex GX280s started failing en masse. Reason? Bad or cheap motherboard capacitors. I shit you not. The capacitors around the old CPU socket just started failing en masse across Dell’s entire GX280 fleet. It was epic: years later, the Times reported some 22% of the 21 million Optiplex machines sold by Dell in 2003-2005 had failed capacitors.
The fix wasn’t difficult; just swap the motherboards. Any help desk monkey could do that. But I remember distinctly how shocked I was that we bought Dell-badged computers but got Packard Bell reliability instead. And I remember boiling resentment and rage against Michael Dell as I walked the halls of that old job, arms stuffed with replacement motherboards.
These were the first episodes of #VendorFail in my IT career. There are many stories like these in IT, but this one is mine. XP Spyware & Optiplex Capacitors were two solid years of my life in IT. I heart Microsoft, but damnnnnn those were some tough days in IT.
Of course, all that being said, today’s desktop is a lot more secure, but our back-end stuff has holes so deep & profound that even experts are shocked. Witness the new Heartbleed OpenSSL vulnerability!
I don’t know if Software Defined Networking is a legitimate thing I should pursue, or just another mine in the IT requisition battlefield I need to be aware of. If it’s something I should pursue, what is the scope, budget, risks, and payoff? And what do I need to buy exactly? With x86 virtualization, it was clear (powerful pizza boxes! Lots of RAM!), with network virtualization…not so much.
I do know this much: the traditional, monolithic & inflexible business WAN causes me pain and suffering.
You know the type, or maybe in your career, you’ve moved on from such environments. But many of us haven’t. Remember this thing? The Business WAN:
Yeah baby. It’s still kicking after all these years…you get yourself some T1s for the branches, 10MegE/100MegE for the tentpole sites, some Cisco routers, OSPF & maybe MPLS to tie it all together with a safe, predictable RFC-1918 ipv4 address scheme and NAT on the ASA edge device. Active Directory is usually built on top with a replication topology matching the sites’ /24s.
And on the seventh day, the young engineer stood back and beheld what he had built, and said, Go forth IT department, and let thy services & value multiply: Exchange, Sharepoint, SMB shares, SANs, QOS policies, print servers, a Squid caching server here, a Peplink there, oh my!
This model is straight out of the 1990s playbook, but it’s still in wide use. In the meantime, a crazy thing happened: the internet came along and, for some inscrutable reason, it’s really popular, accessible and useful, and people like it. You thought your advanced Business WAN was hot stuff, but your users feel it’s positively archaic because they have 20 megabits of bandwidth on their tiny 4G LTE phones, limitless storage & bandwidth via the Dropboxes & Googles of the world, and an internet that’s never under maintenance and never tells them no.
This then, is the problem I want SDN to solve: take the stuff my users need that’s on the business WAN and put it where my users are: on the internet. 443 doesn’t work for everything and while cloud is the ultimate home, I’m looking for baby-steps to the cloud, things I can do today with SDN that are low-risk and potentially high-reward.
What do you do hotshot?
Once upon a time in the Microsoft world, there was a thing called Direct Access. This was a software-defined solution that eased access to corporate resources for users on the internet. Without initiating a VPN connection, your C-level could access that stubborn decade-old UNC path from his laptop anywhere on the internet. IPV6 to the rescue!
But it was somewhat painful to install, especially in multi-domain scenarios, and sadly, only worked on Windows, which was great 10 years ago, but we’re not in a world where the PC is increasing in relevance; we’re in a world where the PC is less relevant by the day.
Enter Pertino, which to the cynical is yet another SDN startup from the Valley, but to me is among the first vendors wearing the SDN badge that actually knows my WAN pain and is building something SDN-related that is eminently practical and immediately applicable.
Pertino bills itself as a Cloud VPN provider, which, I think, doesn’t do it justice. VPN calls to mind dial-up…remote users connecting to your LAN. Pertino is sort of the opposite: this bit of tech allows you to extend your WAN/LAN into the cloud effortlessly.
I’m pretty jazzed on it because I think Pertino, or something like it, could be the next evolution in business WAN networking, especially if your organization is cloud-cautious.
So What is it?
Pertino is essentially an ipv4 & ipv6 overlay technology employing NVGRE-like encapsulation dubbed “Overpass” that works with your existing on-prem equipment and securely extends your Layer 2/Layer 3 LAN assets to the place where your users are: the internet.
It’s so simple too. All you need is a modest 16 megabyte application that, to your users, will remain largely invisible. Once installed, Pertino sits quietly in the Windows system tray or in the background on Android and just generally stays out of the way, which makes it about 10x better than dial-up style VPNs of yesteryear.
While that application is low drama, what’s happening behind the scenes is some serious high stakes vKung-Fu, involving on-demand VMs, virtual switches, control & data planes and encapsulation.
On the Windows side, Pertino creates a simple virtual network interface, hooks onto your existing internet connection and begins a session with a Pertino virtual machine in a datacenter somewhere, in theory, close to your device.
All traffic on that vif is encapsulated via the NVGRE-like Overpass, and your Windows client or Android handset is assigned both an ipv4 & ipv6 address. And just like that, you have what is in effect a fully switched LAN on the internet, to the point where an arp lookup sees the MAC addresses of other Pertino-enabled devices, wherever they are.
Just think about that for a second. In years past, you’d have to call up your provider and order an exotic Virtual Private Wire Service to extend Layer 2 across a Layer 3 link if you wanted to expose your MAC addresses to the remote end.
Now I’m doing in effect the same thing with a simple Windows application. And I didn’t have to hire a consultant or mess around with NAT or the ASA, which is both comforting in that I like my security blanket, yet terrifying at the same time because it’s ipv6ing its way over my firewall. Pertino is essentially giving me a layer 2 switch….in the cloud.
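You can see this for yourself from any Pertino-joined Windows box; a hedged sketch follows, where the interface alias is a guess at what the Pertino vif is called on your machine.

```powershell
# List neighbor (ARP) entries learned on the Pertino virtual interface --
# remote Pertino clients show up with real MAC addresses, as if on a local switch
Get-NetNeighbor -InterfaceAlias 'Pertino*' -AddressFamily IPv4 |
    Where-Object { $_.State -eq 'Reachable' } |
    Select-Object IPAddress, LinkLayerAddress
```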
In the Layer 3 space, your ipv4 address isn’t RFC-1918, but it’s not routable either. You can have any ipv4 address you like as long as it’s 50.203.x.x. Pertino is using Overpass encapsulation here to isolate tenants & customers from each other, reminding me of the way Microsoft uses NVGRE in Hyper-V & Azure.
After installing Pertino, you’re going to look at the output of ipconfig or ifconfig and say, “Wait. They gave me an entire /24?” Indeed, it seems you have a /24 but it’s not routable or unique. Another Pertino customer is probably getting the same /24. That’s what’s cool about encapsulation.
On your ipv6 Pertino network, things are a little more hazy. I think ipv6 is also NVGRE-encapsulated and perhaps all customers share the same /64, but while I’m a willing conscript in Tom Hollingsworth’s army of ipv6 boosters, I’m not smart enough to speak to how it works with Pertino. I know this: I can’t ping that ipv6 address from a non-Pertino client, yet the address differs substantially from the one my Windows 8.1 & 2012 R2 clients assign my NIC, which I can’t ping either.
So whatever man. It’s ipv6 and it’s mysterious.
What can you do with this?
Last fall I was hot on Pertino because I envisioned using it at scale in a modern business WAN: imagine being able to kill your expensive, continent-hopping MPLS network without having to revamp your entire infrastructure.
I’m not sure Pertino could do that, but still: the mind races.
As much as I hate to see it because I think it encourages bad behavior (printing), you can do print servers over this.
I’ve been using Pertino in the Daisetta Lab and quietly at work for about five months. With it, I’ve done this:
Built a Domain Controller in AWS somewhere in Virginia, joined & promoted it as a DC alongside the DC in my Daisetta Lab
SMB (CIFS to the non Microsoft crowd) shares via common UNC paths
Remote desktop, ssh
Mounted LUN in remote datacenter via iSCSI and MS iSCSI Initiator and ran IOMETER over Pertino
So you’ve got your fancy Pertino adapters deployed to laptops, mobile phones, iPads, and certain strategic servers; you’re living the SDN dream with only a modest OpEx spend and no rip & replace; and your users can finally access CIFS, Sharepoint, and other internal resources from whatever internet connection they have.
How’s this baby perform?
A couple of measurements:
Test Type     | Details                                      | Time                 | Subjective Feeling
SMB file copy | Copied 104MB file from remote site           | 10 min 3 s           | Felt slow, but it was evening hours at home
SMB file copy | Copied 95MB of random files from remote site | 3 min 46 s           | Felt much faster; speed varied 400KB/s to 1MB/s
Latency tests | Simple pings to remote Pertino clients       | 90ms min / 300ms max | Variable, but mostly similar to standard VPN performance
RDP           | 2560×1440 remote desktop session             | Connected in ~10 s   | Better than expected; artifacting and compression minor
There’s room for improvement here, but I’m on the free tier of Pertino service. The company offers some enhancements for paying customers, but Pertino’s not something you deploy as Tier 1 infrastructure; this is better used to fill gaps between your infrastructure and your users, and as such, I think the performance is acceptable.
It’s at least as fast as what my users call the “stupid VPN,” so what’s not to like?
I’ve been using Pertino now for almost five months. I’d give them an A- for reliability.
I’ve been trying to push this review out for months, but it’s so easy to forget Pertino’s there. 99.9% of the time, it’s invisible, just running in the background, connecting me seamlessly to whatever remote device I need to access.
There have been only two times the network failed me. Once, briefly in January I couldn’t RDP into the home network, and then, last week, there was an hours-long outage affecting many Pertino customers.
Credit to Pertino here: the same day they blogged about the outage, its cause and promised to make it better. Essentially a Pertino datacenter went offline, which they can recover from, but the resulting failover process snowballed and a widespread outage resulted:
On the afternoon of April 1st, there was a network outage between a cluster of data plane v-switches and the control plane, which was located in a different cloud datacenter. The disruption was brief, but lasted long enough for the control plane to consider those v-switches at risk. So, the control plane began migrating customers to other v-switches.
However, due to a new bug in the data plane, too many messages were sent from the data plane v-switches to the control plane, increasingly loading it with requests.
It’s been so reliable that after carefully considering the options, I had no problem recommending Pertino to my own parent partition, dad, a radio engineer by trade & reluctant IT consultant, as he often needs to connect small, distant radio stations to each other over IP. Usually he purchases a Zywall appliance, connects the sites together via VPN and with LogMeIn, he can remotely support these small operations.
Pertino is an obvious fit for scenarios like that as well, and it’s probably cheaper.
Back when I first was testing Pertino, they allowed you to install the package on up to three devices for free. It looks like that plan is gone now, but Pertino still offers a free account for IT pros to sink their teeth into: with it, you can add Pertino on up to three devices to see how this works.
After that, the base level pricing for Pertino is $29/month for up to 10 devices. It scales from there, but only modestly: enterprise packaging starts at 40+ devices and you have to contact them to get pricing.
One gets the feeling that perhaps this is aimed at really small SMB markets, but I’m not so sure. If you have 1500+ objects in Active Directory, you surely don’t need Pertino on all of them. Just certain strategic, edge-focused & secured ones: a Domain Controller & SMB server here, a few executive or important mobile user laptops there. You get the idea.
Up to 40 devices can be connected through Pertino for about $90 per month.
All in all, I’ve been pretty impressed with this kit. It’s at once a practical way to get your on-prem services out to your users, dip your toes into some ipv6 waters even if you’re not engineering it yourself, and leverage some real software defined networking (insert definition of that here) in a safe, low-risk way.
In fact, I think you should help me test it a bit more. If you’re a fellow tech blogger with a lab at home and you’re interested in this or suffer from WAN pains too, let’s link up: The Supervisor Module spouse isn’t interested in becoming a test user on my Pertino network, but if you are, shoot me an email. I can add you as a user on my Pertino network, you can join a VM to my Pertino switch, and we can have a sort of home lab Apollo-Soyuz moment and learn something together.
Crazy timing, but within minutes of me posting my review of Pertino’s CloudVPN tech yesterday, Scott Lowe, a well-known rockstar virtualization blogger weighed in with his views of Pertino on Twitter:
The concept behind @PertinoNetworks is cool, but I’m not terribly impressed w/ the implementation thus far. A bit too simplistic, I think.
I hope I don’t appear to be a Pertino shill and no disrespect to Scott intended, but I don’t think there’s anything “simplistic” about what is in effect a layer 2 switch in the cloud.
Okay, let’s suppose it’s a dumb switch.
Still. A dumb switch…in the cloud is something much more than a dumb switch in your rack.
Maybe I’m just easily impressed and Scott should show me how simplistic it is by joining my Pertino network. 😀
When I first signed up for the 3-device free Pertino service in the fall, a customer agent reached out to see if I was having a good experience. I relayed my experiences, linked to this blog, and Pertino executives took notice. They offered me the use of up to 9 devices for free on my Pertino network if I would post an unedited & unreviewed blog about my experiences with the product. No other compensation was given, and no Pertino employees viewed the content of this post prior to its publication.
Do you remember this guy from the Maxell tape commercials?
Throw out the old cathode ray tube and insert a few 24″ LCD panels, and that’s been me for the last 48 hours as I’ve absorbed the news from #BUILD2014, Microsoft’s big developer conference and the first one under Captain Satya “Fearless” Nadella.
All the consumer stuff is great and exciting and on a personal level, I think Windows Phone 8.1 finally has reached feature parity with iOS & Android and is thus a potential handset for me some three years after I gave the laughable Windows Phone 7 a chance.
But the biggest news is *nix & Windows, friends at last. Maybe.
Where’s Microsoft headed? Agnostic Computing land. Just look at these crazy developments:
Jeffrey Snover & a declarative Windows/Azure in WMF 5: Snover, who I’ve written about before, is the father of PowerShell & Desired State Configuration, Microsoft’s document-based attempt to simplify deployment. Basically it’s a Redmond-flavored Puppet, but on a deeper level, and as Snover pointed out in an interview last year, it’s a declarative framework for Windows, which marks a subtle change in focus for our old API-focused OS. Windows is/will soon be document-based like Linux, which means something substantive to programmer types and something practical to guys like me who are tired of SCCM crashing.
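To make “declarative” concrete, here’s a minimal DSC sketch (WMF 4+; the node name and feature are hypothetical): you describe the end state in a document, compile it to a .mof, and the Local Configuration Manager converges the box to it.

```powershell
# Declare what the server should look like, not the steps to get there
Configuration WebBaseline {
    Node 'SERVER01' {
        WindowsFeature IIS {
            Name   = 'Web-Server'
            Ensure = 'Present'
        }
    }
}

WebBaseline -OutputPath 'C:\DSC'                     # compiles the declaration to a .mof document
Start-DscConfiguration -Path 'C:\DSC' -Wait -Verbose # converges the node to the declared state
```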
Xamarin: Microsoft is really embracing Xamarin, Miguel de Icaza’s firm that produces open source tools for .net. If anyone deserves a bit of fame & fortune, it’s probably de Icaza, as he’s been a .net/C# supporter for a long time in a space and among a crowd that hung pictures of Bill Gates as Locutus of Borg. I mean, imagine trying to scratch out a living wedged between giant ecosystems and their associated historical baggage & dogma. That’s where de Icaza has been, and I’m eager to see the fruits of the new relationship.
Microsoft open sources stuff: Suddenly Microsoft is interesting again, says WaPo, of the blizzard of open source announcements at #BUILD. The .net compilers, WinJS, hell, they’ve even open sourced early Word. “Microsoft is trying to be your friend, and it may actually win you over,” WIRED swoons.
To me, the significant news out of BUILD is further proof that Nadella’s got his priorities straight, that Microsoft’s no longer afraid to shed some of the legacy products & philosophies that have held it back in mobile, cloud and elsewhere.
The line between the open source world & Windows used to be really sharp, fine, and narrow, with combatants clearly staked out on each side.
After BUILD 2014, it seems a whole lot more fuzzy, and I think that’s a great thing in IT and in the consumer space. You can almost have best of breed & one throat to choke simultaneously!
One last cool news bit from BUILD: Microsoft’s network virtualization solution to date has amounted to NVGRE, a packet encapsulation solution that few engineers outside of Azure seem to care about, use, or make products for. It’s only available if you’re a System Center customer and, frankly, it seems more trouble than it’s worth. I don’t have multiple tenants in my datacenter, I have one: my employer.
Meanwhile, OpenDaylight and all the VMware SDN products & frameworks are gaining momentum. And you can experiment with that stuff for free. Cumulus Networks sells a Linux-powered switch, some companies are pushing API-based traffic management and SDN feels like it’s real & tangible.
As you’ll recall from part 1, much of my time at work lately has been consumed by planning, testing and executing mass Storage Live Migration of 65+ .vhdx files from our old filer (built by a company that rhymes with PetTap) & its end-of-life 7200 RPM shelves to our new hotness, a Nimble CS260.
Essentially I’ve been a sort of IT Moses, planning the Exodus of my .vhdxs out of harsh, intolerably slow conditions, through some hazards & risks (Storage Migration during production), and into the promised land of speed, 1-5ms latency (rather than 20ms+!!), and user happiness.
Now VMware guys have a ton of really awesome tools to vMotion their .vmdks around their vCenters & vSpheres and their ESXis and now their VSANS too. They can tune their NFS for ludicrous speed and their Distributed Switches, now with Extra Power LACP support, can speak CDP & even tell you what physical port the SFP+ 10GbE adapter is plugged into.
Do I sound envious? Green with it even? Well I am.
In comparison, I got me some System Center Virtual Machine Manager (2012 SP1), the Microsoft Failover Cluster mmc snap-in, a 6509e with two awesome x6748 performance blades (February 2008’s centerpiece switch mod in the popular “Hot Switch Racks” pin-up calendar), Hyper-V’s converged fabric design, 8x1GbE LACP teams, ManageEngine’s NetFlow analyzer, boat-loads of enthusiasm and a git ‘r done attitude.
And this is what I’ve got to git done:
And it has to be done with zero downtime because we have a 24/6 operational tempo, and I like my Saturdays.
One of my main worries as I’ve tried to quarterback this transition has been the switch. Recall from part 1 how I’m oversubscribed to hell & back on my two 6748s:
I fear the harsh judgment of my networking peers (You’re doing that with that?!?!) so let me just get it out there: yes, I’m essentially using my 6509 & these two blades as a bus for storage IO. In fact, iSCSI traffic accounts for about 90% of all traffic on the switch in any given 24 hour period:
Perhaps I’m doing things with this switch & with iSCSI that no sane engineer would, but I have to say, this has proven to be pretty durable and adequate as far as performance goes. Would I like some refreshing multi-channel SMB 3 file storage, some relief from the block IO blues & Microsoft clustering headaches? Yes of course, but I’ve got to shepherd the VMs to their new home first.
And to do that, I’ve got to master what this switch is doing on an hour by hour basis as my users log in around the clock.
So I pulled some Netflow data together, cracked my knuckles, and got busy with Excel and Pivot tables.
I’m glad I went through this exercise because it changed the .vhdx parade route & timing. What I thought was the busiest part of my infrastructure’s day was wrong, by a large factor. Here’s 8 days worth of Netflow traffic on the iSCSI & Live Migration VLANs, averaged out. Few live migrations were made during this period:
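The Excel pivot-table roll-up is simple enough to script, too. Here’s a minimal sketch, assuming the NetFlow analyzer can export per-flow records to CSV with hypothetical start_time and bytes columns (ManageEngine’s actual export format will differ):

```python
import csv
from collections import defaultdict
from datetime import datetime

def bytes_per_hour(flow_csv_path):
    """Sum per-flow byte counts into hourly buckets -- the same
    roll-up the pivot table produced.

    Assumes columns named 'start_time' and 'bytes'; adjust to
    whatever your NetFlow export actually calls them.
    """
    buckets = defaultdict(int)
    with open(flow_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            ts = datetime.strptime(row["start_time"], "%Y-%m-%d %H:%M:%S")
            # Truncate the timestamp to the top of the hour
            buckets[ts.replace(minute=0, second=0)] += int(row["bytes"])
    return dict(buckets)
```

Filter the export to the iSCSI & Live Migration VLANs first and the hourly buckets fall right out.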
What you see here are the three login storms (times on the graph are MST; they start early down under) as my EU, North America & Australia/New Zealand users log in to their session virtualization VMs, hit the production SQL databases, or run their reports.
I always thought the EU punched my stack the hardest; our offices there have as many employees as North America, but they’re spread across only one or two time zones rather than three.
But EU & North America and Australia combined don’t hit my switch fabric as hard as I do. Yes, the monkey on my back is…me. Well, me & the DBA and his incurable devotion to SQL backups in the evening. My crutch is DPM.
I won’t go into too much detail here but this was pretty surprising. At times over the eight days, Netflow would record more than 1 billion packets traversing the switch in one evening hour; the peak “payload” was north of 1 terabyte of iSCSI bytes/hour on some days!
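Those peaks translate into respectable sustained rates. A quick back-of-the-envelope, using the round numbers above:

```python
# Convert the Netflow hourly peaks into sustained averages.
payload_bytes = 1e12            # ~1 TB of iSCSI payload in the busiest hour
packets = 1e9                   # ~1 billion packets in the same hour
seconds_per_hour = 3600

avg_gbps = payload_bytes * 8 / seconds_per_hour / 1e9
avg_pps = packets / seconds_per_hour
avg_packet_size = payload_bytes / packets

print(round(avg_gbps, 1))       # ~2.2 Gb/s sustained across the whole hour
print(round(avg_pps))           # ~277,778 packets/sec
print(round(avg_packet_size))   # ~1,000 bytes/packet -- plausible for iSCSI
```

Roughly 2.2 Gb/s sustained for an hour, before any Storage Live Migration traffic gets layered on top.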
Now I’m not a networking guy (though I do love Wifi experimenting), but what I saw here concerned me, gave me pause. I’ve supposedly got a 40 Gigabit/s backplane between each switch blade and my Supervisor 720 modules, but is that real 40Gbit/s or marketing 40Gbit/s?
The key question: Am I stressing this 6509e, or does it have more to give?
Show fabric utilization detail said I was consuming only 20% of the switch fabric during my exploratory storage migrations, and that was at peak. 4 gigabit/second per port group.
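That number squares with the architecture, assuming (my assumption, not TAC’s) that each dual-fabric-connected x6748 gets two 20 Gb/s channels into the Sup720 fabric, which is where the advertised 40 Gb/s per slot comes from:

```python
# Sanity-check "show fabric utilization detail" against the blade specs.
channels_per_blade = 2          # dual fabric-connected x6748 (assumption)
gbps_per_channel = 20           # 2 x 20 Gb/s = the advertised 40 Gb/s/slot
peak_utilization = 0.20         # peak reported during test migrations

observed_gbps = gbps_per_channel * peak_utilization
print(observed_gbps)            # 4.0 Gb/s per port group, matching the output

headroom_gbps = gbps_per_channel - observed_gbps
print(headroom_gbps)            # 16 Gb/s per channel still on the table
```

So the 20% figure and the 4 Gb/s figure are the same observation in two units, and there’s real headroom left.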
Is that all you got? the 6509e seemingly taunted me.
But oh my stars, better check the buffers:
ACK! One dropped buffer call or whatever you call it, 7 weeks ago, way before I had the Nimble in place. Still….that’s one drop too many for me.
Stop pushing me so hard, the 6509e pleaded with me.
So I did what any self-respecting & fearful admin would do: call TAC. Show me the way home TAC, get me out of the fix I’m in or at least soothe my worry. Tell me I have nothing to worry about, or tell me I need to buy a Supe 2T to do what I want to do, just give me some certainty!
A few show tech-supports, one WebEx session and one phone call with a nice engineer from Costa Rica later, I had some certainty. The config was sound, and the single buffer drop was concerning but wasn’t repeating, even when I thought I was stressing the switch.
And I didn’t need to buy a Supe 2T.
On to the Exodus/.vhdx parade.
In all my fretting about the switch, I was forgetting one thing: the feeble filer is old and slow and can’t push that much IO to the Nimble in the first place.
As best I can figure it, I can do about five storage live migrations simultaneously. Beyond that, I fear that LUNs will drop on the filer.
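Enforcing that ceiling is straightforward to script. Here’s a hypothetical sketch of the parade logic; the migrate function is a placeholder (the real work would be kicked off through Move-VMStorage or SCVMM, not Python), but the point is that the pool never lets more than five migrations hit the filer at once:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 5   # past this, I fear the old filer starts dropping LUNs

def migrate(vhdx):
    # Placeholder for the real storage live migration call
    # (e.g. invoking Move-VMStorage via PowerShell remoting).
    return f"migrated {vhdx}"

def run_parade(vhdxs):
    # The executor caps in-flight work, so no more than five .vhdx
    # files are ever moving off the filer simultaneously; the rest
    # of the queue waits for a worker to free up.
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        return list(pool.map(migrate, vhdxs))
```

Feed it the whole 65+ file list and it drains the queue five at a time, in order.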
To the Nimble, it’s no sweat:
Netflow’s view from the same period:
Love it when a plan comes together! I should have this complete within a few days, then I can safely upgrade my filer.