Defending IT amidst the novel WannaCry worm

It’s been a hell of a few days here in the trenches of Information Technology in 2017. Where to begin?

Between explaining how this all works to concerned friends & family, answering my employer’s questions about our patching posture & status, and reading the news & analysis, I think it’s safe to say that WCry has been in my thoughts for every one of the last 72 hours, including the 24 hours of Mother’s Day and all the hours I spent in restless slumber.

Yes, that’s right. WCry was on my mind even as I celebrated Mother’s Day for the three women in my life who are mothers. Wow. Just wow.

Having had the chance to catch my breath, I’ve got some informed observations about this global incident from my perspective as an IT Pro. Why is WCry as interesting & novel as it is potent and effective in 2017? And is there any defense one might make of an IT team whose organization got pwned by WCry?

I contemplate both questions below.

WCry successfully chains a social engineering attack with a technical exploit resulting in automated organization pwnage
WCry begins as a social engineering/phishing attack on users in the place they love and hate by equal measure: their Inbox. Using Subject lines that draw the eye, the messages include malicious attachments. This facet of WCry is not new, of course; it’s been routine in IT for at least two decades.

How WannaCry works

Once the attachment is clicked, WCry pivots, unleashing an NSA-built cyberweapon upon the enterprise: it scans port 445 across the local /24, cycles through cached RDP accounts, and pays special attention to SQL & Exchange services, presumably to price the ransom accordingly.

Then it encrypts. Nearly everything.

All of this from a single email opened by a gullible user.

This behavior -socially engineered attack on the human meatbag + scan + pivot to the rest of the network- is also not novel, new or remarkable. In fact, security pros call this behavior “moving laterally” through an enterprise, and they usually talk about it being done from a “jump box” or “beachhead” that’s been compromised via social engineering. Typically, security pros reserve those terms to describe the behavior of a skilled & hostile hacker meatbag intent on pwning a targeted organization.

Where WCry is novel is that it in effect automates the hacker out of the picture, making the whole org-pwnage process far more efficient. This is organization-crippling, self-replicating malware at scale. Think Sony Pictures 2014, applied everywhere automatically, minus the North Korean hacker units at the keyboard.
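From the defender’s side, that first move -the port 445 sweep across the local /24- is easy to picture with a minimal Powershell sketch. The subnet and timeout below are placeholders for your own network, not anything pulled from the worm itself:

```powershell
# Sweep the local /24 for hosts answering on TCP 445 -- the same port WCry hunts.
# Subnet and timeout are hypothetical; adjust for your own environment.
$subnet  = '10.0.10'
$timeout = 200   # milliseconds per probe

1..254 | ForEach-Object {
    $ip     = "$subnet.$_"
    $client = New-Object System.Net.Sockets.TcpClient
    $async  = $client.BeginConnect($ip, 445, $null, $null)
    if ($async.AsyncWaitHandle.WaitOne($timeout) -and $client.Connected) {
        [pscustomobject]@{ Host = $ip; Port445 = 'Open' }   # a host the worm could reach
    }
    $client.Close()
}
```

Anything that shows up in that list and hasn’t seen MS17-010 is exactly the kind of neighbor WCry was built to find.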

 

The red Wcry “Ooops” message is both informative and visually impressive, which multiplies its influence beyond its victims
As these things go, I couldn’t help but be impressed with WCry’s incredibly detailed and anxiety-inducing UI announcing a host’s infection:

This image, or some variant thereof, has appeared on everything from train station arrival/departure boards to manufacturing-floor PCs to hospital MRIs to good old-fashioned desktop PCs in Russia’s Interior Ministry. The psychological effects of seeing this image on infected hardware, then seeing it again on social media, the evening news, and newspapers around the world over the last few days are hard to measure, but I know this: it had an effect on normal consumers and users of technology across the globe. Sitting on my lap Saturday, my four-year-old saw the image in my personal OneNote pastebin and asked me, “Daddy, is that an alarm? Why does it show a lock? Do you have key?”

What’s interesting is that while computer users saw this image (or a screensaver version of it), in reality you could click past it or minimize it. Yet images of this application have proliferated on Twitter, FaceTube and elsewhere. Ransomware used to just announce itself in the root of your file share or your c:\users\username\documents folder; now it poses for screen caps and cell phone pics, which multiplies its effectiveness as a PsyOps weapon. By Saturday I was reading multiple articles in my iPad’s Apple News about how regular people could protect themselves from the ‘global cyberattack.’

Its function is not just about encrypting file shares like earlier ransomware campaigns, but about owning Enterprises
If my organization or any organization I was advising got hit by WCry, my gut feeling is that I wouldn’t feel secure about my Forest/Domain integrity until I burned it down and started over. Why? Well, big IT security organizations like Verizon’s Enterprise Security group typically don’t classify ransomware as a ‘data breach’ event. Yet, as we know, WCry installs the DoublePulsar backdoor, which enables persistent access in the future. This feels like a very effective escalation of what it means to be ransomed in modern IT organizations, so yeah, I wouldn’t feel secure until our forest/domain was burned to the ground.

It is the manifestation of a Snoverism: Today’s nation-state cyberweapon is tomorrow’s script-kiddie attack
I was listening to the father of Powershell, Jeff Snover, once and he implanted yet another Snoverism in my brain. He said, paraphrasing here, that today’s nation-state attack is tomorrow’s script-kiddie attack. What the what?

Jeff Snover, speaker of wisdom

Let’s unpack: the democratization of technology, the shift to agile, DevOps, and other development disciplines, along with infrastructure automation, has led to a lot of great things being developed, released and consumed by users very quickly. In the consumer world this has been great -Alexa is always improving with new skills, Apple can release security patches rapidly, and FaceTube can instantly perform A/B testing on billions of people simultaneously. But not well understood by many is the fact that enterprises and even individuals can harness these same tools and techniques to instantly build and operate data systems globally, to get their product, whatever it may be, to market faster. The classic example of this is Shadow IT, wherein someone in your finance team purchases a few seats on Salesforce to get around the slow & plodding IT team.

I think Snover was observing that bad guys get the same benefits from modern technology techniques & the cloud as consumers and business users do.

And as I write this on Monday, what are we seeing? WCry is posted on GitHub and new variants are being created without the kill-switch/sandbox-detection domain. EternalBlue, the component of WCry that exploits SMBv1, was, literally just a few months ago, a specialized tool in the NSA’s cyber weapons arsenal. By tomorrow it will be available to any kid who wants it, or, even worse, as a push-button, turn-key service anybody can employ against anybody else.
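The durable fix for EternalBlue is Microsoft’s MS17-010 patch, but it’s also worth knowing where SMBv1 is still switched on in your shop. A minimal sketch using the in-box SMB cmdlets on Server 2012 R2 / Windows 8.1 and later (test before disabling; ancient copiers and XP-era clients still speak SMBv1):

```powershell
# Is the legacy SMBv1 server still enabled on this box?
Get-SmbServerConfiguration | Select-Object EnableSMB1Protocol

# Turn it off once you've confirmed nothing you care about still needs it
Set-SmbServerConfiguration -EnableSMB1Protocol $false -Force

# On client SKUs, the optional feature can be removed outright
Disable-WindowsOptionalFeature -Online -FeatureName SMB1Protocol
```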

The democratization of technology means that no elite or special knowledge, techniques or tools are required to harness technology to some end. All you need is motive and motivation to do things at scale. This week, we learned that the democratization of technology is a huge double-edged sword.

It was blunted by a clever researcher for about $11
Again on the democratization of technology front, I find it fascinating that MalwareTech was able to blunt this attack by spending about $11 of his own money to purchase the domain he found encoded in the output of his decompile. He’s the best example of what a can-do technologist can do, given the right tools and the freedom to pursue his craft.

It has laid bare the heavy costs of technical debt for which there is no obvious solution
Technical debt is a term used in software engineering circles and computer science curricula, but I also think it can and should apply to infrastructure thinking. What’s technical debt? Take it away Wikipedia:

Technical Debt is a metaphor referring to the eventual consequences of poor system design, software architecture, or software development within a codebase. The debt can be thought of as work that needs to be done before a particular job can be considered proper or complete. If the debt is not repaid, then it will keep on accumulating interest, making it hard to implement changes later on.

I can’t tell you how many times and at how many organizations I’ve seen this play out. Technical Debt, from an IT Pro’s perspective, can be the refusal to correct a misconfiguration of an important device upon which many services are dependent, or it can be a poorly-designed security regime that takes bad practice and cements it into formal process & habit, or it can be a refusal to give IT the necessary political cover & power to change bad practices or bad design into something durable and agile, or it can be refusing to patch your systems out of fear or a desire to kick the can down the road a bit.
Over time, efforts will be made to pay that technical debt down, but unless a conscious, consistent effort is made to keep it low, technical debt eventually -inevitably- becomes just as crippling to an organization as credit card debt becomes to a consumer. Changes to IT systems that are routine & easy in other organizations become hard in yours; and changes that are merely hard elsewhere become close to impossible.

This is a really bad place to be for an IT Pro, and now WCry has made it even worse by exploiting organizations that carry high technical debt, particularly as it relates to patching. Indeed, it’s almost as if the author of this malware understood at a fundamental level how much technical debt real-world organizations carry.

There is no obvious solution to this. We can’t force people to use technology a certain way, or even to think of technology in a certain way. The point of going into business is to make money, not to build durable, secure & flexible technology systems, unless that is your business. Cloud services are the obvious answer, but they can’t do things like run MRI machines or interface with robots on the Nissan assembly line. At least not yet. And nobody wants regulation, but that’s a topic for another post.

It has shown how hard it is to maintain & patch systems that are in use for more than a typical workday
If we ignore the way WCry rampaged through Russia, China and other places where properly licensing your software is considered optional, something else interesting emerges: the organizations hit hardest by WCry were ones in which technology is in use well beyond the standard 8-hour workday, which makes patching those systems all the more difficult.

While reporting on the NHS fiasco has zoomed in on the fact that the UK’s healthcare system had Windows XP widely deployed, I don’t think that tells the whole story; even if 100% of NHS systems ran XP, patching in such environments would still be difficult simply because of how heavily those systems are used. Hospitals and even out-patient facilities typically operate more than 8 hours a day; finding a window in a given 24-hour period in which you can, with the hospital’s consent, take healthcare devices like MRI machines offline to update & reboot them is far harder than it is in a company where systems only need to be up between 7am and 6pm, for instance.

On and on down the list of WCry’s corporate victims this pattern continues:

  • Nissan: factory control machines were infected with WCry. How easy is it to patch these systems amid what is surely a fast-paced, multi-shift, high-volume operating tempo?
  • German train system: literally the computers that make the trains run on time were hit by WCry. Trains operate far more than 8 hours a day, making the systems behind them difficult to patch
  • Telefonica & Portugal Telecom: more infrastructure companies that operate beyond a standard 8-hour day and got hit by WCry

I know banks & universities were hit as well, but they’re the exception that points at the emerging rule: security is hard enough in an 8-hour-a-day organization. It’s extra, extra hard when half, or even two-thirds, of every 24-hour day is off-limits for patching. Without well-understood processes, buy-in and support from management, and discipline and focus on the part of a talented IT team, such high-tempo operating environments will inevitably fall behind the security curve and be preyed upon by WCry and its successors.

It has demonstrated dramatically the perpetual tension between uptime, security and the incentives thereof for IT
This is similar to the patching-is-hard-in-high-tempo-organizations claim, but focuses on IT incentives. For the first 20 or 30 years of Information Technology, our collective goal and mission in life was to create, build and maintain business systems that have as much uptime as possible. We call this ‘9s’ as in, “how many ya got?!?”, and it’s about the only useful objective measure by which management continues to sign our checks.

Here, I’ll show you how it works:

IT Pro # 1: I got five 9s of uptime this month, that’s less than 26 seconds of unplanned downtime!

IT Pro #2: Still doesn’t touch my record in March of 2015, where I had six 9s (2.59 seconds of downtime) for this service!
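If you want to check our bragging, the downtime budget behind each of those claims is simple arithmetic. A quick sketch, assuming a 30-day month:

```powershell
# Unplanned downtime allowed per 30-day month at each level of "nines"
$secondsInMonth = 30 * 24 * 3600   # 2,592,000 seconds

foreach ($nines in 3..6) {
    $allowed = $secondsInMonth * [math]::Pow(10, -$nines)   # downtime fraction is 10^-nines
    '{0} nines -> {1:N2} seconds of unplanned downtime allowed' -f $nines, $allowed
}
# five nines works out to ~25.92 seconds; six nines to ~2.59 seconds
```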

Uptime is our raison d’etre, the thing we get paid to deliver the most. We do not get paid, in general, to practice our craft the right way, or the best practice way, per se. We certainly do not get paid to guard against science-fiction tales of security threats involving cyber-weapon worms that encrypt all our data.

We are paid to keep things up and running because, at the end of the day, we’re a cost center in the business. It takes a rare and unique and charismatic manager with support from the business to change that mindset, to get an organization beyond a place where it merely views IT as a cost-center and a place to call when things that are supposed to be up are down.

And that’s part of the reason why Wcry was so effective around the globe.

It has spawned a bunch of ignorant commentary from non-technical people who are outraged at Microsoft

Zeynep Tufekci, an outstanding scholar studying the impact of technology on society, wrote a piece in the NYT this weekend that had my blood boiling. Effectively, she blames Microsoft and incompetent IT teams for this mess:

First, companies like Microsoft should discard the idea that they can abandon people using older software. The money they made from these customers hasn’t expired; neither has their responsibility to fix defects. Besides, Microsoft is sitting on a cash hoard estimated at more than $100 billion (the result of how little tax modern corporations pay and how profitable it is to sell a dominant operating system under monopolistic dynamics with no liability for defects).

This is absurd on its face. She’s essentially arguing that software manufacturers should warranty their software forever. She continues:

For example, Chromebooks and Apple’s iOS are structurally much more secure because they were designed from the ground up with security in mind, unlike Microsoft’s operating systems.

Tufekci, whom I really like and enjoy reading, is trolling us. 93% of Google’s handsets don’t run the latest version of Google’s OS, which means many people -close to a billion by my count- are, through no fault of their own, carrying around devices that aren’t up to date. Should they be supported forever too? And Apple’s iPhone, as much as I love it, can’t run an assembly line that manufactures cars, never mind coordinate an MRI machine.

Rubbish. Disappointed she wrote this.

For all the reasons above, Wcry is not the fault of Microsoft any more than it’s the fault of the element Copper. If anything, the fault for this lies in the way we think about and use technology as businesses and as individuals. Certainly, IT shares some of the blame in these organizations, but there are mitigating factors as I spoke about above.

Mostly, I lay the blame at the NSA for losing these damned things in the first place. If they can’t keep things secure, what hope do most IT shops have?

It has inspired at least one headline writer to say your data is safer with FaceTube than with your hospital
Again, more rubbish and uninformed nonsense from the normals. Sure, my data might be safer from third party hackers if I were to house it inside FaceTube, but then again, adtech companies might just buy that same dataset, anonymized, connect dots from that set to my online behavior dataset, and figure out who I really am. That’s FaceTube’s business, after all!

Find Office problems before they find you with Telemetry server

I’ve not always had a bromance with Microsoft’s Office suite. I cut my word processing teeth on WordPerfect 5.1, did most of my undergrad papers in BeOS’ lone productivity suite ((GoBe Productive, still the best Office suite name)), and touch-typed my way to graduating cum laude in grad school with countless Turabian-style Google Docs papers.

Office?

That was for corporate suits, man. Rich corporate suits.

But all that’s ancient history. Or maybe I’ve become a suit. Either way, I’m loving Office today.

In 2015, Office has transformed into the ultimate agnostic git ‘r done productivity package. It’s free to use in many cases, but if you want to ‘own’ it, you can subscribe to it, just like HBO ((For the IT Pro, this is a huge advantage, as a cheap E-class sub gives you access to your own Exchange instance, your own Sharepoint server, and your own Office tenant. It’s awesome!)) . It’s also available on just about any device or computing system you can think of, works just as well inside a browser as Google Docs does, and has an enormous install base.

From the Office Telemetry PDF guide, linked below

Office has become so impressive and so ubiquitous that it’s truly a platform unto itself, consumed a la carte or as part of a well-balanced Microsoft meal. I’m bullish on Windows but if Office’s former partner ever sunsets, I’m convinced my kid and his kid will still grow up in an Office world.

All of that makes Office really important for IT, so important that you as an IT Guy should consider standing up some easy instrumentation around it.

Enter Office Telemetry, a super-simple package that flows your Office data to a SQL collector, mashes it up, and gives you important insight into how your users are using Office. It also surfaces the problems in Office -or Office documents- before your users do, and it’s free.

Oh, did I mention it’s called Office Telemetry? This thing makes you feel like an astronaut when you’re using it!

Here’s how you deploy it. Total time: about an hour.

  1. Download the Office 2013 ADMX/ADML files for Group Policy and deploy them to your Domain Controllers.
  2. Spin up a 2008 R2 or 2012 VM, or find a modestly-equipped physical box that at least has Windows Management Framework 3.0/Powershell 3.0 on it. If it has a SQL 2012 instance on it that you can use, even better. If not, don’t stress and proceed to the next step.
  3. Set aside a folder on a separate volume (ideally) for the telemetry data. If you’re going to flow data from hundreds of Office users, plan for 5-25 megabytes per user, at a minimum.
    • If your users are on the WAN, plan accordingly. Telemetry data is pretty lightweight (50k chunks for older Office clients, 64k chunks for Office 2013)
  4. Install Office ProPlus 2013 or 365 on the VM. You do not need to use an Office 365 license for it to run.
  5. Download the Deploy Office Telemetry powershell script package from TechNet or via Script Browser in Powershell ISE.
  6. Because it’s a script, you’ll need to temporarily change your server’s execution policy, self-sign it, or configure Group Policy as appropriate to run it. TechNet has instructions.
  7. Run the script; it will download SQL 2012 express and install it for you if you don’t have SQL. It will also set proper SMB read/modify permissions on that folder you set up earlier.
  8. As if that wasn’t enough, the script will give you a single registry keyfile you can deploy to your users’ machines (or hand-roll the same settings; see the sketch after this list).
  9. But I prefer the Group Policy/SCCM route. Remember the ADMX files you deployed? Flip the switches as appropriate under User Configuration > Administrative Templates > Microsoft Office 2013 > Telemetry Dashboard.
  10. Sit back, watch the data flow in, and pat yourself on the back because you’re being a proactive IT Pro!
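If you’re curious what the GPO or the script’s registry keyfile is actually doing, the agent settings live in a per-user policy key. The sketch below is my recollection of the key path and value names from the Office 2013 Telemetry Dashboard deployment guide, so treat them as assumptions and verify against the guide (or the ADMX) before pushing anything broadly; the share path is hypothetical:

```powershell
# Point an Office 2013 client at the telemetry share (value names per the deployment
# guide, to the best of my memory -- verify before deploying).
$osm = 'HKCU:\Software\Policies\Microsoft\Office\15.0\osm'
New-Item -Path $osm -Force | Out-Null

Set-ItemProperty -Path $osm -Name CommonFileShare -Value '\\telemetrysrv\telemetry'  # hypothetical share
Set-ItemProperty -Path $osm -Name EnableLogging   -Value 1 -Type DWord
Set-ItemProperty -Path $osm -Name EnableUpload    -Value 1 -Type DWord
Set-ItemProperty -Path $osm -Name Tag1            -Value 'Finance'                   # optional label for the dashboard
```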

As I’ve deployed this solution, I’ve found broken documents, expensive add-ons that delay Office, and multiple other issues that were easy to resolve but difficult to surface. It’s totally worth your time to install it.

Office Telemetry PDF

Sign of the Times or just the best PKI book ever?

Like a lot of IT Pros, I’ve been studying up on security topics lately, both as a reaction to the increasing amount of breach news (Who got breached this week, Alex?) and because I felt weak in this area.

So, I went shopping for some books. My goals were simply to get a baseline understanding of crypto systems and best-practice guidance on setting up Microsoft Public Key Infrastructures, which I’ve done in the past but without much confidence in the end result.

Well, it turns out there’s not a whole lot of literature on Microsoft PKI systems. It seems the best of the genre is Windows Server 2008 PKI & Certificate Security, a Microsoft Press book published in 2008 and authored by Brian Komar:


This 3.2lb, 800 page book has a 4.9 out of 5 star rating on Amazon, with reviewers calling it the best Microsoft PKI guide out there.

Great! I thought, as I prepared to shell out about $80 and One Click my way to PKI knowledge.

That’s when I noticed that the book is out of print. There are digital versions available from O’Reilly, but it appears most don’t know that.

For the physical book itself, the least expensive used one on Amazon is $749.99. You read that right. $750!

If you want a new copy, there’s one available on Amazon, and it’s $1000.

I immediately jumped over to Camelcamelcamel.com to check the history of this book, thinking there must have been a run on Mr. Komar’s tome as Target, Home Depot, JP Morgan, and Sony Pictures fell.

Result:

(Camelcamelcamel price-history chart for the book)

The price of this book has spiked recently, but Peak PKI was a full three years ago.

I looked up security breaches/events of early 2012. Now correlation != causation, but it’s interesting nonetheless. Hopefully this means there’s a lot of solid Microsoft PKI systems being built out there!

Rather than shell out $750 for the physical book, I decided to get Ivan Ristic’s fantastic Bulletproof SSL and TLS, which I highly recommend. It’s got a chapter on securing Windows infrastructure, but is mostly focused on crypto theory & practical OpenSSL. I’ll buy Komar’s as a digital version next or wait for his forthcoming 2012 R2 revision.

Microsoft’s commitment to open initiatives & the riddle of whitebox networking

On Tuesday Microsoft surprised me by announcing an open switching/networking plan in partnership with Mellanox and as part of the Open Compute initiative.

Wait, what?

Microsoft’s building a switch?

Not quite, but before we get into that, some background on Microsoft’s participation in what I call OpenMania: the cloud & enterprise technology vendor tendency to prefix any standards-ish cooperative work effort with the word Open.

Microsoft’s participating in several OpenMania efforts, but I only really care about these two because they highlight something neat about Microsoft and apply or will soon apply to me, the Converged IT Guy.

Open Compute, or OCP, is the Facebook-led initiative to build agnostic hardware platforms on x86 for the datacenter. I like to think of OCP as a ground-up re-imagining of hardware systems by guys who do software systems.

As part of its participation in OCP, Microsoft is devoting engineering resources and talent to building out specifications, blueprints and full hardware designs for things like this, a 12U converged chassis comprised of storage and compute resources.

Are those brown Zunes in the blades?

 

Then there’s Open Management Infrastructure (OMI), an initiative of The Open Group (TOG). Microsoft joined OMI almost three years ago to align & position Windows to share common management frameworks across disparate hardware & software systems.

That’s a lot of words with little meaning, so let me break it down for the Windows guys and gals reading this. The promise of Microsoft’s OMI participation is this: you can configure other people’s hardware and software via the same frameworks your Windows Server runs on (CIM, the next-gen WMI) using the same techniques and tooling you manage other things with: Powershell.

All your management constructs are belong to CIM

I’ve been keenly interested in Microsoft & their OMI push because it’s an awesome vision, and it’s real, or real-close at any rate: SMI-S, for instance, is gaining traction as a management play on other people’s hardware/software storage systems ((cf NIMBLE STORAGE NOW INTEGRATES WITH SCVMM)), and is already baked into Windows Server as a feature you can install and use to manage Windows Storage Spaces, which itself is a first-class citizen of CIMville.

All your CIM classes -running as part of Windows or not- manipulated & managed via Powershell, the same ISE you and I use to deploy Hyper-V hosts, spin-up VMs, manage our tenants in Office 365, fiddle around in Azure, and make each day at work a little better and a little more automated than the last.
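Here’s roughly what that looks like in practice. The cmdlets are the stock CIM cmdlets that ship with Powershell 3.0 and later; the host names, port, and the OMI class at the end are illustrative assumptions, not anything vendor-specific:

```powershell
# Local box: query a CIM class the same way you'd query anything else
Get-CimInstance -ClassName Win32_OperatingSystem | Select-Object Caption, LastBootUpTime

# Another Windows server over WSMan (host name is hypothetical)
$win = New-CimSession -ComputerName 'hv01'
Get-CimInstance -CimSession $win -ClassName Win32_Volume | Select-Object Name, FreeSpace

# An OMI endpoint on non-Windows kit typically listens on 5985/5986 with Basic auth;
# the port, credentials and class name below are assumptions for illustration
$opt = New-CimSessionOption -UseSsl
$omi = New-CimSession -ComputerName 'switch01' -Port 5986 -Authentication Basic `
       -Credential (Get-Credential) -SessionOption $opt
Get-CimInstance -CimSession $omi -Namespace root/omi -ClassName OMI_Identify
```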

That’s the promised land right there, ladies and gentlemen.

Except for networking, the last stubborn holdout in my fevered powershell dream.

Jeff Snover, the architect of the vision, teases me with Powershell Leaf Spine Tweets like this:

[embedded tweet]

but I have yet to replace Putty with Powershell; I still have to type show int status rather than show-interface -status “connected” on my switch because I don’t have an Arista or N7K, and few other switch vendors seem to be getting the OMI religion.

All of which makes Microsoft’s Tuesday announcement that it is extending its commitment to OCP’s whitebox switching development really odd yet worthy of more consideration:

The Switch Abstraction Interface (SAI) team at Microsoft is excited to announce that we will showcase our first implementations of the specification at the Open Compute Project Summit, which kicks off today at the San Jose Convention Center. SAI is a specification by the OCP that provides a consistent programming interface for common networking functions implemented by network switch ASIC’s. In addition, SAI allows network switch vendors to continue to build innovative features through extensions.

The SAI v0.92 introduces numerous proposals including:

Access Control Lists (ACL)
Equal Cost Multi Path (ECMP)
Forwarding Data Base (FDB, MAC address table)
Host Interface
Neighbor database, Next hop and next hop groups
Port management
Quality of Service (QoS)
Route, router, and router interfaces

At first glance, I wouldn’t blame you if you thought that this thing, this SAI, means OMI is dead in networking, that managing route/switch via Powershell is gone.

But looking deeper, this development speaks to Microsoft’s unique position in the market (all markets, really!)

  1. SAI is probably more about low-level interaction with Broadcom’s Trident II ((At least that’s my read on the Github repo material)) and Microsoft’s participation in this is more about Azure and less about managing networking stuff w/Powershell
  2. But this is also perhaps Microsoft acknowledging that Linux-powered whitebox switching is really enjoying some momentum, and Microsoft needs to have something in this space

So, let’s review: Microsoft has embraced Open Compute & Open Management. It breaks down like this:

  • Microsoft + OCP =  Contributions of hardware blueprints but also low-level software code for things like ASIC interaction
  • Microsoft + OMI = A long-term strategic push to manage x86 hardware & software systems that may run Windows, but just as likely run something Linuxy

In a perfect world, OCP and OMI would just join forces and be followed by all the web-scale players, the enterprise technology vendors, the storage guys & packet pushers. All would gather together under a banner singing kumbaya and praising agnostic open hardware managed via a common, well-defined framework named CIM that you can plug into any front-end GUI or CLI construct you like.

Alas, it’s not a perfect world and OCP & OMI are different things. In the real world, you still need a proprietary API to manage a storage system, or a costly license to utilize another switchport. And worst of all, in this world, Powershell is not my interface to everything, it is not yet the answer to all IT questions.

Yet Microsoft, by virtue of its position in so many different markets, is very close now to creating its own perfect world. If they find some traction with SAI, I’m certain it won’t be long before you can manage an open Microsoft-designed switch that’s a first-class OMI citizen and gets along famously with Powershell! ((Or buy one, as you can buy the Azure-in-a-box which is simply the OCP blueprint via Dell/Microsoft Cloud Platform System program))

The Value of Community Editions

I was excited to hear on the In Tech We Trust podcast this week that the godfather of all the hyperconverged things -Nutanix- may release a community edition of their infrastructure software this year.

That. Would. Be. Amazing.

I’ve crossed paths with Nutanix a few times in my career, but they’ve always remained just a bit out of reach in my various infrastructure projects. Getting some hands-on experience with the Google-inspired infrastructure system in my lab at home would be most excellent, not just for me, but for them, as I like to recommend product stacks I’ve touched above ones I haven’t.

Take Nexenta as an example. As Hans D. pointed out on the show, aside from downloading & running Oracle Solaris 12, Nexenta’s just about the only way one can experience a mature & enterprise-focused implementation of ZFS. I had a blast testing Nexenta out in my lab in 2014 and though I can’t say my posts on ZFS helped them move copies of NexentaStore, it surely didn’t hurt in my view.

VEEAM is also big in the community space, and though I’ve not tested their various products, I have used their awesome stencil collection.

Lest you think storage & hyperconvergence vendors are the only ones thinking ‘community’, today my favorite yellow load balancer Kemp announced in effect a community edition of their L4/L7 Loadmaster vAppliance. Kemp holds a special place in the hearts of Hyper-V guys; as long as I can remember, yes even back in the dark days of 2008 R2, they’ve always released a Loadmaster that’s just about on-par with what they offer to VMware shops. In 2015 that support is paying off I think; Kemp’s best-in-class for Microsoft shops running Hyper-V or building out Azure, and with the announcement you can now stress a Kemp at home in your lab or in Azure with your MSDN sub. Excellent.

Speaking of Microsoft, I’d be remiss if I didn’t mention Visual Studio 2013, which got a community edition last fall.

I’d love to see more community editions, namely:

  • Nimble Storage: I’ve had a lot of success in the last 18 months racking/stacking Nimble arrays in environments with older, riskier storage. I must not be the only one; the company recently celebrated its 5,000th customer. Yet Nimble’s rapid evolution from storage startup with potential to serious storage player is somewhat bittersweet for me, as I no longer work at the places where I installed Nimble arrays and can’t tinker with their rapidly-evolving features & support. Come on guys, just give me the CASL caching system in download form and let me evaluate your Fibre Channel support and test out your support for System Center.
  • NetApp: A community release of Clustered Data ONTAP 8.2x would accomplish something few NetApp products have accomplished in the last few years: create some genuine excitement about the big blocky blue N. I’m certain they’ve got a software-only release in-house, as they’ve already got an appliance for vSphere and I’ve heard rumors about this from channel sources for years. So what are you waiting for, NetApp? Let us build out, support, and get excited about cDOT community-style, since it’s been too hard to see past the 7-mode -> clustered-mode transition pain in production.

On his Greybeards on Storage podcast, Howard Marks once reminisced about his time testing real enterprise technology products in a magazine’s tech lab. His observations became a column, printed on paper in an old-school pulp magazine which was shipped to readers. This was a beneficial relationship for all.

Those days may be gone but thanks to scalable software infrastructure systems, the agnostic properties of x86, bloggers & community edition software, perhaps they’re back!

Hunting Lettered Drives in a Microsoft Enterprise

Of all the lazy, outdated constructs still hanging around in computing, SMB shares mapped as drive letters on client PCs have to be the worst.

Microsoft Windows is the only operating system that still employs these stubborn, vestigial organs of 1980s computing. Why?

Search me. Backwards compatibility perhaps, but  really? It’s not like you can install programs to shares mapped as drive letters, block-storage style.

If you work in Microsoft-powered shops like me, then you’re all too familiar with lettered drive pains. Let’s review:

  1. Lettered drives are paradigms from another era: Back in the dial-up and 300 baud modem days you got in your car and drove to Babbages to purchase a big box on a shelf. The box contained floppy diskettes, which contained the program you wanted to use. You put the floppy in your computer and you knew instinctively to type a: on your PC. Several hours later after installing the full program to your C: drive, you took the floppy out of its drive and A: ceased to exist. If this sounds archaic to you (it is), then welcome to IT’s version of Back to the Future, wherein we deploy, manage and try to secure systems tied to this model
  2. Lettered drives are dangerous: The Crypto* malware viruses of the last two years have proven that lettered drives = file server attack vector. I have friends dealing with Gen 3 of this problem today; a drive map from one server to all client PCs must be a Russian crypto-criminal’s dream come true.
  3. Your Users Don’t Understand Absolute/Relative Paths: When users want to share a cat video from the internet, they copy + paste the URL into an email, press send, and joyous hilarity ensues. But anger, confusion, despair & Help Desk tickets result when those same users paste a path like G:\FridayFun\DebsFunnyCatVids into an email and press send. Guess what, Deb? Not everyone in the world has a G: drive. This is frustrating for IT, and Deb doesn’t understand why they’re so mad when she opens a ticket.
  4. Lettered drives spawn bad-practice offspring: Many IT guys believe that lettered drives suck, but they end up making more of them out of laziness, fear or uncertainty. For instance: say the P:\HR_Benefits folder is mapped to every PC via Group Policy, and everyone is happy. Then one day someone in HR decides to put something on the P: drive that users in a certain department shouldn’t see. IT hears about this and figures, “Well! Isn’t this a pickle. I think, good sir, that the only way out of this storm of bad design is to go through it!” and either stands up a new share on a new letter (\\fs\SecretHRStuff maps to Q:) or puts an NTFS Deny ACL on the sub-folder rather than disabling inheritance. More Help Desk tickets result, twice as many if the drive mapping spans AD Sites and is dependent on Group Policy.
  5. Lettered drives don’t scale: Good on your company for surviving and thriving throughout the 90s, 2000s, and into the roaring teens, but it’s time for a heart-to-heart. That M:\Deals thing you stood up in 1997 isn’t the best way to share documents and information in 2015, when the company you helped scale from one small site to a global enterprise needs access to its files 24/7 from the nearest egress point.

I wish Microsoft would just tear the band-aid off and prevent disk mapping of SMB shares altogether. Barring that, they should kill it by subterfuge & pain ((Make it painful, like disabling signed drivers or something))

But at the end of the day, we the consumers of the Microsoft stack bear responsibility for how we use it. And unfortunately, there is no easy way to kill the lettered drive, but I’ll give you some alternatives. It’s up to you to sell them in your organization:

  1. OneDrive for Business: Good on Microsoft for putting advanced and updated OneDrive clients everywhere. This is about as close to a panacea as we get in IT. OneDrive should be your goal for files and your project plan should go a little something like this: 1) Classify your on-prem file shares, 2) upload those files & classification metadata to OneDrive for Business, and 3) install OneDrive for Business on every PC, device, and mobile phone in your enterprise, 4) unceremoniously kill your lettered drive shares
  2. What’s wrong with wack-wack? Barring OneDrive, it’s trivial to map a \\share\folder to a user’s Library so that it appears in Windows Explorer in a universal fashion, just like a mapped drive would
  3. DFS: DFS is getting old, but it’s still really useful tech, and it’s on by default in an AD Domain. Don’t believe me? Type \\yourdomain and see DFS in action via your NETLOGON & SYSVOL shares. You can build out a file server infrastructure -for free- using Distributed File System tech, the same kit Microsoft uses for Active Directory. Say goodbye to mapping \\share\sharename to Site1 via Group Policy, and say hello to automatically putting bits of data close to the user via Group Policy.
  4. Alternatives: If killing off the F: drive is too much of an ask for your organization, consider locking lettered drives down as a top priority with tools like SMB signing, access-based enumeration and other security bits available in Server 2012 and 2012 R2 (a sketch follows this list).
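A minimal sketch of that last option, using the SMB cmdlets in Server 2012 / 2012 R2 (the share name is hypothetical; test the signing requirement against your oldest clients before flipping it everywhere):

```powershell
# Require SMB signing server-wide -- older clients that can't sign will fail to connect
Set-SmbServerConfiguration -RequireSecuritySignature $true -Force

# Access-based enumeration: users only see the folders their ACLs actually grant
Set-SmbShare -Name 'HR_Benefits' -FolderEnumerationMode AccessBased

# Take inventory before you start killing letters: what's shared, and what's mapped where
Get-SmbShare   | Select-Object Name, Path, FolderEnumerationMode
Get-SmbMapping | Select-Object LocalPath, RemotePath   # run this one on a client PC
```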

Microsoft is the original & ultimate hyperconverged play

The In Tech We Trust podcast has quickly become my favorite enterprise technology podcast since it debuted late last year. If you haven’t tuned into it yet, I advise you to get the RSS feed on your favored podcast player of choice ASAP.

The five gents ((Nigel Poulton, Linux trainer at Pluralsight; Hans De Leenheer, datacenter/storage and one of my secret crushes; Gabe Chapman; Marc Farley; and Rick Vanover)) putting on the podcast are among the sharpest guys in infrastructure technology, have great on-air chemistry with each other, and consistently deliver an organized & smart format that hits my player on-time as expected every week. Oh, and they’ve equalized the Skype audio feeds too!

And yet….I can’t let the analysis in the two most recent shows slip by without comment. Indeed, it’s time for some tough love for my favorite podcast.

Guys you totally missed the mark discussing hyperconvergence & Microsoft over the last two shows!

For my readers who haven’t listened, here’s the compressed & deduped rundown of 50+ minutes of good stimulating conversation on hyperconvergence:

  • There’s little doubt in 2015 that hyperconverged infrastructure (HCI) is a durable & real thing in enterprise technology, and that that thing is changing the industry. HCI is real and not a fad, and it’s being adopted by customers.
  • But if HCI is real, it’s also different things to different people; for Hans, it’s about scale-out node-based architecture, for others on the show, it’s more or less the industry definition: unified compute & storage with automation & management APIs and a GUI framework over the top.
  • But that loose definition is also evolving, as Rick Vanover sharply pointed out: EMC’s new offering, vSpex Blue, offers something more than what we’d traditionally (like two weeks ago) think of as hyperconvergence

Good stuff and good discussion.

And then the conversation turned to Microsoft. And it all went downhill. A summary of the guys’ views:

  • Microsoft doesn’t have a hyperconverged pony in the race, except perhaps Storage Spaces, which few like/adopt/bet on/understand
  • MS has ceded this battlefield to VMware
  • None of the cool & popular hyperconverged kids, save for Nutanix and Gridstore, want to play with Microsoft
  • Microsoft has totally blown this opportunity to remain relevant and Hyper-V is too hard. Marc Farley in particular emphasized how badly Microsoft has blown hyperconvergence

I was, you might say, frustrated as I listened to this sentiment on the drive into my office today. My two cents below:

The appeal of Hyperconvergence is a two-sided coin. On the one side are all the familiar technical & operational benefits that are making it a successful and interesting part of the market.

  • It’s an appliance: Technical complexity and (hopefully) dysfunction are ironed out by the vendor so that storage/compute/network just work
  • It’s Easy: Simple to deploy, maintain, manage
  • It’s software-based and it’s evolving to offer more: As the guys on the show noted, newer HCI systems are offering more than ones released 6 months or a year ago.

The other side of that coin is less talked about, but no less powerful. HCI systems are rational cost centers, and the success of HCI marks a subtle but important shift in IT & in the market.

    • It’s a predictable check cut to fewer vendors: Hyperconvergence is also about vendor consolidation in IT shops that are under pressure to make costs predictable and smoother (not just lower).
    • It’s something other than best-of-breed: The success of HCI systems also suggests that IT shops may be shying away from best-of-breed purchasing habits and warming up to a more strategic one-throat-to-choke approach ((EMC & VMware, for instance, are titans in the industry, with best-in-class products in storage & virtualization, yet I can’t help but feel there’s more going on than the chattering classes realize. Step back and think of all the new stuff in vSphere 6, and couple it with all the old stuff that’s been rebranded as new in the last year or so by VMware. Of all that ‘stuff’, how much is best of breed, and how much of it is decent enough that a VMware customer can plausibly buy it and offset spend elsewhere?))
    • It’s some hybrid of all of the above: HCI in this scenario allows IT to have its cake and eat it too, maybe through vendor consolidation, or cost-offsets. Hard to gauge but the effect is real I think.

((As Vanover noted, EMC’s value-adds on the vSpex Blue architecture are potentially huge: if you buy vSpex Blue architecture, you get backup & replication, which means you don’t have to talk to or cut yearly checks to Commvault, Symantec or Veeam. I’ve scored touchdowns using that exact same play, embracing less-than-best Microsoft products that do the same thing as best-in-class SAN licenses))

And that’s where Microsoft enters the picture as the original -and ultimate- Hyperconverged play.

Like any solid HCI offering, Microsoft makes your hardware less important by abstracting it, but where Microsoft is different is that they scope supported solutions to any x86 box. VMware, in contrast, only hands out EVO:RAIL stickers to hardware vendors who dress x86 up and call it an appliance, which is more or less the Barracuda Networks model. ((I’m sorry. I know that was a cheapshot, but I couldn’t resist))

With your vanilla, Plain Jane whitebox x86 hardware, you can then use Microsoft’s Hyperconverged software system (or what I think of as Windows Server) to virtualize & abstract all the things from network (solid NFV & evolving overlay/SDN controller) to compute to storage, which features tiering, fault-tolerance, scale-out and other features usually found in traditional SAN systems.

But it doesn’t stop there. That same software powers services in an enormous IaaS/PaaS cloud, which works hand-in-hand with a federated productivity cloud that handles identity, messaging, data-mining, mail and more. The IaaS cloud, by the way, offers DR capabilities today, and you can connect to it via routing & ipsec, or you can extend your datacenter’s layer 2 broadcast domain to it if you like.

On the management/automation side, I understand/sympathize with the ignorance of non-’softies. Microsoft fans enthuse about Powershell so much because it is -today- a unified management system across a big chunk of the MS stack, either masked by GUI systems like System Center & Azure Pack or exposed as naked cmdlets. Powershell alone isn’t cool though, but Powershell & Windows Server aligned with truly open management frameworks like CIM, SMI-S and WBEM is very cool, especially in contrast to feature-packed but closed APIs.

On the cost side, there’s even more to the MS hyperconverged story: customers can buy what is in effect a single SKU (the Enterprise Agreement) and get access to most if not all of the MS stack.

Usually, organizations pay for the EA in small, easier-to-digest bites over a three-year span, which the CFO likes because it’s predictable & smooth. (( Now, of course, I’m drastically simplifying Microsoft’s licensing regime and the process of buying an EA as you can’t add an EA to your cart & checkout, it’s a friggin negotiation. And yes I know everyone hates the true-up. And I grant that an EA just answers the software piece; organizations will still need the hardware, but I’d argue that de-coupling software from hardware makes purchasing the latter much, much easier, and how much hardware do you really need if you have Azure IaaS to fill in the gaps?))

Are all these Microsoft things you’ve bought best of breed? No, of course not. But you knew that ahead of time, if you did your homework.

Are they good enough in a lot of circumstances?

I’ll let you judge that for yourself, but, speaking from experience here, IT shops that go down the MS/EA route strategically do end up in the same magical, end-of-the-rainbow fairy-tale place that buyers of HCI systems are seeking.

That place is pretty great, let me tell you. It’s a place where the spend & costs are more predictable and bigger checks are cut to fewer vendors.  It’s a place where there are fewer debutante hardware systems fighting each other and demanding special attention & annual maintenance/support renewals in the datacenter. It’s also a place where you can manage things by learning verb-noun pairs in Powershell.

If that’s not the ultimate form of hyperconvergence, what is?

Snover re-factoring Windows Server & System Center

My last two posts on Microsoft were filled with angst and despair at Microsoft’s announcement that the next gen versions of Server & System Center would be delayed until sometime in 2016. Why, I cried out, why the delay on Server, and what’s to become of my System Center, I wondered?

I went a bit off-the-rails, imagining that Satya Nadella had shaken things up for the System Center team. Then I wrote a letter to him asking him what was up.

Snover & Microsoft love Linux

Well, I was wrong on all that, or perhaps I was only a little bit right.

There was a shakeup, but it wasn’t Nadella who had angrily overturned a gigantic redwood table at System Center HQ, spilling Visio shapes & System Center management packs as he did so; rather, it was Mr. Windows himself, the Most Distinguished of Distinguished Technical Fellows, Dr. Jeffrey Snover, who had shaken things up.

Yes. The Padre of Powershell himself filled in the gaps for me on why System Center & Windows Server were delayed, during a TechDays Online session one day after my last post.

During that  talk, he announced that the Windows Server Team has been meshed with the System Center Team and, even better, the Azure team. Hot dog.

Redmond mag:

[Snover] explained that the System Center team and the Windows Server team are now “a single organization,” with common planning and scheduling. He said that the integration of the two formerly separate organizations isn’t 100 percent, but it’s better than it’s been in the past. The team also takes advantage of joint development efforts with the Microsoft Azure team, he added.

That’s outstanding news in my view.

Microsoft’s private|hybrid|public cloud story is second to none as far as I’m concerned. No one else offers such deep integration of cutting-edge public cloud systems (Azure) with your on-prem legacy infrastructure stack.

Yet that deep integration (not speaking of AAD Sync & ADFS 3 here) was becoming confused and muddled with overlap between the older tools (System Center) and the newer tools like Desired State Configuration, mixed in with AzurePack, an on-prem/cloud management engine.

It sounds to me like Snover’s going to put together a coherent strategy using all the tools, and I can’t think of a better guy to do the job.

But what of Windows server?

It’s getting Snovered too, but in a way that’s not as clear to me. Again, Redmond mag:

The next Windows Server product will be deeply refactored for cloud scenarios. It will have just the components for that and nothing else, Snover explained. Next, on top of that, Microsoft plans to build a server that will be the same as the Windows Servers that organizations currently use. This server it will have two application profiles. One of the application profiles will target the existing APIs for Windows Server, while the other will target the subsets of the APIs that are cloud optimized, Snover explained. On top of the server, it will be possible to install a client, he added. This redesign is happening to better support automation, he explained.

I watched most of Snover’s talk, took a few days to think about it, and still have no idea what to make of the high-level architecture slide below that flashed on screen briefly:

(high-level Windows Server vNext architecture slide from Snover’s talk)

Some thoughts that ran through my head: is the cloud-optimized server akin to CoreOS, with active/passive boot partitions, something that will finally make Patch Tuesday obsolete? One could hope that with further abstraction, we’ll get something like that in Windows Server vNext.

In some sense, we already have parts of this: if you enable the Hyper-V feature on a bare-metal computer, you emerge, after a few reboots, running a Windows virtual machine atop a Type-1 Hypervisor.
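On Windows Server, that whole transformation is one cmdlet and a couple of reboots; a sketch for a lab box, not a production host you can’t bounce:

```powershell
# After the reboots, the Windows you installed on bare metal is itself running as a VM
# in the parent partition, on top of the Type-1 hypervisor
Install-WindowsFeature -Name Hyper-V -IncludeManagementTools -Restart
```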

Big deal, right? Well, Snover’s slide seems to indicate this will be the default state for the next generation of Windows Server, but more than that, it seems to indicate that what we think of as the Type-1 hypervisor is getting a bunch of new features, like container support.

We knew Docker support was coming, but at this level, and almost indistinguishable from the hypervisor itself?

That’s potentially all kinds of awesome.

Interestingly, Server Roles & Features look like they’re being recast into a “Client” level that operates above a Windows Server.

Which, if we continue down the rabbit hole, means we have to ask the question: if my AD Domain Controller or my RemoteApp session host farm servers are now clients, what are they running on? It certainly doesn’t seem to be a Windows server anymore, but rather a kind of agnostic compute fabric, made up of virtual “Servers” and/or “Containers” operating atop a cloud-optimized server running on bare metal…an agnostic computing ((Damn straight, had to work that in there)) fabric that stretches across my old on-prem Dells all the way up to the Azure cloud…right?!?

I’m like four levels deep into Jeffrey Snover’s subconscious so I’ll stop, but suffice it to say, the delay of Windows Server & System Center appears to be justified and I can’t wait to start testing it in 2016.

Hyper-V 29% of Hypervisors shipped and Second Place Never Felt so Good

Click!!

I couldn’t help but cheer and raise a few virtual fist bumps to the Microsoft Server 2012 and 2012 R2 team as I read the latest report out of some industry group or other. Hyper-V 3.0, you see, is cracking along with just a tick under 1/3rd of the hypervisor market.

Meanwhile, VMware -founder of the genre, much respect for the Pater v-Familias- is running about 2/3rds of virtualized datacenters.

And that’s just fine with me. 

Hyper-V is still in a distant second place. But second place never felt so good as it does right now. And we’ve got some vMomentum on our side, even if we don’t have feature parity, as I’ve acknowledged before.

Hyper-V is up in your datacenter and it deserves some V.R.E.S.P.E.C.T.

Testify IDC, testify:

A growing number of shops like UMC Health System are moving more business-critical workloads to Hyper-V. In 2013, VMware accounted for 53 percent of hypervisors deployed last year, according to data released in April by IT market researcher IDC. While VMware still shipped a majority, Hyper-V accounted for 29 percent of hypervisors shipped.

The Redmond Magazine report doesn’t get into it beyond some lame analyst comments, but let me break it down for you from a practitioner point of view.

Why is Hyper-V growing in marketshare, stealing some of the vMomentum from the sharp guys at VMware?

Four reasons from a guy who’s worked it:

  • The Networking Stack: It’s not that Windows Server 2012 & 2012 R2 and, as a result, Hyper-V 3.0, have a better network stack than VMware does. It’s that the Windows Server team rebuilt the entire stack between 2008 R2 & Server 2012. And it’s OMG SO MUCH BETTER than the last version. Native support for teaming. Extensible VM switching. Superb layer 3 and layer 2 cmdlets. You can even do BGP routing with it. It’s built to work, with minimal hassle, and it’s solid on a wide range of NICs. I say that as someone who ran 2008 R2 Hyper-V clusters then upgraded the cluster to 2012 in the space of about two weekends. Trust me, if you played around with Windows Server 2008 R2 and Hyper-V and broke down in hysterics, it’s time for another look.
  • SMB 3.0 & Storage Spaces/SOFS…don’t call it CIFS and also, it’s our NFS: There’s a reason beyond the obvious why guys like Aidan Finn, the Hyper-Dutchman and DidierV are constantly praising Server Message Block Three dot Zero. It kicks ass. Out of the box, multi-channel is enabled on SMB 3.0, meaning that anytime you create a Hyper-V-Kicks-Ass file share on a server with at least two distinct IP addresses, you’re going to get two distinct channels to your share. And that scales. On Storage Spaces and its HA (and fault-tolerant?) big brother Scale-Out File Server: what Microsoft gave us was a method by which we could abstract our rotational & SSD disks and tier them. It’s a storage virtualization system that’s quite nifty. It’s not quite VSAN, except that both Storage Spaces/SOFS & VSAN seem to share common cause: killing your SAN.

    "Turn me on!" Hyper-V says to the curious
    “Turn me on!” Hyper-V says to the curious
  • Only half the Licensing headaches of VMware: I Do Not Sign the Checks, but there’s something to be said for the fact that the features I mention above are not SKUs. They are part & parcel of Server 2012 R2 Standard. You can implement them without paying more, without getting sign-off from Accounts Payable or going back to the well for more spend. Hyper-V just asks that you spend some time on Technet but doesn’t ask for more $$$ as you build a converged virtual switch (a sketch of which follows this list).
  • It’s approachable: This has always been one of Microsoft’s strengths and now, with Hyper-V 3.0, it’s really true. My own dad -radio engineer, computer hobbyist, the original TRS-80 fan- is testing versions of radio control system software within Windows 7 32-bit & 64-bit VMs right from his Windows 8.1 Professional desktop. On the IT side: if you’re a generalist with a Windows server background, some desire to learn & challenge yourself, and, most importantly, you want to Win #InfrastructureGlory, Hyper-V is a tier-one hypervisor that’s approachable & forgiving if you’re just starting out in IT.
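To make that converged-virtual-switch point concrete, here’s a minimal sketch using the in-box teaming and Hyper-V cmdlets; the NIC names, switch name and VLAN ID are placeholders for your own environment:

```powershell
# Team two physical NICs, hang an extensible VM switch off the team, then carve
# host traffic out as vNICs -- no extra SKU required for any of it
New-NetLbfoTeam -Name 'ConvergedTeam' -TeamMembers 'NIC1','NIC2' -TeamingMode SwitchIndependent -Confirm:$false

New-VMSwitch -Name 'ConvergedSwitch' -NetAdapterName 'ConvergedTeam' -AllowManagementOS $false

Add-VMNetworkAdapter -ManagementOS -Name 'Management'    -SwitchName 'ConvergedSwitch'
Add-VMNetworkAdapter -ManagementOS -Name 'LiveMigration' -SwitchName 'ConvergedSwitch'
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName 'LiveMigration' -Access -VlanId 20
```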

It’s also pretty damn agnostic. You can now run *BSD on it, several flavors of Linux, and more. And we know it scales: Hyper-V, or some variant of it, powers the Xbox One (A Hypervisor in Every Living Room achieved), it can power your datacenter, and it’s what’s in Azure.

vSympathy under vDuress

An engineer in a VMware shop that’s using VMware’s new VSAN converged storage/compute tech had a near 12 hour outage this week. He reports in vivid detail at Reddit, making me feel like I’m right there with him:

At 10:30am, all hell broke loose. I received almost 1000 alert emails in just a couple minutes, as every one of the 77 VM’s in the cluster began to die – high CPU, unresponsive, applications or websites not working. All of the ESXi hosts started emitting a myriad of warnings, mostly for high CPU. DRS attempted to start migrating VM’s but all of the tasks sat “In progress”. After a few minutes, two of the ESXi hosts became “disconnected” from vCenter, but the machines were still running.

Everything appeared to be dead or dying – the VM’s that didn’t immediately stop pinging or otherwise crash had huge loads as their IO requests sat and spun. Trying to perform any action on any of the hosts or VM’s was totally unresponsive and vCenter quickly filled up with “In progress” tasks, including my request to turn off DRS in an attempt to stop it from making things worse.

I’m a Hyper-V guy and (admittedly) barely comprehend what DRS is, but wow. I’ve got 77 VMs in my 6-node cluster too. And I’ve been in that same position, when something unexpected…rare…almost impossible to wargame…happens and the whole cluster falls apart. For me it was an ARP storm in the physical switch, thanks in part to an immature understanding of 2008 R2’s virtual switching.

I’m not ashamed to say that in such situations intuition plays a part. Logs are an incomprehensible firehose and not useful and may even distract you from the real problem. Your ops manager VM, if stored within the cluster (cf observer effect) is useless, and so, what do you have?

You have what lots of us have, no matter the platform. A support contract. You spend valuable minutes explaining your situation to a guy on the phone who handles many such calls per day. Minutes, then a half hour, then a full hour tick by. The business is getting restless & voices are being raised. If your IT group has an SLA, you’re now violating it. Your pulse is rising, you’re sweating now.

So you escalate.  Engage the sales team who sold you the product..you’re desperate. This guy got a vExpert on the phone. At times, I’ve had MVPs helping me. Yet with some problems, there are no obvious answers, even for the diligent & extraordinary.

But if you’re good, you’ve a keen sense of what you know versus what you don’t know (cf Donald Rumsfeld for the win), and you know when to abandon one path in favor of another. This engineer knew exactly the timing of his outage…what he did, when he finished the work he did, and when the outage started. Maybe he didn’t have it down in a spreadsheet, and proving it empirically in court would never work, but he knew: he was thinking about what he knew during his outage, and he was putting all his knowns and unknowns together and building a model of the outage in his head.

I feel simpatico with this guy…and I’m not too proud to say that sometimes, when nothing’s left, you’ve got to run to the server room (if it’s near, which it’s not in my case or in this engineer’s case I think) and check the blinky lights on the hard drives on each of your virtualization nodes. Are they going nuts? Does it look odd? The CPUs are redlined and the Putty session on the switch is slow…why’s that?

Is this signal, or is this noise?

Observe the data, no matter how you come by it. Humans are good at pattern recognition. Observe all you can, and then deduce.

Bravo to this chap for doing just that and feeling -yes feeling at times- his way through the outage, even if he couldn’t solve it.

High five from a Hyper-V guy.