The Home IT Lab is just about finished, and boy does it feel good after almost three months of scavenging for parts, buying certifiable junk off eBay, crawling through a dusty attic to run cable, and constructing something robust & redundant(ish) enough that I can use it as an experimental tech lab while providing four, maybe five, nines of reliability for my most important users: the fam.
At times it felt like one step forward and two steps back, but this is the final configuration. Yes, I’m done with the bare metal provisioning:
SG-300 at top, dumb switch in the middle, the first 1U box is intended for VMware, the second 1U box is a Hyper-V host, and the 2U server is running NexentaStor
Not bad eh? Sure the cabling could be dressed up a bit, but overall, I’m pretty happy with how it came together.
So, here’s the setup (so far):
Already I’ve learned so much that’s applicable at work and in the lab.
The first thing I’ve learned: ZFS is awesome, and the open source guys and companies who labor over these amazing storage operating systems are heroes in my book.
Child Partition testing fit ‘n finish of my cabling
Over the last 30 days, I’ve tested FreeNAS, OpenFiler, NAS4Free, and NexentaStor. Initially I didn’t want to sacrifice an entire physical host for simple storage duties, but building a Scale-Out File Server running Windows Server (i.e. the Windows SAN idea that’s legitimately interesting yet completely frightening at the same time) wasn’t a possibility here, so I had to sacrifice.
Now let’s be honest: what kind of performance would you expect out of a bunch of 2009-era AMD Semprons, some Realtek Gbit adapters, consumer-grade SSDs & HDDs, and Cat5e cabling pulled through this attic over the course of a few weekends?
Well, I think I’m hitting this stack about as hard as it can be hit without enabling jumbo frames or going RAID 0. FreeNAS and the others, while feature-rich and sporting great Cacti-style reporting, fell over at times during some really harsh and cruel SQLIO benchmark runs.
But NexentaStor didn’t. Here’s what the Sempron + 4x HDD, 1x SSD cache, 1x SSD log, and 16GB of RAM turned in last night:
Yeah, I know, kind of unbelievable. Like, so unbelievable I had to run it a few times to make sure, and even now I’m worried something isn’t quite right with the stats I’ve gathered. I mean, I don’t know squat about L2ARC caching (except that it costs a lot of RAM) or log devices, but I do know quite a bit about optimizing switching, LACP, and Hyper-V.
This is what it looked like from the Nexenta dashboard and one of my vEthernet NICs. Notice the read rate on Drive E/Disk 13:
OMG the poor Sempron.
Perfmon recorded this on Node1, which has a 3x1GbE team made up of two Intel server NICs and a Realtek:
170,000+ packets per second. Keepin’ it RealTek
Fun times indeed.
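For the curious, the counter in question is the standard Network Interface \ Packets/sec counter, and you don’t even need the Perfmon GUI to watch it. Here’s a rough PowerShell equivalent you could run on Node1 itself; the sampling window is arbitrary and the interface names will obviously differ on your hardware.

```powershell
# Sample every NIC's Packets/sec counter once a second for 60 seconds
Get-Counter -Counter '\Network Interface(*)\Packets/sec' `
    -SampleInterval 1 -MaxSamples 60 |
    ForEach-Object {
        # Show the three busiest interfaces in each sample
        $_.CounterSamples |
            Sort-Object CookedValue -Descending |
            Select-Object -First 3 InstanceName, CookedValue
    }
```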
One of my favorite virtualization bloggers, the author of EverythingShouldBeVirtual.com, also found Nexenta to be a great fit for his home lab.
So the lab is built now, and it’s solid and robust enough for my needs. I have Hyper-V virtual machine failover between Node1 (office) and Node2 (garage), I’ve passed cluster validation, and I’ve thrown a ton of packets at the NexentaStor box without anything catching fire; Nexenta gave me some CPU warnings, but it still read & wrote more data than a single 1GbE NIC would have allowed otherwise.
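For reference, the validation and failover plumbing boils down to a handful of Failover Clustering cmdlets. A rough sketch of what I mean is below; the cluster name, static IP, and VM name are placeholders, not my actual values.

```powershell
# Run the validation tests against both Hyper-V nodes before building the cluster
Test-Cluster -Node Node1, Node2

# Build the two-node cluster (cluster name and static IP are placeholders)
New-Cluster -Name HomeLabCluster -Node Node1, Node2 -StaticAddress 192.168.1.50

# Make an existing VM highly available so it can fail over between Node1 and Node2
Add-ClusterVirtualMachineRole -VMName "TestVM01"
```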
I still have to get the vSphere host up. ESXi 5.5 doesn’t like Realtek NICs, and I’m cheap, but I’ll figure something out. And if I don’t, Xen or KVM will go on that box.
Shitbox SAN under construction
Next up: I have about 80 days left on these Windows trials. I’m going to take full advantage and spin up a System Center instance. Then I’m going to play with Splunk and start again on building an ownCloud VM to capture, index, and app-ify all my personal family photos, music, videos, etc.
I’m also thinking of making the SG-300 my default gateway, or getting around to finally using a Vyatta VM router or a pfSense VM as my gateway. Following that: IPv6 experiments with Pertino, Hurricane Electric, and more.
It’s been a bit quiet here on the AC blog because I’m neck deep in thinking about storage at work. Aside from the Child Partition at home (now 14 months and beginning to speak and make his will known, love the little guy), all my bandwidth over the last three weeks has been set to Priority DSCP values and directed at reading, testing, thinking, and worrying about a storage refresh at work.
You could say I’m throwing a one-man Storage Field Day, every day, and every minute, for the last several weeks.
And finally, this week: satisfaction. Testing. Validation. Where the marketing bullshit hits the fan and splashes back onto me, or magically arrays itself into a beautiful Van Gogh on the server room wall.
Yes. I have some arrays in my shop. And some servers. And the pride of Cisco’s 2009 mid-level desktop switching line connecting them all.
Join me as I play.
My employer is a modest-sized company with a hard-on for value, so while we’ve tossed several hundred thousand dollars at the incumbent in the last four years (only to be left with a terrifying upgrade to the clustered incumbent OS that I’ll have to fit into an 18-hour window), I’m being given a budget of about one well-equipped mid-level Mercedes sedan to offset some of the risk our stank-ass old DS14MK2 shelves represent.
We’re not replacing our incumbent; we’re simply augmenting it. But with what?
After many months, there are only two contenders left, and I racked/stacked/cabled them up last week in preparation for a grand bakeoff, a battle royale between Nimble & the incumbent.
Meet the Nimble Storage CS260 array: sixteen drives total, comprised of 12x 3TB 7.2K spinners + 4x 300GB SSDs, making for around 33TB raw and, depending on your compression rates, 25-50TB usable (crazy, I know).
Nimble appeals to me for so many reasons: it’s a relatively simple, compact, and extremely fast iSCSI target, just the kind of thing I crave as a virtualization engineer. The CS260 sports dual controllers with 6x1GbE interfaces on each, has a simple upgrade path to 10GbE (if I ever get that) and to newer controllers, and more. On the downside, the controllers are active/passive, there’s no native support for MC/S (but plenty for MPIO), and, well, it doesn’t have a big blocky N logo on the front, which is a barrier to entry, because who ever got fired for buying the blue N?
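Since MPIO rather than MC/S is the multipathing story here, the Windows-side plumbing for a host looks roughly like the sketch below; the portal IP is a placeholder, the Multipath-IO feature has to be installed first, and a vendor’s own connection-manager tooling may well do this for you.

```powershell
# Let the Microsoft DSM claim iSCSI devices for multipathing
# (requires the Multipath-IO feature to be installed)
Enable-MSDSMAutomaticClaim -BusType iSCSI

# Point the initiator at the array's discovery portal (IP is a placeholder)
New-IscsiTargetPortal -TargetPortalAddress 10.10.66.10

# Log in to the discovered target with multipathing enabled, persistent across reboots
Get-IscsiTarget | Connect-IscsiTarget -IsMultipathEnabled $true -IsPersistent $true
```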
On top of the Nimble is a relatively odd array: an incumbent (now part of the incumbent) array with 2x 800GB SanDisk Enterprise SSDs & 10x 1TB 7200 RPM spinning disks. This guy sports dual controllers as well, with 2x1GbE iSCSI interfaces & 2x1GbE management interfaces per controller and something like 9TB usable, but each volume you create can have its own RAID policy. Oh, did I mention it runs an operating system very different from good old Data ONTAP?
So that’s what I’ve got. When you’re an SME with a very limited budget but a very tight, Microsoft-oriented stack, your options are limited.
Anyway, on to the fun and glory at 30,000 IOPS.
Here’s the bakeoff setup, meant to duplicate my production stack as closely as possible. Yes, it’s a pretty pathetic setup, but I’m doing what I can with what I’ve got:
Nimble v. incumbent Bakeoff
1x Cisco Catalyst 2960S running IOS
1x 2011 Mac Pro tower with 2x Xeon 5740, 16GB RAM, and 2xGbE
1x Dell PowerEdge 1950 with old-ass Xeon (x1), 16GB RAM, and 2xGbE
NICs: I adopted the converged fabric architecture that’s worked out really well for us in our datacenter, only instead of clicking through and building out vSwitches in System Center VMM, I did it in PowerShell without System Center (a rough sketch of the commands follows this list). So I essentially have this:
pServer1 & pServer2: 1 LACP team (2x1GbE) with a converged virtual switch and five virtual NICs (each tagged for the appropriate VLAN) on the management OS
pServer3: 1 LACP team (4x1GbE) on this R900 box, which is actually a production Hyper-V server at our HQ. So pServer3 is not a member of my Hyper-V cluster, just a simple host with a 4Gb teamed interface and a f(#$*(@ iSCSI vSwitch on top (yes, yes, I know, they say don’t team iSCSI, but haven’t you ever wanted to?)
All the virtual switch performance customization you can shake a stick at. Seriously, I need to push some packets. And I want angry packets, jacked up on PCP, ready to fight the cops. I want to break that switch, make smoke come out of it even. The Nimble & the incumbent sport better CPUs than any of these physical servers, so I chased every optimization I could find on the virtual & physical switches.
Cisco switch: the left half is Nimble, the right half is the incumbent, and the host teams are divided between the two sides with (hopefully) equal amounts of buffer memory, ASIC processing power, etc. All ports are trunked save for the iSCSI ports. The incumbent is on VLAN 662, Nimble is on VLAN 661. One uplink to my MDF switches.
VM fleet: seven total (so far) with between 2GB and 12GB of RAM, 2-16 vCPUs, and several teams of teams. Most virtual machines have virtual NICs attached to both the incumbent & Nimble VLANs.
Volumes: 10 on each array. 2x CSV, 4x SQL, and 4x RDM (raw device mappings, general purpose iSCSI drives intended for virtual machines). All volumes are equal in size. The incumbent, as I’m learning, requires a bit more forethought to set up, so I’ve dedicated the 2x 800GB SSDs as SSD cache across a disk pool that encompasses every spinner in the array.
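As promised above, here’s roughly what that PowerShell converged-fabric build looks like on pServer1 & pServer2. The team, switch, and vNIC names are placeholders, as are the non-iSCSI VLAN IDs; 661 and 662 match the iSCSI VLANs on the Cisco switch.

```powershell
# Build the LACP team from two 1GbE ports (adapter names here are placeholders)
New-NetLbfoTeam -Name "ConvergedTeam" -TeamMembers "NIC1","NIC2" `
    -TeamingMode LACP -LoadBalancingAlgorithm HyperVPort

# Lay a converged virtual switch on top of the team, with bandwidth weighting
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "ConvergedTeam" `
    -AllowManagementOS $false -MinimumBandwidthMode Weight

# Carve out the management-OS virtual NICs and tag their VLANs
# (661/662 match the iSCSI VLANs above; the other names and IDs are placeholders)
$vNics = @(
    @{ Name = 'Mgmt';            Vlan = 10  },
    @{ Name = 'LiveMigration';   Vlan = 20  },
    @{ Name = 'Cluster';         Vlan = 30  },
    @{ Name = 'iSCSI-Nimble';    Vlan = 661 },
    @{ Name = 'iSCSI-Incumbent'; Vlan = 662 }
)
foreach ($nic in $vNics) {
    Add-VMNetworkAdapter -ManagementOS -Name $nic.Name -SwitchName "ConvergedSwitch"
    Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName $nic.Name `
        -Access -VlanId $nic.Vlan
}
```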
The tests:
I wish I could post a grand, sweeping, and well-considered benchmark routine à la AnandTech, but to be honest, this is my first real storage bakeoff in years, and I’m still working on the nice and neat Excel file of results. I can do a follow-up later, but so far, here are the tools I’m using and the concepts I’m trying to test and prove:
SQLIO: Intended to mimic as closely as possible our production SQL servers, workloads & volumes
IOMETER: Same as above, plus intended to mimic terminal services login storms
Robocopy: Intended to break my switch
Several other things I suffer now in my production stack
Letting the DBA have his way with a VM and several volumes as well
All of these are being performed simultaneously. So one physical host will be robocopying 2 terabytes of ISO files to a virtual machine that’s parked in a Nimble CSV, in the same CSV as another VM that’s running a mad SQLIO test against a Nimble RDM. You get the idea. Basically, everything you’ve always wanted to do but couldn’t on your production SAN.
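For flavor, the SQLIO and robocopy runs look roughly like the following; the thread counts, durations, and paths are illustrative rather than my exact parameters, and the SQLIO target file is pre-created on an array-backed volume.

```powershell
# 8KB random writes: 8 threads, 8 outstanding IOs each, 120 seconds, latency stats,
# no buffering. T:\ is a placeholder for an array-backed volume.
sqlio -kW -t8 -o8 -b8 -frandom -s120 -BN -LS T:\sqlio_test.dat

# Same pattern as reads
sqlio -kR -t8 -o8 -b8 -frandom -s120 -BN -LS T:\sqlio_test.dat

# Multi-threaded robocopy of the ISO library at a VM sitting in a Nimble CSV
# (source and destination paths are placeholders)
robocopy D:\ISOs \\TestVM01\D$\ISOs /E /MT:32 /R:1 /W:1 /NP /LOG:C:\Logs\robocopy-iso.log
```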
So far, from the Nimble, I’m routinely sustaining 20,000 IOPS with four or five tests going on simultaneously (occasionally I toss in an ATTO 2GB random throughput test just for shits, grins, and drama), and sometimes peaking at 30,000 IOPS.
The Nimble isn’t missing a beat:
What else can we throw at this thing? ATTO, the totally non-predictive, non-enterprise storage benchmarking application! I ran this on Saturday night in the midst of two SQLIO runs, one SQL-oriented IOMETER run, and two robocopies.
So yeah. The Nimble is taking everything my misfit army can throw at it: Mac hardware, a PowerEdge 1950 that practically begs us to send it to a landfill in rural China every time I power it on, and a heavyweight R900 whose glory days were last decade. And it’s laughing at me.
Choke on this I say.
Please sir may I have another? the Nimble responds.
So what did we do? What any mid-career, sufficiently caustic, and overly cynical IT pro would do in this situation: yank drives. Under load. 2x SSD and 1x HDD, to be specific.
And then pull the patch cables out of the active controller.
Take that Nimble! How you like me now and so forth.
Results:
And lo, what does the Nimble do?
Behold the 3U wonder box that you can set up in an afternoon, that sustains 25-30,000 IOPS and draws about 5.2 amps, and yet doesn’t lose a single one of my VMs after my boss violently and hysterically starts pulling shit out of its handsome, SuperMicro-built enclosure.
Sure, some of the SQLIO results paused for about 35-40 seconds. And I still prefer MC/S over MPIO. But I can’t argue with the results. I didn’t lose a single VM in a Nimble CSV, I dropped only one or two pings during the handover, and IO resumed after the gleeful drive pulls.
Storage Glory.
I mean, this is crazy, right? There are only 16 drives in there, 12 of which spin. I can feel your skepticism right now... there’s no replacement for displacement, right? Give me spindle count or give me death. My RAID-DP costs me a ton of spindles, but that’s the way God intended it, you’re thinking.
So in the end (incumbent tests forthcoming), what I/we really have to decide is whether to believe in the Nimble magic.
I’m sold on it and want that array in my datacenter post-haste. Sure, it’s not a Filer. I’ll never host a native SMB 3.0 share on it. I’ll miss my breakfast-confection command line (the Nimble CLI feels like BusyBox, by the way, but I can’t confirm), but I’ll have CASL to play with. I can even divvy out some “aggressive” cache policies to my favorite developer guys and/or my most painful, highest-cost user workloads.
As far as the business goes? From my seat, it’s the smart bet. The Nimble performs extremely well for our workloads and is cost effective.
For a year now I’ve been reading Nimble praise in my Enterprise IT feed. Neat to see that, for once, reality measured up to the hype.
More on my bakeoff at work and storage evolution at home later this week. Cheers.
Editor note: This post has been edited and certain information removed since its original posting on Jan 21.
My jaw dropped. Belkin nailed it. Feast your eyes on this:
As Ars Technica notes, the colors, shape and strangeness are immediately recognizable, even to people like my mom who still, to this day, looks for the “blue and black router box” when internet problems occur at her house.
Here’s the original in all its early 2000s, plastic-fantastic glory:
It’s rare for a tech company outside of One Infinite Loop to build iconic technology hardware. Even rarer for said hardware to be networking gear.
And yet the old WRT54G (I still have one) has reached icon status, something Belkin was smart enough to recognize (and why they feel they can justify a $300 price tag on it!).
The look of the new one makes me want to buy it and mount it in a place that’s visible, just as a conversation piece. Crazy.
The only consumer networking gear to get a starring role in South Park
It’s going to be hard to turn down this one when it’s released soon. It’s coming with OpenWrt!
It’s been a while since I blogged about my home IT lab, the purpose of which is to 1) ensure near-HA levels of service for my most important and critical users (the wife and fam) and 2) build something with which I can approximate and simulate conditions at work while hopefully learning a thing or two.
In November I blogged:
Sucks to admit it, but I think I’ve got to spend. But on what? I want a small-footprint but capable PC running at least a Core i3 or i5 that can support up to 32GB of RAM, to make sure I can continue to use it in a few years (my current Lenovo box tops out at 16GB).
I’m thinking a Mac Mini (an apropos choice for the Agnostic Computing lab), a Gigabyte BRIX, or a custom PC inside a Shuttle case (which offers 2x GigE built in), and I have a total budget of about $700.
Boy, how a couple of months, a birthday, & the holiday season changed that picture. I went from thinking I’d build a humble little lab (mostly virtual) to building this:
I’ve tagged each element of the stack to ease comprehension and foster the reader’s amusement.
TrendNet 16 port Cat 5e patch panel: $20, Fry’s
Cisco SG-200, no PoE: A gift from a vendor. Yes, I’m not above that kind of thing. Ten GbE ports; love this switch
1U Cable Management: $17 from a local business IT systems retailer. Great for hiding the shame
24-port TP-Link GbE switch, unmanaged: Where I plug in the stuff that shan’t be messed with. It’s a stupid switch, but it’s rack-mountable, and if something broke while I was away, I could, in the worst-case scenario, have my wife plug the blue “internet” cable into the TP-Link and all would be right again
Frankencuda: Behold the depths I’ll go to. I’ve re-purposed and re-built a dead Barracuda Load Balancer 340. Not only that, but I bolted 3.5″ HDD trays & 2TB drives onto the top of the ‘Cuda’s modified 1U SuperMicro case. Frankencuda parts: motherboard, $50; 8GB RAM, $69; 2x 128GB SanDisk SSDs ($180, Amazon); all other parts re-used or borrowed, including the dapper little AMD Sempron, which can be unlocked into a dual-core Athlon II
TV convergence, almost: With the overweight ‘Cuda threatening to collapse on it, this stack represents my home internet connection (SURFboard DOCSIS 3.0 modem on the right) and television (Time Warner, via HDHomeRun Prime and a shitty TWC tuning adapter). Cable modem: $110 in 2012; HDHomeRun Prime: $99 on Woot.com
Lenovo PC: My old standby, a 2011 M91p with a Core i7-2600, 16GB RAM, and a half-height 4x1GbE Broadcom NIC I’m borrowing from work. 2TB drive inside. $950 in 2011
NetGear ReadyNAS 102 w/ Buffalo 3TB “caching” external USB 3.0 drive: I got the ReadyNAS in October when I was convinced I could do this cheaply with a simple iSCSI box and adequate LUN management. Alas, I quickly overwhelmed the ReadyNAS; the poor thing falls over just booting three VMs simultaneously, but it’s freaking amazing as a DLNA media server and a general-purpose storage device. The Buffalo is on loan from work; decent performer, good for backups. $250
StarTech USA two post 12U rack: Normally $60, I got mine used on eBay for $25. Great little piece of kit. It’s bolted down to my wooden workbench.
Latest Fisher Price cable tester: It makes a smiley face and plays a happy sound when the four pairs are aligned. $10
Not pictured is my new desktop PC at home: a Core i5-4670K, an Asus Z87 motherboard from their Premiere or Dope or Awesome line, 32GB RAM, a 256GB Samsung EVO SSD, and some cheap $50 mini-tower case.
So yeah, I blew past the $700 limit, but only if you count purchases made in 2012 and earlier, which really shouldn’t be counted. And much of this was funded via the generosity of friends and family, in the form of Christmas and birthday gift certificates.
Thank you everyone. You’ve only made it worse.
What have I learned from this experience? Building a home IT lab is nothing like the procurement processes you’re used to at pretty much any organized employer. It basically involves pestering vendors (or sucking up to them), nagging others for old parts, debasing yourself by dumpster diving for old, inferior gear, and generally doing unsavory things.
But it’s all in pursuit of IT Excellence so it is justified.
So what have I got with this crazy stack? Well, materials are only one piece; the sweat-equity costs are very high as well. I’ve run about 17 Cat5e cables of varying lengths through an attic that hasn’t seen this much human attention since the Nixon administration:
The mess on the left isn’t mine... entirely. I only claim the ones on the right
I spent three solid Saturdays (in between other chores) navigating this awful, dusty attic and its counterpart in the garage above my server stack, all in pursuit of this:
And though the cable management out of frame is obscene and not suitable for a family-friendly blog like AC, I will say I’ve accomplished something important here.
Who else can say they’ve unified TV & Compute resources in such a singular stack in their home? All those goddamned ugly black power bricks are located in one corner of the garage, the only area suitable for such things. The only non-endpoint device in the living quarters of the house is the Netgear Nighthawk AC wifi router (DD-WRT, currently my gateway).
Everything else in the living quarters, save for my computer (which now has a nice 3x1GbE drop in the wall), is a simple endpoint device. Ethernet is my medium; data & television are the payload, all from this one spot. Yes, even my wife can appreciate that.
And from an IT Lab perspective, I’ve got this:
Three compute nodes with a total of 10 cores
50GB of DDR3 RAM, 1333MHz minimum
2TB in the NetGear which runs some light iSCSI LUNs
6TB in RAID 0 on the Frankencuda with 256GB SSD
3Gb/s fabric-oriented networking to each node, LACP on the Cisco switch
So now the fun begins. Benchmarks are underway, followed by real workload simulation. I’ll update you diligently as I try to break what I just built.