Thursday, May 7, 2015

I got Nutanix on a sunny day

I got Nutanix on a shiny day, when it's cold outside I got another NetApp bay.

The question I get most often when I talk about Nutanix is why I bought them. Technically, I didn't buy them. My peer and close friend did. Then he found another job and quit. So I own it now. I don't know how he found them, but he and I talked, or more like dreamed, about technology, letting our imaginations run a bit wild. I think the Double IPA tastings didn't hurt. I was heavily into bending vCenter to my will. VMware tries hard to deliver almost the feature you want. Almost.

So, while on a bender, I lamented (beer-induced revisionist history) how we have this crappy hodgepodge of mashed-up compute and storage hardware. Wouldn't it be great to be able to cluster servers that have 16 hard disks as one reliable, striped storage array shared as one NFS mount to all the VMware servers? We would have all our current compute plus disk distributed across multiple physical hosts. We could optimize disk usage and survive a whole host failure. If only...

Some months later he suggested Nutanix. I looked over the platform info, considered our use case, and nodded as one does to add dramatic effect to his approval. Not that I approved. I was skeptical. But, in case the skies over Nutanix clouded up, I had an extra NetApp bay.

I had spent a few weeks helping the Engineering Operations folks with virtualizing our build servers. To them, having recently emerged from the primordial ooze, virtualization was witchcraft, and I repeatedly fended off burning torches aimed at the kindling I stood upon. Eventually, though, they came around and much of it was virtualized. Poorly. Using local storage and suffering for it. It really wasn't all their fault, except for not taking my help. They were way out of their element. I mean, who runs a VM on an ESXi instance in XenServer and says virtualization doesn't work?

Our design goal was to decrease the build time.  Whatever it was, decrease it.   After quite a bit of instrumenting, I already knew the way they implemented their systems and network was a mess and a big part of the problem.  Two other very significant problems were the server OS and build scripts.  Those would have to be fixed later.  Turns out much later.

So, we bought Nutanix to eliminate disk IO latency on concurrent builds, to localize VM files to the host running the VM, and to upgrade and isolate the network at 10Gb. The hardware was about the same as the Dell blades and R720 servers we already had. We gained a lot of storage for our bloated VMs.

My short-term goal was for our team to take over the server and network architecture and free the EngOps folks to do what they are best at. EngOpsy stuff. Before we could finish evaluating the build time, we were asked to deploy build servers on the platform. I guess they were so delighted with the Nutanix performance that it was the heck with finishing experimentation.

Did we solve all the problems?  No.  With some help, they finally got on a current OS release.  There was a lot of work done to the scripts. I recently optimized their copy process to move builds to a central location.

Was Nutanix the right choice? Well, it was the one we made. It kicks ass. It has great performance metrics, it is stupidly simple to upgrade, and their support is stellar. I have rolled out several new boxes for DevTest, so, yeah, it was worth it.

I also get asked about VDI. I used to say we don't do VDI, but after thinking about it, most of the DevTest VMs are Windows running our own apps. We use RDP. We also have a lot of Linux, for which we use SSH.

Are there any problems with Nutanix? I get that a lot too. My answer is that they are software. All software has issues. There are also interface and operations elements I curse and would change (they are very welcoming of my criticism). I don't like my deduplication ratio, so I use inline compression, which I like but which doesn't help as much when you are using protection domains and remote sites. I didn't plan for the CVM needs, so I overestimated the number of VMs I can run on a single node.
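
The CVM sizing mistake is easy to see as arithmetic. Here is a rough sketch that looks at memory only; the node RAM, CVM reservation, hypervisor overhead, and per-VM size are all made-up numbers for illustration, not our actual configuration, and the CVM also takes vCPUs that this ignores.

```python
def vms_per_node(node_ram_gb, cvm_ram_gb, hypervisor_ram_gb, vm_ram_gb):
    """Return how many equally sized VMs fit in a node's RAM after overhead."""
    # RAM left for guests once the CVM and hypervisor have taken their share.
    usable = node_ram_gb - cvm_ram_gb - hypervisor_ram_gb
    if usable <= 0:
        return 0
    return usable // vm_ram_gb


# Hypothetical node: 256 GB RAM, 8 GB per build VM.
naive = vms_per_node(256, 0, 0, 8)     # ignoring the CVM and hypervisor
planned = vms_per_node(256, 32, 8, 8)  # assumed 32 GB CVM, 8 GB hypervisor

print(f"naive estimate: {naive} VMs, with overhead: {planned} VMs")
```

Multiply the gap between the two estimates by the number of nodes and it adds up quickly.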







