Tuesday, October 18, 2016

Free VMware ESXi Active Directory Problems

I have several free-license VMware ESXi servers where I use Active Directory (AD) authentication to log in via the vCenter client.  I frequently find the AD credentials don't work (usually invalid password or authentication failed).

When I look at the Host Configuration / Authentication Services Setting tab, I see:
Directory Service Type:        Active Directory
Domain:                        Mydomain.Com
Trusted Domain Controllers:    --

The "--" for domain controllers means it isn't talking to my DC.  I've spent hours digging through DC and ESXi logs trying to figure out why, with no clear reason found.  We have had to take drastic actions such as leaving the domain, which causes all the defined permissions to be lost and have to be recreated.  Rebooting usually works, but that is very disruptive.

My star helper finally found a way to reconnect to AD that is fast, easy and non-disruptive.  Basically he kept searching through all the files on ESXi looking for anything to do with AD, domain and a dozen other keywords.  He found this gem!


This is the command that joins the domain, or rejoins it when it becomes disconnected for some reason.

The usage is:

/usr/lib/vmware/likewise/bin/domainjoin-cli join <domain> <username> <password>

For example:

/usr/lib/vmware/likewise/bin/domainjoin-cli join mydomain.com jomebrew password

I have automated this using plink and a simple Perl-based web page where we can enter a number of IP addresses and iterate through the list, issuing the command via plink.  It usually works, but when it doesn't, we ssh to the host and issue the command manually.
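The automation amounts to looping over a list of host IPs and issuing the join command over plink. A minimal shell sketch of that loop; the host list, domain, user, and password handling here are placeholders, not our actual tooling:

```shell
# Sketch of the rejoin loop (hypothetical host list and credentials).
# SSH_CMD defaults to plink in batch mode; override it for testing.
SSH_CMD="${SSH_CMD:-plink -batch -l root}"

# Run the likewise domain join on a single ESXi host.
rejoin_host() {
    host="$1"; domain="$2"; user="$3"; pass="$4"
    $SSH_CMD "$host" \
        "/usr/lib/vmware/likewise/bin/domainjoin-cli join $domain $user $pass"
}

# Iterate a list of hosts, continuing past any failures.
rejoin_all() {
    domain="$1"; user="$2"; pass="$3"; shift 3
    for host in "$@"; do
        rejoin_host "$host" "$domain" "$user" "$pass" \
            || echo "rejoin failed on $host; ssh in and run it manually"
    done
}
```

When a host fails here, we fall back to an interactive ssh session and run domainjoin-cli by hand.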

Note:  We have experienced AD issues on VMware 5.5 and 6.0.  These are not issues with vCenter or licensed hosts.  I imagine stand-alone licensed hosts would experience the same issue.

Sunday, October 16, 2016

Installing RaspberryPints on Raspian Jesse

I've set up a couple of Raspberry Pints (RP) on new Raspberry Pi 3 Model B boards.  While most of the configuration at http://raspberrypints.com/build-flow-meters-2/ is still relevant, there are a couple of differences.

I have images below to show the code lines in the files that are edited.  This blog tool does not allow me to easily list code snippets, which would make it a lot easier to copy and paste.

Step 5: Package Configuration w/o Flow Meters

The path to autostart has changed.  From the pi home directory, edit the following file


Use the following entry to load Chromium in kiosk mode and instruct Chromium not to display the unsafe shutdown message on restart.

@chromium-browser --incognito --kiosk localhost
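The file path appears in a screenshot in the original post; on Raspbian Jessie it is typically ~/.config/lxsession/LXDE-pi/autostart. A sketch of the edited file, assuming the stock Jessie defaults (your existing lines may differ):

```
@lxpanel --profile LXDE-pi
@pcmanfm --desktop --profile LXDE-pi
@xscreensaver -no-splash
@chromium-browser --incognito --kiosk localhost
```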

Apache2 Default Document Root Directory

Apache2's default document root is /var/www/html.  You can choose to install Raspberry Pints in that directory or change the Apache2 configuration.  I chose to change the Apache2 configuration.

Edit /etc/apache2/sites-enabled/000-default.conf and change 

DocumentRoot /var/www/html to DocumentRoot /var/www
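This edit can also be scripted. A hedged sketch: a helper that backs up the config and swaps the DocumentRoot with sed; the path is the stock Raspbian Jessie location, and you would run it as root, then restart Apache2:

```shell
# Swap DocumentRoot /var/www/html for /var/www in an Apache2 site config,
# keeping a backup of the original file alongside it.
set_docroot() {
    conf="$1"
    cp "$conf" "$conf.orig"   # backup before editing
    sed -i 's|DocumentRoot /var/www/html|DocumentRoot /var/www|' "$conf"
}

# usage (as root):
#   set_docroot /etc/apache2/sites-enabled/000-default.conf
#   service apache2 restart
```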

My Changes to Raspberry Pints

I make a few changes to Raspberry Pints to suit my needs.

Add Automatic Refresh 

RP does not refresh the browser on my systems.  Once taps are updated, the page needs a manual refresh.  I add a refresh every 60 seconds to automatically pick up tap changes.  There is a refresh.sh tool that I would expect to handle this, but it doesn't work on my system, maybe because xdotool isn't installed.

Edit /var/www/index.php

Add the meta refresh tag as shown below.
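The screenshot is not reproduced here; assuming the 60-second interval described above, the tag added inside the page's head section looks like this:

```
<meta http-equiv="refresh" content="60">
```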

Remove CR and LF from Beer Info

If you use CR and LF in beer info, they break the program when you tap a new keg, and probably elsewhere.  This change strips CR and LF on all database updates.

Edit /var/www/admin/includes/functions.php and add the following to the bottom.

Automatically Mark Kegs as Clean When A Keg is Kicked

I don't manage kegs and don't care about the keg feature.  I just want to show my taps.  The process to kick a keg and tap a new beer is a bit tedious, especially at a festival.  So, I automatically mark a keg as clean when I kick it.

Edit /var/www/admin/includes/managers/tap_manager.php

I prefer to keep a backup of the original instruction.  Find the following line, add a # to comment it out, and add the new line after it.  It is easiest to copy and paste the first line, then add the #.  Change NEEDS_CLEANING to CLEAN.

#$sql="UPDATE kegs k, taps t SET k.kegStatusCode = 'NEEDS_CLEANING' WHERE t.kegId = k.id AND t.Id = $id";
$sql="UPDATE kegs k, taps t SET k.kegStatusCode = 'CLEAN' WHERE t.kegId = k.id AND t.Id = $id";

Remove Column Headers

Most people already know what the columns mean.  I can get some more screen real estate by removing the column headers.  The following simply adds comment tags so the header is not displayed.

Edit /var/www/index.php 
Around line 111, locate the thead tag and add the <!-- before it as shown below.  Then locate the closing /thead tag and add the --> after it as shown below.

Rearrange Columns to List Beer Name after the Tap Number

This is a bit more complex.  I recommend backing up index.php before making changes.  

cd /var/www
cp index.php index.php.orig

Locate the following text for the Name column and copy it to the clipboard.**  It is around line 174.

Now scroll up and locate the ConfigName line.  It is around line 160.

Paste the Name column markup you copied earlier above the ConfigName line.  Save, then reload the taps page and verify the changes are OK.

** There are various ways to cut and paste, or copy and delete, depending on the editor you use.

If you screw up the tables, just copy index.php.orig to index.php.

I also make changes to style.css to update fonts, sizes, colors, and table width (so the name column is wider).


Sunday, May 15, 2016

Double IPA and a Barleywine in a Single Brew

I am almost out of beer. I know, right?  Down to the Pineapple IPA hopped with Citra and Mosaic. The club needs a barleywine for NCHF, the Northern California Homebrew Festival.  The missus wants a DIPA. Can I make a barleywine and a Double IPA in a single batch?  Technically, the difference between a barleywine and a Double/Triple IPA is a game of semantics.

So I modified my Phoebe Pry IPA recipe and a club barleywine recipe to do a no-sparge barleywine and then mash the IPA recipe on the same grain bed.  I have done this before with an Imperial Stout and a normal Stout.  They came out great.  I wasn't sure whether dark beers fare better with this method or not.

Recipes and Brew Day Notes
I separated the recipe ingredients into two separate buckets, milled them separately, and added a note to each to ensure I was using the right ingredients at the right time.  I started with the no-sparge barleywine.  Using Beersmith software, I chose a Brew In A Bag equipment profile, which is an easy way to calculate the strike water volume for a no-sparge batch.  I definitely nailed the mash temp of 154F by adding 7.2 gallons of 170F water to the 12 pounds of grain for the 3-gallon batch.

Now I had an hour for the mash before I would vorlauf, 2 pints at a time, 10 times, until the wort flowed clear.  So, it was time for some brew day breakfast: a triple-decker fried ham and egg sandwich on a plain bagel cut into three layers.  Some beer mustard and spicy pepper jack cheese went well with my Chocolate Coconut Porter.

I transferred around 5 gallons of 13.2 BRIX wort, which is 1.054 specific gravity.  A long way from my target of 1.101 SG.  After 2 hours of boiling and several hop additions (Magnum, Cascade, Columbus), the boil was done and the gravity was 24 BRIX, or 1.102 SG.  I transferred about 2.5 gallons into the 3-gallon fermentor at 68F and pitched a packet of Safale S-04 dry yeast.

After transferring the barleywine to the boil kettle, I added the additional 13 lbs of grain to the grain bed and added 4 gallons of strike water at 165F, hitting my mash temp of 146F.  A little over an hour later I sparged with another 2.5 gallons of 168F water and ended up with 5 gallons of clear wort in the kettle.  I was a little worried by the 14 BRIX pre-boil gravity (1.058 SG), but after an hour or so of boiling and adding Magnum, Cascade and Simcoe hops, I ended up with 3 gallons of 1.083 SG wort in the fermenter.

I pitched 2 vials of White Labs 001 yeast and rigged a blowoff for the high krausen yeast to drop into the Barleywine. The implementation was poor and was almost a disaster but worked well enough to get the happy yeast from the IPA into the Barleywine though there was a lot of leakage and cleaning still to do.

With a lot to clean and gnarly back pain, I took a break and had an intermission beer.

The beers stayed in the fermenter for two weeks.  I took both out and cleaned the mess in the fermenter from the blowoff experiment.  I took gravity samples and dry hopped both.

The barleywine finished at 1.021 SG for a final ABV of 10.8%.  I dry hopped with an ounce of Cascade, an ounce of Amarillo and two ounces of Liberty hops.

The IPA finished at 1.011 SG for a final ABV of 9.5%.  I dry hopped with an ounce of Simcoe, two ounces of Cascade and an ounce of Amarillo.
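For reference, these ABV figures follow the usual homebrew approximation ABV = (OG - FG) * 131.25; Beersmith uses a more involved formula, which is likely why it reports 10.8% for the barleywine where the simple approximation gives about 10.6%. A quick shell helper:

```shell
# Approximate ABV from original and final specific gravity:
#   ABV ~= (OG - FG) * 131.25
abv() {
    awk -v og="$1" -v fg="$2" 'BEGIN { printf "%.1f\n", (og - fg) * 131.25 }'
}

# abv 1.102 1.021   -> 10.6  (the barleywine)
```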

I will let both sit another 5 days before kegging.  If I can score some spirals, I will oak age the barleywine another week or two before kegging.

Next up I plan to split a 6 gallon batch of Imperial Porter and finish with 3 gallons of Bourbon Vanilla and 3 gallons of Chocolate and something else.

Tuesday, May 10, 2016

Taxonomy Of A VMware Outage After A Power Failure

A few VMs hosted on the two blade servers indicated they were inaccessible.  Some were simply orphaned, which is sometimes a result of a power loss event.  Others listed the GUID of the path (rather than the label for the storage).  Orphaned VMs must be removed from inventory and then added back to inventory from the datastore browser.  VMs that show the GUID path can often be recovered simply by refreshing the storage after all services are restored.  Neither of these was successful.

The hosts have been configured with local storage and NetApp storage for some time.  VMs have been running from both datastore sources without issue.

After the power failure, we experienced problems with VMs on NetApp storage.

- VMs running from local storage experienced no issues and could be started normally.
- VMs running on the NetApp could not be started.  Orphaned VMs could not be added to inventory.  We were not able to modify or copy the files on the NetApp.  We could not unmount the NetApp.  Neither SSH nor desktop client actions were successful.  Later in the morning, we identified another blade experiencing the same condition, which indicated the problem was not isolated to just these two blades.

We verified the NetApp was in a normal state by accessing the files as a CIFS share.

We identified the changes made since the last power failure (a week ago):
- We added a vSwitch and network interface on the Storage network.
- We attached a Nimble volume to the host.
We decided to back out these two changes.

We set the Nimble volume to Offline in Nimble.  Surprisingly, we were then able to unmount the NetApp.  This indicated a strange link between the datastores; VMware had been reporting the datastore as in use.  Several iterations of removing and adding did not solve the VM issues on the NetApp.

We unmounted the NetApp, then removed the new Storage vSwitch, followed by mounting the NetApp again.  VM operations on the NetApp were normal again.

We unmounted the NetApp, added the Storage vSwitch back, then mounted the NetApp again.  VM operations failed on the NetApp once again.

A short time later we realized the NetApp export access permissions permitted only the (public) IP address of the original vSwitch on the blades.  Adding the vSwitch on the Storage network established a connection from the new IP on the local storage subnet.

The solution was to change the NetApp export permissions to allow root access from the new Storage vSwitch IP address.  Resolution time was 7 hours.
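For illustration, on a 7-Mode filer this fix lands in /etc/exports (or is applied via exportfs); the volume name and addresses below are placeholders, and clustered ONTAP uses export policies instead:

```
# Add the storage-network vmkernel IPs to both the rw= and root= lists:
/vol/vmware_ds  -sec=sys,rw=192.168.10.11:192.168.10.12,root=192.168.10.11:192.168.10.12
```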

Last week we added another blade to the system.  In the course of adding the additional server, we added Nimble storage volumes to the existing two servers.  We added a new Storage vSwitch connected to the IO Aggregator, which has a 10Gb connection directly attached to the storage network.  After the power failure, the NetApp was remounted over the new Storage vSwitch since the destination was on the locally attached network.  This put the datastore in a read-only state.  Technically, the NFS export was mounted as non-root, which does not have write privileges.  Additional blades have this same configuration and experienced the same issue, though the users hadn't realized it yet.

Sunday, January 31, 2016

High CPU in Windows VM

Summary:  When a VMware resource pool constrains CPU resources on a VM within the pool, the guest OS can indicate high CPU usage even while VMware shows the VM as not particularly busy.  CPU Ready % from esxtop shows the VM has a high percentage.  The resolution is to increase the CPU limit in the resource pool settings, power off some VMs, or check for processes in the VMs using excessive CPU.
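esxtop shows %RDY directly, but vCenter's performance charts expose CPU Ready as a summation in milliseconds per sample interval; converting that to a percentage is a one-liner. A small helper, assuming the 20-second realtime chart interval:

```shell
# Convert a CPU Ready summation (milliseconds) into a percentage.
# vCenter realtime charts sample every 20 seconds, so:
#   ready% = ready_ms / (interval_seconds * 1000) * 100
cpu_ready_pct() {
    awk -v ms="$1" -v secs="${2:-20}" \
        'BEGIN { printf "%.1f\n", ms / (secs * 1000) * 100 }'
}

# cpu_ready_pct 4000   -> 20.0  (4000 ms ready over a 20 s sample)
```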

I had been struggling with a Windows 2012 R2 VM running on VMware.  I have a few hundred of these, but this one happened to be a domain controller and was painful to remote desktop to.  Further, it is running on a hyper-converged Nutanix host in India.  I know the host and storage are fine.  I even had the fine folks at Nutanix help look at the issue just in case there was some odd issue with storage for this one VM.

To be honest, I have a couple more VMs experiencing issues.  One of the issues we did find was that CPU was often at a high Ready percent.  One of the traps we get caught in is seeing ample CPU % free on the host (around 40% free) and assuming all is well.  But that does not mean VMs are not waiting for CPU.  Clearly mine were, with 10 or 15% Ready.

I manually control memory and CPU for each of my resource pools.  Basically, each resource pool gets a percentage of usable CPU and memory.  I deduct 20% of CPU and memory from the total of all hosts in the cluster, then calculate what that means in KB and MHz and allocate these per resource pool.  This way, I can ensure the machines never exceed 80% resource utilization.  I had to create a spreadsheet to calculate these values for me.  It is tedious, and when I add a host to the cluster, or worse, a new resource pool, there is a lot of updating to the spreadsheet and then to the resource pool settings.

So, I checked my resource pool usage for this problem VM, and while it is close to the max, it shows around 5% free CPU resources.  So, I have this VM that is basically idle but using 90% CPU in the OS.  VMware performance stats show it rather idle.  Remote Desktop control is painfully slow just to open Control Panel.  It is acting poorly as a DC.  Plenty of free resources on the host.

I see the VM has an exceedingly high and pretty constant CPU Ready %.  What could cause this, I thought.  Then I realized the only thing that would create this condition is the resource pool limit constraining the VMs in the pool, despite looking OK.

So, I added 20,000 MHz to the resource pool, and the CPU usage for the pool in the summary display made a weird jump.  I could see now that the resource pool was actually under duress and the VMs were being heavily constrained.  Within a few minutes the VM returned to normal.  The CPU % in Windows dropped to a normal few percent, RDP worked normally, and the CPU Ready % for the VM was under 1% again.

Monday, June 15, 2015

NetApp OnCommand System Manager would not launch

It was really pissing me off.  All of a sudden, NetApp OnCommand System Manager would not launch.  I tried the desktop icon, the .exe, the start menu, but nothing happened.  I clicked away and nothing happened.

I opened a command line and launched Java manually.  It worked and was 1.8-dot-something, which should be OK.

I recently upgraded OnCommand to 3.1.2 RC1 which fixed my last set of connection (500 errors) issues.  Of course I had to make some command line changes to the NetApp box which is a pain in the ass since I never remember the console password.  However, it was working OK.

Maybe Java upgraded, I can't remember, but OnCommand stopped working.  So, I installed 3.1.2 RC2, which the links in this forum post said I needed to do: http://community.netapp.com/... But it still would not launch.

So, I poked around on my system looking for rogue Java executables.  Sure enough, I found some in "c:\windows\system32".  Since this is in the path earlier than the real Java path, every time I launched System Manager, this Java was invoked.  System Manager could be better written to show an error, but it isn't, so we just see nothing.

I deleted the Java files in the c:\windows\system32 folder and System Manager launched perfectly.  For reference, here are the errors and the rogue files:

Error: Registry key 'Software\JavaSoft\Java Runtime Environment'\CurrentVersion'
has value '1.8', but '1.7' is required.
Error: could not find java.dll

Error: Could not find Java SE Runtime Environment.

C:\Windows\System32>dir java*.*
 Volume in drive C is OS

 Directory of C:\Windows\System32

04/02/2015  09:41 AM           189,352 java.exe
05/22/2015  11:29 AM            77,824 JavaScriptCollectionAgent.dll
04/02/2015  09:41 AM           189,352 javaw.exe
04/02/2015  09:41 AM           320,424 javaws.exe
               4 File(s)        776,952 bytes
               0 Dir(s)  123,620,012,032 bytes free

C:\Windows\System32>del java*.*
Access is denied.


Thursday, May 7, 2015

I got Nutanix on a sunny day

I got Nutanix on a shiny day, when it's cold outside I got another NetApp bay.

The question I get most often when I talk about Nutanix is why I bought them.  Technically, I didn't buy them.  My peer and close friend did.  Then he found another job and quit, so I own it now.  I don't know how he found them, but he and I talked about, more dreamed about, technology, letting our imaginations run a bit wild.  I think the Double IPA tastings didn't hurt.  I was heavily into bending vCenter to my will.  VMware tries hard to deliver almost the feature you want.  Almost.

So, while on a bender, I lamented (beer-induced revisionist history) how we have this crappy hodgepodge of mashed-up compute and storage hardware.  Wouldn't it be great to be able to cluster servers that each have 16 hard disks as one reliable, striped storage array shared as one NFS mount to all the VMware servers?  We would have all our current compute plus disk distributed across multiple physical hosts.  We could optimize disk usage and survive a whole host failure.  If only...

Some months later he suggested Nutanix.  I looked over the platform info, considered our use case, and nodded in approval as one does to add dramatic effect to his approval.  Not that I approved.  I was skeptical.  But, in case the skies over Nutanix clouded up, I had an extra NetApp bay.

I had spent a few weeks helping the Engineering Operations folks with virtualizing our build servers.  Having recently emerged from the primordial ooze, they saw virtualization as witchcraft, and I repeatedly fended off burning torches aimed at the kindling I stood upon.  Eventually, though, they came around and much of it was virtualized.  Poorly.  Using local storage and suffering for it.  It really wasn't all their fault, except for not taking my help.  They were way out of their element.  I mean, who runs a VM on an ESXi instance in Xenserver and says virtualization doesn't work?

Our design goal was to decrease the build time.  Whatever it was, decrease it.  After quite a bit of instrumenting, I already knew the way they had implemented their systems and network was a mess and a big part of the problem.  Two other very significant problems were the server OS and the build scripts.  Those would have to be fixed later.  Turns out, much later.

So, we bought Nutanix to eliminate disk IO latency on concurrent builds, localize VM files to the host running the VM, and upgrade and isolate the network to 10Gb.  The hardware was about the same as the Dell blades and R720 servers we already had.  We gained a lot of storage for our bloated VMs.

My short-term goal was for our team to take over the server and network architecture and free the EngOps folks to do what they are best at: EngOpsy stuff.  Before we could finish evaluating the build times, we were asked to deploy build servers on the platform.  I guess they were so delighted with the Nutanix performance, the heck with finishing experimentation.

Did we solve all the problems?  No.  With some help, they finally got on a current OS release.  There was a lot of work done to the scripts. I recently optimized their copy process to move builds to a central location.

Was Nutanix the right choice?  Well, it was the one we made, and it kicks ass.  It has great performance metrics, it is stupidly simple to upgrade, and their support is stellar.  I have rolled out several new boxes for DevTest, so, yeah, it was worth it.

I also get asked about VDI.  I used to say we don't do VDI, but after thinking about it, most of the DevTest VMs are Windows running our own apps.  We use RDP.  We also have a lot of Linux, which we access over SSH.

Are there any problems with Nutanix?  I get that a lot too.  My answer is that they are software, and all software has issues.  There are also interface and operations elements I curse and would change (they are very welcoming of my criticism).  I don't like my deduplication ratio, so I use inline compression, which I like, but it doesn't help as much when you are using protection domains to remote sites.  I didn't plan for the CVM needs, so I overestimated the VMs I can run on a single node.

Wednesday, May 6, 2015

Two VMware Clusters, One Nutanix Cup

OK, I played with the title for dramatic effect... I have two VMware Clusters sitting on one Nutanix Cluster.

My SE, I'll call him Steve.  Totally not his real name (yes it is), was aghast!  Hey!  If you don't say I can't do it, then I can do it.  I do a lot of things that freak out my SE.

The idea for this configuration was that I can logically split my VMs and isolate a cluster of hosts for a specific team.  I know you can do a lot with DRS groups but that is hard to see in the Hosts and Clusters view which I live in.

My cluster of 8 Nutanix hosts share one storage pool but I created a separate container for each VMware cluster.  The containers are only mounted on the appropriate hosts in the separate VMware clusters. I get the performance of 48 disks and I have simplified management of both storage and VMs.

So, Steve freaked out a bit when he saw this.  Mainly because he feared we would reboot two VMware hosts at the same time, which would cause a problem with data redundancy.  I have the same risk with one cluster too, but we are less likely to reboot two at once when we can see they are in the same VMware cluster.

For this exercise, I have three Nutanix hosts, with just under 300 VMs, running VMware 5.5 Enterprise.  My hosts run at around 80% CPU and memory.  The resource pools are configured to limit CPU and memory (why oh why can't I limit storage!  Why!!) so my hosts can barely have HA (not really, but I tell management that to make them feel good; HA costs money, folks).

To move the hosts to the target VMware cluster, in the same vCenter (I can't say about separate vCenters), and ensure the VMs end up in the correct resource pool, I followed the steps below.  Each host took about 5 minutes to complete.

Step 1: Create new resource pools on the target VMware cluster

I use vCenter in a mostly self-service model.  VMware doesn't make this easy, but we have a working model.
- I created duplicate resource pools in the target VM cluster.
- I added the AD users that can access the RP and set their roles.  
- I configured the RP limits.  
- I use a small reservation and have a complex Excel doc to determine the limits for memory and CPU for each resource pool (each pool gets a specific percentage of a total of 80% of the total CPU/memory, calculated in MHz and KBytes).

Step 2: Add a note to existing VMs

On the existing resource pools (I'll call them Auto and D_Auto), edit the notes to add a unique name identifying the RP the VM resides in.
- Click on the RP, then the VM tab
- Select all the VMs
- Right click
- Edit Notes and enter X_Auto for "Auto" and D_Auto for "D_Auto".  This allows me to quickly find all the VMs and move them to the right target later.  Running VMs are easily identifiable later, but powered-off VMs are not.

Step 3: Disconnect a host from the source VM Cluster

Right Click on the host and Disconnect.  Wait for the host to disconnect.

Step 4: Remove the host from the VM Cluster

Right Click the host again and select Remove.  Wait for the host to be removed.

Step 5: Add host to the target VM cluster

Right click on the target VM cluster and Add Host.  You will authenticate, pick a license, etc.  When you get to the Resource Pool section, select the option to create new pool.  The default name is Grafted From [IPADDRESS].  Just leave it default and finish up.

Step 6: Move VMs to the correct Resource Pool

- Navigate to the new Grafted From resource Pool.  
- Drill down a level.  
- Click a resource pool, 
- Click the VMs tab
- Select all the VMs and drag / drop them to the target.
- Repeat for all resource pools.  

Filter the remaining VMs by the name you entered into the notes to uniquely identify the target resource pool.  
- Make sure the Notes column is displayed in the VM list panel (right click and select the column from the list).
- Click the little down arrow next to the search/filter box and select the Notes field.
- Enter the name of the first resource pool you want to filter for as entered in Step 2
- Verify the list then drag and drop to the target RP. 
- Repeat until done.

Saturday, January 17, 2015

Embracing Nature

It might seem like I act too kid-like.  I do immature things.  I take stupid chances.  It is true.  I am all that, though I can't be "too kid-like" enough.  Too many adults have forgotten how to embrace the joys and wonders of childhood.  They think of all the reasons not to explore nature.  Whether that is sliding around polished wood floors in socks, or wandering through the woods searching for and exploring history, or, like me, seeing something, wondering what it is, and finding a way to see it.

Many people travel the world and explore what man has built.  I would love to do that more too.  But I love to explore what nature built.  I love to see the awe and grandeur of nature.  When we explore places man made, we are usually wondering how they did it.  I am so intrigued by the engineering and construction.  I am awed by their accomplishments considering the tools at their disposal.  I think of the term "nature finds a way" and find meaning in how man can make wondrous buildings or pyramids or churches or murals on ceilings.  

Nature finds a way for wondrous creations to be freed from the imagination.  Nature finds a way for that song or that story to be freed from your imagination.  It takes patience for nature to help us find a way, but it finds a way.  At some point, we stop exploring like a child.  We stop wondering for ourselves and start accepting what others tell us.  We often say we need to think outside the box, but we really need to be thinking outside.

I can't see myself ever changing.  The older I get, the more I want to be out of the box.  The more I want to be out.  I need to see the danger for myself or make a little danger.  Exploring new and old trails.  What was left behind in those hills?  Every rock wall and every brick foundation is interesting to me.  So what if I can't get there on a designated trail?  I'll get scratches, cuts, occasional ticks, lots of bumps, and a few bruises.  That is OK.

Nature finds a way for all life to create and grow.  Nature also cleans up after us and, though it is slow, covers our blight and our treasures.  For me, it is not about whether my destination ended up being nothing, blight, or a treasure.  For me, it is about the journey and the child-like wonder to explore, not caring about the destination.  Life is about the journey.  That is my meaning of life.  What did I learn today?  What did I do that I have been conditioned not to?  What did I teach?  I think it is about what meaning I gave to life!