Friday, May 19, 2017

Automating the Microsoft WannaCry Security Patch With PsExec

I needed to ensure we had this patch on hundreds of servers that are pretty much unmanaged.  I've used various tools in the past but had to figure out how to do it on my Windows 8.1 and 2012 R2 hosts.  This applies to any manual patch, really.

First, I used PsExec from the Sysinternals tools at Microsoft.

I needed a way to get the update onto the machines without needing credentials to access a network share.  I figured out how to use PowerShell to copy the file from a web server I had already set up.

I copied the update .msu file to the web server with a simpler name and renamed it with a .zip extension to avoid file transfer issues through the web server.  IIS will block the file unless you have a content type configured for the .msu extension.
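If you would rather serve the .msu directly instead of renaming it, IIS can be told about the extension.  This is a minimal sketch of the relevant web.config fragment, assuming IIS 7 or later; the mimeType choice is my assumption, not from the post:

```xml
<configuration>
  <system.webServer>
    <staticContent>
      <!-- Hypothetical: lets IIS serve .msu files without renaming them to .zip -->
      <mimeMap fileExtension=".msu" mimeType="application/octet-stream" />
    </staticContent>
  </system.webServer>
</configuration>
```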

I created a simple text file with the IP addresses of all the hosts I wanted to patch, one per line.
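The hosts file is just plain text, one address per line; something like this (these addresses are made up):

```
10.0.0.11
10.0.0.12
10.0.0.13
```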

The PowerShell cmdlet Invoke-WebRequest works like wget.  Just define the output file and specify the URL of the file.

e.g. powershell Invoke-WebRequest -OutFile c:\temp\update.msu http://mywebserver/

With psexec, you should specify the full path to powershell.exe.  Finding the PowerShell path is simple: just type where powershell from a command line.

The psexec command example for a list of hosts:
psexec @buildshosts.txt -s -u username -p password C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe Invoke-WebRequest -OutFile c:\temp\update.msu http://mywebserver/

Alternatively, you can issue it directly to a single host:
psexec \\ -s -u username -p password C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe Invoke-WebRequest -OutFile c:\temp\update.msu http://mywebserver/

Now that the patch was on the hosts, I used psexec to apply it with the wusa standalone update installer.  Since I know it is in the default path c:\windows\system32, I didn't bother to specify the path.

I included the /quiet and /forcerestart options to install silently and then reboot:
psexec @hosts.txt -s -u username -p password wusa c:\temp\update.msu /quiet /forcerestart

wusa exits with code 1641 when the update is applied successfully and a restart has been initiated.
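A hedged Python sketch of driving this per host and checking the result.  The paths, file name, and credentials are placeholders; 1641 is the success code noted above, and 3010 (restart required) is another common wusa success code:

```python
import subprocess

# wusa exit codes treated as success here:
# 1641 = update installed, restart initiated (the code the post checks for)
# 3010 = update installed, restart required
SUCCESS_CODES = {1641, 3010}

def interpret(exit_code):
    """Map a remote wusa exit code to a short result string."""
    return "patched" if exit_code in SUCCESS_CODES else "failed (%d)" % exit_code

def patch_host(host, user, password):
    """Run wusa on one host via psexec and return its exit code.

    psexec propagates the remote process exit code as its own.
    """
    cmd = ["psexec", "\\\\" + host, "-s", "-u", user, "-p", password,
           "wusa", r"c:\temp\update.msu", "/quiet", "/forcerestart"]
    return subprocess.call(cmd)
```

Iterating the hosts file is then just a loop over patch_host, printing interpret(...) for each host.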

The process runs serially, so it takes a while to iterate through a large number of hosts.

Using PowerShell to do a web download is really slow, so it takes several minutes to download the 200K rollup.

There is a chance PowerShell is old and doesn't support the Invoke-WebRequest cmdlet.

Saturday, November 19, 2016

Freaking Out About To Run Out Of Disk Space On My Nimble AFA

So I am moving workloads over to a Nimble All Flash Array and I notice I am out of free space.  Now I start freaking out, afraid my critical VMs are about to start crashing.  I checked the Nimble GUI, and I am only using 20% of disk space after compression and deduplication.  I know I am not running out, but VMware doesn't.  There is Nimble free space, and then there is VMware free space.

As I start moving workloads off Nimble, I already know the problem, just not what to do about it yet.  When the Nimble volume was created, we chose to allocate the entire capacity to a single volume mounted on a cluster of VMware hosts.  The total free space is 7TB.

As I run through the issue and ping my account SE and a friend who knows this stuff better than I do, I consider how Nimble is supposed to represent actual free space.  Or better yet, how is it going to dynamically show the volume size?  Starting out, there are no compression and deduplication savings, so the volume size is the maximum free space.  While I want it to change the volume size dynamically based on actual dedupe and compression ratios, Nimble doesn't.

I proceed to the Nimble GUI and navigate to my single large volume.  As I iterate through the volume configuration, I decide to change the volume size, or at least see if I can.  Right there above the volume size is a blurb of text advising that you can create a volume size greater than the free space because of deduplication and compression.  It would be nice if the GUI gave me some guidance on how big, based on the current ratios, but it doesn't.  With nearly 5X space reclamation, I could probably choose 35TB.  I chose 15TB for now.

Back in VMware vCenter, I rescan the volume on each host and try to resize the volume on each mounted host, to no avail.  I make an educated guess and connect directly to a host: the first host where I mounted and formatted the volume, and attempt a resize from there.  Sure enough, it allows me to resize.  Back in vCenter, I go to each host again and rescan, which now shows the 15TB of total space.  I cancel my Storage vMotions that were abandoning the storage and go back to moving the final set of workloads onto the Nimble.

Crisis averted, and I never needed help from the SE or my expert friend.

The Nimble AFA has been performing incredibly well with sub-millisecond latency.  My jobs are performing quite predictably, which is critical for the workload.  Further, the Nimble AFA is saving me around 30-45 minutes over the fastest time from my next-gen hybrid array.  The Nimble AFA is a better match for the workload than the next-gen hybrid array, which experiences unpredictable latency causing my jobs to vary between zero and six hours of additional time.  Of course, time will tell, but so far Nimble is stellar.

Tuesday, October 18, 2016

Free VMware ESXi Active Directory Problems

I have several free-license VMware ESXi servers where I use Active Directory (AD) authentication to log in via the vSphere client.  I frequently find AD credentials don't work (usually invalid password or authentication failed).

When I look at the Host Configuration / Authentication Services Settings tab, I see:

Directory Service Type:        Active Directory
Domain:                        Mydomain.Com
Trusted Domain Controllers:    --

The "--" for Trusted Domain Controllers means it isn't talking to my DC.  I've spent hours digging through DC and ESXi logs trying to figure out why, with no clear reason found.  We have had to take drastic actions such as leaving the domain, which causes all the defined permissions to be lost and recreated.  Rebooting usually works, but that is very disruptive.

My star helper finally found a way to reconnect to AD that is fast, easy and non-disruptive.  Basically, he kept searching through all the files on ESXi looking for anything to do with AD, domain and a dozen other keywords.  He found this gem!


This is the script that joins the domain or rejoins when disconnected for some reason.

The usage is: 

/usr/lib/vmware/likewise/bin/domainjoin-cli join <domain> <username> <password>

/usr/lib/vmware/likewise/bin/domainjoin-cli join jomebrew password

I have automated this using plink and a simple Perl-based web page where we can enter a number of IP addresses and iterate through the list, issuing the command via plink.  It usually works, but when it doesn't, we SSH to the host and issue the command manually.
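Our automation boils down to a loop like this Python sketch.  The plink flags, credentials, and join arguments here are placeholders (the real version sits behind a Perl web page):

```python
import subprocess

# Path to the likewise domain join tool on ESXi, per the post
DOMAINJOIN = "/usr/lib/vmware/likewise/bin/domainjoin-cli"

def build_command(ip, ssh_user, ssh_password, domain, ad_user, ad_password):
    """Compose the plink invocation that reruns domainjoin-cli on one host."""
    remote = "%s join %s %s %s" % (DOMAINJOIN, domain, ad_user, ad_password)
    return ["plink", "-ssh", "-l", ssh_user, "-pw", ssh_password, ip, remote]

def rejoin_all(ips, **creds):
    """Iterate the IP list serially, collecting (ip, exit_code) results."""
    return [(ip, subprocess.call(build_command(ip, **creds))) for ip in ips]
```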

Note:  We have experienced these AD issues on VMware ESXi 5.5 and 6.0, on free hosts, not vCenter-managed or licensed hosts.  I imagine standalone licensed hosts would experience the same issue.

Sunday, October 16, 2016

Installing RaspberryPints on Raspbian Jessie

I've set up a couple of Raspberry Pints (RP) installations on new Raspberry Pi 3 Model B boards.  While most of the original configuration guide is still relevant, there are a couple of differences.

I have images below to show the code lines in the files that are edited.  This blog tool does not allow me to easily list code snippets, which would make it a lot easier to copy and paste.

Step 5: Package Configuration Wo/Flow Meters

The path to the autostart file has changed.  From the pi home directory, edit the following file:


Use the following entry to load Chromium in kiosk mode and instruct Chromium not to display the unsafe shutdown message on restart.

@chromium-browser --incognito --kiosk localhost

Apache2 Default Document Root Directory

The Apache2 default document root is /var/www/html.  You can choose to install Raspberry Pints in that directory or change the Apache2 configuration.  I chose to change the Apache2 configuration.

Edit /etc/apache2/sites-enabled/000-default.conf and change 

DocumentRoot /var/www/html to DocumentRoot /var/www

My Changes to Raspberry Pints

I make a few changes to Raspberry Pints to suit my needs.

Add Automatic Refresh 

RP does not refresh the browser on my systems.  Once taps are updated, the page needs a manual refresh.  I add a refresh every 60 seconds to automatically pick up tap changes.  There is a tool that I expect should refresh, but it doesn't work on my system, maybe because xdotool isn't installed.

Edit /var/www/index.php

Add the meta refresh tag as shown below.
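The tag in the image is a standard HTML meta refresh; a minimal version matching the 60-second interval mentioned above would be (placement inside the head section of index.php is my assumption):

```html
<meta http-equiv="refresh" content="60">
```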

Removing CR and LF from beer info: if you use CR and LF, it breaks the program when you tap a new keg, and probably elsewhere.  This change strips CR and LF on all database updates.

Edit /var/www/admin/includes/functions.php and add the following to the bottom.
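The PHP function itself is shown as an image in the post; purely as an illustration of the idea, here is the same CR/LF stripping logic in Python (the function name and space replacement are my assumptions, not the post's code):

```python
def strip_crlf(text):
    """Replace carriage returns and line feeds with spaces, then tidy doubles."""
    cleaned = text.replace("\r", " ").replace("\n", " ")
    while "  " in cleaned:           # collapse runs of spaces left behind
        cleaned = cleaned.replace("  ", " ")
    return cleaned.strip()
```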

Automatically Mark Kegs as Clean When A Keg is Kicked

I don't manage kegs and don't care about the keg feature.  I just want to show my taps.  The process to kick a keg and tap a new beer is a bit tedious, especially at a festival.  So, I automatically mark a keg as clean when I kick it.

Edit /var/www/admin/includes/managers/

I prefer to back up the original instruction.  Find the following line, add a # to comment it out, and add the modified line after it.  It is easiest to copy and paste the first line, then add the #.  Change NEEDS_CLEANING to CLEAN.

#$sql="UPDATE kegs k, taps t SET k.kegStatusCode = 'NEEDS_CLEANING' WHERE t.kegId = k.id AND t.Id = $id";
$sql="UPDATE kegs k, taps t SET k.kegStatusCode = 'CLEAN' WHERE t.kegId = k.id AND t.Id = $id";

Remove Column Headers

Most people already know what the columns mean.  I can get some more screen real estate by removing the column headers.  The following simply adds comment tags so the header is not displayed.

Edit /var/www/index.php 
Around line 111, locate the thead tag and add the <!-- as shown below.  Then locate the closing /thead tag and add the --> as shown below.

Rearrange Columns to List Beer Name after the Tap Number

This is a bit more complex.  I recommend backing up index.php before making changes.  

cd /var/www
cp index.php index.php.orig

Locate the following text for the Name column and copy it to the clipboard**.  It is around line 174.

Now scroll up and locate the ConfigName line.  It is around line 160.

Paste the Name column data copied earlier above the ConfigName line.  Save, reload the taps page, and verify the changes are OK.

** There are various ways to cut and paste, or copy and delete, depending on the editor you use.

If you screw up the tables, just copy index.php.orig to index.php.

I also make changes to style.css to update fonts, sizes, colors, and table width (so the name column is wider).


Sunday, May 15, 2016

Double IPA and a Barleywine in a Single Brew

I am almost out of beer. I know, right?  Down to the Pineapple IPA hopped with Citra and Mosaic.  The club needs a barleywine for NCHF, the Northern California Homebrew Festival.  The missus wants a DIPA.  Can I make a barleywine and a Double IPA in a single batch?  Technically, the difference between a barleywine and a double/triple IPA is a game of semantics.

So I modified my Phoebe Pry IPA recipe and a club barleywine recipe to do a no-sparge barleywine, then mash the IPA recipe on the same grain bed.  I have done this before with an Imperial Stout and a normal Stout.  They came out great.  I wasn't sure whether dark beers fare better with this method or not.

Recipes and Brew Day Notes
I separated the recipe ingredients into two separate buckets, milled separately, and added a note to each to ensure I was using the right ingredients at the right time.  I started with the no-sparge barleywine.  Using BeerSmith software, I chose a Brew In A Bag equipment profile, which is an easy way to calculate the strike water volume for a no-sparge batch.  I nailed the strike temp at 154F, adding 7.2 gallons of 170F water to the 12 pounds of grain for the 3 gallon batch.

Now I had an hour for the mash before I would vorlauf, 2 pints at a time, 10 times, until the wort flowed clear.  So, it was time for some brew day breakfast: a triple-decker fried ham and egg sandwich on a plain bagel cut into three layers.  Some beer mustard and spicy pepper jack cheese went well with my Chocolate Coconut Porter.

I transferred around 5 gallons of 13.2 Brix wort, which is 1.054 specific gravity.  A long way from my target of 1.101 SG.  After 2 hours of boiling and several hop additions (Magnum, Cascade, Columbus), the boil was done and the gravity was 24 Brix, or 1.102 SG.  I transferred about 2.5 gallons into the 3 gallon fermentor at 68F.  I pitched a packet of Safale S-04 dry yeast.

After transferring the barleywine to the boil kettle, I added an additional 13 lbs of grain to the grain bed and added 4 gallons of strike water at 165F, hitting my mash temp of 146F.  A little over an hour later, I sparged with another 2.5 gallons of 168F water and ended up with 5 gallons of clear wort in the kettle.  I was a little worried about the 14 Brix pre-boil gravity (1.058 SG), but after an hour or so of boiling and adding Magnum, Cascade and Simcoe hops, I ended up with 3 gallons of 1.083 SG wort in the fermenter.

I pitched 2 vials of White Labs 001 yeast and rigged a blowoff for the high-krausen yeast to drop into the barleywine.  The implementation was poor and almost a disaster, but it worked well enough to get the happy yeast from the IPA into the barleywine, though there was a lot of leakage and cleaning still to do.

With a lot to clean and gnarly back pain, I took a break and had an intermission beer.

The beers stayed in the fermenters for two weeks.  I took both out and cleaned the mess in the fermenter from the blowoff experiment.  I took gravity samples and dry hopped both.

The barleywine finished at 1.021 SG for a final ABV of 10.8%.  I dry hopped with an ounce of Cascade, an ounce of Amarillo and two ounces of Liberty hops.

The IPA finished at 1.011 SG for a final ABV of 9.5%.  I dry hopped with an ounce of Simcoe, two ounces of Cascade and an ounce of Amarillo.
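The gravity numbers above follow the usual refractometer and ABV conversions; this Python sketch uses one common approximation (formulas vary slightly, so results can differ from the post's figures by a point or two):

```python
def brix_to_sg(brix):
    """Convert degrees Brix to specific gravity (common approximation)."""
    return 1 + brix / (258.6 - (brix / 258.2) * 227.1)

def abv(og, fg):
    """Standard ABV estimate from original and final gravity."""
    return (og - fg) * 131.25
```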

I will let both sit another 5 days before kegging.  If I can score some spirals, I will oak-age the barleywine another week or two before kegging.

Next up I plan to split a 6 gallon batch of Imperial Porter and finish with 3 gallons of Bourbon Vanilla and 3 gallons of Chocolate and something else.

Tuesday, May 10, 2016

Taxonomy Of A VMware Outage After A Power Failure

A few VMs hosted on the two blade servers indicated they were inaccessible.  Some were simply orphaned, which is sometimes a result of a power loss event.  Others listed the GUID of the path (rather than the label for the storage).  Orphaned VMs must be removed from inventory and then re-added from the datastore browser.  VMs that show the GUID path can often be recovered simply by refreshing the storage after all services are restored.  Neither of these approaches was successful.

The hosts have been configured with local storage and NetApp storage for some time.  VMs have been running from both datastore sources without issue.

After the power failure, we experienced problems with VMs on NetApp storage.

- VMs running from local storage experienced no issues and could be started normally.
- VMs running on the NetApp could not be started.  Orphaned VMs could not be added to inventory.  We were not able to modify or copy the files on the NetApp.  We could not unmount the NetApp.  Neither SSH nor desktop client actions were successful.  Later in the morning, we identified another blade experiencing the same condition, indicating it was not isolated to just these two blades.

We verified the NetApp was in a normal state by accessing the files  as a CIFS share. 

We identified changes since the last power failure (a week ago): 1. We added a vSwitch and network interface to the storage network.  2. We attached a Nimble volume to the host.  We decided to back out these two changes.

We set the Nimble volume to offline in Nimble.  Surprisingly, we were then able to unmount the NetApp.  This indicated a strange link between the datastores; VMware had indicated the datastore was in use.  Several iterations of removing and adding did not solve the VM issues on NetApp.

We unmounted the NetApp, removed the new Storage vSwitch, then mounted the NetApp again.  VM operations on the NetApp were normal again.

We unmounted the NetApp, added the Storage vSwitch back, then mounted the NetApp again.  VM operations on the NetApp failed once again.

A short time later, we realized the NetApp export access permissions permitted only the (public) IP address of the original vSwitch on the blades.  Adding the vSwitch on the storage network established a connection from the new IP on the local storage subnet.

The solution was to change the NetApp export permissions to allow root access from the new Storage vSwitch IP address.  Resolution time was 7 hours.
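For reference, on a 7-Mode NetApp the fix amounts to adding the new storage-network address to the export's root list.  A sketch of an /etc/exports line; the volume name and addresses here are made up:

```
/vol/vmware_datastore -sec=sys,rw=10.10.10.0/24,root=10.10.10.21:10.10.10.22
```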

Last week we added another blade to the system.  In the course of adding the additional server, we added Nimble storage volumes to the existing two servers.  We added a new Storage vSwitch connected to the IO Aggregator, which has a 10Gb connection directly attached to the storage network.  After the power failure, the NetApp was remounted over the new Storage vSwitch, since the destination was on the locally attached network.  This put the datastore in a read-only state: the NFS export was mounted as non-root, which does not have write privileges.  Additional blades have this same configuration and experienced the same issue, though the users hadn't realized it yet.

Sunday, January 31, 2016

High CPU in Windows VM

Summary:  When a VMware resource pool constrains CPU resources on a VM within the pool, the guest OS can indicate high CPU usage while VMware shows the VM as not particularly busy.  CPU Ready % from esxtop shows the VM has a high percentage.  The resolution is to increase the CPU limit in the resource pool settings, power off some VMs, or check for processes in the VMs using excessive CPU.

I had been struggling with a Windows 2012 R2 VM running on VMware.  I have a few hundred of these, but this one happened to be a domain controller and was painful to remote desktop to.  Further, it is running on a hyper-converged Nutanix host in India.  I know the host and storage are fine.  I even had the fine folks at Nutanix help look at the issue just in case there was some odd issue with storage for this one VM.

To be honest, I have a couple more VMs experiencing issues.  One of the issues we did find was that CPU was often at a high Ready percent.  One of the traps we get caught in is seeing ample CPU free on the host, around 40% free.  But that does not mean VMs are not waiting for CPU.  Clearly mine were, with 10 or 15% Ready.

I manually control memory and CPU for each of my resource pools.  Basically, each resource pool gets a percentage of usable CPU and memory.  I deduct 20% of CPU and memory from the total of all hosts in the cluster, then calculate what that means in KB and MHz and allocate these per resource pool.  This way, I can ensure the cluster never exceeds 80% resource utilization.  I had to create a spreadsheet to calculate these values for me.  It is tedious, and when I add a host to the cluster, or worse, a new resource pool, there is a lot of updating to the spreadsheet and then to the resource pool settings.
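The spreadsheet math is simple enough to sketch in Python.  The per-pool percentages and host numbers below are hypothetical; vSphere takes the resulting limits in MHz and MB:

```python
def pool_limits(hosts, pool_shares, headroom=0.20):
    """Split cluster capacity minus headroom across resource pools.

    hosts: list of (cpu_mhz, mem_mb) totals per host
    pool_shares: dict of pool name -> fraction of the usable capacity
    """
    total_cpu = sum(h[0] for h in hosts)
    total_mem = sum(h[1] for h in hosts)
    usable_cpu = total_cpu * (1 - headroom)   # keep 20% of the cluster free
    usable_mem = total_mem * (1 - headroom)
    return {name: (round(usable_cpu * share), round(usable_mem * share))
            for name, share in pool_shares.items()}
```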

So, I checked my resource pool usage for this problem VM, and while the pool is close to the max, it shows around 5% free CPU.  So, I have this VM that is basically idle but showing 90% CPU in the OS.  VMware performance stats show it rather idle.  Remote Desktop is painfully slow just opening Control Panel.  It is performing poorly as a DC.  Plenty of free resources on the host.

I see the VM has an exceedingly and fairly constant high CPU Ready %.  What could cause this, I thought?  Then I realized the only thing that would create this condition is the resource pool limit constraining the VMs in the pool, despite it looking OK.

So, I add 20,000 MHz to the resource pool, and the CPU usage for the pool in the summary display makes a weird jump; I can see now that the resource pool was actually under duress and the VMs were being heavily constrained.  Within a few minutes the VM returns to normal: the CPU % in Windows drops to a normal few percent, RDP works normally, and the CPU Ready % for the VM is under 1% again.

Monday, June 15, 2015

NetApp OnCommand System Manager would not launch

It was really pissing me off.  All of a sudden, NetApp OnCommand System Manager would not launch.  I tried the desktop icon, the .exe, the Start menu, but nothing happened.  I clicked away and nothing happened.

I opened a command line and launched Java manually; it worked and was 1.8-dot-something, which should be OK.

I recently upgraded OnCommand to 3.1.2 RC1, which fixed my last set of connection issues (500 errors).  Of course, I had to make some command line changes to the NetApp box, which is a pain in the ass since I never remember the console password.  However, it was working OK.

Maybe Java upgraded, I can't remember, but OnCommand stopped working.  So, I installed 3.1.2 RC2, which the links in a forum post said I needed to do.  But it still would not launch.

So, I poked around on my system looking for rogue Java executables.  Sure enough, I found some in "c:\windows\system32".  Since this is earlier in the path than the real Java path, every time I launched System Manager, this Java was invoked.  System Manager could be better written to show an error, but it isn't, so we just see nothing.

I deleted the java files in the c:\windows\system32 folder and System Manager launched perfectly.

Error: Registry key 'Software\JavaSoft\Java Runtime Environment'\CurrentVersion'
has value '1.8', but '1.7' is required.
Error: could not find java.dll

Error: Could not find Java SE Runtime Environment.

C:\Windows\System32>dir java*.*
 Volume in drive C is OS

 Directory of C:\Windows\System32

04/02/2015  09:41 AM           189,352 java.exe
05/22/2015  11:29 AM            77,824 JavaScriptCollectionAgent.dll
04/02/2015  09:41 AM           189,352 javaw.exe
04/02/2015  09:41 AM           320,424 javaws.exe
               4 File(s)        776,952 bytes
               0 Dir(s)  123,620,012,032 bytes free

C:\Windows\System32>del java*.*
Access is denied.
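A quick way to spot this kind of shadowing is to walk PATH in order and list every directory containing the executable.  A Python sketch (the Windows file name java.exe is assumed):

```python
import os

def find_shadowed(executable, path=None):
    """Return every directory on PATH containing `executable`, in search order.

    More than one entry means the first one shadows the rest, which is
    exactly the System Manager failure described above.
    """
    dirs = (path if path is not None else os.environ.get("PATH", "")).split(os.pathsep)
    return [d for d in dirs if d and os.path.isfile(os.path.join(d, executable))]
```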


Thursday, May 7, 2015

I got Nutanix on a sunny day

I got Nutanix on a shiny day, when it's cold outside I got another NetApp bay.

The question I get most often when I talk about Nutanix is why I bought them.  Technically, I didn't buy them.  My peer and close friend did.  Then he found another job and quit, so I own it now.  I don't know how he found them, but he and I talked, more dreamed, about technology, letting our imaginations run a bit wild.  I think the Double IPA tastings didn't hurt.  I was heavily into bending vCenter to my will.  VMware tries hard to deliver almost the feature you want.  Almost.

So, while on a bender, I lamented (beer-induced revisionist history) how we had this crappy hodgepodge of mashed-up compute and storage hardware.  Wouldn't it be great to cluster servers that have 16 hard disks each into one reliable, striped storage array shared as one NFS mount to all the VMware servers?  We would have all our current compute, plus disk distributed across multiple physical hosts.  We could optimize disk usage and survive a whole host failure.  If only...

Some months later he suggested Nutanix.  I looked over the platform info, considered our use case, and nodded in approval as one does to add dramatic effect to his approval.  Not that I approved.  I was skeptical.  But in case the skies over Nutanix clouded up, I had an extra NetApp bay.

I had spent a few weeks helping the Engineering Operations folks with virtualizing our build servers.  Having recently emerged from the primordial ooze, they saw virtualization as witchcraft, and I repeatedly fended off burning torches aimed at the kindling I stood upon.  Eventually, though, they came around and much of it was virtualized.  Poorly.  Using local storage and suffering for it.  It really wasn't all their fault, except for not taking my help.  They were way out of their element.  I mean, who runs a VM on an ESXi instance inside XenServer and says virtualization doesn't work?

Our design goal was to decrease the build time.  Whatever it was, decrease it.  After quite a bit of instrumenting, I already knew the way they implemented their systems and network was a mess and a big part of the problem.  Two other very significant problems were the server OS and the build scripts.  Those would have to be fixed later.  Turns out, much later.

So, we bought Nutanix to eliminate disk IO latency on concurrent builds, localize VM files to the host running the VM, and upgrade and isolate the network to 10Gb.  The hardware was about the same as the Dell blades and R720 servers we already had.  We gained a lot of storage for our bloated VMs.

My short-term goal was for our team to take over the server and network architecture and free the EngOps folks to do what they are best at: EngOpsy stuff.  Before we could finish evaluating the build times, we were asked to deploy build servers on the platform.  I guess they were so delighted with the Nutanix performance, the heck with finishing the experimentation.

Did we solve all the problems?  No.  With some help, they finally got on a current OS release.  There was a lot of work done on the scripts.  I recently optimized their copy process to move builds to a central location.

Was Nutanix the right choice?  Well, it was the one we made.  It kicks ass.  It has great performance metrics, it is stupidly simple to upgrade, and their support is stellar.  I have rolled out several new boxes for DevTest, so, yeah, it was worth it.

I also get asked about VDI.  I used to say we don't do VDI, but after thinking about it, most of the DevTest VMs are Windows running our own apps, which we reach over RDP.  We also have a lot of Linux, which we access over SSH.

Are there any problems with Nutanuix? I get that a lot too.  My answer is that they are software. All software has issues.  There are also interface and operations elements I curse and I would change (they are very welcoming of my criticism).  I don't like my Deduplication ratio so I use inline compression which I like but doesn't help as much when you are using [protection domains remote sites.  I didn't plan for the CVM needs so I over estimated the VMs I can run on a single node.