A few VMs hosted on the two blade servers showed as inaccessible. Some were simply orphaned, which is sometimes a result of a power loss event. Others listed the GUID of the datastore path rather than the datastore label. Orphaned VMs normally must be removed from inventory and then re-added from the datastore browser, and VMs that show the GUID path can often be recovered simply by refreshing the storage after all services are restored. Neither of these approaches was successful.
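For reference, the usual remove-and-re-register step can be scripted. Below is a minimal pyvmomi sketch, not the exact procedure we used; the vCenter host, credentials, VM name, and datastore path are all placeholders.

```python
# Minimal pyvmomi sketch of the normal recovery path for an orphaned VM:
# unregister it from inventory, then register its .vmx from the datastore.
# All names, credentials, and paths below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()            # lab only: skip cert checks
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="********", sslContext=ctx)
content = si.RetrieveContent()

def find_vm(name):
    """Return the first VM in inventory with the given name."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    try:
        return next(vm for vm in view.view if vm.name == name)
    finally:
        view.DestroyView()

vm = find_vm("orphaned-vm")                       # placeholder VM name
folder = vm.parent                                # keep it in the same folder
host = vm.runtime.host                            # blade it last ran on
pool = host.parent.resourcePool                   # root pool of that cluster
vm.UnregisterVM()                                 # remove from inventory only

# Re-add the VM from its datastore path, as the datastore browser would do.
folder.RegisterVM_Task(path="[netapp_ds] orphaned-vm/orphaned-vm.vmx",
                       name="orphaned-vm", asTemplate=False,
                       pool=pool, host=host)

Disconnect(si)
```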
The hosts have been configured with local storage and NetApp
storage for some time. VMs have been
running from both datastore sources without issue.
After the power failure, we experienced problems with VMs on NetApp storage:

- VMs running from local storage experienced no issues and could be started normally.
- VMs running on the NetApp could not be started. Orphaned VMs could not be added back to inventory, we were not able to modify or copy the files on the NetApp, and we could not unmount the NetApp datastore. Neither SSH nor desktop client actions were successful.
Later in the morning, we identified another blade experiencing the same condition, which indicated the problem was not isolated to the original two blades.
We verified the NetApp was in a normal state by accessing
the files as a CIFS share.
We identified two changes made since the last power failure (a week earlier):

1. We added a vSwitch and network interface on the Storage network.
2. We attached a Nimble volume to the hosts.

We decided to back out these two changes.
We set the Nimble volume to offline on the Nimble array. Surprisingly, we were then able to unmount the NetApp, even though VMware had previously reported the datastore as in use; this suggested a strange link between the datastores. Several iterations of removing and re-adding the datastore did not resolve the VM issues on the NetApp.
We unmounted the NetApp, removed the new Storage vSwitch, and then mounted the NetApp again. VM operations on the NetApp were normal again.
We then unmounted the NetApp, re-added the Storage vSwitch, and mounted the NetApp again. VM operations on the NetApp failed once more.
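For anyone scripting this kind of per-host unmount/remount cycle, a rough pyvmomi sketch is below; the datastore name, NFS server address, and export path are placeholders rather than the actual values or exact procedure we used.

```python
# Hypothetical pyvmomi sketch of an NFS unmount/remount cycle on one ESXi
# host. Datastore name, NFS server, and export path are placeholders.
from pyVmomi import vim

def remount_nfs(host, ds_name, nfs_server, nfs_path):
    """Remove an NFS datastore from `host`, then mount it again."""
    ds_system = host.configManager.datastoreSystem

    # Unmount: find the datastore by label on this host and remove it.
    for ds in host.datastore:
        if ds.name == ds_name:
            ds_system.RemoveDatastore(ds)
            break

    # Remount: recreate the NAS datastore with the same export settings.
    spec = vim.host.NasVolume.Specification(
        remoteHost=nfs_server,        # e.g. the NetApp interface IP
        remotePath=nfs_path,          # e.g. "/vol/vmware" (placeholder)
        localPath=ds_name,            # datastore label seen in vSphere
        accessMode="readWrite")
    return ds_system.CreateNasDatastore(spec)
```

Note that accessMode here is only what the host requests when mounting; it does not override the permissions set on the export itself.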
A short time later we realized that the NetApp export access permissions granted root access only to the (public) IP address of the blades' original vSwitch. Adding the vSwitch on the Storage network caused the connection to be established from the new IP on the locally attached storage subnet.
The solution was to change the NetApp export permissions to
allow root access from the new Storage vSwitch IP address. Resolution time was 7 hours.
Last week we added another blade to the system. In the course of adding the additional server, we added Nimble storage volumes to the existing two servers and a new Storage vSwitch connected to the IO Aggregator, which has a 10Gb connection directly attached to the storage network. After the power failure, the NetApp was remounted over the new Storage vSwitch, since the destination was reachable on the locally attached network. This put the datastore in a read-only state: technically, the NFS export was mounted as non-root, which does not have write privileges. Additional blades have this same configuration and experienced the same issue, though the users hadn't realized it yet.
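One way to check for this condition across hosts is to list each host's mount information for the datastore. A hedged pyvmomi sketch follows; the function and datastore object are placeholders, and hosts affected as described above may report the datastore as read-only or as mounted but not fully accessible.

```python
# Hypothetical pyvmomi check: print each host's mount state for a datastore.
# Affected hosts may show accessMode "readOnly" or an inaccessible mount.
def report_mounts(datastore):
    for host_mount in datastore.host:        # one entry per attached ESXi host
        info = host_mount.mountInfo
        print(host_mount.key.name,           # the ESXi host
              "mounted:", info.mounted,
              "accessible:", info.accessible,
              "accessMode:", info.accessMode)
```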