Bits and pieces I've come across using VMware ESX 3, 3.5i and 4i, along with vCenter, vSphere, etc.
Posted by admin on August 18th, 2010
We recently encountered a problem which nearly ground our VMware ESX farm to a halt.
The cause of the problem was an iSCSI lock caused by two hosts trying to write to the same datastore at the same time.
This was evident from the error message vprob.vmfs.heartbeat.timedout on the ESX hosts, which referenced one of the volumes on our iSCSI storage.
This was causing the entire ESX host to have connectivity problems, as well as affecting the guests that resided on that volume.
Because the host and guests were not accessible through vSphere, we were unable to remove the volume or power cycle the guests.
After much digging around, and with the help of VMware support, we understood that the cause of the problem was a lock on that filesystem. To fix it we ran vmkfstools -L lunreset /vmfs/devices/disks/volumename…
This cleared the lock left behind by the two hosts writing to the same volume at the same time.
Very painful, but happy to have the cluster back up and running.
Posted by admin on February 17th, 2010
Came across a very odd issue lately where guests on one of our ESX4 hosts were periodically losing network connectivity very briefly, dropping maybe 10 ICMP packets every half hour or hour.
After much debugging on the network side, on the theory that a NIC was misconfigured with the wrong VLAN, the problem was still happening.
So I ssh'd onto the host, started to trawl through the log files, and came across the entries below in /var/log/vmkwarning:
Feb 17 13:44:19 vminfrabox vmkernel: 18:00:00:11.865 cpu4:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6090a028004f243d08ab44c26687e3dd" - issuing command 0x410002074040
Feb 17 13:44:19 vminfrabox vmkernel: 18:00:00:11.865 cpu4:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6090a028004f243d08ab44c26687e3dd" - failed to issue command due to Not found (APD), try again...
Feb 17 13:44:19 vminfrabox vmkernel: 18:00:00:11.865 cpu4:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Logical device "naa.6090a028004f243d08ab44c26687e3dd": awaiting fast path state update...
This recurred every half hour, and each time the entries above filled the logs solidly for about two minutes.
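If you want to confirm the pattern without eyeballing the log, something like the rough Python sketch below can help. It is not a VMware tool: the function name, the regular expression, the five-minute gap used to separate bursts and the year default are all my own assumptions (syslog timestamps carry no year), and the sample lines are just the entries above plus two made-up ones. It counts nmp_DeviceAttemptFailover warnings per device and records when each burst of warnings starts.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed pattern: syslog timestamp at the start of the line, then an
# nmp_DeviceAttemptFailover warning naming a quoted naa.* device.
# Both straight and typographic quotes are accepted.
LINE_RE = re.compile(
    r'^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*'
    r'nmp_DeviceAttemptFailover: .*?device ["“](?P<dev>naa\.[0-9a-f]+)["”]'
)


def summarize_failover_warnings(lines, year=2010, gap=timedelta(minutes=5)):
    """Return (warnings per device, start time of each burst of warnings).

    `year` is supplied by the caller because syslog omits it; `gap` is a
    hypothetical threshold: a quiet period longer than this starts a new burst.
    """
    per_device = Counter()
    bursts = []
    last_seen = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a failover warning, ignore
        per_device[match.group("dev")] += 1
        stamp = datetime.strptime(
            f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S"
        )
        if last_seen is None or stamp - last_seen > gap:
            bursts.append(stamp)
        last_seen = stamp
    return per_device, bursts


# Sample input: two entries from the log above, plus one invented entry
# half an hour later and one unrelated line.
SAMPLE = [
    'Feb 17 13:44:19 vminfrabox vmkernel: 18:00:00:11.865 cpu4:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6090a028004f243d08ab44c26687e3dd" - issuing command 0x410002074040',
    'Feb 17 13:44:19 vminfrabox vmkernel: 18:00:00:11.865 cpu4:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Logical device "naa.6090a028004f243d08ab44c26687e3dd": awaiting fast path state update...',
    'Feb 17 14:14:19 vminfrabox vmkernel: 18:00:30:11.865 cpu4:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6090a028004f243d08ab44c26687e3dd" - failed to issue command due to Not found (APD), try again...',
    'Feb 17 14:15:00 vminfrabox vmkernel: some unrelated message',
]

counts, bursts = summarize_failover_warnings(SAMPLE)
for device, hits in counts.items():
    print(device, hits)
for start in bursts:
    print("burst starting at", start)
```

To run it against a real log, copy /var/log/vmkwarning off the host and pass the open file object in place of SAMPLE. A single device showing up in evenly spaced bursts is the pattern described here.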
After some digging on Google, I found that ESX4 has a bug that bites when the host still holds a stale connection to an iSCSI LUN, perhaps one that no longer exists but was never rescanned away. When the host checks its paths every 30 minutes, it finds the stale connection and goes through the motions of trying to find failover paths. The bug is that this causes a very brief network loss to your guests.
The fix for me was simply to rescan my adapter, which removed the old mapping to one of our removed LUNs, and the problem went away.
Posted by admin on October 16th, 2009
Between ESX 3.5i and vSphere, VMware slightly changed where you configure a host so that its guests restart automatically after a power failure on the ESX host or a plain shutdown.
So follow the below steps to ensure your guests power on automatically on host startup.
Open vSphere Client
In the left-hand pane, highlight your chosen host, then click the Configuration tab in the main pane.
Click Virtual Machine Startup/Shutdown under the Software section.
Click Properties in the top right hand corner of the main pane.
Then check Allow virtual machines to start and stop automatically with the system, then change the startup order of your guests in the box below as appropriate.
If you want them all to start up with the host, move them all up to Automatic Startup.
Now when your host restarts, your guests will come up automatically as well.