Bits and pieces I've come across using VMware ESX 3, 3.5i and 4i along with vCenter, vSphere, etc.
Posted by admin on June 7th, 2011 |
0 comments
In the past on VMware you could use the command vmware-cmd -l to help you find the MAC addresses of guests registered to VMware hosts.
You might need to do this if you have an IP address conflict, or in any other situation where going through all the guests by hand would be time-consuming and laborious.
With ESX and ESXi version 4 onwards that command was removed, so the ash script below will either list the VMs on a host or, if you give it an argument, search for a MAC address or part of a MAC address.
This lets you find a MAC address and tie it to a guest name. The script needs to be run on the ESX or ESXi host(s).
Run it from the command line like this:
# ./macfinder.sh    <- no arguments: lists all registered VMs
or
# ./macfinder.sh 00:50:56:00:00:00    <- with an argument: finds a specific MAC address or part of one
--------------------------- shell script below ---------------------------
#!/bin/ash
# Lists the currently registered VMs, then uses that list to find their MAC addresses,
# then greps the output for the desired MAC address or part of one.
# Note: only the first word of each VM name is kept, so names containing spaces are truncated.
vim-cmd vmsvc/getallvms | awk '{print $1":" $2}' | grep -v Vmid > /tmp/allvms
for lines in `cat /tmp/allvms`; do
    id=$(echo $lines | awk -F: '{print $1}')
    name=$(echo $lines | awk -F: '{print $2}')
    mac=$(vim-cmd vmsvc/device.getdevices $id | grep macAddress)
    # with an argument, show only matching guests; without one, show them all
    if test -n "$1"; then
        echo $name $mac | grep "$1"
    else
        echo $name $mac
    fi
done
rm -f /tmp/allvms
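One thing to watch with the plain grep in the script is that it is case-sensitive, so searching for 00:50:56:AA will miss a guest reported as 00:50:56:aa. Below is a minimal sketch of how the filtering step could be pulled out into a function that matches case-insensitively and treats the argument as a literal string. The function name filter_macs is my own invention for illustration and is not part of the script above.

```shell
#!/bin/sh
# filter_macs is a hypothetical helper, not part of the original script.
# It reads "name mac-info" lines on stdin.
# With no argument it passes every line through unchanged.
# With an argument it keeps only lines containing that text,
# ignoring case (-i) and treating it as a fixed string, not a regex (-F).
filter_macs() {
    if [ -n "$1" ]; then
        grep -i -F -- "$1"
    else
        cat
    fi
}
```

Inside the loop you could then write something like echo $name $mac | filter_macs "$1" in place of the if/else, assuming the rest of the script stays the same.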
Posted by admin on August 18th, 2010 |
0 comments
We recently encountered a problem which nearly ground our VMware ESX farm to a halt.
The cause of the problem was an iSCSI lock caused by two hosts trying to write to the same store at the same time.
This was evident from the error message vprob.vmfs.heartbeat.timedout on the ESX hosts, which referenced one of the volumes on our iSCSI storage.
This gave the entire ESX host connectivity problems and also affected the guests that resided on that volume.
Because the host and guests were not accessible through vSphere, we were unable to remove the volume or power cycle the guests.
After much digging around, and with the help of VMware support, we understood that the cause of the problem was a lock on that filesystem. To fix it we ran vmkfstools -L lunreset /vmfs/devices/disks/volumename…
This removed the lock, which had been caused by two hosts trying to write to the same volume at the same time.
Very painful, but happy to have the cluster back up and running.
Posted by admin on February 17th, 2010 |
2 comments
Came across a very odd issue lately where guests on one of our ESX4 hosts were periodically losing network connectivity very briefly, maybe 10 ICMP packets every half hour or hour.
I spent a long time debugging the network side, thinking that perhaps a NIC had the wrong VLAN config, but the problem kept happening.
So, ssh'ing onto the host, I started to trawl through the log files and came across the entries below in the /var/log/vmkwarning file:
Feb 17 13:44:19 vminfrabox vmkernel: 18:00:00:11.865 cpu4:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6090a028004f243d08ab44c26687e3dd" - issuing command 0x410002074040
Feb 17 13:44:19 vminfrabox vmkernel: 18:00:00:11.865 cpu4:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.6090a028004f243d08ab44c26687e3dd" - failed to issue command due to Not found (APD), try again...
Feb 17 13:44:19 vminfrabox vmkernel: 18:00:00:11.865 cpu4:4222)WARNING: NMP: nmp_DeviceAttemptFailover: Logical device "naa.6090a028004f243d08ab44c26687e3dd": awaiting fast path state update...
This recurred every half hour: each time, entries like the ones above filled the log solidly for about 2 minutes.
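If you want to confirm a pattern like this on your own host, counting the warnings per minute makes the half-hourly bursts easy to spot. Here is a rough sketch, assuming the log lines start with the usual syslog "Mon DD HH:MM:SS" timestamp as in the entries above. The function name count_failover_warnings is made up for this example.

```shell
#!/bin/sh
# count_failover_warnings is a hypothetical helper for illustration.
# Argument: path to a log file (e.g. /var/log/vmkwarning).
# Prints one line per minute that contains nmp_DeviceAttemptFailover
# warnings, in the form "Mon DD HH:MM count", in order of first appearance.
count_failover_warnings() {
    awk '/nmp_DeviceAttemptFailover/ {
        # build a per-minute key from the syslog timestamp fields
        key = $1 " " $2 " " substr($3, 1, 5)
        if (!(key in count)) order[++n] = key
        count[key]++
    }
    END {
        for (i = 1; i <= n; i++) print order[i], count[order[i]]
    }' "$1"
}
```

Running count_failover_warnings /var/log/vmkwarning should then show clusters of busy minutes roughly 30 minutes apart if you are hitting the same problem.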
After some digging on Google, I found out that ESX4 has a bug involving stale iSCSI connections. If you have a duff or old connection to an iSCSI LUN (perhaps one that no longer exists) and you never rescanned to remove it, then when the host checks its paths every 30 minutes it finds this duff connection and goes through the motions of trying to find failover paths. The bug is that this causes very brief network loss to your guests.
The fix for me was simply to re-scan my adapter, which removed the old mapping to one of our removed LUNs, and the problem went away.
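For reference, a rescan can be triggered from the ESX 4 command line with esxcfg-rescan followed by the adapter name. The sketch below wraps that in a small function which only prints the command unless DRY_RUN=0 is set, so you can check it before running it for real. The function name rescan_adapter and the adapter name vmhba33 are placeholders of mine; check your own iSCSI adapter name first.

```shell
#!/bin/sh
# rescan_adapter is a hypothetical wrapper for illustration.
# Argument: an adapter name such as vmhba33 (placeholder, yours will differ).
# By default it only prints the command; set DRY_RUN=0 to actually run it.
rescan_adapter() {
    case "$1" in
        vmhba[0-9]*) ;;
        *) echo "usage: rescan_adapter vmhbaN" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "esxcfg-rescan $1"
    else
        esxcfg-rescan "$1"
    fi
}
```

For example, rescan_adapter vmhba33 prints the command it would run, and DRY_RUN=0 rescan_adapter vmhba33 performs the rescan on the host.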