VMware ESX 4.0 iSCSI volume problems

We recently encountered a problem that nearly ground our VMware ESX farm to a halt.

The cause was an iSCSI lock, created when two hosts tried to write to the same store at the same time.

This was evident from the error message vprob.vmfs.heartbeat.timedout on the ESX hosts, which referenced one of the volumes on our iSCSI storage.

The lock gave the entire ESX host connectivity problems and also affected the guests that resided on that volume.

Because the host and guests were not accessible through vSphere, we were unable to remove the volume or power cycle the guests.

After much digging around, and with the help of VMware support, we understood that the cause of the problem was a lock on that filesystem. To fix it we ran vmkfstools -L lunreset /vmfs/devices/disks/volumename…

This removed the lock, which had been caused by two hosts trying to write to the same volume at the same time.
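
For anyone who has to repeat this, the reset can be wrapped in a small shell helper that prints the command for review before anything is run. This is only a sketch, not something VMware support gave us: the function name lun_reset and the DRY_RUN variable are made up for illustration, and the device path is whatever applies to your own storage (nothing is assumed about it beyond living under /vmfs/devices/disks/).

```shell
#!/bin/sh
# Hypothetical helper (not part of ESX): lun_reset DEVICE
# DEVICE is a path under /vmfs/devices/disks/ that you supply yourself.
# DRY_RUN is an illustrative variable; it defaults to 1 (print only).
lun_reset() {
    device="$1"

    # Require an argument so we never run the reset against an empty path.
    if [ -z "$device" ]; then
        echo "usage: lun_reset /vmfs/devices/disks/<device>" >&2
        return 2
    fi

    # Only accept paths in the directory the original command used.
    case "$device" in
        /vmfs/devices/disks/*) ;;
        *)
            echo "refusing: $device is not under /vmfs/devices/disks/" >&2
            return 2
            ;;
    esac

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: show the command instead of running it.
        echo "vmkfstools -L lunreset $device"
    else
        # Same command as in the post, run against the given device.
        vmkfstools -L lunreset "$device"
    fi
}
```

Calling lun_reset with a device path only prints the command. Setting DRY_RUN=0 in front of the call actually runs it, which is best done with VMware support on the line, since a LUN reset affects every host using that volume.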

Very painful, but we are happy to have the cluster back up and running.
