We had an issue where the host was unresponsive, and the VM residing on it could not be powered off or powered on, or perform any other operation such as migrating to another host.
We kept getting "request timed out" and "operation is already in use" errors.
We tried to kill the process of the affected VM. We located it with
ps -ef | grep "<VM name>"
and killed the virtual machine's parent process (PPID), but the VM was still hung and the reboot task was stuck at 95%.
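A bare `grep *VM*` has two pitfalls: the shell may expand the unquoted glob before grep sees it, and the grep process itself shows up in the results. The sketch below is a hypothetical helper (`find_vm_pids` is not an ESXi tool) that reads `ps -ef`-style output and prints the PID and PPID of matching processes. It assumes the standard `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD); the BusyBox `ps` on ESXi may format its output differently, and on ESXi `esxcli vm process list` and `esxcli vm process kill` are the supported way to stop a VM's world.

```shell
# find_vm_pids (hypothetical helper): read `ps -ef`-style lines on stdin and
# print "PID PPID" for every process whose line mentions the given VM name.
# Assumes the standard ps -ef columns: UID PID PPID C STIME TTY TIME CMD.
find_vm_pids() {
    vm_name="$1"
    # NR > 1 skips the header row; $8 != "awk" drops this awk process itself
    # (its command line contains the VM name); index() is a literal substring
    # match, so the name is treated neither as a regex nor as a shell glob.
    awk -v vm="$vm_name" 'NR > 1 && $8 != "awk" && index($0, vm) > 0 { print $2, $3 }'
}

# Usage on the host (quote the name so the shell does not expand it):
#   ps -ef | find_vm_pids "MyVM"
```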
When we checked /var/log/vmkernel.log, we found plenty of LUN reservation conflict events. We also checked the KAVG, GAVG, and DAVG latency metrics along with the read and write rates, and found that the latency on the datastore was very high.
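The three latency counters are related: the latency the guest sees per command (GAVG) is roughly the time the command spends in the VMkernel (KAVG) plus the time the device takes to complete it (DAVG). Comparing them shows where the delay sits; when GAVG is high mainly because DAVG is high, the problem points at the storage path or LUN rather than at queuing inside the host.

```latex
\text{GAVG} \approx \text{KAVG} + \text{DAVG}
```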
We identified the LUN on which the VM resides and noted down its device path. The following command lists the LUNs (SCSI devices) detected by the host:
esxcfg-scsidevs -c
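To tie the reservation conflicts in vmkernel.log to a device from that list, the log can be filtered and the conflicts counted per device. This is a minimal sketch with a hypothetical helper name (`count_reservation_conflicts`). It assumes the relevant log lines contain the words "reservation conflict" and an `naa.*` device identifier; devices named with other formats (for example `eui.*`, `t10.*`, `mpx.*`) are not picked up.

```shell
# count_reservation_conflicts (hypothetical helper): count vmkernel log lines
# that mention a reservation conflict, grouped by naa.* device ID, busiest first.
# Assumes conflict lines contain "reservation conflict" and an naa.* identifier.
count_reservation_conflicts() {
    log_file="${1:-/var/log/vmkernel.log}"
    grep -i 'reservation conflict' "$log_file" \
        | grep -o 'naa\.[0-9a-fA-F][0-9a-fA-F]*' \
        | sort | uniq -c | sort -rn
}

# Usage on the host:
#   count_reservation_conflicts /var/log/vmkernel.log
```

The device at the top of the output is the one to compare against the `esxcfg-scsidevs -c` listing before doing anything disruptive.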
From /var/log/vmkernel.log we identified the LUN ID and path that had the reservation conflict, and then performed a LUN reset with the following command:
vmkfstools --lock lunreset /vmfs/devices/disks/<device ID>
This reset the LUN and freed up the VM that was residing on the server. After this we were able to power off the server and bring the VM back up.
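Because a LUN reset affects every host and VM using that device, it is worth guarding against a mistyped ID. The sketch below wraps the same `vmkfstools --lock lunreset` call in a hypothetical helper (`reset_lun`) that refuses to run unless the device node exists under /vmfs/devices/disks. The helper name and its return codes are assumptions for illustration, not part of vmkfstools.

```shell
# reset_lun (hypothetical helper): run the LUN reset only after checking that
# the device node exists, so a typo fails early instead of hitting the wrong path.
# Returns 2 on missing argument, 1 if the device node is absent.
reset_lun() {
    dev_id="$1"
    dev_path="/vmfs/devices/disks/$dev_id"
    if [ -z "$dev_id" ]; then
        echo "usage: reset_lun <device ID>" >&2
        return 2
    fi
    if [ ! -e "$dev_path" ]; then
        echo "no such device: $dev_path" >&2
        return 1
    fi
    # Same command as used in the post, applied to the validated path.
    vmkfstools --lock lunreset "$dev_path"
}

# Usage on the host:
#   reset_lun <device ID>
```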
Thanks and Regards,
Nithyanathan R