Issue Description: Unpresenting an NFS datastore from the array without first unmounting it from the hosts results in a network traffic storm across the infrastructure.
Symptoms:
- Hosts disconnect from the Virtual Center server
- Physical network switches are utilized near capacity
- Packet capture reveals a multitude of GETATTR and ARP packets
- Other systems connected to the same network infrastructure are impacted
Tip #1: Run a packet capture on the NFS array/filer
Tip #2: Use Wireshark to review the packet capture
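To see which hosts are generating the traffic, one option is to export the capture from Wireshark as CSV and tally the GETATTR and ARP packets per source. The sketch below is only an illustration: it assumes Wireshark's default column layout (No., Time, Source, Destination, Protocol, Length, Info), and the function name, addresses and rows in SAMPLE are made up for the example.

```python
import csv
import io
from collections import Counter


def tally_storm_packets(csv_text):
    """Count NFS GETATTR calls and ARP packets per source address.

    Assumes the CSV has at least the "Source", "Protocol" and "Info"
    columns, as in Wireshark's default CSV export.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        protocol = row["Protocol"]
        info = row["Info"]
        if protocol == "NFS" and "GETATTR" in info:
            counts[(row["Source"], "GETATTR")] += 1
        elif protocol == "ARP":
            counts[(row["Source"], "ARP")] += 1
    return counts


# Hypothetical sample rows, not taken from a real capture.
SAMPLE = """\
"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000","10.0.0.11","10.0.0.50","NFS","170","V3 GETATTR Call, FH: 0x1234"
"2","0.001","10.0.0.50","10.0.0.11","TCP","66","2049 > 51000 [FIN, ACK] Seq=1 Ack=2"
"3","0.002","10.0.0.11","Broadcast","ARP","42","Who has 10.0.0.50? Tell 10.0.0.11"
"4","0.003","10.0.0.11","10.0.0.50","NFS","170","V3 GETATTR Call, FH: 0x1234"
"5","0.004","10.0.0.12","10.0.0.50","NFS","170","V3 GETATTR Call, FH: 0x1234"
"""

if __name__ == "__main__":
    # Print the noisiest (source, packet type) pairs first.
    for (source, kind), n in tally_storm_packets(SAMPLE).most_common():
        print(source, kind, n)
```

Hosts that appear at the top of the tally are the first candidates to unmount or power down.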
Packet Capture - Failure pattern (--> sent by the NFS client, <-- sent by the NFS server):
===
--> SYN
<-- SYN-ACK
--> ACK
--> GETATTR
<-- FIN-ACK (server closes the connection)
--> ARP REQ
<-- ARP RES
--> ACK (for the FIN sent by the server)
--> RST
===
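The failure pattern can also be counted programmatically once the capture has been reduced to a list of (sender, packet type) pairs. This is a minimal sketch, not a Wireshark feature: the function name, the "client"/"server" labels and the event list are assumptions chosen to mirror the trace above.

```python
def count_failed_attempts(events):
    """Count GETATTR -> FIN-ACK -> RST sequences in a packet trace.

    `events` is an iterable of (sender, kind) pairs, where sender is
    "client" or "server" (hypothetical labels for this sketch).
    Packets in between, such as the ARP exchange and the ACK,
    are ignored.
    """
    failures = 0
    state = "idle"
    for sender, kind in events:
        if sender == "client" and kind == "GETATTR":
            state = "asked"      # client requested attributes
        elif state == "asked" and sender == "server" and kind == "FIN-ACK":
            state = "closed"     # server closed instead of replying
        elif state == "closed" and sender == "client" and kind == "RST":
            failures += 1        # client reset the connection
            state = "idle"
    return failures


# The trace from this post, expressed as (sender, kind) pairs.
FAILURE_TRACE = [
    ("client", "SYN"),
    ("server", "SYN-ACK"),
    ("client", "ACK"),
    ("client", "GETATTR"),
    ("server", "FIN-ACK"),
    ("client", "ARP REQ"),
    ("server", "ARP RES"),
    ("client", "ACK"),
    ("client", "RST"),
]

if __name__ == "__main__":
    print(count_failed_attempts(FAILURE_TRACE))
```

A count that keeps climbing for the same host suggests that host is still trying to reach the removed datastore.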
Root Cause:
NFS filers/arrays typically return a FIN-ACK to close the connection of any NFS client (an ESX host or any other server accessing the filer) that attempts to access an exported volume that has been deleted or removed.
This can be seen as a protective measure: the NFS server is not obligated to service requests for non-existent devices, and closing the connection quells them.
Another significant reason for the array to behave this way is that someone could otherwise build a server in the environment to maliciously launch denial-of-service (DoS) type attacks on the NFS array.
Resolution:
If the hosts are still accessible, unmount the datastores.
Otherwise, immediately power down or reboot the hosts causing the network storm.
The best practice for datastore removal is documented here:
http://kb.vmware.com/kb/2004605 - Un-mounting or detaching a datastore / storage device from multiple ESXi 5.x hosts.
In conclusion, neither the ESX server nor the array is at fault for behaving in this fashion. Both are reacting to an abrupt device removal, which goes against standard best practices, although both server and client could be designed to behave more gracefully.
-Cedric