- We notice the ‘rport’ messages are logged only when an array controller port times out, not when it becomes visible again. The “blocked FC remote port time out” message means the connection to that array controller port timed out: the paths are marked dead, but the “binding” information is saved so that, if the device returns to the fabric, the connection is re-established and the paths change state from “dead” back to “on”.
- However, the timeouts were followed by rport recovery events on the Emulex HBA, as shown in the excerpts and the quick tally below.
var/log# grep rport messages | less
Jun 4 09:24:52 vmkernel: 0:19:33:08.038 cpu1:4450)<3> rport-1:0-2: blocked FC remote port time out: saving binding
Jun 4 09:25:39 vmkernel: 0:19:33:54.710 cpu5:4750)<3>lpfc820 0000:06:00.0: 0:3094 Start rport recovery on shost id 0x0 fc_id 0x653200 vpi 0x0 rpi 0x4 state 0x6 flags 0x80000000
Jun 4 09:25:49 vmkernel: 0:19:34:04.711 cpu9:4442)<3> rport-0:0-2: blocked FC remote port time out: saving binding
Jun 4 09:26:41 vmkernel: 0:19:34:56.885 cpu8:4454)<3> rport-0:0-2: blocked FC remote port time out: saving binding
Jun 4 09:27:26 vmkernel: 0:19:35:41.978 cpu1:4448)<3> rport-0:0-2: blocked FC remote port time out: saving binding
Jun 4 09:28:12 vmkernel: 0:19:36:28.607 cpu6:4454)<3> rport-1:0-2: blocked FC remote port time out: saving binding
Jun 4 09:30:30 vmkernel: 0:19:38:29.749 cpu5:4751)<3>lpfc820 0000:06:00.1: 1:3094 Start rport recovery on shost id 0x1 fc_id 0x663200 vpi 0x0 rpi 0x4 state 0x6 flags 0x80000000
Jun 4 09:30:30 vmkernel: 0:19:38:39.750 cpu8:4462)<3> rport-1:0-2: blocked FC remote port time out: saving binding
Jun 5 05:03:00 vmkernel: 1:15:11:15.569 cpu9:4751)<3>lpfc820 0000:06:00.1: 1:3094 Start rport recovery on shost id 0x1 fc_id 0x663200 vpi 0x0 rpi 0x4 state 0x6 flags 0x80000000
Jun 5 05:05:13 vmkernel: 1:15:13:28.450 cpu11:4446)<3> rport-1:0-2: blocked FC remote port time out: saving binding
var/log# grep Devloss messages | less
Jun 4 09:24:52 vmkernel: 0:19:33:08.038 cpu6:4751)<3>lpfc820 0000:06:00.1: 1:(0):0203 Devloss timeout on WWPN 50:06:0e:80:06:fe:7b:50 NPort x663200 Data: x40000 x1 x0
Jun 4 09:25:49 vmkernel: 0:19:34:04.711 cpu2:4750)<3>lpfc820 0000:06:00.0: 0:(0):0203 Devloss timeout on WWPN 50:06:0e:80:06:fe:7b:40 NPort x653200 Data: x40000 x1 x0
Jun 4 09:26:41 vmkernel: 0:19:34:56.885 cpu4:4750)<3>lpfc820 0000:06:00.0: 0:(0):0203 Devloss timeout on WWPN 50:06:0e:80:06:fe:7b:40 NPort x653200 Data: x0 x1 x0
Jun 4 09:27:26 vmkernel: 0:19:35:41.978 cpu5:4750)<3>lpfc820 0000:06:00.0: 0:(0):0203 Devloss timeout on WWPN 50:06:0e:80:06:fe:7b:40 NPort x653200 Data: x40000 x1 x0
Jun 4 09:28:12 vmkernel: 0:19:36:28.607 cpu4:4751)<3>lpfc820 0000:06:00.1: 1:(0):0203 Devloss timeout on WWPN 50:06:0e:80:06:fe:7b:50 NPort x663200 Data: x80000040 x4 x4
Jun 4 09:30:30 vmkernel: 0:19:38:39.750 cpu2:4751)<3>lpfc820 0000:06:00.1: 1:(0):0203 Devloss timeout on WWPN 50:06:0e:80:06:fe:7b:50 NPort x663200 Data: x40000 x1 x0
Jun 5 05:05:13 vmkernel: 1:15:13:28.451 cpu2:4751)<3>lpfc820 0000:06:00.1: 1:(0):0203 Devloss timeout on WWPN 50:06:0e:80:06:fe:7b:50 NPort x663200 Data: x0 x1 x0
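- To see how often each remote port timed out and which target WWPNs were affected, the rport and Devloss entries can be tallied from the same log. A minimal sketch, assuming the log is still the “messages” file grepped above and that the field positions match the excerpts:
var/log# grep "blocked FC remote port time out" messages | awk '{print $7}' | sort | uniq -c    # timeouts per rport
var/log# grep "0203 Devloss timeout" messages | grep -o 'WWPN [0-9a-f:]*' | sort | uniq -c    # timeouts per target WWPN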
- In addition, we noticed PLOGI failures being reported, which could indicate an issue with the Emulex driver or firmware in use (the failures are tallied per target after the excerpt below).
var/log# grep PLOGI messages | less
Jun 4 09:25:42 vmkernel: 0:19:33:57.757 cpu4:4751)<3>lpfc820 0000:06:00.1: 1:(0):2753 PLOGI failure DID:663200 Status:x3/x2
Jun 4 09:25:58 vmkernel: 0:19:34:13.824 cpu1:4750)<3>lpfc820 0000:06:00.0: 0:(0):2753 PLOGI failure DID:653200 Status:x3/x103
Jun 4 09:26:52 vmkernel: 0:19:35:07.708 cpu2:4750)<3>lpfc820 0000:06:00.0: 0:(0):2753 PLOGI failure DID:653200 Status:x3/x103
Jun 4 09:27:58 vmkernel: 0:19:36:14.623 cpu5:4750)<3>lpfc820 0000:06:00.0: 0:(0):2753 PLOGI failure DID:653200 Status:x3/x103
Jun 4 09:30:30 vmkernel: 0:19:38:29.805 cpu5:4751)<3>lpfc820 0000:06:00.1: 1:(0):2753 PLOGI failure DID:663200 Status:x3/x103
Jun 5 05:03:20 vmkernel: 1:15:11:35.568 cpu1:4751)<3>lpfc820 0000:06:00.1: 1:(0):2753 PLOGI failure DID:663200 Status:x3/x103
Jun 5 05:05:10 vmkernel: 1:15:13:25.959 cpu0:4751)<3>lpfc820 0000:06:00.1: 1:(0):2753 PLOGI failure DID:663200 Status:x3/x103
Jun 5 05:05:26 vmkernel: 1:15:13:41.564 cpu8:4751)<3>lpfc820 0000:06:00.1: 1:(0):2753 PLOGI failure DID:663200 Status:x3/x103
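- A per-target, per-status tally of the PLOGI failures can help the vendor narrow down the failure reason. A minimal sketch, assuming the same “messages” file and message layout as in the excerpt:
var/log# grep "2753 PLOGI failure" messages | awk '{print $(NF-1), $NF}' | sort | uniq -c    # count per DID and status code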
- While the remote ports were timed out, APD was reported for the storage devices, which is expected behavior; the APD retries are tallied per device after the excerpt below.
var/log# grep APD messages | less
Jun 4 09:25:49 vmkernel: 0:19:34:04.711 cpu6:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b00000aef" - failed to issue command due to Not found (APD), try again...
Jun 4 09:25:49 vmkernel: 0:19:34:04.711 cpu6:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b000005ac" - failed to issue command due to Not found (APD), try again...
Jun 4 09:25:49 vmkernel: 0:19:34:04.711 cpu6:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b00000451" - failed to issue command due to Not found (APD), try again...
Jun 4 09:25:49 vmkernel: 0:19:34:04.711 cpu6:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b0000013e" - failed to issue command due to Not found (APD), try again...
Jun 4 09:25:50 vmkernel: 0:19:34:05.711 cpu9:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b00000447" - failed to issue command due to Not found (APD), try again...
Jun 4 09:25:50 vmkernel: 0:19:34:05.711 cpu9:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b00000326" - failed to issue command due to Not found (APD), try again...
Jun 4 09:26:41 vmkernel: 0:19:34:56.955 cpu9:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b00000aef" - failed to issue command due to Not found (APD), try again...
Jun 4 09:28:12 vmkernel: 0:19:36:28.607 cpu10:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b00000451" - failed to issue command due to Not found (APD), try again...
Jun 4 09:28:12 vmkernel: 0:19:36:28.607 cpu10:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b00000447" - failed to issue command due to Not found (APD), try again...
Jun 4 09:28:12 vmkernel: 0:19:36:28.607 cpu10:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b0000013e" - failed to issue command due to Not found (APD), try again...
Jun 4 09:28:12 vmkernel: 0:19:36:28.607 cpu10:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b00000aef" - failed to issue command due to Not found (APD), try again...
Jun 4 09:28:12 vmkernel: 0:19:36:28.607 cpu10:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b000005ac" - failed to issue command due to Not found (APD), try again...
Jun 4 09:28:13 vmkernel: 0:19:36:29.050 cpu9:4511)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e8006fe7b000000fe7b00000326" - failed to issue command due to Not found (APD), try again...
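- To gauge how widespread the APD condition was, the failover retries can be grouped per device. A minimal sketch, assuming the NAA identifiers appear in the log exactly as in the excerpt:
var/log# grep "Not found (APD)" messages | grep -o 'naa\.[0-9a-f]*' | sort | uniq -c | sort -rn    # retries per device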
- Once the rport recovery event was received, the storage devices failed over and connectivity was restored; the recovered paths are summarized after the excerpt below.
var/log# grep restored messages | less
Jun 4 09:26:04 vobd: Jun 04 09:26:04.069: 70460276757us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60060e8006fe7b000000fe7b00000aef (Datastores: "Tier-310_P6500_07_T3") restored. Path vmhba0:C0:T0:L7 is active again..
Jun 4 09:26:05 vobd: Jun 04 09:26:05.673: 70461880289us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60060e8006fe7b000000fe7b000005ac (Datastores: "Tier-319_P6500_00_T3") restored. Path vmhba0:C0:T0:L0 is active again..
Jun 4 09:26:57 vobd: Jun 04 09:26:57.384: 70513591727us: [esx.clear.storage.redundancy.restored] Path redundancy to storage device naa.60060e8006fe7b000000fe7b00000485 (Datastores: "Tier-312_P6500_09_T3") restored. Path vmhba0:C0:T0:L9 is active again..
Jun 4 09:27:13 vobd: Jun 04 09:27:13.577: 70529784677us: [esx.clear.storage.redundancy.restored] Path redundancy to storage device naa.60060e8006fe7b000000fe7b00000c39 (Datastores: "Tier-320_P6500_01_T3") restored. Path vmhba0:C0:T0:L1 is active again..
Jun 4 09:27:13 vobd: Jun 04 09:27:13.606: 70529814188us: [esx.clear.storage.redundancy.restored] Path redundancy to storage device naa.60060e8006fe7b000000fe7b00000c3a (Datastores: "Tier-321_P6500_02_T3") restored. Path vmhba0:C0:T0:L2 is active again..
Jun 4 09:27:13 vobd: Jun 04 09:27:13.892: 70530099508us: [esx.clear.storage.redundancy.restored] Path redundancy to storage device naa.60060e8006fe7b000000fe7b00000e5c (Datastores: "Tier-307_P6500_04_T3 - Dont Use") restored. Path vmhba0:C0:T0:L4 is active again..
Jun 4 09:27:13 vobd: Jun 04 09:27:13.926: 70530133583us: [esx.clear.storage.redundancy.restored] Path redundancy to storage device naa.60060e8006fe7b000000fe7b0000020d (Datastores: "Tier-323_P6500_05_T3") restored. Path vmhba0:C0:T0:L5 is active again..
Jun 4 09:28:18 vobd: Jun 04 09:28:18.391: 70594598826us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60060e8006fe7b000000fe7b00000aef (Datastores: "Tier-
Jun 4 09:28:27 vobd: Jun 04 09:28:27.680: 70603888112us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60060e8006fe7b000000fe7b00000c3a (Datastores: "Tier-321_P6500_02_T3") restored. Path vmhba1:C0:T0:L2 is active again..
Jun 4 09:28:27 vobd: Jun 04 09:28:27.773: 70603980389us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60060e8006fe7b000000fe7b00000e61 (Datastores: "Tier-322_P6500_03_T3") restored. Path vmhba1:C0:T0:L3 is active again..
Jun 4 09:28:27 vobd: Jun 04 09:28:27.789: 70603996973us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60060e8006fe7b000000fe7b00000e5c (Datastores: "Tier-307_P6500_04_T3 - Dont Use") restored. Path vmhba1:C0:T0:L4 is active again..
Jun 4 09:28:27 vobd: Jun 04 09:28:27.896: 70604103974us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60060e8006fe7b000000fe7b00000445 (Datastores: "Tier-309_P6500_06_T3 - Dont Use") restored. Path vmhba1:C0:T0:L6 is active again..
Jun 4 09:28:27 vobd: Jun 04 09:28:27.906: 70604113875us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60060e8006fe7b000000fe7b0000020d (Datastores: "Tier-323_P6500_05_T3") restored. Path vmhba1:C0:T0:L5 is active again..
Jun 4 09:28:31 vobd: Jun 04 09:28:31.789: 70607996505us: [esx.clear.storage.connectivity.restored] Connectivity to storage device naa.60060e8006fe7b000000fe7b000011d9 (Datastores: "Tier-328_P6500_20_T3") restored. Path vmhba0:C0:T0:L20 is active again..
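- Mapping the restore events back to the affected paths confirms that paths on both HBAs came back. A minimal sketch, assuming the vobd entries keep the format shown above:
var/log# grep restored messages | grep -o 'vmhba[0-9]*:C[0-9]*:T[0-9]*:L[0-9]*' | sort | uniq -c    # restore events per path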
Recommendations:
- Engage the Emulex vendor regarding the PLOGI failures.
- Engage the storage hardware vendor to check the array for port flapping. (Suggested data to collect for both vendors is sketched below.)
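- When engaging the vendors, it helps to capture the HBA, driver, and module configuration details along with the log excerpts above. A minimal sketch of commands that can be run from the ESX console; exact options and output vary by release, so treat this as an assumption rather than a definitive procedure:
var/log# esxcfg-scsidevs -a    # list the HBAs and the driver bound to each
var/log# esxcfg-module -g lpfc820    # show any options currently set on the lpfc820 module
var/log# vm-support    # collect a full support bundle to share with the vendors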