I would like to share a recent experience troubleshooting an HA issue. By default, adding a new host to an existing cluster, or reconfiguring HA on one of the existing hosts in the cluster, causes the HA operation to time out. To work around the issue, we have to either disable and re-enable HA at the cluster level or reconfigure HA on the master node.
vpxd logs
vpxd-19.log:2017-09-26T01:17:53.673Z info vpxd[7FA6DC311700] [Originator@6876 sub=vpxLro opID=lro-53-79f246b8-02] [VpxLRO] -- BEGIN task-147951 -- svrau-esx03.jbhi-fi.local -- DasConfig.ConfigureHost --
vpxd-19.log:2017-09-26T01:17:53.673Z info vpxd[7FA6DC311700] [Originator@6876 sub=MoHost opID=lro-53-79f246b8-02] [HostMo::UpdateDasState] VC state for host host-220 (initialized -> uninitialized), FDM state (Live -> Live), src of state (null -> null)
vpxd-19.log:2017-09-26T01:17:53.965Z info vpxd[7FA6DC311700] [Originator@6876 sub=DAS opID=lro-53-79f246b8-02] [VpxdDasConfigLRO::ConfigureResources] Skipping aam RP config for ESX 6+ host
vpxd-19.log:2017-09-26T01:17:54.389Z info vpxd[7FA6DC311700] [Originator@6876 sub=HostUpgrader opID=lro-53-79f246b8-02] [VpxdHostUpgrader] Fdm on host-220 has build 5973321. Expected build is 6671409 - will upgrade
vpxd-19.log:2017-09-26T01:17:54.578Z info vpxd[7FA6DC311700] [Originator@6876 sub=HostAccess opID=lro-53-79f246b8-02] Using vpxapi.version.version11 to communicate with vpxa at host svrau-esx03.jbhi-fi.local
vpxd-19.log:2017-09-26T01:21:41.970Z warning vpxd[7FA6DC698700] [Originator@6876 sub=VpxProfiler opID=lro-53-79f246b8-02-TaskLoop-dc7c952] TaskLoop [TotalTime] took 222225 ms
vpxd-19.log:2017-09-26T01:21:42.104Z info vpxd[7FA6DC311700] [Originator@6876 sub=DAS opID=lro-53-79f246b8-02] [VpxdDasConfig::PushConfigToFDM] pushed config version 127 to host [vim.HostSystem:host-220,svrau-esx03.jbhi-fi.local] (cluster [vim.ClusterComputeResource:domain-c31,AU001_CLUSTER_GENERAL])
vpxd-19.log:2017-09-26T01:23:42.120Z error vpxd[7FA6DC311700] [Originator@6876 sub=DAS opID=lro-53-79f246b8-02] Timed out waiting for election to complete or for host to join existing master
vpxd-19.log:2017-09-26T01:23:42.120Z error vpxd[7FA6DC311700] [Originator@6876 sub=DAS opID=lro-53-79f246b8-02] EnableDAS failed on host [vim.HostSystem:host-220,svrau-esx03.jbhi-fi.local]: N3Vim5Fault8Timedout9ExceptionE(vim.fault.Timedout)
vpxd-19.log:2017-09-26T01:23:42.121Z info vpxd[7FA6DC311700] [Originator@6876 sub=MoHost opID=lro-53-79f246b8-02] [HostMo::UpdateDasState] VC state for host host-220 (initialized -> init error), FDM state (UNKNOWN_FDM_HSTATE -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
vpxd-19.log:2017-09-26T01:23:42.184Z info vpxd[7FA6DC311700] [Originator@6876 sub=DAS opID=lro-53-79f246b8-02] [VpxdDasConfigLRO::Cleanup] Number of unprotected vms: 24
vpxd-19.log:2017-09-26T01:23:42.184Z warning vpxd[7FA6DC311700] [Originator@6876 sub=VpxProfiler opID=lro-53-79f246b8-02] VpxLro::LroMain [TotalTime] took 348510 ms
vpxd-19.log:2017-09-26T01:23:42.184Z info vpxd[7FA6DC311700] [Originator@6876 sub=vpxLro opID=lro-53-79f246b8-02] [VpxLRO] -- FINISH task-147951
vpxd-19.log:2017-09-26T01:23:42.184Z info vpxd[7FA6DC311700] [Originator@6876 sub=Default opID=lro-53-79f246b8-02] [VpxLRO] -- ERROR task-147951 -- svrau-esx03.jbhi-fi.local -- DasConfig.ConfigureHost: vim.fault.Timedout:
vpxd-19.log:2017-09-26T01:24:15.373Z warning vpxd[7FA6DE17B700] [Originator@6876 sub=VpxProfiler opID=lro-53-79f246b8-02-EventManagerProcessJobs-4606457a] EventManagerProcessJobs [TotalTime] took 33248 ms
From the above snippets it is evident that the HA configuration is timing out while the host waits for the election to complete or to join the existing master. In this case the cause was underlying network latency between the ESXi hosts. The issue can be fixed by increasing the default timeout value for FDM in the vCenter Server advanced settings.
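To see how long vCenter actually waited before giving up, the timestamps of the config push and the timeout error can be compared. The following is a minimal Python sketch, not a VMware tool: the helper names `find_timestamp` and `seconds_between` are made up for illustration, and the two log lines are copied from the excerpt above.

```python
import re
from datetime import datetime

# Two lines copied verbatim from the vpxd log excerpt in this post.
LOG_LINES = [
    "vpxd-19.log:2017-09-26T01:21:42.104Z info vpxd[7FA6DC311700] [Originator@6876 sub=DAS opID=lro-53-79f246b8-02] [VpxdDasConfig::PushConfigToFDM] pushed config version 127 to host [vim.HostSystem:host-220,svrau-esx03.jbhi-fi.local] (cluster [vim.ClusterComputeResource:domain-c31,AU001_CLUSTER_GENERAL])",
    "vpxd-19.log:2017-09-26T01:23:42.120Z error vpxd[7FA6DC311700] [Originator@6876 sub=DAS opID=lro-53-79f246b8-02] Timed out waiting for election to complete or for host to join existing master",
]

# Matches the UTC timestamp format used in these vpxd log lines.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")


def find_timestamp(lines, needle):
    """Return the timestamp of the first line that contains `needle`."""
    for line in lines:
        if needle in line:
            match = TIMESTAMP.search(line)
            if match:
                return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    raise ValueError(f"no timestamped line containing {needle!r}")


def seconds_between(lines, start_needle, end_needle):
    """Return the elapsed seconds between two log events."""
    start = find_timestamp(lines, start_needle)
    end = find_timestamp(lines, end_needle)
    return (end - start).total_seconds()


elapsed = seconds_between(LOG_LINES, "PushConfigToFDM", "Timed out waiting for election")
print(f"{elapsed:.3f} s between config push and timeout")
```

For this excerpt the gap comes out at roughly two minutes, which is the window in which the host failed to finish the election or join the existing master.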
Applying a VMware HA customization
Using the vSphere Web Client
Log in to VMware vSphere Web Client.
Go to Home > vCenter > Clusters.
Under Object, click on the cluster you want to modify.
Click Manage.
Click vSphere HA.
Click Edit.
Click Advanced Options.
Click Add, enter config.vpxd.das.fdmWaitForUpdatesTimeoutSec in the Option field, and set the Value field to 60.
Deselect Turn ON vSphere HA.
Click OK.
Wait for HA to unconfigure, then click Edit and select Turn ON vSphere HA.
Click OK and wait for the cluster to reconfigure.