In this blog post I will describe what happens in the background when the ‘./sos --stretch-vsan’ command is run.
It’s not rocket science, since these are the same steps you perform when you stretch a cluster manually, but I thought it could be helpful for someone who has never done it. Some tasks are described as ‘all clear’; I assumed they don’t need a comment because their names are self-explanatory.
All prerequisites and the procedure for stretching a cluster across two availability zones are described in the VMware documentation.
Ok, so when you SSH to SDDC Manager and enter the command:
./sos --stretch-vsan --sc-domain <DOMAIN NAME> --sc-cluster <CLUSTER NAME> --sc-hosts <HOSTFQDN1,HOSTFQDN2,…> --witness-host-fqdn <WITNESS HOST FQDN> --witness-vsan-ip <WITNESS VSAN IP> --witness-vsan-cidr <WITNESS VSAN CIDR> --esxi-license-key <LICENSE KEY>
just start monitoring the task in the SDDC Manager UI. You will see a long list of tasks that stretch the cluster, executed in the order shown in the table below.
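If you run this more than once (or from a script), it helps to assemble the command from a spec so that the host list is always a clean, comma-separated value. Below is a minimal sketch of that idea. `build_stretch_command` is a hypothetical helper of my own, not something shipped with SDDC Manager; the flag names are exactly the ones from the command above, all example values are made-up placeholders, and the check that the witness IP falls inside the witness CIDR is my own assumption about what those two values mean.

```python
import ipaddress
import shlex


def build_stretch_command(domain, cluster, hosts, witness_fqdn,
                          witness_vsan_ip, witness_vsan_cidr, license_key):
    """Return the './sos --stretch-vsan' command as an argument list.

    Hypothetical helper: it only assembles and sanity-checks the arguments,
    it does not run sos.
    """
    hosts = [h.strip() for h in hosts if h.strip()]
    if not hosts:
        raise ValueError("at least one host FQDN is required for --sc-hosts")
    if len(set(hosts)) != len(hosts):
        raise ValueError("duplicate host FQDNs in --sc-hosts")
    # Assumption: the witness vSAN IP should lie inside the witness vSAN CIDR.
    network = ipaddress.ip_network(witness_vsan_cidr, strict=False)
    if ipaddress.ip_address(witness_vsan_ip) not in network:
        raise ValueError(f"{witness_vsan_ip} is not inside {witness_vsan_cidr}")
    return [
        "./sos", "--stretch-vsan",
        "--sc-domain", domain,
        "--sc-cluster", cluster,
        "--sc-hosts", ",".join(hosts),  # hosts go in as one comma-separated value
        "--witness-host-fqdn", witness_fqdn,
        "--witness-vsan-ip", witness_vsan_ip,
        "--witness-vsan-cidr", witness_vsan_cidr,
        "--esxi-license-key", license_key,
    ]


if __name__ == "__main__":
    argv = build_stretch_command(
        "MGMT", "SDDC-Cluster1",
        ["esx05.example.local", "esx06.example.local"],
        "witness.example.local", "172.16.50.10", "172.16.50.0/24",
        "XXXXX-XXXXX-XXXXX-XXXXX-XXXXX")
    # shlex.join quotes arguments only where the shell would need it
    print(shlex.join(argv))
```

Returning a list instead of a single string keeps the arguments intact if you later pass them to `subprocess.run` without a shell.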
NO. | TASK NAME | DETAILS |
1 | Validate Stretch Spec | – validating hosts for duplicate entries; – getting ESXi info for hosts; – checking if hosts exist in inventory; – checking unassigned ESXi IDs; – checking ESXi compliant versions; – checking that hosts are not dirty (partitions are erased); – checking that hosts have ACTIVE status; – checking that all hosts are from the same network pool |
– | Validate Host and vSAN Licenses | – all clear |
2 | Acquire Lock for ESXi Host Addition | – all clear |
3 | Validate Storage Compatibility for Hosts | – checking that the hosts have homogeneous storage types |
4 | Generate Internal Model for ESXi Host Addition | – checking VSAN network (MTU 9000); – checking VMOTION network (MTU 9000); – saving vmknic configuration |
5 | Assemble Fault Domain Spec | – checking that the ESXi hosts in both fault domains are visible in inventory |
6 | Validate Internal Model for ESXi Host Addition | – validating management VC SSO credentials; – validating AZ2 hosts’ management IPs, credentials, networking state and VMs; – validating that a sufficient number of vmnics is available on the hosts; – listing and checking all vmnics; – validating whether there are any VMs on the hosts |
7 | Allocate ESXi Host IP Addresses | – building the ESXi networkPool and the network types VMOTION and VSAN; – fetching the networks associated with the network pool; – fetching IPs from the network pool for the network types VMOTION and VSAN for all ESXi hosts; – allocating IPs to all hosts |
8 | Validate Vmotion network connectivity | – setting VDS with PortGroup.TransportType VMOTION; – creating a vmknic with an IP address on vSwitch0 in the vMotion port group |
9 | Validate vSAN network connectivity | – creating a vmknic with an IP address on vSwitch0 in the vSAN port group |
10 | Stretch Cluster HA Validation | – looking up gateways for all hosts (AZ1 and AZ2); – updating isolationAddress0 in advanced HA settings; – checking whether the primary zone exists |
11 | Validate VRLI entity in inventory | – validating if vRLI exists in inventory; – validating API credentials for vRLI |
12 | Validate primary zone host connectivity | – validating whether the primary zone hosts of the existing cluster are reachable from SDDC Manager |
13 | Validate secondary zone host connectivity | – validating whether the secondary zone hosts are reachable from SDDC Manager |
14 | Validate interzone host connectivity | – checking that all ESXi hosts (AZ1 and AZ2) can ping each other |
15 | Validate primary and secondary zone host conflicts | – validating that secondary zone hosts do not conflict with primary zone hosts (primary and secondary zone hosts must be unique) |
16 | Check if vSAN Cluster is Network Partitioned | – checking how many ESXi hosts are already in the cluster; – getting the vSAN system details for the hosts |
17 | Check if secondary zone has minimum required hosts | – checking that the secondary zone has the same number of hosts as the primary zone (e.g. if the primary zone has 4 ESXi hosts, the secondary zone must have 4 as well) |
18 | Validate witness registration with vCenter | – validating that the witness host is registered in vCenter |
19 | Validate Witness Host not in Cluster to be Stretched | – validating that the witness host is not already part of the cluster |
20 | Validate witness host storage options | – validating that the witness host has the required storage options (both SSD and non-SSD disks exist); – claiming vSAN storage SSD disk Local VMware Disk (mpx.vmhba0:C0:T2:L0) and NonSSD disk Local VMware Disk (mpx.vmhba0:C0:T1:L0) for witness |
21 | Update ESXi Host Data in Inventory | – updating ESXi host details: domainId, clusterId, vcenterId, networkPoolId, privateIpAddress, vsanIpAddress, vmotionIpAddress, status (“ACTIVATING”), hostAttributes (e.g. {"vendor":"Dell Inc.","model":"VxRail E460F"}), … |
22 | Register Current Task | – registering taskId |
23 | Prepare ESXi Host | – establishing an SSH session to the hosts; – deleting the VSAN UUID marker file on the host; – preparing hosts as non-primary hosts with vsanUuid; – executing [ cd /tmp/; python /tmp/removevd.py 2>&1 ]; – executing [ cd /tmp/; python /tmp/createvd.py 2>&1 ] (output: vmware-esx-storcli VIB not installed); – executing [ cd /tmp/; python /tmp/capacityflash.py 2>&1 ] (output: All flash disks:); – executing [ esxcfg-advcfg -s 100000 /LSOM/diskIoTimeout && esxcfg-advcfg -s 4 /LSOM/diskIoRetryFactor ] (output: Value of diskIoTimeout is 100000) |
24 | Get vSphere Cluster MOID | – retrieving and updating the vSphere Cluster MOID for Add ESXi Host (to find the Managed Object ID of a cluster, host, VM, etc., browse to https://VCSAFQDN/mob) |
25 | Create vCenter and Platform Services Controller Host Group | – creating the default host group (if you want to create it manually, follow the VMware documentation) |
26 | Create Primary Availability Zone Host Group | – creating host group ‘clustername_primary-az-hostgroup’ for the cluster in vCenter |
27 | Create Primary Availability Zone VM Group | – creating VM group ‘clustername-primary-az-vmgroup’ with the list of VMs for the cluster in vCenter |
28 | Create Primary Availability Zone VM Host Rule | – creating VM/Host rule ‘clustername_VMs’ for the VMs that should run in the primary site |
29 | Add ESXi Hosts to Data Center | – retrieving ESXi license; – adding host to DC |
30 | Apply License(s) to ESXi Host in vCenter Server | – checking whether the currently applied license matches the required license; if not, the new license is applied to the ESXi hosts. Example log: licenseKey for entityId host-84 (name: mec10mgt003.nx5dpc.next) is found, currently Applied license does not match the required license, Applying new license to hosts |
31 | Enter Maintenance Mode on ESXi Hosts | – all clear |
32 | Add ESXi Hosts to vSphere Distributed Switch | – adding host with vmnic vmnic1 to the DV switch; – connecting vmnic1 to UplinkPortgroup |
33 | Create vMotion vmknic(s) on ESXi Host | – creating vmknic VMOTION, connecting to DVPortgroupkey on DVS; – assigning netstack instance key as vmotion |
34 | Create vSAN vmknic(s) on ESXi Host | – creating vmknic VSAN; – assigning a default gateway to vmknic VSAN; – creating the vSAN vmknic on the host and attaching it to the DvSwitch |
35 | Migrate ESXi Host Management vmknic(s) to vSphere Distributed Switch | – migrating vmknic vmk0 to DvSwitch |
36 | Detach vmknic(s) from vSphere Standard Switch | – detaching vmnics [vmnic0] from vSwitch0 |
37 | Attach vmknic(s) to vSphere Distributed Switch | – listing all vmnics; – checking physical NIC vmnic0 (whether it is available, i.e. connected and not in use; speed: 10000 Mbps); – checking the remaining physical NICs (vmnics); – attaching vmnic1 to DvSwitch; – attaching vmnic vmnic0 to DVS |
38 | Remove vSphere Standard Switches from ESXi Hosts | – removing standard switch vSwitch0 on hosts |
39 | Add ESXi Hosts to vSphere Cluster | – getting the thumbprints of the ESXi hosts; – adding hosts to the cluster |
40 | Configure Static Routes on ESXi Hosts | – all clear |
41 | Configure Power Management Policy on ESXi Host | – checking the current power management policy on the host: if it is DYNAMIC, changing it to static |
42 | Create vSAN Disk Groups | – all clear |
43 | Clear Alarms on ESXi Hosts | – clearing alarms on Esxi hosts from gray/red to green |
44 | Enable Log Collection for vSphere | – configuring the vCenter hosts to send logs to Log Insight |
45 | Resolve ESXi Host Preparation Issues for NSX | – validating that NSX Manager is up and running |
46 | Enable Reconfigure VM task for NSX Controllers | – executing the Enable ReconfigureVM method for NSX Controllers. This step has to be done before the task ‘Update and Re-apply vSAN Storage Policy’, because re-applying a storage policy to a VM is done by reconfiguring it. A few years ago William Lam wrote a post on how to use the vCenter MOB to enable vim.VirtualMachine.reconfigure |
47 | Enable Reconfigure VM task for NSX Edges | – as above: executing the Enable ReconfigureVM method for NSX Edges. This step has to be done before the task ‘Update and Re-apply vSAN Storage Policy’. |
48 | Exit Maintenance Mode on ESXi Hosts | – all clear |
49 | Configure HA Admission Control | – configuring reserved failover capacity: CPU 50% and memory 50% |
50 | Configure HA Isolation Address | – checking and updating isolationAddress0 |
51 | Configure HA Isolation Response | – initiating ConfigureHaIsolationResponseAction; – configuring ConfigspecEx {"dasConfig":{"defaultVmSettings":{"isolationResponse":"powerOff"}}} |
52 | Re-Acquire the Lock and Prepare the ESXi Host(s) for stretch in Inventory | – updating the ESXi host status to ACTIVATING in the logical inventory |
53 | Configure Witness Static Routes on ESXi Hosts | – all clear |
54 | Configure vSAN Fault Domains and Stretch Cluster | – configuring fault domains in cluster; – claiming vSAN storage SSD disk Local VMware Disk (mpx.vmhba0:C0:T2:L0) and NonSSD disk Local VMware Disk (mpx.vmhba0:C0:T1:L0) for witness |
55 | Update and Re-apply vSAN Storage Policy | – checking whether a policy named vSAN Default Storage Policy exists; – adding property replicaPreference with value RAID-1 (Mirroring) – Performance; – adding new property ‘checksumDisabled’ with value ‘false’ in storage policy vSAN Default Storage Policy; – updating property hostFailuresToTolerate with value 1; – adding property subFailuresToTolerate with value 1; – adding property locality with value None; – adding property iopsLimit with value 0; – reapplying the storage policy vSAN Default Storage Policy |
56 | Configure and Enable vSAN Health and Performance Service | – enabling vSAN performance service for the cluster |
57 | Disable Reconfigure VM task for NSX Controllers | – executing the Disable ReconfigureVM method for NSX Controllers |
58 | Disable Reconfigure VM task for NSX Edges | – executing the Disable ReconfigureVM method for NSX Edges (to do that manually, go to https://NSXMANAGERFQDN/api/4.0/edges) |
59 | Update ESXi & Cluster status in Inventory | – updating the cluster status and the ESXi host status in the logical inventory |
60 | Release Lock for ESXi Host Addition | – releasing deployment lock |
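Several of the early tasks (1, 15, 17 and 19 in the table above) are plain sanity checks on the host lists before anything is changed. To make them concrete, here is a minimal sketch of those checks. It assumes a simplified, hypothetical `Host` record (FQDN, status, network pool); it reflects my reading of the task descriptions, not SDDC Manager's actual code, and it does not query any inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical, simplified inventory record for an ESXi host."""
    fqdn: str
    status: str        # e.g. "ACTIVE"
    network_pool: str


def precheck_stretch(primary, secondary, witness_fqdn):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    new_fqdns = [h.fqdn for h in secondary]
    primary_fqdns = {h.fqdn for h in primary}

    # Task 1: duplicate entries in the list of new hosts
    dupes = sorted({f for f in new_fqdns if new_fqdns.count(f) > 1})
    if dupes:
        problems.append(f"duplicate hosts: {dupes}")
    # Task 1: every new host must have ACTIVE status
    for h in secondary:
        if h.status != "ACTIVE":
            problems.append(f"{h.fqdn} is {h.status}, expected ACTIVE")
    # Task 1: all new hosts must come from the same network pool
    pools = sorted({h.network_pool for h in secondary})
    if len(pools) > 1:
        problems.append(f"new hosts span several network pools: {pools}")
    # Task 15: primary and secondary zone hosts must be unique
    overlap = sorted(primary_fqdns.intersection(new_fqdns))
    if overlap:
        problems.append(f"hosts already in the primary zone: {overlap}")
    # Task 17: the secondary zone needs as many hosts as the primary zone
    if len(set(new_fqdns)) != len(primary_fqdns):
        problems.append(f"secondary zone has {len(set(new_fqdns))} hosts, "
                        f"primary zone has {len(primary_fqdns)}")
    # Task 19: the witness must not be a member of the cluster
    if witness_fqdn in primary_fqdns or witness_fqdn in new_fqdns:
        problems.append(f"witness {witness_fqdn} is part of the cluster")
    return problems


if __name__ == "__main__":
    az1 = [Host(f"esx0{i}.az1.local", "ACTIVE", "az1-pool") for i in range(1, 5)]
    az2 = [Host(f"esx0{i}.az2.local", "ACTIVE", "az2-pool") for i in range(1, 4)]
    for problem in precheck_stretch(az1, az2, "witness.local"):
        print(problem)
```

Collecting all problems instead of stopping at the first one mirrors how the table lists every validation separately, so you can fix everything in one pass.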
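Task 7 in the table above hands out vMotion and vSAN addresses for the new hosts from the network pool. The sketch below shows the idea, assuming each network type (VMOTION, VSAN) has a single inclusion range and that already-used addresses are known. The function name, the pool structure and the addresses are hypothetical; SDDC Manager's real allocation logic may differ.

```python
import ipaddress


def allocate_ips(pool, hosts, used=()):
    """Assign one address per host for each network type in a network pool.

    `pool` is a hypothetical simplification: it maps a network type such as
    "VMOTION" or "VSAN" to the (first, last) address of its inclusion range.
    `used` lists addresses that are already taken.
    """
    taken = {ipaddress.ip_address(a) for a in used}
    result = {h: {} for h in hosts}
    for net_type, (first, last) in pool.items():
        start = ipaddress.ip_address(first)
        end = ipaddress.ip_address(last)
        # Walk the inclusion range in order and skip addresses already in use
        candidates = (start + i for i in range(int(end) - int(start) + 1))
        free = (ip for ip in candidates if ip not in taken)
        for h in hosts:
            ip = next(free, None)
            if ip is None:
                raise RuntimeError(f"{net_type} range {first}-{last} is exhausted")
            taken.add(ip)
            result[h][net_type] = str(ip)
    return result


if __name__ == "__main__":
    pool = {"VMOTION": ("172.16.11.101", "172.16.11.120"),
            "VSAN": ("172.16.13.101", "172.16.13.120")}
    for host, ips in allocate_ips(pool, ["esx05", "esx06"], used=["172.16.11.101"]).items():
        print(host, ips)
```

Failing loudly when a range runs out is deliberate: it is better to extend the network pool before the stretch than to discover the shortage halfway through the workflow.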
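Tasks 49 to 51 in the table above boil down to one HA reconfiguration of the cluster. The sketch below only builds that change as a Python dict shaped like the `dasConfig` part of a vSphere `ClusterConfigSpecEx` (the task 51 log shows the same shape). The field names (`admissionControlPolicy` with `cpuFailoverResourcesPercent`/`memoryFailoverResourcesPercent`, the `das.isolationaddress0` advanced option, `defaultVmSettings.isolationResponse`) are from the vSphere API as I know it; the helper itself and the gateway address are hypothetical, and a real script would pass equivalent objects to vCenter through pyVmomi or PowerCLI.

```python
import json


def ha_config_spec(isolation_address, cpu_pct=50, mem_pct=50):
    """Return a dict shaped like the dasConfig part of a ClusterConfigSpecEx.

    Illustration only: it builds data, it does not call vCenter.
    """
    for pct in (cpu_pct, mem_pct):
        if not 0 <= pct <= 100:
            raise ValueError(f"failover capacity must be 0-100 %, got {pct}")
    return {
        "dasConfig": {
            # Task 49: percentage-based admission control (50 % CPU / 50 % memory)
            "admissionControlEnabled": True,
            "admissionControlPolicy": {
                "cpuFailoverResourcesPercent": cpu_pct,
                "memoryFailoverResourcesPercent": mem_pct,
            },
            # Task 50: HA advanced option holding the first isolation address
            "option": [{"key": "das.isolationaddress0", "value": isolation_address}],
            # Task 51: power off VMs on a host that considers itself isolated
            "defaultVmSettings": {"isolationResponse": "powerOff"},
        }
    }


if __name__ == "__main__":
    # Hypothetical gateway address used as the isolation address
    print(json.dumps(ha_config_spec("172.16.11.253"), indent=2))
```

The 50/50 split matches the idea behind a two-site cluster: if one availability zone fails, the other must have enough spare capacity to restart all of its VMs.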
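Task 55 in the table above is the step that turns the vSAN Default Storage Policy into a stretched-cluster policy. The property names and target values below are copied from the task log; the helper, the plain-dict representation of a policy and the PFTT/SFTT comments (my understanding of vSAN 6.6+ stretched clusters) are assumptions. This sketch skips properties that already have the target value; whether SDDC Manager does the same is not visible from the log.

```python
# Target rule values as logged by task 55 (property names taken from the log)
STRETCH_RULES = {
    "replicaPreference": "RAID-1 (Mirroring) - Performance",
    "checksumDisabled": False,
    "hostFailuresToTolerate": 1,  # PFTT: site failures to tolerate
    "subFailuresToTolerate": 1,   # SFTT: host failures tolerated inside a site
    "locality": "None",           # no site affinity, data is mirrored across sites
    "iopsLimit": 0,               # 0 means no IOPS limit
}


def plan_policy_update(current):
    """Return the (action, property, value) steps needed to reach STRETCH_RULES.

    `current` is a plain dict of the policy's existing rules (hypothetical format).
    """
    steps = []
    for prop, value in STRETCH_RULES.items():
        if prop not in current:
            steps.append(("add", prop, value))
        elif current[prop] != value:
            steps.append(("update", prop, value))
    return steps


if __name__ == "__main__":
    for step in plan_policy_update({"hostFailuresToTolerate": 0, "stripeWidth": 1}):
        print(*step)
```

Rules that are not part of the stretch profile (like `stripeWidth` in the example) are left alone, which matches the log: it only mentions the properties listed above.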