I came across a problem where vSAN reported an error on one of the virtual disks (VMDKs) attached to the vSAN witness appliance, specifically the cache disk.
The vSAN health check reported an “Operational Health Alarm” and all of the VM objects showed “Reduced availability with no rebuild”.
Strangely enough, the underlying storage was not reporting any errors or problems, and other VMs on the same datastore were all fine.
As there was no time for a proper investigation, the decision was made to restart the witness appliance VM.
When the witness came back online, everything returned to normal and the errors were gone.
One thing I noticed was that the witness appliance had accidentally been included in a snapshot-based (quiesced) backup policy, and the backup job had started a few hours before the incident. It crossed my mind that the problem might have something to do with quiesced snapshots.
I tried to reproduce the problem in my lab and managed to recreate the same issue.
I generated some I/O in my stretched vSAN cluster and triggered some resync operations by changing storage policy settings.
At the same time, I started creating quiesced snapshots of the witness appliance VM (a sketch of this is shown below). After a while, the same error appeared in my lab.
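For anyone who wants to try the same thing, here is a minimal pyVmomi sketch of the snapshot part of my repro. The vCenter address, credentials and the VM name “vsan-witness” are placeholders for my lab setup; quiesce=True is what makes VMware Tools quiesce the guest file system, mimicking a snapshot-based backup job.

    #!/usr/bin/env python3
    # Repeatedly take and delete quiesced snapshots of the witness appliance VM.
    import ssl
    import time
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab only: skip certificate checks
    si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                      pwd="***", sslContext=ctx)
    try:
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vim.VirtualMachine], True)
        witness = next(vm for vm in view.view if vm.name == "vsan-witness")
        for i in range(10):
            # quiesce=True asks VMware Tools to quiesce the guest file system,
            # which is what a quiesced, snapshot-based backup does
            WaitForTask(witness.CreateSnapshot_Task(
                name="repro-%d" % i, description="quiesced snapshot test",
                memory=False, quiesce=True))
            WaitForTask(witness.snapshot.currentSnapshot.RemoveSnapshot_Task(
                removeChildren=False))
            time.sleep(60)  # let some vSAN I/O happen between snapshots
    finally:
        Disconnect(si)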
The following alarm appeared on the witness appliance’s nested ESXi host:
The vSAN health reported “Operational health error” – permanent disk failure:
And “vSAN object health” showed “Reduced availability with no rebuild”:
The witness was inoperable at this stage, and since it is a vital component of a stretched vSAN cluster, the whole environment was affected. Of course, the existing VMs kept running, as vSAN is a robust solution and can handle such situations (the cluster still had quorum). However, without “Force Provisioning” enabled in the storage policy, no new objects (VMs, snapshots, etc.) could be created.
Further investigation of the logs (vmkernel.log and vmkwarning.log) on the witness appliance revealed problems with access to the affected disk (vmhba1:C0:T0:L0).
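If you want to pull these entries out quickly, something as simple as the following will do (the device path is the one from my witness appliance; adjust it for yours):

    #!/usr/bin/env python3
    # Filter vmkernel.log and vmkwarning.log for the affected device path.
    DEVICE = "vmhba1:C0:T0:L0"  # adapter:channel:target:LUN of the cache disk

    for logfile in ("/var/log/vmkernel.log", "/var/log/vmkwarning.log"):
        with open(logfile, errors="replace") as f:
            for line in f:
                if DEVICE in line:
                    print(logfile + ": " + line.rstrip())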
That proved the problem was indeed related to the virtual disk and was caused by the snapshot.
I tried to fix it by rescanning the storage adapter, but to no avail, so I decided to reboot the appliance.
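For reference, the rescan can also be done programmatically. Here is a minimal pyVmomi sketch, assuming a direct connection to the nested witness ESXi host; the host name, credentials and adapter name are placeholders:

    #!/usr/bin/env python3
    # Rescan a storage adapter on the (nested) witness ESXi host via pyVmomi.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect

    ctx = ssl._create_unverified_context()  # lab only: skip certificate checks
    si = SmartConnect(host="witness.lab.local", user="root", pwd="***",
                      sslContext=ctx)
    try:
        # Connected directly to an ESXi host, the inventory tree holds exactly
        # one datacenter, one compute resource and one host
        host = (si.content.rootFolder.childEntity[0]
                .hostFolder.childEntity[0].host[0])
        # Equivalent to a storage adapter rescan from the UI
        host.configManager.storageSystem.RescanHba("vmhba1")
    finally:
        Disconnect(si)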
Once the appliance was online again, the “Operational health error” disappeared.
However, there were still 7 objects with “Reduced availability with no rebuild”.
After examining these objects, it turned out that their witness components were missing. Fortunately, this was quite easy to fix using the “Repair Object Immediately” option in vSAN health.
It looks like taking snapshots of the vSAN witness appliance not only makes no sense (I can’t think of a single use case for it) but can also cause serious problems in the environment.
There is a configuration parameter that can prevent such accidents from happening: “snapshot.maxSnapshots”.
If it is set to “0” at the VM level, it effectively disables snapshots for that VM, so I would strongly advise setting it on the vSAN witness appliance (a quick sketch follows below).
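The parameter can be set from the vSphere Client (VM > Edit Settings > Advanced Parameters) or programmatically. Below is a minimal pyVmomi sketch; as before, the vCenter address, credentials and the VM name “vsan-witness” are placeholders for your environment:

    #!/usr/bin/env python3
    # Set snapshot.maxSnapshots = 0 on the witness VM to disable snapshots.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab only: skip certificate checks
    si = SmartConnect(host="vcenter.lab.local", user="administrator@vsphere.local",
                      pwd="***", sslContext=ctx)
    try:
        view = si.content.viewManager.CreateContainerView(
            si.content.rootFolder, [vim.VirtualMachine], True)
        witness = next(vm for vm in view.view if vm.name == "vsan-witness")
        # Add the advanced parameter to the VM's extra configuration
        spec = vim.vm.ConfigSpec(extraConfig=[
            vim.option.OptionValue(key="snapshot.maxSnapshots", value="0")])
        WaitForTask(witness.ReconfigVM_Task(spec))
    finally:
        Disconnect(si)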