vSAN VM Write Latency Bug

While doing some performance analysis on a hybrid vSAN cluster recently, I found a bug that causes high VM write latency and a reduced read cache hit rate.

The vSAN environment analysed is a 6-host cluster running ESXi 6.0 U2 (vSAN 6.2) on HP ProLiant DL380 servers. Each host has 2 disk groups, and each disk group has 1 SSD for cache and 3 10,000 rpm SAS disks for capacity. The storage controller is an HP P840. Needless to say, all drivers and firmware were on VMware's HCL, and the vSAN health check and proactive checks all passed.

I noticed very high VM write latency (often 50 ms or higher), especially on VMs with high IOPS. This was odd: the latency on the SSDs themselves was very low (generally less than 1 ms), a lot of write buffer space was available on all disk groups (90% plus), and cache de-staging was very infrequent. I also noticed a lower than expected read cache hit rate.

As vSAN sends all writes to the SSD write buffer, something seemed to be going on after the I/O left the VM but before it arrived at the SSD. One of the main suspects was the network. The Virtual SAN Diagnostics & Troubleshooting Reference Manual (p. 122), for instance, indicates that this kind of latency could be the result of flow control and excessive pause frames. But troubleshooting this with the network team showed that there were no problems with the network. Network drivers on all ESXi hosts were also correct.

I troubleshot this problem with VMware GSS, and it turned out to be a bug: the dedupe scanner runs even though dedupe and compression are turned off (and are not supported on hybrid vSAN). According to GSS, this should be fixed in ESXi 6.0 U3. Until then, the scanner can be disabled manually by running the following command while the ESXi host is in maintenance mode:

esxcfg-advcfg -s 0 /LSOM/lsomComponentDedupScanType

Note that you have to reboot the server for this change to take effect!
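
If you want to check the current value before changing it, the steps above can be wrapped in a small script. This is only a rough sketch, not something GSS supplied: the function names and the ADVCFG override variable are my own, and I am assuming that `esxcfg-advcfg -g` prints a line of the form "Value of <option> is <n>". It does not check maintenance mode for you, and it does not reboot the host.

```shell
#!/bin/sh
# Sketch: read, and if needed disable, the dedupe scanner setting on one host.
# ADVCFG is a hypothetical override so the logic can be dry-run off-host.
ADVCFG="${ADVCFG:-esxcfg-advcfg}"
OPTION="/LSOM/lsomComponentDedupScanType"

# Print the current value of the option.
# Assumes the output of -g ends with the value, e.g. "Value of X is 2".
get_scan_type() {
    out="$("$ADVCFG" -g "$OPTION")" || return 1
    echo "${out##* }"   # keep only the last word of the output
}

# Set the option to 0 unless it is already 0.
# Run this only while the host is in maintenance mode.
disable_scan() {
    current="$(get_scan_type)" || return 1
    if [ "$current" = "0" ]; then
        echo "scanner already disabled"
    else
        "$ADVCFG" -s 0 "$OPTION" || return 1
        echo "scanner disabled; reboot the host to apply"
    fi
}
```

Calling `get_scan_type` on each host shows which ones still have the scanner enabled, and `disable_scan` applies the same command as above only where it is needed.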

This is now recorded in VMware KB 2146267. After implementing the change, I saw a significant drop in VM write latency and an increased read cache hit rate.
