SLOW BOOTING ESXi WHICH HAS RDM MSCS

vQuicky 

> Rebooting a host that has MSCS RDMs presented to it takes a long time

> MSCS clusters share an RDM LUN in physical compatibility mode

> The active VM of the cluster holds a SCSI reservation on the LUN, which slows the hypervisor down when it interrogates the LUN during storage discovery.

> On ESX/ESXi 4.0, the recommendation was to change the advanced option Scsi.UWConflictRetries to 80.

> In ESXi 5.0 & 5.1, a new flag was introduced to allow an administrator to mark a LUN/RDM as ‘perennially reserved’

inDepth

It is not too rare to see Microsoft clustered virtual machines deployed on a VMware infrastructure, with the VMs spread across cabs or even in the same cab but on different hosts. This allows you to run Microsoft clusters which can offer higher failure tolerance.

However, an interesting blog was put out there addressing the well-known slow boot process on a host that has been presented with an RDM used by Microsoft clustering VMs.

A Microsoft cluster that is presented with an RDM has direct access to the raw LUN. Now, in a scenario where you have to reboot a host that is participating in a Microsoft cluster, you will notice slow boot times for the host.

This is because the active node will still hold SCSI reservations on the RDM LUN, which slows the hypervisor down during boot as it tries to interrogate each of these disks during storage discovery. The article explains how to resolve this, because a hypervisor that is down in a Microsoft cluster is not really desirable.

The solution is to set some advanced parameters that basically make the host skip over a disk when it finds a SCSI reservation on it.

On ESX/ESXi 4.0, VMware recommended changing the advanced option Scsi.UWConflictRetries to 80. In ESX/ESXi 4.1, a new advanced option called Scsi.CRTimeoutDuringBoot was introduced (CR is short for Conflict Retries), and the recommendation was to set this value to 1 to speed up the boot process. What these settings did was effectively get the discovery process to move on as quickly as possible once a SCSI reservation was detected.
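As a rough sketch of how those two recommendations could be kept in one place, the small Python helper below builds the matching esxcfg-advcfg command line for each release. The helper name and the dictionary are made up for illustration, and the /Scsi/... option paths assume the usual esxcfg-advcfg path form of the option names above; check them against the KB article before running anything on a host.

```python
# Advanced options named in the text, keyed by ESX/ESXi release.
# The /Scsi/... paths assume the standard esxcfg-advcfg form of
# Scsi.UWConflictRetries and Scsi.CRTimeoutDuringBoot.
BOOT_SETTINGS = {
    "4.0": ("/Scsi/UWConflictRetries", 80),
    "4.1": ("/Scsi/CRTimeoutDuringBoot", 1),
}


def advcfg_command(release):
    """Return the esxcfg-advcfg command line for the given release.

    Hypothetical helper: it only builds the string, it does not run it.
    """
    path, value = BOOT_SETTINGS[release]
    return f"esxcfg-advcfg -s {value} {path}"


if __name__ == "__main__":
    for release in sorted(BOOT_SETTINGS):
        print(release, "->", advcfg_command(release))
```

The printed command is meant to be run in the console of a host on that release; the script itself changes nothing.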

In ESXi 5.0 & 5.1, a new flag was introduced to allow an administrator to mark a LUN/RDM as ‘perennially reserved’. This is an indication to the SCSI mid-layer of the VMkernel not to try to query this device during a ‘discovery’ process.
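To make the flag a bit more concrete, here is a minimal Python sketch that generates the esxcli command for each RDM LUN and checks the output of a device listing for the flag. The function names and the NAA ID in the example are made up for illustration, and the "Is Perennially Reserved" field name is assumed to match what `esxcli storage core device list -d <naa id>` prints on ESXi 5.x; treat it as a starting point rather than a tested procedure.

```python
def perennial_reserve_commands(naa_ids, reserved=True):
    """Build one esxcli setconfig command per RDM LUN (strings only)."""
    flag = "true" if reserved else "false"
    return [
        f"esxcli storage core device setconfig -d {naa} "
        f"--perennially-reserved={flag}"
        for naa in naa_ids
    ]


def is_perennially_reserved(device_list_output):
    """Check 'esxcli storage core device list -d <naa>' output for the flag.

    Assumes the output contains a line such as
    'Is Perennially Reserved: true'.
    """
    for line in device_list_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Is Perennially Reserved":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__":
    # Hypothetical NAA ID; use the IDs of your own MSCS RDM LUNs.
    for cmd in perennial_reserve_commands(["naa.600000000000000000000000000000a1"]):
        print(cmd)
```

The command has to be run for every RDM LUN used by the cluster, on every host that has those LUNs presented to it.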

This allows a host with an MSCS cluster node to boot up faster.

The blog is here and the KB Article is here.
