
SLOW BOOTING ESXi HOSTS WITH MSCS RDMs

vQuicky 

> Hosts running MSCS nodes take a long time to reboot

> MSCS clusters share an RDM LUN in physical compatibility mode

> The active node of the cluster holds a SCSI reservation on the LUN, causing the hypervisor to interrogate the LUN during storage discovery at boot

> On ESXi/ESX 4.0, the recommendation was to change the advanced option Scsi.UWConflictRetries to 80

> In ESXi 5.0 and 5.1, a new flag was introduced to allow an administrator to mark a LUN/RDM as ‘perennially reserved’

inDepth

It is not too rare to see Microsoft clustered virtual machines deployed on a VMware infrastructure, with VMs spread across cabinets or even within the same cabinet but on different hosts. This allows you to run Microsoft clusters that offer higher fault tolerance.

However, an interesting blog post was thrown out there addressing the well-known slow boot process on a host that has been presented with an RDM used by Microsoft clustered VMs.

We all know that a Microsoft cluster node is presented with the RDM and has direct access to the raw LUN. Now, in a scenario where you have to reboot a host that is participating in a Microsoft cluster, you will notice slow boot times on that host.

This is because the active node still holds SCSI reservations on the RDM LUN, which slows the hypervisor down during boot as it tries to interrogate each of these disks during storage discovery. The article explains how to resolve this, because a hypervisor that stays down for a long time is not really desirable in a Microsoft cluster.

The solution is to set some advanced parameters that basically tell the host to skip over a disk when it finds a SCSI reservation.

On ESXi/ESX 4.0, the recommendation was to change the advanced option Scsi.UWConflictRetries to 80. In ESX/ESXi 4.1, a new advanced option called Scsi.CRTimeoutDuringBoot was introduced (CR is short for Conflict Retries), and the recommendation was to set this value to 1 to speed up the boot process. These settings effectively got the discovery process to move on as quickly as possible once a SCSI reservation was detected.
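
If you would rather script the older workaround than click through the UI, here is a minimal pyVmomi sketch. The host name and credentials are made up, and pyVmomi is just my tool of choice here, not something the KB mandates:

    # Minimal pyVmomi sketch: set the Scsi.CRTimeoutDuringBoot advanced option
    # on one host. Host names and credentials below are hypothetical.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host='vcenter.lab.local', user='administrator',
                      pwd='password', sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        host = next(h for h in view.view if h.name == 'esx01.lab.local')
        # Advanced options live under the host's OptionManager.
        opt_mgr = host.configManager.advancedOption
        opt_mgr.UpdateOptions(changedValue=[
            vim.option.OptionValue(key='Scsi.CRTimeoutDuringBoot', value=1)])
    finally:
        Disconnect(si)

The same OptionManager call works for Scsi.UWConflictRetries on 4.0 – just swap the key and the value.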

In ESXi 5.0 & 5.1, a new flag was introduced to allow an administrator to mark a LUN/RDM as ‘perennially reserved’. This is an indication to the SCSI mid-layer of the VMkernel not to try to query this device during the ‘discovery’ process.

This allows a host with an MSCS cluster node to boot up faster.
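
On 5.x the flag itself is set per device with esxcli. Here is a rough sketch of what that can look like when pushed over SSH – the device ID, host name and credentials are hypothetical, SSH has to be enabled on the host, and paramiko is simply one convenient way to run the command remotely:

    # Sketch: flag an RDM as perennially reserved on one host over SSH.
    import paramiko

    DEVICE = 'naa.60a98000572d54724a34655733506751'  # hypothetical RDM identifier

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect('esx01.lab.local', username='root', password='password')
    try:
        # Mark the device as perennially reserved so boot-time discovery skips it.
        _, out, _ = ssh.exec_command(
            'esxcli storage core device setconfig -d %s '
            '--perennially-reserved=true' % DEVICE)
        out.channel.recv_exit_status()  # wait for the command to finish
        # Verify: the output should show "Is Perennially Reserved: true".
        _, out, _ = ssh.exec_command(
            'esxcli storage core device list -d %s' % DEVICE)
        print(out.read().decode())
    finally:
        ssh.close()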

The blog is here and the KB Article is here.

SRM 5.1 SELF PACED TRAINING FOR FREE!

No vQuicky on this one... Thought I would throw this post out real quick. If you have a MyLearn account with VMware, you can access a free three-hour course on SRM. You can easily register for a MyLearn account as well.

The course is really great, covers pretty much everything technical, and also gives you a great introduction to SRM. If you have a decent or good hands-on understanding of the technology, then this course will suffice! No, using Facebook, Twitter or a smartphone doesn’t count as hands-on 🙂

Here is the link – SRM Self Paced Training.

Do comment if you like!


NETAPP RELEASES VSC 4.2 BETA

NetApp yesterday released its Virtual Storage Console (the renamed SMVI plugin) 4.2 beta. There are about 150 bug fixes in it, and granular roles are one of its newer features.

There is also support for VDDK 5.1, which adds support for Windows Server 2012. Some consolidation has been done in the Backup and Recovery section as well.

An excerpt below –

VSC Privileges

The first thing you should go look at is the native Roles pane in your vSphere client after installing VSC 4.2.  You may have seen the few we’ve had in there before, but it has now been massively expanded to include an individual privilege for just about every granular task you could perform within the VSC.

VSC Canned Roles

We’ve also taken the liberty of creating some pre-canned roles for you to use as sample templates.  I’d like to recommend that you only use these as SAMPLE TEMPLATES to clone and customize as you see fit.  If we need to modify these in a future release, we don’t want to screw you up by doing so.  We’re gonna warn you about it, we’re gonna recommend against doing it, but fully expect some people will modify them and run into trouble when upgrading in the future.  We’re hoping to avoid this as much as possible, so please please please do not modify the canned roles themselves, but Clone the roles and then modify those.
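
If you would rather script the clone than do it in the client, something like the pyVmomi sketch below would do it. The vCenter name, credentials and the new role name are made up, and the canned role name is taken from the list a bit further down:

    # Sketch: clone a canned role into a custom role you can safely modify.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host='vcenter.lab.local', user='administrator',
                      pwd='password', sslContext=ssl._create_unverified_context())
    try:
        auth = si.RetrieveContent().authorizationManager
        # Find the canned role by name ('VSC Backup' is one of the canned roles).
        source = next(r for r in auth.roleList if r.name == 'VSC Backup')
        # Create a copy with the same privilege IDs; edit the copy, not the original.
        auth.AddAuthorizationRole(name='Custom VSC Backup',
                                  privIds=list(source.privilege))
    finally:
        Disconnect(si)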

As part of the Canned Roles, one of the things we insisted on doing was identifying which native vSphere privileges were required to perform VSC-specific actions.   This is one of those awesome stories where we collaborated heavily with devs from VMware to really nail this down for YOU.  We’ve taken into account as many scenarios as possible and here is what we’ve delivered…

  • VSC Administrator
  • VSC Read-Only
  • VSC Provision
  • VSC Clone
  • VSC Scan/Migrate
  • VSC Backup
  • VSC Restore

The coolest one of these, to me, is the Read-Only.  If you look at the list of privileges available to any role, you will see a nested privilege called “View.”   This solves several asks, two of which I want to highlight here…

  1. One of the asks we’ve had for some time is the ability to hide the VSC for certain underprivileged Jr. admins, where the seniors did not want certain vSphere users/admins to even see the VSC.  We’ve added this privilege specifically for this task.
  2. Secondly, admins have always wanted a “read-only” view.  Sometimes, the IT Director wants to come in and see what things look like.  Sometimes, you want to be able to have certain screens up on a NOC panel.  Whatever the use case, if you select none of the other VSC privileges, and select ‘View’ by itself, this will enable Read-Only and not allow a user to interact with the product, or execute any of the tasks/workflows within.

VSC Shared Credentials


When VSC 1.0 first came out, Rapid Cloning Utility (RCU) and SMVI were separate products. When they were all baked together around the VSC 2.0 timeframe, they still managed their own controllers independently.   One of the core charters of development for the 4.x cycle, along with clustered Data ONTAP equivalency, was to consolidate all of this into one centrally managed list within Monitoring & Host Configuration.

I’m happy to report that as of 4.2, we have completed this story by consolidating management of controllers used in Backup & Recovery as well.

We have modified the installer to put in a notification screen to note this, and we are adding some notes into the Install & Admin Guide, as well as the Release Notes to let you know what steps you need to take in order for your jobs to continue to run successfully.

The long and the short of it is:  In past releases, you had to Add… storage controllers and credentials in the B&R section if you wanted to enable them for snapshot backup operations. When you go into B&R in 4.2, you will notice that list is now gone, and that the credentials have been consolidated into Monitoring & Host Config.

Upgrades from 4.x to 4.2 will leave your backup jobs intact, but they could potentially fail if they use a special user account to run.  For example, let’s say you were using the account “backupadmin” in Backup & Recovery previously, but the controller(s) were listed in Monitoring & Host Config with “root.”  Whatever account is configured in M&HC will override what is configured in B&R during the upgrade process.  Now that these have been consolidated, your jobs will use “root” to run, as that is what is configured in Monitoring & Host Configuration.  In this particular scenario, the jobs will continue to run, simply because root is root.  But in the case where you might have “DiscoveryOnly” as an account for listing controllers in M&HC, and the account you used previously to run backup jobs would have higher permissions, the jobs would then fail because the “DiscoveryOnly” account does not have enough permissions to execute snapshots on the controllers.

Note: It is always ill-advised to use root, and both NetApp and I always recommend against it.  If you’re a generalist managing both, it might not be a big deal, but I would at least create a named user with the same “root” privileges so that you could at least audit what was doing what to what as time goes on.

All storage controllers, clusters, and user/pass credentials for all functional areas of VSC are now solely managed within Monitoring & Host Configuration.

 
VSC Bug Fixes

VSC 4.2 addresses and fixes more than 150 bugs, issues, and vulnerabilities.  It’s hard to call out any one in particular because they all carry tremendous weight as a full payload, but I wanted to highlight one of the big ticket items we fixed in this release, in an effort to encourage you to upgrade your production environments as soon as possible when the final release of VSC 4.2 is available.

Upgraded to VDDK 5.1 to support W2K12

VDDK 5.1 has support for newer versions of Windows operating systems, including Windows Server 2008 R2 SP1 and Windows Server 2012. P&C and O&M both have a dependency on the VDDK, so they have both been upgraded to the newest version in order for VSC to support the latest Microsoft OSes.

You can download your version here.

Comment if you like! Comment if you have tried it.

References –

Source 1

AUTO DEPLOY GUI IS OUT THANKS TO VMWARE LABS!

A little birdy told me that the Auto Deploy GUI for ESXi 5.1 is out on VMware Labs! I did download the plugin on my desktop but will rush home to try it in my lab. Now maybe it’s time for me to get remote connectivity to my home lab setup.

The Auto Deploy GUI is, obviously, a plugin for vCenter. Some of the listed features are the ability to add/remove Depots, list/create/modify Image Profiles, list VIB details, create/modify rules to map hosts to Image Profiles, check compliance of hosts against these rules, and remediate hosts.

You can download your copy here.

Enjoy!

CREATING A WINDOWS 2008 SNAPSHOT FAILS – SNAPSHOT 0 FAILED: FAILED TO QUIESCE THE VIRTUAL MACHINE. (40)

vQuicky – 

> Creating a Windows 2008 snapshot fails if more than 7 disks are attached to the same SCSI controller

> Occurs when the snapshot is created without snapshotting the virtual machine’s memory while choosing to quiesce the guest VM

> Snapshot fails with the error – Snapshot 0 failed: Failed to quiesce the virtual machine. (40)

> Issue occurs because a quiesced snapshot requires one available SCSI slot per disk

> Fix is to either power down the VM and spread the disks across other SCSI controllers OR create another thin disk on SCSI1:0 and then create the snapshot

inDepth – 

I ran into this the other day and tried it in my home lab as well. Turns out a Windows 2008 VM snapshot fails when you choose to quiesce the guest file system but uncheck ‘Snapshot the virtual machine’s memory’.

The error was listed as

Snapshot fails with the error – Snapshot 0 failed: Failed to quiesce the virtual machine. (40)

The KB article showed what the issue was: it occurs when the SCSI controller presented to the Windows VM has more than 7 disks. When a quiesced snapshot is taken, one free SCSI slot is required per disk, so having more than 7 disks on a single controller leaves too few empty slots and the snapshot fails.
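
If you want to check whether a VM is in the danger zone before taking a quiesced snapshot, a quick pyVmomi sketch like this one counts the disks per SCSI controller. The VM name and credentials are hypothetical:

    # Sketch: count disks per SCSI controller for one VM.
    import ssl
    from collections import Counter
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host='vcenter.lab.local', user='administrator',
                      pwd='password', sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        vm = next(v for v in view.view if v.name == 'win2008-sql01')

        devices = vm.config.hardware.device
        controllers = {d.key: d for d in devices
                       if isinstance(d, vim.vm.device.VirtualSCSIController)}
        disk_count = Counter(d.controllerKey for d in devices
                             if isinstance(d, vim.vm.device.VirtualDisk))

        for key, ctrl in controllers.items():
            count = disk_count.get(key, 0)
            flag = ' <-- more than 7, quiesced snapshots may fail' if count > 7 else ''
            print('SCSI%d: %d disk(s)%s' % (ctrl.busNumber, count, flag))
    finally:
        Disconnect(si)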

The KB article listed two workarounds: either power off the VM and move the disks over to another SCSI controller, OR create another disk and attach it to a secondary SCSI controller.

You can change the SCSI controller as shown in the pic below.

scsi-controller-change

 

Link to the KB article is here.

COLLECTING VMKERNEL DUMP FILES IN ESXi 5.X

I just thought I would do a quick write-up on collecting VMkernel dump files in ESXi 5.x. The basics are, as always, running vm-support to grab the dump file!

I will copy the notes from the KB article, which are actually self-explanatory.

During startup of an ESXi 5.x host, the startup script /usr/lib/vmware/vmksummary/log-bootstop.sh checks the defined Dump Partition for new contents. If new content is found, an entry is written to the /var/log/vmksummary.log file citing “bootstop: Core dump found“.

You can collect logs from an ESXi host either by running vm-support at the command line or by using Export Diagnostic Data from the vSphere Client. Both methods invoke the vm-support script, which checks the defined Dump Partition for new contents. If new content is found, it is temporarily placed in a vmkernel-zdump file in /var/core/ before being compressed into the vm-support output.

Since the vmkernel-zdump-* coredump file is copied from the Dump Partition while running vm-support, it is not necessary to run vm-support a second time to collect the logs. If vm-support is run multiple times, only the first attempt includes a vmkernel-zdump file.

Note: The directory /var/core/ is often located on a ramdisk, so the vmkernel-zdump files placed within may not persist across a reboot.
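
If you just want to peek at a host before pulling a full vm-support bundle, a small sketch like the one below checks for that log entry and lists any extracted coredumps. SSH must be enabled on the host, the host name and credentials are hypothetical, and paramiko is just one way to run the commands remotely:

    # Sketch: look for the "bootstop: Core dump found" entry and any
    # vmkernel-zdump files already sitting in /var/core/.
    import paramiko

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect('esx01.lab.local', username='root', password='password')
    try:
        # Did a boot find new content on the dump partition?
        _, out, _ = ssh.exec_command(
            "grep 'bootstop: Core dump found' /var/log/vmksummary.log")
        print(out.read().decode() or 'No core dump entries logged.')

        # Any coredumps already extracted to /var/core (ramdisk, may not persist)?
        _, out, _ = ssh.exec_command('ls -lh /var/core/vmkernel-zdump* 2>/dev/null')
        print(out.read().decode() or 'No vmkernel-zdump files in /var/core/.')
    finally:
        ssh.close()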

 

Hope this helps!

BACK TO BASICS – CAN’T VMOTION WITH AN INTRANET SWITCH :)

vQuicky – 

> Tried to vMotion a VM to another hypervisor; the VM’s NIC was connected to a vSwitch with no uplinks

> Had the same port group on the other hypervisor as well, but the vMotion failed

> “Unable to migrate from <source server> to <destination server>: Currently connected network interface ‘<device>’ uses network ‘<network>’, which is a ‘virtual intranet'” was the error

> The KB article says that to override this, we need to change the following setting –

navigate to Administration > vCenter Server Settings > Advanced Settings and add config.migrate.test.CompatibleNetworks.VMOnVirtualIntranet with a value of “false”.

inDepth – 

So it turns out I forgot some basics about vMotioning VMs from one hypervisor to another. The KB article put the sanity back in me. I was trying to vMotion a VM from one hypervisor to another. Not a problem, right? But the VM had a NIC connected to a vSwitch with no uplinks.

VMware was throwing an error about a ‘virtual intranet’! Turns out the virtual machine was connected to a network that is internal only, which means the virtual switch has no outbound adapters attached.

The fix is either to attach an uplink adapter, or to power down the VM and then move it. Or you can set the advanced setting below to false.

The setting can be changed by navigating to Administration > vCenter Server Settings > Advanced Settings and adding config.migrate.test.CompatibleNetworks.VMOnVirtualIntranet with a value of “false”.
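
If you prefer to script it, the same key can be pushed through vCenter’s OptionManager – a pyVmomi sketch only, with a made-up vCenter name and credentials:

    # Sketch: add the advanced setting to vCenter instead of using the dialog.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host='vcenter.lab.local', user='administrator',
                      pwd='password', sslContext=ssl._create_unverified_context())
    try:
        option = vim.option.OptionValue(
            key='config.migrate.test.CompatibleNetworks.VMOnVirtualIntranet',
            value='false')
        # content.setting is the vCenter Server OptionManager, the same store
        # the Advanced Settings dialog writes to.
        si.RetrieveContent().setting.UpdateOptions(changedValue=[option])
    finally:
        Disconnect(si)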

Here is the kb link.

VSPHERE WEB CLIENT TAKES A MOMENT TO COME ONLINE!

Wanted to throw in a quick note about the vSphere Web Client.

In my home lab I deployed the vCenter appliance. When trying to access the Web Client, I got a 404 Not Found error.

The service was down, so I started it. I tried accessing the Web Client after the service was up but still did not get a response.

Turns out it takes a couple of minutes before the Web Client comes online. No real reason is given in the KB article. This also only occurs on the vCenter appliance.

You can find the kb article here.

FALSE ALARM – 2TB THICK DISK SHOWS UP AS 0.00 BYTES!

vQuicky

> VMware recently released a KB article about a bug that shows a 2TB thick-provisioned disk as 0 bytes in the datastore browser. The disk shows as 0.00 bytes even over SSH and on the console.

> The 2TB thick disk provisioned to the virtual machine shows as 0.00 bytes.

> All files are accessible and the disk is accessible as well. All features, such as Storage vMotion, are still supported.

> The issue is caused by a 32-bit addressing limitation of the du command and the datastore browser – and a VMFS5 disk of 2TB ends up being 2,181,972,430,848 bytes.

> The workaround is to either convert the disk to thin via Storage vMotion OR present the disk at 1.95–1.99 TB instead of a full 2TB.

inDepth

Turns out VMware has some bugs it needs to take care of. The most recently reported one is the disk space reporting on a 2TB thick-provisioned disk presented to a virtual machine. It turns out that the du command on the shell and the datastore browser cannot report sizes beyond their 32-bit addressing limitation, and a thick 2TB disk on VMFS5 ends up being 2,181,972,430,848 bytes.

This errors out, and the datastore browser and the du command over SSH report the disk as 0.00 bytes, throwing YOU into panic mode.

Apparently this does not affect any data on the disk, nor will it prevent us from performing any operations on the disk such as Storage vMotion. This ends up being more of a reporting issue, but it could throw end users who have read-only access to their VMs over the vSphere Web Client into panic mode.

Obviously, a quick fix would be to NOT present a disk as a full 2TB and instead present it as, say, 1.97TB. Another alternative would be to provision the disk in thin format – but that can add management risk and overhead. My suggestion would be to use a thick disk of 1.95 TB.
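
If you want to know up front which VMs will show this behaviour, a small pyVmomi report like the one below walks the inventory and flags thick disks close to 2TB. The vCenter name and credentials are made up, and the “near the 2TB boundary” threshold is my own arbitrary cut-off:

    # Sketch: list thick-provisioned virtual disks at or near the 2TB limit.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    TWO_TB_KB = 2 * 1024 * 1024 * 1024      # 2TB expressed in KB (capacityInKB)
    NEAR_LIMIT_KB = int(TWO_TB_KB * 0.97)   # flag anything in the top ~3%

    si = SmartConnect(host='vcenter.lab.local', user='administrator',
                      pwd='password', sslContext=ssl._create_unverified_context())
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        for vm in view.view:
            if not vm.config:
                continue
            for dev in vm.config.hardware.device:
                if not isinstance(dev, vim.vm.device.VirtualDisk):
                    continue
                thin = getattr(dev.backing, 'thinProvisioned', False)
                if not thin and dev.capacityInKB >= NEAR_LIMIT_KB:
                    print('%s: %s is thick and %.2f TB' % (
                        vm.name, dev.deviceInfo.label,
                        dev.capacityInKB / (1024.0 ** 3)))
    finally:
        Disconnect(si)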

Also remember, although the datastore can be as large as 64TB, the disk presented to the VM can only be 2TB at maximum. You cannot have one single 64TB disk presented to the VM.

Here is the KB article in case you want to read.

Do comment 🙂

DEPLOYING OVF/OVA USING VSPHERE WEB-CLIENT FAILS

vcenter web client ova/ovf deployment error

vQuicky

> Found out that deploying an OVF/OVA file using the vSphere Web Client fails

> Deployment errors out while selecting storage – “No Datastores Found on Target” and/or “cURL error: Couldn’t connect to server”

> The file can be deployed using the vSphere Windows client

> VMware is aware of the issue, and the only workaround is to use the Windows client

inDepth

When deploying an OVF/OVA file using the vSphere Web Client, the process fails while selecting the datastore. You see a “No Datastores Found on Target” error. This is a bug, so don’t panic 🙂

The workaround is to use the standalone vSphere client to deploy the OVA/OVF files.

If you want to check out the KB article, which talks about the datastore error but not about the cURL error, click here.