Category Archives: Virtualization

Everything about Virtualization


Ran into a KB article about the catalog status in vCloud Director being shown as “Unknown”.

In vCloud Director 5.5.0, the status of a catalog is based on a task stored in the database.
This issue occurs when the catalog has existed for longer than vCloud Director's Activity Log retention period; the task may then have been deleted from the database, resulting in a catalog status of UNKNOWN.
I haven’t run into this issue personally, but if you have, the VMware article states that it does not affect the catalog itself.
Apparently there is no fix for this, and the only resolution is to extend the Activity Log to 365 days. Don’t forget the extra space that is needed when you increase the number of days to keep the log.
1. Log in as a System Administrator.
2. Go to the Administration section.
3. Navigate to General > Activity Log.
4. Update the number of days to keep to 365.
Here is the KB article.


VMware has released an express patch for its Site Recovery Manager that addresses the OpenSSL vulnerability, so yes, this is a critical patch!

The KB for this is here.


The push with cloud and any application today is to scale out. Scaling out allows high loads to be distributed, with multiple nodes fulfilling requests, and it also makes high availability possible. In scale-out architectures, much of the intelligence is built into the application, because it is the application that must be able to handle a request across multiple nodes.

However, I read up on how Stack Overflow is able to serve 560 million page views a month on just 25 servers with a scale-up rather than scale-out approach, and the stats are very impressive!

Here is an excerpt of the stats.


  • StackExchange network has 110 sites growing at a rate of 3 or 4 a month.

  • 4 million users

  • 8 million questions

  • 40 million answers

  • As a network, the #54 site for traffic in the world

  • 100% year over year growth

  • 560 million pageviews a month

  • Peak is more like 2600-3000 requests/sec on most weekdays. Programming, being a profession, means weekdays are significantly busier than weekends.

  • 25 servers

  • 2 TB of SQL data all stored on SSDs

  • Each web server has 2x 320GB SSDs in a RAID 1.

  • Each ElasticSearch box has 300 GB also using SSDs.

  • Stack Overflow has a 40:60 read-write ratio.

  • DB servers average 10% CPU utilization

  • 11 web servers, using IIS

  • 2 load balancers, 1 active, using HAProxy

  • 4 active database nodes, using MS SQL

  • 3 application servers implementing the tag engine; anything searching by tag hits these

  • 3 machines doing search with ElasticSearch

  • 2 machines for distributed cache and messaging using Redis

  • 2 Networks (each a Nexus 5596 + Fabric Extenders)

  • 2 Cisco 5525-X ASAs (think Firewall)

  • 2 Cisco 3945 Routers

  • 2 read-only SQL Servers used mainly for the Stack Exchange API

  • VMs also perform functions like deployments, domain controllers, monitoring, ops database for sysadmin goodies, etc.
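As a quick sanity check on the numbers above, here is a back-of-the-envelope calculation (a sketch, assuming an even 30-day month) relating the monthly pageview figure to the quoted peak request rate:

```shell
#!/bin/sh
# 560 million pageviews averaged over a 30-day month, in whole pageviews/sec.
avg=$((560000000 / (30 * 24 * 3600)))
echo "average pageviews/sec: $avg"   # prints 216
```

An average of roughly 216 pageviews/sec against a stated weekday peak of 2600-3000 requests/sec shows just how bursty the load is (and requests include more than rendered pages).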

You can read more here.

Here’s an interesting video on their architecture by their software developer Marco Cecconi. What I did learn is that we should scale up before scaling out, and of course this all depends on the workloads and use cases we are dealing with.


I know I am late to the party, but yes, there are many tools that let you check what your I/O performance on the hypervisor looks like; I particularly like this fling. Recently I ran into a ton of storage issues while using vCloud Director to spin up more than 10 virtual machines. Random I/O isn’t great, especially when you are using a VMAX as back-end storage – what made it worse was that it was a shared array. Separating the VMs onto their own disk spindles might have helped, but without VASA enabled we weren’t getting anywhere.

This fling deploys as a virtual appliance and has an interface that automates storage performance testing and reports back with “graphical results”!

Here is an excerpt –

I/O Analyzer can use Iometer to generate synthetic I/O loads or a trace replay tool to deploy real application workloads. It uses the VMware VI SDK to remotely collect storage performance statistics from VMware ESX/ESXi hosts. Standardizing load generation and statistics collection allows users and VMware engineers to have a high level of confidence in the data collected.

It also has a full-blown installation guide, which makes this one awesome fling.

Check out this cool video – it covers version 1.5, while version 1.6 is already out.

Check out the fling here.


vQuicky – 

> Large packet loss occurring in guest OS due to traffic bursts.

> Fix is to increase the receive and transmit buffer space within the guest operating system itself.

> Issue has been seen in ESXi 4.x and ESXi 5.x

inDepth – 

I haven’t experienced this myself, but VMware came out with a KB article stating that on ESXi 4.x and 5.x you may see significant packet loss on virtual machines configured with VMXNET3 vNICs. The issue occurs only during high traffic bursts.

For example, an 8-to-5 server or a file dump server may experience large bursts of traffic at set times; users all logging in at 8 AM can cause a high traffic burst. VMware has confirmed that the heavy packet loss is due to a lack of receive and transmit buffer space.

The fix for the issue is to increase the number of buffers in the guest operating system itself. VMware’s KB article lists the steps below.

To reduce burst-traffic drops in Windows 2008 R2, adjust the buffer settings:
  1. Click Start > Control Panel > Device Manager.
  2. Right-click vmxnet3 and click Properties.
  3. Click the Advanced tab.
  4. Click Small Rx Buffers and increase the value. The default value is 512 and the maximum is 8192.
  5. Click Rx Ring #1 Size and increase the value. The default value is 1024 and the maximum is 4096.
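The steps above are for a Windows guest. As a side note (not from the KB), on a Linux guest the vmxnet3 driver exposes equivalent ring-size tuning through ethtool. A minimal sketch, assuming the vNIC is eth0 and ethtool is installed – adjust to your environment:

```shell
#!/bin/sh
# Sketch: inspect and raise the VMXNET3 RX ring on a Linux guest.
# The interface name (eth0) is an assumption; pass your own as $1.
tune_vmxnet3_rx() {
  IF="${1:-eth0}"
  ethtool -g "$IF"                 # show current and maximum ring sizes
  ethtool -S "$IF" | grep -i drop  # driver drop counters, before tuning
  ethtool -G "$IF" rx 4096         # raise the RX ring to absorb bursts
}
```

Re-check the drop counters under load after the change to confirm the bigger ring actually helped.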

There is more info in the KB article here.

Hope this helps if you have had customers calling in complaining of low performance 🙂



vQuicky –

> Browse button fails when trying to upload media to vCloud Director.

> Issue seems to affect vCloud Director 5.5.

> Fix is to lower the Java security level and whitelist the vCD URL.

inDepth – 

I have been using vCloud Director but seldom upload media to my organization catalog. However, if you do so and see the Browse button fail, don’t be alarmed, as this is a known issue with the Java version you are running.

The fix is simple –

  1. Click Start  > All Programs > Java > Configure Java to open the Java Control Panel.
  2. Click the Security tab.
  3. Change the Security Level option to medium.
  4. Click the Edit Site List button and add the URL address of the vCloud Director web interface.
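As a side note – an assumption based on how recent Java releases store this setting, not something the KB spells out – the Edit Site List button writes to a plain-text exception.sites file, one URL per line, under the deployment security directory (on Windows, %USERPROFILE%\AppData\LocalLow\Sun\Java\Deployment\security\). The vCD address below is a placeholder:

```
https://vcd.example.com
```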

More on the kb article here.


vQuicky –

> RecoverPoint SRA installation fails with the error “SRA command ‘discoveryArrays’ failed. Failed opening session for user to site mgmt IP.”

> To fix the issue, use https instead of http by running the following commands on the SRM server where the SRAs are installed.

cd "c:\Program Files\VMware\VMware vCenter Site Recovery Manager"

"c:\Program Files\VMware\VMware vCenter Site Recovery Manager\external\perl-5.14.2\bin\perl.exe" --useHttps true

inDepth – 

I ran into a pretty good article by Duco where he mentions an issue with RecoverPoint 4.1 and SRA 4.2. It seems the SRA install fails with the discoveryArrays issue. There seem to be no KB articles addressing this; however, the fix is simple: use https to connect to the RecoverPoint cluster rather than http.

Just run the commands listed above on your SRM server where the SRA is installed and you should be good to go.

Here is the original link.


With the release of vSphere 5.5, a slew of releases took place, and one of them is VMware Site Recovery Manager 5.5. Below is a list of what’s new in SRM 5.5.

1. Storage DRS and Storage vMotion are now supported by SRM for protected virtual machines. With Array-Based Replication, SRM can support storage migration of virtual machines only if the datastores are all part of the same consistency group on the array. There are no checks made by SRM yet to verify that disks are part of the same consistency group, so you are still on the hook to make sure those disks belong to the same consistency group on the array. A good way of doing this is to create a pool of disks that belong to the same consistency group and protect your virtual machines to this pool.

2. After a failover has occurred, you can now choose a point in time to revert to, i.e. multiple point-in-time failover is now available. This helps greatly if, for instance, you had to fail over and ended up with the virtual machine not working at the DR site for some reason; all you have to do then is browse through the snapshot manager of the VM and pick another point in time to restore to. You can retain up to 24 historical points in time.

3. Few know that vSphere Replication is very much part of SRM, and with 5.5 you can now deploy multiple vSphere Replication appliances per vCenter Server. This is an upgrade from what we had in 5.1, i.e. one appliance per vCenter Server.

Read More …


I rarely use TweetDeck, but today I did and I am glad. I saw William Lam’s tweet about how one of the engineers showed him how to snapshot the state of a physical ESX(i) host so that any changes can be reverted.

Now, if it were a nested ESX(i) host (a VM running as an ESXi host), you could simply snapshot it. But this is how you do it for a physical ESX(i) host – possibly in your lab.

“It turns out you could “snapshot” a physical or even virtual ESXi host by just backing up the state.tgz file and then restoring it. As the name suggests, the state.tgz file contains all the configurations of your ESXi host. The process is pretty straightforward:

  1. SCP /bootbank/state.tgz and back that up to your local system or shared storage
  2. Perform your tests or make changes to the system
  3. When you are ready to restore, copy the state.tgz back into /bootbank folder
  4. Login to ESXi Shell and run reboot -f which will ensure no changes are saved to our state.tgz

Once the ESXi host reboots, it will use the restored state.tgz file and your system will be back in its original state. This process is actually not new; ESXi already provides a way to backup/restore.”
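The quoted steps can be sketched as a couple of shell functions. The host name and backup path here are assumptions for a lab setup, not anything from the original post:

```shell
#!/bin/sh
# Hypothetical lab ESXi host and local backup location; adjust to taste.
ESXI=root@esxi01.lab.local
BACKUP=./state.tgz.bak

backup_state() {
  # Step 1: copy the host's configuration state off the box.
  scp "$ESXI:/bootbank/state.tgz" "$BACKUP"
}

restore_state() {
  # Step 3: put the saved state back into /bootbank.
  scp "$BACKUP" "$ESXI:/bootbank/state.tgz"
  # Step 4: reboot -f skips saving the running config, so the
  # restored state.tgz survives the reboot.
  ssh "$ESXI" reboot -f
}
```

Run backup_state before your tests and restore_state when you want the host back the way it was.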

Here is the original post.



No vQuicky note on this one – just fun to read so read on 🙂

I was working in my lab on a new deep-dive blog post, and guess what I ran into and was actually glad/surprised to notice – VUM is now in the vSphere Web Client 5.5!

I recently upgraded my home lab to vSphere and vCenter 5.5, along with VMware Update Manager. It has been running great ever since, and my discovery brought a smile to my face. Not that I keep my hosts up to date all the time, but it was more about the convenience of not having to switch back and forth between the Windows client and the Web Client – now that I am getting used to using it.


Above you see the VMware vSphere Update Manager plugin installed and enabled. You can disable it, which is the only action allowed, as expected.

And on the Summary page for the host, an Update Manager status shows up.

Read More …