Okay, is it just me, or does SRM 5.1 have too many known and unresolved issues? Granted, an SRM update is out that solves a few of them, but these still remain.

Hopefully VMware is taking note and getting these taken care of!

The following known issues have been discovered through rigorous testing and will help you understand some behavior you might encounter in this release.

    • SRM Might Encounter Errors Mounting Datastores During Recoveries 

During a test recovery or actual failover, SRM waits for recovered datastores to become available. After datastores become available, SRM attempts to mount any datastores that are not mounted. In rare instances, these datastores are automatically mounted before SRM can mount them. If this occurs during a test failover, the failover does not complete. If this occurs during an actual recovery, the recovery completes with an error. To resolve this issue, retry the recovery.

    • Temporary Loss of vCenter Server Connections Might Create Recovery Problems for Virtual Machines with Raw Disk Mappings

If the connection to the vCenter Server is lost during a recovery, one of the following might occur:

      • If the vCenter Server remains unavailable, the recovery fails. To resolve this issue, re-establish the connection with the vCenter Server and re-run the recovery.
      • In rare cases, the vCenter Server becomes available again and the virtual machine is recovered. In such a case, if the virtual machine has raw disk mappings (RDMs), the RDMs might not be mapped properly. As a result, you might be unable to power on the virtual machine, or the guest operating system or the applications running on it might report errors.
        • If this is a test recovery, complete a cleanup operation and run the test again.
        • If this is an actual recovery, you must manually attach the correct RDM to the recovered virtual machine.

      Refer to the vSphere documentation about editing virtual machine settings for more information on adding raw disk mappings.

    • Cancellation of Recovery Plan Not Completed

When a recovery plan is run, an attempt is made to synchronize virtual machines. It is possible to cancel the recovery plan, but attempts to cancel the recovery plan run do not complete until the synchronization either completes or expires. The default expiration is 60 minutes. The following options can be used to complete cancellation of the recovery plan:

  • Pause vSphere Replication, causing synchronization to fail. After recovery enters an error state, use the vSphere Client to restart vSphere Replication in the vSphere Replication tab. After replication is restarted, the recovery plan can be run again, if desired.
  • Wait for synchronization to complete or time out. This might take considerable time, but does eventually finish. After synchronization finishes or expires, cancellation of the recovery plan continues.


    • Valid Certificates Produce Warnings

When uploading and installing certificates to the vSphere Replication appliance, the following error appears: The certificate installed with warnings. Remote VRM systems with the 'Accept only SSL certificate signed by a trusted CA' option enabled might be unable to connect to this site for the following reason: The certificate was not issued for use with the given hostname: VRM hostname

You can ignore this error, or avoid it by using a supported browser other than Internet Explorer.

    • Non-ASCII Passwords Not Accepted For Log In To Virtual Appliance Management Infrastructure (VAMI)

Users can manage the vSphere Replication appliance using VAMI. Attempts to log in to VAMI with an account whose password contains non-ASCII characters fail, even when the correct credentials are provided. To avoid this issue, use ASCII-only passwords or connect using SSH.
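Because VAMI rejects any password that contains non-ASCII characters, it can help to check a candidate password before you set it on the appliance. A minimal Python sketch, assuming you only want to flag the offending characters; the function names are hypothetical and not part of any VMware tool:

```python
def non_ascii_chars(password: str) -> list[str]:
    """Return the characters in the password that fall outside 7-bit ASCII."""
    # str.isascii() (Python 3.7+) is True only for code points 0-127.
    return [ch for ch in password if not ch.isascii()]


def vami_safe(password: str) -> bool:
    """True if the password avoids the non-ASCII VAMI login issue."""
    return password.isascii()


if __name__ == "__main__":
    for candidate in ("Secr3t!Pass", "Pässwörd1"):
        print(candidate, vami_safe(candidate), non_ascii_chars(candidate))
```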

    • Outdated Replication Status Displayed if Datastore Becomes Unavailable

It is possible that after virtual machine synchronization begins, the target datastore becomes unavailable. In such a case, the group status should display information about this failure, but the status remains unchanged. To identify issues related to datastore unavailability, use the events generated by the target datastore. The following events are generated in such a case:

  • Datastore is not accessible for VR Server...: generated immediately after the datastore becomes inaccessible.
  • Virtual machine vSphere Replication RPO is violated...: posted when a replica cannot be created within the specified RPO.
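If you already pull vCenter Server event messages into a script (how you fetch them is outside the scope of these notes), a prefix match on the two messages listed above is enough to flag the condition that the group status misses. A sketch, assuming the events arrive as plain message strings; the sample messages are illustrative and built only from the quoted prefixes:

```python
# Prefixes of the two event messages quoted in the known-issues text.
DATASTORE_EVENTS = {
    "Datastore is not accessible for VR Server": "datastore inaccessible",
    "Virtual machine vSphere Replication RPO is violated": "RPO violated",
}


def classify(messages):
    """Return (label, message) pairs for messages that match a known prefix."""
    hits = []
    for msg in messages:
        for prefix, label in DATASTORE_EVENTS.items():
            if msg.startswith(prefix):
                hits.append((label, msg))
                break  # one label per message is enough
    return hits


if __name__ == "__main__":
    sample = [
        "Datastore is not accessible for VR Server ...",
        "User logged in",
        "Virtual machine vSphere Replication RPO is violated ...",
    ]
    for label, msg in classify(sample):
        print(f"{label}: {msg}")
```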


    • Stopping Datastore Replication for Protected Virtual Machines Produces Incorrect Error Messages

It is possible to protect a virtual machine that has disks on multiple datastores and later disable replication for one of the datastores. In such a case, the virtual machine’s status in the protection group changes to Invalid: Virtual machine 'VM' is no longer protected. Internal error: Cannot create locator for disk '2001'... This information is incorrect. The status should change to Datastore '[datastore name]' is no longer replicated.

    • Virtual Machine Recovery Fails Due to Disk Configuration Error

It is possible to place different disks and configuration files for a single protected virtual machine on multiple datastores. During recovery, SRM must have access to raw disk mapping and parent disk files. Without this access, SRM cannot determine disk types during recovery. In such a case, SRM might assume that a Raw Disk Mapping (RDM) disk is a non-RDM disk, resulting in a failed reconfiguration. To avoid this issue, ensure all hosts that can access recovered virtual machine configuration files can also access RDM mapping files and any parent disks, if such disks exist.

    • Pairing Sites Fails Due to Different Certificate Trust Methods

When pairing SRM sites, the error Local and Remote servers are using different certificate trust methods appears. This occurs when the root certificate for the Certificate Authority (CA) signing the certificate is missing on SRM Server. To resolve this issue, install the root certificate for the SRM certificate’s signing Certificate Authority using Microsoft Management Console. After installing the certificate, perform an SRM installation Modify operation to provide the user-generated certificate again.

    • Recovery Fails to Progress After Connection to Protected Site Fails

If the protection site becomes unreachable during a deactivate operation or during RemoteOnlineSync or RemotePostReprotectCleanup, both of which occur during reprotect, then the recovery plan might fail to progress. In such a case, the system waits for the virtual machines or groups that were part of the protection site to complete those interrupted tasks. If this issue occurs during a reprotect operation, you must reconnect the original protection site and then cancel and restart the recovery plan. If this issue occurs during a recovery, it is sufficient to cancel and restart the recovery plan.

    • vSphere Replication Appliance Fails to Support Valid ESX Hosts

During vSphere Replication configuration, when a datastore is being selected on a supported version of ESX, the message VR server Server Name has no hosts through which to access destination datastore ... appears. This occurs when adding a new host to vCenter Server, or during registration of the vSphere Replication server, if there is a temporary interruption of communication between the vSphere Replication appliance and the vSphere Replication server. Communication problems typically arise from a temporary loss of connectivity or from the server services being stopped. To resolve this issue, restart the vSphere Replication management server service.

  1. Log in to the virtual appliance management interface (VAMI) of the vSphere Replication appliance at https://vr_appliance_address:5480.
  2. Click Configuration > Restart under Service Status.


    • Datastores Fail to Unmount When on Distributed Power Management (DPM) Enabled Clusters

Planned migrations and disaster recoveries fail to unmount datastores from hosts that are attached to a DPM cluster if the host enters standby mode. The error Error: Cannot unmount datastore datastorename from host hostname. Unable to communicate with the remote host, since it is disconnected might appear. To resolve this issue, turn off DPM at the protected site before completing planned migrations or disaster recoveries. You can choose to turn DPM back on after completing recovery tasks.

    • vSphere Replication Servers Deployed With an Unspecified Network Configuration Malfunction

vSphere Replication servers are deployed from an OVF file using the OVF deployment wizard. The deployment wizard includes a page for specifying the vSphere Replication server’s network configuration. If no network settings are specified for the network configuration, DHCP addressing is used, but vSphere Replication servers do not support DHCP addressing. To avoid this issue, specify valid network settings for the vSphere Replication server during deployment.

    • Generic Error Message Is Displayed When Server Pairing Fails Due to Certificate Policy Strictness

Attempts to pair servers between sites might fail, displaying the following error message: Site pairing or break operation failed. Details: VRM Server generic error. This error might occur when one site is configured to use a strict certificate policy and the other site is configured to use a lenient certificate policy. In such a case the pairing is expected to fail, but the generic message does not say why. After such a failure, change the lenient site to use the strict certificate policy and provide a valid certificate.

    • Including a percent (%) symbol in a folder name on the recovery site creates a new folder during replication.

If you include a percent (%) symbol in the folder name on the recovery site and try to configure replication to that folder, the replication might be created in an incorrect folder with additional encoding. For example, if you create the folder %3dTest, vSphere Replication creates a new folder %253dTest and places the replication in this folder.
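The renamed folder in this example is exactly what you get when a string that already looks percent-encoded is encoded a second time: the literal % becomes %25. A short illustration with Python's standard urllib.parse module; it only demonstrates the encoding arithmetic and is an assumption about the mechanism, not vSphere Replication's actual code:

```python
from urllib.parse import quote, unquote

folder = "%3dTest"               # folder name created on the recovery site

encoded = quote(folder)          # '%' is not a safe character, so it becomes '%25'
print(encoded)                   # %253dTest, the name vSphere Replication creates

# Decoding once gives back the name you typed; decoding the original name
# treats '%3d' as an escape sequence for '='.
print(unquote(encoded))          # %3dTest
print(unquote(folder))           # =Test
```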

    • Context-sensitive help is not accessible in Internet Explorer 7
    • SRM fails to recover virtual machines after RDM failures.

Raw Disk Mapping (RDM) LUNs might fail while LUNs that back datastores are unaffected. In such a case, SRM cannot recover virtual machines with RDMs.

Workaround: Recover affected virtual machines manually. Fail over the RDM LUN and reattach it as an RDM disk on the recovered virtual machine.

    • vSphere Replication appliance status is Disconnected when running the SRM client plug-in on Windows XP or Windows 2003.

The status of the vSphere Replication appliance shows as Disconnected in the Summary tab for a vSphere Replication site. Attempting to reconfigure the connection results in the error Lost connection to local VRMS server at server_address:8043. (The client could not send a complete request to the server 'server_address'. (The underlying connection was closed: An unexpected error occurred on a send.)). This problem occurs because the SRM client plug-in and vSphere Client cannot negotiate cryptography when the SRM client plug-in runs on older versions of Windows. If you run the desktop version of vSphere Client and the SRM client plug-in on Windows XP 64-bit or Windows Server 2003 SP2, you might encounter incompatibilities between server and client cryptography support.

Workaround: Download and install the Microsoft hotfix from Microsoft KB 948963. This hotfix is not included in regular Windows updates, so you must download and apply it manually.

    • Recovery takes a long time to finish and reprotect fails with error: Cannot check login credentials. Authentication service infrastructure failed.

This error occurs due to the exhaustion of ephemeral ports in vCenter Server running on Windows Server 2003. As a result, the SRM Server cannot communicate with vCenter Server.

Workaround:

  1. Install the Microsoft hotfix from KB 979230 to fix a problem in the tcpip.sys driver.
  2. Set the following registry values, either by making the changes manually in regedit or by importing the following .reg file:

        Windows Registry Editor Version 5.00

        [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
        "MaxUserPort"=dword:00002710
        "TcpTimedWaitDelay"=dword:0000001E

  3. If the registry values do not exist, create them.
  4. Restart the Windows 2003 Server machine after making the changes.
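The .reg file stores both values as hexadecimal DWORDs. Decoding them shows what the workaround actually sets: MaxUserPort = 0x2710 = 10000 and TcpTimedWaitDelay = 0x1E = 30 (seconds, per Microsoft's documentation for that value). A small Python sketch that decodes the values and regenerates equivalent .reg text; the helper names are illustrative, and the script only builds a string rather than touching the registry:

```python
REG_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Values from the workaround, written the way a .reg file encodes them.
VALUES = {
    "MaxUserPort": "dword:00002710",
    "TcpTimedWaitDelay": "dword:0000001E",
}


def decode_dword(reg_value: str) -> int:
    """Turn a .reg 'dword:XXXXXXXX' string into its integer value."""
    prefix = "dword:"
    if not reg_value.startswith(prefix):
        raise ValueError(f"not a dword value: {reg_value!r}")
    return int(reg_value[len(prefix):], 16)


def encode_dword(value: int) -> str:
    """Inverse of decode_dword: eight uppercase hex digits, as in the .reg file above."""
    return f"dword:{value:08X}"


def build_reg_file(values: dict[str, int]) -> str:
    """Assemble .reg text with Windows-style line endings."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{REG_KEY}]"]
    lines += [f'"{name}"={encode_dword(v)}' for name, v in values.items()]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    decoded = {name: decode_dword(v) for name, v in VALUES.items()}
    print(decoded)   # {'MaxUserPort': 10000, 'TcpTimedWaitDelay': 30}
    print(build_reg_file(decoded))
```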


    • Error in recovery plan when shutting down protected virtual machines: Error - Operation timed out: 900 seconds during Shutdown VMs at Protected Site step.

If you use SRM to protect datastores on arrays that support dynamic swap, for example Clariion, running a disaster recovery when the protected site is partially down, or running a forced recovery, can lead to errors when you re-run the recovery plan to complete protected site operations. One such error occurs when the protected site comes back online but SRM is unable to shut down the protected virtual machines. This error usually occurs when certain arrays make the protected LUNs read-only, leaving ESXi unable to complete I/O for powered-on protected virtual machines.

Workaround: Reboot the ESXi hosts on the protected site that are affected by read-only LUNs.

    • Generating support bundles in a heavily loaded environment might disrupt ongoing vSphere Replication operations.

Generating support bundles in heavily loaded environments can cause vSphere Replication connection problems during recovery operations. This specifically occurs if the storage for the vSphere Replication virtual machine is overloaded.

Workaround: If an operation fails to start because the vSphere Replication server is blocked by support bundle generation, rerun the operation. Re-evaluate the expected storage bandwidth requirements of the cluster, as well as the network bandwidth if the storage is NAS.

    • Rerunning reprotect fails with error: Protection Group '{protectionGroupName}' has protected VMs with placeholders which need to be repaired.

If a ReloadFromPath operation does not succeed during the first reprotect, the corresponding protected virtual machines enter a repairNeeded state. When SRM then runs a reprotect on the protection group, it can neither repair the protected virtual machines nor restore the placeholder virtual machines. The error therefore appears when the first reprotect operation fails for a virtual machine because the corresponding ReloadFromPath operation failed.

Workaround: Rerun reprotect with the force cleanup option enabled. This option completes the reprotect operation and enables the Recreate placeholder option. Click Recreate placeholder to repair the protected virtual machines and to restore the placeholder virtual machines.

    • Protect virtual machine task appears to remain at 100%.

The VI Client Recent Tasks pane shows a virtual machine stuck at 100% during the Protect VM task. SRM marks the virtual machine as Configured, indicating that it was protected. You do not need to take action as SRM successfully protected the virtual machine.

    • Cleanup fails if attempted within 10 minutes after restarting recovery site ESXi hosts from maintenance mode.

The cleanup operation attempts to swap placeholders and relies on the host resiliency cache, which has a 10-minute refresh period. If you attempt a swap operation on ESXi hosts that were restarted within that 10-minute window, SRM has not yet updated the information in its host resiliency cache, so the swap operation fails and the cleanup operation fails with it.

Workaround: Wait 10 minutes and attempt cleanup again.
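The timing rule behind this workaround can be computed ahead of time: a cleanup attempted less than 10 minutes after the recovery site hosts were restarted can hit stale cache data. A sketch, assuming you record the restart time yourself; the 10-minute figure is the refresh period stated above, and the helper names are hypothetical:

```python
from datetime import datetime, timedelta

# Refresh period of the SRM host resiliency cache, as stated in the known issue.
CACHE_REFRESH = timedelta(minutes=10)


def earliest_cleanup(hosts_restarted_at: datetime) -> datetime:
    """Earliest time at which a cleanup retry falls outside the 10-minute window."""
    return hosts_restarted_at + CACHE_REFRESH


def cleanup_is_safe(hosts_restarted_at: datetime, now: datetime) -> bool:
    """True once the cache has had a full refresh period since the restart."""
    return now >= earliest_cleanup(hosts_restarted_at)


if __name__ == "__main__":
    restarted = datetime(2013, 1, 15, 9, 0)
    print(earliest_cleanup(restarted))                               # 2013-01-15 09:10:00
    print(cleanup_is_safe(restarted, datetime(2013, 1, 15, 9, 4)))   # False
```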

    • SRM stops during an attempt to protect an already reprotected array-based virtual machine using vSphere Replication.

If you run a recovery and then try to use vSphere Replication to protect a virtual machine that is already protected by an array-based protection group, SRM Server asserts.

Workaround: Restart SRM Server, and unprotect the array-based protected virtual machine before protecting it with vSphere Replication. Alternatively, continue with array-based protection and do not protect the virtual machine with vSphere Replication. SRM does not support protecting a virtual machine with both providers.

    • Reprotect fails with error: Operation timed out: 3600 seconds VR synchronization failed for VRM group <Unavailable>. Operation timed out: 3600 seconds.

When you run reprotect, SRM performs an online sync for the replication group, which might cause the operation to time out. The default timeout value is 2 hours.

Workaround: Increase the timeout value in Advanced Settings in SRM.

    • Cannot configure a virtual machine with physical mode RDM disk even if the disk is excluded from replication.

If you configure replication for a virtual machine with a physical mode RDM disk, you might see the following error: VRM Server generic error. Check the documentation for any troubleshooting information. The detailed exception is: HMS can not set disk UUID for disks of VM : MoRef: type = VirtualMachine, value = , serverGuid = null'.

Workaround: None.

    • Planned migration fails with Error: Unable to copy the configuration file...

If there are two ESXi hosts in a cluster and one host loses connectivity to the storage, the other host can usually recover replicated virtual machines. In some cases the other host might not recover the virtual machines, and recovery fails with the following error: Error: Unable to copy the configuration file...

Workaround: Rerun recovery.

    • While reprotecting a virtual machine, the following error might occur during the “Configure protection to reverse direction” step: Error - The operation was only partially completed for the protection group 'pg_name' since a protected VM belonging to it was not successful in completing the operation. VM 'vm_name' is not replicated by VR.

This error occurs during the second reprotect run if the first run failed with an Operation timed out error during the “Configure storage to reverse direction” step.

Workaround: Manually configure reverse replication for the affected virtual machines and rerun reprotect. For information on reverse replication, see vSphere Replication Administration: Failback of Virtual Machines in vSphere Replication.

    • vSphere Replication cannot access datastores through hosts with multiple management virtual NICs and posts DatastoreInaccessibleEvent in vCenter Server: vSphere Replication cannot access datastore.

If a host is configured with multiple virtual NICs and you select more than one NIC for management traffic, vSphere Replication registers only the first NIC and uses it to access target datastores. If the vSphere Replication server address is not on the first management network of the host, vSphere Replication does not communicate with the host.

Workaround: Use a host with a single virtual NIC selected for management traffic for datastores at the secondary site. You can also reconfigure the host networking so that the address of the first management virtual NIC is from a network that vSphere Replication can access.

    • A virtual machine cannot power off due to a pending question error.

If you create a permanent device loss (PDL) situation, accidentally or deliberately, by dropping an initiator from the SAN to the host where the virtual machine is registered, you might see the following error: Error: The operation cannot be allowed at the current time because the VM has a question pending...

This error occurs if hardware fails on the recovery site during PDL while you run a cleanup after running a recovery plan in test recovery mode.

Workaround: Answer the question in the virtual machine Summary tab. Then rerun cleanup in force cleanup mode. After the cleanup operation completes, the virtual machine might still exist on the recovery site; if so, remove it manually.

    • SRM version 5.0 can communicate with upgraded SRM Server version 5.1 while running recovery.

If you upgrade the recovery site from version 5.0 to version 5.1 and attempt a disaster recovery on the upgraded site, SRM Server version 5.0 on the protected site and SRM Server version 5.1 on the recovery site can communicate with each other and can perform operations on the protected site. If you run a reprotect operation before you upgrade the protected site, the operation runs for a very long time without making any progress.

Before running a recovery on an upgraded site, stop all SRM 5.0 services that are still running on the remote site. Otherwise, SRM Servers with incompatible versions can still communicate with each other.

    • Internal error occurs during recovery.

SRM retrieves various information from vCenter during the recovery process. If it does not receive critical information required to proceed, an internal error CannotFetchVcObjectProperty can occur. This error might occur when vCenter is under heavy stress or an ESXi host becomes unavailable due to heavy stress. It might also occur when SRM tries to look up information about an ESXi host that is in a disconnected state or has been removed from the vCenter inventory.

Workaround: Rerun the recovery plan.

    • Virtual machine VNIC’s MAC address is usually preserved during recovery.

In very rare circumstances, a test or recovery might fail to recover a specific virtual machine because vCenter unexpectedly assigns a new MAC address to the virtual machine’s VNIC on the recovery site. The error message in the result column of the recovery steps is: Error - Cannot complete customization, possibly due to a scripting runtime error or invalid script parameters (Error code: 255). IP settings might have been partially applied. The SRM logs contain the message Error finding the specified NIC for MAC address = xx:xx:xx:xx:xx:xx, where xx:xx:xx:xx:xx:xx is the expected MAC address.

Workaround: In the vSphere Client virtual machine Properties, manually change the affected virtual machine’s MAC address to “xx:xx:xx:xx:xx:xx” and restart the recovery plan.
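To find the MAC address to restore, you can extract it from the SRM log message quoted above. A Python sketch, assuming each log line contains that message followed by a standard six-octet, colon-separated MAC address; the sample log lines and address are made up for illustration:

```python
import re

# Matches the SRM log message quoted above and captures a six-octet MAC address.
MAC_PATTERN = re.compile(
    r"Error finding the specified NIC for MAC address = "
    r"((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})"
)


def expected_macs(log_lines):
    """Yield every expected MAC address reported in the given log lines."""
    for line in log_lines:
        match = MAC_PATTERN.search(line)
        if match:
            yield match.group(1)


if __name__ == "__main__":
    sample = [
        "... Error finding the specified NIC for MAC address = 00:50:56:AB:CD:EF",
        "... some unrelated log line",
    ]
    print(list(expected_macs(sample)))   # ['00:50:56:AB:CD:EF']
```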

    • vSphere Replication reports “Datastore is not accessible” for datastores at a host added to vCenter Server inventory while registering vSphere Replication server.

vSphere Replication selects all supported hosts from the vCenter inventory and enables them as part of vSphere Replication registration. If you add a host to vCenter while vSphere Replication is still being registered, vSphere Replication does not select this host and cannot access datastores on the recovery site through it.

Workaround: Disconnect and reconnect the host in the vCenter inventory so that vSphere Replication enables it.

    • Synchronize virtual machine, recovery, or reprotect operations fail with vSphere Replication generic error: The requested instance with Id=<...> was not found on the remote site.

Although the operation reports failure, vSphere Replication successfully synchronizes the virtual machine state to the remote site. This error can occur when you request a synchronize operation or when you run main operations, such as recovery or reprotect, that use this operation.

Workaround: Rerun the failed operation.

    • Recovered VMFS volume fails to mount with error: Failed to recover datastore.

This error might occur due to latency between vCenter Server, ESXi, and SRM Server.

Workaround: Rerun the recovery plan.

    • vSphere Replication server registration might take a long time depending on the number of hosts in the vCenter Server inventory.

If the vCenter Server inventory contains a few hundred or more hosts, the Register VR server task takes an hour or more to complete, as vSphere Replication updates each host’s SSL thumbprint registry. The vCenter Server Events pane displays Host is configured for vSphere Replication for each host as the registration task progresses.

Workaround: Wait for the registration task to complete. After it finishes, you can use vSphere Replication for incoming replication traffic.

    • vSphere Replication registration might fail with error: VRM server generic error ... Row was updated or deleted by another transaction ... HostEntity #<host-managed-object-id>.

The Register VR server operation might fail with this error if vCenter Server has a large number of hosts in its inventory and you perform the following actions while registration is in progress:

  • Remove a host from the vCenter Server inventory.
  • Remove and reconnect a host from the inventory.
  • Change the host’s SSL thumbprint.

Workaround: Retry the Register VR server operation.

    • Test recovery, planned migration, or reprotect workflow operations might fail with error: Operation timed out.

This error can occur when running multiple operations with multiple primary sites.

Workaround: Re-run the failed operation.

    • A recovery or test workflow fails for a virtual machine with the following message: Error - Unexpected error '3008' when communicating with ESX or guest VM: Cannot connect to the virtual machine.

Under rare circumstances this error might occur when you configure IP customization or an in-guest callout for the virtual machine and the recovery site cluster is in fully automated DRS mode. An unexpected vMotion might cause a temporary communication failure with the virtual machine, resulting in the customization script error.

Workaround: Rerun the recovery plan. If the error persists, set the recovery site cluster DRS to manual mode and rerun the recovery plan.

    • Some SRM-initiated tasks that fail with a NoPermission error display Internal Error: vim.fault.NoPermission instead of Permission to perform this operation was denied.

The vSphere Client asserts if a mirrored task contains a MoRef to an object that is not a vCenter Server or SRM object.

Workaround: If the failed SRM task is a recovery task, consult the recovery task pane for a more specific error. For a vCenter Server task failure, check the subtasks, which contain more information.

    • Reprotect operation for multiple virtual machines targeting multiple remote sites fails with Unable to reverse replication for the virtual machine vm_name. Operation timed out.

vSphere Replication stops responding to SRM requests when reprotecting multiple virtual machines to multiple remote sites.

Workaround: Change several vSphere Replication parameters:

      1. Stop the vSphere Replication management server: /etc/init.d/hms stop
      2. Edit /opt/vmware/hms/conf/hms-configuration.xml and change hms-db-max-connections from 99 to 500.
      3. Edit /var/lib/vrmsdb/postgresql.conf and change max_connections from 100 to 501.
      4. Restart the embedded vPostgres database:
         /etc/init.d/hms-vpostgres stop
         /etc/init.d/hms-vpostgres start
      5. Change the hms-vlsi-server thread pool size. Open a psql session with
         /opt/vmware/vpostgres/1.0/bin/psql -U vrmsdb vrmsdb
         and run:
         update ConfigEntryEntity set configValue='250' where configKey = 'hms-vlsi-server-threadpool-size';
      6. Increase the heap for the vSphere Replication management server process: edit /etc/init.d/hms and add -Xmx1536M to JAVA_TOOL_OPTIONS.
      7. Start vSphere Replication management server: /etc/init.d/hms start
      8. Rerun the failed operation.
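Step 3 is a plain text edit, so it can be scripted if you apply it to several appliances. A Python sketch that rewrites one uncommented key = value setting in postgresql.conf-style text, shown for max_connections; it works on a string so you can review the result before writing it back to /var/lib/vrmsdb/postgresql.conf. It assumes the setting appears as an uncommented max_connections line, and it does not cover the XML change to hms-db-max-connections in step 2:

```python
import re


def set_conf_value(conf_text: str, key: str, value: str) -> str:
    """Replace the value of an uncommented `key = value` line.

    Raises ValueError if the key is not present, rather than silently
    appending a setting that might conflict with the rest of the file.
    """
    pattern = re.compile(rf"^(\s*{re.escape(key)}\s*=\s*)(\S+)(.*)$", re.MULTILINE)
    # Keep the original spacing and any trailing comment (groups 1 and 3).
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{value}{m.group(3)}", conf_text)
    if count == 0:
        raise ValueError(f"{key} not found")
    return new_text


if __name__ == "__main__":
    sample = "listen_addresses = '*'\nmax_connections = 100\t# (change requires restart)\n"
    print(set_conf_value(sample, "max_connections", "501"))
```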


    • Last Sync Size value for a virtual machine protected by vSphere Replication is the amount of data that has changed since the last synchronization.

Even if you perform a full synchronization on a virtual machine that vSphere Replication protects, the Last Sync Size value shows the amount of data that has changed since the last synchronization, not the size of the full virtual machine. This can be misinterpreted as meaning that the synchronization was not complete. After the initial synchronization, a full synchronization of a virtual machine compares entire disks but transfers only the data that has changed, not the entire disk.

To see the size and duration of the initial synchronization, check the Events that vSphere Replication posts to vCenter Server. This issue only occurs on ESXi 5.0.x hosts. This behavior has been clarified on ESXi 5.1 hosts.
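The difference between comparing and transferring is easier to see with a toy model. The Python sketch below compares two in-memory "disks" block by block and sends only the blocks that differ, so the reported transfer size reflects changed data rather than disk size. It is a conceptual illustration with an arbitrary 4-byte block size, not how vSphere Replication is implemented:

```python
BLOCK_SIZE = 4  # bytes; tiny so the example is easy to follow


def full_sync(source: bytes, replica: bytearray) -> int:
    """Compare every block, copy only the ones that differ, return bytes sent."""
    sent = 0
    for offset in range(0, len(source), BLOCK_SIZE):
        block = source[offset:offset + BLOCK_SIZE]
        if replica[offset:offset + BLOCK_SIZE] != block:
            replica[offset:offset + BLOCK_SIZE] = block
            sent += len(block)
    return sent


if __name__ == "__main__":
    disk = b"AAAABBBBCCCCDDDD"          # 16-byte "disk"
    replica = bytearray(disk)           # in sync after the initial copy
    disk = b"AAAAXXXXCCCCDDDD"          # one block changes
    print(full_sync(disk, replica))     # 4: only the changed block is sent
    print(bytes(replica) == disk)       # True
```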

    • Recovery or test recovery might fail with the error "No host with hardware version '7' and datastore 'ds_id' which are powered on and not in maintenance mode are available..." when the host inventory has changed very recently.

SRM Server keeps a cache of the host inventory state. When there are recent changes to the inventory, for example if a host becomes inaccessible, is disconnected, or loses its connection to some of the datastores, SRM Server can require up to 15 minutes to update its cache. If the cache still holds an outdated host inventory state, a recovery or test recovery might fail.

Workaround: If you have made changes to the host inventory, wait 15 minutes before running a recovery. If you observe the error above, wait 15 minutes and then re-run the recovery.

    • Reprotect fails with an error message that contains Unable to communicate with the remote host, since it is disconnected.

This error might occur because the protected site cluster is configured to use Distributed Power Management (DPM) and one of the ESX hosts required for the operation was put into standby mode. This can happen if DPM detects that the host is idle. SRM needs to communicate with the host in order to access the replicated datastore that the host manages. SRM does not manage the DPM state on the protected site; it does, however, manage the DPM state during recovery, test, and cleanup on the recovery site.

Workaround: If the error persists, temporarily turn off DPM and ensure that the ESX hosts managing the replicated datastores on the protected site are powered on before attempting to run reprotect.

    • Test recovery cleanup might fail if one of the hosts loses connection to a placeholder datastore.

If you ran a test recovery on a cluster with two hosts on a recovery site and one of the hosts in the cluster loses connection to a placeholder datastore, cleanup of the test recovery might fail.

Workaround: Run cleanup in force mode. On the recovery site, manually remove placeholder virtual machines created on the host that lost connection to the placeholder datastore. Remove the virtual machine replication configuration and reconfigure the replication. Reconfigure virtual machine protection from protection group properties.

    • Reprotect fails with an error when running multiple recovery plans concurrently.

When running multiple recovery plans concurrently, reprotect can fail with the error Error - The operation was only partially completed for the protection group 'protection_group' since a protected VM belonging to it was not successful in completing the operation.

Workaround: Run the reprotect operation again.

    • After restarting vCenter Server, when using vSphere Replication, reprotect operations fail with Error - Unable to reverse replication for the virtual machine 'virtual_machine'. The session is not authenticated.

After vCenter Server restarts, it fails to refresh some sessions that SRM uses to communicate with vSphere Replication, which causes reprotect to fail.

Workaround: Restart the SRM services on both sites.

    • When protection site LUNs encounter All Paths Down (APD) or Permanent Device Loss (PDL), SRM might not recover raw disk mapping (RDM) LUNs in certain cases.

During the first attempt at planned migration, you might see the following error message when SRM attempts to shut down the protected virtual machine: Error - The operation cannot be allowed at the current time because the virtual machine has a question pending: 'msg.hbacommon.askonpermanentdeviceloss:The storage backing virtual disk VM1-1.vmdk has permanent device loss. You might be able to hot remove this virtual device from the virtual machine and continue after clicking Retry. Click Cancel to terminate this session.'

If the protected virtual machines have RDM devices, in some cases SRM does not recover the RDM LUN.

Workaround:
      1. When LUNs enter APD/PDL, ESXi Server marks all corresponding virtual machines with a question that blocks virtual machine operations.
        1. In the case of PDL, click Cancel to power off the virtual machine.
        2. In the case of APD, click Retry.

        If you run a planned migration while this question is still pending, SRM fails to power off the production virtual machines.

      2. If the virtual machines have RDM devices, SRM might lose track of the RDM device and not recover it. Rescan all HBAs and make sure that the status for all of the affected LUNs has returned from the APD/PDL state.
      3. Check the vCenter Server inventory and answer the PDL question that is blocking the virtual machine.
      4. If you answer the PDL question before the LUNs come back online, SRM Server on the protected site incorrectly detects that the RDM device is no longer attached to this virtual machine and removes the RDM device. The next time you run a recovery, SRM does not recover this LUN.
      5. Rescan all HBAs to make sure that all LUNs are online in vCenter Server inventory and power on all affected virtual machines. vCenter Server associates the lost RDMs with protected virtual machines.
      6. Check the Array Managers tab in the SRM interface. If any of the protected datastores or RDM devices are not displayed, click Refresh to discover the devices and recompute the datastore groups.
      7. Make sure that Edit Group Settings shows all of the protected datastores and RDM devices and that the virtual machine protection status does not show any errors.
      8. Start a planned migration to recover all protected LUNs, including the RDM devices.
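The ordering in these steps is the part that is easy to get wrong: the question answer depends on whether the LUN is in APD or PDL, and answering the PDL question before every LUN is back online is what makes SRM drop the RDM. The sketch below is one reading of steps 1 to 4 written as plain Python logic. It does not talk to vCenter Server; the LunState labels and both function names are hypothetical, made up for illustration.

```python
from enum import Enum
from typing import Mapping


class LunState(Enum):
    """Hypothetical labels for the state of a LUN backing a protected VM."""
    ONLINE = "online"
    APD = "all paths down"
    PDL = "permanent device loss"


def button_for_pending_question(state: LunState) -> str:
    """Which button step 1 of the workaround says to click."""
    if state is LunState.PDL:
        return "Cancel"  # powers off the virtual machine
    if state is LunState.APD:
        return "Retry"
    raise ValueError("an online LUN should not leave a question pending")


def safe_to_answer_pdl_question(luns: Mapping[str, LunState]) -> bool:
    """Steps 2-4: answer only once every affected LUN is back online;
    otherwise SRM on the protected site may remove the RDM device."""
    return all(state is LunState.ONLINE for state in luns.values())


if __name__ == "__main__":
    luns = {"rdm-lun-1": LunState.ONLINE, "rdm-lun-2": LunState.PDL}
    print(safe_to_answer_pdl_question(luns))  # False: rescan all HBAs first
```

In practice the "state" input is whatever you see after rescanning all HBAs in step 2; the sketch only encodes the rule that the rescan has to come first.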


  • Recovery fails with Error creating test bubble image for group ... 

The detailed exception is Error while getting host mounts for datastore:managed-object-id... or The object has already been deleted or has not been completely created.

If a test recovery or planned recovery fails with this exception, the LUN used for storing replication data was temporarily disconnected from ESXi. When it reconnects, replication continues as normal and no replication data is lost. The exception occurs in these scenarios:

  • vSphere Replication cannot locate the LUN because the LUN has changed its internal ID.
  • The target datastore internal ID changes when the host containing the target datastore is removed from vCenter inventory and later added.

You must manually reconfigure the replication to refresh the new ID.

Workaround: If the primary site is no longer available, contact VMware Support for instructions about adding a special configuration entry in the vSphere Replication appliance database that triggers an automatic fix of the changed internal datastore ID to allow recovery. If the primary site is still available:

  1. Run a cleanup operation on the recovery plan that failed.
  2. In the Virtual Machines tab of the vSphere Replication view, right-click a virtual machine and select Configure Replication.
  3. Click Next, then click Browse to change the location of the files on the datastore that was disconnected and reconnected. Select the same datastore and folder locations as before.
  4. Reuse the existing disks and reconfigure the replication of the virtual machine. The vSphere Replication management server picks up the changed datastore identity (managed object ID) in vCenter Server.
  5. Wait for the initial sync to finish. This sync uses existing disks and checks for data consistency.
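The root cause here is a datastore whose name stays the same while its managed object ID changes. A minimal sketch of spotting that, assuming you have recorded a name-to-ID mapping (IDs in the usual vCenter form such as "datastore-101") when replication was configured and can gather the same mapping again now. The function name and the sample datastore names are hypothetical; collecting the mappings from vCenter Server is left out.

```python
from typing import Dict, List


def datastores_with_changed_ids(recorded: Dict[str, str],
                                current: Dict[str, str]) -> List[str]:
    """Return names of datastores whose managed object ID differs from the
    ID recorded when replication was configured."""
    changed = []
    for name, old_id in recorded.items():
        new_id = current.get(name)
        # Present but with a new ID is the case this known issue describes;
        # a datastore that is missing entirely is a different problem.
        if new_id is not None and new_id != old_id:
            changed.append(name)
    return sorted(changed)


if __name__ == "__main__":
    recorded = {"repl-ds-01": "datastore-101", "repl-ds-02": "datastore-102"}
    current = {"repl-ds-01": "datastore-205", "repl-ds-02": "datastore-102"}
    print(datastores_with_changed_ids(recorded, current))  # ['repl-ds-01']
```

Virtual machines replicating to any datastore in the returned list are the ones that would need Configure Replication run again with their existing disks.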


  • Running the SRM installer in Modify mode from the command line with the CUSTOM_SETUP option results in an error.

If you installed SRM by using the CUSTOM_SETUP option, for example to create a shared recovery site setup, attempting to run the SRM installer in Modify mode from the command line with the CUSTOM_SETUP option results in the error CUSTOM_SETUP command line not supported when standard installation already exists.

Workaround: Use the Windows Control Panel to start the SRM installer in Modify mode.

  • SRM stops unexpectedly during planned migration if ESXi Server is disconnected from vCenter Server on the protected site.

If the ESXi Server on the protected site is disconnected from vCenter Server, or loses its connection to vCenter Server because of a problem, SRM stops unexpectedly when you attempt to perform a planned migration. The planned migration fails with an error.

Workaround: Reconnect the ESXi Server.
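Because this one only bites when you are already mid-migration, a pre-flight check on the protected site is a natural habit. A minimal sketch, assuming pyVmomi vim.HostSystem objects, whose runtime.connectionState reads "connected", "disconnected" or "notResponding" and which offer ReconnectHost_Task(). The two helper names are hypothetical, and the demo uses stand-in objects so it runs without a vCenter Server.

```python
from types import SimpleNamespace


def not_connected(hosts):
    """Names of hosts whose runtime.connectionState is not 'connected'.

    `hosts` is assumed to hold pyVmomi vim.HostSystem objects, or anything
    exposing .name and .runtime.connectionState the same way.
    """
    return [h.name for h in hosts if h.runtime.connectionState != "connected"]


def reconnect_before_migration(hosts):
    """Start ReconnectHost_Task() on every host that is not connected and
    return the task objects so the caller can wait on them."""
    return [h.ReconnectHost_Task() for h in hosts
            if h.runtime.connectionState != "connected"]


if __name__ == "__main__":
    # Stand-ins for HostSystem objects so the sketch runs without vCenter.
    esx01 = SimpleNamespace(name="esx01",
                            runtime=SimpleNamespace(connectionState="connected"),
                            ReconnectHost_Task=lambda: "task-esx01")
    esx02 = SimpleNamespace(name="esx02",
                            runtime=SimpleNamespace(connectionState="disconnected"),
                            ReconnectHost_Task=lambda: "task-esx02")
    print(not_connected([esx01, esx02]))               # ['esx02']
    print(reconnect_before_migration([esx01, esx02]))  # ['task-esx02']
```

Only start the planned migration once not_connected() comes back empty and the reconnect tasks have finished.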

Original Link here.
