What to check after running the system recovery

Several tasks must be completed before you use the system.

The recovery procedure re-creates the old system from the quorum data. However, some things cannot be restored, such as cached data or the system data that manages in-flight I/O. The loss of this in-flight state affects RAID arrays that manage internal storage. The detailed map of where data is out of synchronization has been lost, so all parity information must be restored and mirrored pairs must be brought back into synchronization. Normally this results in old or stale data being used, so only writes that were in flight are affected. However, if the array had already lost redundancy (for example, it was syncing or had a degraded or critical RAID status) before the error that required system recovery, the situation is more severe. In this situation, check the internal storage:
  • Parity arrays will likely be syncing to restore parity; they do not have redundancy while this operation is in progress.
  • Because there is no redundancy in this process, bad blocks might have been created where data is not accessible.
  • Parity arrays could be marked as corrupt. This indicates that the extent of lost data is wider than in-flight I/O, and in order to bring the array online, the data loss must be acknowledged.
  • RAID-6 arrays that were actually degraded prior to the system recovery might require a full restore from backup. For this reason, it is important to have at least a capacity match spare available.
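The array checks above can be sketched as a small filter over a delimited array listing. This is a minimal illustration, not a documented procedure: the sample text, the column names (`mdisk_name`, `raid_status`, and so on), and the status values are assumptions made for the example, so adapt them to the listing that your system actually produces.

```python
import csv
import io

# Hypothetical, colon-delimited array listing. The column names and the
# status values are assumptions for illustration only.
SAMPLE_ARRAY_LISTING = """\
mdisk_id:mdisk_name:status:raid_status:raid_level:redundancy
0:array0:online:online:raid6:2
1:array1:online:syncing:raid5:0
2:array2:offline:corrupt:raid5:0
3:array3:online:degraded:raid6:1
"""


def arrays_needing_attention(listing, delimiter=":"):
    """Return (name, raid_status) for each array whose RAID status is not 'online'."""
    reader = csv.DictReader(io.StringIO(listing), delimiter=delimiter)
    return [
        (row["mdisk_name"], row["raid_status"])
        for row in reader
        if row["raid_status"] != "online"
    ]


if __name__ == "__main__":
    for name, raid_status in arrays_needing_attention(SAMPLE_ARRAY_LISTING):
        print(f"{name}: raid_status={raid_status}")
```

Any array that this filter reports is a candidate for the checks in the list above, for example waiting for a sync to finish or acknowledging data loss on an array that is marked as corrupt.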
Be aware of these differences regarding the recovered configuration:
  • FlashCopy mappings are restored as "idle_or_copied" with 0% progress. Both volumes must have been restored to their original I/O groups.
  • The management ID is different. Any scripts or associated programs that refer to the system-management ID of the clustered system (system) must be changed.
  • Any FlashCopy mappings that were not in the "idle_or_copied" state with 100% progress at the point of disaster have inconsistent data on their target disks. These mappings must be restarted.
  • Intersystem remote copy partnerships and relationships are not restored and must be re-created manually.
  • Consistency groups are not restored and must be re-created manually.
  • Intrasystem remote copy relationships are restored if all dependencies were successfully restored to their original I/O groups.
  • If hardware was replaced before the recovery, the SSL certificate might not be restored. If it is not restored, then a new self-signed certificate is generated with a validity of 30 days. Follow the associated Directed Maintenance Procedures (DMP) for a permanent resolution.
  • The system time zone might not have been restored.
  • Any Global Mirror secondary volumes on the recovered system might have inconsistent data if there was replication I/O from the primary volume cached on the secondary system at the point of the disaster. A full synchronization is required when recreating and restarting these remote copy relationships.
  • Immediately after the T3 recovery process runs, compressed disks do not know the correct value of their used capacity. The disks initially set the capacity as the entire real capacity. When I/O resumes, the capacity is shrunk down to the correct value.

    Similar behavior occurs when you use the -autoexpand option on vdisks. The real capacity of a disk might increase slightly, caused by the same kind of behavior that affects compressed vdisks. Again, the capacity shrinks down as I/O to the disk is resumed.
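The rule for FlashCopy mappings that must be restarted can be expressed as a simple filter. The sketch below assumes that you kept a record of each mapping's state from before the disaster; the `MappingState` type, the mapping names, and the sample values are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingState:
    """Pre-disaster state of one FlashCopy mapping (hypothetical record type)."""
    name: str
    status: str    # for example "idle_or_copied", "copying", or "stopped"
    progress: int  # percent complete, 0-100


def mappings_to_restart(pre_disaster):
    """Return names of mappings that were not idle_or_copied at 100% progress.

    According to the text above, the targets of these mappings hold
    inconsistent data, so the mappings must be restarted.
    """
    return [
        m.name
        for m in pre_disaster
        if not (m.status == "idle_or_copied" and m.progress == 100)
    ]


if __name__ == "__main__":
    # Hypothetical sample data.
    recorded = [
        MappingState("fcmap0", "idle_or_copied", 100),
        MappingState("fcmap1", "copying", 42),
        MappingState("fcmap2", "stopped", 100),
    ]
    print(mappings_to_restart(recorded))
```

Because every mapping is restored as "idle_or_copied" with 0% progress, the restored state alone cannot tell you which targets are inconsistent; the filter depends on state that was recorded before the recovery.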

Before using the volumes, complete the following tasks:
  • Start the host systems.
  • If the hosts do not rescan for devices automatically, trigger a rescan manually. You can complete this task by disconnecting and reconnecting the Fibre Channel cables to each host bus adapter (HBA) port.
  • Verify that all mapped volumes can be accessed by the hosts.
  • Run file system consistency checks.
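One way to verify that all mapped volumes are accessible is to compare the volume identifiers that you expect each host to see with the identifiers that the host actually reports after its rescan. The following is a minimal sketch under that assumption; how you collect the two lists depends on the host operating system, and the identifiers shown are invented for the example.

```python
def _normalize(uid):
    """Normalize an identifier so that case and stray whitespace do not matter."""
    return uid.strip().lower()


def missing_volumes(expected_uids, visible_uids):
    """Return the expected volume identifiers that the host does not report."""
    visible = {_normalize(uid) for uid in visible_uids}
    expected = {_normalize(uid) for uid in expected_uids}
    return sorted(expected - visible)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    expected = ["6005076801AA0000000000000000001", "6005076801AA0000000000000000002"]
    visible = ["6005076801aa0000000000000000001"]
    for uid in missing_volumes(expected, visible):
        print(f"not visible on host: {uid}")
```

An empty result means that every expected volume is visible; it does not replace the file system consistency checks.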
For Virtual Volumes (VVols), complete the following tasks.

For Virtual Volumes (VVols), also be aware of the following information.

FlashCopy mappings are not restored for VVols. The implications are as follows.
  • The mappings that describe the VM's snapshot relationships are lost. However, the Virtual Volumes that are associated with these snapshots still exist, and the snapshots might still appear on the vSphere Web Client. This outcome might have implications for your VMware backup solution.
    • Do not attempt to revert to snapshots.
    • Use the vSphere Web Client to delete any snapshots for VMs on a VVol data store to free up disk space that is being used unnecessarily.
  • The targets of any outstanding 'clone' FlashCopy relationships might not function as expected (even if the vSphere Web Client recently reported clone operations as complete). For any VMs that are targets of recent clone operations, complete the following tasks.
    • Perform data integrity checks as is recommended for conventional volumes.
    • If clones do not function as expected or show signs of corrupted data, take a fresh clone of the source VM to ensure that data integrity is maintained.