Several tasks must be completed before
you use the system.
The recovery procedure recreates the
old system from the quorum data. However, some things cannot
be restored, such as cached data or system data managing in-flight
I/O. This latter loss of state affects RAID arrays managing internal
storage. The detailed map of where data is out of synchronization
has been lost, meaning that all parity information must be restored,
and mirrored pairs must be brought back into synchronization. Normally
this results in old, stale data being used, so only writes that were
in flight are affected. However, if the array had lost redundancy
(such as a syncing, degraded, or critical RAID status) prior to the
error requiring system recovery, then the situation is more severe.
In this situation, you need to check the internal storage:
- Parity arrays will likely be syncing to restore parity; they do
not have redundancy while this operation proceeds.
- Because there is no redundancy in this process, bad blocks might
have been created where data is not accessible.
- Parity arrays could be marked as corrupt. This indicates that
the extent of lost data is wider than in-flight I/O, and in order
to bring the array online, the data loss must be acknowledged.
- RAID-6 arrays that were degraded
prior to the system recovery might require a full restore from backup.
For this reason, it is important to have at least a capacity match
spare available.
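The internal-storage check above can be sketched as a small filter. This is a minimal illustration, not a product tool: it assumes you have already collected one record per array, and the record keys ("name", "raid_status") and the function name arrays_without_redundancy are hypothetical. The status values are the ones named in the text above.

```python
# Statuses that the text above names as indicating lost redundancy.
NO_REDUNDANCY = {"syncing", "degraded", "critical"}


def arrays_without_redundancy(arrays):
    """Return the names of arrays whose status indicates lost redundancy.

    `arrays` is a list of dicts with the hypothetical keys "name" and
    "raid_status"; adapt the keys to however you collect array state.
    """
    return [a["name"] for a in arrays
            if a["raid_status"].strip().lower() in NO_REDUNDANCY]


if __name__ == "__main__":
    # Illustrative sample data only; not output from a real system.
    sample = [
        {"name": "array0", "raid_status": "online"},
        {"name": "array1", "raid_status": "syncing"},
        {"name": "array2", "raid_status": "degraded"},
    ]
    for name in arrays_without_redundancy(sample):
        print(f"check {name}: no redundancy")
```

Arrays that this filter reports are the ones to inspect for bad blocks, corrupt markings, or the need for a restore from backup.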
Be aware of these differences regarding the recovered
configuration:
- FlashCopy mappings
are restored as "idle_or_copied" with 0% progress. Both volumes
must have been restored to their original I/O groups.
- The management ID is different. Any scripts or associated programs
that refer to the system-management ID of the clustered system (system)
must be changed.
- Any FlashCopy mappings
that were not in the "idle_or_copied" state with 100% progress
at the point of disaster have inconsistent data on their target disks.
These mappings must be restarted.
- Intersystem remote copy partnerships and relationships are not
restored and must be re-created manually.
- Consistency groups are not restored and must be re-created manually.
- Intrasystem remote copy relationships are restored if all dependencies
were successfully restored to their original I/O groups.
- If hardware was replaced before
the recovery, the SSL certificate might not be restored. If it is
not restored, then a new self-signed certificate is generated with
a validity of 30 days. Follow the associated Directed Maintenance
Procedures (DMP) for a permanent resolution.
- The system time zone might not have been restored.
- Any Global Mirror secondary volumes
on the recovered system might have inconsistent data if there was
replication I/O from the primary volume cached on the secondary system
at the point of the disaster. A full synchronization is required when
recreating and restarting these remote copy relationships.
- Immediately after the T3 recovery process runs, compressed disks
do not know the correct value of their used capacity. The disks initially
set the capacity as the entire real capacity. When I/O resumes, the
capacity is shrunk down to the correct value.
Similar behavior occurs
when you use the -autoexpand option on vdisks. The
real capacity of a disk might increase slightly, caused by the same
kind of behavior that affects compressed vdisks. Again, the capacity
shrinks down as I/O to the disk is resumed.
- Manual actions might be necessary on the hosts to trigger them
to rescan for devices. You can complete this task
by disconnecting and reconnecting the Fibre Channel cables to each
host bus adapter (HBA) port.
- Verify that all mapped volumes can be accessed by the hosts.
- Run the application consistency checks.
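The FlashCopy rule in the list above (mappings that were not "idle_or_copied" with 100% progress at the point of disaster must be restarted) can be sketched as follows. This is a minimal illustration under stated assumptions: it presumes you kept a record of each mapping's state from before the disaster, because restored mappings all report "idle_or_copied" with 0% progress. The record keys ("name", "status", "progress") and the function name mappings_to_restart are hypothetical.

```python
def mappings_to_restart(pre_disaster_mappings):
    """Return names of mappings whose targets may hold inconsistent data.

    Each item is a dict with the hypothetical keys "name", "status",
    and "progress" (percent, as an int or numeric string), recorded
    BEFORE the disaster. A mapping is considered safe only if it was
    "idle_or_copied" with 100% progress at that point.
    """
    return [m["name"] for m in pre_disaster_mappings
            if not (m["status"] == "idle_or_copied"
                    and int(m["progress"]) == 100)]


if __name__ == "__main__":
    # Illustrative sample data only; not output from a real system.
    recorded = [
        {"name": "fcmap0", "status": "idle_or_copied", "progress": 100},
        {"name": "fcmap1", "status": "copying", "progress": 40},
        {"name": "fcmap2", "status": "idle_or_copied", "progress": 0},
    ]
    for name in mappings_to_restart(recorded):
        print(f"restart {name}: target data may be inconsistent")
```

If no pre-disaster record exists, the conservative choice is to treat every mapping as needing a restart.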
For Virtual Volumes (VVols), complete the following
tasks.
- After you confirm that the T3 completed successfully, restart
Spectrum Control Base (SCB) services by running the following
command: service ibm_spectrum_control start
- Refresh the storage system information on the SCB GUI to ensure
that the systems are in sync after the recovery.
- To complete this task, log in to the SCB GUI.
- Hover over the affected storage system, select the menu launcher,
and then select Refresh. This step repopulates
the system.
- Repeat this step for all Spectrum Control Base instances.
- Rescan the storage providers from within the vSphere Web Client.
For Virtual Volumes (VVols), also be aware of the following
information.
FlashCopy mappings are not restored for VVols.
The implications are as follows.
- The mappings that describe the VM's snapshot relationships are
lost. However, the Virtual Volumes that are associated with these
snapshots still exist, and the snapshots might still appear on the
vSphere Web Client. This outcome might have implications for your VMware
backup solution.
- Do not attempt to revert to snapshots.
- Use the vSphere Web Client to delete any snapshots for VMs on
a VVol data store to free up disk space that is being used unnecessarily.
- The targets of any outstanding 'clone' FlashCopy relationships
might not function as expected (even if the vSphere Web Client recently
reported clone operations as complete). For any VMs that are targets
of recent clone operations, complete the following tasks.
- Perform data integrity checks, as recommended for conventional
volumes.
- If clones do not function as expected or show signs of corrupted
data, take a fresh clone of the source VM to ensure that data integrity
is maintained.