The recover system procedure recovers the entire storage system if the
system state is lost from all control enclosure node canisters. The
procedure re-creates the storage system by using saved configuration
data. The saved configuration data is in the active quorum disk and the
latest XML configuration backup file. The recovery might not be able to
restore all volume data. This procedure is also known as Tier 3 (T3)
recovery.
CAUTION:
If the system encounters a state where:
- No nodes are active, and
- One or more nodes have node errors that require
a node rescue, node canister replacement, or node firmware reinstallation
Attention:
- Run service actions only when directed by the fix procedures.
If used inappropriately, service actions can cause loss of access
to data or even data loss. Before you attempt
to recover a storage system, investigate the cause of the failure
and attempt to resolve those issues by using other fix procedures. Read and understand all of the instructions before you complete
any action.
- The recovery procedure can take several hours if
the system uses large-capacity devices as quorum devices.
Do not attempt the recover system procedure unless the following
conditions are met:
- All of the conditions in When to run the recover system procedure have been met.
- All hardware errors are fixed. See Fix hardware errors.
- All node canisters have candidate
status. Otherwise, see step 1.
- All node canisters are at the same level of code that the storage
system had before the system failure. If any node canisters were
modified or replaced, use the service assistant to verify the levels
of code and, where necessary, to reinstall the code so that it matches
the level that is running on the other node canisters in the system.
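The node-level conditions above can be sketched as a simple pre-flight check. This is a minimal illustration, not part of the product: the `NodeCanister` record, its field names, and the `recovery_blockers` function are hypothetical, and the values are assumed to be gathered by hand from the service assistant. The sketch does not cover the conditions in When to run the recover system procedure, which still need to be reviewed separately.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeCanister:
    """Hypothetical record of what was observed for one node canister."""
    panel_name: str
    status: str                      # for example "Candidate" or "Service"
    code_level: str                  # code level reported for this canister
    unfixed_errors: List[str] = field(default_factory=list)


def recovery_blockers(nodes: List[NodeCanister], expected_code_level: str) -> List[str]:
    """Return reasons why the recover system procedure should not be started yet.

    An empty list means the node-level conditions in this sketch are satisfied.
    """
    blockers = []
    if not nodes:
        blockers.append("no node canisters listed")
    for node in nodes:
        # All hardware errors must be fixed first.
        if node.unfixed_errors:
            blockers.append(f"{node.panel_name}: unfixed errors {node.unfixed_errors}")
        # Every node canister must have candidate status.
        if node.status.lower() != "candidate":
            blockers.append(f"{node.panel_name}: status is {node.status}, not candidate")
        # Every node canister must match the pre-failure code level.
        if node.code_level != expected_code_level:
            blockers.append(
                f"{node.panel_name}: code level {node.code_level} "
                f"differs from expected {expected_code_level}"
            )
    return blockers


# Example with made-up values (panel names and code levels are placeholders).
observed = [
    NodeCanister("01-1", "Candidate", "x.y.z"),
    NodeCanister("01-2", "Service", "x.y.w", unfixed_errors=["example-error"]),
]
for reason in recovery_blockers(observed, expected_code_level="x.y.z"):
    print("BLOCKED:", reason)
```

Returning a list of reasons rather than a single true/false value keeps every outstanding problem visible at once, which matches the instruction to resolve all issues before attempting recovery.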
The system recovery procedure is one of several tasks
that must be completed. The following list is an overview of the tasks
and the order in which they must be completed:
- Preparing for system recovery
  - Review the information regarding when to run the recover system
    procedure.
  - Fix your hardware errors and make sure that all nodes in the
    system are shown in service assistant or in the output from
    sainfo lsservicenodes.
  - Remove the system information for node canisters with error code
    550 or error code 578 by using the service assistant, but only if
    the recommended user response for these node errors has already
    been followed. See Removing system information for node canisters with error code 550 or error code 578 using the service assistant.
  - For Virtual Volumes (VVols), shut down the services for any
    instances of Spectrum Control Base that are connecting to the
    system. Use the Spectrum Control Base command service
    ibm_spectrum_control stop.
- Running the system recovery. After you have prepared the system for
  recovery and met all the preconditions, run the system recovery.
  Note: Run the procedure on one system in a fabric at a time. Do not
  run the procedure on different node canisters in the same system.
  This restriction also applies to remote systems.
- Completing actions to get your environment operational.
  - Recovering from offline volumes by using the CLI.
  - Checking your system, for example, to ensure that hosts can access
    all mapped volumes.
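The ordering requirement in the overview above can be illustrated with a small checklist that refuses to mark a task complete out of sequence. This is only a sketch for tracking progress on paper or in a runbook tool; the `RecoveryChecklist` class and the `RECOVERY_TASKS` tuple are hypothetical names, and nothing here runs any command against the storage system.

```python
from typing import Optional, Tuple

# Top-level tasks in the order given by the overview list.
RECOVERY_TASKS: Tuple[str, ...] = (
    "Preparing for system recovery",
    "Running the system recovery",
    "Completing actions to get your environment operational",
)


class RecoveryChecklist:
    """Tracks the recovery tasks and enforces that they are completed in order."""

    def __init__(self, tasks: Tuple[str, ...] = RECOVERY_TASKS) -> None:
        self._tasks = tuple(tasks)
        self._done = 0  # number of tasks completed so far

    @property
    def next_task(self) -> Optional[str]:
        """Return the next task to complete, or None when all are done."""
        if self._done < len(self._tasks):
            return self._tasks[self._done]
        return None

    def complete(self, task: str) -> None:
        """Mark a task complete; raise ValueError if it is not the next one."""
        if task != self.next_task:
            raise ValueError(
                f"cannot complete {task!r}; next required task is {self.next_task!r}"
            )
        self._done += 1


# Example: walk the tasks in the documented order.
checklist = RecoveryChecklist()
while checklist.next_task is not None:
    print("Next:", checklist.next_task)
    checklist.complete(checklist.next_task)
```

Attempting to call `complete("Running the system recovery")` on a fresh checklist raises `ValueError`, because preparation has not been recorded as finished.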