Running system recovery by using the service assistant

You can use the service assistant to start recovery when all node canisters that were members of the system are online and have candidate status. For any nodes that display error code 550 or 578, ensure that all nodes in the system are visible and all the recommended actions are completed before you place them into candidate status. To place a node into candidate status, remove system information for that node canister. Do not run the recovery procedure on different node canisters in the same system.
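
The precondition above can be expressed as a small check. The following Python sketch is illustrative only: the function name `recovery_preconditions` and the dictionary keys (`name`, `online`, `status`, `error`) are hypothetical and do not correspond to any service assistant or CLI interface. It assumes that you collected the node information yourself, for example by reading it from the service assistant home page.

```python
# Error codes that, per the text above, require removing system data
# before the node can be placed into candidate status.
BLOCKING_ERRORS = {550, 578}


def recovery_preconditions(nodes):
    """Return (ready, nodes_to_clear) for a list of node records.

    Each record is a hypothetical dict such as:
        {"name": "node1", "online": True, "status": "candidate", "error": None}
    """
    # Nodes that show error 550 or 578 still hold system data.
    nodes_to_clear = [n["name"] for n in nodes if n.get("error") in BLOCKING_ERRORS]
    # Recovery can start only when every former member is online and candidate.
    ready = bool(nodes) and all(
        n["online"] and n["status"] == "candidate" for n in nodes
    )
    return ready, nodes_to_clear
```

A result of `(False, ["node2"])` would mean that recovery cannot start yet and that system data must first be removed from `node2`.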

Note: Ensure that the web browser is not blocking pop-up windows. If pop-up windows are blocked, progress windows cannot open.

Before you begin this procedure, read the recover system procedure introductory information; see Recover system procedure.

The service assistant can also be accessed by using the technician port. See Procedure: Accessing the service assistant from the technician port.

Attention: This service action has serious implications if it is not completed properly. If at any time you encounter an error that is not covered by this procedure, stop and call the support center.

Run the recovery from any node canister in the system. The node canisters must not participate in any other system.

If the system has USB encryption, run the recovery from a node canister in the system that has an inserted USB flash drive containing the encryption key.

If the system contains an encrypted cloud account that uses USB encryption, a USB flash drive with the system master key must be present in the configuration node before the cloud account can move to the online state. This requirement applies when the system is powered down and then restarted.

If the system has key server encryption, note the following items before you proceed with the T3 recovery.
  • Run the recovery on a node that is attached to the key server. The keys are fetched remotely from the key server.
  • Run the recovery procedure on a node that has not been replaced or rescued. All of the information that a node requires to fetch the key from the key server resides on the node's file system. If the contents of the node's original file system are damaged or no longer exist (for example, because of a node rescue, a hardware replacement, or a corrupted file system), the recovery fails from this node.

If the system uses both USB and key server encryption, either a USB flash drive or a connection to the key server unlocks the system. Only one is needed, but providing both also works.

If you use USB flash drives to manage encryption keys, the T3 recovery causes the connection to a cloud service provider to go offline if the USB flash drive is not inserted into the system. To fix this issue, insert the USB flash drive with the current keys into the system.

If you use key servers to manage encryption keys, the T3 recovery causes the connection to a cloud service provider to go offline if the key server is offline. To fix this issue, ensure that the key server is online and available during T3 recovery.

If you use both key servers and USB flash drives to manage encryption keys, the T3 recovery causes the connection to a cloud service provider to go offline if none of the key providers are available. To fix this issue, ensure that either the key server is online or a USB flash drive is inserted into the system during T3 recovery. Only one is needed, but providing both also works.
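
The three cases above reduce to one rule: the connection to the cloud service provider stays online only if at least one of the configured key providers is available during T3 recovery. The following Python sketch shows that rule. The function name and the provider labels `"usb"` and `"key_server"` are hypothetical and chosen for illustration; they are not product identifiers.

```python
def cloud_connection_stays_online(configured, available):
    """Return True if at least one configured key provider is available.

    configured: provider labels the system uses, e.g. {"usb", "key_server"}
    available:  provider labels that are reachable during T3 recovery
    """
    configured = set(configured)
    if not configured:
        # Assumption: with no key provider configured, nothing needs unlocking.
        return True
    # One available provider is enough; having both available also works.
    return bool(configured & set(available))
```

For example, a system that uses both providers stays online when only the key server is reachable, while a system that uses only USB flash drives goes offline when no drive is inserted.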

Note: Each individual stage of the recovery procedure can take significant time to complete, depending on the specific configuration.
  1. Point your browser to the service IP address of one of the node canisters.
  2. Log on to the service assistant.
  3. Check that all node canisters that were members of the system are online and have candidate status.

    If any nodes display error code 550 or 578, remove their system data to place them into candidate status; see Procedure: Removing system data from a node canister.

  4. Select Recover System from the navigation.
  5. Follow the online instructions to complete the recovery procedure.
    1. Verify the date and time of the last quorum time. The time stamp must be less than 30 minutes before the failure. The time stamp format is YYYYMMDD hh:mm, where YYYY is the year, MM is the month, DD is the day, hh is the hour, and mm is the minute.
      Attention: If the time stamp is not less than 30 minutes before the failure, call the support center.
    2. Verify the date and time of the last backup date. The time stamp must be less than 24 hours before the failure. The time stamp format is YYYYMMDD hh:mm, where YYYY is the year, MM is the month, DD is the day, hh is the hour, and mm is the minute.
      Attention: If the time stamp is not less than 24 hours before the failure, call the support center.

      Changes that are made after the time of this backup date might not be restored.
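
The two time stamp checks in step 5 can be done by hand, but a short script makes the arithmetic explicit. The following Python sketch parses a time stamp in the YYYYMMDD hh:mm format and tests whether it falls within the allowed window before the failure. The function name `is_recent_enough` and the constant names are hypothetical, and the failure time is a value that you supply yourself. The sketch also assumes that a time stamp later than the failure time is treated as a failed check.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%Y%m%d %H:%M"          # YYYYMMDD hh:mm
QUORUM_LIMIT = timedelta(minutes=30)   # last quorum time window
BACKUP_LIMIT = timedelta(hours=24)     # last backup date window


def is_recent_enough(stamp, failure_time, limit):
    """Return True if stamp is less than `limit` before failure_time."""
    stamp_time = datetime.strptime(stamp, STAMP_FORMAT)
    age = failure_time - stamp_time
    # Assumption: a stamp after the failure time does not pass the check.
    return timedelta(0) <= age < limit


# Example with made-up values: failure at 10:00 on 15 March 2024.
failure = datetime(2024, 3, 15, 10, 0)
quorum_ok = is_recent_enough("20240315 09:45", failure, QUORUM_LIMIT)  # 15 minutes old
backup_ok = is_recent_enough("20240314 09:00", failure, BACKUP_LIMIT)  # 25 hours old
```

With these example values, `quorum_ok` is `True` and `backup_ok` is `False`, so the backup check fails and the procedure says to call the support center.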

Any one of the following categories of messages might be displayed:
  • T3 successful
    The volumes are back online. Use the final checks to get your environment operational again.
  • T3 recovery completed with errors
    One or more of the volumes are offline because fast write data was in the cache. To bring the volumes online, see Recovering from offline volumes using the CLI.
  • T3 failed
    Call the support center. Do not attempt any further action.
Verify that the environment is operational by completing the checks that are provided in What to check after running the system recovery.

If any errors are logged in the error log after the system recovery procedure completes, use the fix procedures to resolve these errors, especially the errors that are related to offline arrays.