1627: The cluster has insufficient redundancy in its controller connectivity.

Explanation

The cluster has detected that it does not have sufficient redundancy in its connections to the disk controllers. This means that a single further failure in the SAN could cause loss of access to the application data. The cluster SAN environment should have redundant connections to every disk controller, so that operation can continue when one of the SAN components fails.

To provide recommended redundancy, a cluster should be configured so that:

If no higher-priority errors are being reported, this error usually indicates a problem with the SAN design, the SAN zoning, or the disk controller.

If there are unfixed higher-priority errors that relate to the SAN or to disk controllers, fix those errors before resolving this one, because they might explain the lack of redundancy. Error codes that must be fixed first are:

Note: This error can be reported if the Fibre Channel network has not been rescanned for new MDisks after a deliberate reconfiguration of a disk controller or after SAN rezoning. This rescan is a required action after either change.

The 1627 error code is reported for a number of different error IDs. The error ID indicates the area where there is a lack of redundancy. The data reported in an event log entry indicates where the condition was found.

The meaning of the error IDs is shown below. For each error ID, the most likely reason for the condition is given. If the problem is not found in the suggested areas, check the configuration and state of all of the SAN components (switches, controllers, disks, cables, and cluster) to determine where there is a single point of failure.

010040 A disk controller is only accessible from a single node port.

010041 A disk controller is only accessible from a single port on the controller.

010042 Only a single port on a disk controller is accessible from every node in the cluster.

010043 A disk controller is accessible through only half, or less, of the previously configured controller ports.

010044 A disk controller is not accessible from a node.

010117 A disk controller is not accessible from a node that is allowed to access the device by site policy.
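Several of these conditions can be expressed as simple checks on which node ports can reach which controller ports. The Python sketch below is illustrative only and is not how the cluster itself evaluates redundancy. The `Login` record, the `redundancy_error_ids` function, and its inputs are hypothetical. It covers only 010040, 010041, 010043, and 010044, because this section does not spell out exactly how 010042 and 010117 (site policy) are evaluated.

```python
from dataclasses import dataclass

# Meanings of the 1627 error IDs, copied from the list above.
ERROR_ID_MEANINGS = {
    "010040": "A disk controller is only accessible from a single node port.",
    "010041": "A disk controller is only accessible from a single port on the controller.",
    "010042": "Only a single port on a disk controller is accessible from every node in the cluster.",
    "010043": "A disk controller is accessible through only half, or less, of the previously configured controller ports.",
    "010044": "A disk controller is not accessible from a node.",
    "010117": "A disk controller is not accessible from a node that is allowed to access the device by site policy.",
}


@dataclass(frozen=True)
class Login:
    """Hypothetical record: one node port that can see one controller port."""
    node: str
    node_port: str
    controller_port: str


def redundancy_error_ids(logins, nodes, configured_ports):
    """Return the error IDs whose conditions hold for a single disk controller.

    logins: Login records for this controller only
    nodes: names of every node in the cluster
    configured_ports: number of controller ports previously configured
    """
    found = []
    node_ports = {(p.node, p.node_port) for p in logins}
    controller_ports = {p.controller_port for p in logins}
    seen_nodes = {p.node for p in logins}

    if len(node_ports) == 1:  # only one node port reaches the controller
        found.append("010040")
    if len(controller_ports) == 1:  # only one controller port is reachable
        found.append("010041")
    # Half, or less, of the previously configured controller ports reachable.
    if configured_ports > 0 and 2 * len(controller_ports) <= configured_ports:
        found.append("010043")
    if any(node not in seen_nodes for node in nodes):  # some node sees no port
        found.append("010044")
    return found


if __name__ == "__main__":
    # Both nodes reach the controller, but only through one of its two ports.
    logins = [
        Login("node1", "port1", "ctrlA_p0"),
        Login("node2", "port1", "ctrlA_p0"),
    ]
    for error_id in redundancy_error_ids(logins, ["node1", "node2"], 2):
        print(error_id, ERROR_ID_MEANINGS[error_id])
```

In the usage example, only one of the controller's two configured ports is reachable, so the sketch flags 010041 and 010043. A real cluster may record these conditions differently in its event log.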

User Response

  1. Check the error ID and data for a more detailed description of the error.
  2. Determine whether there has been an intentional change to the SAN zoning or to a disk controller configuration that reduces the cluster's access to the indicated disk controller. If either change has occurred, continue with step 8.
  3. Use the GUI or the CLI command lsfabric to ensure that all disk controller WWPNs are reported as expected.
  4. Ensure that all disk controller WWPNs are zoned appropriately for use by the cluster.
  5. Check for any unfixed errors on the disk controllers.
  6. Ensure that all of the Fibre Channel cables are connected to the correct ports at each end.
  7. Check for failures in the Fibre Channel cables and connectors.
  8. When you have resolved the issues, use the GUI or the CLI command detectmdisk to rescan the Fibre Channel network for changes to the MDisks. Note: Do not attempt to detect MDisks unless you are sure that all problems have been fixed. Detecting MDisks prematurely might mask an issue.
  9. Mark the error that you have just repaired as fixed. The cluster will revalidate the redundancy and will report another error if there is still not sufficient redundancy.
  10. Go to MAP 5700: Repair verification.
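The ordering in steps 2 through 8 can be captured as a small decision helper. In particular, it encodes the rule that MDisks are rescanned only after every problem has been fixed. This Python sketch is hypothetical. `next_action`, `CHECKS`, and the idea of tracking confirmed steps as a set are illustrative assumptions, and the helper does not call any cluster CLI command.

```python
# Checks from steps 3 to 7 of the User Response, in the order they appear.
CHECKS = {
    3: "lsfabric reports every disk controller WWPN as expected",
    4: "every disk controller WWPN is zoned for use by the cluster",
    5: "no unfixed errors on the disk controllers",
    6: "every Fibre Channel cable is connected to the correct ports",
    7: "no failures in the Fibre Channel cables and connectors",
}


def next_action(intentional_change: bool, resolved: set[int]) -> str:
    """Return the next User Response step to carry out (hypothetical helper).

    intentional_change: outcome of step 2; True skips steps 3 to 7
    resolved: numbers of the checks in steps 3 to 7 already confirmed
    """
    if not intentional_change:
        for step, check in CHECKS.items():  # dicts keep insertion order
            if step not in resolved:
                return f"step {step}: confirm {check}"
    # Rescan only once nothing is outstanding; detecting MDisks early can mask an issue.
    return "step 8: rescan with detectmdisk, then mark the error as fixed (step 9)"


if __name__ == "__main__":
    print(next_action(False, {3, 4}))
```

With steps 3 and 4 confirmed, the helper points to step 5. It suggests the rescan only when step 2 found an intentional change or when all five checks are confirmed.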

Possible Cause-FRUs or other: