Quorum disk configuration

A clustered system automatically assigns quorum disk candidates. When you add new storage to a system or remove existing storage, however, it is a good practice to review the quorum disk assignments.

It is possible for a system to split into two groups, where each group contains half the original number of nodes in the system. A quorum disk determines which group of nodes stops operating and processing I/O requests. In this tie-break situation, the first group of nodes to access the quorum disk is marked as the owner of the quorum disk and, as a result, continues to operate as the system, handling all I/O requests. If the other group of nodes cannot access the quorum disk, or finds that the quorum disk is owned by the first group, it stops operating as the system and does not handle I/O requests.

A system can have only one active quorum disk, which is used in a tie-break situation. However, the system uses three quorum disks to record a backup of system configuration data for use in the event of a disaster. The system automatically selects one active quorum disk from these three disks. You can specify the active quorum disk by using the chquorum command-line interface (CLI) command with the -active parameter. To view the current quorum disk status, use the lsquorum command.
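
For example, you might list the current quorum devices and then make candidate 1 the active quorum disk. The following sequence is a sketch; the quorum indexes are illustrative, and the exact lsquorum output columns vary by code level:

   lsquorum
   quorum_index status id name   controller_id controller_name active object_type override
   0            online 4  mdisk4 1             controller1     yes    mdisk       no
   1            online 5  mdisk5 2             controller2     no     mdisk       no
   2            online 6  mdisk6 3             controller3     no     mdisk       no

   chquorum -active 1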

The other quorum disk candidates provide redundancy if the active quorum disk fails before a system is partitioned. To avoid the possibility of losing all the quorum disk candidates with a single failure, assign quorum disk candidates on multiple storage systems.
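
For example, to move quorum candidate 2 to an MDisk that is presented by a different storage system, you can use a command of the following form (mdisk8 is a hypothetical MDisk name, and the quorum index 2 is illustrative):

   chquorum -mdisk mdisk8 2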
Note: The synchronization status for mirrored volumes is recorded on the quorum disk. Mirrored volumes can therefore be taken offline if no quorum disk is available.

In a system with a single control enclosure or without any external managed disks, quorum is automatically assigned to drives. In this scenario, manual configuration of the quorum disks is not required.

In a system with two or more I/O groups, the drives are physically connected to only some of the node canisters. In such a configuration, drives cannot act as tie-break quorum disks; however, they can still be used to back up metadata.

If suitable external MDisks are available, these MDisks are automatically used as quorum disks that do support tie-break situations.

If no suitable external MDisks exist, the entire system might become unavailable if exactly half the node canisters in the system become inaccessible (for example, because of a hardware failure or disconnection from the fabric).

In systems with exactly two control enclosures, an uncontrolled shutdown of a control enclosure might lead to the entire system becoming unavailable because two node canisters become inaccessible simultaneously. It is therefore vital that node canisters are shut down in a controlled way when maintenance is required.

The system chooses the quorum configuration according to several placement criteria. These criteria are not strict requirements; in configurations where not all of them can be met, quorum is still configured automatically.

It is possible to assign quorum disks to alternative drives by using the chquorum command. However, you cannot move quorum to a drive that creates a less optimal configuration. You can override the dynamic quorum selection by using the -override yes option of the chquorum command. However, this option is not advised unless you are working with your support center.
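
For example, assuming quorum index 0 and a drive with ID 12 (both hypothetical), the override might be applied as follows; check the chquorum reference for your code level before using it:

   chquorum -override yes -drive 12 0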

When you change the managed disks that are assigned as quorum candidate disks, follow the general guidelines that are described earlier in this section: spread the candidates across multiple storage systems, and avoid moves that result in a less optimal configuration.

Quorum MDisks or drives in HyperSwap system configurations

To provide protection against failures that affect an entire location (for example, a power failure), you can use active-active relationships with a configuration that splits a single clustered system between two physical locations. For more information, see HyperSwap configuration details. For detailed guidance about HyperSwap system configuration for high-availability purposes, contact your IBM regional advanced technical specialist.

If you configure a HyperSwap system, the system automatically selects quorum disks that are placed in each of the three sites.
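
For example, on a correctly configured HyperSwap system, lsquorum output might resemble the following, with one quorum disk per site and the active quorum disk at site 3 (names, IDs, and columns are illustrative; some columns are elided):

   lsquorum
   quorum_index status id name   ... active object_type override site_id site_name
   0            online 0  mdisk0 ... no     mdisk       no       1       site1
   1            online 1  mdisk1 ... no     mdisk       no       2       site2
   2            online 2  mdisk2 ... yes    mdisk       no       3       site3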

The following scenarios describe examples that result in changes to the active quorum disk:
  • Scenario 1:
    1. Site 3 is either powered off or connectivity to the site is broken.
    2. If the topology is standard, the system selects a quorum disk candidate at site 2 to become the active quorum disk. If the topology is HyperSwap, the system operates without any active quorum disk.
    3. Site 3 is either powered on or connectivity to the site is restored.
    4. Assuming that the system was correctly configured initially, the system automatically recovers the configuration when power or connectivity is restored.
  • Scenario 2:
    1. The storage system that is hosting the preferred quorum disk at site 3 is removed from the configuration.
    2. If possible, the system automatically configures a new quorum disk candidate.
    3. In a HyperSwap topology, the system selects a new quorum disk only at site 3. In a standard topology, the system selects a quorum disk candidate at site 1 or site 2 to become the active quorum disk.
    4. A new storage system is added to site 3.
    5. In a standard topology, the administrator must reassign all three quorum disks to ensure that the active quorum disk is at site 3 again, as shown in the example after this list. In a HyperSwap topology, the system automatically assigns the new active quorum disk when the storage system is installed and the site setting is configured.
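
For example, in a standard topology the administrator might reassign the three candidates so that the quorum disk at site 3 is active again. This sequence is a sketch; the MDisk names (site1_mdisk, site2_mdisk, site3_mdisk) and quorum indexes are hypothetical:

   chquorum -mdisk site1_mdisk 0
   chquorum -mdisk site2_mdisk 1
   chquorum -mdisk site3_mdisk 2
   chquorum -active 2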