You can create a HyperSwap® topology system configuration in which each control enclosure
that is used to access a HyperSwap® volume is physically at a different site. HyperSwap® topology is supported on Lenovo Storage V5030 systems.
In a HyperSwap® configuration, each site is defined as an independent failure
domain. If one site experiences a failure, then the other site can
continue to operate without disruption. You must also configure
a third site to host a quorum device or IP quorum application that
provides an automatic tie-break in case of a link failure between
the two main sites. The main sites can be in the same room, in different rooms of
the same data center, in buildings on the same campus, or in buildings in different cities.
Different kinds of site separation protect against different types of failures.
- Sites are within a single location
- If each site is on a different power phase within a single location or data center,
the system can survive the failure of any single power domain. For example, one node can be placed in one rack installation and
the other node in another rack, with each rack considered a separate site that has its
own power phase. If power is lost to one of the racks, the node in the other rack
can continue to process requests, providing access to data even while its partner node
is offline because of the power disruption.
- Each site is at a separate location
- If each site is at a different physical location, the system can survive the failure
of any single location. The sites can be relatively close together, for
example two sites in the same city, or geographically distant, such as two sites
in separate cities. If one site experiences a site-wide disaster, the remaining site
remains available to process requests.
If the system is configured correctly, it continues to
operate after the loss of one site. In the management GUI, the Modify System Topology wizard simplifies setting up the HyperSwap® system topology. After you configure the HyperSwap® topology, you can use the Create Volumes wizard to create HyperSwap® volumes with a copy at each site. The HyperSwap® volume wizard also automatically creates the active-active relationships
and change volumes that manage replication between the sites. If you configure HyperSwap® by using the command-line interface, you must configure
the system topology, volumes, and active-active relationships separately.
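The following sketch outlines that CLI flow, assuming a Spectrum Virtualize-style command set. The site, pool, object, and volume names (site1, site2, Pool_Site1, Pool_Site2, HS_vol0, node1, host_A, controller0) are illustrative, and exact command syntax varies by software level, so verify each command against the CLI reference for your system before you use it.

    # Name the three sites (site IDs 1, 2, and 3; names are illustrative)
    chsite -name site1 1
    chsite -name site2 2
    chsite -name site3 3

    # Assign node canisters, hosts, and external storage controllers to sites,
    # including the quorum storage system at the third site
    chnodecanister -site site1 node1
    chnodecanister -site site2 node2
    chhost -site site1 host_A
    chcontroller -site site3 controller0

    # Switch the system to the HyperSwap topology
    chsystem -topology hyperswap

    # Create a HyperSwap volume with a copy in a pool at each site; on recent
    # software levels this also creates the active-active relationship and
    # change volumes automatically
    mkvolume -pool Pool_Site1:Pool_Site2 -size 100 -unit gb -name HS_vol0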
You must configure a
HyperSwap® system to meet the following requirements:
- Directly connect each node to two or more SAN fabrics at the primary
and secondary sites (2 - 8 fabrics are supported). Sites are defined
as independent failure domains. A failure domain is a part of the
system within a boundary. Any failure within that boundary (such as
a power failure, fire, or flood) is contained within the boundary.
The failure does not affect any part that is outside of that boundary. Failure
domains can be in the same room or across rooms in the data center,
buildings on the same campus, or buildings in different towns. Different
kinds of failure domains protect against different types of faults.
- Use a third site to house a quorum disk
on an external storage system or an IP quorum application on a server.
- If a storage system is used at the third site, it must support extended
quorum disks. More information is available in the interoperability
matrixes that are available at the following websites:
- Place independent storage systems at the primary and secondary
sites, and use active-active relationships to mirror the host data
between the two sites.
- Connection distances vary based on the fibre type and the small form-factor
pluggable (SFP) transceiver type (longwave or shortwave).
- Nodes that have connections to switches that are longer than 100
meters (109 yards) must use longwave Fibre Channel connections. A
longwave small form-factor
pluggable (SFP) transceiver can be purchased as an optional component, and must be one of the
longwave SFP transceivers that are listed at the following websites:
- Avoid using inter-switch links (ISLs) in paths between nodes and
external storage systems. If this configuration is unavoidable, do
not oversubscribe the ISLs, because they carry substantial Fibre Channel
traffic. For most configurations, trunking is required. Because
ISL problems are difficult to diagnose, switch-port error statistics
must be collected and regularly monitored to detect failures.
- Using a single switch at the third site can lead to the creation
of a single fabric rather than two independent and redundant fabrics.
A single fabric is an unsupported configuration.
- Ethernet port 1 on every node must be connected to the same subnet
or subnets. Ethernet port 2 (if used) of every node must be connected
to the same subnet (this might be a different subnet from port 1).
The same principle applies to other Ethernet ports.
- Some service actions require physical access to all nodes in a
system. If nodes in a HyperSwap® system are separated by more than 100 meters, service actions
might require multiple service personnel. Contact your service
representative to inquire about multiple site support.
- Use consistency groups to manage the volumes that belong to an
application. This structure ensures that when a rolling disaster occurs,
the out-of-date image is consistent and therefore usable for that
application.
- Use consistency groups to maintain data that is usable for disaster
recovery for each application. Add the relationship for each of the application's
volumes to an appropriate consistency group, as shown in the sketch after this list.
- You can add relationships to a consistency group only in certain
states, for example, when both sites are accessible.
- If you must add a volume to an application to provide more capacity
at a time when only one site is accessible, note that you cannot create
the HyperSwap® relationship or add it to the consistency group at that time. Create the
relationship and add it to the group as soon as possible after the failed site recovers.
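As a minimal sketch of those consistency-group steps, the following commands assume a Spectrum Virtualize-style CLI. The group and relationship names (app1_grp, HS_vol0_rel) are illustrative, and the exact parameters depend on your software level.

    # Create a consistency group for the application's HyperSwap relationships
    mkrcconsistgrp -name app1_grp

    # Move an existing active-active relationship into the group
    chrcrelationship -consistgrp app1_grp HS_vol0_rel

    # Verify the group membership and state
    lsrcconsistgrp app1_grp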
A HyperSwap® system locates the active quorum disk at a third site. If communication
is lost between the primary and secondary sites, the site with access
to the active quorum disk continues to process transactions. If communication
is lost to the active quorum disk, an alternative quorum disk at another
site can become the active quorum disk.
A system of nodes can
be configured to use up to three quorum disks. However, only one
quorum disk can be elected to resolve a situation where the system
is partitioned into two sets of nodes of equal size. The purpose of
the other quorum disks is to provide redundancy if a quorum disk fails
before the system is partitioned.
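To confirm which quorum devices are defined and which one is currently active, a Spectrum Virtualize-style CLI typically provides commands along the following lines. The commands are illustrative, and the IP quorum steps assume a Java runtime is installed on the third-site server.

    # List the quorum devices and identify the active one
    lsquorum

    # If an IP quorum application is used instead of a quorum disk, generate
    # the application on the system, copy it to a server at the third site,
    # and run it there
    mkquorumapp
    java -jar ip_quorum.jar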
Restriction: Do not
connect an external storage system at one site directly to a switch
fabric at the other site.
An alternative configuration can
use an extra Fibre Channel switch at the third site with connections
from that switch to the primary site and to the secondary site.
A HyperSwap® system configuration is supported only when the storage system
that hosts the quorum disks supports extended quorum. Although
the system can use other types of storage systems for providing quorum
disks, access to these quorum disks is always through a single path.
For quorum disk configuration requirements, see the technote Guidance for Identifying and Changing Managed Disks Assigned as Quorum
Disk Candidates.
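If you need to move a quorum disk candidate onto an MDisk that is provided by the extended-quorum-capable storage system at the third site, a Spectrum Virtualize-style CLI typically offers commands similar to the following sketch. The MDisk ID and quorum index are illustrative; confirm the procedure in the technote and the CLI reference for your system first.

    # Review the current quorum disk candidates and the active quorum device
    lsquorum

    # Reassign quorum index 2 to MDisk 5 on the third-site storage system
    # (illustrative IDs)
    chquorum -mdisk 5 2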