Distributed array configurations create large-scale internal MDisks. These arrays, which can contain 4 - 128 drives, also contain rebuild areas that are used to maintain redundancy after a drive fails. As a result, a distributed configuration dramatically reduces rebuild times and reduces the exposure of volumes to the extra load of recovering redundancy.
Distributed arrays can contain between 4 and 128 drives. They remove the need for separate spare drives that sit idle until a failure occurs. Instead of allocating one or more drives as spares, the spare capacity is distributed over specific rebuild areas across all the member drives. Data can be copied faster to the rebuild area, and redundancy is restored much more rapidly.

Additionally, as the rebuild progresses, the performance of the pool is more uniform because all of the available drives are used for every volume extent. After the failed drive is replaced, data is copied back to the drive from the distributed spare capacity. Unlike dedicated hot-spare drives, the member drives continue to process read/write requests on the parts of the drive that are not being used as rebuild areas.

The number of rebuild areas is based on the width of the array. The size of the rebuild area determines how many times the distributed array can recover failed drives without risking becoming degraded. For example, a distributed RAID 6 array can handle two concurrent drive failures. After the failed drives are rebuilt, the array can tolerate another two drive failures. If all of the rebuild areas are used to recover data, the array becomes degraded on the next drive failure.
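To make the rebuild-area accounting concrete, the following Python sketch models the failure and copyback cycle that is described above. It is an illustrative model only, not product code; the class name, counts, and messages are hypothetical.

class DistributedArray:
    """Toy model of rebuild-area accounting in a distributed array."""

    def __init__(self, drive_count, rebuild_areas):
        self.drive_count = drive_count      # array width (4 - 128 drives)
        self.rebuild_areas = rebuild_areas  # spare capacity spread across all members

    def fail_drive(self):
        """A failed member is rebuilt into spare capacity, if any remains."""
        if self.rebuild_areas > 0:
            self.rebuild_areas -= 1
            return "rebuilt into a rebuild area; redundancy restored"
        return "no rebuild areas left; array is degraded"

    def replace_drive(self):
        """Copyback to the replacement drive frees the rebuild area again."""
        self.rebuild_areas += 1

array = DistributedArray(drive_count=24, rebuild_areas=2)
print(array.fail_drive())   # first failure: rebuilt into a rebuild area
print(array.fail_drive())   # second failure: rebuilt into a rebuild area
print(array.fail_drive())   # third failure before any replacement: degraded
array.replace_drive()       # replacing a failed drive restores spare capacity

Each failure consumes one rebuild area until copyback to a replacement drive frees it again, which is why the array can keep absorbing failures as long as failed drives are replaced promptly.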
The array width, which is also referred to as the drive count, indicates the total number of drives in a distributed array. This total includes the drives that provide data capacity and parity, as well as the rebuild areas that are used to recover data.
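As a rough illustration of how these quantities relate, the following Python function estimates the usable capacity of a distributed array from its width. It assumes a RAID 6 geometry and ignores metadata overhead and extent rounding; the function name and example values are hypothetical.

def approx_usable_capacity(drive_count, rebuild_areas, stripe_width,
                           parity_strips, drive_capacity_gb):
    """Approximate usable capacity of a distributed array in GB.
    Ignores metadata overhead and extent rounding; illustrative only."""
    # Capacity equivalent to the rebuild areas is reserved, not usable.
    effective_drives = drive_count - rebuild_areas
    # Within each stripe, the parity strips also consume capacity.
    data_fraction = (stripe_width - parity_strips) / stripe_width
    return effective_drives * drive_capacity_gb * data_fraction

# Hypothetical example: a 24-drive distributed RAID 6 array with 1 rebuild
# area, stripe width 12 (10 data strips + 2 parity strips), 558.4 GB drives.
print(round(approx_usable_capacity(24, 1, 12, 2, 558.4)))   # ~10703 GB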
The rebuild area is the disk capacity that is reserved within a distributed array to regenerate data after a drive failure; it provides no usable capacity. Unlike in a nondistributed array, the rebuild area is distributed across all of the drives in the array. As data is rebuilt during the copyback process, the rebuild area contributes to the performance of the distributed array because all of the drives continue to process I/O requests.
A stripe, which can also be referred to as a redundancy unit, is the smallest amount of data that can be addressed. For distributed arrays, the stripe size can be 128 or 256 KiB.
The stripe width indicates the number of strips of data that can be written at one time when data is regenerated after a drive fails. This value is also referred to as the redundancy unit width. In Figure 1, the stripe width of the array is 5.
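The stripe size and stripe width together determine how much data a full stripe holds. The following sketch shows the arithmetic, assuming a RAID 6 stripe (two parity strips) and the width of 5 from Figure 1; the values are illustrative.

STRIPE_SIZE_KIB = 256   # redundancy unit size: 128 or 256 KiB
STRIPE_WIDTH = 5        # strips per stripe, as in the Figure 1 example
PARITY_STRIPS = 2       # assuming RAID 6: two parity strips per stripe

full_stripe_kib = STRIPE_SIZE_KIB * STRIPE_WIDTH
data_kib = STRIPE_SIZE_KIB * (STRIPE_WIDTH - PARITY_STRIPS)
print(f"{full_stripe_kib} KiB per full stripe, {data_kib} KiB of it data")
# Output: 1280 KiB per full stripe, 768 KiB of it data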
To replace a failed member drive in the distributed array, the system can use another drive that has the same drive class as the failed drive. The system can also select a drive from a superior drive class. For example, two drive classes can contain drives of the same technology type but different data capacities. In this case, the superior drive class is the drive class that contains the higher capacity drives.
To display information about all of the drive classes that are available on the system, use the lsdriveclass command. The following example output shows four drive classes on the system. Drive class 209 contains drives with a capacity of 278.9 GB; drive class 337 contains drives with a capacity of 558.4 GB. Although the drives in both classes have the same RPM speed, technology type, and block size, drive class 337 is considered superior to drive class 209.
id  RPM   capacity IO_group_id IO_group_name tech_type block_size candidate_count superior_count total_count
1   10000 418.7GB  0           io_grp0       sas_hdd   512        0               0              2
129 10000 278.9GB  0           io_grp0       sas_hdd   512        0               0              5
209 15000 278.9GB  2           io_grp2       sas_hdd   4096       2               5              2
337 15000 558.4GB  3           io_grp3       sas_hdd   4096       3               3              3
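The selection rule can be expressed as a short sketch. The following Python snippet encodes the drive classes from the example output and applies a hypothetical suitability test: a candidate class must match the technology type, block size, and RPM of the failed drive's class and offer equal or greater capacity. This illustrates the rule as described; it is not the product's selection code.

# Drive classes from the lsdriveclass output (fields abridged).
classes = {
    1:   {"rpm": 10000, "capacity": 418.7, "tech_type": "sas_hdd", "block_size": 512},
    129: {"rpm": 10000, "capacity": 278.9, "tech_type": "sas_hdd", "block_size": 512},
    209: {"rpm": 15000, "capacity": 278.9, "tech_type": "sas_hdd", "block_size": 4096},
    337: {"rpm": 15000, "capacity": 558.4, "tech_type": "sas_hdd", "block_size": 4096},
}

def acceptable_classes(failed_id):
    """Classes whose drives can replace a member of failed_id: the same
    class, or a superior one with higher capacity. Illustrative rule only."""
    failed = classes[failed_id]
    return [cid for cid, c in classes.items()
            if c["tech_type"] == failed["tech_type"]
            and c["block_size"] == failed["block_size"]
            and c["rpm"] == failed["rpm"]
            and c["capacity"] >= failed["capacity"]]

print(acceptable_classes(209))  # [209, 337]: class 337 is superior to 209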
If the fault LED on a drive is lit, the drive is marked as failed and is no longer used in the distributed array. When the system detects that a failed drive was replaced, it automatically removes the failed hardware from the array configuration. If the new drive is suitable (for example, it is in the same drive class), the system begins a copyback operation to make a rebuild area available again in the distributed array.
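The replacement handling might be summarized as in the hypothetical sketch below; the function and field names are invented for illustration and simply restate the sequence above: detect the swap, validate the new drive, then start the copyback that frees the rebuild area.

def on_drive_replaced(new_drive, failed_member):
    """Hypothetical handler for the replacement sequence described above.
    Assumes the failed hardware was already removed from the array."""
    suitable = (new_drive["tech_type"] == failed_member["tech_type"]
                and new_drive["block_size"] == failed_member["block_size"]
                and new_drive["capacity"] >= failed_member["capacity"])
    if suitable:
        return "copyback started: the rebuild area becomes available again"
    return "drive not suitable: same or superior drive class required"

failed_member = {"tech_type": "sas_hdd", "block_size": 4096, "capacity": 278.9}
new_drive     = {"tech_type": "sas_hdd", "block_size": 4096, "capacity": 558.4}
print(on_drive_replaced(new_drive, failed_member))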