When you plan your network, consider which type of RAID configuration to use. Lenovo Storage V7000 supports both non-distributed array and distributed array configurations.
A non-distributed array can contain 2 to 16 drives; several arrays together provide the capacity for a pool. For redundancy, spare drives ("hot spares") are allocated to take over read/write operations if any of the other drives fail. The rest of the time, the spare drives are idle and do not process requests for the system. When a member drive fails in the array, the data can be recovered onto the spare only as fast as that single drive can write the data. Because of this bottleneck, rebuilding the data can take many hours as the system tries to balance host and rebuild workload. During the rebuild, the load on the remaining member drives can increase by up to 100%, and the latency of I/O to the rebuilding array is affected for the entire time. Because volume data is striped across MDisks, all volumes are affected for as long as it takes to rebuild the drive.
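The spare-drive bottleneck can be illustrated with a rough lower bound on rebuild time. The sketch below is not a product formula: the function name `min_rebuild_hours`, the 4 TB capacity, and the 100 MB/s sustained write rate are all assumptions chosen for illustration, and host workload would lengthen the real figure.

```python
def min_rebuild_hours(drive_capacity_tb: float, spare_write_mb_s: float) -> float:
    """Best-case rebuild time when one spare drive must absorb every write.

    Assumes decimal units (1 TB = 1,000,000 MB) and no competing host I/O.
    """
    capacity_mb = drive_capacity_tb * 1_000_000
    seconds = capacity_mb / spare_write_mb_s
    return seconds / 3600


# Hypothetical figures: a 4 TB drive rebuilt at a sustained 100 MB/s.
print(f"{min_rebuild_hours(4, 100):.1f} hours")  # prints "11.1 hours"
```

Even in this best case the rebuild takes roughly half a day, which is why the text describes rebuilds that last many hours.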
Lenovo Storage V7000 nodes support distributed arrays. A distributed array can contain 4 to 128 drives. Distributed arrays remove the need for separate drives that are idle until a failure occurs. Instead of allocating one or more drives as spares, the spare capacity is distributed over specific rebuild areas across all the member drives. Data can be copied faster to the rebuild areas, so redundancy is restored much more rapidly. Additionally, as the rebuild progresses, the performance of the pool is more uniform because all of the available drives are used for every volume extent. After the failed drive is replaced, data is copied back to the drive from the distributed spare capacity. Unlike "hot spare" drives, the member drives continue to process read/write requests on the parts of each drive that are not being used as rebuild areas. The number of rebuild areas is based on the width of the array. The size of the rebuild area determines how many times the distributed array can recover failed drives without risking becoming degraded. For example, a distributed array that uses RAID 6 can handle two concurrent drive failures. After the failed drives have been rebuilt, the array can tolerate another two drive failures. If all of the rebuild areas are used to recover data, the array becomes degraded on the next drive failure.
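The relationship between rebuild areas and the degraded state can be sketched as a toy state model. This is an illustration only, not the system's internal logic: the class `DistributedArrayModel` and its methods are hypothetical, rebuilds and copy-backs are treated as instantaneous, and RAID-level limits on concurrent failures are ignored.

```python
class DistributedArrayModel:
    """Toy model: each drive failure consumes one free rebuild area."""

    def __init__(self, rebuild_areas: int) -> None:
        self.total_rebuild_areas = rebuild_areas
        self.used_rebuild_areas = 0
        self.unrecovered_failures = 0  # failures with no rebuild area left

    @property
    def degraded(self) -> bool:
        return self.unrecovered_failures > 0

    def drive_failed(self) -> None:
        if self.used_rebuild_areas < self.total_rebuild_areas:
            self.used_rebuild_areas += 1   # data rebuilt into spare capacity
        else:
            self.unrecovered_failures += 1  # nothing left to rebuild into

    def drive_replaced(self) -> None:
        if self.unrecovered_failures > 0:
            self.unrecovered_failures -= 1  # rebuild directly onto new drive
        elif self.used_rebuild_areas > 0:
            self.used_rebuild_areas -= 1    # copy-back frees the rebuild area


array = DistributedArrayModel(rebuild_areas=2)
array.drive_failed()
array.drive_failed()
print(array.degraded)  # False: both failures fit in rebuild areas
array.drive_failed()
print(array.degraded)  # True: all rebuild areas were already in use
```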
The concept of distributed RAID is to distribute an array with width W across a set of X drives. For example, you might have a 2+P RAID-5 array that is distributed across a set of 40 drives. The array type and width define the level of redundancy. In the previous example, one strip in every three holds parity, so there is a 33% capacity overhead for parity. If an array stride needs to be rebuilt, two component strips must be read to rebuild the data for the third component. The set size defines how many drives are used by the distributed array. Performance and usable capacity must scale according to the number of drives in the set. The other key feature of a distributed array is that instead of having a hot spare, the set includes spare strips that are also distributed across the set of drives. The data and spares are distributed such that if one drive in the set fails, redundancy can be restored by rebuilding data onto the spare strips at a rate much greater than the write rate of a single component.
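The arithmetic behind the 2+P example can be checked with a short sketch. The function names below are hypothetical, and `strips_read_to_rebuild` assumes a single-parity (RAID-5 style) stride with exactly one missing strip.

```python
def parity_overhead(data_strips: int, parity_strips: int) -> float:
    """Fraction of each stride that holds parity rather than data."""
    width = data_strips + parity_strips
    return parity_strips / width


def strips_read_to_rebuild(width: int) -> int:
    """Strips read to rebuild one missing strip of a single-parity stride.

    Every surviving strip in the stride is needed, that is, width - 1.
    """
    return width - 1


# 2+P RAID-5: two data strips plus one parity strip, width W = 3.
print(f"{parity_overhead(2, 1):.0%}")  # prints "33%"
print(strips_read_to_rebuild(3))       # prints 2
```

Note that both values depend only on the width W, not on the set size X: spreading the same 2+P array across 40 drives changes where the strips live, not how much parity each stride carries.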
Distributed arrays are used to create large-scale internal managed disks. They can manage 4 to 128 drives and contain their own rebuild areas to accomplish error recovery when drives fail. As a result, rebuild times are dramatically reduced, which shortens the time that volumes are exposed to the extra load of recovering redundancy. Because the capacity of these managed disks is potentially so great, the overall system limits change when they are configured so that they can be virtualized. For every distributed array, the space for 16 MDisk extent allocations is reserved; therefore, 15 other MDisk identities are removed from the overall pool of 4096. Distributed arrays also aim to provide a uniform performance level. To achieve this, a distributed array can contain multiple drive classes only if the drives are similar (for example, the drives have the same attributes, but the capacities are larger). All the drives in a distributed array must come from the same I/O group to maintain a simple configuration model.
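The effect on MDisk identities can be worked through as simple bookkeeping. The sketch below takes the figures from the paragraph above (a pool of 4096 identities, 16 consumed per distributed array: the array itself plus the 15 removed); the function name and the `other_mdisks` parameter are assumptions for illustration.

```python
MAX_MDISK_IDS = 4096            # overall pool of MDisk identities
IDS_PER_DISTRIBUTED_ARRAY = 16  # the array's own identity plus 15 reserved


def remaining_mdisk_ids(distributed_arrays: int, other_mdisks: int = 0) -> int:
    """MDisk identities still available after the given configuration."""
    used = distributed_arrays * IDS_PER_DISTRIBUTED_ARRAY + other_mdisks
    if used > MAX_MDISK_IDS:
        raise ValueError("configuration exceeds the MDisk identity pool")
    return MAX_MDISK_IDS - used


# Four distributed arrays and no other MDisks.
print(remaining_mdisk_ids(4))  # prints 4032
```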
One disadvantage of a distributed array is that the array redundancy covers a greater number of components. Because more drives means more opportunities for a failure, the mean time between failures (MTBF) is reduced. Quicker rebuild times improve MTBF; however, there are still limits to how widely distributed an array can be before the MTBF becomes unacceptable.