Deduplication can be configured with thin-provisioned and compressed volumes in data reduction pools for added capacity savings. Deduplication is a type of data reduction that eliminates duplicate copies of data.
With deduplication, the system calculates a signature for each chunk of incoming data to determine whether the data must be written to storage. Deduplication is a hash-based solution, which means that the system compares the signatures of chunks rather than the data itself. If the signature of the new data matches an existing signature that is stored on the system, the new data is not written to storage. Instead, it is replaced with a reference that points to the data that is already stored. This process saves capacity on the backend storage and might improve performance on read operations to data with an existing signature. Because the same data pattern can occur many times, deduplication decreases the amount of data that needs to be stored on the system.
Every hash-based deduplication solution includes a repository that supports looking up matches for incoming data. The system contains a database that maps the signature of the data to the volume and its virtual address. If the signature of an incoming write operation is not stored in the database, a duplicate is not detected and the incoming data is stored on backend storage. To maximize the space that is available for the database, the system distributes this repository between all nodes in the I/O groups that contain deduplicated volumes. Each node holds a distinct portion of the records that are stored in the database. If nodes are added to or removed from the system, the database is redistributed between the nodes to ensure full use of available memory.
Lenovo Storage V5030 and Lenovo Storage V5030F systems require a compression license and an additional memory module (16 GB DIMM) to be installed on the system to support deduplication.
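The signature lookup and the per-node distribution of the database can be illustrated with a small model. The following Python sketch is not the system's implementation. The class name `DedupStore`, the 8 KiB chunk size, the use of SHA-256 as the signature, and the modulo rule for choosing a node are all assumptions made for illustration. Overwrites of shared data and reference counting are not modeled.

```python
import hashlib

CHUNK_SIZE = 8192  # assumed chunk size, for illustration only


class DedupStore:
    """Toy model of a hash-based deduplicating store (hypothetical)."""

    def __init__(self, node_count):
        # One signature table per node; each holds a distinct share of records.
        self.node_tables = [{} for _ in range(node_count)]
        self.backend = {}     # (volume, address) -> chunk written to storage
        self.references = {}  # (volume, address) -> location of the stored chunk

    def _table_for(self, signature):
        # Assumed placement rule: derive the owning node from the signature.
        index = int.from_bytes(signature[:8], "big") % len(self.node_tables)
        return self.node_tables[index]

    def write(self, volume, address, chunk):
        """Return True if the chunk was deduplicated, False if it was stored."""
        signature = hashlib.sha256(chunk).digest()  # assumed hash function
        table = self._table_for(signature)
        location = table.get(signature)
        if location is not None:
            # Signature match: record a reference instead of writing the data.
            self.references[(volume, address)] = location
            return True
        # No match: write the data and register its signature.
        self.backend[(volume, address)] = chunk
        table[signature] = (volume, address)
        return False

    def read(self, volume, address):
        # Follow the reference if one exists, else read the address directly.
        location = self.references.get((volume, address), (volume, address))
        return self.backend[location]

    def resize(self, node_count):
        # Redistribute all records when nodes are added or removed.
        records = [item for table in self.node_tables for item in table.items()]
        self.node_tables = [{} for _ in range(node_count)]
        for signature, location in records:
            self._table_for(signature)[signature] = location


store = DedupStore(node_count=2)
data = b"A" * CHUNK_SIZE
store.write("vol0", 0, data)   # returns False: first copy is stored
store.write("vol1", 64, data)  # returns True: replaced with a reference
store.resize(4)                # records are spread over four nodes
```

In this model only one copy of the repeated chunk reaches `backend`, and a read of either address returns the same data.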
When you create a volume, you can specify deduplication together with other supported capacity savings methods, such as compression and thin provisioning. Deduplicated volumes must be created in data reduction pools. Within the same I/O group, you cannot have both deduplicated volumes in data reduction pools and compressed volumes in standard pools. The system checks these restrictions when you create volumes and issues an error if a new volume would violate them. However, compressed volumes in data reduction pools can coexist with deduplicated volumes in the same I/O group. If you have existing volumes in standard pools, you can migrate them to data reduction pools and add deduplication to increase capacity savings for those volumes.
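The placement restrictions for deduplicated volumes can be summarized as a simple check. The following Python sketch only models the rules that are stated in this topic. The `Volume` type, the `check_new_volume` function, and the error strings are hypothetical and do not correspond to any system command or message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical description of a volume, for illustration only."""
    name: str
    io_group: int
    pool_type: str  # "data_reduction" or "standard"
    compressed: bool = False
    deduplicated: bool = False


def in_standard_pool_and_compressed(volume):
    return volume.compressed and volume.pool_type == "standard"


def check_new_volume(existing, new):
    """Return an error string if `new` breaks a rule, otherwise None."""
    if new.deduplicated and new.pool_type != "data_reduction":
        return "deduplicated volumes must be created in a data reduction pool"
    # Only volumes in the same I/O group are relevant to the mixing rule.
    peers = [v for v in existing if v.io_group == new.io_group]
    if new.deduplicated and any(in_standard_pool_and_compressed(v) for v in peers):
        return "I/O group already contains compressed volumes in standard pools"
    if in_standard_pool_and_compressed(new) and any(v.deduplicated for v in peers):
        return "I/O group already contains deduplicated volumes"
    return None


existing = [Volume("v1", 0, "data_reduction", compressed=True, deduplicated=True)]
# Compressed volume in a data reduction pool, same I/O group: allowed.
ok = check_new_volume(existing, Volume("v2", 0, "data_reduction", compressed=True))
# Compressed volume in a standard pool, same I/O group: rejected.
err = check_new_volume(existing, Volume("v3", 0, "standard", compressed=True))
```

A compressed volume in a standard pool is accepted by this check only when it is placed in an I/O group that has no deduplicated volumes.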