Compressed volumes

When you create volumes, you can specify compression as a method to save capacity for the volume. With compressed volumes, data is compressed as it is written to disk, saving additional space. To use the compression function, you must obtain the IBM® Real-time Compression™ license.

Like thin-provisioned volumes, compressed volumes have virtual, real, and used capacities. Review the planning guidelines in this topic before working with compressed volumes.

You can monitor compression usage to determine the capacity savings that compressed volumes provide. To monitor system-wide compression savings and capacity, select Monitoring > System. You can compare the amount of capacity that is used before compression is applied with the capacity that is used for all compressed volumes. You can also view the total percentage of capacity savings when compression is used on the system, and monitor compression savings for individual pools and volumes. For volumes, you can use these compression values to determine which volumes have achieved the highest compression savings.
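The savings percentage described above is simple arithmetic on the capacity before and after compression. The following Python sketch illustrates the calculation and ranks volumes by savings. The function names and the sample capacity figures are hypothetical and are not part of the system; the management GUI reports these values directly.

```python
from typing import Dict, List, Tuple


def compression_savings_percent(capacity_before: float, capacity_after: float) -> float:
    """Return the percentage of capacity saved by compression.

    capacity_before: capacity used before compression is applied.
    capacity_after:  capacity used after compression (same unit).
    """
    if capacity_before <= 0:
        raise ValueError("capacity_before must be greater than zero")
    if capacity_after < 0:
        raise ValueError("capacity_after must not be negative")
    return (capacity_before - capacity_after) / capacity_before * 100.0


def rank_by_savings(volumes: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return volume names ordered from highest to lowest savings.

    volumes maps a volume name to (capacity_before, capacity_after).
    """
    return sorted(
        volumes,
        key=lambda name: compression_savings_percent(*volumes[name]),
        reverse=True,
    )


# Hypothetical figures, in GiB, for illustration only.
sample = {"vol_db": (200.0, 40.0), "vol_logs": (100.0, 50.0)}
ranked = rank_by_savings(sample)
```

In this sketch, a volume that shrinks from 200 GiB to 40 GiB ranks ahead of one that shrinks from 100 GiB to 50 GiB, because the ranking uses the percentage saved rather than the absolute amount.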

Benefits of compression

Using compression reduces the amount of physical storage across your environment. You can reuse free disk space in the existing storage without archiving or deleting data.

Compressing data as it is written to the volume also reduces the environmental requirements per unit of storage. After compression is applied to stored data, the required power and cooling per unit of logical storage are reduced because more logical data is stored on the same amount of physical storage. More data can be stored within a particular storage system, which reduces overall rack unit requirements.

Compression can be implemented without impacting the existing environment and can be used with other storage processes, such as mirrored volumes and Copy Services functions.

Compressed volumes provide the same level of availability as regular volumes. Compression can be implemented in an existing environment without an impact to service, and existing data can be compressed transparently while it is being accessed by users and applications.

When you use compression, monitor overall performance and CPU utilization to ensure that other system functions have adequate bandwidth. If compression is used excessively, overall bandwidth for the system might be impacted. To view performance statistics that are related to compression, select Monitoring > Performance and then select Compression % on the CPU Utilization graph.

Common uses for compressed volumes

Compression can be used to consolidate storage in both block storage and file system environments. Compressing data reduces the amount of capacity that is needed for volumes and directories.

Compression can also be used to minimize the storage utilization of logged data. Many applications, such as those that record lab test results, require constant recording of application or user status. Logs are typically stored as text files or binary files that contain a high repetition of the same data patterns.

By using volume mirroring, you can convert an existing fully allocated volume to a compressed volume without disrupting access to the original volume content. The management GUI contains specific directions on converting a generic volume to a compressed volume.

Planning for compressed volumes

Before implementing compressed volumes on your system, assess the current types of data and volumes that are used on your system. Do not compress data that is already compressed as part of its normal workload. Data such as video, compressed file formats (.zip files), or compressed user productivity file formats (.pdf files) is compressed as it is saved. It is not effective to spend system resources for compression on these types of files because little additional savings can be achieved. Encrypted data also cannot be compressed.

There are two types of volumes to consider: homogeneous and heterogeneous. Homogeneous volumes are typically better candidates for compression. Homogeneous volumes contain data that was created by a single application, and these volumes store the same kind of data. Examples include database applications, email, and server virtualization data. Heterogeneous volumes contain data that was created by several different applications and contain different types of data. Because different data types populate such volumes, compressed or encrypted data is sometimes stored on them. In such cases, system resources can be spent on data that cannot be compressed. Avoid compressing heterogeneous volumes unless they contain only compressible, unencrypted data.

To determine whether current volumes on your system could be compressed for additional capacity savings, the system supports CLI commands that analyze the volumes for potential compression savings. The analyzevdisk command can be run on a single volume, and all the volumes that are on the system can be analyzed by using the analyzevdiskbysystem command. Any volumes that are created after the compression analysis completes can be evaluated individually for compression savings. Ensure that the volumes to be analyzed contain as much active data as possible, rather than volumes that are mostly empty. Analyzing active data increases accuracy and reduces the risk of analyzing old data that is already deleted but can still have traces on the device.

These commands provide the functionality of the Comprestimator Utility, which is a tool that can be downloaded to hosts to evaluate compression savings. In some environments, third-party applications or access to hosts are restricted. The commands provide similar function without these restrictions, so storage administrators can use them to quickly evaluate compression savings for volumes on the system.

After the analysis completes, you can display the results using the lsvdiskanalysis command. You can display results for all the volumes or single volumes by specifying a volume name or identifier for individual analysis.
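The analyze-then-display workflow above can be scripted. The following Python sketch only builds the command strings for the three commands that are named in this topic; it does not connect to a system. The helper names are hypothetical, and passing the volume name or identifier as a single positional argument is an assumption, so check the CLI reference for your code level before you use the generated strings.

```python
import shlex
from typing import Optional


def analyze_command(volume: Optional[str] = None) -> str:
    """Build the command that starts a compression analysis.

    With no volume, all volumes on the system are analyzed.
    The positional volume argument is an assumption.
    """
    if volume is None:
        return "analyzevdiskbysystem"
    return f"analyzevdisk {shlex.quote(volume)}"


def results_command(volume: Optional[str] = None) -> str:
    """Build the command that displays analysis results.

    With no volume, results for all volumes are requested.
    """
    if volume is None:
        return "lsvdiskanalysis"
    return f"lsvdiskanalysis {shlex.quote(volume)}"


# Example: analyze one volume, then request its results.
# "vol0" is a hypothetical volume name.
commands = [analyze_command("vol0"), results_command("vol0")]
```

The generated strings could then be sent over whatever CLI session you normally use to manage the system. Quoting the volume name with shlex.quote guards against names that contain shell metacharacters.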

There are various configuration items that affect the performance of compression on the system. To attain high compression ratios and performance on your system, ensure that the following guidelines have been met:
  • If you have only a small number (between 10 and 20) of compressed volumes, configure them on one I/O group and do not split compressed volumes between different I/O groups.
  • For larger numbers of compressed volumes on systems with more than one I/O group, distribute compressed volumes across I/O groups to ensure that access to these volumes is evenly distributed among the I/O groups.
  • Identify and use compressible data only. Different data types have different compression ratios, and it is important to determine the compressible data currently on your system. You can use tools that estimate the compressible data or use commonly known ratios for common applications and data types. Storing these data types on compressed volumes saves disk capacity and improves the benefit of using compression on your system. The following table shows the compression ratio for common applications and data types:
    Table 1. Compression ratios for common data types and applications that provide high compression ratios

    Data types/applications             Compression ratio
    Databases                           Up to 80%
    Server or desktop virtualization    Up to 75%
    Engineering data                    Up to 70%
    Email                               Up to 80%
  • Ensure that you have an extra 10% of capacity in the pools that are used for compressed volumes for the additional metadata and to provide an error margin in the compression ratio.
  • Use compression on homogeneous volumes.
  • Avoid using any client-based, file-system-based, or application-based compression together with the system compression.
  • Do not compress encrypted data.
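Two of the guidelines above lend themselves to a quick calculation: the extra 10% of pool capacity, and the placement of compressed volumes on I/O groups. The following Python sketch is one possible reading of those guidelines, not a sizing tool. The function names are hypothetical, the 10% is assumed to apply on top of the estimated compressed size, and the cutoff of 20 volumes for a "small number" is taken from the upper end of the range in the first guideline.

```python
from typing import Dict, List


def required_pool_capacity(
    logical_gib: float, expected_savings: float, headroom: float = 0.10
) -> float:
    """Estimate the pool capacity needed for compressed data.

    expected_savings is a fraction, for example 0.5 for 50% savings.
    headroom is the extra fraction for metadata and for error in the
    estimated compression ratio (assumed to apply to the compressed size).
    """
    if logical_gib < 0:
        raise ValueError("logical_gib must not be negative")
    if not 0.0 <= expected_savings < 1.0:
        raise ValueError("expected_savings must be in the range [0, 1)")
    compressed_gib = logical_gib * (1.0 - expected_savings)
    return compressed_gib * (1.0 + headroom)


def assign_io_groups(
    volumes: List[str], io_group_count: int, small_threshold: int = 20
) -> Dict[str, int]:
    """Map each compressed volume to an I/O group index.

    A small set of volumes stays on one I/O group (index 0);
    a larger set is spread round-robin across all I/O groups.
    """
    if io_group_count < 1:
        raise ValueError("io_group_count must be at least 1")
    if len(volumes) <= small_threshold:
        return {name: 0 for name in volumes}
    return {name: index % io_group_count for index, name in enumerate(volumes)}
```

For example, 1000 GiB of data with an estimated 50% savings needs about 550 GiB of pool capacity under these assumptions. Because the ratios in Table 1 are upper bounds, use a measured or conservative estimate for expected_savings rather than the table maximum.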

To use compressed volumes without affecting performance of existing non-compressed volumes in a pre-existing system, ensure that you understand the way that resources are re-allocated when the first compressed volume is created.

Compression requires dedicated hardware resources within the node canisters which are assigned or de-assigned when compression is enabled or disabled. Compression is enabled whenever the first compressed volume in an I/O group is created and is disabled when the last compressed volume is removed from the I/O group.

Because fewer hardware resources are then available to process non-compressed host-to-disk I/O, do not create compressed volumes if the CPU utilization of node canisters in an I/O group is consistently above the values in the following table. Performance might be degraded for existing non-compressed volumes in the I/O group if compressed volumes are created.

Use Monitoring > Performance in the management GUI during periods of high host workload to measure CPU utilization.

Table 2. CPU utilization of node canisters

Per node canister                Lenovo Storage V7000 Gen1    Lenovo Storage® V7000 Gen2
CPU already close to or above    25%                          50%
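The check against Table 2 can be expressed as a small helper. The following Python sketch compares CPU utilization samples that you collect during periods of high host workload with the table values. The function name and the "gen1" and "gen2" keys are hypothetical, and treating "consistently above" as "the average of the samples is at or above the threshold" is an assumption; adjust it to match how you measure utilization.

```python
from statistics import mean
from typing import Sequence

# Thresholds from Table 2, in percent CPU utilization per node canister.
CPU_THRESHOLD_PERCENT = {"gen1": 25.0, "gen2": 50.0}


def safe_to_add_compression(model: str, cpu_samples: Sequence[float]) -> bool:
    """Return True if average CPU utilization is below the Table 2 value.

    model:       "gen1" or "gen2" (hypothetical keys for the two columns).
    cpu_samples: utilization percentages sampled under high host workload.
    """
    if model not in CPU_THRESHOLD_PERCENT:
        raise ValueError(f"unknown model: {model!r}")
    if not cpu_samples:
        raise ValueError("at least one CPU sample is required")
    return mean(cpu_samples) < CPU_THRESHOLD_PERCENT[model]
```

With samples that average 28%, this sketch advises against creating compressed volumes on a Gen1 node canister but allows it on a Gen2 node canister. Because the table refers to utilization that is close to the threshold as well as above it, you might prefer to subtract a safety margin from the threshold.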

Hints and tips are available in the IBM Redbooks® Solution Guide titled Implementing IBM Real-time Compression in SAN Volume Controller and IBM Lenovo Storage V7000, and in the IBM Redpaper™ publication titled Real-time Compression in SAN Volume Controller and Lenovo Storage V7000.