Starting statistics collection

You can start the collection of cluster statistics from the Starting the Collection of Statistics panel in the management GUI.

Introduction

For each collection interval, the management GUI creates four statistics files: one for managed disks (MDisks), named Nm_stat; one for volumes and volume copies, named Nv_stat; one for nodes, named Nn_stat; and one for drives, named Nd_stat. The files are written to the /dumps/iostats directory on the node. To retrieve the statistics files from the non-configuration nodes onto the configuration node, use the svctask cpdumps command.

A maximum of 16 files of each type can be created for the node. When the 17th file is created, the oldest file for the node is overwritten.
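The 16-file cap behaves like a ring buffer: the newest collection displaces the oldest file of that type. A minimal sketch of the overwrite rule (the retention count is from the text above; the file names and list structure are illustrative only):

```python
# Sketch of per-node statistics file retention: at most 16 files of a
# type are kept, and the 17th collection overwrites the oldest one.
MAX_FILES = 16

def rotate(files, new_file, max_files=MAX_FILES):
    """Append new_file, dropping the oldest entry once the cap is exceeded."""
    files = list(files)
    files.append(new_file)
    if len(files) > max_files:
        files.pop(0)  # the oldest file is overwritten
    return files

files = []
for i in range(17):  # 17 collection intervals
    files = rotate(files, f"Nm_stat_{i}")  # illustrative names only

# After the 17th collection, the first file is gone and 16 remain.
```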

Fields

The following fields are available for user definition:
Interval
Specify the interval in minutes between the collection of statistics. You can specify 1 - 60 minutes in increments of 1 minute.

Tables

The following tables describe the information that is reported in the statistics files.

Table 1 describes the statistics collection for MDisks, for individual nodes.

Table 1. Statistics collection for individual nodes
Statistic name Description
id Indicates the name of the MDisk for which the statistics apply.
idx Indicates the identifier of the MDisk for which the statistics apply.
rb Indicates the cumulative number of blocks of data that are read (since the node has been running).
re Indicates the cumulative read external response time in milliseconds for each MDisk. The cumulative response time for disk reads is calculated by starting a timer when a SCSI read command is issued and stopped when the command completes successfully. The elapsed time is added to the cumulative counter.
ro Indicates the cumulative number of MDisk read operations that are processed (since the node has been running).
rq Indicates the cumulative read queued response time in milliseconds for each MDisk. This time is measured from above the queue of commands that wait to be sent to an MDisk because the queue depth is already full. This calculation includes the elapsed time that is taken for read commands to complete from the time they join the queue.
wb Indicates the cumulative number of blocks of data written (since the node has been running).
we Indicates the cumulative write external response time in milliseconds for each MDisk. The cumulative response time for disk writes is calculated by starting a timer when a SCSI write command is issued and stopped when the command completes successfully. The elapsed time is added to the cumulative counter.
wo Indicates the cumulative number of MDisk write operations processed (since the node has been running).
wq Indicates the cumulative write queued response time in milliseconds for each MDisk. This time is measured from above the queue of commands that wait to be sent to an MDisk because the queue depth is already full. This calculation includes the elapsed time taken for write commands to complete from the time they join the queue.
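Because counters such as ro and re are cumulative since node start, per-interval figures come from the difference between two consecutive samples. A minimal sketch (the field names are from Table 1; the sample dictionaries and their values are invented):

```python
def interval_avg_read_latency_ms(prev, curr):
    """Average external read response time (ms per operation) over one
    collection interval, from two cumulative Nm_stat samples.

    prev and curr map statistic names ('ro', 're') to cumulative values."""
    ops = curr["ro"] - prev["ro"]      # reads completed in the interval
    latency = curr["re"] - prev["re"]  # response-time ms accumulated in it
    return latency / ops if ops else 0.0

# Invented sample values: 500 reads completed, 2500 ms accumulated.
prev = {"ro": 1000, "re": 8000}
curr = {"ro": 1500, "re": 10500}
print(interval_avg_read_latency_ms(prev, curr))  # → 5.0
```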
Table 2 describes the VDisk (volume) information that is reported for individual nodes.
Note: MDisk statistics files for nodes are written to the /dumps/iostats directory on the individual node.
Table 2. Statistic collection for volumes for individual nodes
Statistic name Description
id Indicates the volume name for which the statistics apply.
idx Indicates the identifier of the volume for which the statistics apply.
rb Indicates the cumulative number of blocks of data read (since the node has been running).
rl Indicates the cumulative read response time in milliseconds for each volume. The cumulative response time for volume reads is calculated by starting a timer when a SCSI read command is received and stopped when the command completes successfully. The elapsed time is added to the cumulative counter.
rlw Indicates the worst read response time in microseconds for each volume since the last time statistics were collected. This value is reset to zero after each statistics collection sample.
ro Indicates the cumulative number of volume read operations processed (since the node has been running).
wb Indicates the cumulative number of blocks of data written (since the node has been running).
wl Indicates the cumulative write response time in milliseconds for each volume. The cumulative response time for volume writes is calculated by starting a timer when a SCSI write command is received and stopped when the command completes successfully. The elapsed time is added to the cumulative counter.
wlw Indicates the worst write response time in microseconds for each volume since the last time statistics were collected. This value is reset to zero after each statistics collection sample.
wo Indicates the cumulative number of volume write operations processed (since the node has been running).
wou Indicates the cumulative number of volume write operations that are not aligned on a 4K boundary.
xl Indicates the cumulative read and write data transfer response time in milliseconds for each volume since the last time the node was reset. When this statistic is viewed for multiple volumes and with other statistics, it can indicate if the latency is caused by the host, fabric, or the Lenovo Storage V7000.
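Since rb and wb are cumulative block counts, interval throughput can be derived the same way. A sketch, assuming one block is a 512-byte sector (the sector size stated in the notes under Table 6); the sample values are invented:

```python
BYTES_PER_BLOCK = 512  # assumption: one block = one 512-byte sector

def interval_throughput_mib_s(prev_blocks, curr_blocks, interval_s):
    """MiB/s moved during one interval, from cumulative block counters
    such as rb or wb sampled at the start and end of the interval."""
    blocks = curr_blocks - prev_blocks
    return blocks * BYTES_PER_BLOCK / (1024 * 1024) / interval_s

# Invented values: 2,097,152 blocks (1 GiB) read over a 60-second interval.
print(interval_throughput_mib_s(0, 2_097_152, 60))  # ≈ 17.07 MiB/s
```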

Table 3 describes the VDisk information related to Metro Mirror or Global Mirror relationships that is reported for individual nodes.

Table 3. Statistic collection for volumes that are used in Metro Mirror and Global Mirror relationships for individual nodes
Statistic name Description
gwl Indicates the cumulative secondary write latency in milliseconds. This statistic accumulates the cumulative secondary write latency for each volume. You can calculate the amount of time to recover from a failure based on this statistic and the gws statistic.
gwo Indicates the total number of overlapping volume writes. An overlapping write occurs when the logical block address (LBA) range of a write request collides with the LBA range of another outstanding request and the write request is still outstanding to the secondary site.
gwot Indicates the total number of fixed or unfixed overlapping writes. When all nodes in all clusters are running Lenovo Storage V7000 version 4.3.1, this records the total number of write I/O requests received by the Global Mirror feature on the primary that have overlapped. When any nodes in either cluster are running Lenovo Storage V7000 versions earlier than 4.3.1, this value does not increment.
gws Indicates the total number of write requests that have been issued to the secondary site.
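As the gwl description notes, gwl and gws together yield an average secondary write latency. A minimal sketch (the sample values are invented):

```python
def avg_secondary_write_latency_ms(gwl, gws):
    """Average Metro Mirror / Global Mirror secondary write latency in ms,
    from the cumulative latency (gwl) and secondary write count (gws)."""
    return gwl / gws if gws else 0.0

# Invented values: 12,000 ms accumulated over 4,000 secondary writes.
print(avg_secondary_write_latency_ms(12000, 4000))  # → 3.0
```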

Table 4 describes the port information that is reported for individual nodes.

Table 4. Statistic collection for node ports
Statistic name Description
bbcz Indicates the total time in microseconds for which the port had data to send but was prevented from doing so by a lack of buffer credit from the switch.
cbr Indicates the bytes received from disk controllers.
cbt Indicates the bytes transmitted to disk controllers.
cer Indicates the commands received from disk controllers.
cet Indicates the commands initiated to disk controllers.
hbr Indicates the bytes received from hosts.
hbt Indicates the bytes transmitted to hosts.
her Indicates the commands received from hosts.
het Indicates the commands initiated to hosts.
icrc Indicates the number of cyclic redundancy checks (CRCs) that are not valid.
id Indicates the port identifier for the node.
itw Indicates the number of transmission words that are not valid.
lf Indicates a link failure count.
lnbr Indicates the bytes received from other nodes in the same cluster.
lnbt Indicates the bytes transmitted to other nodes in the same cluster.
lner Indicates the commands received from other nodes in the same cluster.
lnet Indicates the commands initiated to other nodes in the same cluster.
lsi Indicates the loss-of-signal count.
lsy Indicates the loss-of-synchronization count.
pspe Indicates the primitive sequence-protocol error count.
rmbr Indicates the bytes received from other nodes in other clusters.
rmbt Indicates the bytes transmitted to other nodes in other clusters.
rmer Indicates the commands received from other nodes in other clusters.
rmet Indicates the commands initiated to other nodes in other clusters.
wwpn Indicates the worldwide port name for the node.

Table 5 describes the node information that is reported for each node.

Table 5. Statistic collection for nodes
Statistic name Description
cluster_id Indicates the unique identifier of the cluster.
cluster Indicates the name of the cluster.
cpu
busy - Indicates the total CPU average core busy milliseconds since the node was reset. This statistic reports the amount of time the processor has spent polling while waiting for work versus actually doing work. This statistic accumulates from zero.
comp - Indicates the total CPU average core busy milliseconds for compression process cores since the node was reset.
system - Indicates the total CPU average core busy milliseconds since the node was reset. This statistic reports the amount of time the processor has spent polling while waiting for work versus actually doing work. This statistic accumulates from zero. This is the same information as the cpu busy statistic and will eventually replace it.
cpu_core
id - Indicates the CPU core ID.
comp - Indicates the per-core CPU average core busy milliseconds for compression process cores since the node was reset.
system - Indicates the per-core CPU average core busy milliseconds for system process cores since the node was reset.
id Indicates the name of the node.
node_id Indicates the unique identifier for the node.
rb Indicates the number of bytes received.
re Indicates the accumulated receive latency, excluding inbound queue time. This statistic is the latency that is experienced by the node communication layer from the time that an I/O is queued to cache until the time that the cache gives completion for it.
ro Indicates the number of messages or bulk data received.
rq Indicates the accumulated receive latency, including inbound queue time. This statistic is the latency from the time that a command arrives at the node communication layer to the time that the cache completes the command.
wb Indicates the bytes sent.
we Indicates the accumulated send latency, excluding outbound queue time. This statistic is the time from when the node communication layer issues a message out onto the Fibre Channel until the node communication layer receives notification that the message has arrived.
wo Indicates the number of messages or bulk data sent.
wq Indicates the accumulated send latency, including outbound queue time. This statistic includes the entire time that data is sent. This time includes the time from when the node communication layer receives a message and waits for resources, the time to send the message to the remote node, and the time taken for the remote node to respond.
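Cumulative counters such as cpu busy can be turned into a per-interval utilization percentage. A minimal sketch, assuming busy time is reported in milliseconds as described above and the interval length is known (the sample values are invented):

```python
def cpu_busy_percent(prev_busy_ms, curr_busy_ms, interval_s, cores=1):
    """Per-interval CPU utilization (%) from the cumulative busy
    milliseconds in Table 5, averaged across 'cores' cores."""
    busy_ms = curr_busy_ms - prev_busy_ms
    return 100.0 * busy_ms / (interval_s * 1000.0 * cores)

# Invented values: 30,000 busy ms over a 60 s interval on one core → 50%.
print(cpu_busy_percent(0, 30_000, 60))  # → 50.0
```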

Table 6 describes the cache statistics collection for volumes and volume copies.

Table 6. Cache statistics collection for volumes and volume copies
The following abbreviations are used for the table columns: VC = statistics for volume cache; VCC = statistics for volume copy cache; VCP = statistics for volume cache partition; VCCP = statistics for volume copy cache partition; Node = statistics for the node overall cache; MDisk = cache statistics for MDisks.

Statistic | Acronym | VC | VCC | VCP | VCCP | Node | MDisk | Units and state
read ios | ri | Yes | Yes | | | | | ios, cumulative
write ios | wi | Yes | Yes | | | | | ios, cumulative
read misses | r | Yes | Yes | | | | | sectors, cumulative
read hits | rh | Yes | Yes | | | | | sectors, cumulative
flush_through writes | ft | Yes | Yes | | | | | sectors, cumulative
fast_write writes | fw | Yes | Yes | | | | | sectors, cumulative
write_through writes | wt | Yes | Yes | | | | | sectors, cumulative
write hits | wh | Yes | Yes | | | | | sectors, cumulative
prefetches | p | | Yes | | | | | sectors, cumulative
prefetch hits (prefetch data that is read) | ph | | Yes | | | | | sectors, cumulative
prefetch misses (prefetch pages that are discarded without any sectors read) | pm | | Yes | | | | | pages, cumulative
modified data | m | Yes | Yes | | | | | sectors, snapshot, non-cumulative
read and write cache data | v | Yes | Yes | | | | | sectors, snapshot, non-cumulative
destages | d | Yes | Yes | | | | | sectors, cumulative
fullness Average | fav | | | Yes | Yes | | | %, non-cumulative
fullness Max | fmx | | | Yes | Yes | | | %, non-cumulative
fullness Min | fmn | | | Yes | Yes | | | %, non-cumulative
Destage Target Average | dtav | | | | Yes | | Yes | IOs capped 9999, non-cumulative
Destage Target Max | dtmx | | | | Yes | | | IOs, non-cumulative
Destage Target Min | dtmn | | | | Yes | | | IOs, non-cumulative
Destage In Flight Average | dfav | | | | Yes | | Yes | IOs capped 9999, non-cumulative
Destage In Flight Max | dfmx | | | | Yes | | | IOs, non-cumulative
Destage In Flight Min | dfmn | | | | Yes | | | IOs, non-cumulative
destage latency average | dav | Yes | Yes | Yes | Yes | Yes | Yes | s capped 9999999, non-cumulative
destage latency max | dmx | | | Yes | Yes | Yes | | s capped 9999999, non-cumulative
destage latency min | dmn | | | Yes | Yes | Yes | | s capped 9999999, non-cumulative
destage count | dcn | Yes | Yes | Yes | Yes | Yes | | ios, non-cumulative
stage latency average | sav | Yes | Yes | | | Yes | | s capped 9999999, non-cumulative
stage latency max | smx | | | | | Yes | | s capped 9999999, non-cumulative
stage latency min | smn | | | | | Yes | | s capped 9999999, non-cumulative
stage count | scn | Yes | Yes | | | Yes | | ios, non-cumulative
prestage latency average | pav | | Yes | | | Yes | | s capped 9999999, non-cumulative
prestage latency max | pmx | | | | | Yes | | s capped 9999999, non-cumulative
prestage latency min | pmn | | | | | Yes | | s capped 9999999, non-cumulative
prestage count | pcn | | Yes | | | Yes | | ios, non-cumulative
Write Cache Fullness Average | wfav | | | | | Yes | | %, non-cumulative
Write Cache Fullness Max | wfmx | | | | | Yes | | %, non-cumulative
Write Cache Fullness Min | wfmn | | | | | Yes | | %, non-cumulative
Read Cache Fullness Average | rfav | | | | | Yes | | %, non-cumulative
Read Cache Fullness Max | rfmx | | | | | Yes | | %, non-cumulative
Read Cache Fullness Min | rfmn | | | | | Yes | | %, non-cumulative
Pinned Percent | pp | Yes | Yes | Yes | Yes | Yes | | % of total cache, snapshot, non-cumulative
data transfer latency average | tav | Yes | Yes | | | | | s capped 9999999, non-cumulative
Track Lock Latency (Exclusive) Average | teav | Yes | Yes | | | | | s capped 9999999, non-cumulative
Track Lock Latency (Shared) Average | tsav | Yes | Yes | | | | | s capped 9999999, non-cumulative
Cache I/O Control Block Queue Time | hpt | | | | | Yes | | Average s, non-cumulative
Cache Track Control Block Queue Time | ppt | | | | | Yes | | Average s, non-cumulative
Owner Remote Credit Queue Time | opt | | | | | Yes | | Average s, non-cumulative
Non-Owner Remote Credit Queue Time | npt | | | | | Yes | | Average s, non-cumulative
Admin Remote Credit Queue Time | apt | | | | | Yes | | Average s, non-cumulative
Cdcb Queue Time | cpt | | | | | Yes | | Average s, non-cumulative
Buffer Queue Time | bpt | | | | | Yes | | Average s, non-cumulative
Hardening Rights Queue Time | hrpt | | | | | Yes | | Average s, non-cumulative
Note: Any statistic whose name ends in av, mx, mn, or cn is not cumulative; these statistics reset every statistics interval. A statistic without one of these suffixes that reports ios or a count is a cumulative total.
  • The term pages means in units of 4096 bytes per page.
  • The term sectors means in units of 512 bytes per sector.
  • The term s means microseconds.
  • Non-cumulative means totals since the previous statistics collection interval.
  • Snapshot means the value at the end of the statistics interval (rather than an average across the interval or a peak within the interval).
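The read hits and read misses sector counters in Table 6 support a per-interval cache hit ratio. A sketch with invented sample values; the acronyms rh (read hits) and r (read misses, as it appears in the sample XML later in this topic) are assumed to count only hit and miss sectors respectively:

```python
def read_hit_ratio(prev, curr):
    """Volume cache read hit ratio over one interval, from cumulative
    'rh' (hit sectors) and 'r' (miss sectors) counters."""
    hits = curr["rh"] - prev["rh"]
    misses = curr["r"] - prev["r"]
    total = hits + misses
    return hits / total if total else 0.0

# Invented values: 900 hit sectors and 100 miss sectors in the interval.
prev = {"rh": 1000, "r": 500}
curr = {"rh": 1900, "r": 600}
print(read_hit_ratio(prev, curr))  # → 0.9
```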

Table 7 describes the statistic collection for volume cache for individual nodes.

Table 7. Statistic collection for volume cache for individual nodes. This table describes the volume cache information that is reported for individual nodes.
Statistic name Description
cm Indicates the number of sectors of modified or dirty data that are held in the cache.
ctd Indicates the total number of cache-initiated track writes (destages) that are submitted to other components as a result of a volume cache flush or destage operation.
ctds Indicates the total number of sectors that are written for cache-initiated track writes.
ctp Indicates the number of track stages that are initiated by the cache that are prestage reads.
ctps Indicates the total number of staged sectors that are initiated by the cache.
ctrh Indicates the number of total track read-cache hits on prestage or non-prestage data. For example, a single read that spans two tracks where only one of the tracks obtained a total cache hit, is counted as one track read-cache hit.
ctrhp Indicates the number of track reads received from other components, treated as cache hits on any prestaged data. For example, if a single read spans two tracks where only one of the tracks obtained a total cache hit on prestaged data, it is counted as one track read for the prestaged data. A cache hit that obtains a partial hit on prestage and non-prestage data still contributes to this value.
ctrhps Indicates the total number of sectors that are read for reads received from other components that obtained cache hits on any prestaged data.
ctrhs Indicates the total number of sectors that are read for reads received from other components that obtained total cache hits on prestage or non-prestage data.
ctr Indicates the total number of track reads received. For example, if a single read spans two tracks, it is counted as two total track reads.
ctrs Indicates the total number of sectors that are read for reads received.
ctwft Indicates the number of track writes received from other components and processed in flush-through mode.
ctwfts Indicates the total number of sectors that are written for writes that are received from other components and processed in flush-through mode.
ctwfw Indicates the number of track writes received from other components and processed in fast-write mode.
ctwfwsh Indicates the number of track writes in fast-write mode that were written in write-through mode because of a lack of memory.
ctwfwshs Indicates the total number of sectors for track writes in fast-write mode that were written in write-through mode because of a lack of memory.
ctwfws Indicates the total number of sectors that are written for writes that are received from other components and processed in fast-write mode.
ctwh Indicates the number of track writes received from other components where every sector in the track obtained a write hit on already dirty data in the cache. For a write to count as a total cache hit, the entire track write data must already be marked in the write cache as dirty.
ctwhs Indicates the total number of sectors that are received from other components where every sector in the track obtained a write hit on already dirty data in the cache.
ctw Indicates the total number of track writes received. For example, if a single write spans two tracks, it is counted as two total track writes.
ctws Indicates the total number of sectors that are written for writes that are received from components.
ctwwt Indicates the number of track writes received from other components and processed in write-through mode.
ctwwts Indicates the total number of sectors that are written for writes that are received from other components and processed in write-through mode.
cv Indicates the number of sectors of read and write cache data that is held in the cache.
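Table 7's track counters also support a track-level read hit ratio: ctrh total hits out of ctr track reads received. A sketch; the sample values below are taken from the XML example later in this topic:

```python
def track_read_hit_ratio(stats):
    """Fraction of received track reads (ctr) that were total cache
    hits (ctrh), per Table 7's volume cache counters."""
    return stats["ctrh"] / stats["ctr"] if stats["ctr"] else 0.0

# Values from the Nv_stat XML example: ctrh="123056", ctr="1628296".
print(track_read_hit_ratio({"ctrh": 123056, "ctr": 1628296}))
```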

Table 8 describes the XML statistics specific to an IP Partnership port.

Table 8. XML statistics for an IP Partnership port
Statistic name Description
ipbz Indicates the average size (in bytes) of data that is being submitted to the IP partnership driver since the last statistics collection period.
ipre Indicates the bytes retransmitted to other nodes in other clusters by the IP partnership driver.
iprt Indicates the average round-trip time in microseconds for the IP partnership link since the last statistics collection period.
iprx Indicates the bytes received from other nodes in other clusters by the IP partnership driver.
ipsz Indicates the average size (in bytes) of data that is being transmitted by the IP partnership driver since the last statistics collection period.
iptx Indicates the bytes transmitted to other nodes in other clusters by the IP partnership driver.
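A lossy IP partnership link can be flagged by comparing retransmitted bytes to transmitted bytes. A minimal sketch using the ipre and iptx counters from Table 8 (the sample values are invented):

```python
def retransmit_ratio(ipre, iptx):
    """Fraction of transmitted bytes that were retransmissions, from
    the cumulative ipre (retransmitted) and iptx (transmitted) counters."""
    return ipre / iptx if iptx else 0.0

# Invented values: 1 MiB retransmitted out of 100 MiB transmitted → 1%.
print(retransmit_ratio(1_048_576, 104_857_600))  # → 0.01
```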

Actions

The following actions are available to the user:

OK
Click this button to change statistics collection.
Cancel
Click this button to exit the panel without changing statistics collection.
XML formatting information
The XML statistics file has a nested structure, as seen in this raw XML from the volume (Nv_stat) statistics. Notice that attribute names in different sections of the XML are similar, but they refer to different parts of the VDisk.
<vdsk idx="0"
ctrs="213694394" ctps="0" ctrhs="2416029" ctrhps="0"
ctds="152474234" ctwfts="9635" ctwwts="0" ctwfws="152468611"
ctwhs="9117" ctws="152478246" ctr="1628296" ctw="3241448"
ctp="0" ctrh="123056" ctrhp="0" ctd="1172772"
ctwft="200" ctwwt="0" ctwfw="3241248" ctwfwsh="0"
ctwfwshs="0" ctwh="538" cm="13768758912876544" cv="13874234719731712"
gwot="0" gwo="0" gws="0" gwl="0"
id="Master_iogrp0_1"
ro="0" wo="0" rb="0" wb="0"
rl="0" wl="0" rlw="0" wlw="0" xl="0">
Vdisk/Volume statistics
<ca r="0" rh="0" d="0" ft="0"
wt="0" fw="0" wh="0" ri="0"
wi="0" dav="0" dcn="0" pav="0" pcn="0" teav="0"  tsav="0"  tav="0"
pp="0"/>
<cpy idx="0">
volume copy statistics
<ca r="0" p="0" rh="0" ph="0"
d="0" ft="0" wt="0" fw="0"
wh="0" pm="0" ri="0" wi="0"
dav="0" dcn="0" sav="0" scn="0"
pav="0" pcn="0" teav="0"  tsav="0"
tav="0"  pp="0"/>
</cpy>
</vdsk>
The <cpy idx="0"> element means that the statistics are in the volume copy section of the VDisk, whereas the statistics shown under Vdisk/Volume statistics are outside of the cpy section and therefore refer to the VDisk/volume itself.
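The distinction between volume-level and copy-level <ca> elements can be handled by walking the element tree. A sketch with Python's xml.etree.ElementTree, using a simplified, namespace-free fragment modeled on the example above (the attribute values here are invented; real Nv_stat files carry more attributes):

```python
import xml.etree.ElementTree as ET

# Simplified Nv_stat fragment modeled on the example above.
doc = """<vdsk idx="0" id="Master_iogrp0_1" ro="0" wo="0">
  <ca r="0" rh="7" ri="3"/>
  <cpy idx="0">
    <ca r="0" rh="2" p="0" ri="1"/>
  </cpy>
</vdsk>"""

root = ET.fromstring(doc)

# <ca> directly under <vdsk> holds the volume cache statistics ...
volume_ca = root.find("ca")
# ... while <ca> under <cpy> holds the statistics for that volume copy.
copy_ca = root.find("cpy/ca")

print(volume_ca.get("rh"), copy_ca.get("rh"))  # → 7 2
```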
Similarly for the volume cache statistics for node and partitions:
<uca><ca dav="18726" dcn="1502531" dmx="749846" dmn="89"
sav="20868" scn="2833391" smx="980941" smn="3"
pav="0" pcn="0" pmx="0" pmn="0"
wfav="0" wfmx="2" wfmn="0"
rfav="0" rfmx="1" rfmn="0"
pp="0"
hpt="0" ppt="0" opt="0" npt="0"
apt="0" cpt="0" bpt="0" hrpt="0"
/><partition id="0"><ca dav="18726" dcn="1502531" dmx="749846" dmn="89"
fav="0" fmx="2" fmn="0"
dfav="0" dfmx="0" dfmn="0"
dtav="0" dtmx="0" dtmn="0"
pp="0"/></partition>
This output describes the volume cache node statistics; within <partition id="0">, the statistics are reported for cache partition 0.

Replacing <uca> with <lca> means that the statistics are for volume copy cache partition 0.