Starting statistics collection: Panel help

You can start the collection of cluster statistics from the Starting the Collection of Statistics panel in the management GUI.

Introduction

For each collection interval, the management GUI creates four statistics files: one for managed disks (MDisks), named Nm_stat; one for volumes and volume copies, named Nv_stat; one for nodes, named Nn_stat; and one for drives, named Nd_stat. The files are written to the /dumps/iostats directory on the node. To retrieve the statistics files from the non-configuration nodes onto the configuration node, use the svctask cpdumps command.

A maximum of 16 files of each type can be created for the node. When the 17th file is created, the oldest file for the node is overwritten.

Fields

The following fields are available for user definition:

Interval: Specify the interval in minutes between the collection of statistics. You can specify 1 - 60 minutes in increments of 1 minute.

Tables

The following tables describe the information that is reported for individual nodes and volumes.

Table 1 describes the statistics collection for MDisks, for individual nodes.

Table 1. Statistics collection for individual nodes
Statistic name	Description
id	Indicates the name of the MDisk for which the statistics apply.
idx	Indicates the identifier of the MDisk for which the statistics apply.
rb	Indicates the cumulative number of blocks of data that is read (since the node has been running).
re	Indicates the cumulative read external response time in milliseconds for each MDisk. The cumulative response time for disk reads is calculated by starting a timer when a SCSI read command is issued and stopping it when the command completes successfully. The elapsed time is added to the cumulative counter.
ro	Indicates the cumulative number of MDisk read operations that are processed (since the node has been running).
rq	Indicates the cumulative read queued response time in milliseconds for each MDisk. This response is measured from above the queue of commands to be sent to an MDisk because the queue depth is already full. This calculation includes the elapsed time that is taken for read commands to complete from the time they join the queue.
wb	Indicates the cumulative number of blocks of data written (since the node has been running).
we	Indicates the cumulative write external response time in milliseconds for each MDisk. The cumulative response time for disk writes is calculated by starting a timer when a SCSI write command is issued and stopping it when the command completes successfully. The elapsed time is added to the cumulative counter.
wo	Indicates the cumulative number of MDisk write operations processed (since the node has been running).
wq	Indicates the cumulative write queued response time in milliseconds for each MDisk. This is measured from above the queue of commands to be sent to an MDisk because the queue depth is already full. This calculation includes the elapsed time taken for write commands to complete from the time they join the queue.
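
Because the counters in Table 1 are cumulative, a per-interval figure comes from the difference between two consecutive samples. The following Python sketch shows one way to derive an average response time per operation. The function name, the dictionary layout, and the sample values are assumptions made for illustration; they are not part of the product.

```python
def avg_response_ms(prev, curr, time_key, ops_key):
    """Average response time per operation (ms) between two samples.

    prev and curr are dicts of cumulative counters taken from two
    consecutive statistics files for the same MDisk. Returns None when
    no operations completed in the interval or when a counter went
    backwards (for example, after a node restart).
    """
    d_time = curr[time_key] - prev[time_key]
    d_ops = curr[ops_key] - prev[ops_key]
    if d_ops <= 0 or d_time < 0:
        return None
    return d_time / d_ops


# Hypothetical samples; real values come from the Nm_stat files.
prev = {"ro": 1000, "re": 4000, "rq": 5000}
curr = {"ro": 1500, "re": 6500, "rq": 8000}

external = avg_response_ms(prev, curr, "re", "ro")  # external read time per op
queued = avg_response_ms(prev, curr, "rq", "ro")    # queued read time per op
print(external, queued)
```

The same calculation applies to the write counters by passing we or wq with wo.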

Table 2 describes the VDisk (volume) information that is reported for individual nodes.

Note: MDisk statistics files for nodes are written to the /dumps/iostats directory on the individual node.

Table 2. Statistic collection for volumes for individual nodes
Statistic name	Description
id	Indicates the volume name for which the statistics apply.
idx	Indicates the identifier of the volume for which the statistics apply.
rb	Indicates the cumulative number of blocks of data read (since the node has been running).
rl	Indicates the cumulative read response time in milliseconds for each volume. The cumulative response time for volume reads is calculated by starting a timer when a SCSI read command is received and stopping it when the command completes successfully. The elapsed time is added to the cumulative counter.
rlw	Indicates the worst read response time in microseconds for each volume since the last time statistics were collected. This value is reset to zero after each statistics collection sample.
ro	Indicates the cumulative number of volume read operations processed (since the node has been running).
wb	Indicates the cumulative number of blocks of data written (since the node has been running).
wl	Indicates the cumulative write response time in milliseconds for each volume. The cumulative response time for volume writes is calculated by starting a timer when a SCSI write command is received and stopping it when the command completes successfully. The elapsed time is added to the cumulative counter.
wlw	Indicates the worst write response time in microseconds for each volume since the last time statistics were collected. This value is reset to zero after each statistics collection sample.
wo	Indicates the cumulative number of volume write operations processed (since the node has been running).
wou	Indicates the cumulative number of volume write operations that are not aligned on a 4K boundary.
xl	Indicates the cumulative read and write data transfer response time in milliseconds for each volume since the last time the node was reset. When this statistic is viewed for multiple volumes and with other statistics, it can indicate if the latency is caused by the host, fabric, or the Lenovo Storage V7000.

Table 3 describes the VDisk information related to Metro Mirror or Global Mirror relationships that is reported for individual nodes.

Table 3. Statistic collection for volumes that are used in Metro Mirror and Global Mirror relationships for individual nodes
Statistic name	Description
gwl	Indicates cumulative secondary write latency in milliseconds. This statistic accumulates the cumulative secondary write latency for each volume. You can calculate the amount of time to recover from a failure based on this statistic and the gws statistic.
gwo	Indicates the total number of overlapping volume writes. An overlapping write is when the logical block address (LBA) range of a write request collides with another outstanding request to the same LBA range and the write request is still outstanding to the secondary site.
gwot	Indicates the total number of fixed or unfixed overlapping writes. When all nodes in all clusters are running Lenovo Storage V7000 version 4.3.1, this records the total number of write I/O requests received by the Global Mirror feature on the primary that have overlapped. When any nodes in either cluster are running Lenovo Storage V7000 versions earlier than 4.3.1, this value does not increment.
gws	Indicates the total number of write requests that have been issued to the secondary site.

Table 4 describes the port information that is reported for individual nodes.

Table 4. Statistic collection for node ports
Statistic name	Description
bbcz	Indicates the total time in microseconds for which the port had data to send but was prevented from doing so by a lack of buffer credit from the switch.
cbr	Indicates the bytes received from controllers.
cbt	Indicates the bytes transmitted to disk controllers.
cer	Indicates the commands received from disk controllers.
cet	Indicates the commands initiated to disk controllers.
hbr	Indicates the bytes received from hosts.
hbt	Indicates the bytes transmitted to hosts.
her	Indicates the commands received from hosts.
het	Indicates the commands initiated to hosts.
icrc	Indicates the number of CRCs that are not valid.
id	Indicates the port identifier for the node.
itw	Indicates the number of transmission word counts that are not valid.
lf	Indicates a link failure count.
lnbr	Indicates the bytes received from other nodes in the same cluster.
lnbt	Indicates the bytes transmitted to other nodes in the same cluster.
lner	Indicates the commands received from other nodes in the same cluster.
lnet	Indicates the commands initiated to other nodes in the same cluster.
lsi	Indicates the loss-of-signal count.
lsy	Indicates the loss-of-synchronization count.
pspe	Indicates the primitive sequence-protocol error count.
rmbr	Indicates the bytes received from other nodes in the other clusters.
rmbt	Indicates the bytes transmitted to other nodes in the other clusters.
rmer	Indicates the commands received from other nodes in the other clusters.
rmet	Indicates the commands initiated to other nodes in the other clusters.
wwpn	Indicates the worldwide port name for the node.

Table 5 describes the node information that is reported for each node.

Table 5. Statistic collection for nodes
Statistic name	Description
cluster_id	Indicates the identifier of the cluster.
cluster	Indicates the name of the cluster.
cpu	busy - Indicates the total CPU average core busy milliseconds since the node was reset. This statistic reports the amount of time the processor has spent polling while waiting for work versus actually doing work. This statistic accumulates from zero.
	comp - Indicates the total CPU average core busy milliseconds for compression process cores since the node was reset.
	system - Indicates the total CPU average core busy milliseconds since the node was reset. This statistic reports the amount of time the processor has spent polling while waiting for work versus actually doing work. This statistic accumulates from zero. This is the same information that the `cpu busy` statistic provides, and it will eventually replace the `cpu busy` statistic.
cpu_core	id - Indicates the CPU core ID.
	comp - Indicates the per-core CPU average core busy milliseconds for compression process cores since the node was reset.
	system - Indicates the per-core CPU average core busy milliseconds for system process cores since the node was reset.
id	Indicates the name of the node.
node_id	Indicates the unique identifier for the node.
rb	Indicates the number of bytes received.
re	Indicates the accumulated receive latency, excluding inbound queue time. This statistic is the latency that is experienced by the node communication layer from the time that an I/O is queued to cache until the time that the cache gives completion for it.
ro	Indicates the number of messages or bulk data received.
rq	Indicates the accumulated receive latency, including inbound queue time. This statistic is the latency from the time that a command arrives at the node communication layer to the time that the cache completes the command.
wb	Indicates the bytes sent.
we	Indicates the accumulated send latency, excluding outbound queue time. This statistic is the time from when the node communication layer issues a message out onto the Fibre Channel until the node communication layer receives notification that the message has arrived.
wo	Indicates the number of messages or bulk data sent.
wq	Indicates the accumulated send latency, including outbound queue time. This statistic includes the entire time that data is sent. This time includes the time from when the node communication layer receives a message and waits for resources, the time to send the message to the remote node, and the time taken for the remote node to respond.
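
The cpu busy and cpu system counters accumulate busy milliseconds, so a utilization figure for one interval can be estimated from two consecutive samples. The following Python sketch assumes that the counter is an average across cores and that the elapsed time between samples equals the configured collection interval; the function name and the sample values are hypothetical.

```python
def cpu_busy_percent(prev_busy_ms, curr_busy_ms, interval_seconds):
    """Estimate the average CPU busy percentage over one interval.

    prev_busy_ms and curr_busy_ms are consecutive readings of the
    cumulative busy counter. Returns None if the counter went backwards,
    which is treated here as a sign that the node was reset.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta_ms = curr_busy_ms - prev_busy_ms
    if delta_ms < 0:
        return None
    elapsed_ms = interval_seconds * 1000
    # Clamp to 100 in case the samples are slightly further apart
    # than the nominal interval.
    return min(100.0, 100.0 * delta_ms / elapsed_ms)


# Hypothetical readings taken one minute apart.
print(cpu_busy_percent(120000, 138000, 60))
```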

Table 6 describes the statistics collection for volumes.

Table 6. Cache statistics collection for volumes and volume copies
Statistic	Acronym	Statistics for volume cache	Statistics for volume copy cache	Statistics for volume cache partition	Statistics for volume copy cache partition	Statistics for the Node Overall Cache	Cache statistics for mdisks	Units and state
read ios	ri	Yes	Yes					ios, cumulative
write ios	wi	Yes	Yes					ios, cumulative
read misses	r	Yes	Yes					sectors, cumulative
read hits	rh	Yes	Yes					sectors, cumulative
flush_through writes	ft	Yes	Yes					sectors, cumulative
fast_write writes	fw	Yes	Yes					sectors, cumulative
write_through writes	wt	Yes	Yes					sectors, cumulative
write hits	wh	Yes	Yes					sectors, cumulative
prefetches	p		Yes					sectors, cumulative
prefetch hits (prefetch data that is read)	ph		Yes					sectors, cumulative
prefetch misses (prefetch pages that are discarded without any sectors read)	pm		Yes					pages, cumulative
modified data	m	Yes	Yes					sectors, snapshot, non-cumulative
read and write cache data	v	Yes	Yes					sectors, snapshot, non-cumulative
destages	d	Yes	Yes					sectors, cumulative
fullness Average	fav			Yes	Yes			%, non-cumulative
fullness Max	fmx			Yes	Yes			%, non-cumulative
fullness Min	fmn			Yes	Yes			%, non-cumulative
Destage Target Average	dtav				Yes		Yes	IOs capped 9999, non-cumulative
Destage Target Max	dtmx				Yes			IOs, non-cumulative
Destage Target Min	dtmn				Yes			IOs, non-cumulative
Destage In Flight Average	dfav				Yes		Yes	IOs capped 9999, non-cumulative
Destage In Flight Max	dfmx				Yes			IOs, non-cumulative
Destage In Flight Min	dfmn				Yes			IOs, non-cumulative
destage latency average	dav	Yes	Yes	Yes	Yes	Yes	Yes	µs capped 9999999, non-cumulative
destage latency max	dmx			Yes	Yes	Yes		µs capped 9999999, non-cumulative
destage latency min	dmn			Yes	Yes	Yes		µs capped 9999999, non-cumulative
destage count	dcn	Yes	Yes	Yes	Yes	Yes		ios, non-cumulative
stage latency average	sav	Yes	Yes			Yes		µs capped 9999999, non-cumulative
stage latency max	smx					Yes		µs capped 9999999, non-cumulative
stage latency min	smn					Yes		µs capped 9999999, non-cumulative
stage count	scn	Yes	Yes			Yes		ios, non-cumulative
prestage latency average	pav		Yes			Yes		µs capped 9999999, non-cumulative
prestage latency max	pmx					Yes		µs capped 9999999, non-cumulative
prestage latency min	pmn					Yes		µs capped 9999999, non-cumulative
prestage count	pcn		Yes			Yes		ios, non-cumulative
Write Cache Fullness Average	wfav					Yes		%, non-cumulative
Write Cache Fullness Max	wfmx					Yes		%, non-cumulative
Write Cache Fullness Min	wfmn					Yes		%, non-cumulative
Read Cache Fullness Average	rfav					Yes		%, non-cumulative
Read Cache Fullness Max	rfmx					Yes		%, non-cumulative
Read Cache Fullness Min	rfmn					Yes		%, non-cumulative
Pinned Percent	pp	Yes	Yes	Yes	Yes	Yes		% of total cache snapshot, non-cumulative
data transfer latency average	tav	Yes	Yes					µs capped 9999999, non-cumulative
Track Lock Latency (Exclusive) Average	teav	Yes	Yes					µs capped 9999999, non-cumulative
Track Lock Latency (Shared) Average	tsav	Yes	Yes					µs capped 9999999, non-cumulative
Cache I/O Control Block Queue Time	hpt					Yes		Average µs, non-cumulative
Cache Track Control Block Queue Time	ppt					Yes		Average µs, non-cumulative
Owner Remote Credit Queue Time	opt					Yes		Average µs, non-cumulative
Non-Owner Remote Credit Queue Time	npt					Yes		Average µs, non-cumulative
Admin Remote Credit Queue Time	apt					Yes		Average µs, non-cumulative
Cdcb Queue Time	cpt					Yes		Average µs, non-cumulative
Buffer Queue Time	bpt					Yes		Average µs, non-cumulative
Hardening Rights Queue Time	hrpt					Yes		Average µs, non-cumulative

Note: Any statistic whose name ends in av, mx, mn, or cn is not cumulative. These statistics reset every statistics interval. Conversely, if a statistic's name does not end in av, mx, mn, or cn and its unit is IOs or a count, the field contains a cumulative total.

The term pages means in units of 4096 bytes per page.
The term sectors means in units of 512 bytes per sector.
The term µs means microseconds.
Non-cumulative means totals since the previous statistics collection interval.
Snapshot means the value at the end of the statistics interval (rather than an average across the interval or a peak within the interval).
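
The unit definitions above are enough to convert the cache counters into more familiar figures. The following Python sketch converts sector and page counts to bytes and estimates a read hit percentage for one interval from the cumulative rh and r counters. It assumes that read hits plus read misses account for all sectors read through the cache; the helper names and the sample values are hypothetical.

```python
SECTOR_BYTES = 512   # bytes per sector, as defined above
PAGE_BYTES = 4096    # bytes per page, as defined above


def sectors_to_bytes(sectors):
    """Convert a sector count to bytes."""
    return sectors * SECTOR_BYTES


def pages_to_bytes(pages):
    """Convert a page count to bytes."""
    return pages * PAGE_BYTES


def read_hit_percent(prev, curr):
    """Read hit percentage between two samples of cumulative counters.

    prev and curr are dicts holding rh (read hits) and r (read misses),
    both in sectors. Returns None if nothing was read in the interval.
    """
    hits = curr["rh"] - prev["rh"]
    misses = curr["r"] - prev["r"]
    total = hits + misses
    if total <= 0:
        return None
    return 100.0 * hits / total


# Hypothetical counter values from two consecutive collections.
prev = {"rh": 1000, "r": 3000}
curr = {"rh": 4000, "r": 4000}
print(read_hit_percent(prev, curr))
print(sectors_to_bytes(2048), pages_to_bytes(2))
```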

Table 7 describes the statistic collection for volume cache per individual nodes.

Table 7. Statistic collection for volume cache per individual nodes. This table describes the volume cache information that is reported for individual nodes.
Statistic name	Description
cm	Indicates the number of sectors of modified or dirty data that are held in the cache.
ctd	Indicates the total number of cache-initiated track writes that are submitted to other components as a result of a volume cache flush or destage operation.
ctds	Indicates the total number of sectors that are written for cache-initiated track writes.
ctp	Indicates the number of track stages that are initiated by the cache that are prestage reads.
ctps	Indicates the total number of staged sectors that are initiated by the cache.
ctrh	Indicates the number of total track read-cache hits on prestage or non-prestage data. For example, a single read that spans two tracks where only one of the tracks obtained a total cache hit, is counted as one track read-cache hit.
ctrhp	Indicates the number of track reads received from other components, treated as cache hits on any prestaged data. For example, if a single read spans two tracks where only one of the tracks obtained a total cache hit on prestaged data, it is counted as one track read for the prestaged data. A cache hit that obtains a partial hit on prestage and non-prestage data still contributes to this value.
ctrhps	Indicates the total number of sectors that are read for reads received from other components that obtained cache hits on any prestaged data.
ctrhs	Indicates the total number of sectors that are read for reads received from other components that obtained total cache hits on prestage or non-prestage data.
ctr	Indicates the total number of track reads received. For example, if a single read spans two tracks, it is counted as two total track reads.
ctrs	Indicates the total number of sectors that are read for reads received.
ctwft	Indicates the number of track writes received from other components and processed in flush through write mode.
ctwfts	Indicates the total number of sectors that are written for writes that are received from other components and processed in flush through write mode.
ctwfw	Indicates the number of track writes received from other components and processed in fast-write mode.
ctwfwsh	Indicates the track writes in fast-write mode that were written in write-through mode because of the lack of memory.
ctwfwshs	Indicates the total number of sectors for track writes in fast-write mode that were written in write-through mode because of the lack of memory.
ctwfws	Indicates the total number of sectors that are written for writes that are received from other components and processed in fast-write mode.
ctwh	Indicates the number of track writes received from other components where every sector in the track obtained a write hit on already dirty data in the cache. For a write to count as a total cache hit, the entire track write data must already be marked in the write cache as dirty.
ctwhs	Indicates the total number of sectors that are received from other components where every sector in the track obtained a write hit on already dirty data in the cache.
ctw	Indicates the total number of track writes received. For example, if a single write spans two tracks, it is counted as two total track writes.
ctws	Indicates the total number of sectors that are written for writes that are received from components.
ctwwt	Indicates the number of track writes received from other components and processed in write through write mode.
ctwwts	Indicates the total number of sectors that are written for writes that are received from other components and processed in write through write mode.
cv	Indicates the number of sectors of read and write cache data that is held in the cache.

Table 8 describes the XML statistics specific to an IP Partnership port.

Table 8. XML statistics for an IP Partnership port
Statistic name	Description
ipbz	Indicates the average size (in bytes) of data that are being submitted to the IP partnership driver since the last statistics collection period.
ipre	Indicates the bytes retransmitted to other nodes in other clusters by the IP partnership driver.
iprt	Indicates the average round-trip time in microseconds for the IP partnership link since the last statistics collection period.
iprx	Indicates the bytes received from other nodes in other clusters by the IP partnership driver.
ipsz	Indicates the average size (in bytes) of data that are being transmitted by the IP partnership driver since the last statistics collection period.
iptx	Indicates the bytes transmitted to other nodes in other clusters by the IP partnership driver.

Actions

The following actions are available to the user:

OK: Click this button to change statistic collection.
Cancel: Click this button to exit the panel without changing statistic collection.

XML formatting information

The XML is more complicated now, as shown in this raw XML from the volume (Nv_stat) statistics file. The attribute names are similar, but because they appear in different sections of the XML, they refer to different parts of the VDisk.

<vdsk idx="0"
ctrs="213694394" ctps="0" ctrhs="2416029" ctrhps="0"
ctds="152474234" ctwfts="9635" ctwwts="0" ctwfws="152468611"
ctwhs="9117" ctws="152478246" ctr="1628296" ctw="3241448"
ctp="0" ctrh="123056" ctrhp="0" ctd="1172772"
ctwft="200" ctwwt="0" ctwfw="3241248" ctwfwsh="0"
ctwfwshs="0" ctwh="538" cm="13768758912876544" cv="13874234719731712"
gwot="0" gwo="0" gws="0" gwl="0"

id="Master_iogrp0_1"
ro="0" wo="0" rb="0" wb="0"
rl="0" wl="0" rlw="0" wlw="0" xl="0">
Vdisk/Volume statistics
<ca r="0" rh="0" d="0" ft="0"
wt="0" fw="0" wh="0" ri="0"
wi="0" dav="0" dcn="0" pav="0" pcn="0" teav="0"  tsav="0"  tav="0"
pp="0"/>

<cpy idx="0">

volume copy statistics
<ca r="0" p="0" rh="0" ph="0"
d="0" ft="0" wt="0" fw="0"
wh="0" pm="0" ri="0" wi="0"
dav="0" dcn="0" sav="0" scn="0"
pav="0" pcn="0" teav="0"  tsav="0"
tav="0"  pp="0"/>

</cpy>
</vdsk>

The <cpy idx="0"> element means that the statistics it contains are in the volume copy section of the VDisk, whereas the statistics shown under Vdisk/Volume statistics are outside the cpy idx section and therefore refer to the VDisk/volume.
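
A script that reads these files has to keep the two ca elements apart, because they share attribute names. The following Python sketch uses the standard xml.etree.ElementTree module on a trimmed, hypothetical sample (the attribute values are invented, and the annotation lines from the excerpt above are omitted). Real statistics files may declare an XML namespace, in which case the tag lookups need the namespace as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample; real files carry many more attributes.
SAMPLE = """
<vdsk idx="0" id="Master_iogrp0_1">
  <ca r="10" rh="30" dav="0"/>
  <cpy idx="0">
    <ca r="4" rh="12" p="2" dav="0"/>
  </cpy>
</vdsk>
"""


def split_cache_stats(vdsk):
    """Return (volume_stats, {copy_idx: copy_stats}) for one vdsk element."""
    # find("ca") matches direct children only, so it returns the
    # volume-level element and never the one nested inside cpy.
    volume_ca = vdsk.find("ca")
    volume_stats = dict(volume_ca.attrib) if volume_ca is not None else {}
    copy_stats = {}
    for cpy in vdsk.findall("cpy"):
        ca = cpy.find("ca")
        copy_stats[cpy.get("idx")] = dict(ca.attrib) if ca is not None else {}
    return volume_stats, copy_stats


volume_stats, copy_stats = split_cache_stats(ET.fromstring(SAMPLE.strip()))
print(volume_stats["rh"])     # volume-level read hits
print(copy_stats["0"]["rh"])  # read hits for volume copy 0
```

Attribute values are returned as strings, so convert them with int() before doing arithmetic.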

Similarly for the volume cache statistics for node and partitions:

<uca><ca dav="18726" dcn="1502531" dmx="749846" dmn="89"
sav="20868" scn="2833391" smx="980941" smn="3"
pav="0" pcn="0" pmx="0" pmn="0"
wfav="0" wfmx="2" wfmn="0"
rfav="0" rfmx="1" rfmn="0"
pp="0"
hpt="0" ppt="0" opt="0" npt="0"
apt="0" cpt="0" bpt="0" hrpt="0"
/><partition id="0"><ca dav="18726" dcn="1502531" dmx="749846" dmn="89"
fav="0" fmx="2" fmn="0"
dfav="0" dfmx="0" dfmn="0"
dtav="0" dtmx="0" dtmn="0"
pp="0"/></partition>

This output shows the volume cache statistics for the node. The statistics inside <partition id="0"> are for partition 0.

Replacing <uca> with <lca> means that the statistics are for volume copy cache partition 0.
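
The distinction between <uca> and <lca>, and between node-level and partition-level ca elements, can be handled in the same way. The following Python sketch is a minimal illustration under stated assumptions: the <node> wrapper element, the <lca> values, and the function name are hypothetical, and the sample is trimmed to a few attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The <node> wrapper and the lca values are invented.
SAMPLE = """
<node>
  <uca>
    <ca dav="18726" dcn="1502531"/>
    <partition id="0"><ca dav="18726" fav="0" fmx="2"/></partition>
  </uca>
  <lca>
    <ca dav="5" dcn="7"/>
    <partition id="0"><ca dav="5" fav="1" fmx="3"/></partition>
  </lca>
</node>
"""

# uca holds volume cache statistics, lca holds volume copy cache statistics.
LABELS = {"uca": "volume cache", "lca": "volume copy cache"}


def to_ints(element):
    """Convert the attributes of an element to a dict of integers."""
    return {name: int(value) for name, value in element.attrib.items()}


def summarize_cache(root):
    """Group node-level and per-partition cache statistics by cache type."""
    summary = {}
    for tag, label in LABELS.items():
        section = root.find(tag)
        if section is None:
            continue
        node_ca = section.find("ca")  # direct child: node-level statistics
        summary[label] = {
            "node": to_ints(node_ca) if node_ca is not None else {},
            "partitions": {
                part.get("id"): to_ints(part.find("ca"))
                for part in section.findall("partition")
                if part.find("ca") is not None
            },
        }
    return summary


summary = summarize_cache(ET.fromstring(SAMPLE.strip()))
print(summary["volume cache"]["node"]["dcn"])
print(summary["volume copy cache"]["partitions"]["0"]["fmx"])
```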