SCSI event reporting

Nodes can notify their hosts of events for SCSI commands that are issued.

SCSI status

Some events are part of the SCSI architecture and are handled by the host application or device drivers without reporting an event. Some events, such as read and write I/O events and events that are associated with the loss of nodes or loss of access to backend devices, cause application I/O to fail. To help troubleshoot these events, SCSI commands are returned with the Check Condition status and a 32-bit event identifier is included with the sense information. The identifier relates to a specific event in the event log.

If the host application or device driver captures and stores this information, you can relate the application failure to the event log.

Table 1 describes the SCSI status and codes that are returned by the nodes.

Table 1. SCSI status
Status	Code	Description
Good	00h	The command was successful.
Check condition	02h	The command failed and sense data is available.
Condition met	04h	N/A
Busy	08h	An Auto-Contingent Allegiance condition exists and the command specified NACA=0.
Intermediate	10h	N/A
Intermediate - condition met	14h	N/A
Reservation conflict	18h	Returned as specified in SPC-2 and SAM-2 where a reserve or persistent reserve condition exists.
Task set full	28h	The initiator has at least one task queued for that LUN on this port.
ACA active	30h	This code is reported as specified in SAM-2.
Task aborted	40h	This code is returned if TAS is set in the control mode page 0Ch. The node has a default setting of TAS=0, which cannot be changed; therefore, the node does not report this status.
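
A host-side tool that logs failed commands might translate the status byte by using the values in Table 1. The following Python lines are a minimal sketch of that lookup, not part of any product interface; the names SCSI_STATUS, describe_status, and has_sense_data are hypothetical.

```python
# Status codes copied from Table 1. The dictionary and function names
# are illustrative assumptions, not part of any product API.
SCSI_STATUS = {
    0x00: "Good",
    0x02: "Check condition",
    0x04: "Condition met",
    0x08: "Busy",
    0x10: "Intermediate",
    0x14: "Intermediate - condition met",
    0x18: "Reservation conflict",
    0x28: "Task set full",
    0x30: "ACA active",
    0x40: "Task aborted",
}


def describe_status(status_byte: int) -> str:
    """Return the Table 1 name for a status byte, or mark it as unknown."""
    name = SCSI_STATUS.get(status_byte)
    if name is None:
        return f"Unknown status {status_byte:02X}h"
    return f"{name} ({status_byte:02X}h)"


def has_sense_data(status_byte: int) -> bool:
    """Per Table 1, sense data is available with Check condition (02h)."""
    return status_byte == 0x02
```

For example, describe_status(0x18) returns "Reservation conflict (18h)", and has_sense_data(0x02) is true, which is the case in which the sense information is worth capturing.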

SCSI Sense

Nodes notify the hosts of events on SCSI commands. Table 2 defines the SCSI sense keys, codes, and qualifiers that are returned by the nodes.

Table 2. SCSI sense keys, codes, and qualifiers
Key	Code	Qualifier	Definition	Description
2h	04h	01h	Not Ready. The logical unit is in the process of becoming ready.	The node lost sight of the system and cannot perform I/O operations. The additional sense does not have additional information.
2h	04h	0Ch	Not Ready. The target port is in the state of unavailable.	One of the following conditions applies: (1) The node lost sight of the system and cannot perform I/O operations. The additional sense does not have additional information. (2) The node is in contact with the system but cannot perform I/O operations to the specified logical unit because of either a loss of connectivity to the backend controller or some algorithmic problem. This sense is returned for offline volumes.
3h	00h	00h	Medium event	This sense is returned only for read or write I/Os. The I/O suffered an event at a specific LBA within its scope. The location of the event is reported within the sense data. The additional sense also includes a reason code that relates the event to the corresponding event log entry, for example, a RAID controller event or a migrated medium event.
4h	08h	00h	Hardware event. A command to logical unit communication failure has occurred.	The I/O suffered an event that is associated with an I/O event that is returned by a RAID controller. The additional sense includes a reason code that points to the sense data that is returned by the controller. This is only returned for I/O type commands. This event is also returned from FlashCopy target volumes in the prepared and preparing state.
5h	25h	00h	Illegal request. The logical unit is not supported.	The logical unit does not exist or is not mapped to the sender of the command.
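
The key, code, and qualifier in Table 2 can be matched against captured sense data. The following Python sketch assumes that the sense data is in the standard SCSI fixed format (response code 70h or 71h), where the sense key is the low four bits of byte 2, the code is byte 12, and the qualifier is byte 13. The text above does not state the format, so treat that layout as an assumption; the names Sense, SENSE_DEFINITIONS, parse_fixed_sense, and define_sense are hypothetical.

```python
from typing import NamedTuple, Optional


class Sense(NamedTuple):
    key: int   # sense key
    asc: int   # "Code" column in Table 2
    ascq: int  # "Qualifier" column in Table 2


# Definitions copied from Table 2, keyed by (key, code, qualifier).
SENSE_DEFINITIONS = {
    (0x2, 0x04, 0x01): "Not Ready. The logical unit is in the process of becoming ready.",
    (0x2, 0x04, 0x0C): "Not Ready. The target port is in the state of unavailable.",
    (0x3, 0x00, 0x00): "Medium event",
    (0x4, 0x08, 0x00): "Hardware event. A command to logical unit communication failure has occurred.",
    (0x5, 0x25, 0x00): "Illegal request. The logical unit is not supported.",
}


def parse_fixed_sense(sense: bytes) -> Sense:
    """Extract key, code, and qualifier from fixed-format sense data (assumed layout)."""
    if len(sense) < 14:
        raise ValueError("need at least 14 bytes of fixed-format sense data")
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return Sense(key=sense[2] & 0x0F, asc=sense[12], ascq=sense[13])


def define_sense(sense: bytes) -> Optional[str]:
    """Return the Table 2 definition, or None if the combination is not listed."""
    return SENSE_DEFINITIONS.get(tuple(parse_fixed_sense(sense)))
```

A result of None means only that the combination is not in Table 2; it does not mean that the sense data is invalid.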

Reason codes

The reason code appears in bytes 20-23 of the sense data. The reason code provides the node with a specific log entry. The field is a 32-bit unsigned number that is presented with the most significant byte first. Table 3 lists the reason codes and their definitions.

If the reason code is not listed in Table 3, it refers to a specific event in the event log: the code corresponds to the sequence number of the relevant event log entry.
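
As an illustration of the byte layout, the following Python sketch reads the reason code from a captured sense buffer. It assumes that the byte positions are counted from zero, as is usual for SCSI data structures; the function name extract_reason_code is hypothetical.

```python
def extract_reason_code(sense: bytes) -> int:
    """Read the 32-bit reason code from bytes 20-23 of the sense data.

    Assumes zero-based byte numbering. The field is unsigned and is
    presented with the most significant byte first (big-endian).
    """
    if len(sense) < 24:
        raise ValueError("sense data is too short to contain a reason code")
    return int.from_bytes(sense[20:24], byteorder="big", signed=False)
```

For example, sense data whose bytes 20-23 are 00h 00h 00h 3Ch yields the reason code 60.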

Table 3. Reason codes
Reason code (decimal)	Description
40	The resource is part of a stopped FlashCopy mapping.
50	The resource is part of a Metro Mirror or Global Mirror relationship and the secondary LUN is offline.
51	The resource is part of a Metro Mirror or Global Mirror relationship and the secondary LUN is read only.
60	The node is offline.
71	The resource is not bound to any domain.
72	The resource is bound to a domain that was recreated.
73	Running on a node that is contracted out for some reason that is not attributable to any path that is going offline.
80	Wait for the repair to complete, or delete the volume.
81	Wait for the validation to complete, or delete the volume.
82	An offline thin-provisioned volume caused data to be pinned in the directory cache. Adequate performance cannot be achieved for other thin-provisioned volumes, so they are taken offline.
85	The volume is taken offline because checkpointing to the quorum disk failed.
86	The repairvdiskcopy -medium command created a virtual medium error where the copies differed.
93	An offline RAID-5 or RAID-6 array caused in-flight write data to be pinned. Good performance cannot be achieved for other arrays, so they are taken offline.
94	An array MDisk that is part of the volume is taken offline because checkpointing to the quorum disk failed.
95	This reason code is used in MDisk bad block dump files to indicate that the data loss was caused by having to resync parity with rebuilding strips or some other RAID algorithm reason due to multiple failures.
96	A RAID-6 array MDisk that is part of the volume is taken offline because an internal metadata table is full.
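
A host-side script might interpret a reason code as follows. This Python sketch uses shortened paraphrases of the Table 3 descriptions and treats any unlisted code as an event log sequence number, which is how this section describes unlisted codes. The names REASON_CODES and interpret_reason_code are hypothetical.

```python
# Shortened paraphrases of the Table 3 descriptions; the table is the
# authoritative wording. Names here are illustrative assumptions.
REASON_CODES = {
    40: "Resource is part of a stopped FlashCopy mapping.",
    50: "Metro Mirror or Global Mirror secondary LUN is offline.",
    51: "Metro Mirror or Global Mirror secondary LUN is read only.",
    60: "The node is offline.",
    71: "Resource is not bound to any domain.",
    72: "Resource is bound to a domain that was recreated.",
    73: "Node is contracted out for a reason not attributable to a path going offline.",
    80: "Wait for the repair to complete, or delete the volume.",
    81: "Wait for the validation to complete, or delete the volume.",
    82: "Offline thin-provisioned volume pinned data in the directory cache.",
    85: "Volume offline: checkpointing to the quorum disk failed.",
    86: "repairvdiskcopy -medium created a virtual medium error.",
    93: "Offline RAID-5 or RAID-6 array pinned in-flight write data.",
    94: "Array MDisk offline: checkpointing to the quorum disk failed.",
    95: "Bad block caused by parity resync or another RAID algorithm reason.",
    96: "RAID-6 array MDisk offline: internal metadata table is full.",
}


def interpret_reason_code(code: int) -> str:
    """Describe a reason code, falling back to an event log sequence number."""
    if not 0 <= code <= 0xFFFFFFFF:
        raise ValueError("reason code must fit in 32 unsigned bits")
    description = REASON_CODES.get(code)
    if description is not None:
        return f"Reason {code}: {description}"
    return f"Event log sequence number {code}"
```

For example, interpret_reason_code(60) returns "Reason 60: The node is offline.", and interpret_reason_code(1234) returns "Event log sequence number 1234", which can then be looked up in the event log.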