Slow response to commands that are issued from a passive node of a VCS cluster or CVM node-join failures when I/O fencing is enabled


Article ID: 100013797



Description

Error Message

No messages about ssd retries are logged in /var/adm/messages. The problem shows up only as very slow booting of one or more cluster nodes, or as CVM node-join problems, while at least one node holds SCSI3-PGR reservations on disks shared between nodes.

Cause

This behavior results from a change in Solaris, not a change in the SFHA products. The Oracle InfoDoc "Format Utility Running Slowly on Veritas Cluster Nodes" (Doc ID 1601460.1) implies that it is due to a default setting in VCS. That is incorrect: VCS does not manipulate ssd_retry_on_reservation_conflict in any way.

 

If the kernel parameter "ssd_retry_on_reservation_conflict" is enabled, the host retries SCSI queries on LUNs that are in use on the active node in the cluster, which delays commands that access those disks. No messages about the retries are logged in /var/adm/messages.

To check whether the parameter is enabled, run the following command:

# echo "ssd_retry_on_reservation_conflict/X" | mdb -k

ssd_retry_on_reservation_conflict:
ssd_retry_on_reservation_conflict:              1            

1 = Enabled
0 = Disabled
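
The check above can be wrapped in a small script when several nodes need to be inspected. The following is only a sketch: parse_retry_flag and report_retry_flag are hypothetical helper names, not part of Solaris or VCS, and the parsing assumes the two-line mdb output format shown above.

```shell
#!/bin/sh
# Sketch: interpret the output of
#   echo "ssd_retry_on_reservation_conflict/X" | mdb -k
# Both function names are hypothetical helpers for illustration only.

parse_retry_flag() {
    # Reads mdb output on stdin. The value is the second field of the
    # line that carries both the symbol label and the value.
    awk 'NF == 2 { v = $2 } END { if (v == "") exit 1; print v }'
}

report_retry_flag() {
    # Maps the parsed value to the meaning given in this article.
    value=$(parse_retry_flag) || { echo "unknown"; return 2; }
    case "$value" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unexpected value: $value"; return 2 ;;
    esac
}

# On a live Solaris host, as root:
#   echo "ssd_retry_on_reservation_conflict/X" | mdb -k | report_retry_flag
```

The functions read from standard input, so they can be tried against saved mdb output before being run on a cluster node.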
 

Resolution

1. Disable the kernel parameter "ssd_retry_on_reservation_conflict" by adding the following line to /etc/system:

set ssd:ssd_retry_on_reservation_conflict=0x0
 

2. Reboot the host for the changes to take effect.
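
Step 1 can be scripted so that the line is added only once. This is a minimal sketch rather than a supported procedure: ensure_retry_disabled is a hypothetical helper name, and the file path is passed as an argument so that the function can be tried on a copy before it is pointed at /etc/system. It assumes the file ends with a newline.

```shell
#!/bin/sh
# Sketch: append the tunable to a system file unless an entry already exists.
# ensure_retry_disabled is a hypothetical helper for illustration only.

SETTING='set ssd:ssd_retry_on_reservation_conflict=0x0'

ensure_retry_disabled() {
    file=$1
    # Comment lines in /etc/system start with '*', so anchoring the
    # pattern at "set" skips commented-out entries.
    if grep '^set ssd:ssd_retry_on_reservation_conflict' "$file" >/dev/null 2>&1; then
        # An entry exists; its value is not changed here. Review it by hand.
        echo "present"
        return 0
    fi
    # Keep a backup copy before modifying the file.
    cp -p "$file" "$file.bak" || return 1
    echo "$SETTING" >> "$file" || return 1
    echo "added"
}

# Example, as root:
#   ensure_retry_disabled /etc/system
```

If the function reports "present", an entry for the parameter already exists and should be checked manually, because the sketch does not rewrite an existing value. After the reboot in step 2, the mdb command from the Cause section should report 0.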

 


Applies To

Solaris 10 1/13 (Update 11) and later, including Solaris 11.1 SPARC.

VERITAS Cluster Server (VCS) with SCSI3-PGR I/O fencing enabled, where at least one node has placed PGR reservations on LUN devices that are visible to both or all hosts. The disks do not need to have the 'shared' flag set.
 

Issue/Introduction

Commands may respond slowly on the passive node of a cluster when I/O fencing is enabled. With Cluster Volume Manager (CVM) shared disks, the problem may also appear as occasional CVM node-join timeouts, slow booting, and slow response to commands that access disks. It affects both OS commands, such as format, and Storage Foundation commands, such as vxdisk.