CVM shared disk groups allow the 'detach-policy' to be set to one of two values, described below:
Local (LDP) - the failure is contained on the local node, i.e. the disk device/dmpnode is detached only on the affected machine (assuming no other nodes experience similar issues with connectivity). Note that the node experiencing the failure will no longer be able to perform I/O to volumes containing objects on the failed disk(s) even if alternate enabled plexes exist within the volumes.
Global (GDP) - the failure is propagated cluster wide, i.e. if a disk device/dmpnode fails on one node, all nodes detach plexes for that failed disk device/dmpnode. The nodes can, however, continue to perform I/O to affected volumes provided alternate enabled plexes remain.
Note that GDP is the default 'detach-policy'.
To view the current setting of 'disk detach policy' for a shared disk group the vxdg list command can be used. For example:
# vxdg list sharedg | grep detach-policy
detach-policy: global
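Where the same lookup is needed from a script, the value can be extracted rather than read by eye. The following is a minimal sketch, assuming the "detach-policy: <value>" line format shown above; the function name get_detach_policy is illustrative only and is not part of the product.

```shell
#!/bin/sh
# Extract the detach-policy value from `vxdg list <diskgroup>` output on stdin.
# Assumes the "detach-policy: <value>" line format shown above.
get_detach_policy() {
    awk -F': *' '$1 == "detach-policy" { print $2; exit }'
}

# Only query VxVM when vxdg is installed, so the function can be reused elsewhere.
if command -v vxdg >/dev/null 2>&1; then
    dg="${1:-sharedg}"    # disk group name; "sharedg" is the example name used above
    echo "$dg: $(vxdg list "$dg" | get_detach_policy)"
fi
```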
To change the detach-policy to local the following command should be used:
# vxdg -g sharedg set diskdetpolicy=local
To change the detach-policy to global the following command should be used:
# vxdg -g sharedg set diskdetpolicy=global
Note that commands modifying disk group configuration (i.e. modification of 'detach-policy') must be run on the CVM master.
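One way to avoid running the change on a slave node by mistake is to check the node's role first. The sketch below assumes that 'vxdctl -c mode' reports the role on its "mode:" line (for example "mode: enabled: cluster active - MASTER"); verify this against the output on your release before relying on it. The function names is_cvm_master and set_detach_policy are illustrative only.

```shell
#!/bin/sh
# Succeed if `vxdctl -c mode` output on stdin reports this node as the CVM master.
# Assumption: the "mode:" line contains the word MASTER on the master node.
is_cvm_master() {
    grep -q '^mode:.*MASTER'
}

# Set the detach policy of a shared disk group, but only when run on the master.
#   $1 = disk group name, $2 = local | global
set_detach_policy() {
    dg="$1"; policy="$2"
    case "$policy" in
        local|global) ;;
        *) echo "policy must be local or global" >&2; return 2 ;;
    esac
    if vxdctl -c mode | is_cvm_master; then
        vxdg -g "$dg" set diskdetpolicy="$policy"
    else
        echo "not the CVM master; run this on the master node" >&2
        return 1
    fi
}
```

For example, calling set_detach_policy sharedg global on a slave node prints a message and returns non-zero instead of attempting the change.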
Internal testing by Veritas has found that the current implementation of Cluster Volume Manager (CVM) cannot predictably handle all disk failures when a local disk detach policy is used with shared disk groups. Because of these findings, Veritas do NOT currently support use of 'local disk detach policy' in CVM environments where shared disk groups contain volumes with DCO logs attached.
Issues found when using local detach policy and shared disk groups containing volumes with DCO logs attached are described below.
Detach policy may differ depending on where I/O errors are first encountered:
CVM is not consistent in its use of GDP or LDP when the CVM master loses access to a disk in a shared disk group containing DCO log volumes and the disk detach policy is set to LDP. CVM behaviour changes depending on whether the initial I/O error after loss of connectivity is experienced against a data volume plex or a DCO log volume plex.
If the initial I/O error is against a data volume plex, CVM will use LDP as configured and detach the disk/dmpnode only on the node experiencing the issue. If the I/O error is seen against a DCO log volume plex, however, CVM will use GDP and detach the dmpnode cluster wide.
This causes the following problems:
- It is not predictable whether a failure will trigger LDP or GDP, meaning that CVM appears to operate in an inconsistent manner.
- When using LDP it is reasonable to expect that a volume's redundancy will not be affected on nodes which do not experience issues. If I/O errors are seen against a DCO log volume plex, however, the use of GDP means that the corresponding dmpnode will be detached cluster wide, causing a cluster wide loss of redundancy for any volumes using this dmpnode.
- Failures on slave nodes will consistently trigger LDP as configured. As such, use of LDP can cause inconsistent behaviour between nodes in the same cluster (i.e. the master can use GDP whereas slave nodes will use LDP).
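The behaviour described in this section can be summarised as a small decision table. The sketch below is only a model of the cases listed above for a disk group configured with LDP; the function effective_policy is hypothetical and does not query or reflect any VxVM interface.

```shell
#!/bin/sh
# Model of the detach behaviour described above for a disk group configured
# with LDP. Illustrative only; this is not a VxVM interface.
#   $1 = role of the node that sees the first I/O error: master | slave
#   $2 = plex type the error is seen against: data | dco
effective_policy() {
    case "$1:$2" in
        master:dco)                       echo global ;;  # detached cluster wide
        master:data|slave:data|slave:dco) echo local ;;   # detached on that node only
        *) echo "usage: effective_policy master|slave data|dco" >&2; return 2 ;;
    esac
}
```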
LDP being converted to GDP:
LDP is converted to GDP if more than one node experiences an I/O error on the same plex. This behaviour is by design; however, it can cause an issue in environments using CVM, shared disk groups, and site consistency (i.e. explicitly mirroring shared data volumes with DCO logs attached across sites/enclosures).
For example, when using LDP and a node loses access to a disk device/dmpnode and experiences a corresponding I/O error, CVM triggers 'check-repair' causing all nodes in the cluster to try and read/write back to the same region of plex against which the error was experienced. If this read/writeback I/O fails on any other node, LDP is converted to GDP causing the dmpnode to be detached cluster wide.
When using shared disk groups and site consistent environments it is reasonable to assume that all nodes at a given site may lose access to some storage whilst nodes at other sites retain access. Due to this behaviour, however, if there are multiple nodes at the site experiencing issues, the failure will be converted to GDP causing a cluster wide detach and possible loss of volume redundancy at all sites and not simply the site experiencing issues.
This causes the following problems:
- Failures which are localised to a subset of nodes get propagated to all nodes in the cluster
- A failure which is localised to one site in a campus cluster gets propagated to all sites in the cluster
- The expected functionality of LDP (i.e. failures are contained on a subset of nodes) is lost
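The conversion rule described in this section can be expressed as a simple threshold on the number of nodes that fail I/O to the same plex. The sketch below is a model of that rule only; the function detach_scope is hypothetical and is not a VxVM command.

```shell
#!/bin/sh
# Model of the LDP-to-GDP conversion described above. Illustrative only.
#   $1 = number of nodes whose I/O to the same plex fails (the node that hit
#        the original error plus any node whose check-repair I/O also fails)
detach_scope() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: detach_scope <failed-node-count>" >&2; return 2 ;;
    esac
    if [ "$1" -eq 0 ]; then
        echo none
    elif [ "$1" -eq 1 ]; then
        echo local     # failure stays contained on the one affected node
    else
        echo global    # more than one node failed: detach is cluster wide
    fi
}
```

In a two-site campus cluster with two nodes per site, a storage failure affecting one whole site gives a count of 2, so the model returns global even though the other site is healthy.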
To avoid the issues described in this document, shared disk groups containing volumes with DCO logs attached should be configured to use a global disk detach policy. Note that this includes campus cluster environments using site consistency and shared disk groups, as DCO logs are automatically created in this scenario.
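The recommendation above can be turned into a simple check for a given shared disk group. The following is a sketch under stated assumptions: it assumes the "detach-policy: <value>" line in 'vxdg list' output and assumes that 'vxprint -g <diskgroup>' lists DCO objects as records of type "dc"; confirm both on your release. The function name policy_supported is illustrative, and the script does not verify that the disk group is actually shared.

```shell
#!/bin/sh
# Return 0 if the combination is supported per this article, 1 otherwise.
#   $1 = detach policy (local | global)
#   $2 = number of DCO records found in the shared disk group
policy_supported() {
    [ "$1" = "local" ] && [ "$2" -gt 0 ] && return 1
    return 0
}

# Only query VxVM when vxdg is installed.
if command -v vxdg >/dev/null 2>&1; then
    dg="${1:-sharedg}"    # "sharedg" is the example disk group name
    policy=$(vxdg list "$dg" | awk -F': *' '$1 == "detach-policy" { print $2; exit }')
    # Assumption: vxprint lists DCO objects as records of type "dc".
    dco_count=$(vxprint -g "$dg" | awk '$1 == "dc" { n++ } END { print n + 0 }')
    if policy_supported "$policy" "$dco_count"; then
        echo "$dg: detach policy '$policy' is supported"
    else
        echo "$dg: LDP with DCO logs attached; set diskdetpolicy=global on the CVM master"
    fi
fi
```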
Applies To
Storage Foundation with Cluster Volume Manager (CVM) shared disk groups where volumes have DCO logs attached
All supported UNIX/Linux platforms/versions
Note that non-CVM environments (i.e. no shared disk groups) and shared disk groups where DCO logs are not used are not affected by the issue. In these environments use of local detach policy is fully supported starting with the 5.1 release of Storage Foundation on Solaris/AIX/Linux, and the 5.1SP1 release of Storage Foundation on HP-UX.