Due to Etrack incident 1469365, listed in the Supplemental Material section, a Persistent FastResync snapshot with a version 20 DCO in the DISABLED or INVALID state can cause the source volume to hang. (This issue does not affect Persistent FastResync snapshots with a version 0 DCO. DCO stands for Data Change Object.) The defect causes Veritas Volume Manager (VxVM) to mishandle a failed write to the disabled snapshot volume, which leads to the subsequent I/O hang.
The following hang scenarios have been reported when a source volume has a DISABLED or INVALID snapshot.
Scenario 1. The file system on the source volume hangs.
Scenario 2. The mount process hangs when the file system on the source volume is being mounted.
Scenario 3. In a Veritas Volume Replicator (VVR) configuration, an affected primary data volume may cause vxrecover to hang.
Scenario 4. In a VVR configuration, if the affected source volume is a secondary data volume, VVR replication or synchronization can hang.
Scenario 5. In a VVR configuration, if the affected source volume is a secondary data volume, the secondary Rlink may be unable to disconnect. VVR error handling (Staged I/O) can itself hang because earlier I/Os are already hung. This leaves the secondary Rlink in the "connected" state while the primary Rlink goes into the "disconnected" state.
Once I/O starts to hang, subsequent vx commands (for example, vxsnap and vxrlink) may time out, and the following system error messages are logged:
unix: Warning: VxVM vxio V-5-3-0 commit: Timedout abort the transaction!
unix: Warning: VxVM vxio V-5-3-0 commit: Timedout waiting for Volume PFIcerpt-rvl_0d to quiesce, count 1
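As a rough aid for spotting these symptoms, the sketch below scans log lines for the two vxio messages quoted above and pulls out the names of volumes that failed to quiesce. It is a minimal illustration, not a Veritas tool: it matches only the message text shown in this article, and where the messages land (for example, /var/adm/messages on Solaris) is an assumption about your system that the caller handles by passing in the lines.

```python
import re
from typing import Iterable, List

# Patterns built only from the two vxio messages quoted in this article.
# The volume name and the count vary per system, so they are not hard-coded.
ABORT_PATTERN = re.compile(r"VxVM vxio V-5-3-0 commit: Timedout abort the transaction!")
QUIESCE_PATTERN = re.compile(
    r"VxVM vxio V-5-3-0 commit: Timedout waiting for Volume (\S+) to quiesce"
)


def find_hang_messages(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match either vxio timeout message."""
    return [
        line.rstrip("\n")
        for line in lines
        if ABORT_PATTERN.search(line) or QUIESCE_PATTERN.search(line)
    ]


def quiesce_volumes(lines: Iterable[str]) -> List[str]:
    """Return volume names reported as failing to quiesce, in log order."""
    names = []
    for line in lines:
        match = QUIESCE_PATTERN.search(line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    # Sample lines copied from the messages above, plus one unrelated line.
    sample = [
        "unix: Warning: VxVM vxio V-5-3-0 commit: Timedout abort the transaction!",
        "unix: Warning: VxVM vxio V-5-3-0 commit: Timedout waiting for Volume "
        "PFIcerpt-rvl_0d to quiesce, count 1",
        "unrelated log line",
    ]
    print(find_hang_messages(sample))
    print(quiesce_volumes(sample))
```

A volume name reported here only tells you which volume could not quiesce; you still need to check whether that volume has a DISABLED or INVALID snapshot.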
The incident is fixed in the following patches and in all later patches:
Storage Foundation 5.0MP3RP2 on Solaris
Storage Foundation 5.0MP3RP2 on AIX
Storage Foundation 5.0MP3RP2 on Linux
Storage Foundation 5.0.1 on HP-UX 11.31
Veritas strongly recommends that customers who use Persistent FastResync snapshots with a version 20 DCO apply the latest patch as soon as possible.
The latest patches can be obtained through Veritas Operations Services (VOS):
https://sort.veritas.com
As of March 2010, the incident is not yet fixed on the HP-UX 11.23 platform. The fix will be available in a future release of the Veritas Volume Manager 5.0MP2 Rolling Patch for HP-UX 11.23.
Workaround
=========
A workaround for the hang is to dissociate the snapshot volume. Once the offending snapshot volume is dissociated from the source volume, the incident can be avoided. Note that if the system hangs during boot because vxrecover (started by the system startup rc script) hangs, you will need to boot the system into single-user mode to fix the issue.
(Note: The FastResync feature was previously called Fast Mirror Resynchronization, or FMR.)
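This article does not name the command for the dissociation step. The sketch below assumes the VxVM `vxsnap [-g diskgroup] dis snapvolume` form, which breaks the link between a snapshot volume and its parent; confirm it against the vxsnap(1M) manual page for your release before use. The disk group and snapshot names (`mydg`, `snapvol01`) are hypothetical placeholders. The wrapper defaults to a dry run and only prints the command unless `dry_run=False` is passed.

```python
import shlex
import subprocess
from typing import List


def dissociate_cmd(diskgroup: str, snapvol: str) -> List[str]:
    """Build the (assumed) vxsnap command that dissociates a snapshot volume.

    Assumption: `vxsnap -g <diskgroup> dis <snapvolume>` is the dissociation
    syntax on your VxVM release; check vxsnap(1M) before relying on it.
    """
    return ["vxsnap", "-g", diskgroup, "dis", snapvol]


def dissociate(diskgroup: str, snapvol: str, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it, and return the argv list."""
    cmd = dissociate_cmd(diskgroup, snapvol)
    if dry_run:
        # Safe default: show what would run without touching the volume.
        print("would run:", shlex.join(cmd))
    else:
        # check=True raises CalledProcessError if vxsnap exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical disk group and snapshot volume names.
    dissociate("mydg", "snapvol01")
```

Keeping the argument list separate from execution makes it easy to review exactly what will be run, which matters when the system is already in a degraded, single-user state.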