File system corruption may occur due to Volume Manager incorrectly calculating pub_offset in a cluster environment

Article ID: 100008142

Description

Error Message

Various errors can occur when attempting to mount a file system, including an invalid Object Location Table (OLT) file or a missing or corrupt superblock. fsck will also fail with errors such as:

UX:vxfs fsck: ERROR: V-3-20012: not a valid vxfs file system
invalid super-block
search for auxiliary super-block? (ynq)y
alternate super-block not found
UX:vxfs fsck: ERROR: V-3-20694: cannot initialize aggregate
file system check failure, aborting ... 

Cause

One scenario in which the file system can become corrupt is the following. A disk is initialized as a CDS disk type with a public offset of 16 on one node. When the disk group is then imported on another node, that node shows a public slice disk offset of 0 for the disk, unless the command vxdisk scandisks [for Linux] or vxdiskconfig [for Solaris] was executed on that node before the disk group was imported. If the command is not run, file system data is allocated at the wrong location. The problem goes unnoticed as long as the system is not rebooted, because the incorrect disk offset of 0 matches the location where the file system data was actually written. Once the system reboots, the public slice disk offset takes its correct value, the file system metadata is no longer found at the expected locations, and the file system cannot be mounted.
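The mismatch can be illustrated with a little arithmetic. This is a minimal sketch, assuming a simplified model in which the absolute disk location is the public offset plus a block number relative to the public region. The function name disk_location and the block number 100 are hypothetical and chosen only for illustration; the offsets 0 and 16 are the values from the scenario above.

```shell
#!/bin/sh
# Simplified model (assumption): absolute location = public offset + relative block.
# Both values are in the same unit as the reported public offset.
disk_location() {
    pub_offset=$1
    relative_block=$2
    echo $(( pub_offset + relative_block ))
}

RELATIVE_BLOCK=100   # hypothetical metadata block, for illustration only

# Stale offset of 0 in use on the importing node when the data is written.
written_at=$(disk_location 0 "$RELATIVE_BLOCK")

# Correct offset of 16 in use after the reboot, when the data is read back.
read_from=$(disk_location 16 "$RELATIVE_BLOCK")

echo "written at $written_at, read from $read_from"
```

In this model the same relative block resolves to two different locations before and after the reboot, which is why the superblock and other metadata no longer appear where the file system expects them.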

Resolution

This issue has been addressed in Storage Foundation for UNIX/Linux version 5.0MP3RP5 and in version 6.0 and above. If upgrading is not an option, run the following command prior to deporting and importing a disk group whenever there is a change in disk format:

Linux:

vxdisk scandisks

Solaris:

vxdiskconfig

The above commands will prevent the issue from occurring. If the issue has already occurred, please contact Veritas Technical Service immediately. File system recovery depends on rebuilding the disk group and may not always be successful.
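The preventive step can be wrapped in a small script so that the rescan is never skipped before an import. This is only a sketch under stated assumptions, not a supported procedure: the function names rescan_command and safe_import, the DRY_RUN variable, and the disk group name datadg are hypothetical. It assumes uname -s reports Linux or SunOS (Solaris), and it uses only the two rescan commands listed above plus vxdg import.

```shell
#!/bin/sh
# Print the rescan command this article lists for a given OS name.
rescan_command() {
    case "$1" in
        Linux) echo "vxdisk scandisks" ;;
        SunOS) echo "vxdiskconfig" ;;    # uname -s reports SunOS on Solaris
        *)     return 1 ;;               # platform not covered by this article
    esac
}

# Rescan disks, then import the disk group.
# Usage: safe_import DISKGROUP [OSNAME]
# With DRY_RUN=1 the commands are printed instead of executed.
safe_import() {
    dg=$1
    os=${2:-$(uname -s)}
    cmd=$(rescan_command "$os") || {
        echo "unsupported OS: $os" >&2
        return 1
    }
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$cmd"
        echo "vxdg import $dg"
    else
        # Import only if the rescan succeeded.
        $cmd && vxdg import "$dg"
    fi
}
```

Running DRY_RUN=1 safe_import datadg Linux prints the two commands without touching any disks, which is a reasonable way to check the sequence before using it on a cluster node.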


Issue/Introduction

If a disk is initialized with different disk formats (e.g. after a LUN resize) across failover cluster nodes, subsequent disk group deport/import operations may flush an inconsistent pub_offset to the disk, which could lead to file system corruption. The cached offset value is not corrected in memory, and is not re-read from disk, until the system or node is rebooted.

Additional Information

ETrack: 2534316