CFS file system with more than 4,294,967,296 (or 2^32) file system blocks is vulnerable to corruption when a new per-node intent log is created


Article ID: 100010640



Description

Error Message

The actual symptoms vary depending on what type of data was incorrectly zeroed: file data, VxFS metadata, or free space extents.
 
If file system metadata was corrupted and VxFS attempts to access the corrupted metadata, then the system message log might contain messages resembling the following:
 
Example 1 - full fsck flag is set on a file system because an inode is corrupt:
 
vxfs: msgcnt 5 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/testdg/vol1 file system fullfsck flag set - vx_ierror
vxfs: msgcnt 6 mesg 017: V-2-17: vx_attr_iget - /dev/vx/dsk/testdg/vol1 file system inode 13675215 marked bad incore

 
 

Example 2 - full fsck flag is set on a file system because a directory block is corrupt:
 
vxfs: msgcnt 47 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/testdg/vol1 file system fullfsck flag set - vx_ierror
vxfs: msgcnt 48 mesg 017: V-2-17: vx_dirbread - /dev/vx/dsk/testdg/vol1 file system inode 55010476 marked bad incore

 

 
Example 3 - full fsck flag is set on a file system because a directory entry is corrupt:
 
vxfs: msgcnt 10 mesg 008: V-2-8: vx_direrr: vx_readdir_int_1 - /dev/vx/dsk/testdg/archive file system dir inode 56925668 dev/block 0/1537502882 dirent inode 0 error 6
vxfs: msgcnt 11 mesg 096: V-2-96: vx_setfsflags - /dev/vx/dsk/testdg/archive file system fullfsck flag set - vx_direrr
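To scan a saved copy of the system message log for the full fsck messages shown in these examples, a minimal Python sketch such as the following could be used. The function name find_fullfsck_events is hypothetical, and the pattern is inferred only from the V-2-96 sample lines above; the exact message layout may differ between VxFS releases, and where the system message log lives depends on the operating system.

```python
import re

# Pattern inferred from the sample V-2-96 lines in this article only (an assumption).
# \s+ tolerates runs of spaces between fields.
FULLFSCK_RE = re.compile(
    r"V-2-96:\s+vx_setfsflags\s+-\s+(?P<device>\S+)\s+file\s+system\s+"
    r"fullfsck\s+flag\s+set\s+-\s+(?P<reason>\S+)"
)


def find_fullfsck_events(log_lines):
    """Return (device, reason) pairs for every 'fullfsck flag set' message."""
    events = []
    for line in log_lines:
        match = FULLFSCK_RE.search(line)
        if match:
            events.append((match.group("device"), match.group("reason")))
    return events
```

The reason field (for example vx_ierror or vx_direrr) hints at what kind of metadata VxFS found to be bad; the accompanying V-2-17 or V-2-8 lines identify the inode involved.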

 

Cause

In CFS, each node that has the file system cluster mounted has its own intent-log in the file system. An intent-log is created when an additional node mounts the file system as a CFS Secondary.
Intent-logs are never removed; they are reused.
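The rule for when a new intent-log gets created can be illustrated with a simplified model. This is an assumption drawn from this article, not from VxFS source: the file system starts with the single intent-log created at mkfs time, intent-logs are never removed and are reused, and a new one is created only when more nodes have the file system mounted at the same time than there are existing intent-logs. The hypothetical function intent_logs_after replays mount and unmount events under that model.

```python
def intent_logs_after(mount_events, initial_logs=1):
    """Replay ('mount' | 'umount', node) events under a simplified model.

    Returns (total_intent_logs, newly_created). A new intent-log is counted
    only when the number of concurrently mounted nodes exceeds the number of
    existing intent-logs; existing logs are reused and never removed.
    """
    mounted = set()
    logs = initial_logs  # the intent-log created at mkfs time
    created = 0
    for action, node in mount_events:
        if action == "mount":
            mounted.add(node)
            if len(mounted) > logs:
                logs += 1      # a CFS secondary mount creates a new intent-log
                created += 1   # this is the step exposed to the zeroing bug
        elif action == "umount":
            mounted.discard(node)
        else:
            raise ValueError(f"unknown action: {action}")
    return logs, created
```

In this model, mounting on nodes A and B, unmounting B, and then mounting on C leaves two intent-logs, because C reuses the free one; a later mount on B while A and C are still mounted creates a third.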
 
After an intent log is created, it is cleared. When the intent log is cleared, an incorrect block number is passed to the log clearing routine, so an incorrect location is zeroed out. That location might hold file data, file system metadata, or part of the file system's available free space. If file system metadata is corrupted, VxFS detects it when the corrupt metadata is next accessed, and the file system is marked for full fsck. If user file data is corrupted, the application that uses the file may report the corruption.
 
 

How to check the number of file system blocks in the file system:
 
Use the fstyp command to extract information from the superblock, as follows:
 
# /opt/VRTS/bin/fstyp -t vxfs -v /dev/vx/rdsk/diskgroup/volume
vxfs
magic a501fcf5  version 7
ctime 1310631499 837351  (Thu Jul 14 09:18:19 2011 BST)
log_version 12 logstart 0  logend 0
bsize  4096 size  24159191040 dsize  24159191040  ninode 0  nau 0          <<===  size 24,159,191,040
defiextsize 0  oilbsize 0  immedlen 96  ndaddr 10
aufirst 0  emap 0  imap 0  iextop 0  istart 0
bstart 0  femap 0  fimap 0  fiextop 0  fistart 0  fbstart 0
=============================================

 
If the size is greater than 4,294,967,296 file system blocks, as in the example above, the file system is exposed to this issue. Avoid creating additional intent-logs in the file system (do not cluster mount the file system from additional nodes). Please contact Veritas support for further confirmation.
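The check above can be sketched in Python by parsing saved fstyp -v output. This is a minimal sketch that assumes, as the article does, that the superblock size field is expressed in file system blocks; the function names are hypothetical.

```python
import re

TWO_POW_32 = 2**32  # 4,294,967,296 file system blocks


def fs_size_blocks(fstyp_output):
    """Return the superblock 'size' field from `fstyp -v` output."""
    # \b prevents matching the 'size' inside 'bsize' or 'dsize'.
    match = re.search(r"\bsize\s+(\d+)", fstyp_output)
    if match is None:
        raise ValueError("no 'size' field found in fstyp output")
    return int(match.group(1))


def is_exposed(fstyp_output):
    """True if the file system has more than 2**32 file system blocks."""
    return fs_size_blocks(fstyp_output) > TWO_POW_32
```

Applied to the example output above, fs_size_blocks returns 24,159,191,040, so is_exposed returns True.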
 

Resolution

There are no plans to address this issue by way of a patch or hotfix in the current or previous versions of the software at the present time. However, the issue is currently scheduled to be addressed in the next major revision of the product. Please note that Veritas Technologies LLC reserves the right to remove any fix from the targeted release if it does not pass quality assurance tests.  Veritas’ plans are subject to change and any action taken by you based on the above information or your reliance upon the above information is made at your own risk.


 

How to check if the problem has occurred:
 
The location of a per-node intent log can be checked with the following commands:
 
First, find the current total number of per-node intent logs in the file system, using the VxFS fsdb command "pnolt":
 
# echo "pnolt | grep Total" | /opt/VRTS/bin/fsdb /dev/vx/rdsk/testdg/testvol
Total number of pnolt records: 4
 
 

In the above example there are four PNOLTs, which means that the file system currently contains four per-node intent logs.
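When there are many PNOLTs, the per-PNOLT fsdb commands used in the next step can be generated from the total. The following Python sketch only parses the "Total number of pnolt records" line and builds command strings in the same form as this article; the function names are hypothetical and nothing is executed.

```python
import re

FSDB = "/opt/VRTS/bin/fsdb"


def pnolt_count(fsdb_output):
    """Extract N from 'Total number of pnolt records: N'."""
    match = re.search(r"Total number of pnolt records:\s*(\d+)", fsdb_output)
    if match is None:
        raise ValueError("pnolt total not found in fsdb output")
    return int(match.group(1))


def pnolt_commands(raw_device, count):
    """Build one fsdb command per PNOLT index (0 .. count-1)."""
    return [
        f"echo '{i}pnolt' | {FSDB} {raw_device} | grep pn_logino"
        for i in range(count)
    ]
```

For the output above, pnolt_count returns 4, and pnolt_commands("/dev/vx/rdsk/testdg/testvol", 4) yields the four commands shown below.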
 
Next, display the inode numbers of each per-node intent log with the same fsdb command:
 
# echo '0pnolt' | /opt/VRTS/bin/fsdb /dev/vx/rdsk/testdg/testvol | grep pn_logino
pn_logino[0] 9 pn_logino[1] 41 pn_flags 0x0
# echo '1pnolt' | /opt/VRTS/bin/fsdb /dev/vx/rdsk/testdg/testvol | grep pn_logino
pn_logino[0] 74 pn_logino[1] 75 pn_flags 0x0
# echo '2pnolt' | /opt/VRTS/bin/fsdb /dev/vx/rdsk/testdg/testvol | grep pn_logino
pn_logino[0] 32768 pn_logino[1] 32769 pn_flags 0x0
# echo '3pnolt' | /opt/VRTS/bin/fsdb /dev/vx/rdsk/testdg/testvol | grep pn_logino
pn_logino[0] 32772 pn_logino[1] 32773 pn_flags 0x0
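The examples suggest that pn_logino[0] holds the intent-log inode: it is 9 for PNOLT 0, matching 1fset.9i, and 32772 for PNOLT 3, the inode examined below. That mapping is inferred from these examples only. Under that assumption, a Python sketch can extract the inode from each line and build the matching mapall command; the function names are hypothetical.

```python
import re

FSDB = "/opt/VRTS/bin/fsdb"


def log_inode(pnolt_line):
    """Return pn_logino[0] from a 'pn_logino[0] X pn_logino[1] Y ...' line."""
    match = re.search(r"pn_logino\[0\]\s+(\d+)", pnolt_line)
    if match is None:
        raise ValueError("pn_logino[0] not found")
    return int(match.group(1))


def mapall_command(raw_device, inode):
    """Build the fsdb mapall command for an intent-log inode in fileset 1."""
    # Straight ASCII double quotes: a shell does not treat curly quotes as quoting.
    return f'echo "1fset.{inode}i.mapall" | {FSDB} {raw_device}'
```

Applied to the four lines above, log_inode gives 9, 74, 32768, and 32772.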
 
 

Let's check the block number allocated to the intent log for PNOLT 3 (the corresponding inode number is 32772):
 
# echo "1fset.32772i.mapall" | /opt/VRTS/bin/fsdb /dev/vx/rdsk/testdg/testvol
offset    device          block           length
0           0            19139067904     65536
 
 

In the above example, the intent log starts at block 19,139,067,904, which is higher than 4,294,967,295, so when this intent-log was created (at mount time), a wrong location in the file system would have been cleared (zeroed).
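The range that may have been zeroed can be estimated from the mapall output. This Python sketch assumes that the block and length columns are in file system blocks and that each extent is cleared on its own. The truncated start is block mod 2^32, and the actual amount zeroed is at most the extent length. The function name suspect_ranges is hypothetical.

```python
TWO_POW_32 = 2**32


def suspect_ranges(mapall_output):
    """Return (first_block, last_block) ranges that may have been zeroed.

    Only extents starting at or above block 2**32 are affected. The range
    is an upper bound: at most `length` blocks from the truncated start.
    """
    ranges = []
    for line in mapall_output.splitlines():
        fields = line.split()
        # Data rows are four integers: offset, device, block, length.
        if len(fields) != 4 or not all(f.isdigit() for f in fields):
            continue  # skip the header and anything unexpected
        block, length = int(fields[2]), int(fields[3])
        if block >= TWO_POW_32:
            start = block % TWO_POW_32  # block number after 32-bit truncation
            ranges.append((start, start + length - 1))
    return ranges
```

For the extent above (block 19,139,067,904, length 65,536), the suspect range is blocks 1,959,198,720 to 1,959,264,255.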
 
If the file system has more than 4,294,967,296 file system blocks, it is exposed to this issue. Avoid creating additional intent-logs in the file system (do not cluster mount the file system from additional nodes). Please contact Veritas support for further confirmation.
 
Note: PNOLT 0 uses the intent-log created at mkfs time. This intent-log (1fset.9i.mapall) might also be located above 4,294,967,296 file system blocks, because it can be relocated when the file system is resized; however, relocating an intent-log does not result in corruption.
 

Issue/Introduction

At mkfs time, one intent-log is created in the file system; however, every cluster file system (CFS) mount, whether primary or secondary, has its own intent-log within the file system. The CFS primary mount uses the intent-log created at mkfs time, so an additional intent-log is created when a CFS secondary is cluster mounted for the first time. To manage the intent-logs and the other extra objects required for CFS, a holding object called a PNOLT (per-node object location table) is also created. Once created, PNOLTs (and their corresponding intent-logs) are never deleted. Therefore, once a file system has been cluster mounted, it will always contain PNOLTs.

In a cluster-mounted file system with more than 4,294,967,296 file system blocks, corruption can occur when a CFS secondary mount creates a new per-node intent log. The new intent-log is initialized (zeroed) immediately after it is created. Because of an incorrect typecast of its block number, the wrong location can be zeroed instead, and that location can be anywhere within the first 4,294,967,296 file system blocks. This can only happen if the new intent-log is located at or above an offset of 4,294,967,296 file system blocks.
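The article does not show the VxFS code involved. The following Python sketch only mimics the effect of such a typecast, assuming the 64-bit block number is narrowed to an unsigned 32-bit integer. ctypes integer types perform no overflow checking, so only the low 32 bits survive, which is the same as taking the block number modulo 2^32.

```python
import ctypes

MASK_32 = 0xFFFFFFFF  # keeps the low 32 bits


def cast_to_u32(block):
    """Mimic a C conversion of a 64-bit block number to a 32-bit unsigned type."""
    return ctypes.c_uint32(block).value


def truncated_block(block):
    """Equivalent arithmetic form: block mod 2**32."""
    return block & MASK_32
```

Block numbers below 4,294,967,296 pass through unchanged, which is why only intent-logs located at or above that offset are affected.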

For example, suppose that in a 3-node SFCFS cluster, a file system has never been cluster mounted on more than two nodes at the same time. The file system then contains only two intent logs. If the file system is then cluster mounted from the third node (so that it is mounted on all three nodes at the same time), the third mount creates a new PNOLT and a new intent log in the file system. If this new intent log happens to be located at an offset of 4,294,967,296 + N file system blocks, then because of the incorrect typecast, the blocks starting at block N modulo 4,294,967,296 are incorrectly zeroed out. The amount of data incorrectly zeroed is at most the size of the new intent log, which can be up to 256MB.
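The arithmetic can be written out using the intent log from the fsdb example earlier (start block 19,139,067,904, length 65,536 blocks). The last line assumes that 256MB means 256 × 2^20 bytes and that the block size is 4096 bytes; under those assumptions, the maximum intent-log size matches the 65,536-block length reported by mapall.

```latex
\begin{aligned}
b_{\text{zeroed}} &= (2^{32} + N) \bmod 2^{32} = N \bmod 2^{32} \\
19{,}139{,}067{,}904 &= 4 \cdot 2^{32} + 1{,}959{,}198{,}720
  \;\Rightarrow\; b_{\text{zeroed}} = 1{,}959{,}198{,}720 \\
\frac{256 \cdot 2^{20}\ \text{bytes}}{4096\ \text{bytes/block}} &= 65{,}536\ \text{blocks}
\end{aligned}
```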

Additional Information

ETrack: 3259634