How to detect and correct inode corruption associated with transient fiber link failures

Article ID: 100038941

Description

This article contains the procedure to detect and correct inode corruption associated with transient fiber link failures.

Warning: Read this article in its entirety before making any changes. Failure to follow the proper procedures could lead to data loss. Contact Veritas when using this article. If possible, make a complete backup of the data before correcting inode corruption to help prevent data loss.

Procedure

This procedure helps detect and correct Veritas File System (VxFS) inode corruption resulting from transient fiber link failures. Under certain conditions, in-core inodes can be marked bad if all paths to a device have been disabled. If the paths are then re-enabled while the file system is still enabled, these in-core inodes can be flushed to disk and the superblock marked as needing a full fsck. A subsequent full fsck will clear these inodes, deleting the corresponding files. This procedure is most relevant to inodes marked bad due to read failures: because such an inode was not being updated at the time, it is the most likely to be recovered successfully.

After a link failure has been detected, analyze the /var/adm/messages file for possible inode failures. The file system prints "vxfs:" messages to /var/adm/messages, and these usually indicate which inodes have been marked bad.

A typical message looks like this:

Mar 15 17:26:21 ioccrmprep1 unix: Warning: msgcnt 31 vxfs: mesg 017:
vx_ilock - /opt/data/ora16/preprod file system inode 10 marked bad

This indicates that inode 10 on file system /opt/data/ora16/preprod was marked bad. This is expected behavior. However, because the paths to the device were reset while the file system was still enabled, the in-core inode 10, still marked bad, was flushed to disk. Since this was a read failure, the likelihood that inode 10 is actually corrupt is very small.
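
The log check described above can be sketched as a small shell filter. This is only an illustration, not a Veritas-supplied tool: the function name list_bad_inodes_from_log is hypothetical, and the pattern assumes messages worded like the example above. It reads log text on standard input, so it can be tried on a saved copy of the log first.

```shell
# Hypothetical helper (not part of VxFS): print "mount-point inode" for
# every "... file system inode N marked bad" message found on stdin.
# Assumes the message wording shown in the example above.
list_bad_inodes_from_log() {
    sed -n 's/.* - \(.*\) file system inode \([0-9][0-9]*\) marked bad.*/\1 \2/p' |
        sort -u
}

# Example use on the live log:
#   list_bad_inodes_from_log < /var/adm/messages
```

Each output line names a mount point and an inode number, which identifies the file systems to examine in the steps that follow.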

Note: Veritas Support should be contacted if file system corruption is suspected.

Procedure to Detect and Correct

1. Unmount the file system before attempting repairs on corrupted inodes. To verify that the superblock of the failing file system has been marked as needing a full fsck, examine it with the following command:

% echo "8192B.p S" | fsdb -F vxfs /dev/vx/rdsk/rootdg/meta

The command must be run against the raw device for the file system. The actual device can be obtained from the vfstab file.

The output will look something like this:

super-block at 00000002.0000
magic a501fcf5 version 4
ctime 983738769 811577 (Sun Mar 4 12:46:09 2001 PDT)
log_version 9 logstart 0 logend 0
bsize 4096 size 6043904 dsize 6043904 ninode 0 nau 0
defiextsize 0 oilbsize 0 immedlen 96 ndaddr 10
aufirst 0 emap 0 imap 0 iextop 0 istart 0
bstart 0 femap 0 fimap 0 fiextop 0 fistart 0 fbstart 0
nindir 2048 aulen 32768 auimlen 0 auemlen 2
auilen 0 aupad 0 aublocks 32768 maxtier 15
inopb 16 inopau 0 ndiripau 0 iaddrlen 2 bshift 12
inoshift 4 bmask fffff000 boffmask fff checksum e06a935f
free 2459213 ifree 0
efree 1 2 2 2 1 1 0 0 0 1 1 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
flags 301 mod 0 clean 3c
time 984695526 21111 (Thu Mar 15 14:32:06 2001 PDT)
oltext[0] 15 oltext[1] 774 oltsize 1
iauimlen 1 iausize 4 dinosize 256
checksum2 41d
checksum3 0

The key is the flags field. In this case it is "301" (a hexadecimal value), which breaks down to VX_FULLFSCK | VX_METAIOERR | VX_DATAIOERR per the following defines:

VX_FULLFSCK     0x0001  full fsck required
VX_LOGBAD       0x0002  log is invalid, do not do replay
VX_NOLOG        0x0004  no logging, do not do replay
VX_RESIZE       0x0008  resize in progress
VX_LOGRESET     0x0010  log reset desired
VX_UPGRADING    0x0020  upgrade in progress
VX_UQUOTACHECK  0x0040  V2 only, moved to CUT in V3
VX_GQUOTACHECK  0x0080  V2 only, moved to CUT in V3
VX_METAIOERR    0x0100  file system meta-data i/o error
VX_DATAIOERR    0x0200  file data i/o error
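
As a worked check, 0x301 = 0x0200 + 0x0100 + 0x0001, which is why exactly those three flags are reported. The decoding can be sketched in shell as follows. The function names superblock_flags and decode_vx_flags are hypothetical, the flag table is copied from the defines above, and the sketch assumes a shell with POSIX arithmetic expansion (for example ksh or bash rather than the legacy Bourne shell).

```shell
# Hypothetical helper: print the value that follows the word "flags"
# in fsdb superblock output read from stdin (e.g. "301").
superblock_flags() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "flags") { print $(i + 1); exit } }'
}

# Hypothetical helper: print the name of every flag bit that is set in a
# hexadecimal flags value given without the 0x prefix (e.g. 301).
decode_vx_flags() {
    value=$(( 0x$1 ))
    while read -r name mask; do
        if [ $(( value & $mask )) -ne 0 ]; then
            echo "$name"
        fi
    done <<'EOF'
VX_FULLFSCK 0x0001
VX_LOGBAD 0x0002
VX_NOLOG 0x0004
VX_RESIZE 0x0008
VX_LOGRESET 0x0010
VX_UPGRADING 0x0020
VX_UQUOTACHECK 0x0040
VX_GQUOTACHECK 0x0080
VX_METAIOERR 0x0100
VX_DATAIOERR 0x0200
EOF
}

# Example use with fsdb output saved to a file (file name is illustrative):
#   decode_vx_flags "$(superblock_flags < superblock.out)"
```

If VX_FULLFSCK appears in the output, the superblock has been marked as needing a full fsck.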

2. Now that the file system is known to have corruption, perform a full backup of your data. It is also recommended to dump the metadata with the "metasave" utility, in case there are problems with fsdb later on.

3. Run a full fsck with the -n option to see which inodes are marked bad:

% fsck -F vxfs -n /dev/vx/rdsk/rootdg/meta | grep "marked bad"

vxfs fsck: file system had I/O error(s) on meta-data.
vxfs fsck: file system had I/O error(s) on user data.
fileset 999 primary-ilist inode 2 marked bad, allocation flags (0x0001)
fileset 999 primary-ilist inode 3 marked bad, allocation flags (0x0001)
fileset 999 primary-ilist inode 10 marked bad, allocation flags (0x0001)

This indicates that inodes 2, 3, and 10 are marked bad.
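
When many inodes are reported, the fileset and inode numbers can be pulled out of the fsck output mechanically. The following is a minimal sketch, assuming output lines formatted like the ones above; the function name bad_inodes_from_fsck is hypothetical.

```shell
# Hypothetical helper: read "fsck -n" output on stdin and print one
# "fileset inode" pair per line for each inode reported as marked bad.
# Assumes lines of the form:
#   fileset 999 primary-ilist inode 2 marked bad, allocation flags (0x0001)
bad_inodes_from_fsck() {
    awk '$1 == "fileset" && /marked bad/ {
        for (i = 1; i < NF; i++)
            if ($i == "inode") print $2, $(i + 1)
    }'
}

# Example use with fsck output saved to a file (file name is illustrative):
#   bad_inodes_from_fsck < fsck-n.out
```

For the sample output above, this prints the pairs "999 2", "999 3", and "999 10".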

4. Set the "aflag" field to 0x0 using fsdb. This step must be done very carefully since it involves writing to the file system structure itself. The incorrect use of fsdb can destroy the file system.

Now, clear inodes 2, 3, and 10:

% echo "999fset.2i.af=0x0" | fsdb -F vxfs /dev/vx/rdsk/rootdg/meta
0000028a.0230: 0
% echo "999fset.3i.af=0x0" | fsdb -F vxfs /dev/vx/rdsk/rootdg/meta
0000028a.0330: 0
% echo "999fset.10i.af=0x0" | fsdb -F vxfs /dev/vx/rdsk/rootdg/meta
0000028a.0a30: 0

Again, the device needs to be the raw device for the file system.
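
Because a typing mistake in fsdb can damage the file system, it may help to generate the commands and review them before running anything. The sketch below only prints the command lines and never invokes fsdb itself. The function name print_clear_commands is hypothetical, and the command format is taken from the examples above.

```shell
# Hypothetical helper: read "fileset inode" pairs on stdin and PRINT
# (not execute) the fsdb command that would clear the aflag of each one.
# $1 is the raw device of the unmounted file system.
print_clear_commands() {
    device=$1
    while read -r fileset inode; do
        # Skip anything that is not a pair of plain decimal numbers.
        case $fileset in ''|*[!0-9]*) continue ;; esac
        case $inode in ''|*[!0-9]*) continue ;; esac
        printf 'echo "%sfset.%si.af=0x0" | fsdb -F vxfs %s\n' \
            "$fileset" "$inode" "$device"
    done
}

# Example use:
#   printf '999 2\n999 3\n999 10\n' | print_clear_commands /dev/vx/rdsk/rootdg/meta
```

Review the printed lines against the fsck output, then run them one at a time.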

5. The aflag has now been cleared for the three inodes. Verify with fsck that no inodes are still reported as marked bad:

% fsck -F vxfs -n /dev/vx/rdsk/rootdg/meta | grep "marked bad"

vxfs fsck: file system had I/O error(s) on meta-data.
vxfs fsck: file system had I/O error(s) on user data.

6. Now it should be safe to run a full fsck with the -y option:

% fsck -F vxfs -y /dev/vx/rdsk/rootdg/meta

vxfs fsck: file system had I/O error(s) on meta-data.
vxfs fsck: file system had I/O error(s) on user data.
log replay in progress
file system is not clean, full fsck required
pass0 - checking structural files
pass1 - checking inode sanity and blocks
pass2 - checking directory linkage
pass3 - checking reference counts
pass4 - checking resource maps
OK to clear log? (ynq)y
set state to CLEAN? (ynq)y

7. Mount the file system and check the inodes:

% mount -F vxfs /dev/vx/dsk/rootdg/meta /meta

% ls -li /meta

total 28672224
4 -rw-r----- 1 vray 101 2097160192 Mar 8 12:35 data1_01a.dbf
5 -rw-r----- 1 vray 101 2097160192 Mar 8 12:35 data1_01b.dbf
6 -rw-r----- 1 vray 101 2097160192 Mar 8 12:35 data1_01c.dbf
7 -rw-r----- 1 vray 101 2097160192 Mar 8 12:35 data1_01d.dbf
8 -rw-r----- 1 vray 101 2097160192 Mar 8 12:35 data1_01e.dbf
9 -rw-r----- 1 vray 101 2097160192 Mar 8 12:35 data1_01f.dbf
10 -rw-r----- 1 root other 2097160192 Mar 15 13:45 file1.dbf
3 drwxr-xr-x 2 vray 101 96 Mar 4 12:46 lost+found/
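
The final check can also be scripted. This is a minimal sketch, assuming "ls -li" output with the inode number in the first column, as shown above. The function name inode_listed is hypothetical, and the -v option requires a POSIX awk (nawk on older Solaris releases).

```shell
# Hypothetical helper: succeed (exit status 0) if the inode number given
# as $1 appears in the first column of "ls -li" output read from stdin.
inode_listed() {
    awk -v want="$1" '$1 == want { found = 1 } END { exit(found ? 0 : 1) }'
}

# Example use:
#   ls -li /meta | inode_listed 10 && echo "inode 10 is present"
```

Note that "ls -li /meta" lists only the entries inside /meta, so a directory's own inode is not shown in its listing.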
