VMware ESXi guest virtual machines on NFS datastore become corrupted after deduplication of VxFS file system

Article ID: 100011804

Description

Error Message

No error messages appear on the ESXi host or on the machine or appliance hosting the VxFS file system that is exported as an NFS share. If the corruption is silent (only file system payload data is affected), virtual machines may show no immediate signs of it. If file system metadata has been corrupted, the damage may be apparent, and the user may be prompted to run chkdsk.exe or fsck. In addition, previously running VMs may fail to restart if the corruption has affected files or virtual disk structures that are required for booting.

Typical output messages from the chkdsk.exe utility on a Windows VM after corruption has been identified:

from file record segment 36732.
Deleting corrupt attribute record (128, "")
from file record segment 36815.
Deleting corrupt attribute record (128, "")
from file record segment 36831.
Deleting corrupt attribute record (128, "")
from file record segment 36834.
Deleting corrupt attribute record (128, "")
from file record segment 37171.
Deleting corrupt attribute record (128, "")
from file record segment 37176.
Deleting corrupt attribute record (128, "")
from file record segment 37460.
Deleting corrupt attribute record (128, "")
from file record segment 59509.
59648 file records processed.
File verification completed.
19 large file records processed.
0 bad file records processed.
0 EA records processed.
44 reparse records processed.
CHKDSK is verifying indexes (stage 2 of 3)...
59 percent complete. (61438 of 86646 index entries processed)

from file record segment 37460.
Deleting corrupt attribute record (128, "")
from file record segment 59509.
59648 file records processed.
File verification completed.
19 large file records processed.
0 bad file records processed.
0 EA records processed.
44 reparse records processed.
CHKDSK is verifying indexes (stage 2 of 3)...
Correcting error in index $I30 for file 3722.
Correcting error in index $I30 for file 3722.
Sorting index $I30 in file 3722.
Correcting error in index $I30 for file 6644.
Correcting error in index $I30 for file 6644.
Sorting index $I30 in file 6644.
Correcting error in index $I30 for file 6993.
Correcting error in index $I30 for file 6993.
Sorting index $I30 in file 6993.
Correcting error in index $I30 for file 7458.
Correcting error in index $I30 for file 7458.
Sorting index $I30 in file 7458.
65 percent complete. (68986 of 86646 index entries processed)

Resolution

The shared_pg_enabled VxFS tunable can be set to 0, disabling the VxFS shared page cache. This setting prevents the corruption from occurring. The tunable can be set globally (affecting all VxFS file systems) or on a per file system basis.

To configure the tunable, it is necessary to modify the /etc/vx/tunefstab file. If the file does not exist, create it:
# touch /etc/vx/tunefstab

To set the tunable for each NFS exported file system individually, add an entry as follows:

# echo "/dev/vx/dsk// shared_pg_enabled=0" >> /etc/vx/tunefstab
Replace the disk group and volume components of the device path with values from your NFS host. Use the df command to find the correct strings.
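
As a rough illustration of the df step, the following sketch filters df-style output down to the device paths that start with /dev/vx/dsk/. The function name list_vxfs_devices is hypothetical and not part of any Veritas tool, and the sketch assumes the device path is the first field of each line, which is the case for POSIX-format output from df -P.

```shell
# Hypothetical helper: print the unique /dev/vx/dsk/... device paths
# found in df-style output supplied on standard input.
# Assumes the device path is the first whitespace-separated field.
list_vxfs_devices() {
    awk '$1 ~ /^\/dev\/vx\/dsk\// { print $1 }' | sort -u
}
```

It could be used as `df -P | list_vxfs_devices`. Each path printed is a candidate for the first field of a tunefstab entry, but only the file systems that are actually exported over NFS need one.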

To set the tunable globally, use the following command:
# echo "system_default shared_pg_enabled=0" >> /etc/vx/tunefstab
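
Because the echo commands append unconditionally, running them twice leaves duplicate lines in the file. A minimal sketch of a repeatable alternative follows; the function name add_shared_pg_entry is hypothetical, and the caller supplies both the file path and the first field (a device path or system_default). It only checks for an identical line and does not attempt to merge with an existing entry that sets other tunables for the same target.

```shell
# Hypothetical helper: append "<target> shared_pg_enabled=0" to a
# tunefstab-style file unless an identical line is already present.
#   $1 = path to the file (for example /etc/vx/tunefstab)
#   $2 = first field of the entry (a device path or system_default)
add_shared_pg_entry() {
    file=$1
    target=$2
    entry="$target shared_pg_enabled=0"
    # Create the file if it does not exist yet, as the touch step does.
    [ -e "$file" ] || touch "$file" || return 1
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
    if grep -qxF -e "$entry" "$file"; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$file"
}
```

For the global setting this would be called as `add_shared_pg_entry /etc/vx/tunefstab system_default`, run as root. Review any lines already in the file for the same target before relying on it.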

More information on the tunefstab file is available via the installed man pages, or at https://sort.Veritas.com/public/documents/sfha/6.0/linux/manualpages/html/man/file_system/html/man4/tunefstab.4.html


Applies To

Any VMware vSphere environment where the datastore is configured on an NFS share which is backed by a deduplicated VxFS file system. Deduplication is available only on Storage Foundation 6.x or newer. Environments where deduplication is not used are not affected.

Issue/Introduction

In a VMware vSphere environment, it is necessary to configure one or more storage locations (called datastores) on each ESXi host in which to store the configuration and virtual disk files for guest Virtual Machines (VMs). These datastores can be either block-based and located on locally attached disks or SAN LUNs, or file-based and located on a remote NFS share. If the NFS share mounted as an ESXi host datastore is backed by a VxFS file system which is then deduplicated, the virtual hard disk images for VMs can become corrupted, sometimes to such an extent that they cannot be repaired using the chkdsk.exe tool for Windows guests or the fsck tool for UNIX and Linux VMs.

Datastores hosting two or more similar VMs (especially cloned VMs) are significantly more susceptible to corruption, since the file system contains more duplicate blocks from which space can be reclaimed.