Veritas Volume Manager (VxVM) administrative I/O operations may result in data loss on RHEL5 (Red Hat) when VxVM tasks are interrupted (Ctrl-z) and resumed with fg or bg


Article ID: 100015833



Cause


When administrative I/O is being performed, VxVM creates a task to handle the whole process of copying data between subdisks.

During the VxVM code implementation, we moved the logic of generating plex-attach IO operations from the kernel. Once we submit the task to the kernel, we wait for completion using the poll() routine.

Once the task is finished, we signal the task's file descriptor (fd) about its completion, expect poll() to return, and then check the status of the task in the kernel.

These can be long-running tasks that run for hours, depending on the amount of data to be copied, the I/O load on the volumes, and so on. Periodically checking the task status in the kernel would not be a good approach because it requires repeated IOCTLs, so the poll() mechanism was chosen to signal completion instead.

Therefore, to avoid periodically sending IOCTLs to the kernel to check the status of the task, we use the select() primitive on the task's file descriptor (fd) and wait for the task completion event.

 
On RHEL5, when the select() I/O primitive is interrupted by a signal, it is not retried after the signal handler returns. Instead, it returns -1 with errno set to EINTR. The original code logic treated EINTR as a user interrupt and therefore terminated the operation.
 
The issue does not occur with RHEL6.x and higher, as Red Hat has optimized its code.


Administrative I/O Impact:


When the vol_auto_adminio_control tunable is turned on, administrative I/O (adminio) is set to TRUE, and we submit the task and wait for its completion by calling poll() or select().
If the vol_auto_adminio_control tunable is turned off, adminio is set to FALSE and an alternate code path is used: a copy thread performs the operation and poll() is not called.



SmartMove Value:


SmartMove reduces the time and I/O required to attach or reattach a plex to an existing VxVM volume, in the specific case where a VxVM volume has a VxFS file system mounted on it.
The SmartMove feature uses the VxFS information to detect free extents and avoid copying them. This behavior helps optimize the thin storage utilization.
SmartMove also provides the following benefits:
 
  • Less I/O is sent through the host, through the storage network and to the disks/LUNs
  • Faster plex creation, resulting in faster array migrations
  • Ability to migrate from a traditional LUN to a thinly provisioned LUN, removing unused space in the process
To use the SmartMove feature, VxVM and VxFS must be running 5.0 MP3 or later.
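
The extent-skipping idea behind SmartMove can be illustrated with a small sketch. This is not the VxVM or VxFS interface: the function name used_extents, the (start, length) block tuples and the assumption that the free extents lie within the volume are all hypothetical, chosen only to show how knowing the free extents reduces the amount of data to copy.

```python
def used_extents(volume_blocks, free_extents):
    """Return the (start, length) extents not covered by free_extents.

    volume_blocks -- total size of the volume in blocks (hypothetical unit)
    free_extents  -- iterable of (start, length) tuples reported as free
    """
    used = []
    cursor = 0  # first block not yet classified
    for start, length in sorted(free_extents):
        if start > cursor:
            # Everything between the cursor and this free extent is in use.
            used.append((cursor, start - cursor))
        cursor = max(cursor, start + length)
    if cursor < volume_blocks:
        used.append((cursor, volume_blocks - cursor))
    return used
```

For a 100-block volume with free extents (10, 20) and (50, 50), only the extents (0, 10) and (30, 20) would need to be copied, that is 30 blocks instead of 100.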

Resolution

The VxVM code has been changed to handle the EINTR response when the task is not finished.
When poll() returns EINTR inside vol_admintask_wait(), the status of the task is now checked and poll() is retried if the task is not in a DONE state.
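
The retry logic described here can be sketched as follows. This is a minimal illustration in Python, not the VxVM implementation: wait_for_task, poll_once, task_state and the RUNNING/DONE state names are hypothetical stand-ins for vol_admintask_wait(), the poll() call and the kernel task status check.

```python
import errno

# Hypothetical task states, for illustration only.
RUNNING = "RUNNING"
DONE = "DONE"


def wait_for_task(poll_once, task_state):
    """Wait until the task reports DONE, retrying interrupted waits.

    poll_once  -- callable that blocks on the task fd; it may raise
                  OSError with errno EINTR when a signal arrives
    task_state -- callable returning the current task state
    Returns the number of interrupted waits that were retried.
    """
    retries = 0
    while True:
        try:
            poll_once()
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # a real error, not an interrupted wait
            # EINTR alone does not mean the user cancelled the task:
            # check the task status and wait again unless it is DONE.
            if task_state() == DONE:
                return retries
            retries += 1
            continue
        if task_state() == DONE:
            return retries
```

The wait is passed in as a callable because Python 3.5 and later already retry interrupted system calls internally (PEP 475), so the EINTR case has to be simulated to exercise this path.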



Patch Information:


A series of public (GA) QA tested patches for VxVM 6.0.5 and higher are being created for RHEL5. The tentative release date is late August 2015.

A VxVM 6.1.1 public patch will be released for RHEL5, as this is the last supported product version supporting RHEL5. The tentative release date for 6.1.1 is mid to late October 2015.

Note: These tentative release dates may be subject to change.
 
Even though the current RHEL6 and RHEL7 releases do not exhibit the issue, VxVM patches will eventually be released for 6.1.1, 6.2.1 and 7.0. These patches will include the enhanced VxVM code check to safeguard against future interoperability issues.


Workarounds:


Until the public GA VxVM patches can be deployed, we strongly recommend that both VxVM tunables for admin I/O and SmartMove are disabled.


1.] Disable auto throttling of administrative I/Os
 
#  vxtune vol_auto_adminio_control
Tunable                               Current Value   Default Value   Reboot   
---------------------------------   ---------------   -------------   ------   
vol_auto_adminio_control                          1               1      N     


By default, the value of the tunable is 1, which means the feature is turned on.
If the value of the tunable is set to 0, the adminio de-prioritization feature is turned off.

In earlier VxVM releases, the tunable was hidden; it has been made visible in more recent VxVM releases.
 
# vxtune | grep vol_auto_adminio_control


To disable adminio, type:

 
# vxtune vol_auto_adminio_control 0

# vxtune vol_auto_adminio_control

Tunable                               Current Value   Default Value   Reboot   
---------------------------------   ---------------   -------------   ------   
vol_auto_adminio_control                          0               1      N   
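
When the workaround has to be verified on several hosts, the check can be scripted. The sketch below parses vxtune output in the tabular layout shown above and returns the Current Value column. The helper names tunable_current_value and read_tunable are hypothetical, and the parsing assumes the column layout does not differ between VxVM releases.

```python
import subprocess


def tunable_current_value(vxtune_output, name):
    """Return the Current Value column for `name`, or None if absent."""
    for line in vxtune_output.splitlines():
        fields = line.split()
        # Data rows look like: <tunable> <current> <default> <reboot>
        if len(fields) >= 2 and fields[0] == name:
            return fields[1]
    return None


def read_tunable(name):
    """Run `vxtune <name>` and return its current value (needs VxVM)."""
    result = subprocess.run(["vxtune", name], capture_output=True,
                            text=True, check=True)
    return tunable_current_value(result.stdout, name)
```

On a host with VxVM installed, read_tunable("vol_auto_adminio_control") would be expected to return "0" once the workaround is in place.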



2.] Disable SmartMove


The vxdefault CLI command can be used to disable SmartMove functionality:

 
# vxdefault set usefssmartmove none
 
# vxdefault list
KEYWORD                        CURRENT-VALUE   DEFAULT-VALUE  
autoreminor                    on              on             
autostartvolumes               on              on             
fssmartmovethreshold           100             100            
reclaim_on_delete_start_time   22:10           22:10          
reclaim_on_delete_wait_period  1               1              
same_key_for_alldgs            off             off            
sharedminorstart               33000           33000          
storage_connectivity           resilient       resilient      
usefssmartmove                 none            all     
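
The SmartMove setting can be checked in a script as well. This sketch turns `vxdefault list` output in the layout shown above into a dictionary and tests whether usefssmartmove is currently none. The function names are hypothetical, and the parsing assumes that every value is a single token without embedded whitespace, as in the listing above.

```python
def parse_vxdefault_list(output):
    """Map each KEYWORD to a (current_value, default_value) tuple."""
    settings = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a 3-column row.
        if len(fields) == 3 and fields[0] != "KEYWORD":
            settings[fields[0]] = (fields[1], fields[2])
    return settings


def smartmove_disabled(output):
    """Return True when usefssmartmove has the current value 'none'."""
    current, _default = parse_vxdefault_list(output).get(
        "usefssmartmove", (None, None))
    return current == "none"
```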
 


3.] To limit the risk when using vxevac, the -k argument should be used.


Example:

 
# /etc/vx/bin/vxevac -g movedg -k hus_1300_1563 alloc=hus_1300_1564 


As the “-k” option was specified, the user can use the rollback option to roll back the changes to the initial state.
 
# /etc/vx/bin/vxevac -g movedg rollback hus_1300_1563


Because the -k option adds extra flexibility, it is the recommended syntax for production environments and future use.

The user can safely place the vxevac -k operation in the background using &, and bring it to the foreground whenever needed.

Once the vxevac operation has finished successfully, the user can commit the changes.

 
# /etc/vx/bin/vxevac -g movedg commit hus_1300_1563


 

Issue/Introduction

A critical issue affecting RHEL5 (Red Hat 5.x) has recently been identified that can result in data loss. This article explains the operational impact, risk and valid workarounds.


Conditions:


1.] Platform: The issue only exists with RHEL5.
2.] Product Versions: Veritas Volume Manager (VxVM) 6.x.x and 7.x.
3.] Features that use SmartMove and AdminIO.

Note: Both SmartMove and AdminIO functionality are enabled by default. Storage Foundation 5.x.x is not impacted as VxVM is not using the poll() routine for tasks.
 
Administrative (admin) I/Os are the I/Os needed for infrastructure operations (mirroring, snapshots), that is, anything that is not “application I/O”.

Typical examples in VxVM are:  mirror resyncs, subdisk moves, plex attaches, relayouts, snapshots.

When a VxVM admin I/O operation is interrupted on RHEL5.x, it will not continue when fg or bg is executed. This is because the interrupted wait is treated as a user interrupt and the VxVM operation is terminated.

VxVM commands that can initiate admin I/O:

vxassist mirror/snapcreate
vxplex att/cp/mv/instsync
vxsnap addmir/reattach
vxsd mv
vxevac


The Storage Foundation (SF) SmartMove feature enables Veritas File System (VxFS) to let Veritas Volume Manager (VxVM) know which blocks have data. VxVM, which is the copy engine for migration, copies only the used blocks and avoids copying unused blocks.


Example:


The vxevac utility moves subdisks off the specified Veritas Volume Manager (VxVM) disk (medianame) to the specified destination disks (new_medianame...).

On RHEL 5.x, when using SmartMove with "vxevac", if the operation is interrupted by "Ctrl-z" followed by fg or bg, the operation does not continue; instead, it is terminated and not resumed.
This gives the incorrect impression that the operation has completed, resulting in potential loss of data.