I/O hang is observed when Storage Replicator Log (SRL) volume becomes full and SRL overflow protection is set to "override", "fail", or "dcm" mode


Article ID: 100011903


Cause

VVR offers multiple options for SRL overflow protection that can be used to meet different requirements: override, fail, dcm, and autodcm (default).

When the SRL overflow protection mode is set to "override", "fail", or "dcm" and the SRL is about to become full while the RLINK is connected, VVR stalls primary-side I/O (write requests) until the SRL is drained below the threshold. In other words, when an SRL overflow is imminent, VVR acts to preserve the consistency of the secondary site, although the secondary may be out-of-date.

These three modes differ from each other in their behavior while the RLINK is disconnected:

  • override: no protection (SRL will overflow; a full resync is required to bring the secondary up-to-date again)
  • fail: all write requests fail until SRL is drained
  • dcm: convert to DCM logging

 

With the default SRL overflow protection mode, "autodcm", VVR automatically switches to DCM logging mode whenever the SRL becomes full (regardless of RLINK status); application I/O is never stalled.
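
The behavior of the four modes can be summarized as a lookup keyed on the mode and the RLINK state. The following is a minimal sketch for reference only: the function name srl_overflow_behaviour and its output strings are hypothetical, and the function restates the descriptions in this article rather than querying VVR.

```shell
#!/bin/sh
# Illustrative summary only; this does not call any VVR command.
# Usage: srl_overflow_behaviour MODE RLINK_STATE
#   MODE:        override | fail | dcm | autodcm
#   RLINK_STATE: connected | disconnected
srl_overflow_behaviour() {
    mode=$1
    rlink=$2
    case "$mode:$rlink" in
        autodcm:connected|autodcm:disconnected)
            echo "switch to DCM logging; application I/O is not stalled" ;;
        override:connected|fail:connected|dcm:connected)
            echo "stall primary writes until the SRL drains below the threshold" ;;
        override:disconnected)
            echo "SRL overflows; full resync of the secondary is required" ;;
        fail:disconnected)
            echo "write requests fail until the SRL is drained" ;;
        dcm:disconnected)
            echo "switch to DCM logging" ;;
        *)
            echo "unknown mode or RLINK state: $mode $rlink" >&2
            return 1 ;;
    esac
}

srl_overflow_behaviour override connected
srl_overflow_behaviour dcm disconnected
```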

Resolution

Check the current SRL overflow protection setting, and switch to "autodcm" mode to avoid an I/O hang when an SRL overflow is imminent.

How to check the current SRL overflow protection mode (per RLINK):

# vxprint -Pl

( example )
Disk group: datadg
Rlink:    rlk_to_secondary_host
info:     timeout=500 packet_size=1440 rid=0.205642
---SNIP---
state:    state=ACTIVE
          synchronous=off latencyprot=off srlprot=override      <<<<
assoc:    rvg=rvg_data
---SNIP---
protocol: UDP/IP
flags:    write enabled attached consistent connected asynchronous
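
On hosts with many RLINKs, the srlprot value can be pulled out of the vxprint output with a short filter. The sketch below assumes the field layout shown in the example above ("Disk group:", "Rlink:", and a "srlprot=" token); the helper name srlprot_report is hypothetical.

```shell
#!/bin/sh
# Reads `vxprint -Pl` output on stdin and prints one line per RLINK:
#   diskgroup rlink srlprot
# Assumes the layout shown in the example output of this article.
srlprot_report() {
    awk '
        /^Disk group:/ { dg = $3 }       # remember the current disk group
        /^Rlink:/      { rlink = $2 }    # remember the current RLINK name
        {
            # srlprot= may appear anywhere on the line, so scan every field
            for (i = 1; i <= NF; i++)
                if ($i ~ /^srlprot=/) {
                    sub(/^srlprot=/, "", $i)
                    print dg, rlink, $i
                }
        }
    '
}

# On a VVR host (not run here): vxprint -Pl | srlprot_report
# Demonstration using the sample output from this article:
srlprot_report <<'EOF'
Disk group: datadg
Rlink:    rlk_to_secondary_host
state:    state=ACTIVE
          synchronous=off latencyprot=off srlprot=override
assoc:    rvg=rvg_data
EOF
```

For the sample input, the filter prints the disk group, the RLINK name, and "override" on one line.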

How to change SRL overflow protection mode to "autodcm" (each volume in the RVG must have a DCM):

# vradmin -g diskgroup set local_rvgname [sec_hostname] srlprot=autodcm

( example )
# vradmin -g datadg set rvg_data srlprot=autodcm
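
Where several RVGs need to be reviewed, the check and the change can be combined into a dry run that only prints the vradmin commands for review. This is a sketch under assumptions: the helper name suggest_autodcm is hypothetical, it relies on the vxprint -Pl layout shown earlier, and it assumes one Secondary per RVG (with more than one Secondary, the secondary host name would have to be added to each command). It does not verify that every volume in the RVG has a DCM.

```shell
#!/bin/sh
# Reads `vxprint -Pl` output on stdin and PRINTS (does not run) a vradmin
# command for each RLINK whose srlprot is not already autodcm.
suggest_autodcm() {
    awk '
        function emit() {
            if (rvg != "" && prot != "" && prot != "autodcm")
                print "vradmin -g " dg " set " rvg " srlprot=autodcm"
            rvg = ""; prot = ""
        }
        /^Disk group:/ { emit(); dg = $3 }   # finish previous record first
        /^Rlink:/      { emit() }            # a new RLINK record starts here
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^srlprot=/) prot = substr($i, 9)
                if ($i ~ /^rvg=/)     rvg  = substr($i, 5)
            }
        }
        END { emit() }
    '
}

# On a VVR host (not run here): vxprint -Pl | suggest_autodcm
# Demonstration using the sample output from this article:
suggest_autodcm <<'EOF'
Disk group: datadg
Rlink:    rlk_to_secondary_host
state:    state=ACTIVE
          synchronous=off latencyprot=off srlprot=override
assoc:    rvg=rvg_data
EOF
```

Printing the commands instead of running them keeps the change under operator control; each line can be checked against the RVG configuration before it is executed.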

 


Applies To

All VVR environments

Issue/Introduction

In a Volume Replicator (VVR) environment, all application I/O on the primary node hangs when the SRL becomes almost full while the SRL overflow protection mode is set to "override", "fail", or "dcm" (and the RLINK is still connected). The I/O hang persists until the SRL is drained below the pre-defined threshold (SRL usage < 95% or free space > 20MB, whichever comes first).
This behavior is by design (expected behavior), but it may result in a sustained service interruption if the user is not aware of the current setting or chooses one of these modes without realizing the implications.
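
To see how the two threshold conditions interact, the arithmetic can be worked through in a few lines of shell. This is only a model of the thresholds stated above, not VVR's internal check; the function name srl_drained and the sizes used are hypothetical, and all values are whole megabytes.

```shell
#!/bin/sh
# Usage: srl_drained SIZE_MB USED_MB
# Returns 0 (true) when either stated condition holds:
#   usage < 95%   or   free space > 20 MB
srl_drained() {
    size=$1
    used=$2
    free=$((size - used))
    if [ $((used * 100)) -lt $((size * 95)) ] || [ "$free" -gt 20 ]; then
        return 0
    fi
    return 1
}

# 1024 MB SRL, 1000 MB used: usage is above 95%, but 24 MB is free.
if srl_drained 1024 1000; then echo "1024/1000: below threshold"; else echo "1024/1000: above threshold"; fi
# 1024 MB SRL, 1010 MB used: usage is above 95% and only 14 MB is free.
if srl_drained 1024 1010; then echo "1024/1010: below threshold"; else echo "1024/1010: above threshold"; fi
```

Under this model, 5% of the SRL equals 20 MB at an SRL size of 400 MB. For larger SRLs the free-space condition is reached first while draining; for smaller ones the 95% usage condition is reached first.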