Setting the number of diskgroup configuration copies to more than 255 can cause configuration copies to enter an error state

Article ID: 100007348

Description

Error Message

config disk 3pardata0_4858 copy 1 len=48144 state=error failed
       config-tid=0.4708 pending-tid=0.4708
       Error: block=1 error=Internal configuration daemon error

config disk ibm_ds8x002_1285 copy 1 len=48144 state=new failed 
       config-tid=0.2230600 pending-tid=0.2230600
       Error: error=Volume error 0
 

Cause

The problem is tracked as Etrack incident 339282. In pre-5.1 VxVM versions, diskgroup configuration updates are limited by the vol_maxparallelio parameter, which defaults to 255. When the diskgroup configuration is updated, all configuration copies are updated in parallel, and vol_maxparallelio limits the number of parallel I/Os that can be performed.
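
As a rough illustration of the limit described above, the following shell sketch counts the configuration copies reported for a diskgroup and compares the count with the limit. It assumes that each configuration copy appears in the vxdg list output as a "config disk" line carrying a state= field, as in the output quoted in this article. The function name copies_exceed_limit is hypothetical and is not part of VxVM.

```shell
# Hypothetical helper (not a VxVM command): reads `vxdg list <dg>` output on
# stdin and reports whether the number of config copies exceeds a limit.
# The limit defaults to 255, the default vol_maxparallelio per this article.
copies_exceed_limit() {
    limit=${1:-255}
    awk -v limit="$limit" '
        # Count lines such as:
        #   config disk <da> copy 1 len=48144 state=clean online
        $1 == "config" && $2 == "disk" && /state=/ { n++ }
        END {
            printf "copies=%d limit=%d\n", n, limit
            # Exit status 0 means the limit is exceeded.
            exit ((n > limit) ? 0 : 1)
        }
    '
}
```

A possible invocation is: vxdg list adg | copies_exceed_limit. An exit status of 0 suggests the diskgroup has more configuration copies than the limit allows.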

Resolution

Upgrade to Storage Foundation 5.1 or later to fix the issue.

If the diskgroup configuration is already corrupted because of this issue, increase vol_maxparallelio to a value higher than the number of disks in the diskgroup, and then reboot the system to repair the corrupted configuration copies. All configuration copies that show the following error can be fixed in this way.

config disk 3pardata0_4858 copy 1 len=48144 state=error failed
       config-tid=0.4708 pending-tid=0.4708
       Error: block=1 error=Internal configuration daemon error

For the following configuration copy error, the disk may need to be temporarily removed from the diskgroup and then added back.

config disk ibm_ds8x002_1285 copy 1 len=48144 state=new failed 
       config-tid=0.2230600 pending-tid=0.2230600
       Error: error=Volume error 0
 
To remove the disk and add it back, run the following commands, where "dg" is the diskgroup name, "dm" is the disk media name, and "da" is the disk access name:

# vxdg -g "dg" -k rmdisk "dm"

# vxdg -g "dg" -k adddisk "dm"="da"

The "da" and "dm" names can be read from the vxdisk list output for the disk, for example:

# vxdisk list 3pardata0_4858
Device:    3pardata0_4858               <<< this is the "da" name (disk access name)
devicetag: 3pardata0_4858
type:      auto
hostid:    p595_8n_mp1
disk:      name=adg002 id=1321935524.346.p595_8n_mp1       <<< this is the "dm" name (disk media name)

Note that if the diskgroup contains mirrored volumes, they may need to be resynchronized afterward.
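
The lookup of the "da" and "dm" names can be sketched in shell as follows. The sketch assumes that the vxdisk list output has a "Device:" line and a "disk:" line with a name= field, as in the output quoted in this article. It only prints the two vxdg commands for review and does not run them. The function name print_readd_commands is hypothetical and is not part of VxVM.

```shell
# Hypothetical helper (not a VxVM command): reads `vxdisk list <device>`
# output on stdin and prints, without running them, the rmdisk/adddisk
# commands from this article for the diskgroup given as the first argument.
print_readd_commands() {
    dg=$1
    awk -v dg="$dg" '
        # "Device:" line holds the disk access (da) name.
        $1 == "Device:" { da = $2 }
        # "disk:" line holds name=<dm>, the disk media (dm) name.
        $1 == "disk:" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^name=/) dm = substr($i, 6)
        }
        END {
            # Fail rather than print an incomplete command.
            if (da == "" || dm == "") exit 1
            printf "vxdg -g %s -k rmdisk %s\n", dg, dm
            printf "vxdg -g %s -k adddisk %s=%s\n", dg, dm, da
        }
    '
}
```

A possible invocation is: vxdisk list 3pardata0_4858 | print_readd_commands adg. Review the printed commands before running them by hand.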


Applies To

The problem only affects pre-5.1 VxVM versions. It is fixed through Etrack incident 339282 in VxVM 5.1 and later. Refer to the Supplemental Material section for details.

Issue/Introduction

In pre-5.1 Veritas Volume Manager (VxVM), if more than 255 configuration database copies are enabled, VxVM fails to update all of the configuration copies and leaves some of them in an error state. For example, if the diskgroup contains more than 255 disks and the diskgroup attribute nconfig is set to "all", most of the diskgroup configuration copies show an error state when checked with the vxdg list "dg" command.

# vxedit -g adg set nconfig=all nlog=all adg

# vxdg list adg
Group:     adg
dgid:      1321935150.327.p595_8n_mp1
import-id: 1024.948
flags:     cds
version:   140
alignment: 8192 (bytes)
local-activation: read-write
ssb:            on
autotagging:    on
detach-policy: global
dg-fail-policy: dgdisable
copies:    nconfig=all nlog=all           <<< enabling the configuration copies on all disks
config:    seqno=0.4716 permlen=48144 free=47136 templen=463 loglen=7296

config disk 3pardata0_4857 copy 1 len=48144 state=clean online         <<< GOOD COPY

config disk 3pardata0_4858 copy 1 len=48144 state=error failed           <<< BAD COPY
       config-tid=0.4708 pending-tid=0.4708
       Error: block=1 error=Internal configuration daemon error
...
config disk ibm_ds8x002_1285 copy 1 len=48144 state=new failed       <<< BAD COPY
       config-tid=0.2230600 pending-tid=0.2230600
       Error: error=Volume error 0
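
With several hundred configuration copies, a per-state summary is easier to read than the full listing. The following shell sketch tallies the state= values from the "config disk" lines of the vxdg list output. It assumes the line format shown in this article; the function name count_config_states is hypothetical and is not part of VxVM.

```shell
# Hypothetical helper (not a VxVM command): reads `vxdg list <dg>` output on
# stdin and prints one "<state> <count>" line per config copy state.
count_config_states() {
    awk '
        # Match lines such as:
        #   config disk <da> copy 1 len=48144 state=error failed
        $1 == "config" && $2 == "disk" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^state=/) count[substr($i, 7)]++
        }
        END {
            for (s in count) printf "%s %d\n", s, count[s]
        }
    ' | sort
}
```

A possible invocation is: vxdg list adg | count_config_states. Any count reported for the error or new state identifies copies that need the repair steps in the Resolution section.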
   

Additional Information

ETrack: 339282