Unable to start vxconfigd

Article ID: 100024422

Description

Error Message

# vxconfigd -k   

VxVM vxdisk ERROR V-5-1-684 IPC failure: Configuration daemon is not accessible

 

Cause

Incorrect LUN provisioning procedures have left the device tree unstable from both the Operating System and the Volume Manager perspective. This may lead to incore corruption of the Volume Manager in-kernel configuration database.

Resolution

The only clean solution to fix this problem is to perform a reconfiguration reboot of the system.
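As a minimal sketch, assuming the host runs Solaris (the article does not name the platform, but the device names and format output suggest it), a reconfiguration reboot is commonly requested by creating /reconfigure before rebooting, or with "reboot -- -r". The reconfig_reboot wrapper below is hypothetical; it only prints the commands unless --execute is passed, because the reboot is disruptive.

```shell
# Hypothetical helper (not part of VxVM or Solaris): prints the commands for a
# reconfiguration reboot, and runs them only when called with --execute.
# Assumes a Solaris host, where /reconfigure triggers a device tree rebuild at boot.
reconfig_reboot() {
    if [ "${1:-}" = "--execute" ]; then
        # Flag a reconfiguration boot, then reboot through init.
        touch /reconfigure && init 6
    else
        echo "would run: touch /reconfigure && init 6"
        echo "alternative: reboot -- -r"
    fi
}

# Dry run by default; nothing is changed on the system.
reconfig_reboot
```

Keeping the dry run as the default is a deliberate choice here, so the snippet can be pasted into a session without rebooting the host by accident.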

Further recommendations to avoid hitting this type of situation are:

1.) Follow the correct LUN provisioning guidelines so that LUNs are removed cleanly from the host, from both the Volume Manager and the Operating System perspective.

2.) Upgrade to 5.0 MP3 or later, which introduced the Data Corruption Prevention Activation (DCPA) feature. DCPA helps avoid this type of vxconfigd incore corruption, although there is no guarantee that it will completely prevent the situation from occurring.

3.) Ensure that the required hardware settings are configured on the storage array according to the Hardware Compatibility List and the hardware article documents for the installed version of Storage Foundation.


 


Issue/Introduction

The vxconfigd daemon cannot be restarted. Disk groups are imported, volumes are enabled, and file systems are mounted. All I/O is operating normally, but vxconfigd cannot be restarted.

A recent LUN addition followed by "vxdisk scandisks" or "vxdctl enable" may lead to a situation where vxconfigd core dumps and/or cannot be restarted. vxconfigd is then inaccessible, so no Volume Manager operations can be performed. However, file systems mounted on existing volumes may continue to perform I/O as usual; they are not affected by vxconfigd not running.

Because Volume Manager and VxDMP are layered on top of the OS SCSI sd/ssd drivers, they depend on the underlying layers to manage and configure the Volume Manager device tree. It may not be possible to correct the vxconfigd incore corruption by reverting the device tree changes, that is, by unpresenting the newly added LUNs.

One or more of the following symptoms may identify this situation:

1.) Devices were previously marked for removal in Volume Manager, and the new LUNs are presented via the old paths. Volume Manager treats this as the old device returning, but because the newly presented LUN is not the same as the old LUN, vxconfigd will not manage it.

# grep "0xffffffff" /etc/vx/disk.info
NETAPP%5FLUN%5F2082338%5FHnWdg4IGi%2FcU c3t500A098387095B30d1 0xffffffff 0x2 FAS30700_1 FAS3070 2082338
NETAPP%5FLUN%5F2082338%5FHnWdg4IGibAy c3t500A098387095B30d3 0xffffffff 0x2 FAS30700_3 FAS3070 2082338
NETAPP%5FLUN%5F2082338%5FHnWdg4IGic%2FB c3t500A098387095B30d5 0xffffffff 0x2 FAS30700_5 FAS3070 2082338
NETAPP%5FLUN%5F2082338%5FHnWdg4IGidTT c3t500A098387095B30d6 0xffffffff 0x2 FAS30700_6 FAS3070 2082338

If entries with "0xffffffff" exist in /etc/vx/disk.info, the devices are marked for removal but have not yet been removed. A device tree cleanup from the OS and Volume Manager perspective is recommended before any new LUN provisioning on the host.

2.) The number of devices visible in the host device tree differs from the number of devices presented to the host. The Operating System may have fewer LUNs accessible than device nodes visible on the host, because previous LUN provisioning may have left stale devices behind. For example, from the OS format output:

# echo | format | tail -4
Mode sense page(3) reports nsect value as 1165, adjusting it to 911
          /pci@1e,600000/pci@0/pci@9/pci@0,2/pci@1/scsi@2,1/sd@6,0
      17. c3t8d0 <FUJITSU-MAW3073NC-0104 cyl 65533 alt 2 hd 2 sec 1095>
          /pci@1e,600000/pci@0/pci@9/pci@0,2/pci@1/scsi@2,1/sd@8,0
Specify disk (enter its number): Specify disk (enter its number):
# ls /dev/rdsk/ | grep 's2$' | wc -l
      28

As shown above, format reports 18 accessible devices, compared to 28 device nodes under /dev/rdsk. This is a mismatch at the OS level itself, and some of the device nodes may be stale. If this situation exists, verify with the Operating System vendor that the correct number of device nodes is visible on the host before provisioning any new storage.

3.) Any other discrepancy in the device tree may lead to incore corruption of the Volume Manager vxconfigd database. Ensure that the device tree of the Operating System and of any other multipathing software is clean before rescanning the Volume Manager device tree.
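
The checks for symptoms 1 and 2 can be sketched as small shell helpers. This is an illustration under stated assumptions, not a vendor tool: the function names are hypothetical, the field layout of /etc/vx/disk.info is inferred from the sample lines above (third field holds the "0xffffffff" marker), and whole-disk nodes are assumed to end in "s2" as on Solaris.

```shell
# Hypothetical helpers; names and field positions are assumptions based on
# the sample output in this article, not a documented VxVM interface.

# Count disk.info entries whose third field is 0xffffffff
# (devices marked for removal but not yet removed).
count_stale_disk_entries() {
    info_file="${1:-/etc/vx/disk.info}"
    [ -r "$info_file" ] || { echo "cannot read $info_file" >&2; return 2; }
    awk '$3 == "0xffffffff" { n++ } END { print n + 0 }' "$info_file"
}

# Count numbered disk entries (for example "17. c3t8d0 <...>") in
# format output read from standard input.
count_format_disks() {
    awk '$1 ~ /^[0-9]+\.$/ { n++ } END { print n + 0 }'
}

# Count whole-disk (slice 2) device nodes in a device directory.
count_s2_nodes() {
    dev_dir="${1:-/dev/rdsk}"
    ls "$dev_dir" 2>/dev/null | awk '/s2$/ { n++ } END { print n + 0 }'
}
```

On a live host, the output of "echo | format" could be piped into count_format_disks and compared with count_s2_nodes. A non-zero result from count_stale_disk_entries, or a difference between the two device counts, would suggest cleaning up the device tree before presenting new LUNs.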