Disks are seen in "error" state even though all paths to the devices are in enabled state

Article ID: 100004284

Description

Error Message

Disks are seen in "error" state in Volume Manager "vxdisk list" output:

# vxdisk -o alldgs list
DEVICE       TYPE            DISK         GROUP        STATUS
ibm_vscsi0_0 auto:LVM        -            -            LVM
ibm_vscsi0_1 auto:LVM        -            -            LVM
ibm_vscsi0_2 auto:LVM        -            -            LVM
ibm_vscsi0_3 auto:LVM        -            -            LVM
san_vc0_0    auto            -            -            error
san_vc0_1    auto            -            -            error
san_vc0_2    auto            -            -            error
san_vc1_0    auto            -            -            error
san_vc1_1    auto            -            -            error
san_vc1_2    auto            -            -            error
san_vc2_0    auto            -            -            error
san_vc2_1    auto            -            -            error
san_vc2_2    auto            -            -            error
san_vc3_0    auto:cdsdisk    sapGODdg03   sapGODdg     online nohotuse
san_vc3_1    auto:cdsdisk    sapGODdg04   sapGODdg     online nohotuse
san_vc3_2    auto:cdsdisk    -            (vxfencoorddg) online
san_vc3_3    auto:none       -            -            online invalid
-            -         sapGODdg01   sapGODdg     failed nohotuse was:san_vc1_0
-            -         sapGODdg02   sapGODdg     failed nohotuse was:san_vc1_1
 

Resolution

Check for device errors and review the configuration. Confirm that the operating system can see the devices and read their OS labels.

# vxdisk -o alldgs list | egrep "DEVICE|san_vc1_0"
DEVICE       TYPE            DISK         GROUP        STATUS
san_vc1_0    auto            -            -            error
-            -         sapGODdg01   sapGODdg     failed nohotuse was:san_vc1_0

As shown above, the device is in "error" state, and the disk that was on it (sapGODdg01 in disk group sapGODdg) is shown in "failed" state in Volume Manager.
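If many devices are affected, the listing can be filtered with a short script. The following is a minimal POSIX sh sketch, assuming the column layout of the "vxdisk -o alldgs list" output shown above (DEVICE first, STATUS last); list_error_devices and list_failed_disks are hypothetical helper names, not VxVM commands. Both read the listing on standard input.

```shell
# Hypothetical helpers: parse "vxdisk -o alldgs list" output read from stdin.
# Assumes the column layout shown in this article.

# Print the DEVICE name of every row whose last (STATUS) field is "error".
list_error_devices() {
    awk '$NF == "error" { print $1 }'
}

# For every disk media record in "failed" state (DEVICE column is "-"),
# print "DISK GROUP former-device", taking the former device from the
# trailing "was:<device>" field.
list_failed_disks() {
    awk '$1 == "-" {
        failed = 0; was = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "failed") failed = 1
            if ($i ~ /^was:/) was = substr($i, 5)
        }
        if (failed) print $3, $4, was
    }'
}

# Usage:
#   vxdisk -o alldgs list | list_error_devices
#   vxdisk -o alldgs list | list_failed_disks
```

For the listing in this article, list_error_devices would print san_vc0_0 through san_vc2_2, and list_failed_disks would report sapGODdg01 and sapGODdg02 with their former devices san_vc1_0 and san_vc1_1.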

# vxdisk list san_vc1_0
Device:    san_vc1_0
devicetag: san_vc1_0
type:      auto
flags:     error private autoconfig
pubpaths:  block=/dev/vx/dmp/san_vc1_0 char=/dev/vx/rdmp/san_vc1_0
guid:      {945cc878-1dd1-11b2-8e63-0a297c632e49}
udid:      IBM%5F2145%5F020063a08b20XX01%5F60050768018E822C80000000000001C9
site:      -
errno:     No such file or directory    <<<====== Note this errno
Multipathing information:
numpaths:   4
hdisk12 state=enabled
hdisk31 state=enabled
hdisk36 state=enabled
hdisk47 state=enabled

# lsdev -C | egrep "hdisk12|hdisk31|hdisk36|hdisk47"
hdisk12        Available 00-08-02 FC 2145
hdisk31        Available 00-08-02 FC 2145
hdisk36        Available 01-08-02 FC 2145
hdisk47        Available 01-08-02 FC 2145
 

As we can see, all paths to the device are in enabled state in Volume Manager and in Available state in the operating system. The clue for where to look next is the errno, which reports "No such file or directory".
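The same check can be repeated for each affected device. The sketch below (POSIX sh and awk) summarises "vxdisk list <device>" output read from standard input, printing the errno text and how many paths are enabled. summarise_vxdisk_list is a hypothetical helper, and it assumes the field names (errno:, numpaths:, state=enabled) seen in the output above.

```shell
# Hypothetical helper: summarise "vxdisk list <device>" output read from stdin.
# Prints the errno text ("none" if absent) and the count of enabled paths
# out of the numpaths value. Field names are assumed from this article.
summarise_vxdisk_list() {
    awk '
        /^errno:/ {
            sub(/^errno:[ \t]*/, "")   # drop the "errno:" label
            sub(/[ \t]+$/, "")         # drop trailing blanks
            errno = $0
        }
        /^numpaths:/     { total = $2 }
        / state=enabled/ { enabled++ }
        END {
            printf "errno: %s\n", (errno == "" ? "none" : errno)
            printf "paths enabled: %d of %d\n", enabled, total
        }'
}

# Usage:
#   vxdisk list san_vc1_0 | summarise_vxdisk_list
```

A device that reports an errno while all of its paths are enabled, as san_vc1_0 does here, points away from path failures and towards the device node itself.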

So what file or directory does this refer to? A likely candidate is the dmpnode that vxconfigd creates for each device in /dev/vx/dmp (block device) and /dev/vx/rdmp (character device).

Let's check the files in the following locations on the system:

# ls -l /dev/vx/rdmp
total 8
drwxr-xr-x    2 root     system          256 Dec 11 07:11 .
drwxr-xr-x    6 root     system         4096 Oct 17 14:58 ..
crw-------    1 root     system       40,  2 Dec 11 07:11 ibm_vscsi0_0
crw-------    1 root     system       40,  1 Dec 11 07:11 ibm_vscsi0_1
crw-------    1 root     system       40,  3 Dec 11 07:11 ibm_vscsi0_2
crw-------    1 root     system       40,  4 Dec 11 07:11 ibm_vscsi0_3
crw-------    1 root     system       40,  7 Dec 11 07:11 san_vc3_0
crw-------    1 root     system       40,  6 Dec 11 07:11 san_vc3_1
crw-------    1 root     system       40,  5 Dec 11 07:11 san_vc3_2
crw-------    1 root     system       40,  8 Dec 11 07:11 san_vc3_3

# ls -l /dev/vx/dmp
total 8
drwxr-xr-x    2 root     system          256 Dec 11 07:11 .
drwxr-xr-x    6 root     system         4096 Oct 17 14:58 ..
brw-------    1 root     system       40,  2 Dec 11 07:11 ibm_vscsi0_0
brw-------    1 root     system       40,  1 Dec 11 07:11 ibm_vscsi0_1
brw-------    1 root     system       40,  3 Dec 11 07:11 ibm_vscsi0_2
brw-------    1 root     system       40,  4 Dec 11 07:11 ibm_vscsi0_3
brw-------    1 root     system       40,  7 Dec 11 07:11 san_vc3_0
brw-------    1 root     system       40,  6 Dec 11 07:11 san_vc3_1
brw-------    1 root     system       40,  5 Dec 11 07:11 san_vc3_2
brw-------    1 root     system       40,  8 Dec 11 07:11 san_vc3_3
 

The dmpnodes for san_vc1_0 do not exist in /dev/vx/[r]dmp. In fact, every device in "error" state (san_vc0_*, san_vc1_* and san_vc2_*) is missing its dmpnodes; only the ibm_vscsi0_* and san_vc3_* devices have them.
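A quick way to confirm which dmpnodes are missing is to test for each device name under both directories. This is a minimal POSIX sh sketch: check_dmpnodes is a hypothetical helper, its optional base-directory argument (default /dev/vx) is an assumption added so the check can be pointed at another directory, and it only tests that a file exists, not that it is a valid character or block device.

```shell
# Hypothetical helper: read DMP device names from stdin and report any that
# have no entry in <base>/rdmp or <base>/dmp (base defaults to /dev/vx).
# Only existence is checked. Returns 1 if anything is missing, else 0.
check_dmpnodes() {
    base=${1:-/dev/vx}
    missing=0
    while read -r dev; do
        [ -n "$dev" ] || continue          # skip blank lines
        for sub in rdmp dmp; do
            if [ ! -e "$base/$sub/$dev" ]; then
                echo "missing: $base/$sub/$dev"
                missing=1
            fi
        done
    done
    return "$missing"
}

# Usage (check every device that vxdisk reports in "error" state):
#   vxdisk -o alldgs list | awk '$NF == "error" { print $1 }' | check_dmpnodes
```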

The first thing to try is "vxdctl enable", which rescans the devices and rebuilds the missing dmpnodes.

If "vxdctl enable" does not resolve the problem, something else is preventing the dmpnodes from being created as character and block device files in /dev/vx/rdmp and /dev/vx/dmp.

To troubleshoot this, either restart vxconfigd in debug mode or enable debug mode on the running daemon, and then run "vxdctl enable".

In this example, debug level 9 is enabled on the running vxconfigd, with output logged to /var/tmp/vxconfigd.log:

# vxdctl debug 9 /var/tmp/vxconfigd.log

# vxdctl enable

Searching vxconfigd.log for the device name (for example, with grep) shows the following messages (excerpt only):

12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-20300 assign_disk_local_name: Assign name san_vc1_0 with flag 0x2 to disk with devno 0x280011
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-21656 ddl_set_alias_property: Associating alias hdisk12 of type 0 with DMP device san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-14467 Disk is /dev/rhdisk12, DMP node is san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-21656 ddl_set_alias_property: Associating alias hdisk31 of type 0 with DMP device san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-14467 Disk is /dev/rhdisk31, DMP node is san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-21656 ddl_set_alias_property: Associating alias hdisk36 of type 0 with DMP device san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-14467 Disk is /dev/rhdisk36, DMP node is san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-21656 ddl_set_alias_property: Associating alias hdisk47 of type 0 with DMP device san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-14467 Disk is /dev/rhdisk47, DMP node is san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-15012 dmp_make_mpnode(thread 1078):  devno 0x280011 device tag = san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-15020 dmp_make_mpnode:raw pathname = /dev/vx/rdmp//san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-15019 dmp_make_mpnode:block pathname = /dev/vx/dmp//san_vc1_0
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-0 mknod: Cannot make node /dev/vx/rdmp//san_vc1_0: No space left on device
12/11 09:30:03:  VxVM vxconfigd DEBUG V-5-1-0 mknod: Cannot make node /dev/vx/dmp//san_vc1_0:No space left on device

The debug log shows the cause clearly: mknod fails with "Cannot make node ...: No space left on device".
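On a busy system the debug log can be long. The sketch below extracts only the mknod failures, assuming the message format shown in the excerpt above ("mknod: Cannot make node <path>: <reason>"); mknod_failures is a hypothetical helper built on sed.

```shell
# Hypothetical helper: read vxconfigd debug log lines from stdin and print
# "<node path> -> <reason>" for each "mknod: Cannot make node" message.
# The message format is taken from the log excerpt in this article; the
# space after the colon is optional because it is missing on some lines.
mknod_failures() {
    sed -n 's/.*mknod: Cannot make node \([^:]*\): *\(.*\)$/\1 -> \2/p'
}

# Usage:
#   grep san_vc1_0 /var/tmp/vxconfigd.log | mknod_failures
```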

Next, check space utilization on the root (/) filesystem, which is where the /dev directory resides.

# df -k
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           524288         0  100%     9916    88% /
 

As shown, the root filesystem is 100% used with 0 free blocks (inode usage is 88%, so it is the data blocks, not the inodes, that are exhausted). This explains why dmpnodes could not be created for some devices while other devices still have their existing dmpnodes. In this situation some or all devices can show "error" state in Volume Manager even though the underlying DMP paths for those devices are enabled. This is not a Volume Manager issue but an underlying OS condition that prevents Volume Manager from creating the required dmpnodes.
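The root filesystem check can be scripted as follows. This is a minimal sketch, assuming the AIX "df -k" column layout shown above (Free in the third column, %Iused in the sixth, mount point in the seventh); other platforms order the columns differently. check_fs_space is a hypothetical helper: it prints the entry for one mount point and returns 1 when that filesystem has no free 1K blocks left.

```shell
# Hypothetical helper: read AIX-style "df -k" output from stdin and report
# the entry for one mount point (default "/"). Assumed columns:
#   Filesystem 1024-blocks Free %Used Iused %Iused Mounted-on
# Returns 1 if that filesystem has 0 free blocks, else 0.
check_fs_space() {
    mnt=${1:-/}
    awk -v mnt="$mnt" '
        $7 == mnt {
            found = 1
            printf "%s on %s: free=%s KB, used=%s, inodes used=%s\n", $1, $7, $3, $4, $6
            if ($3 + 0 == 0) full = 1
        }
        END { if (found && full) exit 1; exit 0 }'
}

# Usage:
#   df -k / | check_fs_space /
```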

If your issue matches the scenario described above, resolve it with the following steps:

1.) Free up sufficient space in the root (/) filesystem so that the dmpnodes can be created in /dev/vx/[r]dmp. Running the operating system with the root filesystem 100% full is not advisable in any case.

2.) Run "vxdctl enable".

3.) If step 2 does not resolve the problem and devices are still in "error" state, restart the vxconfigd daemon. (If the system is part of a VCS cluster, freeze the system before running "vxconfigd -k".)

# vxconfigd -k

4.) If restarting vxconfigd does not resolve the issue either, troubleshoot further, as more than one issue may be contributing to the problem. Contact Technical Support if the above steps do not resolve it.

 

 

Issue/Introduction

Disks are in "error" state in Volume Manager even though all paths to the devices are in enabled state, which prevents the disk group from being imported. If the disk group is managed by VCS, the service group containing the DiskGroup resource cannot bring that resource online.