The memory usage of vxconfigd grows from:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10065 root 18 0 641m 15m 2724 R 82.4 0.4 0:10.22 vxconfigd
10065 root 18 0 643m 17m 2724 R 83.3 0.4 0:12.73 vxconfigd
10065 root 17 0 73016 15m 2724 S 88.6 0.4 0:15.40 vxconfigd
...............
to:
10065 root 25 0 780m 153m 2800 R 86.1 3.9 78:11.60 vxconfigd
10065 root 16 0 790m 153m 2800 S 87.3 3.9 78:14.23 vxconfigd
10065 root 18 0 780m 153m 2800 R 88.9 3.9 78:16.91 vxconfigd
..............
vxdclid processes events slowly, while the instrumented code for idle LUN probing generates a very large number of events, so events pile up in the vxdclid client structure. If event generation stops, vxdclid slowly clears the queued events and the memory is freed. This is not a memory leak: vxconfigd memory grows only because events are processed more slowly than they are generated. In the customer's case, events arrived continuously, which is why memory usage never decreased.
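The backlog behaviour described above can be illustrated with a minimal sketch in C. It is not taken from the vxconfigd or vxdclid sources. The function name simulate_backlog and the event rates used with it are hypothetical and chosen only to show how a queue grows while producers outpace the consumer and drains once production stops.

```c
#include <stddef.h>

/*
 * Hypothetical model of a pending-event queue (not product code).
 * Each simulated second, produced_per_sec events are queued and at most
 * consumed_per_sec events are processed. Returns the backlog left after
 * the given number of seconds.
 */
static size_t simulate_backlog(size_t backlog,
                               size_t produced_per_sec,
                               size_t consumed_per_sec,
                               size_t seconds)
{
    for (size_t t = 0; t < seconds; t++) {
        backlog += produced_per_sec;

        /* The consumer cannot drain more than is currently queued. */
        size_t drained = backlog < consumed_per_sec ? backlog : consumed_per_sec;
        backlog -= drained;
    }
    return backlog;
}
```

With assumed rates of 100 events produced and 40 consumed per second, simulate_backlog(0, 100, 40, 10) leaves 600 events queued. Once production stops, simulate_backlog(600, 0, 40, 15) returns 0, which matches the observation that memory is released only when events stop arriving.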
The dmpevents log shows that the paths are under EMC PowerPath control. The node is therefore never marked idle, yet a node_idle event is generated every second for each idle path under a TPD metanode. There are two possible fixes: either have dmp_set_node_idle() return a status and generate the event only on success, or add a check at the beginning of dmp_check_node_idle() that returns immediately if the path belongs to a TPD metanode. Either change should fix the customer's problem.
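The two proposed changes can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual DMP source: the structure dmp_node_sketch, its fields, the events_posted counter, and the *_sketch functions are all hypothetical stand-ins for the real dmp_set_node_idle() and dmp_check_node_idle(), whose signatures are not shown in this note.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMP path node; the real structure differs. */
struct dmp_node_sketch {
    bool under_tpd_metanode; /* path is owned by a third-party driver, e.g. PowerPath */
    bool io_idle;            /* no recent I/O was seen on this path */
    bool marked_idle;        /* set once the node has been marked idle */
};

/* Hypothetical counter standing in for events delivered to vxdclid. */
static int events_posted;

static void post_node_idle_event_sketch(const struct dmp_node_sketch *node)
{
    (void)node;
    events_posted++;
}

/* Option 1: report whether the node was actually marked idle. */
static bool set_node_idle_sketch(struct dmp_node_sketch *node)
{
    if (node->under_tpd_metanode)
        return false;        /* TPD-controlled paths are never marked idle */
    node->marked_idle = true;
    return true;
}

/* Option 2: return early for TPD metanodes, before any event is posted. */
static void check_node_idle_sketch(struct dmp_node_sketch *node)
{
    if (node == NULL || node->under_tpd_metanode)
        return;
    if (!node->io_idle)
        return;
    /* Post the event only when marking the node idle succeeded. */
    if (set_node_idle_sketch(node))
        post_node_idle_event_sketch(node);
}
```

The sketch applies both options together for illustration. Either one alone is enough to stop node_idle events from being posted for paths under a TPD metanode.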
This is a known cross-platform issue and will be fixed in SF5.1RP2.
Applies To
SF5.1 Solaris 10
Steps to reproduce:
1. Set up vxdclid by running /opt/VRTSsfmh/adm/dclisetup.sh
2. Connect an EMC disk array with PowerPath (any model will do)
3. Create a disk group
4. Wait 7 to 8 hours; vxconfigd memory usage will have grown to an unreasonable amount.
(Use the appropriate platform-specific tool to check vxconfigd memory usage.)