Veritas Volume Manager (VxVM) 5.1 SP1 RP2 (Solaris Sparc) addresses a VxVM defect surrounding a memory leak in relation to Third Party Dynamic-Multipathing (TPD) controlled devices


Article ID: 100005945


Description

Error Message



How to validate and confirm the memory leak in the vxconfigd process on Solaris has been addressed

 

This procedure uses the "libumem" feature of the Solaris Operating System. The feature was first made available in Solaris 9 Update 3, so this approach cannot be used on a Solaris server running an earlier release.

libumem is a library, developed by Sun, that is used to track memory allocations of userland processes and as such, can be used to help identify memory leaks.

The process requires that the VxVM daemon vxconfigd be stopped and restarted in order for the libumem feature to be used.

Ensure that any applications that make use of vxconfigd will not be disrupted whilst the below procedure is carried out.

For example, it is recommended to freeze all related "Service Groups" if Veritas Cluster Server (VCS) is in use.
Check that vxconfigd is not carrying out any tasks (with vxtask list) prior to implementing this procedure.

1.]   Stop the VxVM daemon "vxconfigd".
 

# vxdctl stop


# vxdctl mode
mode: not-running


2.] Make a copy of the current vxconfigd binary, and rename it as shown below.
 

# cd /sbin

# ls -al vxconfigd*
-r-xr-xr-x   1 root     sys      5594248 Apr  5 20:05 vxconfigd
-r-xr-xr-x   1 root     sys      5808760 Jan 14 06:10 vxconfigd.orig
-r-xr-xr-x   1 root     sys      5594248 Apr  5 20:05 vxconfigd.SunOS_5.10
-r-xr-xr-x   1 root     sys      7246072 Apr  5 20:28 vxconfigd.SunOS_5.9


# cp vxconfigd vxconfigd.version     # please replace "version" with the current VxVM version number

# mv  vxconfigd vxconfigd.orig

 
3.] Create the libumem wrapper script.
 

# vi vxconfigd

- Add the following line entry to the file:

UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1  /sbin/vxconfigd.orig $*


- Write and quit the file.
 

# cat vxconfigd
UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1  /sbin/vxconfigd.orig $*


# chmod 555 vxconfigd
# chown root:sys vxconfigd
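
If an interactive editor is inconvenient, the same one-line wrapper can be written with a here-document. The following is only a minimal sketch: WRAPPER_DIR is a hypothetical variable that defaults to a scratch directory, so the result can be inspected before anything under /sbin is touched. The file content is the same line shown in step 3.

```shell
#!/bin/sh
# Sketch: write the one-line libumem wrapper without an interactive editor.
# WRAPPER_DIR is a hypothetical variable (not part of VxVM); it defaults to
# a scratch directory so the sketch can be tried safely.
WRAPPER_DIR=${WRAPPER_DIR:-/tmp/vxconfigd-wrapper-demo}
mkdir -p "$WRAPPER_DIR"

# Remove any earlier copy first: the file is made read-only below.
rm -f "$WRAPPER_DIR/vxconfigd"

# The quoted delimiter ('EOF') stops the shell from expanding $* while the
# file is being written, so the wrapper receives it literally.
cat > "$WRAPPER_DIR/vxconfigd" <<'EOF'
UMEM_DEBUG=default UMEM_LOGGING=transaction LD_PRELOAD=libumem.so.1  /sbin/vxconfigd.orig $*
EOF

# Same permissions as in step 3 (ownership is left alone in the scratch copy).
chmod 555 "$WRAPPER_DIR/vxconfigd"
cat "$WRAPPER_DIR/vxconfigd"
```

Once the scratch copy looks correct, the same commands can be repeated as root with WRAPPER_DIR=/sbin, followed by the chown shown above.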



4.] Restart the VxVM daemon using the vxconfigd wrapper script, which calls the renamed vxconfigd.orig binary.

 

# vxdctl stop

# vxdctl mode
mode: not-running

# /sbin/vxconfigd -x syslog >/dev/null 2>&1

# vxdctl mode

mode: enabled
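
When this step is scripted rather than typed, the mode check can be reduced to a single word for comparison. The helper below is a small sketch under that assumption; mode_word is a hypothetical name, and it relies only on the "mode: ..." output format shown in this procedure.

```shell
#!/bin/sh
# Sketch: reduce "vxdctl mode" output to the bare mode word.
# mode_word is a hypothetical helper name, not a VxVM command.
mode_word() {
    # Reads text such as "mode: enabled" on stdin and prints "enabled".
    sed -n 's/^mode:[ ]*//p'
}

# Demonstration on the two outputs shown in this procedure.
printf 'mode: not-running\n' | mode_word
printf 'mode: enabled\n' | mode_word
```

On the server itself the helper would be fed from the real command, for example: [ "`vxdctl mode | mode_word`" = "enabled" ] || echo "vxconfigd is not enabled"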

 

5.] Confirm the new vxconfigd process is running with the *.orig extension.


# ps -ef | grep vxconfigd

    root  7601  5489   0 15:03:17 pts/1       0:00 grep vxconfigd
    root  7305     1   0 14:58:51 ?           0:04 /sbin/vxconfigd.orig
 

# pgrep vxconfigd
7305
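
It can also be worth confirming that libumem was actually preloaded into the running process, since the findleaks step depends on it. One way to do this, offered as a suggestion and not as part of the original procedure, is to list the process's loaded libraries with the Solaris pldd command and look for libumem. The helper name uses_libumem and the sample listing are hypothetical.

```shell
#!/bin/sh
# Sketch: decide from a library listing whether libumem is loaded.
# uses_libumem is a hypothetical helper name.
uses_libumem() {
    # Succeeds (exit status 0) if any line on stdin names libumem.so.
    grep 'libumem\.so' >/dev/null
}

# Hypothetical sample listing, used only to exercise the helper.
if printf '%s\n' '/lib/libc.so.1' '/lib/libumem.so.1' | uses_libumem; then
    echo "libumem loaded"
else
    echo "libumem NOT loaded"
fi
```

On the server, the listing would come from the live process, for example: pldd `pgrep vxconfigd` | uses_libumem && echo "libumem loaded"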

 
6.] Reproduce the problem and confirm that the memory leak has occurred before continuing (the larger the leak, the better).

The Solaris command 'pmap' can be used to monitor memory usage for the given process, as shown below:

 

# pmap -x `pgrep vxconfigd` 

 
 
7.] Force the process to drop a core, specifying the PID identified with the pgrep command above:


Example:

# gcore 7305


8.] Capture findleaks information from the core:
 
 
# echo "::findleaks -dv" | mdb core.7305
 


Along with the output of the above command, send the core file, the vxconfigd binary, and the shared libraries used by vxconfigd to Symantec Support for analysis.

 


What should not happen



Create a wrapper script to help trigger the potential memory leak.
 

# more loop.sh
while true
do
    vxdmpadm exclude vxvm dmpnodename=emcpower26s2
    date
    pmap -x `pgrep vxconfigd` | egrep '(heap|total)'
    vxdmpadm include vxvm dmpnodename=emcpower26s2
done


In this instance, the wrapper script "loop.sh" excludes and then includes the same EMC TPD device "emcpower26s2" in a loop.

The date, time and related pmap output is captured to see if the memory usage is increasing.

 


# ./loop.sh
Wednesday,  6 April 2011 15:07:08 BST
003DC000   15280   15280   15272       - rwx--    [ heap ]
012C8000      72      72      72       - rwx--    [ heap ]
012DA000    4736    3432    3432       - rwx--    [ heap ]
0177A000      72      72      72       - rwx--    [ heap ]
0178C000      72      72      72       - rwx--    [ heap ]
0179E000     144     144     144       - rwx--    [ heap ]
017C2000     144     144     144       - rwx--    [ heap ]
017E6000      72      72      72       - rwx--    [ heap ]
total Kb   30432   28880   21168       -
Wednesday,  6 April 2011 15:07:22 BST
003DC000   15280   15280   15272       - rwx--    [ heap ]
012C8000      72      72      72       - rwx--    [ heap ]
012DA000    4736    3432    3432       - rwx--    [ heap ]
0177A000      72      72      72       - rwx--    [ heap ]
0178C000      72      72      72       - rwx--    [ heap ]
0179E000     144     144     144       - rwx--    [ heap ]
017C2000     144     144     144       - rwx--    [ heap ]
017E6000    2880    1752    1752       - rwx--    [ heap ]
01AB6000     144     144     144       - rwx--    [ heap ]
01ADA000     144     144     144       - rwx--    [ heap ]
01AFE000     144     136     136       - rwx--    [ heap ]
total Kb   33656   30968   23256       -
Wednesday,  6 April 2011 15:07:36 BST
pmap: cannot examine 7305: address space is changing
Wednesday,  6 April 2011 15:07:49 BST
003DC000   15280   15280   15272       - rwx--    [ heap ]
012C8000      72      72      72       - rwx--    [ heap ]
012DA000    4736    3432    3432       - rwx--    [ heap ]
0177A000      72      72      72       - rwx--    [ heap ]
0178C000      72      72      72       - rwx--    [ heap ]
0179E000     144     144     144       - rwx--    [ heap ]
017C2000     144     144     144       - rwx--    [ heap ]
017E6000    2880    1752    1752       - rwx--    [ heap ]
01AB6000     144     144     144       - rwx--    [ heap ]
01ADA000     144     144     144       - rwx--    [ heap ]
01AFE000    3048    1968    1968       - rwx--    [ heap ]
01DF8000      72      72      72       - rwx--    [ heap ]
01E0A000      72      72      72       - rwx--    [ heap ]
01E1C000     288     288     288       - rwx--    [ heap ]
01E64000      72      72      72       - rwx--    [ heap ]
01E76000    2808    1696    1696       - rwx--    [ heap ]
02134000      72      72      72       - rwx--    [ heap ]
02146000      72      72      72       - rwx--    [ heap ]
02158000     216     216     216       - rwx--    [ heap ]
0218E000     144     144     144       - rwx--    [ heap ]
total Kb   40344   35472   27760       -
Wednesday,  6 April 2011 15:08:03 BST
003DC000   15280   15280   15272       - rwx--    [ heap ]
012C8000      72      72      72       - rwx--    [ heap ]
012DA000    4736    3432    3432       - rwx--    [ heap ]
0177A000      72      72      72       - rwx--    [ heap ]
0178C000      72      72      72       - rwx--    [ heap ]
0179E000     144     144     144       - rwx--    [ heap ]
017C2000     144     144     144       - rwx--    [ heap ]
017E6000    2880    1752    1752       - rwx--    [ heap ]
01AB6000     144     144     144       - rwx--    [ heap ]
01ADA000     144     144     144       - rwx--    [ heap ]
01AFE000    3048    1968    1968       - rwx--    [ heap ]
01DF8000      72      72      72       - rwx--    [ heap ]
01E0A000      72      72      72       - rwx--    [ heap ]
01E1C000     288     288     288       - rwx--    [ heap ]
01E64000      72      72      72       - rwx--    [ heap ]
01E76000    2808    1696    1696       - rwx--    [ heap ]
02134000      72      72      72       - rwx--    [ heap ]
02146000      72      72      72       - rwx--    [ heap ]
02158000     216     216     216       - rwx--    [ heap ]
0218E000    3456    2360    2360       - rwx--    [ heap ]
total Kb   43688   37720   30008       -
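
Reading the growth out of a long capture by eye is error-prone, so the "total Kb" lines can be summarised with awk. The helper below is a minimal sketch; total_growth is a hypothetical name, and the four sample lines are copied from the loop.sh run shown above. The third field of each "total Kb" line is the total mapped size in Kbytes.

```shell
#!/bin/sh
# Sketch: summarise growth from the "total Kb" lines that loop.sh prints.
# total_growth is a hypothetical helper name.
total_growth() {
    # Remember the first and last total (field 3, in Kbytes) and report
    # the difference between them.
    awk '$1 == "total" && $2 == "Kb" {
             if (n == 0) first = $3
             last = $3
             n++
         }
         END {
             if (n > 0)
                 printf "samples=%d first=%d last=%d growth=%d Kb\n", n, first, last, last - first
         }'
}

# The four totals below are copied from the loop.sh run shown above.
total_growth <<'EOF'
total Kb   30432   28880   21168       -
total Kb   33656   30968   23256       -
total Kb   40344   35472   27760       -
total Kb   43688   37720   30008       -
EOF
# prints: samples=4 first=30432 last=43688 growth=13256 Kb
```

For a real run, the loop output would first be saved (for example, ./loop.sh > loop.out, interrupted after enough iterations) and then passed in with total_growth < loop.out.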



 

Cause


S.D.R.F ( Symptom / Description / Resolution / Feature ) Content:


Incident no::2346469    Tracking ID ::2346470

Symptom:

Executing "vxdmpadm exclude vxvm dmpnodename=" can trigger a memory leak in vxconfigd's heap segment. vxconfigd's heap segment continues to grow while executing "vxdmpadm exclude vxvm dmpnodename=" and "vxdmpadm include vxvm dmpnodename=" in a loop. vxconfigd's heap does not shrink even after terminating the loop.


Description:

"vxdmpadm exclude vxvm dmpnodename=" is used to remove a device from VxVM's control.

Similarly, "vxdmpadm include vxvm dmpnodename=" is used to place a device under VxVM's control.

1) While placing a device under VxVM's control, vxconfigd allocates a chunk of memory to store VxVM-specific metadata for the device.
     Because of a defect, this chunk of memory is not freed when the device is removed from VxVM's control using "vxdmpadm exclude".

2) While removing a device from VxVM's control, vxconfigd allocates a chunk of memory as a scratchpad to store details of the device being excluded.
     This chunk of memory is used as temporary storage during the execution of "vxdmpadm exclude".
     Because of a defect, this scratchpad memory is not freed at the end of execution of "vxdmpadm exclude".


Resolution:

Both of the defects described in the Description section have been fixed as follows.

The chunk of memory allocated during the "vxdmpadm include" operation to store VxVM-specific metadata for the device is now freed during the "vxdmpadm exclude" operation.

The scratchpad memory allocated during "vxdmpadm exclude" to store details of the device being excluded from VxVM's control is now freed at the end of the "vxdmpadm exclude" operation.



 

Resolution

The product defect is resolved with 5.1 SP1 RP2 (Solaris Sparc).

The patch is available from the Veritas Services and Operations Readiness Tools (SORT) website.

https://sort.Veritas.com/patch/matrix


Applies To

 

In order for the fix to work, 5.1 SP1 RP2 (Solaris Sparc) must be applied.


 

 

Issue/Introduction

This document explains an issue discovered in Veritas Volume Manager (VxVM) 5.1 SP1, which is addressed by applying VxVM 5.1 SP1 RP2 (Solaris Sparc).

A series of VxVM related memory leaks were identified when suppressing Third Party Dynamic-Multipathing (TPD) controlled devices (e.g. EMC PowerPath TPD, Solaris MPxIO) in a loop with 5.1 SP1 (Solaris Sparc).

With 5.1 SP1 (Solaris Sparc), the vxdmpadm CLI operation fails to suppress the specified TPD Veritas disk access name(s).

With VxVM 5.1 SP1 RP1 P1 HF5 (Solaris Sparc), the user can suppress the specified EMC TPD controlled device(s) from VxVM's view.


Background content:

The VxVM instantaneous device suppression feature was introduced in Veritas Volume Manager (VxVM) 5.0 MP3.

Note: From 5.0 MP3 onwards, it is no longer necessary to reboot the host in order to exclude/suppress the specified Veritas disk access (da) names from the "vxdisk list" output.

Upon device exclusion, the specified Veritas disk access (da) name is dynamically removed from the VxVM CLI "vxdisk list" device listing.


The device suppression functionality introduced in Veritas Volume Manager 5.0 MP3 does not behave in the same way as in the 5.1 release until 5.0 MP3 RP4 HF1 (Solaris) has been applied.
The Veritas Volume Manager 5.1 release works as designed, and does not suffer from the product defect encountered in 5.0 MP3 in relation to DMP controlled devices.
  See Veritas Article: 000007823 for more details in relation to the 5.0 MP3 related defect.

Prior to the installation of the Veritas Volume Manager (VxVM) 5.0 MP3 RP4 HF1 patch, a product design oversight required the user to disable each path related to the Veritas disk access (da) name.

Veritas Volume Manager (VxVM) 5.0 MP3 RP4 HF1 (Solaris) enables the user to suppress and include a LUN for DMP controlled devices using the vxdmpadm CLI command "vxdmpadm [ include | exclude ] vxvm dmpnodename=<da-name>".

Additional Information

ETrack: 2346470