VVR RVG role migration returned error 448 due to socket communication issue between master/slave nodes

Article ID: 100049519

Description

Error Message

2020-12-23 03:45:34 hostDR03 VxVM VVR Notice V-5-20-6505 CmdMgr TAG_D Attempting to execute command: "/usr/sbin/vradmin -g P8CMSDG -s migrate test_RVG", locale = C

2020-12-23 03:45:34 P8CMSDR03 VxVM VVR Notice V-5-20-6589 CmdMgr TAG_D Command returned ERROR code 448, errTokens = 10.10.14.12

 

2020-12-23 02:20:30 VxVM VVR Warning V-5-20-0 TAG_C IpmHandle::open: select error(select returned numactfds = 0, errno: Error 0, setting  errno to ETIMEDOUT)

2020-12-23 02:20:30 hostDCPR01 VxVM VVR Warning V-5-20-6557 IpmHandle TAG_C Cannot connect to hostDCPR01 on port 8199

The following error appears in the output of the vradmin repstatus command:

Config Errors:
10.10.14.12: vradmind not reachable on cluster peer

Cause

At the time of migration, the secondary site master vradmind (hostDCPR01) was unable to communicate with the slave vradmind (hostDCPR02) because of a socket communication failure on vradmind port 8199.

 

Analysis of the tcpdump output points to a probable cause of the failback migration issue: an entry in /etc/hosts on the primary site master node that lists the node's own hostname on the loopback line.

The relevant line of the /etc/hosts file from the customer environment on node P8CMSPR01 is:

127.0.0.1 loopback localhost localhost.localdomain        hostDCPR01  # loopback (lo0) name/address

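The hostname (hostDCPR01) listed on the loopback line of the /etc/hosts file is causing the issue. With this entry in place, the name hostDCPR01 can resolve to 127.0.0.1 instead of the node's network address.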
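As an illustration only (this is not a Veritas tool), the check described above can be sketched in Python: scan the hosts file text and report any name on a loopback line that is not one of the usual loopback names. The function name loopback_aliases and the set of names treated as standard are assumptions made for this sketch; adjust the set to match the platform's defaults.

```python
# Names normally expected on a loopback line (assumed set; extend as needed).
LOOPBACK_NAMES = {"loopback", "localhost", "localhost.localdomain"}


def loopback_aliases(hosts_text):
    """Return names mapped to a loopback address that are not standard loopback names."""
    suspects = []
    for raw in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if address.startswith("127.") or address == "::1":
            suspects.extend(n for n in names if n.lower() not in LOOPBACK_NAMES)
    return suspects


# The loopback line quoted in this article.
sample = (
    "127.0.0.1 loopback localhost localhost.localdomain        "
    "hostDCPR01  # loopback (lo0) name/address"
)
print(loopback_aliases(sample))  # ['hostDCPR01']
```

To check a live system, pass the contents of /etc/hosts to the function, for example loopback_aliases(open("/etc/hosts").read()). Any returned name is a candidate for removal from the loopback line.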

Resolution

Remove the hostname from the loopback entry in /etc/hosts.

Before:

127.0.0.1                            loopback localhost          localhost.localdomain    hostDCPR01            # loopback (lo0) name/address

After:

127.0.0.1                            loopback localhost          localhost.localdomain    # loopback (lo0) name/address

 

After this change, the cluster peer nodes could connect on port 8199 and the configuration error was resolved. The tcpdump output confirms this.
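Connectivity between the peer nodes on the vradmind port can also be spot-checked with a plain TCP connection attempt. The following is a minimal sketch, not a Veritas utility: the helper name can_connect and the 5-second default timeout are assumptions for illustration, and a successful connection only shows that something is listening on the port, not that vradmind is healthy.

```python
import socket


def can_connect(host, port=8199, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example (hypothetical invocation, run from the master node):
#   can_connect("hostDCPR02")  -> True once the peers can reach each other
```

Running the check from each node against its peer, both before and after editing /etc/hosts, gives a quick confirmation that is independent of the vradmin repstatus output.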

Veritas has concluded that this is not a product bug but a configuration issue.

Issue/Introduction

While processing the RVG role migration, the secondary site returned error 448 because of a socket communication issue between the master and slave nodes of the secondary site.