Slave node fails to join the cluster

Article ID: 100002483

Description

Error Message

VxVM vxconfigd ERROR V-5-1-11092 cleanup_client: (Format error in disk private region) ###
VxVM vxconfigd ERROR V-5-1-11467 kernel_fail_join() : Reconfiguration interrupted: Reason is retry to add a node failed (##, #)
 

Resolution

A slave node fails to join the cluster even though all the disks are visible, and the following errors are logged:

VxVM vxconfigd NOTICE V-5-1-7899 CVM_VOLD_CHANGE command received
VxVM vxconfigd ERROR V-5-1-11092 cleanup_client: (Format error in disk private region) ###
VxVM vxconfigd ERROR V-5-1-11467 kernel_fail_join() : Reconfiguration interrupted: Reason is retry to add a node failed (##, #)
VxVM vxconfigd NOTICE V-5-1-7901 CVM_VOLD_STOP command received
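To confirm that a node's log shows this particular failure, you can look for both message IDs quoted above. The sketch below is a hypothetical helper (check_join_failure is not a VxVM tool). It assumes you pass the system log of the node that failed to join; the path depends on the platform (for example /var/adm/messages on Solaris, whose device naming this article uses). It only matches the IDs V-5-1-11092 and V-5-1-11467.

```shell
#!/bin/sh
# Hypothetical helper, not part of VxVM: report whether a log file contains
# both vxconfigd message IDs seen when a slave node fails to join.
# Usage: check_join_failure /path/to/syslog
check_join_failure() {
    log=$1
    # V-5-1-11092: cleanup_client format error; V-5-1-11467: kernel_fail_join
    if grep -q 'V-5-1-11092' "$log" && grep -q 'V-5-1-11467' "$log"; then
        echo "join failure signature found in $log"
        return 0
    fi
    echo "join failure signature not found in $log"
    return 1
}
```

A match only indicates the symptom; the cause still has to be confirmed on the master node as described below.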


CAUSE:

This issue can occur if any disks are in a failed state on the master node, which could have resulted from a transient SAN issue.

On the master node:
# vxdisk -o alldgs list
DEVICE       TYPE           DISK         GROUP        STATUS
c1t1d0s2     auto:sliced    rootdisk     rootdg       online
.
c3t4d0s2     auto:cdsdisk   appdg01      appdg        online shared
c3t5d0s2     auto:cdsdisk   appdg02      appdg        online shared
c3t6d0s2     auto:cdsdisk   appdg03      appdg        online shared
c3t7d0s2     auto:cdsdisk   -            (appdg)      online shared
c3t8d0s2     auto:cdsdisk   -            (appdg)      online shared
c3t9d0s2     auto:cdsdisk   appdg06      appdg        online shared
-            -              appdg04      appdg        failed was:c3t7d0s2
-            -              appdg05      appdg        failed was:c3t8d0s2
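In output like the listing above, the failed disks are the entries whose STATUS is "failed", and the "was:" field names the device they used to occupy. The filter below is a minimal sketch, assuming the column layout shown in this example (DEVICE, TYPE, DISK, GROUP, STATUS); list_failed_disks is a hypothetical name, and the layout may differ between VxVM releases.

```shell
#!/bin/sh
# Hypothetical helper, not a VxVM command: read `vxdisk -o alldgs list`
# output on stdin and print "disk-media-name disk-group former-device"
# for every entry whose STATUS column is "failed".
# Assumes the column layout shown in this article.
list_failed_disks() {
    awk '$5 == "failed" {
        dev = $6
        sub(/^was:/, "", dev)   # strip the "was:" prefix to get the device
        print $3, $4, dev
    }'
}
```

Usage on the master node: `vxdisk -o alldgs list | list_failed_disks`. For the listing above it would report appdg04 (formerly c3t7d0s2) and appdg05 (formerly c3t8d0s2) in appdg.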


SOLUTION:

1. Fix the failed disks on the master node.

i.) Ensure that the disks are visible and online:
# vxdisk list

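ii.) Check whether each failed disk can be reattached, then reattach it. Here $DA is the disk access (device) name, such as c3t7d0s2. The -c option only reports the disk group and disk media name the disk would be reattached to; -b runs the reattach in the background and -r recovers stale plexes of the volumes on that disk:
# /etc/vx/bin/vxreattach -c $DA
# /etc/vx/bin/vxreattach -br $DA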

iii.) Check the disk group and fix any volume issues:
# vxprint -htg $Diskgroup

iv.) Flush the disk group to ensure that its on-disk structures are current:
# vxdg flush $Diskgroup

2. Then start CVM on the slave node, where $SYS is the slave's system name:
# hagrp -online cvm -sys $SYS
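The master-side part of the procedure above (steps 1.ii to 1.iv) can be drafted from the disk listing. The sketch below is hypothetical (plan_repair is not a Veritas tool): it reads `vxdisk -o alldgs list` output on stdin and only prints the commands, one reattach pair per failed disk and one vxprint/vxdg flush pair per affected disk group, so they can be reviewed before being run by hand. It assumes the column layout shown in this article and passes the device name from the "was:" field to vxreattach, on the assumption that vxreattach expects a disk access name.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the repair commands for each
# failed disk found in `vxdisk -o alldgs list` output read from stdin.
plan_repair() {
    awk '$5 == "failed" {
        da = $6
        sub(/^was:/, "", da)    # former device (disk access) name
        print $4, da            # disk group, device
    }' | {
        groups=""
        while read -r dg da; do
            echo "/etc/vx/bin/vxreattach -c $da"
            echo "/etc/vx/bin/vxreattach -br $da"
            # Remember each disk group once, in order of first appearance
            case " $groups " in
                *" $dg "*) ;;
                *) groups="$groups $dg" ;;
            esac
        done
        # Unquoted on purpose: split the space-separated group list
        for dg in $groups; do
            echo "vxprint -htg $dg"
            echo "vxdg flush $dg"
        done
    }
}
```

Usage on the master node: `vxdisk -o alldgs list | plan_repair`. The final step, `hagrp -online cvm -sys $SYS`, is left out because it targets the slave node and should only follow once the master shows no failed disks.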
 
 

 
