Offline of VCS IP resource causes another IP resource on the same bonded NIC to fault.

Article ID: 100027205

Description

Error Message

From engine logs:

2012/04/28 19:27:45 VCS NOTICE V-16-1-10300 Initiating Offline of Resource IP_test1 (Owner: Unspecified, Group: test1) on System rh5u6n02
2012/04/28 19:27:46 VCS INFO V-16-1-10305 Resource IP_test1 (Owner: Unspecified, Group: test1) is offline on rh5u6n02 (VCS initiated)
2012/04/28 19:27:46 VCS NOTICE V-16-1-10446 Group test1 is offline on system rh5u6n02
2012/04/28 19:28:40 VCS ERROR V-16-2-13067 (rh5u6n02) Agent is calling clean for resource(IP_test2) because the resource became OFFLINE unexpectedly, on its own.

Cause

The base bonded NIC (bond0) and the VIPs plumbed on it are configured with different netmasks.

Resolution

ifconfig -a output:

bond0     Link encap:Ethernet  HWaddr 00:50:56:8D:01:DD
          inet addr:x.x.x.12  Bcast:10.208.19.255  Mask:255.255.252.0    <<<<<<<<
          inet6 addr: fe80::250:56ff:fe8d:1dd/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:16153410 errors:0 dropped:0 overruns:0 frame:0
          TX packets:161380 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3240547042 (3.0 GiB)  TX bytes:24031736 (22.9 MiB)

bond0:0   Link encap:Ethernet  HWaddr 00:50:56:8D:01:DD
          inet addr:x.x.x.89  Bcast:0.0.0.0  Mask:255.255.255.0    <<<<<<<<
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond0:1   Link encap:Ethernet  HWaddr 00:50:56:8D:01:DD
          inet addr:x.x.x.90  Bcast:0.0.0.0  Mask:255.255.255.0    <<<<<<<<
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1

bond0:2   Link encap:Ethernet  HWaddr 00:50:56:8D:01:DD
          inet addr:x.x.x.91  Bcast:0.0.0.0  Mask:255.255.255.0    <<<<<<<<
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
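
The two masks flagged with <<<<<<<< in the ifconfig output above correspond to different prefix lengths. As a minimal sketch (the helper name mask_to_prefix is hypothetical and not part of VCS or the operating system), the conversion can be checked by counting the set bits in each octet:

```shell
#!/bin/bash
# Convert a dotted-quad netmask to a prefix length by counting set bits.
# Assumes a well-formed, contiguous netmask; no validation is performed.
mask_to_prefix() {
    local IFS=. octet bits=0
    for octet in $1; do
        while [ "$octet" -gt 0 ]; do
            bits=$(( $bits + ($octet & 1) ))
            octet=$(( $octet >> 1 ))
        done
    done
    echo "$bits"
}

mask_to_prefix 255.255.252.0   # mask reported on bond0
mask_to_prefix 255.255.255.0   # mask reported on bond0:0, bond0:1 and bond0:2
```

The first call prints 22 and the second prints 24, which is the mismatch that the Cause section refers to.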
 

From debug logs:

2012/04/28 21:37:27 VCS DBG_1 V-16-50-0 IP:IP_test1:monitor:device bond0 address x.x.x.89 netmask 255.255.255.0
    IP.C:ip_monitor[200]
2012/04/28 21:37:27 VCS DBG_1 V-16-50-0 IP:IP_test1:monitor:Number of Interfaces: 6
    IP.C:ip_monitor[228]
2012/04/28 21:37:27 VCS DBG_5 V-16-50-0 IP:IP_test1:monitor:Gathering status of device bond0
    IP.C:ip_monitor[249]
2012/04/28 21:37:27 VCS DBG_5 V-16-50-0 IP:IP_test1:monitor:Interface bond0 address does not match    <<<<<<<
    IP.C:get_ipv4_status[363]

Description:


On this setup, the bond0 interface hosts two networks: x.x.x.12/22 (on bond0) and x.x.x.89/24 (on bond0:0).
Because bond0:0 carries the first address plumbed on the x.x.x.89/24 network, it becomes the primary IP for that network. The VIPs on bond0:1 and
bond0:2 become secondary IPs for the same network. By design, when Linux removes the primary IP of a network from an interface,
it automatically removes every secondary IP of that network that is plumbed on the same device.
So when the customer took the IP resource on bond0:0 offline, the operating system removed the secondary VIPs as well, and the IP resources that owned them were reported as FAULTED.
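
The primary/secondary behaviour described above can be modelled in a few lines of shell. This is a simplified sketch, not kernel code: it assumes an address is treated as secondary when an earlier address on the same device has the same network and the same prefix length. The function names are hypothetical, and the 192.0.2.x documentation addresses stand in for the redacted x.x.x values.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Arguments: label=address/prefix, in the order the addresses were plumbed.
# Prints "<label> primary" or "<label> secondary" for each argument.
classify_addresses() {
    local seen=" " entry label cidr addr prefix key
    for entry in "$@"; do
        label=${entry%%=*}
        cidr=${entry#*=}
        addr=${cidr%/*}
        prefix=${cidr#*/}
        # Key = network number plus prefix length; both must match an
        # earlier address for this one to count as secondary.
        key="$(( $(ip_to_int "$addr") & (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))/$prefix"
        case "$seen" in
            *" $key "*) echo "$label secondary" ;;
            *)          echo "$label primary"; seen="$seen$key " ;;
        esac
    done
}

# Layout from this article: /22 on bond0, /24 on the VIPs.
classify_addresses bond0=192.0.2.12/22 bond0:0=192.0.2.89/24 \
                   bond0:1=192.0.2.90/24 bond0:2=192.0.2.91/24
```

With the mismatched masks, bond0 and bond0:0 are both reported as primary while bond0:1 and bond0:2 are secondary to bond0:0, so removing bond0:0 takes them down with it. Changing the first argument to bond0=192.0.2.12/24 makes all three VIPs secondary to bond0 instead.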

To resolve this issue, make the netmask of the bond0 interface consistent with the netmask of the VIPs:
1. Offline all IP resources.
2. Either correct the bond0 configuration (possibly in its ifcfg file) so that it uses the correct netmask (255.255.255.0, or prefix length 24), or set the netmask on the IP resources so that it matches bond0.
3. Restart network services so that bond0 picks up the corrected configuration.
4. Once the NIC resource reports ONLINE, online all IP resources.
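
The steps above can be sketched as a dry-run script that only prints the commands it would run. Several details are assumptions rather than facts from this case: the ifcfg path is the usual RHEL location, the file is assumed to contain a NETMASK= line, and the resource list contains only the two IP resources named in the logs. The run and fix_plan helpers are hypothetical.

```shell
#!/bin/bash
SYSTEM=rh5u6n02                      # node name taken from the engine log
IP_RESOURCES="IP_test1 IP_test2"     # extend with every IP resource on bond0
IFCFG=/etc/sysconfig/network-scripts/ifcfg-bond0   # assumed path

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

fix_plan() {
    local res
    # Step 1: offline all IP resources.
    for res in $IP_RESOURCES; do
        run hares -offline "$res" -sys "$SYSTEM"
    done
    # Step 2: set the /24 mask on bond0 (assumes an existing NETMASK= line).
    run sed -i 's/^NETMASK=.*/NETMASK=255.255.255.0/' "$IFCFG"
    # Step 3: restart network services so bond0 picks up the change.
    run service network restart
    # Step 4: online the IP resources again. The check that the NIC
    # resource is ONLINE is not automated here and must be done first.
    for res in $IP_RESOURCES; do
        run hares -online "$res" -sys "$SYSTEM"
    done
}

fix_plan
```

Printing the plan first makes it easy to review the sequence before touching a live cluster. In practice you would pause between steps 3 and 4 to confirm the state of the NIC resource rather than running the whole plan unattended.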

With this change, bond0 holds the primary IP and all IP resources come online as secondary IPs on the same network, so taking any one VIP offline no longer affects the others.


Applies To

RHEL 5u6 with SFHA 5.1SP1GA

Issue/Introduction

Multiple VIPs are plumbed on a bonded NIC by IP resources. When you take one of the IP resources offline, the other VIPs plumbed on the same bonded NIC fault.

[root@rh5u6n02 ~]# hares -offline IP_test1 -sys rh5u6n02
[root@rh5u6n02 ~]# hares -state IP_test1
#Resource      Attribute             System      Value
IP_test1 State                 rh5u6n01 ONLINE
IP_test2 State                 rh5u6n02 FAULTED