Looking slightly earlier in the messages log on the affected node (host4):
Jan 9 12:26:30 host4 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 1149 ticks
Jan 9 12:26:30 host4 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 1149 ticks
..........................
Jan 9 12:28:33 host4 Had[13119]: [ID 702911 daemon.alert] VCS WARNING V-16-1-51047 HAD Self Check: Excessive delay in the HAD heartbeat to GAB (10 seconds)
Jan 9 12:28:33 host4 Had[13119]: [ID 702911 daemon.alert] VCS WARNING V-16-1-51047 HAD Self Check: Excessive delay in the HAD heartbeat to GAB (10 seconds)
Jan 9 12:28:33 host4 Had[13119]: [ID 702911 daemon.alert] VCS WARNING V-16-1-51047 HAD Self Check: Excessive delay in the HAD heartbeat to GAB (10 seconds)
..........................
Jan 9 12:28:42 host4 gab: [ID 854858 kern.notice] GAB INFO V-15-1-20124 timer not called for 28 seconds
Jan 9 12:28:42 host4 gab: [ID 854858 kern.notice] GAB INFO V-15-1-20124 timer not called for 28 seconds
Jan 9 12:28:42 host4 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 2714 ticks
Jan 9 12:28:42 host4 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 2714 ticks
Jan 9 12:28:42 host4 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (nxge2) node 0 inactive 12 sec (142455498)
Jan 9 12:28:42 host4 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (nxge2) node 0 inactive 12 sec (142455498)
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (nxge2) node 0 in trouble
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (nxge2) node 0 in trouble
Jan 9 12:28:42 host4 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (nxge10) node 0 inactive 13 sec (107520707)
Jan 9 12:28:42 host4 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (nxge10) node 0 inactive 13 sec (107520707)
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (nxge10) node 0 in trouble
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (nxge10) node 0 in trouble
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 2 (lowpri) node 0 in trouble
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 2 (lowpri) node 0 in trouble
Jan 9 12:28:42 host4 llt: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 2 (lowpri) node 0. 4 more to go.
Jan 9 12:28:42 host4 llt: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 2 (lowpri) node 0. 4 more to go.
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (nxge2) node 1 in trouble
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (nxge2) node 1 in trouble
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (nxge10) node 1 in trouble
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (nxge10) node 1 in trouble
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 2 (lowpri) node 1 in trouble
Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 2 (lowpri) node 1 in trouble
Jan 9 12:28:42 host4 llt: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (nxge10) node 1. 4 more to go.
Jan 9 12:28:42 host4 llt: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 1 (nxge10) node 1. 4 more to go.
Jan 9 12:28:42 host4 llt: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 2 (lowpri) node 1. 4 more to go.
Jan 9 12:28:42 host4 llt: [ID 592107 kern.notice] LLT INFO V-14-1-10510 sent hbreq (NULL) on link 2 (lowpri) node 1. 4 more to go.
Jan 9 12:28:42 host4 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (nxge2) node 0 active
Jan 9 12:28:42 host4 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (nxge2) node 0 active
Jan 9 12:28:42 host4 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (nxge2) node 1 active
Jan 9 12:28:42 host4 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (nxge2) node 1 active
Jan 9 12:28:42 host4 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (nxge2) node 2 active
Jan 9 12:28:42 host4 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (nxge2) node 2 active
Jan 9 12:28:42 host4 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 1 (nxge10) node 0 active
On its own this is not very informative. However, it gives a clue that the affected node has a problem on the LLT network or link side, because its log reports LLT link trouble for all of the other nodes.
The log of one of the other nodes (host2) shows that all the links to host4 (reported there as node 4) were inactive for about 15 seconds and were then declared expired:
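To make the "trouble on every link to every peer" pattern easier to see, the following is a minimal Python sketch that groups the V-14-1-10205 "in trouble" messages by peer node. The regular expression is derived only from the log lines quoted above, and the function name links_in_trouble is a hypothetical helper, not part of any VCS tooling.

```python
import re
from collections import defaultdict

# Pattern derived from the "in trouble" lines quoted above; other LLT
# releases may word this message differently (assumption).
TROUBLE_RE = re.compile(
    r"LLT INFO V-14-1-10205 link (\d+) \((\S+)\) node (\d+) in trouble"
)

def links_in_trouble(lines):
    """Map each peer node ID to the set of link names reported in trouble."""
    by_node = defaultdict(set)
    for line in lines:
        match = TROUBLE_RE.search(line)
        if match:
            _link_id, link_name, node_id = match.groups()
            by_node[int(node_id)].add(link_name)
    return dict(by_node)

# A few lines copied from the host4 excerpt above (duplicates omitted).
host4_lines = [
    "Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (nxge2) node 0 in trouble",
    "Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (nxge10) node 0 in trouble",
    "Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 2 (lowpri) node 0 in trouble",
    "Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 0 (nxge2) node 1 in trouble",
    "Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 1 (nxge10) node 1 in trouble",
    "Jan 9 12:28:42 host4 llt: [ID 140958 kern.notice] LLT INFO V-14-1-10205 link 2 (lowpri) node 1 in trouble",
    "Jan 9 12:28:42 host4 llt: [ID 860062 kern.notice] LLT INFO V-14-1-10024 link 0 (nxge2) node 0 active",
]

for node, links in sorted(links_in_trouble(host4_lines).items()):
    print(f"node {node}: {', '.join(sorted(links))}")
```

If every configured link shows up for every peer, as it does in this excerpt, the problem is more likely local to the reporting node than to any single peer or cable.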
>>>> link2 (lowpri)
Jan 9 12:28:29 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (lowpri) node 4 inactive 14 sec (23919368)
Jan 9 12:28:29 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (lowpri) node 4 inactive 14 sec (23919368)
....
Jan 9 12:28:30 host2 llt: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 2 (lowpri) node 4 expired
Jan 9 12:28:30 host2 llt: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 2 (lowpri) node 4 expired
>>>> link1 (nxge10)
Jan 9 12:28:30 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (nxge10) node 4 inactive 14 sec (106635010)
Jan 9 12:28:30 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (nxge10) node 4 inactive 14 sec (106635010)
Jan 9 12:28:30 host2 llt: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 1 (nxge10) node 4 expired
Jan 9 12:28:30 host2 llt: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 1 (nxge10) node 4 expired
>>>> link0 (nxge2)
Jan 9 12:28:36 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (nxge2) node 4 inactive 14 sec (141415768)
Jan 9 12:28:36 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (nxge2) node 4 inactive 14 sec (141415768)
Jan 9 12:28:37 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (nxge2) node 4 inactive 15 sec (141415790)
Jan 9 12:28:37 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (nxge2) node 4 inactive 15 sec (141415790)
Jan 9 12:28:37 host2 llt: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 0 (nxge2) node 4 expired
Jan 9 12:28:37 host2 llt: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 0 (nxge2) node 4 expired
The logs above show that the affected node experienced a network partition: all of its LLT links became inactive and expired after 15 seconds, which caused the system panic.
Check whether any link-down or similar OS messages were printed in the OS log file.
The same can happen if the system hangs (even for a couple of minutes), which makes LLT unresponsive.
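The "inactive, then expired" sequence can be checked mechanically. Below is a small Python sketch, assuming the message wording of the V-14-1-10032 and V-14-1-10509 lines quoted above; link_expiry_summary and all_links_expired are hypothetical helper names introduced for illustration.

```python
import re

# Patterns follow the "inactive" and "expired" lines quoted above.
INACTIVE_RE = re.compile(r"link (\d+) \((\S+)\) node (\d+) inactive (\d+) sec")
EXPIRED_RE = re.compile(r"link (\d+) \((\S+)\) node (\d+) expired")

def link_expiry_summary(lines, node_id):
    """Return {link_name: (max_inactive_seconds, expired)} for one peer node."""
    summary = {}
    for line in lines:
        inactive = INACTIVE_RE.search(line)
        if inactive and int(inactive.group(3)) == node_id:
            name = inactive.group(2)
            seconds = int(inactive.group(4))
            prev_max, expired = summary.get(name, (0, False))
            summary[name] = (max(prev_max, seconds), expired)
            continue
        expired_match = EXPIRED_RE.search(line)
        if expired_match and int(expired_match.group(3)) == node_id:
            name = expired_match.group(2)
            prev_max, _ = summary.get(name, (0, False))
            summary[name] = (prev_max, True)
    return summary

def all_links_expired(summary):
    """True only if at least one link was seen and every seen link expired."""
    return bool(summary) and all(expired for _, expired in summary.values())

# Lines copied from the host2 excerpt above (duplicates omitted).
host2_lines = [
    "Jan 9 12:28:29 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 2 (lowpri) node 4 inactive 14 sec (23919368)",
    "Jan 9 12:28:30 host2 llt: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 2 (lowpri) node 4 expired",
    "Jan 9 12:28:30 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 1 (nxge10) node 4 inactive 14 sec (106635010)",
    "Jan 9 12:28:30 host2 llt: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 1 (nxge10) node 4 expired",
    "Jan 9 12:28:36 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (nxge2) node 4 inactive 14 sec (141415768)",
    "Jan 9 12:28:37 host2 llt: [ID 487101 kern.notice] LLT INFO V-14-1-10032 link 0 (nxge2) node 4 inactive 15 sec (141415790)",
    "Jan 9 12:28:37 host2 llt: [ID 205468 kern.notice] LLT INFO V-14-1-10509 link 0 (nxge2) node 4 expired",
]

summary = link_expiry_summary(host2_lines, node_id=4)
for name, (seconds, expired) in sorted(summary.items()):
    print(f"{name}: inactive up to {seconds} sec, expired={expired}")
print("all links expired:", all_links_expired(summary))
```

The summary only covers links that appear in the supplied lines, so it should be compared against the links actually configured for LLT before concluding that the peer lost every path.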
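The "timer not called" messages from LLT and GAB on host4 are one place to look for signs of such a hang. The Python sketch below extracts them and converts LLT ticks to seconds. It assumes one LLT tick is 1/100 of a second; that unit is an assumption here, although under it the 2714 ticks reported by LLT roughly match the 28 seconds reported by GAB at the same timestamp. The helper name timer_stalls is hypothetical.

```python
import re

# Assumption: one LLT tick is 1/100 of a second. Verify against the
# documentation for the installed release before relying on it.
LLT_TICKS_PER_SECOND = 100

LLT_TIMER_RE = re.compile(r"LLT INFO V-14-1-10035 timer not called for (\d+) ticks")
GAB_TIMER_RE = re.compile(r"GAB INFO V-15-1-20124 timer not called for (\d+) seconds")

def timer_stalls(lines):
    """Return (source, stall_seconds) for every 'timer not called' message."""
    stalls = []
    for line in lines:
        llt = LLT_TIMER_RE.search(line)
        if llt:
            stalls.append(("LLT", int(llt.group(1)) / LLT_TICKS_PER_SECOND))
            continue
        gab = GAB_TIMER_RE.search(line)
        if gab:
            stalls.append(("GAB", float(gab.group(1))))
    return stalls

# Lines copied from the host4 excerpt above (duplicates omitted).
stall_lines = [
    "Jan 9 12:26:30 host4 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 1149 ticks",
    "Jan 9 12:28:42 host4 gab: [ID 854858 kern.notice] GAB INFO V-15-1-20124 timer not called for 28 seconds",
    "Jan 9 12:28:42 host4 llt: [ID 678236 kern.notice] LLT INFO V-14-1-10035 timer not called for 2714 ticks",
]

for source, seconds in timer_stalls(stall_lines):
    print(f"{source}: timer stalled for about {seconds:.2f} s")
```

Stalls of this length on the affected node, seen together with expired links on its peers, suggest checking for a system hang as well as for a physical link problem.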
Applies To
5-node 5.0 MP3RP1 cluster on the Solaris 10 platform.