/proc/sysrq-trigger is often used to obtain a kernel thread list on a Linux system. However, doing so can cause serious problems, including panics, on systems running Cluster Server.
If sysrq is used to obtain a thread list, the LLT (Low Latency Transport) driver running on the system will not be able to send heartbeats until the thread list has completed.
Producing the thread list commonly takes longer than the LLT peerinact timeout (the period of heartbeat inactivity after which a heartbeat link is declared inactive), so the other nodes in the cluster declare the node dead and remove it from membership. When the thread list does complete, the node can send LLT heartbeats again, but because it is no longer a member of the cluster, the system panics. If the timeouts are set high enough for the thread list to complete, the whole cluster is effectively paused until it finishes, because the other nodes are waiting for responses.
This applies to both RHEL (Red Hat Enterprise Linux) and SLES (SuSE Linux Enterprise Server).
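The timing problem can be illustrated with a little arithmetic. The sketch below assumes that LLT timer values are expressed in hundredths of a second, which is consistent with 180000 corresponding to 30 minutes in the workaround. The function names and the 60-second thread list duration are purely illustrative.

```shell
# Convert an LLT timer value (assumed to be in 1/100 s units) to whole seconds.
llt_ticks_to_seconds() {
    echo $(( $1 / 100 ))
}

# Succeed if a thread list lasting $2 seconds would outlast a peerinact of $1 ticks,
# i.e. if the other nodes would declare this node dead before it finishes.
exceeds_peerinact() {
    [ "$2" -gt "$(llt_ticks_to_seconds "$1")" ]
}

llt_ticks_to_seconds 1600       # default peerinact, prints 16
if exceeds_peerinact 1600 60; then
    echo "a 60 s thread list exceeds the default peerinact timeout"
fi
```

With the default value, any thread list that takes longer than 16 seconds is enough to get the node removed from membership.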
If it is essential to obtain the thread list, the following workaround is available. It involves setting the LLT peerinact value high enough that it does not time out while the thread list runs, and changing the value of printk (which controls the kernel messages printed to the console).
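As background on the printk setting: on Linux, /proc/sys/kernel/printk holds four whitespace-separated values, the first of which is the current console log level, and writing a single number to the file changes only that first value. The minimal sketch below extracts that first field from a saved setting. The function name console_loglevel and the sample values are illustrative only.

```shell
# Print the console log level (first field) from a printk settings line.
# $1: the contents of /proc/sys/kernel/printk, for example "4 4 1 7"
console_loglevel() {
    printf '%s\n' "$1" | awk '{ print $1 }'
}

console_loglevel "4 4 1 7"      # prints 4

# On a live system (illustrative usage):
#   console_loglevel "$(cat /proc/sys/kernel/printk)"
```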
Workaround
1) Obtain the current setting of peerinact and record it for future use (the default is 1600):
# lltconfig -T query | grep peerinact
2) Set peerinact to a large value (for example, 180000 is 30 minutes) and verify that it has been changed:
# lltconfig -T peerinact:180000
# lltconfig -T query | grep peerinact
Do not set the peerinact timer higher than 214748. Due to Etrack incident 3304583, any peerinact value higher than 214748 will cause an internal kernel variable to overflow and may cause the LLT links to disconnect.
3) Save the current printk setting:
# cat /proc/sys/kernel/printk > /tmp/printk
4) Set the printk value to 0:
# echo 0 > /proc/sys/kernel/printk
5) Enable SysRq if it is not already enabled:
# echo 1 > /proc/sys/kernel/sysrq
6) Take the thread list:
# echo t > /proc/sysrq-trigger
7) Restore the original printk value:
# cat /tmp/printk > /proc/sys/kernel/printk
8) Reset the peerinact value to the original value recorded in (1) and verify it:
# lltconfig -T peerinact:<original_value>
# lltconfig -T query | grep peerinact
9) Disable SysRq if required:
# echo 0 > /proc/sys/kernel/sysrq
NOTE: The cluster's ability to respond to network or node failures is impaired while the peerinact value is set to a high value.
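The workaround steps can be gathered into a single script so that the original values are restored straight after the thread list is taken. The following is a sketch only, not a supported tool. The function and variable names are hypothetical, and the parser assumes that the peerinact line of the "lltconfig -T query" output ends with the numeric value (for example "peerinact = 1600"). Unlike step 9, it restores the previous SysRq setting rather than always writing 0.

```shell
#!/bin/sh
# Sketch of the workaround as one script (names are illustrative).

PEERINACT_MAX=214748    # upper limit stated in the article

# Read "lltconfig -T query" output on stdin and print the peerinact value.
# ASSUMPTION: the value is the last field on the peerinact line.
parse_peerinact() {
    awk '/peerinact/ { print $NF; exit }'
}

# Succeed only if $1 is a whole number no larger than PEERINACT_MAX.
valid_peerinact() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -le "$PEERINACT_MAX" ]
}

# Run steps 1 to 9; $1 is the temporary peerinact value (default 180000).
take_thread_list() {
    new_peerinact=${1:-180000}
    valid_peerinact "$new_peerinact" || {
        echo "peerinact must be a number <= $PEERINACT_MAX" >&2; return 1; }

    # Steps 1 and 3: record the current settings.
    old_peerinact=$(lltconfig -T query | parse_peerinact)
    valid_peerinact "$old_peerinact" || {
        echo "could not read the current peerinact value" >&2; return 1; }
    old_printk=$(cat /proc/sys/kernel/printk)
    old_sysrq=$(cat /proc/sys/kernel/sysrq)

    # Steps 2, 4, 5 and 6: raise the timeout, quiet the console, take the list.
    lltconfig -T "peerinact:$new_peerinact" || return 1
    echo 0 > /proc/sys/kernel/printk
    echo 1 > /proc/sys/kernel/sysrq
    echo t > /proc/sysrq-trigger

    # Steps 7, 8 and 9: put everything back and show the result.
    echo "$old_printk" > /proc/sys/kernel/printk
    echo "$old_sysrq" > /proc/sys/kernel/sysrq
    lltconfig -T "peerinact:$old_peerinact"
    lltconfig -T query | grep peerinact
}
```

After loading the functions, running take_thread_list 180000 as root would perform the sequence. Validating the value before touching LLT keeps the script from ever setting peerinact above the 214748 limit.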