A Solaris DTrace script to troubleshoot "LLT INFO V-14-1-10035 timer not called for 105 ticks" messages

book

Article ID: 100005634

calendar_today

Updated On:

Description

Error Message

LLT INFO V-14-1-10035 timer not called for 105 ticks

Cause

The informational messages are logged if the Solaris kernel does not run the LLT thread for more than 100 system clock ticks (equivalent to a total of 1 second) after the LLT thread is scheduled.  LLT schedules itself to run by calling the Solaris kernel function timeout(9F) to set a timer of 5 system clock ticks.   (Each system clock tick is 10ms, 5 system clock ticks is 50ms.)  According to the timeout(9F) manual page, the exact time interval over which the scheduled thread will run is not guaranteed.   The following is extracted from the timeout(9F) manual page.

------------- Extracted from timeout(9F) -------------
DESCRIPTION
     The timeout() function schedules the specified  function  to
     be  called  after  a specified time interval. The exact time
     interval over which  the  timeout  takes  effect  cannot  be
     guaranteed, but the value given is a close approximation.

     The function called by timeout() must  adhere  to  the  same
     restrictions as a driver soft interrupt handler.

     The delay(9F) function calls timeout(). Because timeout() is
     subject  to priority inversion, drivers waiting on behalf of
     processes   with   real-time    constraints    should    use
     cv_timedwait(9F) rather than delay().
-------------- End extracted from timeout(9F) ----------------
 

When the Solaris kernel runs the LLT thread, LLT will check the time lapsed between each run of itself.   LLT will log a message like the following one if the kernel doesn't run the LLT thread for more than 100 system clock ticks.  

LLT INFO V-14-1-10035 timer not called for 105 ticks

 

Resolution

Usually those LLT informational messages are caused by a busy system or an unstable system where the kernel cannot schedule the threads to run efficiently.  The following is a Dtrace script to monitor the time interval over which the LLT thread is called after the timeout(9F) kernel function is called.

llt_timer_not_called.d:

------------------ Begin of dtrace script ---------------
#!/usr/sbin/dtrace -s
#pragma D option quiet

dtrace:::BEGIN
{
prev_timeout_calltime = timestamp;
}

fbt:llt:llt_timer_handler:entry
{
self->handler_entrytime = timestamp;
self->timeout_time = self->handler_entrytime - prev_timeout_calltime;

@a = lquantize(self->timeout_time/1000000,0,5000,50);

self->TRACE = 1;
}

fbt::timeout:return
/self->TRACE == 1/
{
prev_timeout_calltime = timestamp;
}

fbt:llt:llt_timer_handler:return
{
self->TRACE=0;
}

tick-30s
{
printf("%Y\n", walltimestamp);
printa(@a);
}
-------------- End of dtrace script -----------------

The above script print a summary of the distribution of the time intervals every 30 seconds.   For example,

# ./llt_timer_not_called.d 
2011 Mar 18 13:40:27

           value  ------------- Distribution ------------- count   
             < 0 |                                         0       
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   571     
              50 |@@                                       29      
             100 |                                         0       

 
2011 Mar 18 13:40:57
           value  ------------- Distribution ------------- count   
             < 0 |                                         0       
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1119    
              50 |@                                        42      
             100 |                                         0       
             150 |                                         5       
             200 |                                         5       
             250 |                                         0       
.....
 

In the last record, it means from the time the script is started, there are 1119 times where the LLT thread is run within 50ms after the timeout(9F) is called, 42 times the LLT thread is run between 50ms and 100ms after timeout(9F) is called, 5 times the LLT thread is run between 150ms to 200ms after timeout(9F) is called, and so forth.

If it is suspected the LLT thread is not scheduled to run by the kernel for a significant period time more than the 50ms as requested in the timeout(9F) kernel function, Oracle (formerly Sun Microsystems) should be contacted to find out why the kernel cannot schedule the LLT thread to run as requested in the timeout(9F) function call.

 


Applies To

Solaris 10 or above (where dtrace is available)

Veritas Cluster Server (VCS)

Issue/Introduction

Veritas Cluster Server (VCS)  Low Latency Transport (LLT)  logs the below informational messages in /var/adm/messages. LLT INFO V-14-1-10035 timer not called for 105 ticks