With DMP in the Control and/or IO domains managing the multiple paths, Solaris 10 Guest domain experiences slow I/O performance

book

Article ID: 100015503

calendar_today

Updated On:

Description

Error Message

The disks in the Guest domain are 100% busy as reported by iostat.

    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
2501.4    3.2   19.5    0.0  0.0 10.5    0.0    4.2   2 100 c0d10
2563.4    1.1   20.0    0.0  0.0 18.9    0.0    7.4   3 100 c0d22
 

lockstat shows lock contention in the VDC driver.

# /usr/sbin/lockstat -kIW -i 977 -D 20 -s 40 /usr/bin/sleep 2

Fri Nov 28 13:30:28 MSK 2014 - started
Profiling interrupt: 188338 events in 2.038 seconds (92426 events/sec)
...
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
2907   2%  89% 0.00     2370 cpu[6]                 wrpause_delay
 
      nsec ------ Time Distribution ------ count     Stack
      1024 |                               41        default_lock_delay
      2048 |@@@@@@@@@                      893       mutex_vector_enter
      4096 |@@@@@@@@@@@@@@@@@@@            1933      vdc_send_request
      8192 |                               38        vdc_do_op
     16384 |                               1         vdc_strategy             <<< VDC
     32768 |                               0         gendmpstrategy
     65536 |                               0         voldiskiostart
    131072 |                               1         vol_subdisksio_start
                                                     volkcontext_process
                                                     volkiostart
                                                     vol_fsvm_strategy
                                                     vx_dev_strategy
                                                     vx_dio_physio
                                                     vx_dio_rdwri
                                                     fdd_rw
                                                     fdd_odm_rw
                                                     odm_vx_io
                                                     odm_io_sync
                                                     odm_io
                                                     odm_io_stat
                                                     odmioctl
                                                     fop_ioctl
                                                     ioctl
                                                     syscall_trap
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
2790   1%  90% 0.00     2244 cpu[14]                wrpause_delay
 
      nsec ------ Time Distribution ------ count     Stack
      1024 |                               44        default_lock_delay
      2048 |@@@@@@@@@@@                    1026      mutex_vector_enter
      4096 |@@@@@@@@@@@@@@@@@@             1694      vdc_send_request
      8192 |                               26        vdc_do_op
                                                     vdc_strategy
                                                     gendmpstrategy
                                                     voldiskiostart
                                                     vol_subdisksio_start
                                                     volkcontext_process
                                                     volkiostart
                                                     vol_fsvm_strategy
                                                     vx_dev_strategy
                                                     vx_dio_physio
                                                     vx_dio_rdwri
                                                     fdd_write_end
                                                     fdd_rw
                                                     fdd_odm_rw
                                                     odm_vx_aio
                                                     odm_vx_io
                                                     odm_io_issue
                                                     odm_io_start
                                                     odm_io_req
                                                     odm_request_io
                                                     odm_io
                                                     odm_io_stat
                                                     odmioctl
                                                     fop_ioctl
                                                     ioctl
                                                     syscall_trap
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
2576   1%  92% 0.00     2274 cpu[3]                 wrpause_delay
 
      nsec ------ Time Distribution ------ count     Stack
      1024 |                               40        current_thread
      2048 |@@@@@@@@@@                     931       default_lock_delay
      4096 |@@@@@@@@@@@@@@@@@@             1573      mutex_vector_enter
      8192 |                               31        vdc_send_request
     16384 |                               0         vdc_do_op
     32768 |                               1         vdc_strategy
                                                     gendmpstrategy
                                                     voldiskiostart
                                                     vol_subdisksio_start
                                                     volkcontext_process
                                                     volkiostart
                                                     vol_fsvm_strategy
                                                     vx_dev_strategy
                                                     vx_dio_physio
                                                     vx_dio_rdwri
                                                     fdd_write_end
                                                     fdd_rw
                                                     fdd_odm_rw
                                                     odm_vx_aio
                                                     odm_vx_io
                                                     odm_io_issue
                                                     odm_io_start
                                                     odm_io_req
                                                     odm_request_io
                                                     odm_io
                                                     odm_io_stat
                                                     odmioctl
                                                     fop_ioctl
                                                     ioctl
                                                     syscall_trap
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
2556   1%  93% 0.00     2394 cpu[6]                 wrpause_delay
 
      nsec ------ Time Distribution ------ count     Stack
      1024 |                               25        current_thread
      2048 |@@@@@@@@@                      817       default_lock_delay
      4096 |@@@@@@@@@@@@@@@@@@@            1665      mutex_vector_enter
      8192 |                               30        vdc_send_request
     16384 |                               0         vdc_do_op
     32768 |                               1         vdc_strategy
     65536 |                               2         gendmpstrategy
                                                     voldiskiostart
                                                     vol_subdisksio_start
                                                     volkcontext_process
                                                     volkiostart
                                                     vol_fsvm_strategy
                                                     vx_dev_strategy
                                                     vx_dio_physio
                                                     vx_dio_rdwri
                                                     fdd_rw
                                                     fdd_odm_rw
                                                     odm_vx_io
                                                     odm_io_sync
                                                     odm_io
                                                     odm_io_stat
                                                     odmioctl
                                                     fop_ioctl
                                                     ioctl
                                                     syscall_trap
 

# /usr/sbin/lockstat -CcwP -n 100000 -x aggrate=10hz -D 20 -s 40 /usr/bin/sleep 2

Fri Nov 28 13:30:24 MSK 2014 - started
Adaptive mutex spin: 88222 events in 2.055 seconds (42934 events/sec)        <<< total 88222 events
 
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Lock                   Hottest Caller
31044  46%  46% 0.00   219854 0x300165fa980          vdc_process_data_msg+0x4     <<< 31044 out of 88222 events
 
      nsec ------ Time Distribution ------ count     Stack
       256 |                               3         vdc_process_msg_thread+0x5e0          <<< VDC
       512 |                               58        thread_start+0x4
      1024 |                               86
      2048 |                               197
      4096 |                               337
      8192 |@                              1550
     16384 |@@@@@@@                        7945
     32768 |@@@@@                          5911
     65536 |@@@@@                          5215
    131072 |@@@@                           4387
    262144 |@@                             2310
    524288 |                               860
   1048576 |                               550
   2097152 |                               679
   4194304 |                               612
   8388608 |                               328
  16777216 |                               16                <<< spun for up to 16ms !!!
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Lock                   Hottest Caller
30334  44%  91% 0.00   215390 0x300168a63c0          vdc_process_data_msg+0x4
 
      nsec ------ Time Distribution ------ count     Stack
       256 |                               4         vdc_process_msg_thread+0x5e0
       512 |                               47        thread_start+0x4
      1024 |                               109
      2048 |                               167
      4096 |                               330
      8192 |@                              1562
     16384 |@@@@@@@                        7745
     32768 |@@@@@                          5658
     65536 |@@@@@                          5273
    131072 |@@@@                           4196
    262144 |@@                             2305
    524288 |                               839
   1048576 |                               603
   2097152 |                               617
   4194304 |                               533
   8388608 |                               330
  16777216 |                               16
-------------------------------------------------------------------------------
 
 
# /usr/sbin/lockstat -HcwP -n 100000 -x aggrate=10hz -D 20 -s 40 /usr/bin/sleep 2

Fri Nov 28 13:30:34 MSK 2014 - started
Adaptive mutex hold: 2146042 events in 3.771 seconds (569020 events/sec)
 
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Lock                   Hottest Caller
14526  12%  12% 0.00   124503 0x300165fa980          vdc_send_request+0x224
 
      nsec ------ Time Distribution ------ count     Stack
      8192 |                               35        vdc_do_op+0x64            <<< VDC
     16384 |                               77        vdc_strategy+0xcc
     32768 |                               117       gendmpstrategy+0x550
     65536 |@@@@@@@@                       4247      voldiskiostart+0x56c
    131072 |@@@@@@@@@@                     4867      vol_subdisksio_start+0x1f0
    262144 |@@@@@@@@                       4265      volkcontext_process+0xcc
    524288 |@                              856       volkiostart+0xea8
   1048576 |                               40        vol_fsvm_strategy+0x128
   2097152 |                               21        vx_dev_strategy+0x84
   4194304 |                               0         vx_dio_physio+0x664
   8388608 |                               0         vx_dio_rdwri+0x638
  16777216 |                               0         fdd_rw+0x590
  33554432 |                               1         fdd_odm_rw+0x304                        
                                                     odm_vx_io+0x74
                                                     odm_io_sync+0x360
                                                     odm_io+0x2ac
                                                     odm_io_stat+0x154
                                                     odmioctl+0x1a0
                                                     fop_ioctl+0x2c
                                                     ioctl+0x184
                                                     syscall_trap+0xac
-------------------------------------------------------------------------------
Count indv cuml rcnt     nsec Lock                   Hottest Caller
11952   9%  21% 0.00   105948 0x300168a63c0          vdc_send_request+0x224
 
      nsec ------ Time Distribution ------ count     Stack
     65536 |@@@@@@@@@@                     4247      vdc_do_op+0x64
    131072 |@@@@@@@@@@@@                   4832      vdc_strategy+0xcc
    262144 |@@@@@@                         2420      gendmpstrategy+0x550
    524288 |@                              411       voldiskiostart+0x56c
   1048576 |                               24        vol_subdisksio_start+0x1f0
   2097152 |                               9         volkcontext_process+0xcc
   4194304 |                               0         volkiostart+0xea8
   8388608 |                               0         vol_fsvm_strategy+0x128
  16777216 |                               0         vx_dev_strategy+0x84
  33554432 |                               1         vx_dio_physio+0x664
                                                     vx_dio_rdwri+0x638
                                                     fdd_rw+0x590
                                                     fdd_odm_rw+0x304
                                                     odm_vx_io+0x74
                                                     odm_io_sync+0x360
                                                     odm_io+0x2ac
                                                     odm_io_stat+0x154
                                                     odmioctl+0x1a0
                                                     fop_ioctl+0x2c
                                                     ioctl+0x184
                                                     syscall_trap+0xac

Cause

The problem only affects Solaris 10 Guest domain.   Solaris 11 has improved implementation of VDC and doesn't experience the problem.

Please refer to the "Symantec Storage Foundation and High Availability Solutions Virtualization Guide - Solaris" for details of supported Storage Foundation models.   The following is the Virtualization Guide from the latest SF/HA version 6.2.

http://www.symantec.com/business/support/resources/sites/BUSINESS/content/live/DOCUMENTATION/7000/DOC7880/en_US/sfhas_virtualization_62_sol.pdf

The problem described in this article mainly affects the Guest-based Storage Foundation stack model with Solaris 10 Guest domain and with DMP in the Control and/or I/O domains managing the multiple paths to the disks.

The slow Guest I/O performance can be caused by the Guest domain VDC driver contends for a few number of virtual devices.   With DMP in the Control and/or IO domains managing the multiple paths, there will be less number of OS devices bound to the Guest domain.   Under heavy Guest domain I/Os load on just a few virtual disks, the Guest domain VDC driver cannot handle the heavy I/Os and causes individual virtual disk to exhibit 100% busy.

Resolution

Please upgrade the Guest domain to Solaris 11 because Solaris 11 VDC has much improved performance comparing to the Solaris 10 VDC.

If you must use Solaris 10 Guest domain (for example, because of some application restriction), please use a Solaris 11 Guest domain and create a Solaris 10 Branded Zone for the application.    In this model, the actual VDC driver used is the improved Solaris 11 one.

Similar issue is reported in the following Oracle Doc ID.  

Oracle Doc ID 1628563.1 - Write I/O Performance in Oracle VM Server for SPARC (LDoms) Guest Domains Degrades with High Average Service Time

But in one customer case, the Solaris 10 OS patch didn't help and the lockstat still showed high lock contentions.   According to Oracle, the performance difference between Solaris 10 and 11 may be due to the following request for enhancement.

RFE (request for enhancement) 15288596 SUNBT6328213 BP_MAPIN () SHOULD USE KPM

The change was to improve the I/O performance and was relied on the new capabilities of Solaris 11.  The porting of this change onto Solaris 10 is not possible because it will require too extensive changes in the Solaris kernel code.

Issue/Introduction

This may not be a problem with DMP.   The problem can be caused by the Solaris 10 LDOM Virtual Disk Client (VDC) driver.