Rpool becomes faulty/degraded on every reboot and is resilvered automatically

Article ID: 100045764

Description

Error Message

Jun 18  11:08:13 server101 IMPACT: Fault tolerance of the pool may be compromised.
Jun 18  11:08:13 server101 REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -lx' for more information. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-LR for the latest service procedures and policies regarding this diagnosis.


Jun 18  11:08:43 server101 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-QJ, TYPE: Fault, VER: 1, SEVERITY: Minor
Jun 18  11:08:43 server101 EVENT-TIME: Mon Jun 18  11:08:43 MSK 2019

Jun 18  11:08:43 server101 PLATFORM: SPARC-T7-4, CSN: unknown, HOSTNAME: server101
Jun 18  11:08:43 server101 SOURCE: zfs-diagnosis, REV: 1.0
Jun 18  11:08:43 server101 EVENT-ID: 6eac1493-ac06-46b5-b7de-e96124a6c10e
Jun 18  11:08:43 server101 DESC: Missing data on ZFS device 'id1,dmp@n60012440f48e08000000000000000252/n60012440f48e08000000000000000252-a' in pool 'rpool'. Applications are unaffected if sufficient replicas exist.
Jun 18  11:08:43 server101 AUTO-RESPONSE: An attempt will be made automatically to recover the data. The device and pool will be degraded.
Jun 18  11:08:43 server101 IMPACT: The device and pool may continue functioning in degraded state until data is recovered.
Jun 18  11:08:43 server101 REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -lx' for more information. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-QJ for the latest service procedures and policies regarding this diagnosis.

Note: rpool resilvers automatically. This problem is observed on Solaris 11 after each reboot.

When the customer disables DMP native support, these messages are not seen and no resilvering occurs.

-bash-4.1# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices are unavailable in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or 'fmadm repaired', or replace the device
        with 'zpool replace'.
        Run 'zpool status -v' to see device specific details.
  scan: resilvered 31.1M in 1s with 0 errors on Thu Jun  18

config:

        NAME                        STATE     READ WRITE CKSUM
        rpool                       DEGRADED     0     0     0
          mirror-0                  DEGRADED     0     0     0
            tagmastore-usp0_0044s6  ONLINE       0     0     0
            tagmastore-usp0_0043s6  UNAVAIL      0     0     0
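
To pick the affected devices out of status output like the above after each reboot, a small filter can help. The following is a minimal sketch rather than a supported tool: the function name unhealthy_vdevs is hypothetical, and it assumes the config table layout shown above (a NAME/STATE header row, ended by a blank line).

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print
# "name state" for every config-table row whose state is not ONLINE.
unhealthy_vdevs() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line ends the table.
        in_config && NF == 0          { in_config = 0 }
        # Report any pool, mirror, or device that is not ONLINE.
        in_config && $2 != "ONLINE"   { print $1, $2 }
    '
}
```

Typical use would be `zpool status rpool | unhealthy_vdevs`. For the output shown above, the filter reports rpool, mirror-0, and tagmastore-usp0_0043s6, since a degraded leaf device also degrades its parents.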

Cause

Rpool was in the DEGRADED state because the DMP devices were not yet available at reboot, when rpool was imported.

Resolution

With dmp_native_support enabled, the fix runs the command "zpool clear rpool" as part of the vxvm-startup2 script. This step was added through internal incident 3799663 specifically to fix this failure with mirrored rpools.

The problem is not observed with IS 7.4.1.
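
For illustration, the same clearing step can be guarded so that it runs only when the pool actually reports DEGRADED. This is a sketch under assumptions, not the contents of vxvm-startup2: the function name clear_if_degraded is hypothetical, and it assumes the DMP devices are reachable again by the time it runs. The state of the tunable can be checked beforehand with `vxdmpadm gettune dmp_native_support`.

```shell
#!/bin/sh
# Hypothetical helper, not taken from vxvm-startup2: clear device errors
# on a pool only when its health is DEGRADED.
clear_if_degraded() {
    pool=${1:-rpool}

    # `zpool list -H -o health` prints only the health value, no header.
    health=$(zpool list -H -o health "$pool") || return 1

    if [ "$health" = "DEGRADED" ]; then
        echo "$pool is DEGRADED, clearing device errors"
        zpool clear "$pool"
    else
        echo "$pool health is $health, nothing to clear"
    fi
}
```

Running `clear_if_degraded rpool` as root after boot, followed by `zpool status rpool`, shows whether the second mirror half returns to ONLINE. If the device is still UNAVAIL afterwards, the underlying path is likely still missing and clearing alone will not help.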

Applies to

This problem is seen on SF 6.2.1. It is fixed in Private Hot Fix VM 6.2.1.8301, which is available through Veritas support after the problem has been validated.

Issue/Introduction

Rpool becomes faulty/degraded on every reboot and is resilvered automatically. This problem is observed after enabling dmp_native_support.

Additional Information

JIRA: STESC-3123