NFS resource went OFFLINE unexpectedly when an NFSRestart resource and the nfs_postoffline trigger were configured
Article ID: 100004579
Description
Error Message
2011/01/10 00:10:41 VCS INFO V-16-10011-5107 (arcp5605ecd5p1) NFS:nfs:monitor:Group might be going offline. Waiting for nfs_postoffline trigger to restart daemons
2011/01/10 00:11:41 VCS INFO V-16-10011-5107 (arcp5605ecd5p1) NFS:nfs:monitor:Group might be going offline. Waiting for nfs_postoffline trigger to restart daemons
2011/01/10 00:12:42 VCS ERROR V-16-2-13067 (arcp5605ecd5p1) Agent is calling clean for resource(nfs) because the resource became OFFLINE unexpectedly, on its own.
Cause
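The default value of the nfs resource's LockFileTimeout attribute is 180 seconds (3 minutes), which was not long enough in this case. The following excerpt from /opt/VRTSvcs/bin/NFS/monitor shows that the monitor reports the resource as ONLINE only until LockFileTimeout is reached: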
.
if ($timediff < $lockfile_timeout) {
    # Report the nfs resource as ONLINE until LockFileTimeout is reached.
    VCSAG_LOGDBG_MSG (4, "File <$nfs_sync_file> exists, file accessed before <$timediff> seconds");
    # This message is written to the engine log.
    VCSAG_LOG_MSG ("I", "Group might be going offline. Waiting for nfs_postoffline trigger to restart daemons", 5107);
    $state = $VCSAG_RES_ONLINE;
} else {
    # remove the sync file because it has timed out
    VCSAG_LOGDBG_MSG (4, "File <$nfs_sync_file> exists, timed out, removing file");
    unlink $nfs_sync_file;
.
.
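The comparison in the excerpt can be illustrated with a small standalone sketch. It is not part of the NFS agent: the function name nfs_sync_state and the TIMED_OUT label are hypothetical, and the sketch only mirrors the visible if/else test. What the real monitor does after removing the sync file is elided in the excerpt and is not modelled here.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the test in the monitor excerpt.
#   $1 = seconds since the sync file was last accessed (timediff)
#   $2 = LockFileTimeout in seconds
nfs_sync_state() {
    timediff=$1
    lockfile_timeout=$2
    if [ "$timediff" -lt "$lockfile_timeout" ]; then
        # Inside the grace window: the monitor keeps reporting ONLINE.
        echo "ONLINE"
    else
        # Grace window over: the excerpt shows the sync file being removed.
        echo "TIMED_OUT"
    fi
}

nfs_sync_state 60 180    # well inside the default window
nfs_sync_state 181 180   # just past the default window
nfs_sync_state 181 300   # same age, with the timeout raised to 300
```

With the default of 180 seconds, a sync file that is 181 seconds old is already outside the window, while the same age falls inside it once the timeout is raised to 300.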
Resolution
After the nfs resource faulted, the agent called the clean script and then retried bringing the resource online, as allowed by the NFS agent's RestartLimit (default 1). The resource then came online successfully.
No action is required, because the resource was back online after the restart.
Note that this occurs on the standby node, after the service group containing this nfs resource has failed over successfully to another node.
VCS still logs an ERROR for the nfs resource, however. If this ERROR is not acceptable, increase the nfs resource's LockFileTimeout attribute.
Example:
# haconf -makerw
# hares -modify nfs LockFileTimeout 300
# haconf -dump -makero
Applies To
VCS 5.1
AIX 6.1
This may also apply to any VCS version in which the nfs_postoffline trigger is available.
Issue/Introduction
The NFSRestart agent's offline script kills nfsd and mountd. The nfs_postoffline trigger then restarts those daemons, along with lockd, statd, nfsrgyd, and gssd.
If the NFS agent's monitor script checks nfsd and mountd before the nfs_postoffline trigger has restarted them, it reports the nfs resource as OFFLINE unexpectedly.