SDS Operator Pod in CrashLoopBackOff state following OpenShift node removal

Article ID: 100074564

Description

Error Message

CrashLoopBackOff

Reference

JIRA: STESC-9614

Cause

In a disk-based fencing configuration using the VIKE solution, when a worker node is completely removed from the OpenShift cluster, the sds-operator pod cannot reach the Running state and enters CrashLoopBackOff, even though it successfully initiates removal of the node. The InfoScale cluster remains in the ProcessingRemoveNode state.

Example:

# oc get infoscalecluster

NAME          VERSION   CLUSTERID   STATE                  DISKGROUPS          STATUS     AGE
isc-primary   8.0.400   1000        ProcessingRemoveNode   vrts_kube_dg-1000   Degraded   262d
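
The condition shown above can be checked from a script. The following is a minimal sketch, not part of the official procedure: the function name stuck_clusters is hypothetical, and it assumes the column layout shown in the example output, with no empty columns.

```shell
#!/bin/sh
# Print the names of InfoScale clusters whose STATE column reads
# ProcessingRemoveNode. Reads `oc get infoscalecluster` output on stdin.
# Assumption: columns are whitespace-separated and none of them is empty.
stuck_clusters() {
    awk 'NR == 1 {
             # Locate the STATE column from the header row.
             for (i = 1; i <= NF; i++) if (toupper($i) == "STATE") col = i
             next
         }
         col && $col == "ProcessingRemoveNode" { print $1 }'
}

# Usage against a live cluster (assumes the current project is the one
# that contains the InfoScale cluster resource):
#   oc get infoscalecluster | stuck_clusters
```

If the function prints a cluster name while the sds-operator pod is in CrashLoopBackOff, the situation likely matches this article.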

Resolution

Contact Arctera Support to obtain the updated SDS Operator images compatible with version 8.0.400.

Steps to replace the image:

Load the image into the private registry.
Log in to the private registry.
podman load -i
podman tag /infoscale-sds-operator:8.0.400-rhel
podman push /infoscale-sds-operator:8.0.400-rhel
Log in as the core user to the node where the sds-operator pod is running.
Elevate to the root user: sudo su - root
Pull the updated image from the registry: podman pull /infoscale-sds-operator:8.0.400-rhel
On the bastion host, edit the SDS Operator deployment:
oc edit deployment infoscale-sds-operator

Change both image: entries to /infoscale-sds-operator:8.0.400-rhel
At the top of spec: add:
nodeName:
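
The registry steps above can be previewed as a dry run before anything is executed. This is a minimal sketch under stated assumptions: the registry host registry.example.com:5000 and the archive name infoscale-sds-operator.tar are hypothetical placeholders, and the helper names image_ref and print_registry_steps are illustrative only. The script prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical values: replace with your own private registry and the
# image archive supplied by Arctera Support.
REGISTRY="${REGISTRY:-registry.example.com:5000}"
ARCHIVE="${ARCHIVE:-infoscale-sds-operator.tar}"
IMAGE_TAG="infoscale-sds-operator:8.0.400-rhel"

# Build the fully qualified image reference used by the tag, push,
# and pull steps.
image_ref() {
    printf '%s/%s\n' "$1" "$IMAGE_TAG"
}

# Print the registry-side commands instead of running them (dry run).
# <loaded-image> stands for the name or ID that `podman load` reports.
print_registry_steps() {
    ref=$(image_ref "$REGISTRY")
    echo "podman login $REGISTRY"
    echo "podman load -i $ARCHIVE"
    echo "podman tag <loaded-image> $ref"
    echo "podman push $ref"
}

print_registry_steps
```

The same reference printed by image_ref is the value that would go into the image: entries of the deployment. The node name for the nodeName: entry can usually be read from the NODE column of oc get pods -o wide.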

Additional Information

JIRA: STESC-9614
