SDS Operator Pod in CrashLoopBackOff state following OpenShift node removal

Article ID: 100074564

Description

Error Message

CrashLoopBackOff

Reference

JIRA: STESC-9614

Cause

In a disk-based fencing configuration using the VIKE solution, when a worker node is completely removed from the OpenShift cluster, the sds-operator pod cannot reach the Running state and enters CrashLoopBackOff, even though the operator successfully initiates the removal process for the node. The InfoScale cluster then remains in the ProcessingRemoveNode state with a Degraded status.

Example:

# oc get infoscalecluster

Name          VERSION   CLUSTERID   STATE                  DISKGROUPS          STATUS     AGE
isc-primary   8.0.400   1000        ProcessingRemoveNode   vrts_kube_dg-1000   Degraded   262d
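
To check for this condition from a script, the output above can be filtered on the STATE column. The following is a minimal sketch, assuming the column order shown in the example; the function name find_stuck_clusters is hypothetical and is not part of any InfoScale tooling.

```shell
# Hypothetical helper (not part of InfoScale): reads the output of
# `oc get infoscalecluster` on stdin and prints the name of every cluster
# whose STATE column (4th field) is ProcessingRemoveNode.
# Assumes the column order from the example:
#   Name VERSION CLUSTERID STATE DISKGROUPS STATUS AGE
find_stuck_clusters() {
    awk '$4 == "ProcessingRemoveNode" { print $1 }'
}
```

Usage: oc get infoscalecluster | find_stuck_clusters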

Resolution

Contact Arctera Support to obtain the updated SDS Operator images compatible with version 8.0.400.

Steps to replace the image:

  • Load the image into the private registry.

    Log in to the private registry.
    podman load -i
    podman tag /infoscale-sds-operator:8.0.400-rhel
    podman push /infoscale-sds-operator:8.0.400-rhel

  • Log in to the node where the sds-operator pod is running as the core user.

  • Elevate to the root user: sudo su - root

  • Pull the updated image from the registry: podman pull /infoscale-sds-operator:8.0.400-rhel

  • On the bastion host, edit the SDS Operator deployment:
    oc edit deployment infoscale-sds-operator

  • Change both occurrences of image: to /infoscale-sds-operator:8.0.400-rhel

  • At the top of the pod template's spec: (spec.template.spec), add:
    nodeName:
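
The registry steps and the nodeName change above can be sketched as shell helpers. This is an illustration under stated assumptions, not the supported procedure: the article does not give the registry prefix, the image archive name, the loaded image name, or the node name, so all four are passed in as arguments, and the function names are hypothetical. The nodeName patch assumes the field belongs in the pod template (spec.template.spec), which is where Kubernetes accepts it.

```shell
# Sketch only. The function names and their arguments are hypothetical;
# the article does not specify the registry, archive, or node values.
IMAGE_NAME="infoscale-sds-operator:8.0.400-rhel"

# Build the full image reference for a given registry prefix.
image_ref() {
    printf '%s/%s\n' "$1" "$IMAGE_NAME"
}

# Load, tag, and push the image to the private registry.
#   $1 = registry prefix, $2 = image archive, $3 = name of the loaded image
push_operator_image() {
    target=$(image_ref "$1")
    # ${1%%/*} keeps only the registry host (everything before the first "/").
    podman login "${1%%/*}" &&
    podman load -i "$2" &&
    podman tag "$3" "$target" &&
    podman push "$target"
}

# Print a merge patch that sets nodeName in the pod template spec.
#   $1 = node name
node_pin_patch() {
    printf '{"spec":{"template":{"spec":{"nodeName":"%s"}}}}\n' "$1"
}
```

With the node name held in a variable such as NODE (an assumed name), the patch could be applied with: oc patch deployment infoscale-sds-operator --type merge -p "$(node_pin_patch "$NODE")". The image: fields are still changed with oc edit as described in the steps.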
