Kube Pod Crash Looping

KubePodCrashLooping

Meaning

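The pod is in a crash loop (status CrashLoopBackOff): one of its containers keeps exiting, or keeps being killed because it stops responding, and Kubernetes restarts it automatically, waiting longer before each new attempt.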
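The growing wait between restarts is what the "BackOff" in CrashLoopBackOff refers to. As a rough sketch, assuming the behaviour described in the Kubernetes Pod Lifecycle documentation (an exponential back-off that starts at 10 seconds, doubles after each restart and is capped at five minutes, then resets once the container has run cleanly for 10 minutes), the delay before each restart can be approximated as below. The helper name is illustrative, and exact values may differ between Kubernetes versions.

```python
# Approximation of the kubelet's container restart back-off, assuming the
# documented defaults: start at 10s, double per restart, cap at 300s.
BASE_DELAY_S = 10
MAX_DELAY_S = 300


def restart_delay(restart_number: int) -> int:
    """Approximate wait in seconds before the given restart (1 = first restart)."""
    if restart_number < 1:
        raise ValueError("restart_number must be >= 1")
    return min(BASE_DELAY_S * 2 ** (restart_number - 1), MAX_DELAY_S)


if __name__ == "__main__":
    # Delays for the first seven restarts of a container that keeps crashing.
    print([restart_delay(n) for n in range(1, 8)])  # [10, 20, 40, 80, 160, 300, 300]
```

After a handful of crashes the pod therefore spends most of its time waiting, which is why a crash-looping service is usually unavailable rather than merely flaky.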

Impact

Service degradation or unavailability. Rolling upgrades cannot complete. Some apps will not perform required tasks, such as data migrations.

Diagnosis

  • Check the pod template via kubectl -n $NAMESPACE get pod $POD -o yaml.
  • Check pod events via kubectl -n $NAMESPACE describe pod $POD.
  • Check pod logs via kubectl -n $NAMESPACE logs $POD -c $CONTAINER; add --previous to see the logs of the last terminated instance.
  • Check pod template parameters such as:
    • pod priority
    • resources: the pod may request a resource that is scarce or unavailable, such as a GPU when only a few nodes have one
    • readiness and liveness probes: they may be misconfigured (wrong port or command), or the response timeout may be so short that the check fails before the app can answer
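The checks above can be partly automated. The sketch below reads the JSON produced by kubectl -n $NAMESPACE get pod $POD -o json and lists every container (including init containers) that is waiting in CrashLoopBackOff, together with its restart count and the reason and exit code of its last termination (for example OOMKilled, or Error with the app's exit code). The script name crashloop.py and the helper function are hypothetical; the fields read (containerStatuses, state.waiting.reason, lastState.terminated) are standard pod status fields.

```python
import json
import sys


def crashlooping_containers(pod: dict) -> list:
    """Summarize containers whose current state is waiting/CrashLoopBackOff."""
    status = pod.get("status", {})
    # Init containers can crash-loop too, so inspect both status lists.
    statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
    results = []
    for cs in statuses:
        waiting = cs.get("state", {}).get("waiting", {})
        if waiting.get("reason") != "CrashLoopBackOff":
            continue
        # lastState.terminated describes the most recent crash.
        last = cs.get("lastState", {}).get("terminated", {})
        results.append({
            "container": cs.get("name"),
            "restarts": cs.get("restartCount", 0),
            "last_reason": last.get("reason"),
            "last_exit_code": last.get("exitCode"),
        })
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name):
    #   kubectl -n $NAMESPACE get pod $POD -o json | python3 crashloop.py /dev/stdin
    with open(sys.argv[1]) as f:
        pod = json.load(f)
    for item in crashlooping_containers(pod):
        print(item)
```

The container name it reports is the value to pass as $CONTAINER to kubectl logs --previous.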

Other things to check:

  • app responds extremely slowly due to resource constraints, such as too little memory or not enough of the CPU it needs at startup
  • app waits for other services, such as a database, to start
  • misconfiguration makes the app crash on start
  • missing files, such as ConfigMaps, Secrets or volumes
  • read-only filesystem
  • wrong user permissions in the container
  • missing special container capabilities (securityContext)
  • app runs in a different directory than expected (for example, the WORKDIR from the Dockerfile is not used in OpenShift)
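A slow-starting app and an aggressive liveness probe often combine into a crash loop: the probe gives up before the app is ready, and the kubelet restarts the container. As a rough sketch, assuming the simplification used in the Kubernetes probe documentation (an unresponsive container survives about initialDelaySeconds + failureThreshold × periodSeconds before being restarted, ignoring jitter), the helpers below compare a probe's budget with a startup time you have measured. The Probe class and helper names are hypothetical; the default values match the Kubernetes defaults for these probe fields.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    # Defaults match the Kubernetes defaults for these probe fields.
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    failure_threshold: int = 3


def startup_budget_seconds(probe: Probe) -> int:
    """Rough time before repeated liveness failures trigger a restart."""
    return probe.initial_delay_seconds + probe.failure_threshold * probe.period_seconds


def probe_too_aggressive(probe: Probe, observed_startup_seconds: float) -> bool:
    """True if the app needs at least as long to start as the probe allows."""
    return observed_startup_seconds >= startup_budget_seconds(probe)


def response_too_slow(probe: Probe, observed_response_seconds: float) -> bool:
    """True if a single probe request would exceed timeoutSeconds."""
    return observed_response_seconds > probe.timeout_seconds


if __name__ == "__main__":
    default = Probe()
    print(startup_budget_seconds(default))    # 30
    print(probe_too_aggressive(default, 45))  # True: app needs 45s, probe allows ~30s
    print(response_too_slow(default, 2.5))    # True: 2.5s response vs 1s timeout
```

If either check returns True, the probe settings (or the app's startup resources) are a likely cause of the restarts.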

Mitigation

Talk with the developers or read the app's documentation, and make sure sane default values are defined so the app can start.
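On the application side, sane defaults mean that a missing setting falls back to a working value and a broken one produces a clear message instead of an obscure crash. A minimal sketch, assuming the app is configured through environment variables; the variable names and default values below are hypothetical placeholders, not taken from any real app.

```python
import os
from typing import Mapping, Optional

# Hypothetical settings and defaults for an example app.
DEFAULTS = {
    "APP_PORT": "8080",
    "APP_LOG_LEVEL": "info",
    "APP_DB_CONNECT_TIMEOUT_SECONDS": "30",
}


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Overlay environment values on DEFAULTS and validate the result."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    try:
        port = int(raw["APP_PORT"])
        timeout = float(raw["APP_DB_CONNECT_TIMEOUT_SECONDS"])
    except ValueError as exc:
        # Name the bad setting so the cause is obvious in `kubectl logs`.
        raise ValueError(f"invalid configuration: {exc}") from exc
    if not 0 < port < 65536:
        raise ValueError(f"invalid configuration: APP_PORT={port} is out of range")
    return {"port": port, "log_level": raw["APP_LOG_LEVEL"], "db_connect_timeout": timeout}


if __name__ == "__main__":
    print(load_config({}))  # {'port': 8080, 'log_level': 'info', 'db_connect_timeout': 30.0}
```

A generous database connect timeout also helps with the case where the app starts before its database is ready.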

See Debugging Pods