AlertmanagerClusterCrashlooping #
Meaning #
Half or more of the Alertmanager instances within the same cluster are crashlooping.
Impact #
Alerts could be notified multiple time unless pods are crashing to fast and no alerts can be sent.
Diagnosis #
kubectl get pod -l app=alertmanager
NAMESPACE   NAME                    READY   STATUS              RESTARTS    AGE
default     alertmanager-main-0     1/2     CrashLoopBackOff    37107 2d
default     alertmanager-main-1     2/2     Running             0 43d
default     alertmanager-main-2     2/2     Running             0 43d 
Find the root cause by looking to events for a given pod/deployement
kubectl get events --field-selector involvedObject.name=alertmanager-main-0
Mitigation #
Make sure pods have enough resources (CPU, MEM) to work correctly.