AlertmanagerClusterDown #
Meaning #
Half or more of the Alertmanager instances within the same cluster are down.
Impact #
You have an unstable cluster, if everything goes wrong you will lose the whole cluster.
Diagnosis #
Verify why pods are not running.
You can get a big picture with events
.
$ kubectl get events --field-selector involvedObject.kind=Pod | grep alertmanager
Mitigation #
There are no cheap options to mitigate this risk. Verifying any new changes in preprod before production environment should improve stability.