AlertmanagerFailedReload is triggered when the Alertmanager instance
for the cluster monitoring stack has consistently failed to reload its
configuration for a certain period.
The impact depends on the type of the error you will find in the logs. Most of the time, previous configuration is still working, thanks to multiple instances, so avoid deleting existing pods.
Verify if there is an error in
config-reloader container logs.
Here an example with network issues.
$ kubectl logs sts/alertmanager-main -c config-reloader level=error ts=2021-09-24T11:24:52.69629226Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9093/alertmanager/-/reload\": dial tcp [::1]:9093: connect: connection refused"
You can also verify directly
alertmanager.yaml file (default:
Running amtool check-config alertmanager.yaml on your configuration file will help you detect problem related to syntax.
You could also rollback
alertmanager.yaml to the previous version in order to get back to a stable version.