Summary
We have an alert that fires whenever thanos_cache_operation_failures_total increases, and it fires quite often during normal browsing in Grafana. The failing series consistently carry the labels operation="add", reason="not-stored" and backend="memcached". The failures are reported by the mimir-querier component.
We're trying to figure out whether this is:
a) normal
b) abnormal, but caused by our config
c) a bug in Mimir
I'm happy to provide more debugging info as needed. Thanks!
Environment
- Chart: mimir-distributed:5.7.0
- Platform: Rancher RKE2
More info
Metric for the failure:
thanos_cache_operation_failures_total{
backend="memcached",
component="querier",
container="querier",
service="mimir-querier",
pod="mimir-querier-688b5d66d5-lrvtd",
endpoint="http-metrics",
job="mimir-querier",
name="metadata-cache",
namespace="mimir",
operation="add",
reason="not-stored",
...
}
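For context on why we suspect (a): as far as I understand, memcached's `add` command only stores a key if it is not already present, and replies `NOT_STORED` otherwise. If the querier counts that reply as a failure, these increments might just be two writers racing to populate the same metadata-cache key. The sketch below is a toy in-memory model of that behaviour; `ToyMemcached` and the key name are made up for illustration, and this is not Mimir's or Thanos's actual client code.

```python
# Toy in-memory model of memcached's "add" semantics.
# ToyMemcached is a hypothetical class for illustration only,
# not a real memcached client and not code from Mimir or Thanos.
class ToyMemcached:
    def __init__(self):
        self._data = {}

    def add(self, key, value):
        """Store value only if key is absent, mirroring memcached's add."""
        if key in self._data:
            # Key already cached: memcached answers NOT_STORED, which is
            # a "condition not met" reply rather than a server error.
            return "NOT_STORED"
        self._data[key] = value
        return "STORED"


cache = ToyMemcached()
# Two callers try to populate the same (made-up) key.
first = cache.add("some-metadata-key", b"payload")
second = cache.add("some-metadata-key", b"payload")
print(first, second)
```

If that is really what is happening here, the second `add` is harmless, and the question becomes whether it should be counted in a failures metric at all.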