Description
What is the bug?
It seems that the querier component hit an unexpected runtime panic (slice bounds out of range). Here are the last 5 minutes' worth of logs:
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:54:31.00605734Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:55:01.006000333Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:55:31.006165321Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:56:01.005745932Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:56:31.006359276Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:57:01.006425226Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:57:31.006198034Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:58:01.005515026Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:58:31.006173094Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:59:01.006374875Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:59:31.005690563Z"}
{"addr":"<redacted>:9095","caller":"pool.go:250","level":"warn","msg":"removing frontend failing healthcheck","reason":"failing healthcheck status: NOT_SERVING","ts":"2025-08-06T01:59:38.973757497Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T02:00:01.006177286Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T02:00:31.005580221Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T02:01:01.006101429Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T02:01:31.005657498Z"}
panic: runtime error: slice bounds out of range [:563] with length 552
goroutine 1517222253 [running]:
github.com/prometheus/prometheus/model/labels.decodeString({0xc01526d680?, 0x92?}, 0x2480?)
/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/model/labels/labels_stringlabels.go:57 +0x65
github.com/prometheus/prometheus/model/labels.Compare({{0xc01526d440?, 0x234?}}, {{0xc01526d680?, 0x233?}})
/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/model/labels/labels_stringlabels.go:366 +0x18c
github.com/grafana/mimir/pkg/distributor.mergeSeriesChunkStreams({0xc01499a008?, 0xc00d13cf60?, 0xc00d13bae0?}, 0x2)
/__w/mimir/mimir/pkg/distributor/query.go:515 +0x15d
github.com/grafana/mimir/pkg/distributor.(*Distributor).queryIngesterStream(0xc001c31008, {0x370a608, 0xc00d13ce40}, {0xc00d13cf60, 0x1, 0x1}, 0xc00d13cf30, 0xc001c3cfc0)
/__w/mimir/mimir/pkg/distributor/query.go:413 +0x5b2
github.com/grafana/mimir/pkg/distributor.(*Distributor).QueryStream.func1({0x370a608, 0xc00d13ce40})
/__w/mimir/mimir/pkg/distributor/query.go:94 +0xeb
github.com/grafana/dskit/instrument.CollectedRequest({0x370a608, 0xc00d13ce10}, {0x2fb965e, 0x17}, {0x36fe400, 0xc00053e6f0}, 0xc0353fa000?, 0xc006f273e8)
/__w/mimir/mimir/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x25d
github.com/grafana/mimir/pkg/distributor.(*Distributor).QueryStream(0x10?, {0x370a608?, 0xc00d13ce10?}, 0xc00a1d2fd0?, 0xc00d13bf80?, 0x2a?, {0xc00d978520?, 0x30?, 0xc006f274e8?})
/__w/mimir/mimir/pkg/distributor/query.go:82 +0x138
github.com/grafana/mimir/pkg/querier.(*distributorQuerier).streamingSelect(0xc00d13b680, {0x370a608, 0xc00d13ce10}, 0x1987d1b365b, 0x1987d1c20ba, {0xc00d978520?, 0xc0006fb4d0?, 0x2fc1ab7?})
/__w/mimir/mimir/pkg/querier/distributor_queryable.go:124 +0x74
github.com/grafana/mimir/pkg/querier.(*distributorQuerier).Select(0xc00d13b680, {0x370a608?, 0xc00d13cdb0?}, 0x60?, 0xc004f5d650, {0xc00d978520, 0x4, 0x4})
/__w/mimir/mimir/pkg/querier/distributor_queryable.go:120 +0x385
github.com/grafana/mimir/pkg/querier.multiQuerier.Select({{0xc0027605a0, 0x2, 0x2}, 0xc001c3cfc0, {0x274a48a78000, {0x0, {{...}, {...}, {...}, {...}, ...}, ...}, ...}, ...}, ...)
/__w/mimir/mimir/pkg/querier/querier.go:402 +0xc6a
github.com/grafana/mimir/pkg/storage/lazyquery.LazyQuerier.Select.func1()
/__w/mimir/mimir/pkg/storage/lazyquery/lazyquery.go:55 +0x42
created by github.com/grafana/mimir/pkg/storage/lazyquery.LazyQuerier.Select in goroutine 1517222252
/__w/mimir/mimir/pkg/storage/lazyquery/lazyquery.go:54 +0x285
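For context on where this fails: the trace points at the stringlabels representation of Prometheus labels, where every label name and value is stored length-prefixed inside a single backing string and decodeString slices that many bytes out of it. Below is a minimal standalone sketch of that failure mode (illustrative only, not the actual Prometheus code; the varint encoding and helper here are assumptions): when the recorded length runs past the end of the backing string, the slice expression panics with "slice bounds out of range", just like the trace above.

package main

import "encoding/binary"

// decodeString mimics length-prefixed decoding in the style of the
// stringlabels representation: read a varint length at index, then slice
// that many bytes. If the recorded length exceeds the remaining data, the
// slice expression panics with "slice bounds out of range [:X] with length Y".
func decodeString(data string, index int) (string, int) {
	size, n := binary.Uvarint([]byte(data[index:]))
	index += n
	return data[index : index+int(size)], index + int(size)
}

func main() {
	// Well-formed: the length prefix (5) matches the payload "hello".
	good := string([]byte{5}) + "hello"
	s, _ := decodeString(good, 0)
	println(s) // prints "hello"

	// Malformed: the length prefix claims 11 bytes but only 5 follow,
	// reproducing the same class of panic seen in the querier.
	bad := string([]byte{11}) + "hello"
	decodeString(bad, 0) // panic: slice bounds out of range [:12] with length 6
}

In other words, at the moment labels.Compare ran, the labels data backing one of the merged series was shorter than its recorded length, so this is a bounds error on the labels string rather than a nil dereference.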
How to reproduce it?
Not entirely sure how to reproduce it; this happened on a normal Mimir deployment in a production environment.
What did you think would happen?
At the time of the crash, the metrics show:
- an increase in blocks queried with compaction level 3,
- a drop in the series hash cache hit ratio,
- nothing unusual in the other metrics.
Everything seems to be business as usual for Mimir around that time.
What was your environment?
The environment runs the mimir-2.16.0 release, deployed with the mimir-distributed Helm chart.
Any additional context to share?
All metrics look normal around the time the crash happened. Only 1 out of the 12 querier pods crashed with the above panic.