Bug: panic: runtime error: slice bounds out of range #12316

@alinbalutoiu

Description

@alinbalutoiu

What is the bug?

The querier component panicked with a slice-bounds-out-of-range runtime error (not a nil dereference). Here are the last 5 minutes worth of logs:

{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:54:31.00605734Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:55:01.006000333Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:55:31.006165321Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:56:01.005745932Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:56:31.006359276Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:57:01.006425226Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:57:31.006198034Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:58:01.005515026Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:58:31.006173094Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:59:01.006374875Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T01:59:31.005690563Z"}
{"addr":"<redacted>:9095","caller":"pool.go:250","level":"warn","msg":"removing frontend failing healthcheck","reason":"failing healthcheck status: NOT_SERVING","ts":"2025-08-06T01:59:38.973757497Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T02:00:01.006177286Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T02:00:31.005580221Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T02:01:01.006101429Z"}
{"caller":"memberlist_client.go:552","level":"info","msg":"initiating cleanup of obsolete entries","ts":"2025-08-06T02:01:31.005657498Z"}
panic: runtime error: slice bounds out of range [:563] with length 552

goroutine 1517222253 [running]:
github.com/prometheus/prometheus/model/labels.decodeString({0xc01526d680?, 0x92?}, 0x2480?)
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/model/labels/labels_stringlabels.go:57 +0x65
github.com/prometheus/prometheus/model/labels.Compare({{0xc01526d440?, 0x234?}}, {{0xc01526d680?, 0x233?}})
	/__w/mimir/mimir/vendor/github.com/prometheus/prometheus/model/labels/labels_stringlabels.go:366 +0x18c
github.com/grafana/mimir/pkg/distributor.mergeSeriesChunkStreams({0xc01499a008?, 0xc00d13cf60?, 0xc00d13bae0?}, 0x2)
	/__w/mimir/mimir/pkg/distributor/query.go:515 +0x15d
github.com/grafana/mimir/pkg/distributor.(*Distributor).queryIngesterStream(0xc001c31008, {0x370a608, 0xc00d13ce40}, {0xc00d13cf60, 0x1, 0x1}, 0xc00d13cf30, 0xc001c3cfc0)
	/__w/mimir/mimir/pkg/distributor/query.go:413 +0x5b2
github.com/grafana/mimir/pkg/distributor.(*Distributor).QueryStream.func1({0x370a608, 0xc00d13ce40})
	/__w/mimir/mimir/pkg/distributor/query.go:94 +0xeb
github.com/grafana/dskit/instrument.CollectedRequest({0x370a608, 0xc00d13ce10}, {0x2fb965e, 0x17}, {0x36fe400, 0xc00053e6f0}, 0xc0353fa000?, 0xc006f273e8)
	/__w/mimir/mimir/vendor/github.com/grafana/dskit/instrument/instrument.go:172 +0x25d
github.com/grafana/mimir/pkg/distributor.(*Distributor).QueryStream(0x10?, {0x370a608?, 0xc00d13ce10?}, 0xc00a1d2fd0?, 0xc00d13bf80?, 0x2a?, {0xc00d978520?, 0x30?, 0xc006f274e8?})
	/__w/mimir/mimir/pkg/distributor/query.go:82 +0x138
github.com/grafana/mimir/pkg/querier.(*distributorQuerier).streamingSelect(0xc00d13b680, {0x370a608, 0xc00d13ce10}, 0x1987d1b365b, 0x1987d1c20ba, {0xc00d978520?, 0xc0006fb4d0?, 0x2fc1ab7?})
	/__w/mimir/mimir/pkg/querier/distributor_queryable.go:124 +0x74
github.com/grafana/mimir/pkg/querier.(*distributorQuerier).Select(0xc00d13b680, {0x370a608?, 0xc00d13cdb0?}, 0x60?, 0xc004f5d650, {0xc00d978520, 0x4, 0x4})
	/__w/mimir/mimir/pkg/querier/distributor_queryable.go:120 +0x385
github.com/grafana/mimir/pkg/querier.multiQuerier.Select({{0xc0027605a0, 0x2, 0x2}, 0xc001c3cfc0, {0x274a48a78000, {0x0, {{...}, {...}, {...}, {...}, ...}, ...}, ...}, ...}, ...)
	/__w/mimir/mimir/pkg/querier/querier.go:402 +0xc6a
github.com/grafana/mimir/pkg/storage/lazyquery.LazyQuerier.Select.func1()
	/__w/mimir/mimir/pkg/storage/lazyquery/lazyquery.go:55 +0x42
created by github.com/grafana/mimir/pkg/storage/lazyquery.LazyQuerier.Select in goroutine 1517222252
	/__w/mimir/mimir/pkg/storage/lazyquery/lazyquery.go:54 +0x285

How to reproduce it?

Not entirely sure how to reproduce it; this was a normal Mimir deployment in a production environment.

What did you think would happen?

At the time of the crash, the metrics show:

  • an increase in blocks queried at compaction level 3;
  • a drop in the series hash cache hit ratio;
  • nothing else unusual elsewhere.

Everything seemed to be business as usual for Mimir around that time.

What was your environment?

The environment runs the mimir-2.16.0 release, deployed with the mimir-distributed Helm chart.

Any additional context to share?

All metrics looked normal around the time the crash happened.
Only 1 out of 12 querier pods crashed with the above panic.

Metadata

Assignees: no one assigned

Labels: bug (Something isn't working)