Bug: Shipping blocks to GCS not working during rolling restart / scale-down of the ingesters #11176
Unanswered
vaishnavi216 asked this question in Help and support
Replies: 0 comments
What is the bug?
We recently deployed Mimir on GKE, using a GCS bucket for storage. Normally, flushing happens properly every 2 to 3 hours, but when we scale down the ingesters, blocks are not shipped to the GCS bucket.
The ingester stays stuck in the Terminating state for 30 minutes. We saw this behaviour when sending a load of 3 million series;
with a much smaller load, around 1000k series, we do not see any issues during shutdown.
How to reproduce it?
Mimir version: 2.15.1
Deploy on GKE with a GCS bucket as storage
Send a heavy load to Mimir
Shut down the ingester midway, while the load is still being sent
What did you think would happen?
Blocks are correctly shipped to the GCS bucket when an ingester rolling restart or scale-down happens.
What was your environment?
Mimir version: 2.15.1
Deployed on GKE
Any additional context to share?
Ingester logs during termination:
ts=2025-04-09T08:28:59.410883045Z caller=ingester.go:1310 level=debug user=gcp-mimir-02 event="complete commit" commitDuration=103.475µs
ts=2025-04-09T08:28:59.414778717Z caller=lifecycler.go:594 level=info msg="lifecycler loop() exited gracefully" ring=ingester
ts=2025-04-09T08:28:59.414795483Z caller=lifecycler.go:977 level=info msg="changing instance state from" old_state=ACTIVE new_state=LEAVING ring=ingester
ts=2025-04-09T08:28:59.414892132Z caller=lifecycler.go:1056 level=info msg="transfers are disabled"
ts=2025-04-09T08:28:59.414904014Z caller=ingester.go:3428 level=info msg="starting to flush and ship TSDB blocks"
ts=2025-04-09T08:29:00.110591649Z caller=log.go:245 level=debug msg="Failed UDP ping: esobb-mimir-overrides-exporter-enterprise-845947884-vx7rr-e38a995d (timeout reached)"
ts=2025-04-09T08:29:10.111404793Z caller=log.go:245 level=debug msg="Failed UDP ping: esobb-mimir-query-frontend-enterprise-594987867c-rvvmp-bdfe6da9 (timeout reached)"
ts=2025-04-09T08:29:11.598532143Z caller=handler.go:77 level=info user=gcp-mimir-02 caller=compact.go:777 time=2025-04-09T08:29:11.598509396Z msg="write block" mint=1744178412671 maxt=1744185600000 ulid=01JRCTNCRDYCQBGQ0MX81YXG3E duration=12.065179401s ooo=false
ts=2025-04-09T08:29:11.603209422Z caller=handler.go:77 level=info user=gcp-mimir-02 caller=db.go:1936 time=2025-04-09T08:29:11.603197738Z msg="Deleting obsolete block" block=01JRBCRKKDCSCS45K4PAP752CK
ts=2025-04-09T08:29:11.967594369Z caller=handler.go:77 level=info user=gcp-mimir-02 caller=head.go:1402 time=2025-04-09T08:29:11.967578383Z msg="Head GC completed" caller=truncateMemory duration=364.345858ms
ts=2025-04-09T08:29:12.008401409Z caller=handler.go:77 level=info user=gcp-mimir-02 caller=checkpoint.go:99 time=2025-04-09T08:29:12.008386333Z msg="Creating checkpoint" from_segment=55 to_segment=58 mint=1744185600000
ts=2025-04-09T08:29:15.111913119Z caller=log.go:245 level=debug msg="Failed UDP ping: esobb-mimir-distributor-enterprise-57798458cd-rwdk5-d9a61e5e (timeout reached)"
ts=2025-04-09T08:29:17.867856722Z caller=handler.go:77 level=info user=gcp-mimir-02 caller=head.go:1364 time=2025-04-09T08:29:17.867827991Z msg="WAL checkpoint complete" first=55 last=58 duration=5.85954412s
ts=2025-04-09T08:29:31.390905052Z caller=handler.go:77 level=info user=gcp-mimir-02 caller=compact.go:777 time=2025-04-09T08:29:31.390884417Z msg="write block" mint=1744185600000 maxt=1744187335197 ulid=01JRCTNYNBHGB0KNSSFS72JPSM duration=13.522993842s ooo=false
ts=2025-04-09T08:29:35.112133168Z caller=log.go:245 level=debug msg="Failed UDP ping: esobb-mimir-ruler-enterprise-54f647db9b-l4bfz-f91394c9 (timeout reached)"
ts=2025-04-09T08:29:36.765510014Z caller=handler.go:77 level=info user=gcp-mimir-02 caller=head.go:1402 time=2025-04-09T08:29:36.765492718Z msg="Head GC completed" caller=truncateMemory duration=5.371604512s
ts=2025-04-09T08:29:36.765637067Z caller=handler.go:77 level=info user=gcp-mimir-02 caller=checkpoint.go:99 time=2025-04-09T08:29:36.765625625Z msg="Creating checkpoint" from_segment=59 to_segment=60 mint=1744187335197
ts=2025-04-09T08:29:39.141992428Z caller=handler.go:77 level=info user=gcp-mimir-02 caller=head.go:1364 time=2025-04-09T08:29:39.141970938Z msg="WAL checkpoint complete" first=59 last=60 duration=2.376426982s
ts=2025-04-09T08:29:39.142176019Z caller=handler.go:77 level=info user=gcp-mimir-02 caller=db.go:1556 time=2025-04-09T08:29:39.142170235Z msg="compact ooo head resulted in no blocks" duration=1.963µs
ts=2025-04-09T08:29:39.142417585Z caller=ingester.go:3194 level=debug msg="TSDB blocks compaction completed successfully" user=gcp-mimir-02 compactReason=forced
ts=2025-04-09T08:30:10.112180782Z caller=log.go:245 level=debug msg="Failed UDP ping: esobb-mimir-compactor-enterprise-1-45433155 (timeout reached)"
ts=2025-04-09T08:30:50.112376442Z caller=log.go:245 level=debug msg="Failed UDP ping: esobb-mimir-compactor-enterprise-0-70dca9c6 (timeout reached)"
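To make the timeline in these logs easier to read, here is a minimal Python sketch that measures how long the forced compaction took after the shutdown flush began. The helper names (`ts_nanos`, `phase_seconds`) are hypothetical and not part of Mimir; the two sample lines are copied from the excerpt above. It only measures the gap between two log messages and does not diagnose why shipping did not happen.

```python
import re
from datetime import datetime, timezone

# Matches the leading "ts=<RFC3339 with nanoseconds>Z" field of a log line.
TS_RE = re.compile(r"^ts=(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(\d+)Z")


def ts_nanos(line):
    """Return the line's timestamp as integer nanoseconds since the epoch, or None."""
    m = TS_RE.match(line)
    if m is None:
        return None
    whole = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    frac = m.group(2)[:9].ljust(9, "0")  # normalise the fraction to 9 digits
    return int(whole.timestamp()) * 1_000_000_000 + int(frac)


def phase_seconds(lines, start_marker, end_marker):
    """Seconds between the first line containing start_marker and the next one containing end_marker."""
    start = end = None
    for line in lines:
        if start is None and start_marker in line:
            start = ts_nanos(line)
        elif start is not None and end_marker in line:
            end = ts_nanos(line)
            break
    if start is None or end is None:
        return None
    return (end - start) / 1e9


# Two lines copied from the termination logs above.
LOG = [
    'ts=2025-04-09T08:28:59.414904014Z caller=ingester.go:3428 level=info msg="starting to flush and ship TSDB blocks"',
    'ts=2025-04-09T08:29:39.142417585Z caller=ingester.go:3194 level=debug msg="TSDB blocks compaction completed successfully" user=gcp-mimir-02 compactReason=forced',
]

elapsed = phase_seconds(
    LOG,
    "starting to flush and ship TSDB blocks",
    "TSDB blocks compaction completed successfully",
)
print(f"forced compaction finished {elapsed:.3f}s after the shutdown flush began")
```

For this excerpt the gap is roughly 40 seconds. The lines after that point in the excerpt are only failed UDP pings, so whatever happened during the rest of the 30-minute termination is not visible in this portion of the logs.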