
Conversation

cody-littley
Contributor

Why are these changes needed?

Fixes the way StoreChunks() uses a semaphore; the old pattern never released the semaphore.

@cody-littley cody-littley requested a review from litt3 August 12, 2025 15:59

github-actions bot commented Aug 12, 2025

The latest Buf updates on your PR. Results from workflow Buf Proto / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
|-------|--------|------|----------|---------------|
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Aug 13, 2025, 3:32 PM |

```go
probe.SetStage("acquire_buffer_capacity")
semaphoreCtx, cancel := context.WithTimeout(ctx, s.node.Config.StoreChunksBufferTimeout)
defer cancel()
err = s.node.StoreChunksSemaphore.Acquire(semaphoreCtx, int64(downloadSizeInBytes))
```
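
For reference, a minimal sketch of the corrected pattern this PR describes: every successful Acquire is paired with a deferred Release, so buffer capacity is returned even on error paths. This assumes StoreChunksSemaphore is a golang.org/x/sync/semaphore.Weighted; the function and helper names here are illustrative, not the actual EigenDA code.

```go
package node

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

// storeChunksSketch illustrates the acquire/release pairing. The semaphore's
// capacity bounds how many chunk bytes may be buffered in memory at once.
func storeChunksSketch(
	ctx context.Context,
	sem *semaphore.Weighted,
	downloadSizeInBytes uint64,
	bufferTimeout time.Duration,
) error {
	// Bound how long we wait for buffer capacity to free up.
	semaphoreCtx, cancel := context.WithTimeout(ctx, bufferTimeout)
	defer cancel()

	if err := sem.Acquire(semaphoreCtx, int64(downloadSizeInBytes)); err != nil {
		// Capacity did not free up in time: fail fast and shed the request.
		return fmt.Errorf("failed to acquire buffer capacity: %w", err)
	}
	// The fix: the acquired capacity is always released, even if the
	// processing below returns an error.
	defer sem.Release(int64(downloadSizeInBytes))

	return processChunks(ctx) // placeholder for the actual work
}

func processChunks(ctx context.Context) error { return nil } // stub
```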
Contributor

This blocks until the request's size in bytes is free or semaphoreCtx is done, right? This essentially means the StoreChunks request from the disperser fails if previous requests are still being processed? Does this mean the disperser needs to retry on this error?

Contributor Author

> This essentially means the StoreChunks request from the disperser fails if previous requests are still being processed

Correct. The intention of this change is to prevent too many chunks from being in memory at a single instant in time.

> Does this mean the disperser needs to retry on this error?

I've currently got dispersal retries disabled, as the current implementation is extremely inefficient. So if this triggers, the validator will not end up signing for some batches. I think this is ok though... if the disperser is sending more work than the validator can handle, it NEEDS to skip some batches or else it will accumulate a backlog that will eventually lead to OOM.
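
To make the failure mode concrete, here is a tiny standalone demo (not EigenDA code) of the semantics discussed above: when the weighted semaphore is exhausted, Acquire blocks until capacity frees or the timeout context expires, and the request then fails rather than queueing indefinitely.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

func main() {
	// Pretend the node allows 100 bytes of chunk data in memory at once.
	sem := semaphore.NewWeighted(100)

	// A first request takes all the capacity and holds it.
	_ = sem.Acquire(context.Background(), 100)

	// A second request cannot fit; it fails once the timeout elapses,
	// shedding load instead of building an unbounded backlog.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	if err := sem.Acquire(ctx, 100); err != nil {
		fmt.Println("request shed:", err) // prints: context deadline exceeded
	}
}
```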

Contributor

So essentially, failed dispersals are what we are agreeing to.

Contributor Author

Possibly worth having a discussion on this.

I'm not convinced it's possible to have a robust, high-performance system without the ability to shed load under high stress. It's the difference between a system that recovers from a spike too large to handle and a system that face-plants when traffic spikes above a critical threshold.

litt3
litt3 previously approved these changes Aug 12, 2025
anupsv
anupsv previously approved these changes Aug 12, 2025
litt3
litt3 previously approved these changes Aug 12, 2025
@cody-littley cody-littley dismissed stale reviews from litt3 and anupsv via 2e36a68 August 13, 2025 14:05
litt3
litt3 previously approved these changes Aug 13, 2025
node/config.go Outdated
```diff
@@ -424,7 +412,7 @@ func NewConfig(ctx *cli.Context) (*Config, error) {
 	LittDBReadCacheSizeBytes:    uint64(ctx.GlobalFloat64(flags.LittDBReadCacheSizeGBFlag.Name) * units.GiB),
 	LittDBReadCacheSizeFraction: ctx.GlobalFloat64(flags.LittDBReadCacheSizeFractionFlag.Name),
 	LittDBStoragePaths:          ctx.GlobalStringSlice(flags.LittDBStoragePathsFlag.Name),
-	LittUnsafePurgeLocks:        ctx.GlobalBool(flags.LittUnsafePurgeLocksFlag.Name),
+	LittRespectLocks:            ctx.GlobalBool(flags.LitRespectLocksFlag.Name),
```
Contributor

Suggested change:

```diff
-	LittRespectLocks:            ctx.GlobalBool(flags.LitRespectLocksFlag.Name),
+	LittRespectLocks:            ctx.GlobalBool(flags.LittRespectLocksFlag.Name),
```

Contributor Author

fixed

@cody-littley cody-littley enabled auto-merge August 13, 2025 15:57
@cody-littley cody-littley added this pull request to the merge queue Aug 13, 2025
Merged via the queue into master with commit 11659e8 Aug 13, 2025
24 of 25 checks passed
@cody-littley cody-littley deleted the fix-storeChunks-semaphore branch August 13, 2025 16:10