op-batcher: exit process on critical throttling RPC error #17924
```diff
@@ -99,6 +99,7 @@ type DriverSetup struct {
 // batches to L1 for availability.
 type BatchSubmitter struct {
 	DriverSetup
+	closeApp context.CancelCauseFunc

 	wg                   *sync.WaitGroup
 	shutdownCtx, killCtx context.Context
```

**Review comment on lines 101 to +102:**

> It looks to me like the […]
```diff
@@ -121,13 +122,14 @@ type BatchSubmitter struct {
 }

 // NewBatchSubmitter initializes the BatchSubmitter driver from a preconfigured DriverSetup
-func NewBatchSubmitter(setup DriverSetup) *BatchSubmitter {
+func NewBatchSubmitter(setup DriverSetup, closeApp context.CancelCauseFunc) *BatchSubmitter {
 	state := NewChannelManager(setup.Log, setup.Metr, setup.ChannelConfig, setup.RollupConfig)
 	if setup.ChannelOutFactory != nil {
 		state.SetChannelOutFactory(setup.ChannelOutFactory)
 	}

 	batcher := &BatchSubmitter{
+		closeApp:    closeApp,
 		DriverSetup: setup,
 		channelMgr:  state,
 	}
```

**Review comment on the `NewBatchSubmitter` signature:**

> …then we can also avoid passing it around like here.
```diff
@@ -623,20 +625,12 @@ func (l *BatchSubmitter) singleEndpointThrottler(wg *sync.WaitGroup, throttleSig
 		return
 	}

-	var rpcErr rpc.Error
-	if errors.As(err, &rpcErr) && eth.ErrorCode(rpcErr.ErrorCode()).IsGenericRPCError() {
-		l.Log.Error("SetMaxDASize RPC method unavailable on endpoint, shutting down. Either enable it or disable throttling.",
-			"endpoint", endpoint, "err", err)
-
-		// We have a strict requirement that all endpoints must have the SetMaxDASize endpoint, and shut down if this RPC method is not available
-		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
-		defer cancel()
-		// Call StopBatchSubmitting in another goroutine to avoid deadlock.
-		go func() {
-			_ = l.StopBatchSubmitting(ctx)
-		}()
-		return
+	if isCriticalThrottlingRPCError(err) {
+		// We have a strict requirement that all endpoints must have the SetMaxDASize endpoint,
+		// and shut down if this RPC method is not available or returns another application-level error.
+		l.shutdownOnCriticalError(fmt.Errorf("SetMaxDASize RPC method unavailable at %s, either enable it or disable throttling: %w", endpoint, err))
 	} else if err != nil {
+		// Transport-level errors are retried.
 		l.Log.Warn("SetMaxDASize RPC failed for endpoint, retrying.", "endpoint", endpoint, "err", err)
 		retryTimer.Reset(retryInterval)
 		return
```
```diff
@@ -671,6 +665,17 @@ func (l *BatchSubmitter) singleEndpointThrottler(wg *sync.WaitGroup, throttleSig
 	}
 }

+func isCriticalThrottlingRPCError(err error) bool {
+	var rpcErr rpc.Error
+	return errors.As(err, &rpcErr) && eth.ErrorCode(rpcErr.ErrorCode()).IsGenericRPCError()
+}
```

**Review comments on `isCriticalThrottlingRPCError`:**

> Tests for this would be good.

> Nice! Short & Sweet.
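A table-driven test in the spirit the reviewers ask for might look like the sketch below. The `fakeRPCError` type is a hypothetical test double, not the PR's actual test code; it assumes only that go-ethereum's `rpc.Error` interface is `error` plus `ErrorCode() int`, and that a generic JSON-RPC code such as `-32601` (method not found) is matched by `IsGenericRPCError`.

```go
package batcher // same package as the function under test (name assumed)

import (
	"errors"
	"fmt"
	"testing"
)

// fakeRPCError is a hypothetical test double implementing go-ethereum's
// rpc.Error interface (error + ErrorCode() int).
type fakeRPCError struct {
	code int
	msg  string
}

func (e fakeRPCError) Error() string  { return e.msg }
func (e fakeRPCError) ErrorCode() int { return e.code }

func TestIsCriticalThrottlingRPCError(t *testing.T) {
	tests := []struct {
		name string
		err  error
		want bool
	}{
		// Assumed: -32601 (method not found) is in the generic JSON-RPC error range.
		{"method not found", fakeRPCError{code: -32601, msg: "method not found"}, true},
		// errors.As should unwrap to the rpc.Error inside.
		{"wrapped rpc error", fmt.Errorf("call failed: %w", fakeRPCError{code: -32601, msg: "nope"}), true},
		// Transport failures carry no rpc.Error and must stay retryable.
		{"transport error", errors.New("connection refused"), false},
		{"nil", nil, false},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := isCriticalThrottlingRPCError(tt.err); got != tt.want {
				t.Errorf("isCriticalThrottlingRPCError(%v) = %v, want %v", tt.err, got, tt.want)
			}
		})
	}
}
```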
```diff
+
+func (l *BatchSubmitter) shutdownOnCriticalError(err error) {
+	l.Log.Error("Shutting down batcher on critical error", "err", err)
+	// Call closeApp to trigger process to exit (gracefully)
+	l.closeApp(err)
+}
```

**Review comments on lines +677 to +681:**

> **Unprotected nil.** The new `closeApp` callback is invoked without a guard; add a nil check before calling `closeApp`.

> If we want to allow using a […]
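If a nil `closeApp` should be tolerated rather than forbidden at construction time, one way to address the finding is a guard in the method itself (calling a nil func value panics in Go). A minimal sketch, not the PR's actual resolution:

```go
func (l *BatchSubmitter) shutdownOnCriticalError(err error) {
	l.Log.Error("Shutting down batcher on critical error", "err", err)
	if l.closeApp == nil {
		// Nothing to call: some test setups may construct the driver
		// without wiring a closer. Log loudly instead of panicking.
		l.Log.Error("No closeApp function configured, cannot exit process")
		return
	}
	// Call closeApp to trigger process to exit (gracefully)
	l.closeApp(err)
}
```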
**More review comments on lines +677 to +681:**

> Tests for this would be good; does it shut down? Is it graceful?

> I tested this manually so far, and yes, it does shut down successfully. I'm not totally sure we want to write a full end-to-end test for this, because we would need to spawn the batcher in a subprocess, attach it to an RPC endpoint returning […]

> I extended an existing test here: a57e224. I'm pretty happy with this level of testing; it shows the `closeApp` is called under the right conditions. Having that […]
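The referenced test isn't reproduced here, but the assertion shape is simple with `context.WithCancelCause`: hand the driver a `CancelCauseFunc` whose context you can observe, then check the cause. A sketch under that assumption; `setup` stands in for whatever `DriverSetup` the existing test already builds, and the test name is illustrative:

```go
func TestShutdownOnCriticalErrorCallsCloseApp(t *testing.T) {
	// setup is assumed to come from the surrounding test helpers.
	ctx, cancel := context.WithCancelCause(context.Background())
	bs := NewBatchSubmitter(setup, cancel)

	critErr := errors.New("SetMaxDASize RPC method unavailable")
	bs.shutdownOnCriticalError(critErr)

	select {
	case <-ctx.Done():
		// context.Cause recovers the error passed to the CancelCauseFunc.
		if !errors.Is(context.Cause(ctx), critErr) {
			t.Fatalf("unexpected shutdown cause: %v", context.Cause(ctx))
		}
	case <-time.After(time.Second):
		t.Fatal("closeApp was never called")
	}
}
```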
```diff
+
 // throttlingLoop acts as a distributor that spawns individual throttling loops for each endpoint
 // and fans out the unsafe bytes updates to each endpoint
 func (l *BatchSubmitter) throttlingLoop(wg *sync.WaitGroup, unsafeBytesUpdated chan int64) {
```

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a slightly easier and more straight forward way of stopping the batcher is to call
BatcherService.Stopwhen a critical error is found in the driver. It feels a bit wrong that we need to pass down a very high-level closer function down to the driver since theBatcherServicealready has aStopfunction (that should be involved when thecloseAppfunction is called that is passed down in thebatcher.Mainfunction). So we're still calling it, but more indirectly than necessary.So instead, we can just pass down
BatcherService.Stopto the driver when setting it up atBatcherService.initDriver.Note that this way, we also don't need to create
closeAppfunctions in the test setups that callBatcherServiceFromCLIConfig. And currently, the tests closer functions just make the tests fail, but still not stop the actual batcher service, if I see correctly.Or am I missing something and you considered this slightly easier option?