-
Notifications
You must be signed in to change notification settings - Fork 144
maintainence(lib/babe): refactor code to use epochHandler for control of epoch logic #2151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dot/peerset/peerset.go
Outdated
@@ -340,8 +342,9 @@ func (ps *PeerSet) reportPeer(change ReputationChange, peers ...peer.ID) error { | |||
return err | |||
} | |||
|
|||
// TODO: change Noop to Drop once #2098 is fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me know if i should put commenting out this ban peer stuff into another PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you not just uncomment this out for now? I wasn't able to reproduce the nodes banning at this point at least locally.
lib/babe/epoch_test.go
Outdated
"github.com/stretchr/testify/require" | ||
) | ||
|
||
var keyring, _ = keystore.NewSr25519Keyring() //nolint:typecheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I don't declare keyring here I get an undeclared error, but I then get a lint error since it's defined in babe_integration_test.go
. how do i fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, i just gotta put it in a non integration test file
lib/babe/epoch_handler_test.go
Outdated
require.Equal(t, 200, len(eh.slotToProof)) | ||
require.Equal(t, uint64(1), eh.epochNumber) | ||
require.Equal(t, getSlotStartTime(9999, sd), eh.startTime) | ||
require.Equal(t, uint64(9999), eh.firstSlot) | ||
require.Equal(t, constants, eh.constants) | ||
require.Equal(t, epochData, eh.epochData) | ||
require.NotNil(t, eh.handleSlot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you assert the entire eh
? Also eh
is eh variable naming wise 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't assert the entire epochHandler
because the slotToProof
value is uses randomness in the VRF calculation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we assign it some predictable value in the test? Or perhaps dependency inject the source of randomness? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm it would probably require overriding the claimPrimarySlot
function which is used to generate the proof... I can do it if you'd really like but seems like that sorta veers into territory of changing the code for tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in if it's a lot of code change, don't; otherwise I'd say yes your code should be changed to be more testable 😉
lib/babe/babe.go
Outdated
return err | ||
} | ||
err = func() error { | ||
ctx, cancel := context.WithCancel(b.ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm of the proponent where we should not be using context.Context
within package boundaries. I know we already do this in most of the Service
implementations. If what we're looking for is cancellation, then we can just create a channel for that purpose. context.Context
is meant to traverse through API boundaries and contain request scoped data. I've seen a lot of examples of go code using them to handle long running processes and I feel like it introduces more weight than required.
In this case the Service.ctx
is being wrapped to add cancellation to the current epochHandler
. So to my knowledge that the Service.ctx
was created with context.WithCancel
, then ctx.Done()
would close if the original CancelFunc
was called. But we wrap this context again in line 347. So does that mean that if we call the local cancel CancelFunc
than any goroutines listening on Service.ctx
would receive the channel close? I would assume no, but these are things that make the code more difficult to understand. I think as a best practice you shouldn't call context.WithCancel
or any of the other context.WithXXX
functions on supplied contexts.
I think epochHandler.run()
should be modified to accept a context.Context
that's not the original one stored in Service
. This runEngine
function should listen on the Service
one and trigger cancellation within this function for the current epochHandler
.
@timwu20 made your suggested change regarding the context, let me know what you think now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just small changes!
lib/babe/babe.go
Outdated
|
||
if err = babeService.setupParameters(cfg); err != nil { | ||
return nil, err | ||
constants: &constants{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is a pointer by the way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't know lol. want me to change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes pleaaaase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
lib/babe/babe.go
Outdated
auths := make([]types.Authority, len(b.epochHandler.epochData.authorities)) | ||
copy(auths, b.epochHandler.epochData.authorities) | ||
return auths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you can copy types.Authority
directly, you would need a deep copy method I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// retry to run the engine at some point (maybe the next epoch) if | ||
// there's an error. | ||
if err := b.runEngine(); err != nil { | ||
logger.Criticalf("failed to run block production engine: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 💯
lib/babe/babe.go
Outdated
if err != nil { | ||
logger.Errorf("failed to get current epoch: %s", err) | ||
return err | ||
return nil, fmt.Errorf("failed to initiate epoch %d: %s", epoch, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("failed to initiate epoch %d: %s", epoch, err) | |
return nil, fmt.Errorf("failed to initiate epoch %d: %w", epoch, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done (also fixed all the other error returns)
lib/babe/babe.go
Outdated
} | ||
epochStartSlot, err := b.epochState.GetStartSlotForEpoch(epoch) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to get start slot for current epoch %d: %s", epoch, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return nil, fmt.Errorf("failed to get start slot for current epoch %d: %s", epoch, err) | |
return nil, fmt.Errorf("failed to get start slot for current epoch %d: %w", epoch, err) |
func (b *Service) runEngine() error { | ||
epoch, err := b.epochState.GetCurrentEpoch() | ||
if err != nil { | ||
return fmt.Errorf("failed to get current epoch: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("failed to get current epoch: %s", err) | |
return fmt.Errorf("failed to get current epoch: %w", err) |
lib/babe/babe.go
Outdated
return err | ||
} | ||
for { | ||
err = func() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have this one as a separate global scope function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
lib/babe/babe.go
Outdated
defer cancel() | ||
b.epochHandler, err = b.initiateAndGetEpochHandler(epoch) | ||
if err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return err | |
return fmt.Errorf("cannot initiate epoch handler: %w", err) |
lib/babe/babe.go
Outdated
// get start slot for current epoch | ||
nextEpochStart, err := b.epochState.GetStartSlotForEpoch(epoch + 1) | ||
if err != nil { | ||
return fmt.Errorf("failed to get start slot for next epoch %d: %s", epoch+1, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("failed to get start slot for next epoch %d: %s", epoch+1, err) | |
return fmt.Errorf("failed to get start slot for next epoch %d: %w", epoch+1, err) |
// TODO: sometimes grandpa fails to initiate due to a "Key not found" | ||
// error, this shouldn't happen. | ||
if err := s.initiate(); err != nil { | ||
logger.Criticalf("failed to initiate: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Panic? Error channel? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the same sort of issue as with BABE, we probably need to add an error channel so the higher-level service handler can either panic or restart the service. I'll make an issue for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lib/babe/verify_integration_test.go
Outdated
epochData.authorities = make([]types.Authority, 1) | ||
epochData.authorities[0] = types.Authority{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
epochData.authorities = make([]types.Authority, 1) | |
epochData.authorities[0] = types.Authority{ | |
epochData.authorities = []types.Authority{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
lib/babe/verify_integration_test.go
Outdated
epochData.threshold = maxThreshold | ||
epochData.authorityIndex = 0 | ||
|
||
var slotNumber uint64 = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var slotNumber uint64 = 1 | |
const slotNumber = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
lib/babe/verify_integration_test.go
Outdated
@@ -214,14 +230,18 @@ func TestVerificationManager_VerifyBlock_Secondary(t *testing.T) { | |||
} | |||
|
|||
func TestVerificationManager_VerifyBlock_MultipleEpochs(t *testing.T) { | |||
t.Skip() // no idea why it's complaining it can't find the epoch data. fix later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO 😉
|
||
if genCfg == nil { | ||
genCfg = genesisBABEConfig | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this moved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this isn't here the verifier tests panic :p
mockBlockState := NewMockBlockState(ctrl) | ||
mockEpochState0 := NewMockEpochState(ctrl) | ||
mockEpochState1 := NewMockEpochState(ctrl) | ||
mockEpochState2 := NewMockEpochState(ctrl) | ||
|
||
mockEpochState0.EXPECT().GetStartSlotForEpoch(gomock.Eq(uint64(0))).Return(uint64(1), nil) | ||
mockEpochState1.EXPECT().GetStartSlotForEpoch(gomock.Eq(uint64(1))).Return(uint64(201), nil) | ||
mockEpochState2.EXPECT().GetStartSlotForEpoch(gomock.Eq(uint64(1))).Return(uint64(201), nil) | ||
|
||
mockEpochState1.EXPECT().HasEpochData(gomock.Eq(uint64(1))).Return(true, nil) | ||
mockEpochState2.EXPECT().HasEpochData(gomock.Eq(uint64(1))).Return(true, nil) | ||
|
||
kp := keyring.Alice().(*sr25519.Keypair) | ||
authority := types.NewAuthority(kp.Public(), uint64(1)) | ||
testEpochData := &types.EpochData{ | ||
Randomness: [32]byte{1}, | ||
Authorities: []types.Authority{*authority}, | ||
} | ||
|
||
mockEpochState1.EXPECT().GetEpochData(gomock.Eq(uint64(1))).Return(testEpochData, nil) | ||
mockEpochState2.EXPECT().GetEpochData(gomock.Eq(uint64(1))).Return(testEpochData, nil) | ||
|
||
mockEpochState1.EXPECT().HasConfigData(gomock.Eq(uint64(1))).Return(true, nil) | ||
mockEpochState2.EXPECT().HasConfigData(gomock.Eq(uint64(1))).Return(false, nil) | ||
|
||
testConfigData := &types.ConfigData{ | ||
C1: 1, | ||
C2: 1, | ||
} | ||
|
||
mockEpochState1.EXPECT().GetConfigData(gomock.Eq(uint64(1))).Return(testConfigData, nil) | ||
|
||
testLatestConfigData := &types.ConfigData{ | ||
C1: 1, | ||
C2: 2, | ||
} | ||
|
||
mockEpochState2.EXPECT().GetLatestConfigData().Return(testLatestConfigData, nil) | ||
|
||
testEpochDataEpoch0 := &types.EpochData{ | ||
Randomness: [32]byte{9}, | ||
Authorities: []types.Authority{*authority}, | ||
} | ||
|
||
mockEpochState0.EXPECT().GetLatestEpochData().Return(testEpochDataEpoch0, nil) | ||
mockEpochState0.EXPECT().GetLatestConfigData().Return(testConfigData, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could configure your mocks according to test cases?
Like this for example.
There is also a reason where I believe having NewController(t)
with the parent test might give some funny logs/behavior especially when running parallel subtests inside it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that would be nice, I'll work on that. gotta work on making my tests cleaner haha
lib/babe/babe.go
Outdated
defer timer.Stop() | ||
select { | ||
case <-timer.C: | ||
if errors.Is(err, errServicePaused) || errors.Is(err, context.Canceled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to return an error if the context gets canceled 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can? idk, the only time the context would be cancelled is if the node shutdowns afaik, so i don't think it matters to return an error
babeService.epochHandler, err = babeService.initiateAndGetEpochHandler(0) | ||
require.NoError(t, err) | ||
|
||
authoringSlots := getAuthoringSlots(babeService.epochHandler.slotToProof) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit just in case
authoringSlots := getAuthoringSlots(babeService.epochHandler.slotToProof) | |
authoringSlots := getAuthoringSlots(babeService.epochHandler.slotToProof) | |
require.NotEmpty(t, authoringSlots) |
lib/babe/build_integration_test.go
Outdated
} | ||
|
||
rt, err := babeService.blockState.GetRuntime(nil) | ||
require.NoError(t, err) | ||
|
||
_, err = babeService.buildBlock(parentHeader, slot, rt) | ||
_, err = babeService.buildBlock(parentHeader, slot, rt, 0, &VrfOutputAndProof{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's 0
here? Perhaps have it as a named constant above this line? 🤔
lib/babe/crypto_test.go
Outdated
@@ -77,7 +77,7 @@ func TestCalculateThreshold(t *testing.T) { | |||
} | |||
|
|||
func Test_checkPrimaryThreshold(t *testing.T) { | |||
keyring, _ := keystore.NewSr25519Keyring() | |||
keyring, _ := keystore.NewSr25519Keyring() //nolint:govet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is govet complaining here 🤔
If it's because of the error, just assert it's nil?
lib/babe/epoch_handler.go
Outdated
if err := h.handleSlot(h.epochNumber, swt.slotNum, h.epochData.authorityIndex, h.slotToProof[swt.slotNum]); err != nil { //nolint:lll | ||
logger.Warnf("failed to handle slot %d: %s", swt.slotNum, err) | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit don't use the short err check form with a big line like this.
if err := h.handleSlot(h.epochNumber, swt.slotNum, h.epochData.authorityIndex, h.slotToProof[swt.slotNum]); err != nil { //nolint:lll | |
logger.Warnf("failed to handle slot %d: %s", swt.slotNum, err) | |
continue | |
} | |
err := h.handleSlot(h.epochNumber, swt.slotNum, | |
h.epochData.authorityIndex, h.slotToProof[swt.slotNum]) | |
if err != nil { | |
logger.Warnf("failed to handle slot %d: %s", swt.slotNum, err) | |
continue | |
} |
lib/babe/epoch_handler.go
Outdated
authoringSlots := make([]uint64, len(slotToProof)) | ||
i := 0 | ||
for authoringSlot := range slotToProof { | ||
authoringSlots[i] = authoringSlot | ||
i++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit simpler
authoringSlots := make([]uint64, len(slotToProof)) | |
i := 0 | |
for authoringSlot := range slotToProof { | |
authoringSlots[i] = authoringSlot | |
i++ | |
authoringSlots := make([]uint64, 0, len(slotToProof)) | |
for authoringSlot := range slotToProof { | |
authoringSlots = append(authoringSlots, authoringSlot) |
lib/babe/epoch_handler_test.go
Outdated
timer := time.After(sd * 100) | ||
select { | ||
case <-timer: | ||
require.Equal(t, 100-(firstExecutedSlot-startSlot), callsToHandleSlot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Have 100
as a shared inline constant please
lib/babe/epoch_handler_test.go
Outdated
|
||
errCh := make(chan error) | ||
go eh.run(ctx, errCh) | ||
timer := time.After(sd * 100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Ideally use time.NewTimer(sd * 100)
so you can stop the timer in the error receiving select case.
lib/babe/epoch_handler_test.go
Outdated
|
||
ctx, cancel := context.WithCancel(context.Background()) | ||
defer cancel() | ||
eh, err := newEpochHandler(1, startSlot, epochData, constants, testHandleSlotFunc, keypair) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit Please rename eh
to epochHandler
or handler
or any full work 😸
lib/babe/babe.go
Outdated
for { | ||
if err = b.handleEpoch(epoch); err != nil { | ||
if errors.Is(err, errServicePaused) || errors.Is(err, context.Canceled) { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
for { | |
if err = b.handleEpoch(epoch); err != nil { | |
if errors.Is(err, errServicePaused) || errors.Is(err, context.Canceled) { | |
return nil | |
for { | |
err = b.handleEpoch(epoch) | |
if errors.Is(err, errServicePaused) || errors.Is(err, context.Canceled) { | |
return nil | |
} else if err != nil { | |
return err | |
} |
lib/babe/babe.go
Outdated
@@ -263,7 +263,9 @@ func (b *Service) Stop() error { | |||
// Authorities returns the current BABE authorities | |||
func (b *Service) Authorities() []types.Authority { | |||
auths := make([]types.Authority, len(b.epochHandler.epochData.authorities)) | |||
copy(auths, b.epochHandler.epochData.authorities) | |||
for i, auth := range b.epochHandler.epochData.authorities { | |||
auths[i] = *(auth.DeepCopy()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit do you need the parentheses? I think this would work too
auths[i] = *(auth.DeepCopy()) | |
auths[i] = *auth.DeepCopy() |
lib/babe/babe.go
Outdated
defer cancel() | ||
b.epochHandler, err = b.initiateAndGetEpochHandler(epoch) | ||
if err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return err | |
return fmt.Errorf("cannot initiate and get epoch handler for epoch %d: %w", epoch, err) |
ctx, cancel := context.WithCancel(context.Background()) | ||
defer cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit perhaps move this further down before it's needed at go b.epochHandler.run(ctx, errCh)
lib/babe/babe.go
Outdated
// stop current epoch handler | ||
cancel() | ||
case err := <-errCh: | ||
logger.Errorf("error from epochHandler: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.Errorf("error from epochHandler: %w", err) | |
logger.Errorf("error from epochHandler: %s", err) |
case <-epochTimer.C: | ||
// stop current epoch handler | ||
cancel() | ||
case err := <-errCh: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also call cleanup()
here
case err := <-errCh: | |
case err := <-errCh: | |
cleanup() |
@qdm12 I think I addressed all your comments, feel free to rereview! |
i think this also closes #2217 (don't see the error anymore when running this branch), will try it more to confirm though. |
Changes
epochHandler
module, which will handle the authorship slots for that epochTests
Issues
Primary Reviewer