Skip to content

Conversation

prasadborole1
Copy link
Contributor

  • Implemented size based backoff that reduces the num of csrs by half if encountered failure and doubles on success.
  • Currently synchronization backoff task has maxInterval of 2 mins (for 5 sec of sync interval). Adding Option to set maxInterval that will override default.

Size based backoff will recover in 8 steps (40 secs for 5 sec of sync interval) for current limit of 500.

Failure num of CSRs
1 250
2 125
3 62
4 31
5 15
6 7
7 3
8 1

fixes #4264

Copy link
Collaborator

@MarcosDY MarcosDY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments,
I think you will need to update documentation, at least the one related with sync interval,
because if some user set it to a big number this backoff can result confusing

@@ -136,8 +136,10 @@ func (m *manager) updateSVIDs(ctx context.Context, log logrus.FieldLogger, c SVI

update, err := m.fetchSVIDs(ctx, csrs)
if err != nil {
m.csrSizeLimitedBackoff.Failure()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may we add a defer, and validate if no error returned, and based on that play with backoff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it closer to server call

@prasadborole1
Copy link
Contributor Author

some minor comments, I think you will need to update documentation, at least the one related with sync interval, because if some user set it to a big number this backoff can result confusing

@MarcosDY Can you please point to docs which needs update or where to add new backoff details?

amoore877
amoore877 previously approved these changes Jun 29, 2023
Copy link
Member

@amoore877 amoore877 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are addressed. Looks like just some a couple open things from @MarcosDY needing changes still.

Thanks for this!

@MarcosDY
Copy link
Collaborator

looks like agent sync interval was never documented https://github.com/spiffe/spire/blob/main/doc/spire_agent.md#agent-configuration-file
what do you think about adding that inside experimental table? with the default value?
I'm not sure how much information add about this new backoff strategy but a single explanation about it will works with a backoff can be very useful for people that is debugging it

Signed-off-by: Prasad Borole <[email protected]>
MarcosDY
MarcosDY previously approved these changes Jul 3, 2023
Copy link
Collaborator

@rturner3 rturner3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, just a couple small comments

Signed-off-by: Prasad Borole <[email protected]>
@rturner3 rturner3 added this to the 1.7.1 milestone Jul 3, 2023
@rturner3 rturner3 merged commit 66c518c into spiffe:main Jul 3, 2023
Neniel pushed a commit to Neniel/spire that referenced this pull request Jul 10, 2023
…4279)

* [SPIRE Agent] Add size based backoff strategy for fetchSVIDs

Signed-off-by: Prasad Borole <[email protected]>
Signed-off-by: Neniel <[email protected]>
Neniel pushed a commit to Neniel/spire that referenced this pull request Jul 21, 2023
…4279)

* [SPIRE Agent] Add size based backoff strategy for fetchSVIDs

Signed-off-by: Prasad Borole <[email protected]>
Signed-off-by: Neniel <[email protected]>
Neniel pushed a commit to Neniel/spire that referenced this pull request Aug 24, 2023
…4279)

* [SPIRE Agent] Add size based backoff strategy for fetchSVIDs

Signed-off-by: Prasad Borole <[email protected]>
Signed-off-by: Neniel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SPIRE Agent svid signing requests should support backoff retries
4 participants