Skip to content

Improve performance for fetching authorized entries #6034

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 30, 2025

Conversation

sorindumitru
Copy link
Collaborator

@sorindumitru sorindumitru commented Apr 28, 2025

This does a bunch of speedups to authorized entries lookup (used for syncing agent entries but also for signing SVIDs):

  • Make the caches return a new type, ReadOnlyEntry. This delays the clone of the entry from the cache until we actually need to clone it. Useful for SyncAuthorizedEntries where we retrieve all entries that the agent is authorized for but would only need to clone a few of them that we actually send to the agent. Also useful for signing SVIDs which can clone just the fields of the entry that are used.
  • Switch from proto.Clone to our own Clone method. This allows us to combine cloning with output mask application, which avoid cloning some fields. This is also 1.5-2x faster than proto.Clone
  • The entry cache can use just the path component of the entry when storing entries in the cache. This avoid some unnecessary work, which was seen in spire-server cpu profile data.

Some of the benchmarks we have in the tests show good improvements. Seen improvements also while running this, although somewhat smaller than the benchmark since some of the work still needs to be done in other places. Before:

BenchmarkBuildInMemory-16                             55          21637647 ns/op        14817260 B/op     100311 allocs/op
BenchmarkGetAuthorizedEntriesInMemory-16            5833            205069 ns/op          291966 B/op       3005 allocs/op
BenchmarkEntryLookup-16                               46          25513459 ns/op         5123503 B/op     136200 allocs/op

and after:

BenchmarkBuildInMemory-16                             54          19256267 ns/op         9915763 B/op     100310 allocs/op
BenchmarkGetAuthorizedEntriesInMemory-16           64569             17356 ns/op            9338 B/op         10 allocs/op
BenchmarkEntryLookup-16                              100          11445476 ns/op          156498 B/op       1536 allocs/op

For the events based cache. Before:

BenchmarkGetAuthorizedEntriesInMemory-16            9115            131016 ns/op          205207 B/op       1513 allocs/op
BenchmarkEntryLookup-16                               57          18919070 ns/op          209983 B/op       2461 allocs/op

and after:

BenchmarkGetAuthorizedEntriesInMemory-16           23606             50479 ns/op           68260 B/op         13 allocs/op
BenchmarkEntryLookup-16                               62          17778846 ns/op           91850 B/op       1180 allocs/op

Copy link
Member

@amartinezfayo amartinezfayo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sorindumitru for this, it looks great!

Comment on lines +42 to +60
func (e *ReadOnlyEntry) GetX509SvidTtl() int32 {
return e.entry.X509SvidTtl
}

func (e *ReadOnlyEntry) GetJwtSvidTtl() int32 {
return e.entry.JwtSvidTtl
}

func (e *ReadOnlyEntry) GetDnsNames() []string {
return slices.Clone(e.entry.DnsNames)
}

func (e *ReadOnlyEntry) GetRevisionNumber() int64 {
return e.entry.RevisionNumber
}

func (e *ReadOnlyEntry) GetCreatedAt() int64 {
return e.entry.CreatedAt
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like we there is test coverage for this, could you add that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test to verify that the getters return the expected value

@sorindumitru
Copy link
Collaborator Author

One thing I realized was that it's somewhat possible to switch the trust domain of a deployment. I've added some changes to filter the entries with the wrong trust domain from the cache and a test for that too.


// Manually clone the entry instead of using the protobuf helpers
// since those are two times slower.
func (e *ReadOnlyEntry) Clone(mask *types.EntryMask) *types.Entry {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned about the potential for field skew here. If the types.Entry struct is updated in the future (e.g., new fields are added), the manual Clone implementation will also need to be updated. Otherwise, newly added fields won't be cloned, which could lead to subtle, hard-to-detect bugs.

It might be helpful to add a comment warning about this, or ideally, include a test (perhaps using reflection) that ensures all fields are properly cloned.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a test for this, which indeed uses reflection to make sure all fields are cloned:

func TestReadOnlyEntryClone(t *testing.T) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that, thanks!

This also improves performance slightly since we no longer clone some slices.

Signed-off-by: Sorin Dumitru <[email protected]>
Will be used in cases where an entry returned from a cache should not be modified. Access to individual fields can copy/clone fields as needed.

Signed-off-by: Sorin Dumitru <[email protected]>
This slightly speeds up the SVID signing operation since we no longer have to fully clone the entries the agent has requested SVIDs for.

Signed-off-by: Sorin Dumitru <[email protected]>
spiffeIDFromProto() takes about 20% of the time to lookup authorized entries, so it seems worthwhile removing it.

The path component is sufficient to identify the entries since all entries are going to have the same trust domain.

Signed-off-by: Sorin Dumitru <[email protected]>
@amartinezfayo amartinezfayo merged commit 4c28ec2 into spiffe:main May 30, 2025
188 of 191 checks passed
@amartinezfayo amartinezfayo added this to the 1.12.3 milestone May 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants