Skip to content

Conversation

tomlikestorock
Copy link
Contributor

@tomlikestorock tomlikestorock commented Oct 29, 2024

Description of changes

This provides a 3x speedup based on local benchmarking with a large schema that has big action entity hierarchy relationships. Today Entities.entities is a HashMap of <EntityUID, Entity>. When from_entities brings in the entities and actions from a given schema, it currently relies on Arc::unwrap_or_clone, which frequently ends up being clone. For a large set of very interrelated Actions, this repeated clone becomes expensive.

Changing Entities.entities to be a HashMap of <EntityUID, Arc<Entity>> allows us to keep the clone cheap, and Arc'ing the incoming Entities becomes cheap as well.

Changing the interface of the HashMap means altering what create_entities_map returns, adding an implementation of TCNode for Arc<Entity>, and udpating how IntoIterator works to return values from the Arc's.

Issue #, if available

This contributes to 1285.

Checklist for requesting a review

The change in this PR is (choose one, and delete the other options):

  • A change "invisible" to users (e.g., documentation, changes to "internal" crates like cedar-policy-core, cedar-validator, etc.)

I confirm that this PR (choose one, and delete the other options):

  • Does not update the CHANGELOG because my change does not significantly impact released code.

I confirm that cedar-spec (choose one, and delete the other options):

  • Does not require updates because my change does not impact the Cedar formal model or DRT infrastructure.

I confirm that docs.cedarpolicy.com (choose one, and delete the other options):

  • Does not require updates because my change does not impact the Cedar language specification.

Comment on lines 325 to 326
fn create_entity_map(es: impl Iterator<Item = Entity>) -> Result<HashMap<EntityUID, Entity>> {
fn create_entity_map(es: impl Iterator<Item = Entity>) -> Result<HashMap<EntityUID, Arc<Entity>>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also have the input es here be an iterator of Arc<Entity>, and force callers to map Arc::new if needed?

Copy link
Contributor

@john-h-kastner-aws john-h-kastner-aws Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same concern also applies to add_entities, right? I'd be in favor of making our internal APIs more explicit by taking Arc on these functions. Public APIs shouldn't change ofc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wasn't sure how interested you'd be in propagating this Arc expectation (at least internally). I can make change that reflect this for sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to this request in 1da976e, let me know what you think.

Comment on lines 345 to 348
self.entities
.into_values()
.map(Arc::unwrap_or_clone)
.collect::<Vec<Entity>>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to .collect() here seems like a potential performance problem as well ... is it possible to avoid it with a different choice of type IntoIter above, so that the implementation here can just be self.entities.into_values().map(Arc::unwrap_or_clone)? or alternately, should we change the type Item to be Arc<Entity>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, I'll check that out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 11e70bd. I changed the IntoIter type to reflect the .map call as the return type. This gave another speed boost as is.

Copy link
Contributor

@john-h-kastner-aws john-h-kastner-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Craig's comments are probably worth addressing.

Comment on lines 325 to 326
fn create_entity_map(es: impl Iterator<Item = Entity>) -> Result<HashMap<EntityUID, Entity>> {
fn create_entity_map(es: impl Iterator<Item = Entity>) -> Result<HashMap<EntityUID, Arc<Entity>>> {
Copy link
Contributor

@john-h-kastner-aws john-h-kastner-aws Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same concern also applies to add_entities, right? I'd be in favor of making our internal APIs more explicit by taking Arc on these functions. Public APIs shouldn't change ofc.

schema
.action_entities()
.into_iter()
.map(|e| (e.uid().clone(), Arc::unwrap_or_clone(e))),
Copy link
Contributor

@john-h-kastner-aws john-h-kastner-aws Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a second to puzzle out why this actually gives a speed up, so I'll leave my reasoning here. Maybe worth elaborating in a comment.

We remove this unwrap_or_clone, but add a make_mut (which I assume must clone if there's another reference to the data) call inside TC computation, so at first glance it feels like that should be a wash. But, the TC was already computed for the entity map (which we just constructed, so one ref to each Arc), and we separately computed TC for the actions inside the schema constructor (while it had unique ownership). Put together these should mean we never trigger a clone on the make_mut during TC computation.

…rc the incoming entities, update function parameters to receive Arc entities, implement TCNode for Arc<Entity>

Signed-off-by: Thomas Hill <[email protected]>
… receive Arc<Entities> instead, update all tests to take this into account, update call sites of these functions to map to Arc<Entity> as well

Signed-off-by: Thomas Hill <[email protected]>
…llecting by updating the IntoIter type, and remove .collect().

Signed-off-by: Thomas Hill <[email protected]>
@cdisselkoen
Copy link
Contributor

Thanks for this work!

Copy link
Contributor

@john-h-kastner-aws john-h-kastner-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks the build error is coming from the protobuf feature

…ped entities into account.

Signed-off-by: Thomas Hill <[email protected]>
@tomlikestorock
Copy link
Contributor Author

Protobuf tests updated to use Arc wrapped entities in 857509b.

@john-h-kastner-aws john-h-kastner-aws merged commit a81e0b9 into cedar-policy:main Nov 7, 2024
19 checks passed
cdisselkoen pushed a commit that referenced this pull request Nov 8, 2024
cdisselkoen pushed a commit that referenced this pull request Nov 8, 2024
…ntities (#1296)

Signed-off-by: Thomas Hill <[email protected]>
Signed-off-by: Craig Disselkoen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants