Partial schema validation #8

john-h-kastner-aws · 2023-06-16T23:19:59Z

mwhicks1

(Got delayed, so pushing a couple of comments but not a complete review.)

text/0008-partial-schema-validation.md

max2me · 2023-07-07T17:37:34Z

I just want to point out that ability to declare additionalAttributes on any Record type (as this RFC proposed) provides a fine-grained control over validation and makes it possible to accomodate scenarios that AWS Verified Access has. There service knows all top level keys of the context but might not have information about all attributes of nested Records.

khieta · 2023-07-10T18:01:45Z

text/0008-partial-schema-validation.md

+
+# Summary
+
+This RFC proposes extending the Cedar validator to work with partial or even complete empty schemas,


complete -> completely

khieta · 2023-07-10T18:07:18Z

text/0008-partial-schema-validation.md

+When a policy mentions an undefined entity type, or attempts to access an undefined attribute of some entity type,
+the validator will not signal an error, unless it recognizes that there is an inevitable type error.
+For example, even if the type of `access_level` is not declared, we know `principal.access_level > "5"` will inevitably error due to the `>` comparison against a string.
+Partial schema validation does not attempt to detect all type errors, and is therefore unsound.


Per offline discussion, it would be interesting to emit warnings in cases where the current proposal silently accepts the policy (e.g., in cases where the policy mentions an undefined entity type, or attempts to access an undefined attribute).

I like this idea because it allows us to prove some kind of soundness property (namely that in the absence of validation warnings or errors, the policy will not produce an error at authorization time). The other benefit of this design (as pointed out in offline discussion) is that it allows for tooling to help users improve their schemas (e.g., offering to add a definition for a new entity type).

khieta · 2023-07-10T18:09:20Z

text/0008-partial-schema-validation.md

+# Motivation
+
+We want to provide some of the benefits of policy validation to Cedar users who have not written or may not be able to write a schema.
+Detecting any amount potential errors during development is preferable to discovering the errors later,


amount -> amount of

khieta · 2023-07-10T18:14:58Z

text/0008-partial-schema-validation.md

+Undeclared entity types and actions are treated as incomplete according to the second two kinds of incompleteness.
+At the extreme, it is possible to have a schema that does not declare any entity types or actions.
+
+This policy will validate with an empty schema that does not declares either of the entity types or the action.


either of the entity types or the action -> any entity types or actions

khieta · 2023-07-10T18:22:22Z

text/0008-partial-schema-validation.md

+The bottom type functions like the [TypeScript `any` type](https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#any).
+It is used to ensure that the type is allowed to appear in any expression without causing additional type errors.
+The `>` operator expects a value of type `Long` as its operand.
+It will in fact accept any subtype of `Long`, and the bottom type is a subtype of every type, so it accepted.


accepted -> is accepted

khieta · 2023-07-10T19:10:56Z

text/0008-partial-schema-validation.md

+so an expression `principal.role == "Admin"` would still be accepted by the validator.
+
+Alternatively, we might have complete information about the attributes present for `User` entities.
+When we know that `access_level` does not exist, the `shape` for the principal entity type can include `"additionalAttributes": false` to specify that there are no undeclared attributes for this entity type.


Note that "additionalAttributes": false is the default, so it is unnecessary to say this explicitly.

I see now that it's noted in a later section, but still good to include here to head off confusion.

khieta · 2023-07-10T19:16:15Z

text/0008-partial-schema-validation.md

+We extend the schema format to support the above example schema.
+Specifically, we add the following properties to the JSON:
+
+* Any record type, including the record types for entity attributes and action context attributes, may include `"additionalAttributes": true` to specify that values having that record type may have attribute other than those declared.


may have attribute -> may have attributes

khieta · 2023-07-10T19:20:53Z

text/0008-partial-schema-validation.md

+This change does not necessarily require any change to the behavior of the standard validator,
+but the additions to the schema format could be soundly supported.


Will the implementation make these changes, or is this noting that the changes are possible?

khieta · 2023-07-10T19:25:18Z

text/0008-partial-schema-validation.md

+A sound, type inference based, approach to partial schema validation would report _more_ type error than this proposal,
+so it could not be used as a drop-in replacement.


type error -> type errors

Also: why would an inference-based approach result in more errors?

khieta · 2023-07-10T19:25:53Z

text/0008-partial-schema-validation.md

+
+* Policy validation is not required to use Cedar,
+  so we could continue to support validation only for users who are able to provide a complete schema.
+* We could design and implement a type inference algorithm for Cedar.


It would be useful to have more details about the schema inference option to determine whether it is a viable alternative.

Experimental implementation of partial schema validation as described in cedar-policy/rfcs#8.

luxas · 2025-05-13T06:55:31Z

Hi! 👋 What is the status on this RFC? I see that the partial-validate feature is at least partially implemented. Did the design change as part of implementation? I just noticed the additional_attributes bool on the RecordType, which led me here. One of the things I noticed (which is probably expected/desired today, is that only the JSON syntax has this field, additional_attributes does not round-trip through the Cedarschema encoding.

One use-case I have (which is related, but slightly different) is to be able to say for some struct type that "any property that is not known, should have this type"; e.g. string, just like OpenAPI v3's additionalProperties. This would allow serializing a string-string map with arbitrary values into Cedar as-is, with something like additionalProperties { ty: TypeVariant::String }. You can do this today with a has check for a type that it empty, e.g.

type LabelsMap = {}
type Metadata = {
   labels: LabelsMap,
}
entity Resource = {
  metadata: Metadata,
}

permit(principal, action, resource == Resource) when {
  resource.metadata.labels has foolabel && resource.metadata.labels.foolabel == "bla"
};

The VS Code schema validator (Cedar 4.4.0) allows this, even in the default validation mode, although the type of resource.metadata.labels.foolabel is not known (which could result in a schema typing error not being caught and thus a runtime error). However, cedar validate in strict mode correctly tells this is (as per the schema) an impossible policy. At runtime, using cedar authorize, this policy works well, but indeed requires the has guard.

If additionalProperties was part of the schema, the has guard maybe wouldn't be needed, and it would be possible to typecheck the value. However, I realize this would probably get dangerously close to maps, which are not part of Cedar today (except in tags), and thus might make this harder to analyze.

I'm happy to open a dedicated issue on this topic, as it's only tangientally related to additional_attributes as used in this RFC, but wanted to mention this here first.

john-h-kastner-aws · 2025-05-13T20:22:32Z

Thanks for the interest @luxas

Hi! 👋 What is the status on this RFC? I see that the partial-validate feature is at least partially implemented. Did the design change as part of implementation?

This RFC is currently inactive with no plans to move forward with it. It is implemented as an experimental feature under the partial-validate feature which IIRC does implement the feature as described here. You can experiment with it, and if it solves an important use case we'd be really interested in hearing about it.

I just noticed the additional_attributes bool on the RecordType, which led me here. One of the things I noticed (which is probably expected/desired today, is that only the JSON syntax has this field, additional_attributes does not round-trip through the Cedarschema encoding.

Right, this is expected. When we added the Cedar schema syntax we did not incorporate the partial schema validation features.

If additionalProperties was part of the schema, the has guard maybe wouldn't be needed, and it would be possible to typecheck the value. However, I realize this would probably get dangerously close to maps, which are not part of Cedar today (except in tags), and thus might make this harder to analyze.

Based on my understanding of this use case, it sounds like you would want RFC #66 (open records for simple maps), but after extensive discussion we decided that we could not implement it while maintaining analyzability. Entity tags are the best we can offer here. I feel like tags should are a pretty natural solution for representing a string:string map, but let me know where it fails short.

luxas · 2025-05-15T20:14:24Z

Thanks for the response!

I think that for my case, I'll work around it by either:

a) "bottom-up": dynamically updating the schema of the "open" Cedar record (which in JSON/OpenAPI is an unstructured string-string map) to include the record attributes that become well-known by actually referencing them in a policy (e.g. by using #74), or
b) "top-down": walking objects in the database to discover which labels (record attribute keys) "exist" in the system, and thus populating the schema, or
c) (what I have now) model the map as a Set<{key: String, value: String}> which works well, but is not as expressive and intuitive to use.

So I don't think I need this feature per se, if I do any of the above, as I anyways need the soundness guarantees that the normal schema validator provides.

john-h-kastner-aws · 2025-05-15T20:51:18Z

Workarounds seem reasonable

a) Sounds tricky since you'll need to infer the type of the attributes based on how they're used. Doing it soundly in the general case will be difficult.

b) Probably a good approach if it's workable. Depends a lot on what the database looks like.

c) A good workaround if the limited expressiveness isn't an issue, but I'm curious why you have this instead of using entity tags. Entity tags were largely motivated by supporting this pattern while allowing direct access to the values by hasTag and getTag operations.

luxas · 2025-05-16T07:33:13Z

a) Sounds tricky since you'll need to infer the type of the attributes based on how they're used. Doing it soundly in the general case will be difficult.

In this case, I know that key is always a string (as it's modelled from a JSON object), and I know the Cedar type of value too, thanks to the OpenAPI schema (and in 90%+ of cases just a string). So what I infer from policies is just the attribute names/keys, basically, which I think should work.

A good workaround if the limited expressiveness isn't an issue, but I'm curious why you have this instead of using entity tags. Entity tags were largely motivated by supporting this pattern while allowing direct access to the values by hasTag and getTag operations.

The reason was (until I read your message and realized how to do it, thanks!) that I didn't realize that I don't have to put tags on the "parent" object, but can declare an intermediate type. So the simplified practical example:

namespace before {
    entity Pod = {
        metadata: ObjectMeta,
    };
    type ObjectMeta = {
        labels: Set<StringKeyValue>,
        annotations: Set<StringKeyValue>,
    };
    type StringKeyValue = {
        key: String,
        value: String
    };
    action update appliesTo {
        principal: User,
        resource: Pod
    };
    entity User;
}

This is the existing case for https://github.com/awslabs/cedar-access-control-for-k8s/

When I was thinking about tags, I somehow thought about putting the tags on ObjectMeta (the "parent"), which didn't really fit nicely, especially as there were two of these open records, labels and annotations.

With this discussion, I realized that nothing stops me from declaring an intermediate entity, not sure why I didn't realize this before, when reading the tags RFC.

namespace after {
    entity Pod = {
        metadata: ObjectMeta,
    };
    type ObjectMeta = {
        labels: StringMap,
        annotations: StringMap
    };
    entity StringMap = {
        
    } tags String;
    action update appliesTo {
        principal: User,
        resource: Pod
    };
    entity User;
}

Now I can write:

permit(
    principal,
    action,
    resource is after::Pod
) when {
    resource.metadata.labels.hasTag("owner") && resource.metadata.labels.getTag("owner") == "lucas" 
};

And to support the case of figuring out all of the available labels (when not caring about the values), to make sure there aren't other tags/labels I don't want to be there, but also don't know the name of, I can add:

entity StringMap = {
    keys: Set<String>
} tags String;

such that I can make sure the object doesn't contain other, unauthorized labels, only the two allowlisted ones here (owner, other-ok-label):

permit(
    principal,
    action,
    resource is after::Pod
) when {
    resource.metadata.labels.hasTag("owner") &&
    resource.metadata.labels.getTag("owner") == "lucas" &&
    ["owner", "other-ok-label"].containsAll(resource.metadata.labels.keys) 
};

For posterity, this is how it would have looked like if we had open records, not very different:

permit(
    principal,
    action,
    resource is after::Pod
) when {
    resource.metadata.labels has "owner" &&
    resource.metadata.labels.owner == "lucas" &&
    ["owner", "other-ok-label"].containsAll(resource.metadata.labelKeys) 
};

Thanks for helping with the thinking!
It might be good to document this somewhere in the docs as "so, you think you need an open record" 😄

john-h-kastner-aws added 3 commits June 16, 2023 23:16

Partial schema validation RFC

6ab6ae4

Proof reading edits

1bd1404

Another typo

05fcd87

john-h-kastner-aws mentioned this pull request Jun 21, 2023

Generalizing template placeholders #3

Closed

khieta mentioned this pull request Jun 23, 2023

Policy validation with partial schema cedar-policy/cedar#99

Closed

2 tasks

mwhicks1 reviewed Jun 23, 2023

View reviewed changes

text/0008-partial-schema-validation.md Outdated Show resolved Hide resolved

text/0008-partial-schema-validation.md Outdated Show resolved Hide resolved

john-h-kastner-aws changed the title ~~Partial schema validation RFC~~ Partial schema validation Jun 26, 2023

edits

c11e58c

cdisselkoen added the pending This RFC is pending; for definitions see the README label Jul 3, 2023

khieta reviewed Jul 10, 2023

View reviewed changes

khieta mentioned this pull request Jul 25, 2023

Support returning ValidationWarnings from the validate function cedar-policy/cedar#225

Closed

2 tasks

john-h-kastner-aws mentioned this pull request Nov 20, 2023

Partial schema validation cedar-policy/cedar#79

Merged

john-h-kastner-aws added a commit to cedar-policy/cedar that referenced this pull request Nov 27, 2023

Partial schema validation (#79)

a9d2ed5

Experimental implementation of partial schema validation as described in cedar-policy/rfcs#8.

john-h-kastner-aws added pre-acceptance-experimental This RFC is pre-acceptance experimental; for definitions see the README and removed pending This RFC is pending; for definitions see the README labels Nov 27, 2023

cdisselkoen mentioned this pull request Jul 8, 2024

Object-Entity Mapping (OEM) #72

Open


		# Summary

		This RFC proposes extending the Cedar validator to work with partial or even complete empty schemas,

		This change does not necessarily require any change to the behavior of the standard validator,
		but the additions to the schema format could be soundly supported.

		A sound, type inference based, approach to partial schema validation would report _more_ type error than this proposal,
		so it could not be used as a drop-in replacement.

Partial schema validation #8

Are you sure you want to change the base?

Partial schema validation #8

Uh oh!

Conversation

john-h-kastner-aws commented Jun 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mwhicks1 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

max2me commented Jul 7, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

luxas commented May 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

john-h-kastner-aws commented May 13, 2025

Uh oh!

luxas commented May 15, 2025

Uh oh!

john-h-kastner-aws commented May 15, 2025

Uh oh!

luxas commented May 16, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

john-h-kastner-aws commented Jun 16, 2023 •

edited

Loading

luxas commented May 13, 2025 •

edited

Loading