Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
124 changes: 124 additions & 0 deletions text/0037-null-context-entity-data.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
# Null JSON values in context and entity attribute JSON data

## Related issues and PRs

- Reference Issues:
- Implementation PR(s):

## Timeline

- Started: 2023-11-02

## Summary

The JSON representation for request contexts and entity attributes currently rejects any data containing a `null` JSON value.
With the proposal in this RFC, we would accept `null` JSON values appearing as the value of record attributes while continuing to reject `null` everywhere else.

## Basic example

Given a JSON object `{"foo": 1, "bar": null}`, we will ignore the `null` attribute and construct the Cedar record `{foo: 1}` instead of returning an error result.
In the JSON object `{"foo": [null]}`, the `null` is not directly an attribute value, so current behavior will not change (we fail to construct a Cedar record).

## Motivation

We want to support users passing generic JSON data as a request context with minimal preprocessing requirements.
We do this by constructing, for example, Cedar records from JSON objects and Cedar strings from JSON strings.
Notably, however, we will not construct any request context if there is a `null` JSON value appearing _anywhere_ in the JSON data, requiring that users who may have `null` in their context preprocess their data to remove it.

## Detailed design

To build the attributes for a context or entity, we accept a generic [JSON value](https://www.json.org/json-en.html) with a few limitation:

1. If the value is an object, it must not contain the same attribute twice. If it contains the reserved attributes `__expr`, `__entity`, or `__extn`, then it may also be rejected based on specific rules for these escapes.
2. If the value is a number, it must be an integer. A JSON value could be a fractional (e.g, `1.0`) or exponential (e.g., `1e0`) value, but these are not permitted.
3. The `null` value may not occur anywhere.

These limitations ensure that there is an obvious mapping from JSON values to their Cedar equivalents.
Fractional and exponential numbers encode floating point numbers which do not exist in Cedar, so they are not accepted.
In the case of `null`, Cedar does not have any `null` value, so there is no single obvious Cedar value that it should map to in all situations.

In this RFC we observe that when `null` occurs as the value of a record attribute there is a natural interpretation.
We can treat the `null` value as explicitly marking the attribute as _not_ present in the record, so we can ignore the attribute while continuing to error on `null` anywhere else.

If an object contains an attribute more than once it will still be an error even if the value of some of the attributes are `null`.

We use JSON objects with the reserved attributes `__expr`, `__entity`, or `__extn` as escapes for writing Cedar values that don't have JSON equivalents.
These are JSON objects, so this RFC as written up to this point implies ignoring `null` on attributes appearing here.
We will instead continue to treat `null` as an error in this context.
Comment on lines +56 to +57
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what the reasoning is either way -- why should we accept or not accept "__entity": null and similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded on this bit in the text. Treating it as {} would implicitly make it a record instead of an entity. We could parse null as an entity (unspecified?), but I think that's beyond the scope of the problem this RFC wants to solve.


### Schema based parsing

The static type system used by Cedar includes optional attributes.
When constructing a context based on a context type from a schema we can ensure that any attributes that are not optional are present with a non-`null` value.
Optional attributes may be present, present with a `null` value, or absent.

For the following context type, `{"foo": null, "bar": 1}` and `{"bar": 1}` are accepted because `foo` is not required, but `{"bar": null}` and `{}` are rejected because `bar` is required.

```json
{
"type": "Record",
"attributes": {
"foo": {"type": "Long", "required": false},
"bar": {"type": "Long", "required": true}
}
}
```

We can further ensure that any attribute present with a `null` value is an optional attributes in context type.
This would mean rejecting `{"bar": 1, "baz": null}` because `baz` is not an optional attribute in the record type.
This extra check is not necessary to meet the preconditions for validation soundness since the Cedar value constructed would be `{bar: 1}` which does match the type, but this may be useful for detecting errors in input data.

## Drawbacks

### Unexpected behavior

Ignoring `null` may not be the desired behavior for all users.
It seems reasonable that a `null` attribute value can be safely ignored, but a user may expect that `context has foo` evaluates to `true` if `foo` is present in their JSON context data.
It will instead evaluate to `false` if `foo` is present but has a `null` value.

In support of this point, observe how JavaScript treats objects containing `null`:
```js
Object.hasOwn({foo: null}, 'foo') // true
```
Comment on lines +87 to +96
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a pretty big drawback -- it's making me lean more towards disallowing null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What errors does a user see when they pass in a null value somewhere? Could we address the issue in this RFC by just making clearer error messages to explain why null values aren't allowed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This is a major draw back.

The current error is data did not match any variant of untagged enum CedarValueJson, so, yes, this need to be improved, but from the complaints we've heard about this, users do understand that null is the problem.


### Inconsistent behavior

Handling `null` when it occurs directly as an attribute while continuing to error elsewhere could lead to surprising errors.
If a user first observes that `null` is ignored as an attribute value, they may later assume that it will be ignored as an element of a set.

## Alternatives

### Ignore `null` everywhere

To avoid inconsistent handling of `null`, we could decided that `null` occurring _anywhere_ should be ignored.
Aside from objects, the only relevant case is arrays.
This entails interpreting `{"foo": [null]}` as `{foo: []}`.

This would lead to misleading behavior around the `any` and `all` operators proposed in [RFC 21](https://github.com/cedar-policy/rfcs/pull/21) (and, to a lesser extent, the existing `containsAll` and `containsAny` operators).
The following policy would permit access even if `context.portNumbers` contains a `null` value, but `null` is clearly not within this range.

```cedar
permit(principal, action, resource)
when {
context.portNumbers.all(it >= 8000 && it <= 8999)
};
```

This behavior is concerning, but, if we accept the misleading behavior around `has` when ignore `null` attributes, I don't think it should be considered unacceptable.

### Add `null` to Cedar

We don't want to add `null` as in Java where any reference may be `null`, but it could be added as the element of a unit type without causing too much trouble.
This could work well for when using Cedar as a dynamic language, but its use would be limited in a statically typed context.

### Error on `null`

This is our current behavior.
As stated above, it limits what data can be used to construct a context, forcing users to remove it in their own preprocessing passes.
Cedar should be easy and ergonomic to use, so we should hesitate to offload work users without good reason, but, if the correct interpretation of `null` is not always the same, it may be the correct choice here.

If we take this alternative, we should keep in mind that many users are likely to find themselves writing the same filter to remove `null` from their data, so we should consider providing this filter ourselves, even if it is not an integral part of the Cedar data format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't sound that hard, actually. There are lots of libraries for easy JSON parsing/transformation, no? I think that it might be confusing that Cedar drops nulls for attributes, but errors when they are provided in sets. Better to let users drop null if it's acceptable to them, wherever null might appear?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the drawbacks you've listed seem non-trivial ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, based on the drawbacks I'm also no longer fully convinced that dropping null attribute only is the right solution. Dropping null everywhere actually seems better, but the example with all is very misleading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's too much value in us providing the filter ourselves. Our filter would presumably just be a wrapper around some existing JSON parsing function. Also, the context/entity data may be coming from other languages (not Rust) and it may make more sense to do the filtering there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the data as JSONValue objects, so we're in a position to transform it easily and correctly. My concern is that someone might have JSON data as a string, so we're requiring them to parse and update it, or try to do some ad-hoc string operations.


## Unresolved questions

* Unexpected attribute with `null` value in schema based parsing.