Skip to content

Conversation

@hudlow
Copy link
Contributor

@hudlow hudlow commented Nov 6, 2025

String and byte literals including escape sequences, triple-quoted multiline variations, and raw variants are currently largely untested in the conformance suite. This PR is intended to fully remediate this gap.

Notably, cel-go fails tests that have an unescaped carriage return in a triple-quoted string — it seemingly canonicalizes carriage returns to line feeds. I don't think there should be any doubt that this is, in fact, a bug in cel-go.

@hudlow
Copy link
Contributor Author

hudlow commented Nov 6, 2025

@TristonianJones A quick review would be appreciated — I'd rather not merge my fixes to cel-es without tests.

@TristonianJones
Copy link
Collaborator

/gcbrun

}

test {
name: "single_quoted_escaped_carriage_return"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the CEL stacks (C++, Java, and Go) normalize carriage returns to from \r -> \n. I don't think this makes a huge difference, but it can simplify certain things. What are your feelings here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my view, there are many arguments to be made for many normalizations, and I don't think this one is so overwhelmingly compelling to be the one exception. Also, doing the normalization with byte literals seems categorically wrong to me.

I'm surprised the behavior is consistent — I figured it was a fluke in Go but didn't test the others. There might be some angle I'm not considering, but absent that, it would be my preference that we follow the spec and add the conformance tests as they are.

Copy link
Collaborator

@TristonianJones TristonianJones Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually a decision that I can't quite pin down unless the same normalization is happening in Google SQL ... I'll have to check and get back to you. It shouldn't actually be a problem to remove though it may result in some change detector test failures

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this came from GoogleSQL:

https://github.com/google/zetasql/blob/c3f174111ccfbd6b5b0b112925ef962a3a5c8d47/zetasql/public/strings.cc#L198

Looks like we preserved this behavior for full literal parity between CEL and SQL

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove support from CEL, Google SQL will probably still work just fine ... I'll try to test tomorrow to see what happens to figure out if we can relax the constraint

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick follow-up! I went ahead and added tests for \r\n sequences as well, since I hadn't thought about those being special-cased. Let me know what you figure out and I can adjust as needed.

It certainly isn't the end of the world if we need to codify the line-ending normalization into the CEL spec.

Copy link
Collaborator

@TristonianJones TristonianJones Nov 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update the tests for the moment to the current behavior to get you unblocked. My GoogleSQL translation checks seem to not result in any issues, but I'll need to do some broader tests to validate that I can drop the normalization in a timely fashion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #490 and commented out the offending tests so we don't create an explicit gap between the spec and the conformance suite.

Copy link
Collaborator

@TristonianJones TristonianJones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these cases overlap with the basic.textproto test cases, but I don't mind that they do. I just had one question regarding carriage return handling.

@hudlow hudlow force-pushed the string-parsing-tests branch from 77d9616 to c425788 Compare November 7, 2025 06:00
@hudlow
Copy link
Contributor Author

hudlow commented Nov 7, 2025

@TristonianJones Thanks for the quick review! I responded and also added one more (unrelated) commit for some tests I forgot to add (unassigned code points in string literals).

@TristonianJones
Copy link
Collaborator

/gcbrun

@TristonianJones
Copy link
Collaborator

/gcbrun

@TristonianJones TristonianJones merged commit 7f3c4c5 into google:master Nov 8, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants