-
Notifications
You must be signed in to change notification settings - Fork 831
[Strings] Avoid mishandling unicode in interpreter #6405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Our interpreter implementations of `stringview_wtf16.length` and `stringview_wtf16.get_codeunit` are not unicode-aware, so they were previously incorrect in the face of multi-byte code units. As a fix, bail out of the interpretation if there is a non-ascii code point that would make our naive implementation incorrect.
|
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
cc @rluble |
| ;; CHECK: (func $encode (type $0) (result i32) | ||
| ;; CHECK-NEXT: (i32.const 2) | ||
| ;; CHECK-NEXT: ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kripken, is this optimization safe? Does Precompute do this even if the modified array would escape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is safe. If there were effects (like a local.tee that allows the value to escape) then it would not happen.
kripken
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm % suggestion
| if (uint32_t(data->values[i].geti32()) > 127) { | ||
| return Flow(NONCONSTANT_FLOW); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it makes sense to add a helper for this? Could be a templated function in src/support/string.h, or maybe a helper in this file as part of the interpreter would be better, I'm not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're operating on lists of literals, I've added a helper to the interpreter.
| ;; CHECK: (func $encode (type $0) (result i32) | ||
| ;; CHECK-NEXT: (i32.const 2) | ||
| ;; CHECK-NEXT: ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is safe. If there were effects (like a local.tee that allows the value to escape) then it would not happen.

Our interpreter implementations of
stringview_wtf16.length,stringview_wtf16.get_codeunit, andstring.encode_wtf16_arrayare notunicode-aware, so they were previously incorrect in the face of multi-byte code
units. As a fix, bail out of the interpretation if there is a non-ascii code
point that would make our naive implementation incorrect.