@tlively tlively commented Mar 15, 2024

Our interpreter implementations of `stringview_wtf16.length`,
`stringview_wtf16.get_codeunit`, and `string.encode_wtf16_array` are not
Unicode-aware, so they were previously incorrect in the face of multi-byte code
units. As a fix, bail out of the interpretation if there is a non-ASCII code
point that would make our naive implementation incorrect.

Our interpreter implementations of `stringview_wtf16.length` and
`stringview_wtf16.get_codeunit` are not Unicode-aware, so they were previously
incorrect in the face of multi-byte code units. As a fix, bail out of the
interpretation if there is a non-ASCII code point that would make our naive
implementation incorrect.
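
To make the mismatch concrete, here is a minimal sketch of why a naive length diverges from the WTF-16 length once non-ASCII data appears. It assumes the string contents are held as well-formed UTF-8 bytes, one element per byte; that storage detail and both function names are assumptions for illustration, not Binaryen's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical naive length: treats every stored byte as one code unit.
std::size_t naiveLength(const std::string& utf8) { return utf8.size(); }

// Hypothetical Unicode-aware length: counts the UTF-16 code units that a
// well-formed UTF-8 byte string would occupy.
std::size_t utf16Length(const std::string& utf8) {
  std::size_t units = 0;
  for (unsigned char byte : utf8) {
    if ((byte & 0xC0) == 0x80) {
      continue; // Continuation byte: already counted with its lead byte.
    }
    // A 4-byte sequence encodes a code point above U+FFFF, which needs a
    // surrogate pair (two code units); everything else needs one.
    units += (byte >= 0xF0) ? 2 : 1;
  }
  return units;
}
```

For pure ASCII input the two functions agree, which is why bailing out on any value above 127 is enough to keep the naive implementation correct.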
@tlively tlively requested a review from kripken March 15, 2024 22:37
tlively commented Mar 15, 2024

cc @rluble

Comment on lines +100 to +102
;; CHECK: (func $encode (type $0) (result i32)
;; CHECK-NEXT: (i32.const 2)
;; CHECK-NEXT: )
Member Author

@kripken, is this optimization safe? Does Precompute do this even if the modified array would escape?

Member

Yes, this is safe. If there were effects (like a local.tee that allows the value to escape) then it would not happen.

@kripken kripken left a comment

lgtm % suggestion

    if (uint32_t(data->values[i].geti32()) > 127) {
      return Flow(NONCONSTANT_FLOW);
    }
  }
Member

Perhaps it makes sense to add a helper for this? Could be a templated function in src/support/string.h, or maybe a helper in this file as part of the interpreter would be better, I'm not sure.

Member Author

Since we're operating on lists of literals, I've added a helper to the interpreter.
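
For reference, a helper along these lines might look like the sketch below. It is not the code that landed: `hasNonAsciiValue` is a hypothetical name, and a plain `std::vector<int32_t>` stands in for the interpreter's list of literals so the example is self-contained.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if any element is outside the ASCII
// range, in which case the caller would give up on constant evaluation
// (in the snippet under review, by returning Flow(NONCONSTANT_FLOW)).
bool hasNonAsciiValue(const std::vector<int32_t>& values) {
  for (int32_t value : values) {
    // The unsigned cast makes negative values compare as large numbers,
    // so they are rejected as well.
    if (uint32_t(value) > 127) {
      return true;
    }
  }
  return false;
}
```

Each of the affected operations could then run this check once up front instead of repeating the loop inline.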

@tlively tlively merged commit 63db13b into main Mar 19, 2024
@tlively tlively deleted the no-interpret-unicode branch March 19, 2024 04:17
@gkdn gkdn mentioned this pull request Aug 31, 2024