[Parser] Parse try/catch/catch_all/delegate #6128

tlively · 2023-11-18T03:27:11Z

Parse the legacy v3 syntax for try/catch/catch_all/delegate in both its folded
and unfolded forms.

The first source of significant complexity is the optional IDs after catch
and catch_all in the unfolded form, which can be confused for tag indices and
require backtracking to parse correctly.

The second source of complexity is the handling of delegate labels, which are
relative to the try's parent scope despite being parsed after the try's scope
has already started. Handling this correctly requires punching a truck-sized
hole through both the parser and IRBuilder abstractions.

tlively · 2023-11-18T03:27:23Z

Current dependencies on/for this PR:

main
- PR Fix a bug with unreachable control flow in IRBuilder #6130
  - PR [Parser] Parse tags and throw #6126
    - PR [Parser] Parse try/catch/catch_all/delegate #6128 👈

This stack of pull requests is managed by Graphite.

kripken · 2023-11-20T16:09:38Z

src/parser/parsers.h

+
+    // It can be ambiguous whether the name after `catch` is intended to be the
+    // optional ID or the tag identifier. First try parsing with the optional
+    // ID, then retry without parsing the ID if we run into an error.


Hmm, was there no way to avoid this ambiguity in the wasm text format?

No one uses these optional IDs afaik, so one option would have been to not have them. But then it would have been inconsistent with the text format for other control flow structures 😱

What is this optional ID thing? I didn't know this kind of thing existed until now... 🤯 Do we ever use this? Do we have tests that have this? (I can't seem to find any in this PR. Please let me know if I missed them)

This optional ID only appears in the unfolded versions of control flow instructions. For block, for example, it lets you repeat the block label at the end (like block $foo ... end $foo), which can help readability when blocks are very long and you're trying to figure out where branches to $foo will end up. Since these IDs are only allowed in the unfolded format, we only use them in wat-kitchen-sink.wast so far. In this PR they are tested in $try-br-name.

kripken · 2023-11-20T16:12:05Z

src/wasm/wasm-ir-builder.cpp

+      // keep this complexity contained within the handling of trys and
+      // delegates, pretend there is just the single normal label and add a
+      // prefix to it to generate the delegate label. The prefix is added on the
+      // try when its scope ends.


Do we have this complexity in the old parsers?

No, because we don't allow branches to anything except for blocks and loops. The old parser also doesn't support delegating to any scope except to a parent try directly.

I see. Is it not simpler to add a new block as needed, as the old parsers do? (and rely on the optimizer to fold unneeded ones etc.)

But I don't feel strongly here if this feels like the right solution to you. lgtm overall, though I see test failures.

We do still add a new block as needed for normal control flow branches that target this try. But adding a new block won't help for delegates that target the try because in Binaryen IR, delegate must target the Try directly and cannot target a block.

So what's the difference between the old way and the new way? In the old parser/IR, try's labels can only be targeted from delegates and not branches. Also branches cannot jump to trys; we created an internal block to mimic jumping to a try. Are these not true anymore in the new parser/IR? I haven't reviewed the other PRs for the new parser/IRBuilder so I'd appreciate the context.

We're not changing the rules for the IR, so it's still the case that delegate can only target named Try directly in the IR and that normal branches cannot target Try. However, in the input text we are parsing with the new parser, we are following the standard and allowing any branch to target a try and allowing delegates to target any kind of control flow. Most of the complexity here is because we are converting from input following the standard rules to IR following our more restrictive IR rules on the fly.

tlively · 2023-11-21T01:56:32Z

The test failures turned out to be a pre-existing bug that I fixed independently in #6130.

aheejin

Sorry for the delayed reply!

aheejin · 2023-11-22T21:59:03Z

src/parser/parsers.h

+
+    // It can be ambiguous whether the name after `catch` is intended to be the
+    // optional ID or the tag identifier. First try parsing with the optional
+    // ID, then retry without parsing the ID if we run into an error.


What is this optional ID thing? I didn't know this kind of thing existed until now... 🤯 Do we ever use this? Do we have tests that have this? (I can't seem to find any in this PR. Please let me know if I missed them)

aheejin · 2023-11-22T22:00:58Z

src/parser/parsers.h

+      if (parseID && tag.getErr()) {
+        // Instead of returning an error, retry without the ID.
+        parseID = false;
+        ctx.in.lexer.setIndex(afterCatchPos);
+        continue;
+      }


(While I'm not sure what the ID is in the first place...) Is this trying to parse an ID or a tag? I thought the if above (lines 967-975) parses the ID (if any) and this parses the tag. But this seems to set parseID to false if tag parsing fails. Why?

And does this optional ID only exist in the non-folded format?

You're right that lines 967-975 parse the ID. This logic here tries to parse a tag (on line 977) and if that fails, it resets the parser to try again without trying to parse an ID.

If tag parsing fails, why do we set parseID to false instead?
And what does it retry? Given that we set parseID to false here, it doesn't retry parsing an ID. Does it retry parsing a tag, then? But tag parsing has just failed.

The ambiguous case this is handling looks like this:

(tag $t)
(func $ambiguous
  try $t
  catch $t
  end
)

When parsing the catch, the parser first tries to parse an optional ID that must match the label of the try, and it succeeds because it sees $t after the catch. However, when it then tries to parse the mandatory tag index, it fails because the next token is end. The problem is that the $t after the catch was the tag name and there was no optional ID after all. The parser sets parseID = false and resets to just after the catch, and now it skips parsing the optional ID so it correctly parses the $t as a tag name.
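The retry described above can be sketched in isolation. This is a minimal, hypothetical model rather than the code in src/parser/parsers.h: tokens are plain strings, `CatchResult`, `isIdentifier` and `parseCatch` are invented names, any token starting with `$` is assumed to be an identifier, and the check that the optional ID matches the try's label is left out.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result of parsing one unfolded `catch` clause.
struct CatchResult {
  std::optional<std::string> id; // optional ID repeating the try's label
  std::string tag;               // mandatory tag name
  std::size_t next;              // index of the first token after the clause
};

// Assumption: an identifier is any token of the form `$name`.
inline bool isIdentifier(const std::string& tok) {
  return tok.size() > 1 && tok[0] == '$';
}

// Parse the tokens that follow `catch`, starting at index `pos`.
// First attempt: treat a leading identifier as the optional ID.
// If no tag follows, rewind to `pos` and retry without parsing an ID.
inline std::optional<CatchResult>
parseCatch(const std::vector<std::string>& toks, std::size_t pos) {
  const std::size_t afterCatchPos = pos;
  bool parseID = true;
  while (true) {
    std::size_t cur = afterCatchPos; // rewind on every attempt
    std::optional<std::string> id;
    if (parseID && cur < toks.size() && isIdentifier(toks[cur])) {
      id = toks[cur++];
    }
    if (cur < toks.size() && isIdentifier(toks[cur])) {
      return CatchResult{id, toks[cur], cur + 1};
    }
    if (parseID) {
      // Instead of reporting an error, retry without the ID.
      parseID = false;
      continue;
    }
    return std::nullopt; // no tag even without an ID: a genuine error
  }
}
```

With the tokens `$t end` after `catch`, the first attempt consumes `$t` as the ID and then fails to find a tag, so the second attempt parses `$t` as the tag. With `$l $t end`, the first attempt succeeds and yields both the ID and the tag.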

Ah I see. Can you add this explanation in the comments? I think it will help reading.

aheejin · 2023-11-22T22:10:42Z

src/wasm-ir-builder.h

+    static ScopeCtx makeTry(Try* tryy, Name originalLabel = {}) {
+      return ScopeCtx(TryScope{tryy, originalLabel});
+    }
+    static ScopeCtx makeCatch(Try* tryy, Name originalLabel, Name label) {


What are label and originalLabel respectively and how are they different?

originalLabel is the name as it appears in the text input. label is a version of originalLabel that might have been modified to avoid having duplicate labels, which are allowed in the standard text format but not in Binaryen IR.
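For illustration, deriving a unique label from an originalLabel might look roughly like the sketch below. It is hypothetical, not Binaryen's implementation: `FreshNames` and `makeFresh` are invented names, and the `_0`, `_1` suffix scheme is only assumed from the `$l`, `$l_0`, `$l_1` names that show up in the test output.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper that hands out labels that are unique within a function.
class FreshNames {
  std::unordered_set<std::string> used;

public:
  // Return `original` if it is still unused, otherwise the first unused
  // name of the form `original_N` (assumed suffix scheme).
  std::string makeFresh(const std::string& original) {
    if (used.insert(original).second) {
      return original;
    }
    for (std::size_t i = 0;; ++i) {
      std::string candidate = original + "_" + std::to_string(i);
      if (used.insert(candidate).second) {
        return candidate;
      }
    }
  }
};
```

Calling `makeFresh("l")` three times on the same object yields `l`, `l_0` and `l_1`, so three nested scopes that all use `$l` in the text get distinct names in this model.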

Why do we need to keep originalLabel internally then? Is that because you try to roundtrip the labels as given? (If so, given that the code itself does not roundtrip anyway after being stored as a Binaryen IR, is it necessary?)

And why do if and try need only an originalLabel while else and catch need both?

We need to keep the scope's originalLabel around because it can be used by branch instructions inside the scope and we need to manage the mapping from originalLabel to internal label, including removing the mapping when the scope ends.

Thanks. Why do if and try need only an originalLabel while else and catch need both?

Since the labels are only created lazily once we encounter a branch or delegate to a scope, it's not possible for there to be a nontrivial label at the beginning of an if or try scope. But at the beginning of an else or catch scope, there might have been a branch in the corresponding if or try, so we might have already generated a label that we need to propagate along to make sure branches in the else or catch go to the right place.

aheejin · 2023-11-22T22:24:19Z

src/wasm/wasm-ir-builder.cpp

+    } else if (auto* tryy = scope.getTry()) {
+      std::cerr << "try";
+      if (tryy->name) {
+        std::cerr << " " << tryy->name;
+      }
+    } else if (auto* tryy = scope.getCatch()) {
+      std::cerr << "catch";
+      if (tryy->name) {
+        std::cerr << " " << tryy->name;
+      }
+    } else if (auto* tryy = scope.getCatchAll()) {
+      std::cerr << "catch_all";
+      if (tryy->name) {
+        std::cerr << " " << tryy->name;
+      }


Why do we print name only with trys and not with blocks and loops?

block and loop names will be printed down below when we print the OriginalLabel. In contrast, Try names aren't considered OriginalLabels because they are treated specially and used only for delegates, so if we want to print Try names we have to go out of our way to print them here.

You said above that "originalLabel is the name as it appears in the text input." Then a try doesn't have an originalLabel even if the input has one? I guess I'm confused because I don't understand the purpose of originalLabel.

If I have this text input:

try $myTry
  try
  delegate $myTry
catch $tag
end

Then the originalLabel for that try is myTry and the name field in the Try IR node will be empty until we reach the delegate, at which point we will set it to myTry as well. Then when we finish the catch scope we will set the name to __delegate__myTry to differentiate it from the wrapper block $myTry that we use the originalLabel for.

aheejin · 2023-11-22T22:29:20Z

src/wasm/wasm-ir-builder.cpp

+  tryy->name = Name();
+  pushScope(ScopeCtx::makeTry(tryy, label));


Are tryy->name and the given label parameter different things?

Yes; the label is the name used to branch to the try with normal branches like br and br_if. It will end up being the name of a block we wrap around the try. tryy->name will ultimately end up being another name used to delegate to this try. Unfortunately we can't reuse label for that because the wrapper block and the try must have different names to validate.
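A tiny sketch of the two names involved, under stated assumptions: `TryNames` and `namesForTry` are hypothetical, and only the `__delegate__` prefix is taken from the diff in this PR.

```cpp
#include <string>

// Hypothetical pair of names derived from one label in the text input.
struct TryNames {
  std::string wrapperBlock;   // target of normal branches such as br and br_if
  std::string delegateTarget; // name of the Try itself, used only by delegates
};

// Derive both names from the label; they must differ for the IR to validate.
inline TryNames namesForTry(const std::string& label) {
  return TryNames{label, "__delegate__" + label};
}
```

For a try written as `try $myTry`, this gives a wrapper block named `myTry` and a Try named `__delegate__myTry`.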

aheejin · 2023-11-22T22:41:42Z

src/wasm/wasm-ir-builder.cpp

+  bool wasTry = true;
+  auto* tryy = scope.getTry();
+  if (!tryy) {
+    wasTry = false;
+    tryy = scope.getCatch();
+  }


Isn't it guaranteed that we are in a catch by the time we enter visitCatch? Do we need to check for this? Maybe I am not familiar with how to use these visit methods in the new IRBuilder.

In the text parser you're right that we are guaranteed to be in the catch by the time we enter visitCatch. However, when we start using IRBuilder in the binary parser, an incorrect binary could have the opcode for catch anywhere and we might end up calling visitCatch when we're not in a try context. To keep the future binary parser as simple as possible and to help catch any other bugs we might have using IRBuilder, I've built IRBuilder to not assume it's being used correctly and to report meaningful errors about incorrect usage instead.

Not sure if I understood it correctly, so let me know if I'm mistaken:

Lines 626-629 handle the case where we parse a non-first catch in a try-catch-catch-... sequence, which is not an error. When parsing a valid non-first catch in a try-catch-catch-..., getTry() returns null but getCatch() returns a scope.

Lines 630-632 handle the case where a catch appears after neither a try nor a catch.

Are these correct?

Yes, that's exactly right. getTry() and getCatch() both return the Try* associated with the scope or null.
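The defensive check discussed here can be modelled with a small stand-alone sketch. Everything in it is hypothetical (`ScopeKind`, `Builder`, the error string) and it tracks only scope kinds, not Try nodes; it is meant to show the three cases, not IRBuilder's actual code.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical scope kinds tracked by a builder.
enum class ScopeKind { Block, Try, Catch, CatchAll };

struct Builder {
  std::vector<ScopeKind> scopeStack;

  // Returns an error message, or nullopt on success.
  std::optional<std::string> visitCatch() {
    if (scopeStack.empty()) {
      return "unexpected catch";
    }
    ScopeKind& top = scopeStack.back();
    // A catch may follow the try body (first catch) or an earlier catch.
    if (top != ScopeKind::Try && top != ScopeKind::Catch) {
      return "unexpected catch";
    }
    top = ScopeKind::Catch; // the same try continues in a catch scope
    return std::nullopt;
  }
};
```

A caller that feeds opcodes in an arbitrary order, as a binary parser might, gets an error back instead of undefined behavior when a catch shows up inside a plain block.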

aheejin · 2023-11-22T23:25:08Z

src/wasm/wasm-ir-builder.cpp

+      // keep this complexity contained within the handling of trys and
+      // delegates, pretend there is just the single normal label and add a
+      // prefix to it to generate the delegate label. The prefix is added on the
+      // try when its scope ends.


So what's the difference between the old way and the new way? In the old parser/IR, try's labels can only be targeted from delegates and not branches. Also branches cannot jump to trys; we created an internal block to mimic jumping to a try. Are these not true anymore in the new parser/IR? I haven't reviewed the other PRs for the new parser/IRBuilder so I'd appreciate the context.

aheejin · 2023-11-29T01:33:54Z

src/wasm/wasm-ir-builder.cpp

+      delegateTry->name = *getLabelName(label);
+      tryy->delegateTarget =
+        Name(std::string("__delegate__") + delegateTry->name.toString());


Is it more complicated if we prefix delegateTry->name with __delegate__ here earlier and not when we wrap up the try?

Good call, no, it's much simpler to do the prefixing here. I just pushed an update that makes this change.

aheejin · 2023-11-29T01:47:37Z

src/wasm/wasm-ir-builder.cpp

 }

-Result<Index> IRBuilder::getLabelIndex(Name label) {
+Result<Index> IRBuilder::getLabelIndex(Name label, bool inDelegate) {


Can we possibly call visitDelegate after calling visitEnd for that try and avoid this complexity? Setting delegateTarget afterwards does not affect the result of Try::finalize.

I think the problem there is that visitDelegate wouldn't know which Try to add a delegateTarget to because visitEnd will have popped the Try off the scope stack. I suppose we could try to go find the Try at the top of the expressionStack of the new top scope, but that also seems like a hack. I prefer this current hack because the weirdness is at least captured in function signatures. If we tried to find the Try on the scope stack, the leaky abstraction wouldn't be visible or documented in any interfaces.

aheejin · 2023-11-29T01:51:27Z

src/wasm/wasm-ir-builder.cpp

-Result<Index> IRBuilder::getLabelIndex(Name label) {
+Result<Index> IRBuilder::getLabelIndex(Name label, bool inDelegate) {
  auto it = labelDepths.find(label);
  if (it == labelDepths.end() || it->second.empty()) {


I checked the definition of labelDepths, and I'm not sure why the value needs to be a stack of indices rather than a single index. I thought it was mostly for translating a name to an index?

It's to handle shadowing of label names correctly. If we have this:

block $L
  block $LL
    block $L
      block $LL
        br $LL
      end
      br $LL
    end
  end
end

We still need to know where the second br $LL should go, so we can't have completely overwritten the outer $LL when we reached the inner $LL.
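A minimal sketch of why a stack per name is needed, assuming a simplified model: `LabelTracker`, `push` and `pop` are hypothetical names, and only `labelDepths` and `getLabelIndex` echo identifiers from the diff. Each name maps to the depths of all open scopes that use it, innermost last, so popping an inner scope reveals the outer one again.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical tracker for nested, possibly shadowing, labels.
class LabelTracker {
  // For each name, the absolute depths of the open scopes using it,
  // innermost last.
  std::unordered_map<std::string, std::vector<std::size_t>> labelDepths;
  // Labels of the currently open scopes, outermost first.
  std::vector<std::string> scopes;

public:
  void push(const std::string& label) {
    scopes.push_back(label);
    labelDepths[label].push_back(scopes.size());
  }

  void pop() {
    assert(!scopes.empty());
    labelDepths[scopes.back()].pop_back();
    scopes.pop_back();
  }

  // Relative branch depth of `label` (0 is the innermost open scope),
  // or nullopt if no open scope has that label.
  std::optional<std::size_t> getLabelIndex(const std::string& label) const {
    auto it = labelDepths.find(label);
    if (it == labelDepths.end() || it->second.empty()) {
      return std::nullopt;
    }
    return scopes.size() - it->second.back();
  }
};
```

After pushing `L`, `LL`, `L`, `LL`, the name `LL` resolves to depth 0. After one `pop`, it resolves to depth 1, which is the outer `$LL`; a single overwritten index per name would have lost that entry.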

Wow who allowed label shadowing... Thanks for the explanation.

aheejin · 2023-11-29T01:54:45Z

test/lit/wat-kitchen-sink.wast

+ ;; CHECK-NEXT:  (block $l
+ ;; CHECK-NEXT:   (try $__delegate__l
+ ;; CHECK-NEXT:    (do
+ ;; CHECK-NEXT:     (block $l_0
+ ;; CHECK-NEXT:      (block $l_1


When do we create wrapping blocks? I thought we do when there are branches targeting the label. This case already has an inner block though... Why do we have two more blocks in the result?

Because of how we handle delegate labels as prefixed versions of "normal" scope labels, the delegate $l here also sets the "normal" label for the try $l scope, so we end up emitting a wrapper block around the try as if there were a normal branch to $l as well.

We could add some logic to try to avoid this unnecessary wrapper block, but it didn't seem worth adding complexity to remove it.

In general the logic for adding wrapper blocks around a scope does not consider whether there are already blocks inside that scope.

aheejin · 2023-11-29T01:56:11Z

src/wasm/wasm-ir-builder.cpp

+      // The real label we're referencing, if it exists, has been shadowed by
+      // the `try`. Get the previous label with this name instead.


Can you add a comment with a very short code snippet or something to show what this case is?
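One reading of the shadowing case, written as a hypothetical stand-alone model rather than the actual IRBuilder code: `DelegateLabels` and `push` are invented names, and the lookup rule is an assumption based on the comment above and on the `inDelegate` parameter in the diff. Because a delegate's label is relative to the parent scope of its own try, a try that carries the same label as the delegate target must be skipped.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of label lookup with a delegate-specific mode.
//
//   try $l            ;; depth 1
//     try $l          ;; depth 2, shadows the outer $l
//     delegate $l     ;; must refer to the try at depth 1
//   catch_all
//   end
struct DelegateLabels {
  std::unordered_map<std::string, std::vector<std::size_t>> labelDepths;
  std::vector<std::string> scopes;

  // Open a scope; an empty string stands for an unlabeled scope.
  void push(const std::string& label) {
    scopes.push_back(label);
    if (!label.empty()) {
      labelDepths[label].push_back(scopes.size());
    }
  }

  std::optional<std::size_t> getLabelIndex(const std::string& label,
                                           bool inDelegate = false) const {
    auto it = labelDepths.find(label);
    if (it == labelDepths.end() || it->second.empty()) {
      return std::nullopt;
    }
    const std::vector<std::size_t>& depths = it->second;
    std::size_t current = scopes.size();
    std::size_t target = depths.back();
    if (inDelegate) {
      if (target == current) {
        // The label is shadowed by the try being delegated; use the
        // previous scope with this name, if there is one.
        if (depths.size() < 2) {
          return std::nullopt;
        }
        target = depths[depths.size() - 2];
      }
      --current; // indices are relative to the try's parent scope
    }
    return current - target;
  }
};
```

In the snippet from the comment, a normal branch to `$l` inside the inner try resolves to index 0 (the inner try), while the delegate lookup also returns 0 but counted from the parent scope, which is the outer try.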

aheejin · 2023-11-29T19:46:51Z

src/wasm/wasm-ir-builder.cpp

-Result<Index> IRBuilder::getLabelIndex(Name label) {
+Result<Index> IRBuilder::getLabelIndex(Name label, bool inDelegate) {
  auto it = labelDepths.find(label);
  if (it == labelDepths.end() || it->second.empty()) {


Wow who allowed label shadowing... Thanks for the explanation.

aheejin · 2023-11-29T21:32:15Z

src/wasm/wasm-ir-builder.cpp

+  return Ok{};
+}
+
+Result<> IRBuilder::visitDelegate(Index label) {


Does this correctly handle the case when delegate is within a catch? If so, how about adding a test case for that?

try $l
  ...
  try $l
    ...
  catch
    try
    delegate $l
  end
end

delegate here should go to the outer try, not the inner one.

Yes! I've just added test cases for that and the similar case where the delegate is in a catch_all now.

aheejin

Thanks for the explanations and sorry for the review delays!

tlively · 2023-11-29T23:43:59Z

Thank you for the thorough review!

tlively mentioned this pull request Nov 18, 2023

[Parser] Parse tags and throw #6126

Merged

tlively requested review from aheejin, ashleynh and kripken November 18, 2023 03:33

kripken reviewed Nov 20, 2023

tlively force-pushed the parser-tag-throw branch from 687b078 to be8295c Compare November 21, 2023 01:53

tlively mentioned this pull request Nov 21, 2023

Fix a bug with unreachable control flow in IRBuilder #6130

Merged

tlively force-pushed the parser-try branch from b24e536 to 3a81aad Compare November 21, 2023 01:53

Base automatically changed from parser-tag-throw to main November 21, 2023 07:50

tlively force-pushed the parser-try branch from 3a81aad to ee0fadf Compare November 21, 2023 07:54

aheejin reviewed Nov 22, 2023

aheejin reviewed Nov 29, 2023

simplify delegate name prefixing

f850379

aheejin reviewed Nov 29, 2023

add code examples to comments

9854381

aheejin reviewed Nov 29, 2023

fix folded delegate, test delegate in catch

258ec69

aheejin approved these changes Nov 29, 2023

tlively merged commit 71b9cc0 into main Nov 30, 2023

tlively deleted the parser-try branch November 30, 2023 01:52

aheejin mentioned this pull request Dec 12, 2023

[Parser] Parse rethrow #6155

Merged
