Skip to content

Conversation

@tlively
Copy link
Member

@tlively tlively commented Nov 18, 2023

Parse the legacy v3 syntax for try/catch/catch_all/delegate in both its folded
and unfolded forms.

The first source of significant complexity is the optional IDs after catch
and catch_all in the unfolded form, which can be confused for tag indices and
require backtracking to parse correctly.

The second source of complexity is the handling of delegate labels, which are
relative to the try's parent scope despite being parsed after the try's scope
has already started. Handling this correctly requires punching a hole big
enough to drive a truck through through both the parser and IRBuilder
abstractions.

@tlively
Copy link
Member Author

tlively commented Nov 18, 2023


// It can be ambiguous whether the name after `catch` is intended to be the
// optional ID or the tag identifier. First try parsing with the optional
// ID, then retry without parsing the ID if we run into an error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, was there no way to avoid this ambiguity in the wasm text format?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No one uses these optional IDs afaik, so one option would have been to not have them. But then it would have been inconsistent with the text format for other control flow structures 😱

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this optional ID thing? I didn't know this kind of thing existed until now... 🤯 Do we ever use this? Do we have tests that have this? (I can't seem to find any in this PR. Please let me know if I missed them)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optional ID only appears in the unfolded versions of control flow instructions. For block, for example, it lets you repeat the block label at the end (like block $foo ... end $foo), which can help readability when blocks are very long and you're trying to figure out where branches to $foo will end up. Since these IDs are only allowed in the unfolded format, we only use them in wat-kitchen-sink.wast so far. In this PR they are tested in $try-br-name.

// keep this complexity contained within the handling of trys and
// delegates, pretend there is just the single normal label and add a
// prefix to it to generate the delegate label. The prefix is added on the
// try when its scope ends.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have this complexity in the old parsers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because we don't allow branches to anything except for blocks and loops. The old parser also doesn't support delegating to any scope except to a parent try directly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Is it not simpler to add a new block as needed, as the old parsers do? (and rely on the optimizer to fold unneeded ones etc.)

But I don't feel strongly here if this feels like the right solution to you. lgtm overall, though I see test failures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do still add a new block as needed for normal control flow branches that target this try. But adding a new block won't help for delegates that target the try because in Binaryen IR, delegate must target the Try directly and cannot target a block.

Copy link
Member

@aheejin aheejin Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's the difference between the old way and the new way? In the old parser/IR, try's labels can only be targeted from delegates and not branches. Also branches cannot jump to trys; we created an internal block to mimic jumping to a try. Are these not true anymore in the new parser/IR? I haven't reviewed the other PRs for the new parser/IRBuilder so I'd appreciate the context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not changing the rules for the IR, so it's still the case that delegate can only target named Try directly in the IR and that normal branches cannot target Try. However, in the input text we are parsing with the new parser, we are following the standard and allowing any branch to target a try and allowing delegates to target any kind of control flow. Most of the complexity here is because we are converting from input following the standard rules to IR following our more restrictive IR rules on the fly.

@tlively
Copy link
Member Author

tlively commented Nov 21, 2023

The test failures turned out to be a pre-existing bug that I fixed independently in #6130.

Base automatically changed from parser-tag-throw to main November 21, 2023 07:50
Parse the legacy v3 syntax for try/catch/catch_all/delegate in both its folded
and unfolded forms.

The first sources of significant complexity is the optional IDs after `catch`
and `catch_all` in the unfolded form, which can be confused for tag indices and
require backtracking to parse correctly.

The second source of complexity is the handling of delegate labels, which are
relative to the try's parent scope despite being parsed after the try's scope
has already started. Handling this correctly requires punching a whole big
enough to drive a truck through through both the parser and IRBuilder
abstractions.
Copy link
Member

@aheejin aheejin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed reply!


// It can be ambiguous whether the name after `catch` is intended to be the
// optional ID or the tag identifier. First try parsing with the optional
// ID, then retry without parsing the ID if we run into an error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this optional ID thing? I didn't know this kind of thing existed until now... 🤯 Do we ever use this? Do we have tests that have this? (I can't seem to find any in this PR. Please let me know if I missed them)

Comment on lines +978 to +983
if (parseID && tag.getErr()) {
// Instead of returning an error, retry without the ID.
parseID = false;
ctx.in.lexer.setIndex(afterCatchPos);
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(While I'm not sure what the ID is in the first place...) Is this trying to parse an ID or a tag? I thought the upper if (line 967-975) parses ID (if any) and this parses tag. But this seems to set parseID to false if tag parsing went unsuccessful. Why?

And does this optional ID only exist in the non-folded format?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that lines 967-975 parse the ID. This logic here tries to parse a tag (on line 977) and if that fails, it resets the parser to try again without trying to parse an ID.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a tag parsing fails, why do we set parseID false instead?
And what does it retry to do? Given that we set parseID false here, it doesn't retry to parse an ID. Then does it retry to parse a tag? But tag parsing just failed now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ambiguous case this is handling looks like this:

(tag $t)
(func $ambiguous
  try $t
  catch $t
  end
)

When parsing the catch, the parser first tries to parse an optional ID that must match the label of the try, and it succeeds because it sees $t after the catch. However, when it then tries to parse the mandatory tag index, it fails because the next token is end. The problem is that the $t after the catch was the tag name and there was no optional ID after all. The parser sets parseID = false and resets to just after the catch, and now it skips parsing the optional ID so it correctly parses the $t as a tag name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Can you add this explanation in the comments? I think it will help reading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

static ScopeCtx makeTry(Try* tryy, Name originalLabel = {}) {
return ScopeCtx(TryScope{tryy, originalLabel});
}
static ScopeCtx makeCatch(Try* tryy, Name originalLabel, Name label) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are label and originalLabel respectively and how are they different?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originalLabel is the name as it appears in the text input. label is a version of originalLabel that might have been modified to avoid having duplicate labels, which are allowed in the standard text format but not in Binaryen IR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to keep originalLabel internally then? Is that because you try to roundtrip the labels as given? (If so, given that the code itself does not roundtrip anyway after being stored as a Binaryen IR, is it necessary?)

And why do if and try need only a originalLabel while else and catch needs both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to keep the scope's originalLabel around because it can be used by branch instructions inside the scope and we need to manage the mapping from originalLabel to internal label, including removing the mapping when the scope ends.

Copy link
Member

@aheejin aheejin Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Why do if and try need only a originalLabel while else and catch needs both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the labels are only created lazily once we encounter a branch or delegate to a scope, it's not possible for there to be a nontrivial label at the beginning of an if or try scope. But at the beginning of an else or catch scope, there might have been a branch in the corresponding if or try, so we might have already generated a label that we need to propagate along to make sure branches in the else or catch go to the right place.

Comment on lines +222 to +236
} else if (auto* tryy = scope.getTry()) {
std::cerr << "try";
if (tryy->name) {
std::cerr << " " << tryy->name;
}
} else if (auto* tryy = scope.getCatch()) {
std::cerr << "catch";
if (tryy->name) {
std::cerr << " " << tryy->name;
}
} else if (auto* tryy = scope.getCatchAll()) {
std::cerr << "catch_all";
if (tryy->name) {
std::cerr << " " << tryy->name;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we print name only with trys and not with blocks and loops?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

block and loop names will be printed down below when we print the OriginalLabel. In contrast, Try names aren't considered OriginalLabels because they are treated specially and used only for delegates, so if we want to print Try names we have to go out of our way to print them here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said originalLabel is the name as it appears in the text input. above. Then a try doesn't have an originalLabel even if the input has one? I guess I'm confused because I don't understand the purpose of originalLabel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I have this text input:

try $myTry
 try
 delegate $myTry
catch $tag
end

Then the originalLabel for that try is myTry and the name field in the Try IR node will be empty until we reach the delegate, at which point we will set it to myTry as well. Then when we finish the catch scope we will set the name to __delegate__myTry to differentiate it from the wrapper block $myTry that we use the originalName for.

Comment on lines +504 to +505
tryy->name = Name();
pushScope(ScopeCtx::makeTry(tryy, label));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are try->name and the given label parameter different things?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; the label is the name used to branch to the try with normal branches like br and br_if. It will end up being the name of a block we wrap around the try. tryy->name will ultimately end up being another name used to delegate to this try. Unfortunately we can't reuse label for that because the wrapper block and the try must have different names to validate.

Comment on lines +624 to +629
bool wasTry = true;
auto* tryy = scope.getTry();
if (!tryy) {
wasTry = false;
tryy = scope.getCatch();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it guaranteed that we are in a catch by the time we enter visitCatch? Do we need to check for this? Maybe I am not familiar with how to use these visit methods in the new IRBuilder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the text parser you're right that we are guaranteed to be in the catch by the time we enter visitCatch. However, when we start using IRBuilder in the binary parser, an incorrect binary could have the opcode for catch anywhere and we might end up calling visitCatch when we're not in a try context. To keep the future binary parser as simple as possible and to help catch any other bugs we might have using IRBuilder, I've built IRBuilder to not assume it's being used correctly and to report meaningful errors about incorrect usage instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understood it correctly, so let me know if I'm mistaken:

  • Line 626-629 means when we parse a non-first catch in a sequence of try-catch-catch-..., which is not an error. When parsing a non-first valid catch in a try-catch-catch-..., getTry() returns null but getCatch() returns a scope.
  • Line 630-632 means when a catch appears after neither of a try or catch.

Are these correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly right. getTry() and getCatch() both return the Try* associated with the scope or null.

// keep this complexity contained within the handling of trys and
// delegates, pretend there is just the single normal label and add a
// prefix to it to generate the delegate label. The prefix is added on the
// try when its scope ends.
Copy link
Member

@aheejin aheejin Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what's the difference between the old way and the new way? In the old parser/IR, try's labels can only be targeted from delegates and not branches. Also branches cannot jump to trys; we created an internal block to mimic jumping to a try. Are these not true anymore in the new parser/IR? I haven't reviewed the other PRs for the new parser/IRBuilder so I'd appreciate the context.

Comment on lines 697 to 699
delegateTry->name = *getLabelName(label);
tryy->delegateTarget =
Name(std::string("__delegate__") + delegateTry->name.toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it more complicated if we prefix delegateTry->name with __delegate__ here earlier and not when we wrap up the try?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, no, it's much simpler to do the prefixing here. I just pushed an update that makes this change.

}

Result<Index> IRBuilder::getLabelIndex(Name label) {
Result<Index> IRBuilder::getLabelIndex(Name label, bool inDelegate) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we possibly call visitDelegate after calling visitEnd for that try and avoid this complexity? Setting delegateTarget afterwards does not affect the result of Try::finalize.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem there is that visitDelegate wouldn't know which Try to add a delegateTarget to because visitEnd will have popped the Try off the scope stack. I suppose we could try to go find the Try at the top of the expressionStack of the new top scope, but that also seems like a hack. I prefer this current hack because the weirdness is at least captured in function signatures. If we tried to find the Try on the scope stack, the leaky abstraction wouldn't be visible or documented in any interfaces.

Result<Index> IRBuilder::getLabelIndex(Name label) {
Result<Index> IRBuilder::getLabelIndex(Name label, bool inDelegate) {
auto it = labelDepths.find(label);
if (it == labelDepths.end() || it->second.empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the definition of labelDepths, and not sure why the value needs to be not a single index but a stack of indices. I thought that is mostly for translating a name to an index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's to handle shadowing of label names correctly. If we have this:

block $L
 block $LL
  block $L
    block $LL
      br $LL
    end
    br $LL
  end
end

We still need to know where the second br $LL should go, so we can't have completely overwritten the outer $LL when we reached the inner $LL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow who allowed label shadowing... Thanks for the explanation.

Comment on lines +1768 to +1772
;; CHECK-NEXT: (block $l
;; CHECK-NEXT: (try $__delegate__l
;; CHECK-NEXT: (do
;; CHECK-NEXT: (block $l_0
;; CHECK-NEXT: (block $l_1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we create wrapping blocks? I thought we do when there are branches targeting the label. This case already has an inner block though... Why do we have two more blocks in the result?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of how we handle delegate labels as prefixed versions of "normal" scope labels, the delegate $l here also sets the "normal" label for the try $l scope, so we end up emitting a wrapper block around the try as if there were a normal branch to $l as well.

We could add some logic to try to avoid this unnecessary wrapper block, but it didn't seem worth adding complexity to remove it.

In general the logic for adding wrapper blocks around a scope does not consider whether there are already blocks inside that scope.

Comment on lines 784 to 785
// The real label we're referencing, if it exists, has been shadowed by
// the `try`. Get the previous label with this name instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment with a very short code snippet or something to show what this case is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

Result<Index> IRBuilder::getLabelIndex(Name label) {
Result<Index> IRBuilder::getLabelIndex(Name label, bool inDelegate) {
auto it = labelDepths.find(label);
if (it == labelDepths.end() || it->second.empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow who allowed label shadowing... Thanks for the explanation.

return Ok{};
}

Result<> IRBuilder::visitDelegate(Index label) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this correctly handle the case when delegate is within a catch? If so, how about adding a test case for that?

try $l
  ...
  try $l
    ...
  catch
    try
    delegate $l
  end
end

delegate here should go to the outer try, not the inner one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I've just added test cases for that and the similar case where the delegate is in a catch_all now.

Copy link
Member

@aheejin aheejin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanations and sorry for the review delays!

@tlively
Copy link
Member Author

tlively commented Nov 29, 2023

Thank you for the thorough review!

@tlively tlively merged commit 71b9cc0 into main Nov 30, 2023
@tlively tlively deleted the parser-try branch November 30, 2023 01:52
@aheejin aheejin mentioned this pull request Dec 12, 2023
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
Parse the legacy v3 syntax for try/catch/catch_all/delegate in both its folded
and unfolded forms.

The first sources of significant complexity is the optional IDs after `catch`
and `catch_all` in the unfolded form, which can be confused for tag indices and
require backtracking to parse correctly.

The second source of complexity is the handling of delegate labels, which are
relative to the try's parent scope despite being parsed after the try's scope
has already started. Handling this correctly requires punching a whole big
enough to drive a truck through through both the parser and IRBuilder
abstractions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants