Skip to content

Conversation

Glyphack
Copy link
Contributor

@Glyphack Glyphack commented Apr 3, 2025

Summary

part of: #15655

I tried generating the source order function using code generation. I tried a simple approach, but it is not enough to generate all of them this way.

There is one good thing, that most of the implementations are fine with this. We only have a few that are not. So one benefit of this PR could be it eliminates a lot of the code, hence changing the AST structure will only leave a few places to be fixed.

The source_order field determines if a node requires a source order implementation. If it’s empty it means source order does not visit anything.

Initially I didn’t want to repeat the field names. But I found two things:

  • ExprIf statement unlike other statements does not have the fields defined in source order. This and also some fields do not need to be included in the visit. So we just need a way to determine order, and determine presence.
  • Relying on the fields sounds more complicated to me. Maybe another solution is to add a new attribute order to each field? I'm open to suggestions.
    But anyway, except for the ExprIf we don't need to write the field names in order. Just knowing what fields must be visited are enough.

Some nodes had a more complex visitor:

ExprCompare required zipping two fields.

ExprBoolOp required a match over the fields.

FstringValue required a match, I created a new walk_ function that does the match. and used it in code generation. I don’t think this provides real value. Because I mostly moved the code from one file to another. I was tried it as an option. I prefer to leave it in the code as before.

Some visitors visit a slice of items. Others visit a single element. I put a check on this in code generation to see if the field requires a for loop or not. I think better approach is to have a consistent style. So we can by default loop over any field that is a sequence.

For field types StringLiteralValue and BytesLiteralValue the types are not a sequence in toml definition. But they implement iter so they are iterated over. So the code generation does not properly identify this. So in the code I'm checking for their types.

Test Plan

All the tests should pass without any changes.
I checked the generated code to make sure it's the same as old code. I'm not sure if there's a test for the source order visitor.

@AlexWaygood AlexWaygood added the internal An internal refactor or improvement label Apr 3, 2025
@Glyphack Glyphack force-pushed the autogenerate-visit-source-order branch 2 times, most recently from aeb3638 to 42a3a21 Compare April 3, 2025 15:39
@Glyphack
Copy link
Contributor Author

Glyphack commented Apr 3, 2025

Hey @dcreager what do you think about this? there is some work left but I'll finish them. Just sending it now to get feedback.

Copy link
Contributor

github-actions bot commented Apr 3, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dcreager
Copy link
Member

dcreager commented Apr 3, 2025

Thanks for tackling this, this is exactly the kind of change I was hoping the auto-generation script would enable.

The source_order field determines if a node requires a source order implementation. If it’s empty it means source order does not visit anything.

Initially I didn’t want to repeat the field names. But I found two things:

  • ExprIf statement unlike other statements does not have the fields defined in source order. This and also some fields do not need to be included in the visit. So we just need a way to determine order, and determine presence.

  • Relying on the fields sounds more complicated to me. Maybe another solution is to add a new attribute order to each field? I'm open to suggestions.
    But anyway, except for the ExprIf we don't need to write the field names in order. Just knowing what fields must be visited are enough.

I think it's worth finding a way to not have so much repetition between the field list and this new source_order field. Most types want to include all fields in the source order walk, in the same order they appear in the field list. So that should be the default, if source_order is missing.

For types that need special handling (like ExprBoolOp) you could instead have custom_source_order = true.

For fields that shouldn't be included in the walk, you could have a field-level skip_source_order = true.

And for the types whose fields aren't in order, can we just reorder the field list to be in source order? Is there anything else that depends on the current field ordering for e.g. ExprIf? If so (and therefore we can't reorder), then you could have custom_source_order = ["field", "list"]. (Or maybe a different name if you don't want to conflict with the first option mentioned above)

@MichaReiser
Copy link
Member

I like the idea of custom_source_order = true. I think it's fine to hand write the visitor for the few nodes that have fields that need to be skipped or require visiting in a custom ordering.

The main purpose of the source code generators is to avoid boilerplate. Our goal isn't to write a full meta programming language :)

@Glyphack Glyphack force-pushed the autogenerate-visit-source-order branch 2 times, most recently from 3070b0b to f479aec Compare April 8, 2025 15:10
@Glyphack
Copy link
Contributor Author

Glyphack commented Apr 8, 2025

Thanks for the ideas, I added skip_source_order for fields & custom_source_order for nodes. The toml looks less messy now.

I documented the new fields and source order generation.

For ExprIf I updated the field order. The snapshots required an update so I updated them.

(Sorry for the long delay I'll keep it shorter for next round)

@Glyphack Glyphack force-pushed the autogenerate-visit-source-order branch from ee1a5e0 to dc05df3 Compare April 8, 2025 15:16
@Glyphack Glyphack marked this pull request as ready for review April 8, 2025 20:28
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. This overall looks good to me. I've a few smaller inline suggestions.

I think we should revert the AST field order changes and the changes to the SourceOrderVisitor. This PR should exclusively be around replacing the existing code without changing any semantics (or how structs are laid out in memory).

@Glyphack Glyphack force-pushed the autogenerate-visit-source-order branch from c190599 to 82c5d24 Compare April 9, 2025 17:12
@sharkdp sharkdp removed their request for review April 9, 2025 17:13
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, overall, looks good. One more inline comment about skip_source_order and, what I think, its overlap with type_to_visitor_function

@@ -77,8 +86,7 @@ fields = [
{ name = "name", type = "Identifier" },
{ name = "type_params", type = "Box<crate::TypeParams>?" },
{ name = "parameters", type = "Box<crate::Parameters>" },

{ name = "returns", type = "Expr?" },
{ name = "returns", type = "Expr?", is_annotation = true },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the meaning of in_annotation. Could we add some documentation for it.

Copy link
Contributor Author

@Glyphack Glyphack Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forgot about this one. This is only needed to determine the visitor function. For other expressions we use visit_expr but for annotations we have visit_annotation.
This could go to the generate script as well since we don't have a lot of annotations and it's not gonna change probably. WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also be true for StmtAnnAssign, TypeAlias and other statements or is it just this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it for StmtAnnAssign. For TypeAlias we did not have it before?

https://github.com/astral-sh/ruff/pull/17180/files#diff-97bb55be932e9d4bbd9219dfd4d62507ae171c8b642a3feceeceeb7fd4d1c6a6L121

But you are right for value it should be annotation. I changed it to annotation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is to preserve the behavior from the hand written visitor and to change the visiting (if that's indeed the case) in a separate PR

Comment on lines +776 to +798
type_to_visitor_function: dict[str, VisitorInfo] = {
"Decorator": VisitorInfo("visit_decorator"),
"Identifier": VisitorInfo("visit_identifier"),
"crate::TypeParams": VisitorInfo("visit_type_params", True),
"crate::Parameters": VisitorInfo("visit_parameters", True),
"Expr": VisitorInfo("visit_expr"),
"Stmt": VisitorInfo("visit_body", True),
"Arguments": VisitorInfo("visit_arguments", True),
"crate::Arguments": VisitorInfo("visit_arguments", True),
"Operator": VisitorInfo("visit_operator"),
"ElifElseClause": VisitorInfo("visit_elif_else_clause"),
"WithItem": VisitorInfo("visit_with_item"),
"MatchCase": VisitorInfo("visit_match_case"),
"ExceptHandler": VisitorInfo("visit_except_handler"),
"Alias": VisitorInfo("visit_alias"),
"UnaryOp": VisitorInfo("visit_unary_op"),
"DictItem": VisitorInfo("visit_dict_item"),
"Comprehension": VisitorInfo("visit_comprehension"),
"CmpOp": VisitorInfo("visit_cmp_op"),
"FStringValue": VisitorInfo("visit_f_string_value"),
"StringLiteralValue": VisitorInfo("visit_string_literal"),
"BytesLiteralValue": VisitorInfo("visit_bytes_literal"),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we change this to only include the exceptions. Most names can be derived directly from the type's name by converting to snake case.

It would also be nice if there's a way to specify those names without having to touch the generator script.

This also makes me wonder. Is this, sort of, the inverse of skip_source_order because we need an entry for every type except the types in skip_source_order. Could we use this in some way to simplify the configuration/ make it easier to extend in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off, this mapping is proportional to the number of types that fields can have, not proportional to the number of fields themselves or syntax nodes, so it's less worrisome to spell out all of the options. We already do that, for instance, in the definition of SourceOrderVisitor, since we have to explicitly list all of the visit_[foo] methods that a consumer might want to implement.

That said, I agree with Micha's suggestion to make this mapping only include the exceptions if possible.


There's also another option, which I don't know yet if I like. But I want to write it up to see what I feel about it.

You could also do this with a trait over in source_order.rs, that is implemented for each possible field type, and encodes how to turn that into a method call on the visitor. So something like:

trait FieldVisitor {
    fn visit<'a, V: SourceOrderVisitor<'a>>(&'a self, visitor: &mut V);
}

impl FieldVisitor for Decorator {
    fn visit<'a, V: SourceOrderVisitor<'a>>(&'a self, visitor: &mut V) {
        visitor.visit_decorator(self);
    }
}

/* and so on */

Things I don't like about this suggestion:

  • More verbose than the mapping you've defined in the generator script.

Things I do like:

  • Less verbose in the for-loop in the generator script (see below).
  • "How to visit each field" is defined right next to the SourceOrderVisitor trait itself, making it less likely they'll get out of sync.
  • Only have to touch this trait and its impls when you add a new field type, not when you add a new syntax node or field.
  • I personally find it a bit easier to understand, since you can see the Rust code directly. And the logic becomes very straightforward: for each field, we call its FieldVisitor::visit method.

What do others think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do others think?

That's an interesting trick.

Would the trait implementations be code-gened or hand written?

This is interesting. What I dislike about it is that it adds "accidental" complexity. It's something that's only necessary because we codegen the visitor.

I think I have a slight preference to keep the complexity in generate.py over spilling it into the generated nodes (and increase comp time a little)

I do like the things that you call out as positives :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea too. One other thing that made the visitors hard without touching the code here was that some visit a slice of that type(like visit_decorators) and some visit a singular type.

I'm probably wrong but I feel this change is overall useful even if there was no code generation.
But the diff will probably be huge for this one. I can do another PR and see how does the code end up if we apply this change.

My reply to first comment starts here:

This also makes me wonder. Is this, sort of, the inverse of skip_source_order because we need an entry for every type except the types in skip_source_order. Could we use this in some way to simplify the configuration/ make it easier to extend in the future.

@MichaReiser are you suggesting to generate the visitor name from the field name?
And for field types that are not straightforward have this table, so we remove most of the entries in the table.
Sorry I didn't get what does it have to do with skip_source_order.

I initially created the table for visitors that visit a sequence. I wanted to mark what visitors require a for loop.
Although for some like visit_decorators you can infer it's visiting a seq by the "s" at the end for alias it's not the case.

I'm thinking about this idea, to determine the visitor name:

  1. Lookup the table (for exceptional cases)
  2. Convert the name to snake_case and build the visitor
  3. Infer if it's visiting sequence by the s in the end

Hopefully the point 3 can be gone after I send a refactor either just making it consistent or applying Douglas's suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be great if we can explore this idea as a separate PR, unless it creates a lot of additional work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that. I think I'm also fine landing this PR as is and creating a separate PR to simplify this type_to_visitor_function. I let you and @dcreager decide on what you prefer. @dcreager can you go ahead and merge this PR if you're happy with it because I'll be out next week.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of inferring behavior from the presence of an s at the end of the name — that's seem too magical for what we're trying to accomplish here. If that's what we would have to do to avoid the table, I'd rather keep the table.

👍 to landing this as-is and iterating on possible changes in later PRs

Comment on lines +776 to +798
type_to_visitor_function: dict[str, VisitorInfo] = {
"Decorator": VisitorInfo("visit_decorator"),
"Identifier": VisitorInfo("visit_identifier"),
"crate::TypeParams": VisitorInfo("visit_type_params", True),
"crate::Parameters": VisitorInfo("visit_parameters", True),
"Expr": VisitorInfo("visit_expr"),
"Stmt": VisitorInfo("visit_body", True),
"Arguments": VisitorInfo("visit_arguments", True),
"crate::Arguments": VisitorInfo("visit_arguments", True),
"Operator": VisitorInfo("visit_operator"),
"ElifElseClause": VisitorInfo("visit_elif_else_clause"),
"WithItem": VisitorInfo("visit_with_item"),
"MatchCase": VisitorInfo("visit_match_case"),
"ExceptHandler": VisitorInfo("visit_except_handler"),
"Alias": VisitorInfo("visit_alias"),
"UnaryOp": VisitorInfo("visit_unary_op"),
"DictItem": VisitorInfo("visit_dict_item"),
"Comprehension": VisitorInfo("visit_comprehension"),
"CmpOp": VisitorInfo("visit_cmp_op"),
"FStringValue": VisitorInfo("visit_f_string_value"),
"StringLiteralValue": VisitorInfo("visit_string_literal"),
"BytesLiteralValue": VisitorInfo("visit_bytes_literal"),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First off, this mapping is proportional to the number of types that fields can have, not proportional to the number of fields themselves or syntax nodes, so it's less worrisome to spell out all of the options. We already do that, for instance, in the definition of SourceOrderVisitor, since we have to explicitly list all of the visit_[foo] methods that a consumer might want to implement.

That said, I agree with Micha's suggestion to make this mapping only include the exceptions if possible.


There's also another option, which I don't know yet if I like. But I want to write it up to see what I feel about it.

You could also do this with a trait over in source_order.rs, that is implemented for each possible field type, and encodes how to turn that into a method call on the visitor. So something like:

trait FieldVisitor {
    fn visit<'a, V: SourceOrderVisitor<'a>>(&'a self, visitor: &mut V);
}

impl FieldVisitor for Decorator {
    fn visit<'a, V: SourceOrderVisitor<'a>>(&'a self, visitor: &mut V) {
        visitor.visit_decorator(self);
    }
}

/* and so on */

Things I don't like about this suggestion:

  • More verbose than the mapping you've defined in the generator script.

Things I do like:

  • Less verbose in the for-loop in the generator script (see below).
  • "How to visit each field" is defined right next to the SourceOrderVisitor trait itself, making it less likely they'll get out of sync.
  • Only have to touch this trait and its impls when you add a new field type, not when you add a new syntax node or field.
  • I personally find it a bit easier to understand, since you can see the Rust code directly. And the logic becomes very straightforward: for each field, we call its FieldVisitor::visit method.

What do others think?

Comment on lines +819 to +836
visitor = type_to_visitor_function[field.parsed_ty.inner]
if field.is_annotation:
visitor = annotation_visitor_function

if field.parsed_ty.optional:
body += f"""
if let Some({field.name}) = {field.name} {{
visitor.{visitor.name}({field.name});
}}\n
"""
elif not visitor.accepts_sequence and field.parsed_ty.seq:
body += f"""
for elm in {field.name} {{
visitor.{visitor.name}(elm);
}}
"""
else:
body += f"visitor.{visitor.name}({field.name});\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trait would let you get rid of all of this logic (except for is_annotation, I think, since that's overriding how an Expr is visited):

Suggested change
visitor = type_to_visitor_function[field.parsed_ty.inner]
if field.is_annotation:
visitor = annotation_visitor_function
if field.parsed_ty.optional:
body += f"""
if let Some({field.name}) = {field.name} {{
visitor.{visitor.name}({field.name});
}}\n
"""
elif not visitor.accepts_sequence and field.parsed_ty.seq:
body += f"""
for elm in {field.name} {{
visitor.{visitor.name}(elm);
}}
"""
else:
body += f"visitor.{visitor.name}({field.name});\n"
if field.is_annotation:
body += f"""
visitor.visit_annotation({field.name});
"""
else:
body += f"{field.name}.visit(visitor);\n"

@Glyphack Glyphack force-pushed the autogenerate-visit-source-order branch 2 times, most recently from 4dbd1a1 to 144f253 Compare April 11, 2025 17:44
Comment on lines +776 to +798
type_to_visitor_function: dict[str, VisitorInfo] = {
"Decorator": VisitorInfo("visit_decorator"),
"Identifier": VisitorInfo("visit_identifier"),
"crate::TypeParams": VisitorInfo("visit_type_params", True),
"crate::Parameters": VisitorInfo("visit_parameters", True),
"Expr": VisitorInfo("visit_expr"),
"Stmt": VisitorInfo("visit_body", True),
"Arguments": VisitorInfo("visit_arguments", True),
"crate::Arguments": VisitorInfo("visit_arguments", True),
"Operator": VisitorInfo("visit_operator"),
"ElifElseClause": VisitorInfo("visit_elif_else_clause"),
"WithItem": VisitorInfo("visit_with_item"),
"MatchCase": VisitorInfo("visit_match_case"),
"ExceptHandler": VisitorInfo("visit_except_handler"),
"Alias": VisitorInfo("visit_alias"),
"UnaryOp": VisitorInfo("visit_unary_op"),
"DictItem": VisitorInfo("visit_dict_item"),
"Comprehension": VisitorInfo("visit_comprehension"),
"CmpOp": VisitorInfo("visit_cmp_op"),
"FStringValue": VisitorInfo("visit_f_string_value"),
"StringLiteralValue": VisitorInfo("visit_string_literal"),
"BytesLiteralValue": VisitorInfo("visit_bytes_literal"),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of inferring behavior from the presence of an s at the end of the name — that's seem too magical for what we're trying to accomplish here. If that's what we would have to do to avoid the table, I'd rather keep the table.

👍 to landing this as-is and iterating on possible changes in later PRs

@@ -1,7 +1,6 @@
---
source: crates/ruff_python_parser/tests/fixtures.rs
input_file: crates/ruff_python_parser/resources/inline/ok/ambiguous_lpar_with_items_if_expr.py
snapshot_kind: text
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know where these edits to the snapshot files are coming from? They don't look relevant to the changes you've made to the generation script

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is happening depending on what version of cargo-insta you have locally when running the tests(more context mitsuhiko/insta#676). I'll go ahead and revert back to keep it tidy.

@Glyphack Glyphack force-pushed the autogenerate-visit-source-order branch from dea4114 to fb3afd1 Compare April 15, 2025 16:02
Copy link
Member

@dcreager dcreager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks for addressing the snapshot changes!

As mentioned upthread, we can look at other simplifications as follow-on PRs. Happy to merge this as-is.

@dcreager dcreager merged commit 3ada36b into astral-sh:main Apr 17, 2025
22 checks passed
dcreager added a commit that referenced this pull request Apr 17, 2025
* main:
  [red-knot] Detect version-related syntax errors (#16379)
  [`pyflakes`] Add fix safety section (`F841`) (#17410)
  [red-knot] Add `KnownFunction` variants for `is_protocol`, `get_protocol_members` and `runtime_checkable` (#17450)
  Bump 0.11.6 (#17449)
  Auto generate `visit_source_order` (#17180)
  [red-knot] Initial tests for protocols (#17436)
  [red-knot] Dataclasses: synthesize `__init__` with proper signature (#17428)
  [red-knot] Dataclasses: support `order=True` (#17406)
dcreager added a commit that referenced this pull request Apr 18, 2025
* main: (123 commits)
  [red-knot] Handle explicit class specialization in type expressions (#17434)
  [red-knot] allow assignment expression in call compare narrowing (#17461)
  [red-knot] fix building unions with literals and AlwaysTruthy/AlwaysFalsy (#17451)
  [red-knot] Type narrowing for assertions (take 2) (#17345)
  [red-knot] class bases are not affected by __future__.annotations (#17456)
  [red-knot] Add support for overloaded functions (#17366)
  [`pyupgrade`] Add fix safety section to docs (`UP036`) (#17444)
  [red-knot] more type-narrowing in match statements (#17302)
  [red-knot] Add some narrowing for assignment expressions (#17448)
  [red-knot] Understand `typing.Protocol` and `typing_extensions.Protocol` as equivalent (#17446)
  Server: Use `min` instead of `max` to limit the number of threads (#17421)
  [red-knot] Detect version-related syntax errors (#16379)
  [`pyflakes`] Add fix safety section (`F841`) (#17410)
  [red-knot] Add `KnownFunction` variants for `is_protocol`, `get_protocol_members` and `runtime_checkable` (#17450)
  Bump 0.11.6 (#17449)
  Auto generate `visit_source_order` (#17180)
  [red-knot] Initial tests for protocols (#17436)
  [red-knot] Dataclasses: synthesize `__init__` with proper signature (#17428)
  [red-knot] Dataclasses: support `order=True` (#17406)
  [red-knot] Super-basic generic inference at call sites (#17301)
  ...
Glyphack added a commit to Glyphack/ruff that referenced this pull request Apr 20, 2025
This was a follow up task from astral-sh#17180 (comment)
separated into this PR.
@Glyphack Glyphack deleted the autogenerate-visit-source-order branch April 20, 2025 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants