-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Auto generate visit_source_order
#17180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto generate visit_source_order
#17180
Conversation
aeb3638
to
42a3a21
Compare
Hey @dcreager what do you think about this? there is some work left but I'll finish them. Just sending it now to get feedback. |
|
Thanks for tackling this, this is exactly the kind of change I was hoping the auto-generation script would enable.
I think it's worth finding a way to not have so much repetition between the field list and this new For types that need special handling (like For fields that shouldn't be included in the walk, you could have a field-level And for the types whose fields aren't in order, can we just reorder the field list to be in source order? Is there anything else that depends on the current field ordering for e.g. |
I like the idea of The main purpose of the source code generators is to avoid boilerplate. Our goal isn't to write a full meta programming language :) |
3070b0b
to
f479aec
Compare
Thanks for the ideas, I added I documented the new fields and source order generation. For (Sorry for the long delay I'll keep it shorter for next round) |
ee1a5e0
to
dc05df3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. This overall looks good to me. I've a few smaller inline suggestions.
I think we should revert the AST field order changes and the changes to the SourceOrderVisitor
. This PR should exclusively be around replacing the existing code without changing any semantics (or how structs are laid out in memory).
c190599
to
82c5d24
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, overall, looks good. One more inline comment about skip_source_order
and, what I think, its overlap with type_to_visitor_function
@@ -77,8 +86,7 @@ fields = [ | |||
{ name = "name", type = "Identifier" }, | |||
{ name = "type_params", type = "Box<crate::TypeParams>?" }, | |||
{ name = "parameters", type = "Box<crate::Parameters>" }, | |||
|
|||
{ name = "returns", type = "Expr?" }, | |||
{ name = "returns", type = "Expr?", is_annotation = true }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the meaning of in_annotation
. Could we add some documentation for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I forgot about this one. This is only needed to determine the visitor function. For other expressions we use visit_expr
but for annotations we have visit_annotation
.
This could go to the generate script as well since we don't have a lot of annotations and it's not gonna change probably. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also be true for StmtAnnAssign
, TypeAlias
and other statements or is it just this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it for StmtAnnAssign
. For TypeAlias
we did not have it before?
But you are right for value it should be annotation. I changed it to annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is to preserve the behavior from the hand written visitor and to change the visiting (if that's indeed the case) in a separate PR
type_to_visitor_function: dict[str, VisitorInfo] = { | ||
"Decorator": VisitorInfo("visit_decorator"), | ||
"Identifier": VisitorInfo("visit_identifier"), | ||
"crate::TypeParams": VisitorInfo("visit_type_params", True), | ||
"crate::Parameters": VisitorInfo("visit_parameters", True), | ||
"Expr": VisitorInfo("visit_expr"), | ||
"Stmt": VisitorInfo("visit_body", True), | ||
"Arguments": VisitorInfo("visit_arguments", True), | ||
"crate::Arguments": VisitorInfo("visit_arguments", True), | ||
"Operator": VisitorInfo("visit_operator"), | ||
"ElifElseClause": VisitorInfo("visit_elif_else_clause"), | ||
"WithItem": VisitorInfo("visit_with_item"), | ||
"MatchCase": VisitorInfo("visit_match_case"), | ||
"ExceptHandler": VisitorInfo("visit_except_handler"), | ||
"Alias": VisitorInfo("visit_alias"), | ||
"UnaryOp": VisitorInfo("visit_unary_op"), | ||
"DictItem": VisitorInfo("visit_dict_item"), | ||
"Comprehension": VisitorInfo("visit_comprehension"), | ||
"CmpOp": VisitorInfo("visit_cmp_op"), | ||
"FStringValue": VisitorInfo("visit_f_string_value"), | ||
"StringLiteralValue": VisitorInfo("visit_string_literal"), | ||
"BytesLiteralValue": VisitorInfo("visit_bytes_literal"), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we change this to only include the exceptions. Most names can be derived directly from the type's name by converting to snake case.
It would also be nice if there's a way to specify those names without having to touch the generator script.
This also makes me wonder. Is this, sort of, the inverse of skip_source_order
because we need an entry for every type except the types in skip_source_order
. Could we use this in some way to simplify the configuration/ make it easier to extend in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off, this mapping is proportional to the number of types that fields can have, not proportional to the number of fields themselves or syntax nodes, so it's less worrisome to spell out all of the options. We already do that, for instance, in the definition of SourceOrderVisitor
, since we have to explicitly list all of the visit_[foo]
methods that a consumer might want to implement.
That said, I agree with Micha's suggestion to make this mapping only include the exceptions if possible.
There's also another option, which I don't know yet if I like. But I want to write it up to see what I feel about it.
You could also do this with a trait over in source_order.rs
, that is implemented for each possible field type, and encodes how to turn that into a method call on the visitor. So something like:
trait FieldVisitor {
fn visit<'a, V: SourceOrderVisitor<'a>>(&'a self, visitor: &mut V);
}
impl FieldVisitor for Decorator {
fn visit<'a, V: SourceOrderVisitor<'a>>(&'a self, visitor: &mut V) {
visitor.visit_decorator(self);
}
}
/* and so on */
Things I don't like about this suggestion:
- More verbose than the mapping you've defined in the generator script.
Things I do like:
- Less verbose in the for-loop in the generator script (see below).
- "How to visit each field" is defined right next to the
SourceOrderVisitor
trait itself, making it less likely they'll get out of sync. - Only have to touch this trait and its impls when you add a new field type, not when you add a new syntax node or field.
- I personally find it a bit easier to understand, since you can see the Rust code directly. And the logic becomes very straightforward: for each field, we call its
FieldVisitor::visit
method.
What do others think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do others think?
That's an interesting trick.
Would the trait implementations be code-gened or hand written?
This is interesting. What I dislike about it is that it adds "accidental" complexity. It's something that's only necessary because we codegen the visitor.
I think I have a slight preference to keep the complexity in generate.py
over spilling it into the generated nodes (and increase comp time a little)
I do like the things that you call out as positives :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea too. One other thing that made the visitors hard without touching the code here was that some visit a slice of that type(like visit_decorators
) and some visit a singular type.
I'm probably wrong but I feel this change is overall useful even if there was no code generation.
But the diff will probably be huge for this one. I can do another PR and see how does the code end up if we apply this change.
My reply to first comment starts here:
This also makes me wonder. Is this, sort of, the inverse of skip_source_order because we need an entry for every type except the types in skip_source_order. Could we use this in some way to simplify the configuration/ make it easier to extend in the future.
@MichaReiser are you suggesting to generate the visitor name from the field name?
And for field types that are not straightforward have this table, so we remove most of the entries in the table.
Sorry I didn't get what does it have to do with skip_source_order
.
I initially created the table for visitors that visit a sequence. I wanted to mark what visitors require a for loop.
Although for some like visit_decorators
you can infer it's visiting a seq by the "s" at the end for alias
it's not the case.
I'm thinking about this idea, to determine the visitor name:
- Lookup the table (for exceptional cases)
- Convert the name to snake_case and build the visitor
- Infer if it's visiting sequence by the
s
in the end
Hopefully the point 3 can be gone after I send a refactor either just making it consistent or applying Douglas's suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be great if we can explore this idea as a separate PR, unless it creates a lot of additional work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of inferring behavior from the presence of an s
at the end of the name — that's seem too magical for what we're trying to accomplish here. If that's what we would have to do to avoid the table, I'd rather keep the table.
👍 to landing this as-is and iterating on possible changes in later PRs
type_to_visitor_function: dict[str, VisitorInfo] = { | ||
"Decorator": VisitorInfo("visit_decorator"), | ||
"Identifier": VisitorInfo("visit_identifier"), | ||
"crate::TypeParams": VisitorInfo("visit_type_params", True), | ||
"crate::Parameters": VisitorInfo("visit_parameters", True), | ||
"Expr": VisitorInfo("visit_expr"), | ||
"Stmt": VisitorInfo("visit_body", True), | ||
"Arguments": VisitorInfo("visit_arguments", True), | ||
"crate::Arguments": VisitorInfo("visit_arguments", True), | ||
"Operator": VisitorInfo("visit_operator"), | ||
"ElifElseClause": VisitorInfo("visit_elif_else_clause"), | ||
"WithItem": VisitorInfo("visit_with_item"), | ||
"MatchCase": VisitorInfo("visit_match_case"), | ||
"ExceptHandler": VisitorInfo("visit_except_handler"), | ||
"Alias": VisitorInfo("visit_alias"), | ||
"UnaryOp": VisitorInfo("visit_unary_op"), | ||
"DictItem": VisitorInfo("visit_dict_item"), | ||
"Comprehension": VisitorInfo("visit_comprehension"), | ||
"CmpOp": VisitorInfo("visit_cmp_op"), | ||
"FStringValue": VisitorInfo("visit_f_string_value"), | ||
"StringLiteralValue": VisitorInfo("visit_string_literal"), | ||
"BytesLiteralValue": VisitorInfo("visit_bytes_literal"), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First off, this mapping is proportional to the number of types that fields can have, not proportional to the number of fields themselves or syntax nodes, so it's less worrisome to spell out all of the options. We already do that, for instance, in the definition of SourceOrderVisitor
, since we have to explicitly list all of the visit_[foo]
methods that a consumer might want to implement.
That said, I agree with Micha's suggestion to make this mapping only include the exceptions if possible.
There's also another option, which I don't know yet if I like. But I want to write it up to see what I feel about it.
You could also do this with a trait over in source_order.rs
, that is implemented for each possible field type, and encodes how to turn that into a method call on the visitor. So something like:
trait FieldVisitor {
fn visit<'a, V: SourceOrderVisitor<'a>>(&'a self, visitor: &mut V);
}
impl FieldVisitor for Decorator {
fn visit<'a, V: SourceOrderVisitor<'a>>(&'a self, visitor: &mut V) {
visitor.visit_decorator(self);
}
}
/* and so on */
Things I don't like about this suggestion:
- More verbose than the mapping you've defined in the generator script.
Things I do like:
- Less verbose in the for-loop in the generator script (see below).
- "How to visit each field" is defined right next to the
SourceOrderVisitor
trait itself, making it less likely they'll get out of sync. - Only have to touch this trait and its impls when you add a new field type, not when you add a new syntax node or field.
- I personally find it a bit easier to understand, since you can see the Rust code directly. And the logic becomes very straightforward: for each field, we call its
FieldVisitor::visit
method.
What do others think?
visitor = type_to_visitor_function[field.parsed_ty.inner] | ||
if field.is_annotation: | ||
visitor = annotation_visitor_function | ||
|
||
if field.parsed_ty.optional: | ||
body += f""" | ||
if let Some({field.name}) = {field.name} {{ | ||
visitor.{visitor.name}({field.name}); | ||
}}\n | ||
""" | ||
elif not visitor.accepts_sequence and field.parsed_ty.seq: | ||
body += f""" | ||
for elm in {field.name} {{ | ||
visitor.{visitor.name}(elm); | ||
}} | ||
""" | ||
else: | ||
body += f"visitor.{visitor.name}({field.name});\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trait would let you get rid of all of this logic (except for is_annotation
, I think, since that's overriding how an Expr
is visited):
visitor = type_to_visitor_function[field.parsed_ty.inner] | |
if field.is_annotation: | |
visitor = annotation_visitor_function | |
if field.parsed_ty.optional: | |
body += f""" | |
if let Some({field.name}) = {field.name} {{ | |
visitor.{visitor.name}({field.name}); | |
}}\n | |
""" | |
elif not visitor.accepts_sequence and field.parsed_ty.seq: | |
body += f""" | |
for elm in {field.name} {{ | |
visitor.{visitor.name}(elm); | |
}} | |
""" | |
else: | |
body += f"visitor.{visitor.name}({field.name});\n" | |
if field.is_annotation: | |
body += f""" | |
visitor.visit_annotation({field.name}); | |
""" | |
else: | |
body += f"{field.name}.visit(visitor);\n" |
4dbd1a1
to
144f253
Compare
type_to_visitor_function: dict[str, VisitorInfo] = { | ||
"Decorator": VisitorInfo("visit_decorator"), | ||
"Identifier": VisitorInfo("visit_identifier"), | ||
"crate::TypeParams": VisitorInfo("visit_type_params", True), | ||
"crate::Parameters": VisitorInfo("visit_parameters", True), | ||
"Expr": VisitorInfo("visit_expr"), | ||
"Stmt": VisitorInfo("visit_body", True), | ||
"Arguments": VisitorInfo("visit_arguments", True), | ||
"crate::Arguments": VisitorInfo("visit_arguments", True), | ||
"Operator": VisitorInfo("visit_operator"), | ||
"ElifElseClause": VisitorInfo("visit_elif_else_clause"), | ||
"WithItem": VisitorInfo("visit_with_item"), | ||
"MatchCase": VisitorInfo("visit_match_case"), | ||
"ExceptHandler": VisitorInfo("visit_except_handler"), | ||
"Alias": VisitorInfo("visit_alias"), | ||
"UnaryOp": VisitorInfo("visit_unary_op"), | ||
"DictItem": VisitorInfo("visit_dict_item"), | ||
"Comprehension": VisitorInfo("visit_comprehension"), | ||
"CmpOp": VisitorInfo("visit_cmp_op"), | ||
"FStringValue": VisitorInfo("visit_f_string_value"), | ||
"StringLiteralValue": VisitorInfo("visit_string_literal"), | ||
"BytesLiteralValue": VisitorInfo("visit_bytes_literal"), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of inferring behavior from the presence of an s
at the end of the name — that's seem too magical for what we're trying to accomplish here. If that's what we would have to do to avoid the table, I'd rather keep the table.
👍 to landing this as-is and iterating on possible changes in later PRs
@@ -1,7 +1,6 @@ | |||
--- | |||
source: crates/ruff_python_parser/tests/fixtures.rs | |||
input_file: crates/ruff_python_parser/resources/inline/ok/ambiguous_lpar_with_items_if_expr.py | |||
snapshot_kind: text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know where these edits to the snapshot files are coming from? They don't look relevant to the changes you've made to the generation script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is happening depending on what version of cargo-insta you have locally when running the tests(more context mitsuhiko/insta#676). I'll go ahead and revert back to keep it tidy.
crates/ruff_python_parser/tests/snapshots/valid_syntax@nested_async_comprehension_py310.py.snap
Outdated
Show resolved
Hide resolved
dea4114
to
fb3afd1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks for addressing the snapshot changes!
As mentioned upthread, we can look at other simplifications as follow-on PRs. Happy to merge this as-is.
* main: [red-knot] Detect version-related syntax errors (#16379) [`pyflakes`] Add fix safety section (`F841`) (#17410) [red-knot] Add `KnownFunction` variants for `is_protocol`, `get_protocol_members` and `runtime_checkable` (#17450) Bump 0.11.6 (#17449) Auto generate `visit_source_order` (#17180) [red-knot] Initial tests for protocols (#17436) [red-knot] Dataclasses: synthesize `__init__` with proper signature (#17428) [red-knot] Dataclasses: support `order=True` (#17406)
* main: (123 commits) [red-knot] Handle explicit class specialization in type expressions (#17434) [red-knot] allow assignment expression in call compare narrowing (#17461) [red-knot] fix building unions with literals and AlwaysTruthy/AlwaysFalsy (#17451) [red-knot] Type narrowing for assertions (take 2) (#17345) [red-knot] class bases are not affected by __future__.annotations (#17456) [red-knot] Add support for overloaded functions (#17366) [`pyupgrade`] Add fix safety section to docs (`UP036`) (#17444) [red-knot] more type-narrowing in match statements (#17302) [red-knot] Add some narrowing for assignment expressions (#17448) [red-knot] Understand `typing.Protocol` and `typing_extensions.Protocol` as equivalent (#17446) Server: Use `min` instead of `max` to limit the number of threads (#17421) [red-knot] Detect version-related syntax errors (#16379) [`pyflakes`] Add fix safety section (`F841`) (#17410) [red-knot] Add `KnownFunction` variants for `is_protocol`, `get_protocol_members` and `runtime_checkable` (#17450) Bump 0.11.6 (#17449) Auto generate `visit_source_order` (#17180) [red-knot] Initial tests for protocols (#17436) [red-knot] Dataclasses: synthesize `__init__` with proper signature (#17428) [red-knot] Dataclasses: support `order=True` (#17406) [red-knot] Super-basic generic inference at call sites (#17301) ...
This was a follow up task from astral-sh#17180 (comment) separated into this PR.
Summary
part of: #15655
I tried generating the source order function using code generation. I tried a simple approach, but it is not enough to generate all of them this way.
There is one good thing, that most of the implementations are fine with this. We only have a few that are not. So one benefit of this PR could be it eliminates a lot of the code, hence changing the AST structure will only leave a few places to be fixed.
The
source_order
field determines if a node requires a source order implementation. If it’s empty it means source order does not visit anything.Initially I didn’t want to repeat the field names. But I found two things:
ExprIf
statement unlike other statements does not have the fields defined in source order. This and also some fields do not need to be included in the visit. So we just need a way to determine order, and determine presence.order
to each field? I'm open to suggestions.But anyway, except for the
ExprIf
we don't need to write the field names in order. Just knowing what fields must be visited are enough.Some nodes had a more complex visitor:
ExprCompare
required zipping two fields.ExprBoolOp
required a match over the fields.FstringValue
required a match, I created a new walk_ function that does the match. and used it in code generation. I don’t think this provides real value. Because I mostly moved the code from one file to another. I was tried it as an option. I prefer to leave it in the code as before.Some visitors visit a slice of items. Others visit a single element. I put a check on this in code generation to see if the field requires a for loop or not. I think better approach is to have a consistent style. So we can by default loop over any field that is a sequence.
For field types
StringLiteralValue
andBytesLiteralValue
the types are not a sequence in toml definition. But they implementiter
so they are iterated over. So the code generation does not properly identify this. So in the code I'm checking for their types.Test Plan
All the tests should pass without any changes.
I checked the generated code to make sure it's the same as old code. I'm not sure if there's a test for the source order visitor.