Skip to content

Add Substrait Protobuf Visitor Framework #136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alexanderbianchi
Copy link

Summary

We've developed an efficient visitor pattern implementation for traversing Substrait protobuf plans in Go. We'd like to contribute this to the Substrait community as it solves common traversal needs in an idiomatic Go way.

Issue

Tradeoffs and design is outlined in #135

@CLAassistant
Copy link

CLAassistant commented Jun 17, 2025

CLA assistant check
All committers have signed the CLA.

## IMPORTANT: Read-Only Analysis Only
This interface is designed for read-only analysis, and assumes the tree does not change
as it is being walked. Any modification of the tree could disrupt the walking of the
tree causing nodes to be missed, visited out of order, or cause panics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some modifications would be reasonably safe with this interface. Here are a few possibilities: https://github.com/voltrondata/spark-substrait-gateway/tree/main/src/transforms

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, I can update the description to be less intense and just warn

v.count++
}

// Visitor that only cares about expressions (e.g., finding all functions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably for finding all utilized functions versus checking the extension list.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I can also push a plan context that will get the extension list. didn't want the one PR to get too big.


// Visit provides a convenient way to traverse a Substrait plan.
// It walks all relations in the plan, calling the appropriate visitor methods
// based on what interfaces the visitor implements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth noting that you have to visit the entire plan. So if you wanted to stop early once you found something you couldn't handle (such as an unsupported type) you couldn't bail early.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should have an ability to exit early? the visitRel/visitExpr could return an "ok" and if any of the visits return !ok we don't traverse deeper. Wouldn't stop all other recursive branches though.

//
// If your visitor needs plan context (for extensions, etc.), construct it
// with NewPlanContext and pass it to your visitor's constructor.
func Walk(rel *proto.Rel, visitor Visitor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use this to start a visit at any point in the tree? An alternative name could be VisitRelation since it also visits.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, yeah I like that name more. Will update.

)

// walk implements the core traversal logic for Substrait relations.
// For cycle detection in DAG structures, wrap your visitor with CycleDetectingVisitor.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see CycleDetectingVisitor in this PR. That said are there any use cases where we would need them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, not really? If you were accepting substrait protobufs that were built within the service then there isn't anything in go to stop a cycle, but if you unmarshal the protobuf then it couldn't have a cycle. I think it's just cleaner to remove this for now.

@alexanderbianchi
Copy link
Author

Thanks @EpsilonPrime for the review, will fix up the comments and push a revision.

Copy link

codecov bot commented Jun 23, 2025

Codecov Report

Attention: Patch coverage is 88.59935% with 35 lines in your changes missing coverage. Please review.

Project coverage is 64.84%. Comparing base (121a143) to head (f0523ff).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
traverse/walk.go 88.65% 22 Missing and 10 partials ⚠️
traverse/visitor.go 76.92% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #136      +/-   ##
==========================================
+ Coverage   64.18%   64.84%   +0.66%     
==========================================
  Files          44       47       +3     
  Lines       11029    11336     +307     
==========================================
+ Hits         7079     7351     +272     
- Misses       3631     3655      +24     
- Partials      319      330      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants