docs: add spark connect doc page #3919
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #3919      +/-   ##
==========================================
+ Coverage   75.35%   77.60%   +2.25%
==========================================
  Files         767      768       +1
  Lines      103619    98818    -4801
==========================================
- Hits        78080    76689    -1391
+ Misses      25539    22129    -3410
```
```
@@ -0,0 +1,30 @@
# PySpark

The `daft.pyspark` module provides a way to create a PySpark session that can be run locally or backed by a ray cluster.
```
Can we add a relative link between the user guide and the API docs for the `daft.pyspark` method?
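For context, a minimal sketch of what creating such a session might look like. Only the `daft.pyspark` module name comes from the diff above; the `.local()` and `.remote(...)` builder methods are assumptions, so check the merged doc page for the exact API.

```python
# Hypothetical sketch: a PySpark-compatible session backed by Daft.
# Only the `daft.pyspark` module name comes from this PR; the builder
# methods below are assumptions about its API.
from daft.pyspark import SparkSession

# Run the Daft engine locally behind a PySpark-compatible session.
spark = SparkSession.builder.local().getOrCreate()

# Or back the session with an existing Ray cluster instead
# (the address is a placeholder).
# spark = SparkSession.builder.remote("ray://<head-node-address>:10001").getOrCreate()
```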
docs/mkdocs.yml (outdated)
```
@@ -43,6 +43,7 @@ nav:
   - Tutorials: resources/tutorials.md
   - Benchmarks: resources/benchmarks/tpch.md # Benchmarks can expand into a folder once we have more
   - Telemetry: resources/telemetry.md
+  - Spark Connect: spark_connect.md
```
I'm trying to decide where this doc best fits in the TOC. I feel like it would fit best under Integrations, but at the same time it's not exactly the same as the other integrations. I almost want to put it under Migration Guide, but it's not exactly that either.

If you think it belongs at L1, I would maybe move it in between Catalogs and Distributed Computing?
Future idea: I feel like we should segment the integrations further
I think we can move it to top-level. Feels like a big enough feature
docs/mkdocs/spark_connect.md (outdated)

```
The `daft.pyspark` module provides a way to create a PySpark session that can be run locally or backed by a ray cluster.

This serves as a way to run the daft query engine, but with a spark compatible API.
```
Sorry small nit, can we capitalize Daft and Spark and Ray on the previous line?
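To make the "Spark-compatible API" point concrete, a hedged sketch of what usage could look like: everything below is standard PySpark except the `daft.pyspark` import, and the `.local()` builder call is the same assumption as in the earlier sketch.

```python
# Sketch: standard PySpark DataFrame code, executed by the Daft engine.
# `daft.pyspark` is from the diff above; `.local()` is an assumed builder method.
from daft.pyspark import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.local().getOrCreate()

# Regular PySpark API calls from here on.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.filter(col("id") > 1).show()
```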
Note for reviewers
I tried to use the Sphinx autodoc stuff, but the markdown in `daft.pyspark` wasn't rendering properly, so I just copy/pasted it. But I don't think we're even using the Sphinx stuff anymore, so 🤷