docs: add spark connect doc page #3919
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #3919      +/-   ##
==========================================
+ Coverage   75.35%   77.60%   +2.25%
==========================================
  Files         767      768       +1
  Lines      103619    98818    -4801
==========================================
- Hits        78080    76689    -1391
+ Misses      25539    22129    -3410
```
```
@@ -0,0 +1,30 @@
# PySpark

The `daft.pyspark` module provides a way to create a PySpark session that can be run locally or backed by a ray cluster.
```
Can we add a relative link between the user guide and the API docs for the `daft.pyspark` method?
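For context, a minimal sketch of what creating such a session might look like. Only the `daft.pyspark` module name comes from the diff above; the `.local()` and `.remote(...)` builder methods are assumptions, so check the merged doc page for the exact API.

```python
# Hypothetical sketch: a PySpark-compatible session backed by Daft.
# Only the `daft.pyspark` module name comes from this PR; the builder
# methods below are assumptions about its API.
from daft.pyspark import SparkSession

# Run the Daft engine locally behind a PySpark-compatible session.
spark = SparkSession.builder.local().getOrCreate()

# Or back the session with an existing Ray cluster instead
# (the address is a placeholder).
# spark = SparkSession.builder.remote("ray://<head-node-address>:10001").getOrCreate()
```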
docs/mkdocs.yml (outdated)
```
@@ -43,6 +43,7 @@ nav:
   - Tutorials: resources/tutorials.md
   - Benchmarks: resources/benchmarks/tpch.md # Benchmarks can expand into a folder once we have more
   - Telemetry: resources/telemetry.md
+  - Spark Connect: spark_connect.md
```
I'm trying to decide where this doc best fits in the TOC. I feel like it would fit best under Integrations, but at the same time it's not exactly the same as the other integrations. I almost want to put it under Migration Guide, but it's not exactly that either.

If you think it belongs at L1, I would maybe move it in between Catalogs and Distributed Computing?
Future idea: I feel like we should segment the integrations further
I think we can move it to top-level. Feels like a big enough feature
docs/mkdocs/spark_connect.md (outdated)

```
The `daft.pyspark` module provides a way to create a PySpark session that can be run locally or backed by a ray cluster.

This serves as a way to run the daft query engine, but with a spark compatible API.
```
Sorry small nit, can we capitalize Daft and Spark and Ray on the previous line?
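To make the "Spark-compatible API" point concrete, a hedged sketch of what usage could look like: everything below is standard PySpark except the `daft.pyspark` import, and the `.local()` builder call is the same assumption as in the earlier sketch.

```python
# Sketch: standard PySpark DataFrame code, executed by the Daft engine.
# `daft.pyspark` is from the diff above; `.local()` is an assumed builder method.
from daft.pyspark import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.local().getOrCreate()

# Regular PySpark API calls from here on.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.filter(col("id") > 1).show()
```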
Note for reviewers
I tried to use the Sphinx autodoc stuff, but the markdown in `daft.pyspark` wasn't rendering properly, so I just copy/pasted it. But I don't think we're even using the Sphinx stuff anymore, so 🤷