Conversation

@cristian-zlai
Contributor

@cristian-zlai cristian-zlai commented Dec 4, 2025

Summary

Docs for eval in the test/dev loop, plus a dedicated section for CI.

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive Eval guide with configuration validation, quick schema checks, sample-data testing, local test environment setup, Docker usage, and CI/CD workflow examples.
    • Expanded testing docs with pre-run validation, example commands/outputs, schema visualization, lineage, and testing workflows.
  • Chores
    • Updated an environment version reference in compiled test artifacts to "latest."


@coderabbitai
Contributor

coderabbitai bot commented Dec 4, 2025

Walkthrough

Added two documentation pages for Eval configuration validation and CI/local workflows for Zipline Hub; updated a compiled test model artifact by changing its common environment version string from "0.1.0+dev.piyush" to "latest".

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Eval documentation<br>`docs/source/running_on_zipline_hub/Eval.md` | New doc describing Eval configuration validation: quick schema checks, testing with sample data, local-eval via Docker, local Iceberg warehouse setup, and end-to-end workflows for development, PR validation, and backfill, plus CI examples (GitHub Actions, GitLab CI). |
| Test documentation update<br>`docs/source/running_on_zipline_hub/Test.md` | Added Eval section for pre-run validation: quick schema validation, sample-data tests, example commands/outputs, output schema and lineage visualization. |
| Compiled test artifact<br>`python/test/canary/compiled/models/gcp/listing.v1__2` | Single-line change: updated common env version from `0.1.0+dev.piyush` to `latest`. No behavior changes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Focus review on examples/commands in Eval.md and Test.md for accuracy and copy/paste readiness.
  • Verify the env version change in python/test/canary/compiled/models/gcp/listing.v1__2 is intentional and won’t affect test expectations.

Poem

Docs arrive with careful light,
Schemas checked through day and night,
Local tests and CI cheer,
One small version moved from here,
Eval steps set — the path is bright. ✨

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'docs: eval usage on site' clearly and concisely describes the main change: adding documentation about eval usage. It accurately reflects the core purpose of the pull request. |
| Description check | ✅ Passed | The description includes the required Summary section with relevant details and a Checklist section matching the template. However, the summary is brief and the Documentation update checkbox is unchecked despite being a documentation PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@cristian-zlai cristian-zlai marked this pull request as ready for review December 4, 2025 20:37

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (4)
docs/source/running_on_zipline_hub/Test.md (2)

38-58: Add language specifier to output code block.

The fenced code block lacks a language identifier. While this is a CLI output example, specify a language (e.g., text or plaintext) to comply with markdown linting rules.

-```
+```text
 🟢 Eval job finished successfully
 Join Configuration: gcp.demo.user_features__1
 ...

19-78: Content organization: Eval section duplicates Eval.md.

The new Eval section here mirrors the "Quick Schema Validation" and "Testing with Sample Data" sections in the companion Eval.md file. Consider whether Test.md should link to Eval.md instead, or summarize briefly with a reference, to avoid maintaining duplicate content.

docs/source/running_on_zipline_hub/Eval.md (2)

23-43: Add language specifier to CLI output code block.

The fenced code block for example CLI output lacks a language identifier. Add text or plaintext to comply with markdown linting rules.

-```
+```text
 🟢 Eval job finished successfully
 Join Configuration: gcp.demo.user_features__1
 ...

179-183: Document /ping health check endpoint.

Line 189 uses a /ping endpoint to verify service readiness, but this endpoint isn't documented in the setup section. Add a note that the local-eval service exposes this health check endpoint, or clarify the expected behavior.
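
In practice a CI step that relies on this endpoint would poll it until the container answers. A minimal sketch is below; the URL and timeout are placeholders, since the exposed port isn't stated here:

```python
import time
import urllib.error
import urllib.request


def wait_for_ping(url="http://localhost:8080/ping", timeout_s=60):
    """Poll the /ping health endpoint until it answers or the timeout expires.

    The URL is a hypothetical placeholder; substitute the host/port that the
    local-eval container actually publishes.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet; retry until the deadline
        time.sleep(2)
    raise TimeoutError(f"local-eval did not respond on {url} within {timeout_s}s")
```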

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1531426 and e3b8b2f.

📒 Files selected for processing (2)
  • docs/source/running_on_zipline_hub/Eval.md (1 hunks)
  • docs/source/running_on_zipline_hub/Test.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/source/running_on_zipline_hub/Test.md

23-23: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (2)
docs/source/running_on_zipline_hub/Eval.md (2)

167-177: Verify Docker image ziplineai/local-eval:latest availability.

The documentation references a Docker image that should be validated to exist in the project's registry or Docker Hub. Confirm the image is published and accessible in the intended CI environment.
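
One way to confirm this in CI without pulling the image is to resolve its manifest from the registry; the sketch below assumes the Docker CLI is installed and authenticated on the runner:

```python
import subprocess


def image_available(image="ziplineai/local-eval:latest"):
    """Return True if the image manifest resolves in the registry.

    `docker manifest inspect` queries the registry without downloading layers;
    it assumes a working, authenticated Docker CLI on the runner.
    """
    result = subprocess.run(
        ["docker", "manifest", "inspect", image],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```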


1-311: Well-structured and comprehensive documentation.

The Eval feature documentation is clear, well-organized, and includes practical examples for development and CI/CD workflows. The progression from quick validation to advanced local testing is logical and helpful for users.

Comment on lines +83 to +155
```python
#!/usr/bin/env python3
"""Build a local Iceberg warehouse with test data for Chronon Eval testing."""

import os
from datetime import datetime
from pyspark.sql import SparkSession


def epoch_millis(iso_timestamp):
    """Convert ISO timestamp to epoch milliseconds"""
    dt = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    return int(dt.timestamp() * 1000)


def build_warehouse(warehouse_path, catalog_name="ci_catalog"):
    """Create Iceberg warehouse with test data"""

    print(f"Creating test warehouse at: {warehouse_path}")
    os.makedirs(warehouse_path, exist_ok=True)

    # Initialize Spark with Iceberg support
    spark = (
        SparkSession.builder
        .appName("chronon-test-warehouse-builder")
        .master("local[*]")
        .config("spark.jars.packages",
                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config(f"spark.sql.catalog.{catalog_name}",
                "org.apache.iceberg.spark.SparkCatalog")
        .config(f"spark.sql.catalog.{catalog_name}.type", "hadoop")
        .config(f"spark.sql.catalog.{catalog_name}.warehouse", warehouse_path)
        .getOrCreate()
    )

    # Create namespace
    print("Creating namespace 'data'...")
    spark.sql("CREATE NAMESPACE IF NOT EXISTS data")
    spark.sql(f"USE {catalog_name}")

    # Create table with schema
    print("Creating user_activities table...")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS data.user_activities__0 (
            user_id STRING,
            event_time_ms BIGINT,
            session_id STRING,
            event_type STRING,
            ds STRING
        ) USING iceberg
        PARTITIONED BY (ds)
    """)

    # Insert test data
    user_activities_data = [
        ("user_1", epoch_millis("2025-01-01T00:01:00Z"), "session_1", "view", "2025-01-01"),
        ("user_2", epoch_millis("2025-01-01T00:02:00Z"), "session_2", "click", "2025-01-01"),
    ]

    df = spark.createDataFrame(
        user_activities_data,
        ["user_id", "event_time_ms", "session_id", "event_type", "ds"]
    )

    df.writeTo(f"{catalog_name}.data.user_activities__0").createOrReplace()
    print(f"✓ Inserted {df.count()} rows into user_activities__0")

    spark.stop()
    print(f"\n✓ Warehouse created successfully at: {warehouse_path}")


if __name__ == "__main__":
    build_warehouse("/tmp/chronon-test-warehouse")
```

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

PySpark 3.5.0 Iceberg 1.4.3 compatibility production ready

💡 Result:

Short answer: Yes — Iceberg 1.4.3 provides a Spark 3.5 runtime and is intended to be used with Spark/PySpark 3.5.x in production, provided you use the matching runtime JAR (iceberg-spark-runtime-3.5_* with the correct Scala binary) and validate your Spark build/Scala version and any storage/SDK bundles you need. [1][2]

Practical checklist before production deploy:

  • Use the iceberg-spark-runtime-3.5_2.12 or _2.13 JAR that matches your Spark/PySpark Scala build. [2]
  • Don’t include other Iceberg modules (iceberg-core, etc.) on the Spark classpath — only the runtime and required storage bundles (aws/gcp/azure) to avoid dependency conflicts. [1][2]
  • Test your workload (reads, writes, partitioning, concurrency/transactions, compaction/manifest operations) end-to-end in a staging environment with your Spark distribution and JVM/Scala version. [1][2]
  • Keep an eye on vendor compatibility (if using a distribution like Cloudera, Databricks, etc.), since some distros bundle specific Iceberg versions. [3]

Sources:
[1] Apache Iceberg releases (1.4.3 includes Spark 3.5 runtime).
[2] Apache Iceberg multi-engine support / runtime guidance (Spark 3.5 integration and runtime-jar guidance).
[3] Example vendor compatibility notes (Cloudera) — verify distro-specific pinned Iceberg/Spark combinations.


Verify Scala binary version and add documentation for Iceberg JAR dependency management.

Iceberg 1.4.3 with Spark 3.5.x is production-ready, but the example should clarify that iceberg-spark-runtime-3.5_2.12 assumes Scala 2.12. Verify your Spark build uses matching Scala 2.12 (or use _2.13 variant if needed). Additionally, document that only the runtime JAR should be on the classpath—avoid including other Iceberg modules to prevent dependency conflicts. Recommend end-to-end testing in a staging environment with your actual Spark distribution and storage backend (S3/GCS/Azure) before production use.
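
One way to make the Scala dependency explicit is to derive the Maven coordinate from the Scala binary version rather than hard-coding it. The helper below is a sketch with illustrative defaults matching the doc's example:

```python
def iceberg_runtime_coordinate(spark_minor="3.5", scala_binary="2.12",
                               iceberg_version="1.4.3"):
    """Build the Maven coordinate for the Iceberg Spark runtime JAR.

    The PyPI pyspark 3.5.x wheels are built against Scala 2.12; switch
    scala_binary to "2.13" only if your Spark distribution uses Scala 2.13.
    """
    return (f"org.apache.iceberg:iceberg-spark-runtime-"
            f"{spark_minor}_{scala_binary}:{iceberg_version}")


# Pass the result to .config("spark.jars.packages", ...) when building the
# SparkSession instead of hard-coding the coordinate string.
```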


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
docs/source/running_on_zipline_hub/Test.md (1)

21-80: Consider generalizing the --conf path to cover all config types.

Since this page is about testing GroupBys, Joins and StagingQuerys, you might want the eval examples to mirror the backfill path pattern, e.g. compiled/{group_bys|staging_queries|joins}/{team}/{your_conf}, to make it clearer that eval works for more than joins.

docs/source/running_on_zipline_hub/Eval.md (1)

169-277: Tighten Docker usage in examples (image pinning and GitLab dind setup).

Two small robustness tweaks to consider:

  • Pin ziplineai/local-eval to a specific tag (and mention updating it over time) instead of :latest, so CI runs are reproducible and don’t silently change behavior on image updates.
  • In the GitLab CI example, image: python:3.11 plus services: docker:dind typically also requires installing the Docker CLI in the job image and setting DOCKER_HOST (per GitLab’s dind docs); calling docker run may otherwise fail.

These are minor, but making them explicit will help users copy the examples into real pipelines.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3b8b2f and 20f7e10.

⛔ Files ignored due to path filters (1)
  • docs/images/eval_sample.gif is excluded by !**/*.gif
📒 Files selected for processing (3)
  • docs/source/running_on_zipline_hub/Eval.md (1 hunks)
  • docs/source/running_on_zipline_hub/Test.md (1 hunks)
  • python/test/canary/compiled/models/gcp/listing.v1__2 (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/source/running_on_zipline_hub/Test.md

23-23: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (39)
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: python_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: python_lint
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (1)
python/test/canary/compiled/models/gcp/listing.v1__2 (1)

78-78: Env VERSION change looks fine.

Updating the canary env VERSION to "latest" in this compiled test artifact is safe and does not affect model logic or schema.

Comment on lines +120 to +137
    # Create namespace
    print("Creating namespace 'data'...")
    spark.sql("CREATE NAMESPACE IF NOT EXISTS data")
    spark.sql(f"USE {catalog_name}")

    # Create table with schema
    print("Creating user_activities table...")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS data.user_activities__0 (
            user_id STRING,
            event_time_ms BIGINT,
            session_id STRING,
            event_type STRING,
            ds STRING
        ) USING iceberg
        PARTITIONED BY (ds)
    """)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Spark Iceberg CREATE NAMESPACE current catalog behavior

💡 Result:

Short answer

  • Spark tracks a current catalog + namespace; unqualified CREATE NAMESPACE uses the current catalog and namespace. [1]
  • Iceberg’s SparkCatalog / SparkSessionCatalog implement namespace methods, and Spark passes the currentCatalog/currentNamespace to those methods, so CREATE NAMESPACE on an Iceberg catalog creates the namespace inside that catalog. (Iceberg’s SparkCatalog delegates namespace operations to the underlying Iceberg Catalog.) [2][3]

Notes / gotchas

  • SparkSessionCatalog wraps/creates a SparkCatalog for Iceberg tables, so behavior is the same but the catalog is backed by the configured Iceberg implementation. [3]
  • There are edge reports where people expected a new Spark catalog entry to appear in the Hive metastore but it didn’t — catalog registration vs. namespace creation are different concerns. Check your catalog config (spark.sql.catalog..*) and whether you’re using a Hive vs. session catalog. [4][1]

Sources

  • Apache Iceberg Spark configuration / using catalogs. [1]
  • Iceberg SparkCatalog javadoc (namespace/table/view API; currentCatalog/currentNamespace parameters). [2]
  • Iceberg SparkSessionCatalog javadoc (wraps/creates SparkCatalog). [3]
  • GitHub issue discussing Hive metastore vs. Spark catalog expectations. [4]

🏁 Script executed:

# Search the documentation file to understand the full context
cd docs/source/running_on_zipline_hub || true
if [ -f "Eval.md" ]; then
  head -150 Eval.md | tail -40
else
  find . -name "Eval.md" -type f 2>/dev/null | head -5
fi

Repository: zipline-ai/chronon

Length of output: 1523


Ensure the namespace is created in the Iceberg catalog, not the default catalog.

The CREATE NAMESPACE IF NOT EXISTS data statement runs before USE {catalog_name}, so the namespace is created in the default spark_catalog instead of the target Iceberg catalog. This causes a mismatch: the namespace data exists in spark_catalog, but the table and write operations target {catalog_name}.data. Reorder to switch catalogs first:

-    print("Creating namespace 'data'...")
-    spark.sql("CREATE NAMESPACE IF NOT EXISTS data")
-    spark.sql(f"USE {catalog_name}")
+    print(f"Creating namespace 'data' in catalog '{catalog_name}'...")
+    spark.sql(f"USE {catalog_name}")
+    spark.sql("CREATE NAMESPACE IF NOT EXISTS data")
🤖 Prompt for AI Agents
In docs/source/running_on_zipline_hub/Eval.md around lines 120 to 137, the
CREATE NAMESPACE runs before switching to the Iceberg catalog so the namespace
is created in the default spark_catalog; move the catalog switch so
spark.sql(f"USE {catalog_name}") is executed before creating the namespace (or
alternatively create the namespace fully qualified as {catalog_name}.data), then
proceed to create the table—this ensures the namespace is created in the
intended Iceberg catalog.
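
As an alternative to reordering, the namespace can be created fully qualified so it lands in the target catalog regardless of the session's current catalog; a sketch using the same catalog_name variable as the doc's script:

```python
    # Alternative: qualify the namespace with the target catalog explicitly.
    spark.sql(f"CREATE NAMESPACE IF NOT EXISTS {catalog_name}.data")
    spark.sql(f"USE {catalog_name}")
```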

Comment on lines +38 to +58
```
🟢 Eval job finished successfully
Join Configuration: gcp.demo.user_features__1
- Left table: data.user_activity_7d__0
- Join parts: 2
- Conf dependencies: 3
- External tables: 2
- Output Schema:
[left] user_id: string
[left] event_timestamp: long
[left] ds: string
[joinPart: gcp.user_demographics__0] user_id_age: integer
[derivation] is_adult: boolean
Lineage:
[Join] gcp.demo.user_features__1
├── ✅ [GroupBy] gcp.user_activity_7d__0
│   └── External: project.events.user_clicks
└── ✅ [GroupBy] gcp.user_demographics__0
    └── ✅ [StagingQuery] gcp.raw_demographics__0
```

⚠️ Potential issue | 🟡 Minor

Add a language to the example-output code fence (MD040).

The example output block is missing a language spec; consider marking it as plain text to satisfy markdownlint:

-```
+```text
🤖 Prompt for AI Agents
In docs/source/running_on_zipline_hub/Test.md around lines 38 to 58 the
example-output code fence is missing a language spec which triggers MD040;
update the opening triple-backtick to include a language (e.g., change ``` to
```text) so the block is explicitly marked as plain text and save the file.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/source/running_on_zipline_hub/Eval.md (1)

120-123: Fix namespace creation order to target the Iceberg catalog.

The namespace is created before switching catalogs, so it's created in the default spark_catalog instead of ci_catalog. Subsequent operations target ci_catalog.data, causing a mismatch. Move the USE statement first:

-    # Create namespace
-    print("Creating namespace 'data'...")
-    spark.sql("CREATE NAMESPACE IF NOT EXISTS data")
-    spark.sql(f"USE {catalog_name}")
+    # Create namespace
+    print(f"Creating namespace 'data' in catalog '{catalog_name}'...")
+    spark.sql(f"USE {catalog_name}")
+    spark.sql("CREATE NAMESPACE IF NOT EXISTS data")
🧹 Nitpick comments (1)
docs/source/running_on_zipline_hub/Eval.md (1)

109-117: Clarify Scala binary version and add guidance on Iceberg JAR compatibility.

The iceberg-spark-runtime-3.5_2.12 JAR assumes Scala 2.12. If your Spark build uses Scala 2.13, you'll need the _2.13 variant. Add a note clarifying this dependency and recommending verification before production use:

         .config("spark.jars.packages",
-                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3")
+                "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.3")  # Use _2.13 if your Spark uses Scala 2.13
         .config("spark.sql.extensions",
                 "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
         .config(f"spark.sql.catalog.{catalog_name}",
                 "org.apache.iceberg.spark.SparkCatalog")
         .config(f"spark.sql.catalog.{catalog_name}.type", "hadoop")
         .config(f"spark.sql.catalog.{catalog_name}.warehouse", warehouse_path)
+    )
+
+    # Note: Verify that iceberg-spark-runtime JAR matches your Spark/Scala version.
+    # Use `spark.jars.packages` matching your Scala build (_2.12 vs _2.13).
+    # For production, test end-to-end with your actual Spark distribution.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20f7e10 and 74d8dde.

📒 Files selected for processing (1)
  • docs/source/running_on_zipline_hub/Eval.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/source/running_on_zipline_hub/Eval.md

23-23: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (39)
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: python_lint
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows

Comment on lines +23 to +43
```
🟢 Eval job finished successfully
Join Configuration: gcp.demo.user_features__1
- Left table: data.user_activity_7d__0
- Join parts: 2
- Conf dependencies: 3
- External tables: 2
- Output Schema:
[left] user_id: string
[left] event_timestamp: long
[left] ds: string
[joinPart: gcp.user_demographics__0] user_id_age: integer
[derivation] is_adult: boolean
Lineage:
[Join] gcp.demo.user_features__1
├── ✅ [GroupBy] gcp.user_activity_7d__0
│   └── External: project.events.user_clicks
└── ✅ [GroupBy] gcp.user_demographics__0
    └── ✅ [StagingQuery] gcp.raw_demographics__0
```

⚠️ Potential issue | 🟡 Minor

Specify language for code fence showing example output.

The example output block is missing a language identifier, which triggers a linter warning. Use text or plaintext:

-```
+```text
 🟢 Eval job finished successfully
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

23-23: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/source/running_on_zipline_hub/Eval.md around lines 23 to 43, the fenced
code block showing example output lacks a language identifier which triggers a
linter warning; update the opening triple backticks to include a language such
as text or plaintext (e.g. ```text) so the block is explicitly marked as plain
text and the linter warning is resolved.

@cristian-zlai cristian-zlai added this pull request to the merge queue Dec 4, 2025
Merged via the queue into main with commit 39c4792 Dec 4, 2025
61 of 63 checks passed
@cristian-zlai cristian-zlai deleted the crf-eval-docs branch December 4, 2025 22:17