
Conversation


@piyush-zlai piyush-zlai commented Dec 9, 2025

Summary

Added a first-cut implementation of the rails to submit training jobs & deployments on the Vertex platform. The implementation is fairly basic at the moment (e.g., only 0-to-1 traffic rollouts; no support yet for additional deploy types like blue/green). We can flesh these out in follow-ups - the current implementation lets us start building basic end-to-end scaffolding and an MVP.

Pulled in the Vertex SDK to trigger these. We still need the HTTP client for the Vertex predict calls (to support both custom & published models, which isn't as easy to configure in the SDK), but using the SDK for training and deploy is a lot more ergonomic.
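For orientation, here is a minimal sketch of what submitting a custom training job through the Vertex Java SDK looks like. The project, bucket, package, and image names are hypothetical placeholders, and this is not the exact code in the PR:

```scala
import com.google.cloud.aiplatform.v1.{CustomJob, CustomJobSpec, JobServiceClient, JobServiceSettings, LocationName, MachineSpec, PythonPackageSpec, WorkerPoolSpec}

object TrainingJobSketch {
  def main(args: Array[String]): Unit = {
    val project  = "my-project" // hypothetical
    val location = "us-central1"

    // Vertex AI clients need the regional API endpoint
    val settings = JobServiceSettings.newBuilder()
      .setEndpoint(s"$location-aiplatform.googleapis.com:443")
      .build()
    val jobClient = JobServiceClient.create(settings)

    try {
      val workerPool = WorkerPoolSpec.newBuilder()
        .setMachineSpec(MachineSpec.newBuilder().setMachineType("n1-standard-4"))
        .setReplicaCount(1L)
        .setPythonPackageSpec(
          PythonPackageSpec.newBuilder()
            .setExecutorImageUri("us-docker.pkg.dev/vertex-ai/training/xgboost-cpu.1-1:latest") // hypothetical image
            .addPackageUris("gs://my-bucket/builds/my_model-1.0.tar.gz")                        // hypothetical package
            .setPythonModule("trainer.task"))

      val job = CustomJob.newBuilder()
        .setDisplayName("my_model-1.0")
        .setJobSpec(CustomJobSpec.newBuilder().addWorkerPoolSpecs(workerPool))
        .build()

      // Returns the created CustomJob with its fully qualified resource name
      val submitted = jobClient.createCustomJob(LocationName.of(project, location), job)
      println(s"Submitted training job: ${submitted.getName}")
    } finally {
      jobClient.close()
    }
  }
}
```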

Testing

Tested using the local integration test that submits a training run, creates an endpoint, uploads a model, and deploys it to the endpoint. Was able to validate that the model is up and can be hit for serving.

INFO  a.c.i.cloud_gcp.VertexOrchestration - Submitting training job for test_ctr_model-1.0; Python pkg: gs://zipline-warehouse-models/builds/test_ctr_model-1.0.tar.gz; Model output dir: gs://zipline-warehouse-models/training_output/test_ctr_model-1.0/2025-12-08
INFO  a.c.i.cloud_gcp.VertexOrchestration - Training job submitted successfully: projects/703996152583/locations/us-central1/customJobs/5518405205760147456
INFO  a.c.i.c.VertexOrchestrationTest - Training job name: projects/703996152583/locations/us-central1/customJobs/5518405205760147456
INFO  a.c.i.c.VertexOrchestrationTest - Waiting for training job. Current state: UNKNOWN
INFO  a.c.i.c.VertexOrchestrationTest - Waiting for training job. Current state: PENDING
...
INFO  a.c.i.c.VertexOrchestrationTest - Waiting for training job. Current state: RUNNING
...

INFO  a.c.i.c.VertexOrchestrationTest - Step 2: Creating endpoint (if absent)...
INFO  a.c.i.c.VertexOrchestrationTest - ================================================================================
INFO  a.c.i.cloud_gcp.VertexOrchestration - Found existing endpoint: projects/703996152583/locations/us-central1/endpoints/8483716271198699520
INFO  a.c.i.c.VertexOrchestrationTest - Endpoint resource name: projects/703996152583/locations/us-central1/endpoints/8483716271198699520
INFO  a.c.i.c.VertexOrchestrationTest - 
================================================================================
INFO  a.c.i.c.VertexOrchestrationTest - Step 2: Deploying model (endpoint creation + model upload + deployment)...
INFO  a.c.i.c.VertexOrchestrationTest - ================================================================================
INFO  a.c.i.cloud_gcp.VertexOrchestration - Found existing endpoint: projects/703996152583/locations/us-central1/endpoints/8483716271198699520
INFO  a.c.i.cloud_gcp.VertexOrchestration - Uploading model: test_ctr_model-1.0; Artifact URI: gs://zipline-warehouse-models/training_output/test_ctr_model-1.0/2025-12-08/model; Container Image: us-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.2-1:latest
INFO  a.c.i.cloud_gcp.VertexOrchestration - Model upload completed: projects/703996152583/locations/us-central1/models/7906534239267454976
INFO  a.c.i.cloud_gcp.VertexOrchestration - 
Deploying model to endpoint. Details:
Model: projects/703996152583/locations/us-central1/models/7906534239267454976;
Endpoint: projects/703996152583/locations/us-central1/endpoints/8483716271198699520;
Replicas: 3-10
Machine type: n1-standard-4

INFO  a.c.i.cloud_gcp.VertexOrchestration - Model deployment initiated. Operation: projects/703996152583/locations/us-central1/endpoints/8483716271198699520/operations/5139381969750065152
INFO  a.c.i.c.VertexOrchestrationTest - Deployment ID: projects/703996152583/locations/us-central1/endpoints/8483716271198699520/operations/5139381969750065152
INFO  a.c.i.c.VertexOrchestrationTest - Waiting for deployment. Current state: UNKNOWN
INFO  a.c.i.c.VertexOrchestrationTest - Waiting for deployment. Current state: RUNNING
...

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • Integration tested
  • Documentation update

Summary by CodeRabbit

  • New Features
    • Vertex AI orchestration: submit/monitor training jobs, create/manage endpoints, deploy models, and query operation/job status.
  • New Utilities
    • Centralized HTTP client and prediction request/response helpers for Vertex interactions.
  • API
    • Model platform API extended with training/deploy requests, operations, and job status types.
  • Tests
    • Added comprehensive tests covering orchestration, model construction, deployment, and status mapping.
  • Chores
    • Added GCP AI Platform client dependency.



coderabbitai bot commented Dec 9, 2025

Walkthrough

Introduces a Vertex AI orchestration and HTTP client, refactors VertexPlatform to delegate training, deployment, and prediction to these new layers, adds ModelPlatform long-running operation APIs, updates tests, and adds the google-cloud-aiplatform 3.79.0 dependency.

Changes

  • Maven Dependency — cloud_gcp/package.mill: Added com.google.cloud:google-cloud-aiplatform:3.79.0 to mvnDeps
  • HTTP client + utils — cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala: New VertexHttpClient (Vert.x WebClient wrapper with ADC OAuth2 token retrieval) and VertexHttpUtils (Scala→Vert.x JSON conversion, prediction request builder, prediction extractor)
  • Orchestration layer — cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala: New VertexOrchestration managing Vertex SDK clients, job submission, endpoint creation, model upload/deploy, operation/job status translation, and lifecycle/close
  • Platform refactor — cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala: Constructor removed credentials param; added httpClient and orchestration; delegate predict, submitTrainingJob, createEndpoint, deployModel, getJobStatus to new layers; removed inline JSON/token helpers
  • API extensions — online/src/main/scala/ai/chronon/online/Api.scala: Added TrainingRequest, DeployModelRequest, ModelOperation (sealed trait with cases), ModelJobStatus, and extended ModelPlatform with submitTrainingJob, createEndpoint, deployModel, getJobStatus
  • GCP tests — cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala, cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala: Added orchestration tests; updated platform tests to drop credentials and use VertexHttpUtils
  • Online test doubles — online/src/test/scala/ai/chronon/online/test/ModelTransformsFetcherTest.scala: Added stub overrides for new ModelPlatform long-running APIs in test doubles
  • Spark tests — spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala: Added placeholder overrides for new ModelPlatform APIs in test platforms
  • Python formatting — python/test/canary/models/gcp/click_through_rate.py: Added trailing newline (formatting only)

Sequence Diagrams

sequenceDiagram
    participant Client
    participant VP as VertexPlatform
    participant HC as VertexHttpClient
    participant VAI as Vertex AI REST
    Client->>VP: predict(model, inputs, params)
    activate VP
    VP->>HC: makeHttpRequest(url, POST, body)
    activate HC
    HC->>HC: fetch OAuth2 token (ADC)
    HC->>VAI: POST /predict (Bearer token + JSON body)
    activate VAI
    VAI-->>HC: predictions JSON
    deactivate VAI
    HC-->>VP: response JSON
    deactivate HC
    VP->>VP: VertexHttpUtils.extractPredictionResults(...)
    VP-->>Client: predictions
    deactivate VP
sequenceDiagram
    participant Client
    participant VP as VertexPlatform
    participant VO as VertexOrchestration
    participant SDK as Vertex AI SDK
    Client->>VP: submitTrainingJob(request)
    activate VP
    VP->>VO: submitTrainingJob(request)
    activate VO
    VO->>SDK: JobServiceClient.createCustomJob(...)
    activate SDK
    SDK-->>VO: operation / job name
    deactivate SDK
    VO-->>VP: job name
    deactivate VO
    VP-->>Client: job name
    deactivate VP
sequenceDiagram
    participant Client
    participant VP as VertexPlatform
    participant VO as VertexOrchestration
    participant SDK as Vertex AI SDK
    Client->>VP: deployModel(request)
    activate VP
    VP->>VO: deployModel(request)
    activate VO
    VO->>SDK: ModelServiceClient.uploadModel(...)
    activate SDK
    SDK-->>VO: model resource name
    deactivate SDK
    VO->>SDK: EndpointServiceClient.deployModel(...)
    activate SDK
    SDK-->>VO: operation name
    deactivate SDK
    VO-->>VP: operation name
    deactivate VO
    VP-->>Client: operation name
    deactivate VP
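To make the first flow concrete, here is a minimal sketch of an ADC-authenticated predict call using Vert.x's WebClient. The URL, project, and endpoint IDs are hypothetical, and this is not the PR's VertexHttpClient implementation:

```scala
import com.google.auth.oauth2.GoogleCredentials
import io.vertx.core.Vertx
import io.vertx.core.json.{JsonArray, JsonObject}
import io.vertx.ext.web.client.WebClient

import java.util.Collections

object PredictSketch {
  def main(args: Array[String]): Unit = {
    // Application Default Credentials (ADC), scoped for GCP APIs
    val credentials = GoogleCredentials.getApplicationDefault()
      .createScoped(Collections.singletonList("https://www.googleapis.com/auth/cloud-platform"))
    credentials.refreshIfExpired()
    val token = credentials.getAccessToken.getTokenValue

    val client = WebClient.create(Vertx.vertx())
    // Hypothetical project / endpoint IDs; real values come from configuration
    val url =
      "https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/endpoints/123:predict"
    val body = new JsonObject()
      .put("instances", new JsonArray().add(new JsonObject().put("feature_1", 1.0)))

    client
      .postAbs(url)
      .putHeader("Authorization", s"Bearer $token")
      .putHeader("Content-Type", "application/json")
      .sendJsonObject(body)
      .onComplete { ar =>
        if (ar.succeeded()) println(ar.result().bodyAsJsonObject().getJsonArray("predictions"))
        else ar.cause().printStackTrace()
      }
  }
}
```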

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Pay extra attention to: VertexOrchestration (SDK usage, resource names, polling/operation handling), VertexHttpClient (token refresh, concurrency, null-on-failure handling), alignment between new Api.scala method contracts and orchestration implementations, and updated tests referencing new utilities.

Poem

🧭 New routes through Vertex skies,
Tokens fetched as requests fly,
Jobs and endpoints spin and hum,
Deployments march until they're done,
Code and cloud now dance as one.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The title accurately summarizes the main change: extending the API and VertexPlatform to support training job submission and model deployments.
  • Description check — ✅ Passed: The description provides a clear summary, testing details with logs, and a completed checklist. All template sections are addressed with substantive content.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (5)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala (2)

34-36: Unreachable default case.

HttpMethod is sealed with 4 exhaustive cases; the wildcard _ is dead code.

       case DeleteMethod => client.deleteAbs(url)
-      case _            => throw new IllegalArgumentException(s"Currently unsupported HTTP method: $method")

107-107: filterKeys is deprecated in Scala 2.13+.

Use view.filterKeys(...).toMap instead.

-    val additionalParams = modelParams.filterKeys(k => k != "model_name" && k != "model_type")
+    val additionalParams = modelParams.view.filterKeys(k => k != "model_name" && k != "model_type").toMap
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (1)

23-35: Missing close() method to release orchestration resources.

VertexOrchestration has a close() for SDK clients. Consider exposing it here to prevent resource leaks.

+  def close(): Unit = {
+    orchestration.close()
+  }
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (2)

297-306: Null checks on lazy vals are unnecessary.

Lazy vals are initialized on first access; they're never null after class instantiation.

   def close(): Unit = {
     try {
-      if (jobServiceClient != null) jobServiceClient.close()
-      if (endpointServiceClient != null) endpointServiceClient.close()
-      if (modelServiceClient != null) modelServiceClient.close()
+      jobServiceClient.close()
+      endpointServiceClient.close()
+      modelServiceClient.close()
     } catch {
       case e: Exception =>
         logger.warn(s"Error closing SDK clients: ${e.getMessage}", e)
     }
   }

108-108: iterateAll() loads all endpoints into memory.

For projects with many endpoints, consider pagination or early termination on match.
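As a sketch of the early-termination idea: the paged response fetches list pages lazily, so a `find` stops issuing list calls once it hits a match. The method name and signature below are illustrative, not the PR's code:

```scala
import com.google.cloud.aiplatform.v1.{Endpoint, EndpointServiceClient, LocationName}

import scala.jdk.CollectionConverters._

object EndpointLookupSketch {
  // Returns the first endpoint whose display name matches, fetching list pages only as needed
  def findEndpointByDisplayName(client: EndpointServiceClient,
                                project: String,
                                location: String,
                                displayName: String): Option[Endpoint] =
    client
      .listEndpoints(LocationName.of(project, location))
      .iterateAll()
      .asScala
      .find(_.getDisplayName == displayName)
}
```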

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71e15ef and 77411fa.

📒 Files selected for processing (10)
  • cloud_gcp/package.mill (1 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala (1 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (1 hunks)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (3 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala (1 hunks)
  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala (5 hunks)
  • online/src/main/scala/ai/chronon/online/Api.scala (2 hunks)
  • online/src/test/scala/ai/chronon/online/test/ModelTransformsFetcherTest.scala (4 hunks)
  • python/test/canary/models/gcp/click_through_rate.py (1 hunks)
  • spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala (3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.

Applied to files:

  • cloud_gcp/package.mill
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.

Applied to files:

  • cloud_gcp/package.mill
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.

Applied to files:

  • spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala
🧬 Code graph analysis (4)
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (9)
  • VertexOrchestration (14-307)
  • VertexOrchestration (309-445)
  • buildCustomJob (316-377)
  • buildModel (379-407)
  • parseJobState (435-444)
  • submitTrainingJob (48-83)
  • createEndpoint (85-94)
  • deployModel (145-176)
  • getOperationStatus (274-295)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (3)
  • submitTrainingJob (109-111)
  • createEndpoint (113-115)
  • deployModel (124-126)
online/src/test/scala/ai/chronon/online/test/ModelTransformsFetcherTest.scala (2)
online/src/main/scala/ai/chronon/online/Api.scala (4)
  • DeployModelRequest (303-303)
  • PredictRequest (300-300)
  • PredictResponse (301-301)
  • TrainingRequest (302-302)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (4)
  • submitTrainingJob (109-111)
  • createEndpoint (113-115)
  • deployModel (124-126)
  • getJobStatus (117-122)
online/src/main/scala/ai/chronon/online/Api.scala (2)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (3)
  • submitTrainingJob (48-83)
  • createEndpoint (85-94)
  • deployModel (145-176)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (4)
  • submitTrainingJob (109-111)
  • createEndpoint (113-115)
  • deployModel (124-126)
  • getJobStatus (117-122)
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala (1)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala (3)
  • VertexHttpUtils (66-134)
  • createPredictionRequestBody (93-117)
  • extractPredictionResults (122-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: python_tests
  • GitHub Check: build
  • GitHub Check: enforce_triggered_workflows
🔇 Additional comments (23)
python/test/canary/models/gcp/click_through_rate.py (1)

77-77: ✓ Formatting adjustment approved.

Trailing newline added. The canary test configuration properly exercises the new Vertex AI integration with comprehensive training and deployment specifications aligned with the PR objectives.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala (4)

1-14: LGTM!

Imports and test structure look correct for unit testing VertexOrchestration builders.


15-64: LGTM!

Thorough verification of CustomJob construction including worker pool specs, args conversion, and output directory.


66-121: LGTM!

Good coverage of default values and container spec mapping including env vars.


123-157: LGTM!

Comprehensive coverage of DeployedModel resources and JobState enum mapping.

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala (5)

24-24: LGTM!

Constructor updated to match new VertexPlatform signature without credentials parameter.


143-143: LGTM!

Correctly migrated to VertexHttpUtils for request/response handling.


198-198: LGTM!

Consistent use of VertexHttpUtils.extractPredictionResults.


217-217: LGTM!


233-233: LGTM!

online/src/main/scala/ai/chronon/online/Api.scala (2)

302-309: LGTM!

Clean sealed trait design for ModelOperation. The data classes provide a clear contract for training and deployment operations.


324-341: LGTM!

Well-designed async interface for ML lifecycle operations. The getJobStatus dispatch via sealed trait is clean.
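A small sketch of the sealed-trait dispatch pattern being described here; the case and method names below are hypothetical stand-ins, not the actual definitions in Api.scala:

```scala
import scala.concurrent.Future

object ModelOperationSketch {
  // Hypothetical names for illustration only
  sealed trait ModelOperation
  case class TrainingJobOperation(jobName: String) extends ModelOperation
  case class DeploymentOperation(operationName: String) extends ModelOperation

  case class ModelJobStatus(state: String, error: Option[String] = None)

  trait StatusLookup {
    def trainingStatus(jobName: String): Future[ModelJobStatus]
    def deploymentStatus(operationName: String): Future[ModelJobStatus]

    // Exhaustive match over the sealed trait keeps the dispatch compile-checked
    def getJobStatus(op: ModelOperation): Future[ModelJobStatus] = op match {
      case TrainingJobOperation(jobName)      => trainingStatus(jobName)
      case DeploymentOperation(operationName) => deploymentStatus(operationName)
    }
  }
}
```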

online/src/test/scala/ai/chronon/online/test/ModelTransformsFetcherTest.scala (4)

6-6: LGTM!

Import updated to include new API types.


483-488: LGTM!

Required method stubs to satisfy updated ModelPlatform interface. Using ??? is fine for test doubles that won't call these methods.


506-511: LGTM!

Same pattern for CallTrackingModelPlatform.


535-540: LGTM!

Same pattern for MultiOutputModelPlatform.

spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala (2)

781-786: LGTM!

Placeholder stubs satisfy the interface contract for tests that don't exercise training/deployment paths.


796-800: LGTM!

Same as above - stubs are appropriate for the failing mock.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (1)

109-126: LGTM!

Clean delegation to orchestration layer. Operations are well-mapped.

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (4)

48-83: LGTM!

Training job submission is well-structured with appropriate logging and error handling.


309-314: LGTM!

Clean utility for Google/Scala Future conversion.


435-444: LGTM!

Exhaustive job state mapping.


226-227: No action needed. The traffic split key "0" is the correct and documented convention for Vertex AI's deployModel API. According to the API specification, "0" is a special placeholder that refers to the DeployedModel being created, and the API automatically replaces it with the actual assigned deployedModel ID. Using "0" is not an assumption but the intended API contract.
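For reference, a minimal sketch of how the "0" placeholder is used when building a deploy request with the Vertex Java SDK; the resource names are hypothetical and this is not the PR's implementation:

```scala
import com.google.cloud.aiplatform.v1.{DedicatedResources, DeployModelRequest, DeployedModel, MachineSpec}

object TrafficSplitSketch {
  // Hypothetical resource names
  val endpointName = "projects/my-project/locations/us-central1/endpoints/123"
  val modelName    = "projects/my-project/locations/us-central1/models/456"

  val request: DeployModelRequest = DeployModelRequest.newBuilder()
    .setEndpoint(endpointName)
    .setDeployedModel(
      DeployedModel.newBuilder()
        .setModel(modelName)
        .setDedicatedResources(
          DedicatedResources.newBuilder()
            .setMachineSpec(MachineSpec.newBuilder().setMachineType("n1-standard-4"))
            .setMinReplicaCount(3)
            .setMaxReplicaCount(10)))
    // "0" refers to the DeployedModel created by this request; Vertex AI
    // replaces it with the assigned deployedModel ID once deployment completes.
    .putTrafficSplit("0", 100)
    .build()
}
```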

mvn"org.apache.iceberg::iceberg-spark-runtime-3.5:1.10.0",
mvn"com.google.cloud:google-cloud-bigquery:2.54.1",
mvn"com.google.cloud:google-cloud-bigtable:2.57.1",
mvn"com.google.cloud:google-cloud-aiplatform:3.79.0",

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

google-cloud-aiplatform 3.79.0 latest version Maven Central

💡 Result:

3.79.0 is available on Maven Central (google-cloud-aiplatform-bom). [1]
However, newer releases (3.80.0 and 3.81.0) are already published (see artifact listings). [2]

Sources:
[1] repo1.maven.org — com/google/cloud/google-cloud-aiplatform-bom (shows 3.79.0).
[2] mvnrepository.com — com.google.cloud:google-cloud-aiplatform (lists 3.80.0 / 3.81.0 as newer).


🌐 Web query:

google-cloud-aiplatform security vulnerabilities CVE 2024 2025

💡 Result:

Short answer: I found no public CVEs specifically affecting the google-cloud-aiplatform (Vertex AI) client/library for 2024–2025. Sources that track package CVEs report no direct vulnerabilities, and Google Cloud’s security bulletins list CVEs for GCP services and infrastructure but do not show a CVE attributed to the google-cloud-aiplatform client in 2024–2025. [1][2][3]

Sources:

  • Snyk package vulnerability page — no direct vulnerabilities for google-cloud-aiplatform. [1]
  • PyPI project page for google-cloud-aiplatform (release history / metadata). [2]
  • Google Cloud security bulletins / CVE program (lists GCP product CVEs; no CVE entry tied to the google-cloud-aiplatform client for 2024–2025). [3]

If you want, I can:

  • search the NVD/CVE database and OSV for matches by package name and date range, or
  • monitor/alert and return results if any CVE appears for this package. Which would you prefer?

🏁 Script executed:

#!/bin/bash
# Query Maven Central for latest google-cloud-aiplatform version
curl -s "https://search.maven.org/solrsearch/select?q=g:com.google.cloud+AND+a:google-cloud-aiplatform&rows=20&wt=json" | jq -r '.response.docs[] | "\(.v)  \(.timestamp)"' | head -20

Repository: zipline-ai/chronon

Length of output: 119


Update aiplatform library to latest version.

Version 3.79.0 is outdated; 3.81.0 is the latest available release on Maven Central. No known security vulnerabilities exist for 3.79.0, but updating to the latest version is recommended for access to latest features and bug fixes.

🤖 Prompt for AI Agents
In cloud_gcp/package.mill around line 50, the pinned dependency
"com.google.cloud:google-cloud-aiplatform:3.79.0" is outdated; update the
version string to "3.81.0" to use the latest release, then run a build (or
dependency resolution) to verify compatibility and update any lockfiles or
generated dependency manifests accordingly.

Comment on lines +29 to +63
  def makeHttpRequest[T](url: String, method: HttpMethod, requestBody: Option[JsonObject] = None)(
      handler: HttpResponse[Buffer] => T): Unit = {
    val request = method match {
      case GetMethod    => client.getAbs(url)
      case PostMethod   => client.postAbs(url)
      case PutMethod    => client.putAbs(url)
      case DeleteMethod => client.deleteAbs(url)
      case _            => throw new IllegalArgumentException(s"Currently unsupported HTTP method: $method")
    }

    // Add common headers
    val requestWithHeaders = request
      .putHeader("Authorization", s"Bearer $getAccessToken")
    val finalRequest = if (requestBody.isDefined) {
      requestWithHeaders
        .putHeader("Content-Type", "application/json")
        .putHeader("Accept", "application/json")
    } else {
      requestWithHeaders
    }

    // Send the request
    val responseFuture = requestBody match {
      case Some(body) => finalRequest.sendJsonObject(body)
      case None       => finalRequest.send()
    }

    responseFuture.onComplete { asyncResult =>
      if (asyncResult.succeeded()) {
        handler(asyncResult.result())
      } else {
        handler(null)
      }
    }
  }

⚠️ Potential issue | 🟠 Major

Returning null on failure loses error context.

When the request fails, handler(null) discards the actual exception. Consider passing the failure to the handler or using a Try/Either.

-  def makeHttpRequest[T](url: String, method: HttpMethod, requestBody: Option[JsonObject] = None)(
-      handler: HttpResponse[Buffer] => T): Unit = {
+  def makeHttpRequest[T](url: String, method: HttpMethod, requestBody: Option[JsonObject] = None)(
+      handler: Either[Throwable, HttpResponse[Buffer]] => T): Unit = {
     val request = method match {
       ...
     }
     ...
     responseFuture.onComplete { asyncResult =>
       if (asyncResult.succeeded()) {
-        handler(asyncResult.result())
+        handler(Right(asyncResult.result()))
       } else {
-        handler(null)
+        handler(Left(asyncResult.cause()))
       }
     }
   }
🤖 Prompt for AI Agents
In
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala
around lines 29-63, the handler is invoked with null on failure which discards
the error; change the handler signature to accept an Either[Throwable,
HttpResponse[Buffer]] (or a (Throwable, Option[HttpResponse[Buffer]]) pair) and
update this method to call handler(Right(response)) on success and
handler(Left(asyncResult.cause())) on failure so the caller receives the
original exception; update all call sites accordingly to handle the Either and
preserve error context in logs/flows.

Comment on lines +96 to +101

    inputRequests.foreach { inputRequest =>
      val instance = inputRequest("instance")
      val jsonInstance = convertToVertxJson(instance)
      instancesArray.add(jsonInstance)
    }

⚠️ Potential issue | 🟠 Major

Potential NoSuchElementException if "instance" key is missing.

inputRequest("instance") throws if key absent. Consider using get with error handling.

-      val instance = inputRequest("instance")
+      val instance = inputRequest.getOrElse("instance",
+        throw new IllegalArgumentException(s"Missing 'instance' key in input request"))
🤖 Prompt for AI Agents
In
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala
around lines 96–101, the code uses inputRequest("instance") which will throw a
NoSuchElementException if the "instance" key is missing; replace the direct
apply with a safe lookup (e.g., inputRequest.get("instance") or getOrElse) and
handle the missing-key case explicitly — either log and skip that request, or
throw a clear IllegalArgumentException with contextual information (including
request id or content) before calling convertToVertxJson, ensuring you also
validate the retrieved value's type/shape before adding to instancesArray.

Comment on lines +129 to +131
    (0 until predictions.size()).map { index =>
      val predictionJsonObject = predictions.getJsonObject(index)
      predictionJsonObject.getMap.asScala.toMap

⚠️ Potential issue | 🟡 Minor

getJsonObject returns null for non-object entries.

If predictions contain primitives or arrays, this will NPE. Add defensive handling.
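One defensive variant, sketched with Vert.x's JsonArray API (the object and method names are illustrative, not the PR's code):

```scala
import io.vertx.core.json.{JsonArray, JsonObject}

import scala.jdk.CollectionConverters._

object PredictionExtractionSketch {
  // Keeps only object-shaped prediction entries instead of NPE-ing on getJsonObject
  def extractObjectPredictions(predictions: JsonArray): Seq[Map[String, AnyRef]] =
    (0 until predictions.size()).flatMap { index =>
      predictions.getValue(index) match {
        case obj: JsonObject => Some(obj.getMap.asScala.toMap)
        case _               => None // primitive or array entry; skip rather than fail
      }
    }
}
```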

Comment on lines +87 to +100
httpClient.makeHttpRequest(url, PostMethod, Some(requestBody)) { response =>
if (response != null) {
if (response.statusCode() == 200) {
val responseBody = response.bodyAsJsonObject()
val results = VertexHttpUtils.extractPredictionResults(responseBody)
promise.success(PredictResponse(predictRequest, Success(results)))
} else {
promise.success(
PredictResponse(predictRequest,
Failure(new RuntimeException(s"HTTP Request failed", asyncResult.cause()))))
val errorMsg = s"HTTP Request failed: ${response.statusCode()}: ${response.bodyAsString()}"
promise.success(PredictResponse(predictRequest, Failure(new RuntimeException(errorMsg))))
}
} else {
promise.success(PredictResponse(predictRequest, Failure(new RuntimeException("HTTP Request failed"))))
}
}

⚠️ Potential issue | 🟡 Minor

Error context lost when HTTP request fails.

After adopting VertexHttpClient, null response gives a generic "HTTP Request failed" message. Consider propagating actual error from the client.

🤖 Prompt for AI Agents
In
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala
around lines 87–100, the code returns a generic RuntimeException("HTTP Request
failed") when the httpClient callback yields a null response, losing the
client's actual error detail; update the call-site and client to propagate the
client's error info instead: change VertexHttpClient.makeHttpRequest to include
an error/Throwable (or Option[String] message) in the callback when a request
fails, then here use that error/message to construct the Failure (e.g.,
Failure(new RuntimeException(s"HTTP Request failed: " +
errorMessageOrThrowable))) so the real error from the HTTP client is preserved
in the PredictResponse.
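If the client were changed to surface an Either as suggested on VertexHttpClient above, the call site might look roughly like this sketch. It reuses httpClient, url, requestBody, promise, and predictRequest from the snippet quoted above and is not a committed change:

```scala
// Sketch only: assumes makeHttpRequest's handler receives Either[Throwable, HttpResponse[Buffer]]
httpClient.makeHttpRequest(url, PostMethod, Some(requestBody)) {
  case Right(response) if response.statusCode() == 200 =>
    val results = VertexHttpUtils.extractPredictionResults(response.bodyAsJsonObject())
    promise.success(PredictResponse(predictRequest, Success(results)))
  case Right(response) =>
    val errorMsg = s"HTTP Request failed: ${response.statusCode()}: ${response.bodyAsString()}"
    promise.success(PredictResponse(predictRequest, Failure(new RuntimeException(errorMsg))))
  case Left(cause) =>
    // The original client exception is preserved instead of a generic message
    promise.success(PredictResponse(predictRequest, Failure(new RuntimeException("HTTP Request failed", cause))))
}
```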

Comment on lines +206 to +209
// Step 3: Deploy model
logger.info("\n" + "=" * 80)
logger.info("Step 2: Deploying model (endpoint creation + model upload + deployment)...")
logger.info("=" * 80)

⚠️ Potential issue | 🟡 Minor

Minor typo: log says "Step 2" but this is Step 3.

-    logger.info("Step 2: Deploying model (endpoint creation + model upload + deployment)...")
+    logger.info("Step 3: Deploying model (endpoint creation + model upload + deployment)...")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 // Step 3: Deploy model
 logger.info("\n" + "=" * 80)
-logger.info("Step 2: Deploying model (endpoint creation + model upload + deployment)...")
+logger.info("Step 3: Deploying model (endpoint creation + model upload + deployment)...")
 logger.info("=" * 80)
🤖 Prompt for AI Agents
In
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala
around lines 206-209, the logger message incorrectly reads "Step 2: Deploying
model..." while the surrounding comment and flow indicate this is Step 3; update
the logged string to "Step 3: Deploying model (endpoint creation + model upload
+ deployment)..." (and ensure any other adjacent step messages remain correctly
numbered).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (3)

44-44: ExecutionContext lifecycle not managed.

The execution context is created but never shut down. For long-running usage, consider tracking lifecycle or using a managed context.


99-118: Endpoint lookup lists all endpoints.

At scale, listing all endpoints to find by display name could be slow. Consider caching or indexing if endpoint count grows.


163-165: Consider auto-creating missing endpoints.

Currently fails if endpoint not found. For better UX, could automatically call createEndpoint instead.

-      case None =>
-        Future.failed(
-          new RuntimeException(
-            s"Endpoint ${endpointConfig.getEndpointName} not found. Please create it first using createEndpoint."))
+      case None =>
+        logger.info(s"Endpoint ${endpointConfig.getEndpointName} not found, creating...")
+        createNewEndpoint(endpointConfig.getEndpointName)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77411fa and f74c31f.

📒 Files selected for processing (1)
  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2025-01-24T23:55:40.650Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2024-11-04T20:04:18.082Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:37-40
Timestamp: 2024-11-04T20:04:18.082Z
Learning: Avoid using `Await.result` in production code; prefer handling `Future`s asynchronously when possible to prevent blocking.

Applied to files:

  • cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (41)
  • GitHub Check: Test Spark (Scala 2.12.18) / udafs_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / join_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / stats_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / spark_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / streaming_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.12.18) / batch_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / analyzer_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / spark_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / udafs_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_gcp_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / kv_store_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / groupby_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / fetcher_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / stats_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / cloud_aws_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / streaming_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / flink_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / join_tests
  • GitHub Check: Test Spark (Scala 2.13.17) / batch_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.12.18) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_aws_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / cloud_gcp_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_commons_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / service_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / api_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / online_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / aggregator_tests
  • GitHub Check: Test Non-Spark (Scala 2.13.17) / flink_tests
  • GitHub Check: python_lint
  • GitHub Check: python_tests
  • GitHub Check: enforce_triggered_workflows
  • GitHub Check: build
  • GitHub Check: scala_compile_fmt_fix
🔇 Additional comments (6)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (6)

48-84: LGTM.

Training job submission logic is well-structured with proper validation and error handling.


182-205: LGTM.

Model upload with proper async handling and logging.


210-241: LGTM.

Deployment logic correctly implements the MVP 0→1 traffic rollout as intended. Future blue/green strategies acknowledged in PR objectives.


245-297: LGTM.

Status checking methods handle states and errors comprehensively with appropriate fallback to UNKNOWN status.


299-447: LGTM.

Helper methods cleanly construct Vertex AI resources with appropriate optional field handling and state mapping.


170-170: Verify model artifact path convention.

The code assumes training outputs to $outputDir/model, but this depends on the training script. Ensure training jobs write artifacts to the expected model subdirectory.

#!/bin/bash
# Search for documentation or examples of modelArtifactUri construction patterns
rg -n -C3 'modelArtifactUri|model_artifact|training_output.*model' --type scala

@piyush-zlai piyush-zlai added this pull request to the merge queue Dec 9, 2025
Merged via the queue into main with commit d326844 Dec 9, 2025
45 checks passed
@piyush-zlai piyush-zlai deleted the piyush/models_orch_platform branch December 9, 2025 22:33
