Extend Api + VertexPlatform to support submitting training jobs + deployments #1331
Walkthrough

Introduces a Vertex AI orchestration and HTTP client, refactors VertexPlatform to delegate training, deployment, and prediction to these new layers, adds ModelPlatform long-running operation APIs, updates tests, and adds the google-cloud-aiplatform 3.79.0 dependency.
Sequence Diagrams

```mermaid
sequenceDiagram
    participant Client
    participant VP as VertexPlatform
    participant HC as VertexHttpClient
    participant VAI as Vertex AI REST
    Client->>VP: predict(model, inputs, params)
    activate VP
    VP->>HC: makeHttpRequest(url, POST, body)
    activate HC
    HC->>HC: fetch OAuth2 token (ADC)
    HC->>VAI: POST /predict (Bearer token + JSON body)
    activate VAI
    VAI-->>HC: predictions JSON
    deactivate VAI
    HC-->>VP: response JSON
    deactivate HC
    VP->>VP: VertexHttpUtils.extractPredictionResults(...)
    VP-->>Client: predictions
    deactivate VP
```
```mermaid
sequenceDiagram
    participant Client
    participant VP as VertexPlatform
    participant VO as VertexOrchestration
    participant SDK as Vertex AI SDK
    Client->>VP: submitTrainingJob(request)
    activate VP
    VP->>VO: submitTrainingJob(request)
    activate VO
    VO->>SDK: JobServiceClient.createCustomJob(...)
    activate SDK
    SDK-->>VO: operation / job name
    deactivate SDK
    VO-->>VP: job name
    deactivate VO
    VP-->>Client: job name
    deactivate VP
```
```mermaid
sequenceDiagram
    participant Client
    participant VP as VertexPlatform
    participant VO as VertexOrchestration
    participant SDK as Vertex AI SDK
    Client->>VP: deployModel(request)
    activate VP
    VP->>VO: deployModel(request)
    activate VO
    VO->>SDK: ModelServiceClient.uploadModel(...)
    activate SDK
    SDK-->>VO: model resource name
    deactivate SDK
    VO->>SDK: EndpointServiceClient.deployModel(...)
    activate SDK
    SDK-->>VO: operation name
    deactivate SDK
    VO-->>VP: operation name
    deactivate VO
    VP-->>Client: operation name
    deactivate VP
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks: ✅ 3 checks passed.
Actionable comments posted: 8
🧹 Nitpick comments (5)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala (2)
34-36: Unreachable default case.
`HttpMethod` is sealed with 4 exhaustive cases; the wildcard `_` is dead code.

```diff
  case DeleteMethod => client.deleteAbs(url)
- case _            => throw new IllegalArgumentException(s"Currently unsupported HTTP method: $method")
```
107-107: `filterKeys` is deprecated in Scala 2.13+. Use `view.filterKeys(...).toMap` instead.

```diff
- val additionalParams = modelParams.filterKeys(k => k != "model_name" && k != "model_type")
+ val additionalParams = modelParams.view.filterKeys(k => k != "model_name" && k != "model_type").toMap
```

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (1)
23-35: Missing `close()` method to release orchestration resources. `VertexOrchestration` has a `close()` for SDK clients. Consider exposing it here to prevent resource leaks.

```diff
+ def close(): Unit = {
+   orchestration.close()
+ }
```

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (2)
297-306: Null checks on lazy vals are unnecessary. Lazy vals are initialized on first access; they're never null after class instantiation.

```diff
  def close(): Unit = {
    try {
-     if (jobServiceClient != null) jobServiceClient.close()
-     if (endpointServiceClient != null) endpointServiceClient.close()
-     if (modelServiceClient != null) modelServiceClient.close()
+     jobServiceClient.close()
+     endpointServiceClient.close()
+     modelServiceClient.close()
    } catch {
      case e: Exception => logger.warn(s"Error closing SDK clients: ${e.getMessage}", e)
    }
  }
```
108-108: `iterateAll()` loads all endpoints into memory. For projects with many endpoints, consider pagination or early termination on match.
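The early-termination idea can be sketched with a generic helper; `findByDisplayName` and its parameters are illustrative, not part of the codebase:

```scala
import scala.jdk.CollectionConverters._

// `endpoints` stands in for endpointServiceClient.listEndpoints(parent).iterateAll().
// The paged Iterable is consumed lazily, so `find` stops fetching pages at the
// first match instead of loading every endpoint into memory.
object EndpointLookup {
  def findByDisplayName[E](endpoints: java.lang.Iterable[E],
                           displayName: E => String,
                           target: String): Option[E] =
    endpoints.asScala.find(e => displayName(e) == target)
}
```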
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- cloud_gcp/package.mill (1 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala (1 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (1 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (3 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala (1 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala (5 hunks)
- online/src/main/scala/ai/chronon/online/Api.scala (2 hunks)
- online/src/test/scala/ai/chronon/online/test/ModelTransformsFetcherTest.scala (4 hunks)
- python/test/canary/models/gcp/click_through_rate.py (1 hunks)
- spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala (3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Applied to files:
cloud_gcp/package.mill
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Applied to files:
cloud_gcp/package.mill
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Applied to files:
spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Applied to files:
spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala
🧬 Code graph analysis (4)
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala (2)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (9): VertexOrchestration (14-307), VertexOrchestration (309-445), buildCustomJob (316-377), buildModel (379-407), parseJobState (435-444), submitTrainingJob (48-83), createEndpoint (85-94), deployModel (145-176), getOperationStatus (274-295)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (3): submitTrainingJob (109-111), createEndpoint (113-115), deployModel (124-126)

online/src/test/scala/ai/chronon/online/test/ModelTransformsFetcherTest.scala (2)
- online/src/main/scala/ai/chronon/online/Api.scala (4): DeployModelRequest (303-303), PredictRequest (300-300), PredictResponse (301-301), TrainingRequest (302-302)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (4): submitTrainingJob (109-111), createEndpoint (113-115), deployModel (124-126), getJobStatus (117-122)

online/src/main/scala/ai/chronon/online/Api.scala (2)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (3): submitTrainingJob (48-83), createEndpoint (85-94), deployModel (145-176)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (4): submitTrainingJob (109-111), createEndpoint (113-115), deployModel (124-126), getJobStatus (117-122)

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala (1)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala (3): VertexHttpUtils (66-134), createPredictionRequestBody (93-117), extractPredictionResults (122-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
🔇 Additional comments (23)
python/test/canary/models/gcp/click_through_rate.py (1)
77-77: ✓ Formatting adjustment approved. Trailing newline added. The canary test configuration properly exercises the new Vertex AI integration with comprehensive training and deployment specifications aligned with the PR objectives.
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala (4)
1-14: LGTM! Imports and test structure look correct for unit testing VertexOrchestration builders.
15-64: LGTM! Thorough verification of CustomJob construction including worker pool specs, args conversion, and output directory.
66-121: LGTM! Good coverage of default values and container spec mapping including env vars.
123-157: LGTM! Comprehensive coverage of DeployedModel resources and JobState enum mapping.
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala (5)
24-24: LGTM! Constructor updated to match new VertexPlatform signature without credentials parameter.
143-143: LGTM! Correctly migrated to VertexHttpUtils for request/response handling.
198-198: LGTM! Consistent use of VertexHttpUtils.extractPredictionResults.
217-217: LGTM!
233-233: LGTM!

online/src/main/scala/ai/chronon/online/Api.scala (2)
302-309: LGTM! Clean sealed trait design for ModelOperation. The data classes provide a clear contract for training and deployment operations.
324-341: LGTM! Well-designed async interface for ML lifecycle operations. The `getJobStatus` dispatch via sealed trait is clean.

online/src/test/scala/ai/chronon/online/test/ModelTransformsFetcherTest.scala (4)
6-6: LGTM! Import updated to include new API types.
483-488: LGTM! Required method stubs to satisfy updated ModelPlatform interface. Using `???` is fine for test doubles that won't call these methods.
506-511: LGTM! Same pattern for CallTrackingModelPlatform.
535-540: LGTM! Same pattern for MultiOutputModelPlatform.
spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala (2)
781-786: LGTM! Placeholder stubs satisfy the interface contract for tests that don't exercise training/deployment paths.
796-800: LGTM! Same as above - stubs are appropriate for the failing mock.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (1)
109-126: LGTM! Clean delegation to orchestration layer. Operations are well-mapped.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (4)
48-83: LGTM! Training job submission is well-structured with appropriate logging and error handling.
309-314: LGTM! Clean utility for Google/Scala Future conversion.
435-444: LGTM! Exhaustive job state mapping.
226-227: No action needed. The traffic split key "0" is the correct and documented convention for Vertex AI's deployModel API. According to the API specification, "0" is a special placeholder that refers to the DeployedModel being created, and the API automatically replaces it with the actual assigned deployedModel ID. Using "0" is not an assumption but the intended API contract.
```scala
mvn"org.apache.iceberg::iceberg-spark-runtime-3.5:1.10.0",
mvn"com.google.cloud:google-cloud-bigquery:2.54.1",
mvn"com.google.cloud:google-cloud-bigtable:2.57.1",
mvn"com.google.cloud:google-cloud-aiplatform:3.79.0",
```
🧩 Analysis chain
🌐 Web query:
google-cloud-aiplatform 3.79.0 latest version Maven Central
💡 Result:
3.79.0 is available on Maven Central (google-cloud-aiplatform-bom). [1]
However, newer releases (3.80.0 and 3.81.0) are already published (see artifact listings). [2]
Sources:
[1] repo1.maven.org — com/google/cloud/google-cloud-aiplatform-bom (shows 3.79.0).
[2] mvnrepository.com — com.google.cloud:google-cloud-aiplatform (lists 3.80.0 / 3.81.0 as newer).
🌐 Web query:
google-cloud-aiplatform security vulnerabilities CVE 2024 2025
💡 Result:
Short answer: I found no public CVEs specifically affecting the google-cloud-aiplatform (Vertex AI) client/library for 2024–2025. Sources that track package CVEs report no direct vulnerabilities, and Google Cloud’s security bulletins list CVEs for GCP services and infrastructure but do not show a CVE attributed to the google-cloud-aiplatform client in 2024–2025. [1][2][3]
Sources:
- Snyk package vulnerability page — no direct vulnerabilities for google-cloud-aiplatform. [1]
- PyPI project page for google-cloud-aiplatform (release history / metadata). [2]
- Google Cloud security bulletins / CVE program (lists GCP product CVEs; no CVE entry tied to the google-cloud-aiplatform client for 2024–2025). [3]
If you want, I can:
- search the NVD/CVE database and OSV for matches by package name and date range, or
- monitor/alert and return results if any CVE appears for this package. Which would you prefer?
🏁 Script executed:
```shell
#!/bin/bash
# Query Maven Central for latest google-cloud-aiplatform version
curl -s "https://search.maven.org/solrsearch/select?q=g:com.google.cloud+AND+a:google-cloud-aiplatform&rows=20&wt=json" | jq -r '.response.docs[] | "\(.v) \(.timestamp)"' | head -20
```

Repository: zipline-ai/chronon
Length of output: 119
Update aiplatform library to latest version.
Version 3.79.0 is outdated; 3.81.0 is the latest available release on Maven Central. No known security vulnerabilities exist for 3.79.0, but updating to the latest version is recommended for access to latest features and bug fixes.
🤖 Prompt for AI Agents
In cloud_gcp/package.mill around line 50, the pinned dependency
"com.google.cloud:google-cloud-aiplatform:3.79.0" is outdated; update the
version string to "3.81.0" to use the latest release, then run a build (or
dependency resolution) to verify compatibility and update any lockfiles or
generated dependency manifests accordingly.
```scala
def makeHttpRequest[T](url: String, method: HttpMethod, requestBody: Option[JsonObject] = None)(
    handler: HttpResponse[Buffer] => T): Unit = {
  val request = method match {
    case GetMethod    => client.getAbs(url)
    case PostMethod   => client.postAbs(url)
    case PutMethod    => client.putAbs(url)
    case DeleteMethod => client.deleteAbs(url)
    case _            => throw new IllegalArgumentException(s"Currently unsupported HTTP method: $method")
  }

  // Add common headers
  val requestWithHeaders = request
    .putHeader("Authorization", s"Bearer $getAccessToken")
  val finalRequest = if (requestBody.isDefined) {
    requestWithHeaders
      .putHeader("Content-Type", "application/json")
      .putHeader("Accept", "application/json")
  } else {
    requestWithHeaders
  }

  // Send the request
  val responseFuture = requestBody match {
    case Some(body) => finalRequest.sendJsonObject(body)
    case None       => finalRequest.send()
  }

  responseFuture.onComplete { asyncResult =>
    if (asyncResult.succeeded()) {
      handler(asyncResult.result())
    } else {
      handler(null)
    }
  }
}
```
Returning null on failure loses error context.
When the request fails, handler(null) discards the actual exception. Consider passing the failure to the handler or using a Try/Either.
```diff
- def makeHttpRequest[T](url: String, method: HttpMethod, requestBody: Option[JsonObject] = None)(
-     handler: HttpResponse[Buffer] => T): Unit = {
+ def makeHttpRequest[T](url: String, method: HttpMethod, requestBody: Option[JsonObject] = None)(
+     handler: Either[Throwable, HttpResponse[Buffer]] => T): Unit = {
    val request = method match {
      ...
    }
    ...
    responseFuture.onComplete { asyncResult =>
      if (asyncResult.succeeded()) {
-       handler(asyncResult.result())
+       handler(Right(asyncResult.result()))
      } else {
-       handler(null)
+       handler(Left(asyncResult.cause()))
      }
    }
  }
```

🤖 Prompt for AI Agents
In
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala
around lines 29-63, the handler is invoked with null on failure which discards
the error; change the handler signature to accept an Either[Throwable,
HttpResponse[Buffer]] (or a (Throwable, Option[HttpResponse[Buffer]]) pair) and
update this method to call handler(Right(response)) on success and
handler(Left(asyncResult.cause())) on failure so the caller receives the
original exception; update all call sites accordingly to handle the Either and
preserve error context in logs/flows.
```scala
inputRequests.foreach { inputRequest =>
  val instance = inputRequest("instance")
  val jsonInstance = convertToVertxJson(instance)
  instancesArray.add(jsonInstance)
}
```
Potential NoSuchElementException if "instance" key is missing.
`inputRequest("instance")` throws if the key is absent. Consider using `get` with error handling.
```diff
- val instance = inputRequest("instance")
+ val instance = inputRequest.getOrElse("instance",
+   throw new IllegalArgumentException(s"Missing 'instance' key in input request"))
```

🤖 Prompt for AI Agents
In
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala
around lines 96–101, the code uses inputRequest("instance") which will throw a
NoSuchElementException if the "instance" key is missing; replace the direct
apply with a safe lookup (e.g., inputRequest.get("instance") or getOrElse) and
handle the missing-key case explicitly — either log and skip that request, or
throw a clear IllegalArgumentException with contextual information (including
request id or content) before calling convertToVertxJson, ensuring you also
validate the retrieved value's type/shape before adding to instancesArray.
```scala
(0 until predictions.size()).map { index =>
  val predictionJsonObject = predictions.getJsonObject(index)
  predictionJsonObject.getMap.asScala.toMap
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getJsonObject returns null for non-object entries.
If predictions contain primitives or arrays, this will NPE. Add defensive handling.
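A hedged sketch of the defensive handling, assuming the Vert.x `JsonArray`/`JsonObject` types used by the surrounding code (`extractObjectPredictions` is an illustrative name, not the actual method):

```scala
import io.vertx.core.json.{JsonArray, JsonObject}
import scala.jdk.CollectionConverters._

object SafePredictions {
  // Only JsonObject entries are converted; primitive or array entries are
  // skipped instead of NPE-ing via getJsonObject(index).getMap.
  def extractObjectPredictions(predictions: JsonArray): Seq[Map[String, AnyRef]] =
    (0 until predictions.size()).flatMap { index =>
      predictions.getValue(index) match {
        case obj: JsonObject => Some(obj.getMap.asScala.toMap)
        case _               => None // primitive or nested array: skip
      }
    }
}
```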
```diff
  httpClient.makeHttpRequest(url, PostMethod, Some(requestBody)) { response =>
    if (response != null) {
      if (response.statusCode() == 200) {
        val responseBody = response.bodyAsJsonObject()
        val results = VertexHttpUtils.extractPredictionResults(responseBody)
        promise.success(PredictResponse(predictRequest, Success(results)))
      } else {
-       promise.success(
-         PredictResponse(predictRequest,
-                         Failure(new RuntimeException(s"HTTP Request failed", asyncResult.cause()))))
+       val errorMsg = s"HTTP Request failed: ${response.statusCode()}: ${response.bodyAsString()}"
+       promise.success(PredictResponse(predictRequest, Failure(new RuntimeException(errorMsg))))
      }
    } else {
      promise.success(PredictResponse(predictRequest, Failure(new RuntimeException("HTTP Request failed"))))
    }
  }
```
Error context lost when HTTP request fails.
After adopting VertexHttpClient, null response gives a generic "HTTP Request failed" message. Consider propagating actual error from the client.
🤖 Prompt for AI Agents
In
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala
around lines 87–100, the code returns a generic RuntimeException("HTTP Request
failed") when the httpClient callback yields a null response, losing the
client's actual error detail; update the call-site and client to propagate the
client's error info instead: change VertexHttpClient.makeHttpRequest to include
an error/Throwable (or Option[String] message) in the callback when a request
fails, then here use that error/message to construct the Failure (e.g.,
Failure(new RuntimeException(s"HTTP Request failed: " +
errorMessageOrThrowable))) so the real error from the HTTP client is preserved
in the PredictResponse.
```scala
// Step 3: Deploy model
logger.info("\n" + "=" * 80)
logger.info("Step 2: Deploying model (endpoint creation + model upload + deployment)...")
logger.info("=" * 80)
```
Minor typo: log says "Step 2" but this is Step 3.
```diff
- logger.info("Step 2: Deploying model (endpoint creation + model upload + deployment)...")
+ logger.info("Step 3: Deploying model (endpoint creation + model upload + deployment)...")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```scala
// Step 3: Deploy model
logger.info("\n" + "=" * 80)
logger.info("Step 3: Deploying model (endpoint creation + model upload + deployment)...")
logger.info("=" * 80)
```
🤖 Prompt for AI Agents
In
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala
around lines 206-209, the logger message incorrectly reads "Step 2: Deploying
model..." while the surrounding comment and flow indicate this is Step 3; update
the logged string to "Step 3: Deploying model (endpoint creation + model upload
+ deployment)..." (and ensure any other adjacent step messages remain correctly
numbered).
Actionable comments posted: 0
🧹 Nitpick comments (3)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (3)
44-44: ExecutionContext lifecycle not managed. The execution context is created but never shut down. For long-running usage, consider tracking lifecycle or using a managed context.
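One way to sketch a managed context (names are illustrative, not the actual VertexOrchestration API): back the `ExecutionContext` with an `ExecutorService` that `close()` shuts down, so threads don't outlive the orchestration layer.

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

class ManagedContext {
  private val executor = Executors.newFixedThreadPool(4)

  // Futures launched against this context run on the pool above.
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(executor)

  // Shut the pool down alongside the SDK clients in close().
  def close(): Unit = executor.shutdown()
}
```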
99-118: Endpoint lookup lists all endpoints. At scale, listing all endpoints to find by display name could be slow. Consider caching or indexing if endpoint count grows.
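A possible caching sketch (all names hypothetical; `listAndFind` stands in for the existing list-and-match lookup): memoize display-name to resource-name resolutions so repeated deploys don't re-list every endpoint.

```scala
import scala.collection.concurrent.TrieMap

// TrieMap gives a thread-safe cache without explicit locking; a miss falls
// through to the (slow) listing lookup and caches any hit.
class EndpointCache(listAndFind: String => Option[String]) {
  private val cache = TrieMap.empty[String, String]

  def resolve(displayName: String): Option[String] =
    cache.get(displayName).orElse {
      val found = listAndFind(displayName)
      found.foreach(name => cache.putIfAbsent(displayName, name))
      found
    }
}
```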
163-165: Consider auto-creating missing endpoints. Currently fails if endpoint not found. For better UX, could automatically call `createEndpoint` instead.

```diff
- case None =>
-   Future.failed(
-     new RuntimeException(
-       s"Endpoint ${endpointConfig.getEndpointName} not found. Please create it first using createEndpoint."))
+ case None =>
+   logger.info(s"Endpoint ${endpointConfig.getEndpointName} not found, creating...")
+   createNewEndpoint(endpointConfig.getEndpointName)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2025-01-24T23:55:40.650Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2024-11-04T20:04:18.082Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:37-40
Timestamp: 2024-11-04T20:04:18.082Z
Learning: Avoid using `Await.result` in production code; prefer handling `Future`s asynchronously when possible to prevent blocking.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
🔇 Additional comments (6)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (6)
48-84: LGTM. Training job submission logic is well-structured with proper validation and error handling.
182-205: LGTM. Model upload with proper async handling and logging.
210-241: LGTM. Deployment logic correctly implements the MVP 0→1 traffic rollout as intended. Future blue/green strategies acknowledged in PR objectives.
245-297: LGTM. Status checking methods handle states and errors comprehensively with appropriate fallback to UNKNOWN status.
299-447: LGTM. Helper methods cleanly construct Vertex AI resources with appropriate optional field handling and state mapping.
170-170: Verify model artifact path convention. The code assumes training outputs to `$outputDir/model`, but this depends on the training script. Ensure training jobs write artifacts to the expected `model` subdirectory.

```shell
#!/bin/bash
# Search for documentation or examples of modelArtifactUri construction patterns
rg -n -C3 'modelArtifactUri|model_artifact|training_output.*model' --type scala
```
Summary
Added a first-cut implementation of the rails to submit training jobs and deployments with the Vertex platform. The implementation is fairly basic at the moment (e.g., only 0→1 traffic rollouts; no support yet for additional deploy types like blue/green). We can flesh these out in follow-ups - the current implementation allows us to start building basic end-to-end scaffolding and an MVP.
Pulled in the Vertex SDK to trigger these. We needed the HTTP client for the Vertex predict calls (to support custom & published models - this isn't as easy to configure in the SDK), but using the SDK for training and deploy is a lot more ergonomic.
Testing
Tested using the local integration test that submits a training run, creates an endpoint, uploads a model, and deploys it to the endpoint. Validated that the model is up and can be hit for serving.
Checklist