Extend Api + VertexPlatform to support submitting training jobs + deployments #1331
Walkthrough

Introduces a Vertex AI orchestration and HTTP client, refactors VertexPlatform to delegate training, deployment, and prediction to these new layers, adds ModelPlatform long-running operation APIs, updates tests, and adds the google-cloud-aiplatform 3.79.0 dependency.
Sequence Diagrams

```mermaid
sequenceDiagram
    participant Client
    participant VP as VertexPlatform
    participant HC as VertexHttpClient
    participant VAI as Vertex AI REST
    Client->>VP: predict(model, inputs, params)
    activate VP
    VP->>HC: makeHttpRequest(url, POST, body)
    activate HC
    HC->>HC: fetch OAuth2 token (ADC)
    HC->>VAI: POST /predict (Bearer token + JSON body)
    activate VAI
    VAI-->>HC: predictions JSON
    deactivate VAI
    HC-->>VP: response JSON
    deactivate HC
    VP->>VP: VertexHttpUtils.extractPredictionResults(...)
    VP-->>Client: predictions
    deactivate VP
```
```mermaid
sequenceDiagram
    participant Client
    participant VP as VertexPlatform
    participant VO as VertexOrchestration
    participant SDK as Vertex AI SDK
    Client->>VP: submitTrainingJob(request)
    activate VP
    VP->>VO: submitTrainingJob(request)
    activate VO
    VO->>SDK: JobServiceClient.createCustomJob(...)
    activate SDK
    SDK-->>VO: operation / job name
    deactivate SDK
    VO-->>VP: job name
    deactivate VO
    VP-->>Client: job name
    deactivate VP
```
```mermaid
sequenceDiagram
    participant Client
    participant VP as VertexPlatform
    participant VO as VertexOrchestration
    participant SDK as Vertex AI SDK
    Client->>VP: deployModel(request)
    activate VP
    VP->>VO: deployModel(request)
    activate VO
    VO->>SDK: ModelServiceClient.uploadModel(...)
    activate SDK
    SDK-->>VO: model resource name
    deactivate SDK
    VO->>SDK: EndpointServiceClient.deployModel(...)
    activate SDK
    SDK-->>VO: operation name
    deactivate SDK
    VO-->>VP: operation name
    deactivate VO
    VP-->>Client: operation name
    deactivate VP
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks: ✅ 3 checks passed.
Actionable comments posted: 8
🧹 Nitpick comments (5)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala (2)
34-36: Unreachable default case.
`HttpMethod` is sealed with 4 exhaustive cases; the wildcard `_` is dead code.

```diff
  case DeleteMethod => client.deleteAbs(url)
- case _            => throw new IllegalArgumentException(s"Currently unsupported HTTP method: $method")
```
107-107: `filterKeys` is deprecated in Scala 2.13+. Use `view.filterKeys(...).toMap` instead.

```diff
- val additionalParams = modelParams.filterKeys(k => k != "model_name" && k != "model_type")
+ val additionalParams = modelParams.view.filterKeys(k => k != "model_name" && k != "model_type").toMap
```

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (1)
23-35: Missing `close()` method to release orchestration resources. `VertexOrchestration` has a `close()` for SDK clients. Consider exposing it here to prevent resource leaks.

```diff
+ def close(): Unit = {
+   orchestration.close()
+ }
```

cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (2)
297-306: Null checks on lazy vals are unnecessary. Lazy vals are initialized on first access; they're never null after class instantiation.

```diff
  def close(): Unit = {
    try {
-     if (jobServiceClient != null) jobServiceClient.close()
-     if (endpointServiceClient != null) endpointServiceClient.close()
-     if (modelServiceClient != null) modelServiceClient.close()
+     jobServiceClient.close()
+     endpointServiceClient.close()
+     modelServiceClient.close()
    } catch {
      case e: Exception => logger.warn(s"Error closing SDK clients: ${e.getMessage}", e)
    }
  }
```
108-108: `iterateAll()` loads all endpoints into memory. For projects with many endpoints, consider pagination or early termination on match.
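The early-termination idea can be sketched with a generic helper; `findByDisplayName` and its parameters are illustrative, not part of the codebase:

```scala
import scala.jdk.CollectionConverters._

// `endpoints` stands in for endpointServiceClient.listEndpoints(parent).iterateAll().
// The paged Iterable is consumed lazily, so `find` stops fetching pages at the
// first match instead of loading every endpoint into memory.
object EndpointLookup {
  def findByDisplayName[E](endpoints: java.lang.Iterable[E],
                           displayName: E => String,
                           target: String): Option[E] =
    endpoints.asScala.find(e => displayName(e) == target)
}
```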
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- cloud_gcp/package.mill (1 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala (1 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (1 hunks)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (3 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala (1 hunks)
- cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala (5 hunks)
- online/src/main/scala/ai/chronon/online/Api.scala (2 hunks)
- online/src/test/scala/ai/chronon/online/test/ModelTransformsFetcherTest.scala (4 hunks)
- python/test/canary/models/gcp/click_through_rate.py (1 hunks)
- spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala (3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-02-22T20:30:28.381Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 393
File: cloud_gcp/BUILD.bazel:99-99
Timestamp: 2025-02-22T20:30:28.381Z
Learning: The jar file "iceberg-bigquery-catalog-1.5.2-1.0.1-beta.jar" in cloud_gcp/BUILD.bazel is a local dependency and should not be replaced with maven_artifact.
Applied to files:
cloud_gcp/package.mill
📚 Learning: 2024-10-17T01:09:24.653Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 47
File: docker-init/Dockerfile:36-38
Timestamp: 2024-10-17T01:09:24.653Z
Learning: The JAR files `spark-assembly-0.1.0-SNAPSHOT.jar` and `cloud_aws-assembly-0.1.0-SNAPSHOT.jar` are generated by `sbt` and located in the `target` directory after the build.
Applied to files:
cloud_gcp/package.mill
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala
📚 Learning: 2024-10-17T19:46:42.629Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 44
File: hub/app/controllers/ModelController.scala:15-18
Timestamp: 2024-10-17T19:46:42.629Z
Learning: References to `MockDataService` in `hub/test/controllers/SearchControllerSpec.scala` and `hub/test/controllers/ModelControllerSpec.scala` are needed for tests and should not be removed.
Applied to files:
spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Applied to files:
spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala
🧬 Code graph analysis (4)
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala (2)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (9): VertexOrchestration (14-307), VertexOrchestration (309-445), buildCustomJob (316-377), buildModel (379-407), parseJobState (435-444), submitTrainingJob (48-83), createEndpoint (85-94), deployModel (145-176), getOperationStatus (274-295)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (3): submitTrainingJob (109-111), createEndpoint (113-115), deployModel (124-126)

online/src/test/scala/ai/chronon/online/test/ModelTransformsFetcherTest.scala (2)
- online/src/main/scala/ai/chronon/online/Api.scala (4): DeployModelRequest (303-303), PredictRequest (300-300), PredictResponse (301-301), TrainingRequest (302-302)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (4): submitTrainingJob (109-111), createEndpoint (113-115), deployModel (124-126), getJobStatus (117-122)

online/src/main/scala/ai/chronon/online/Api.scala (2)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (3): submitTrainingJob (48-83), createEndpoint (85-94), deployModel (145-176)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (4): submitTrainingJob (109-111), createEndpoint (113-115), deployModel (124-126), getJobStatus (117-122)

cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala (1)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala (3): VertexHttpUtils (66-134), createPredictionRequestBody (93-117), extractPredictionResults (122-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
🔇 Additional comments (23)
python/test/canary/models/gcp/click_through_rate.py (1)
77-77: ✓ Formatting adjustment approved. Trailing newline added. The canary test configuration properly exercises the new Vertex AI integration with comprehensive training and deployment specifications aligned with the PR objectives.
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala (4)
1-14: LGTM! Imports and test structure look correct for unit testing VertexOrchestration builders.
15-64: LGTM! Thorough verification of CustomJob construction including worker pool specs, args conversion, and output directory.
66-121: LGTM! Good coverage of default values and container spec mapping including env vars.
123-157: LGTM! Comprehensive coverage of DeployedModel resources and JobState enum mapping.
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexPlatformTest.scala (5)
24-24: LGTM! Constructor updated to match new VertexPlatform signature without credentials parameter.
143-143: LGTM! Correctly migrated to VertexHttpUtils for request/response handling.
198-198: LGTM! Consistent use of VertexHttpUtils.extractPredictionResults.
217-217: LGTM!
233-233: LGTM!

online/src/main/scala/ai/chronon/online/Api.scala (2)
302-309: LGTM! Clean sealed trait design for ModelOperation. The data classes provide a clear contract for training and deployment operations.
324-341: LGTM! Well-designed async interface for ML lifecycle operations. The `getJobStatus` dispatch via sealed trait is clean.

online/src/test/scala/ai/chronon/online/test/ModelTransformsFetcherTest.scala (4)
6-6: LGTM! Import updated to include new API types.
483-488: LGTM! Required method stubs to satisfy updated ModelPlatform interface. Using `???` is fine for test doubles that won't call these methods.
506-511: LGTM! Same pattern for CallTrackingModelPlatform.
535-540: LGTM! Same pattern for MultiOutputModelPlatform.
spark/src/test/scala/ai/chronon/spark/ModelTransformsJobTest.scala (2)
781-786: LGTM! Placeholder stubs satisfy the interface contract for tests that don't exercise training/deployment paths.
796-800: LGTM! Same as above - stubs are appropriate for the failing mock.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala (1)
109-126: LGTM! Clean delegation to orchestration layer. Operations are well-mapped.
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (4)
48-83: LGTM! Training job submission is well-structured with appropriate logging and error handling.
309-314: LGTM! Clean utility for Google/Scala Future conversion.
435-444: LGTM! Exhaustive job state mapping.
226-227: No action needed. The traffic split key "0" is the correct and documented convention for Vertex AI's deployModel API. According to the API specification, "0" is a special placeholder that refers to the DeployedModel being created, and the API automatically replaces it with the actual assigned deployedModel ID. Using "0" is not an assumption but the intended API contract.
```scala
mvn"org.apache.iceberg::iceberg-spark-runtime-3.5:1.10.0",
mvn"com.google.cloud:google-cloud-bigquery:2.54.1",
mvn"com.google.cloud:google-cloud-bigtable:2.57.1",
mvn"com.google.cloud:google-cloud-aiplatform:3.79.0",
```
🧩 Analysis chain
🌐 Web query:
google-cloud-aiplatform 3.79.0 latest version Maven Central
💡 Result:
3.79.0 is available on Maven Central (google-cloud-aiplatform-bom). [1]
However, newer releases (3.80.0 and 3.81.0) are already published (see artifact listings). [2]
Sources:
[1] repo1.maven.org — com/google/cloud/google-cloud-aiplatform-bom (shows 3.79.0).
[2] mvnrepository.com — com.google.cloud:google-cloud-aiplatform (lists 3.80.0 / 3.81.0 as newer).
🌐 Web query:
google-cloud-aiplatform security vulnerabilities CVE 2024 2025
💡 Result:
Short answer: I found no public CVEs specifically affecting the google-cloud-aiplatform (Vertex AI) client/library for 2024–2025. Sources that track package CVEs report no direct vulnerabilities, and Google Cloud’s security bulletins list CVEs for GCP services and infrastructure but do not show a CVE attributed to the google-cloud-aiplatform client in 2024–2025. [1][2][3]
Sources:
- Snyk package vulnerability page — no direct vulnerabilities for google-cloud-aiplatform. [1]
- PyPI project page for google-cloud-aiplatform (release history / metadata). [2]
- Google Cloud security bulletins / CVE program (lists GCP product CVEs; no CVE entry tied to the google-cloud-aiplatform client for 2024–2025). [3]
If you want, I can:
- search the NVD/CVE database and OSV for matches by package name and date range, or
- monitor/alert and return results if any CVE appears for this package. Which would you prefer?
🏁 Script executed:
```shell
#!/bin/bash
# Query Maven Central for latest google-cloud-aiplatform version
curl -s "https://search.maven.org/solrsearch/select?q=g:com.google.cloud+AND+a:google-cloud-aiplatform&rows=20&wt=json" | jq -r '.response.docs[] | "\(.v) \(.timestamp)"' | head -20
```

Repository: zipline-ai/chronon
Length of output: 119
Update aiplatform library to latest version.
Version 3.79.0 is outdated; 3.81.0 is the latest available release on Maven Central. No known security vulnerabilities exist for 3.79.0, but updating to the latest version is recommended for access to latest features and bug fixes.
🤖 Prompt for AI Agents
In cloud_gcp/package.mill around line 50, the pinned dependency
"com.google.cloud:google-cloud-aiplatform:3.79.0" is outdated; update the
version string to "3.81.0" to use the latest release, then run a build (or
dependency resolution) to verify compatibility and update any lockfiles or
generated dependency manifests accordingly.
```scala
def makeHttpRequest[T](url: String, method: HttpMethod, requestBody: Option[JsonObject] = None)(
    handler: HttpResponse[Buffer] => T): Unit = {
  val request = method match {
    case GetMethod    => client.getAbs(url)
    case PostMethod   => client.postAbs(url)
    case PutMethod    => client.putAbs(url)
    case DeleteMethod => client.deleteAbs(url)
    case _            => throw new IllegalArgumentException(s"Currently unsupported HTTP method: $method")
  }

  // Add common headers
  val requestWithHeaders = request
    .putHeader("Authorization", s"Bearer $getAccessToken")
  val finalRequest = if (requestBody.isDefined) {
    requestWithHeaders
      .putHeader("Content-Type", "application/json")
      .putHeader("Accept", "application/json")
  } else {
    requestWithHeaders
  }

  // Send the request
  val responseFuture = requestBody match {
    case Some(body) => finalRequest.sendJsonObject(body)
    case None       => finalRequest.send()
  }

  responseFuture.onComplete { asyncResult =>
    if (asyncResult.succeeded()) {
      handler(asyncResult.result())
    } else {
      handler(null)
    }
  }
}
```
Returning null on failure loses error context.
When the request fails, handler(null) discards the actual exception. Consider passing the failure to the handler or using a Try/Either.
```diff
- def makeHttpRequest[T](url: String, method: HttpMethod, requestBody: Option[JsonObject] = None)(
-     handler: HttpResponse[Buffer] => T): Unit = {
+ def makeHttpRequest[T](url: String, method: HttpMethod, requestBody: Option[JsonObject] = None)(
+     handler: Either[Throwable, HttpResponse[Buffer]] => T): Unit = {
    val request = method match {
      ...
    }
    ...
    responseFuture.onComplete { asyncResult =>
      if (asyncResult.succeeded()) {
-       handler(asyncResult.result())
+       handler(Right(asyncResult.result()))
      } else {
-       handler(null)
+       handler(Left(asyncResult.cause()))
      }
    }
  }
```

🤖 Prompt for AI Agents
In
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala
around lines 29-63, the handler is invoked with null on failure which discards
the error; change the handler signature to accept an Either[Throwable,
HttpResponse[Buffer]] (or a (Throwable, Option[HttpResponse[Buffer]]) pair) and
update this method to call handler(Right(response)) on success and
handler(Left(asyncResult.cause())) on failure so the caller receives the
original exception; update all call sites accordingly to handle the Either and
preserve error context in logs/flows.
```scala
inputRequests.foreach { inputRequest =>
  val instance = inputRequest("instance")
  val jsonInstance = convertToVertxJson(instance)
  instancesArray.add(jsonInstance)
}
```
Potential NoSuchElementException if "instance" key is missing.
`inputRequest("instance")` throws if the key is absent. Consider using `get` with error handling.
```diff
- val instance = inputRequest("instance")
+ val instance = inputRequest.getOrElse("instance",
+   throw new IllegalArgumentException(s"Missing 'instance' key in input request"))
```

🤖 Prompt for AI Agents
In
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexHttpClient.scala
around lines 96–101, the code uses inputRequest("instance") which will throw a
NoSuchElementException if the "instance" key is missing; replace the direct
apply with a safe lookup (e.g., inputRequest.get("instance") or getOrElse) and
handle the missing-key case explicitly — either log and skip that request, or
throw a clear IllegalArgumentException with contextual information (including
request id or content) before calling convertToVertxJson, ensuring you also
validate the retrieved value's type/shape before adding to instancesArray.
```scala
(0 until predictions.size()).map { index =>
  val predictionJsonObject = predictions.getJsonObject(index)
  predictionJsonObject.getMap.asScala.toMap
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getJsonObject returns null for non-object entries.
If predictions contain primitives or arrays, this will NPE. Add defensive handling.
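A hedged sketch of the defensive handling, assuming the Vert.x `JsonArray`/`JsonObject` types used by the surrounding code (`extractObjectPredictions` is an illustrative name, not the actual method):

```scala
import io.vertx.core.json.{JsonArray, JsonObject}
import scala.jdk.CollectionConverters._

object SafePredictions {
  // Only JsonObject entries are converted; primitive or array entries are
  // skipped instead of NPE-ing via getJsonObject(index).getMap.
  def extractObjectPredictions(predictions: JsonArray): Seq[Map[String, AnyRef]] =
    (0 until predictions.size()).flatMap { index =>
      predictions.getValue(index) match {
        case obj: JsonObject => Some(obj.getMap.asScala.toMap)
        case _               => None // primitive or nested array: skip
      }
    }
}
```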
```diff
  httpClient.makeHttpRequest(url, PostMethod, Some(requestBody)) { response =>
    if (response != null) {
      if (response.statusCode() == 200) {
        val responseBody = response.bodyAsJsonObject()
        val results = VertexHttpUtils.extractPredictionResults(responseBody)
        promise.success(PredictResponse(predictRequest, Success(results)))
      } else {
-       promise.success(
-         PredictResponse(predictRequest,
-                         Failure(new RuntimeException(s"HTTP Request failed", asyncResult.cause()))))
+       val errorMsg = s"HTTP Request failed: ${response.statusCode()}: ${response.bodyAsString()}"
+       promise.success(PredictResponse(predictRequest, Failure(new RuntimeException(errorMsg))))
      }
    } else {
      promise.success(PredictResponse(predictRequest, Failure(new RuntimeException("HTTP Request failed"))))
    }
  }
```
Error context lost when HTTP request fails.
After adopting VertexHttpClient, null response gives a generic "HTTP Request failed" message. Consider propagating actual error from the client.
🤖 Prompt for AI Agents
In
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexPlatform.scala
around lines 87–100, the code returns a generic RuntimeException("HTTP Request
failed") when the httpClient callback yields a null response, losing the
client's actual error detail; update the call-site and client to propagate the
client's error info instead: change VertexHttpClient.makeHttpRequest to include
an error/Throwable (or Option[String] message) in the callback when a request
fails, then here use that error/message to construct the Failure (e.g.,
Failure(new RuntimeException(s"HTTP Request failed: " +
errorMessageOrThrowable))) so the real error from the HTTP client is preserved
in the PredictResponse.
```scala
// Step 3: Deploy model
logger.info("\n" + "=" * 80)
logger.info("Step 2: Deploying model (endpoint creation + model upload + deployment)...")
logger.info("=" * 80)
```
Minor typo: log says "Step 2" but this is Step 3.
```diff
- logger.info("Step 2: Deploying model (endpoint creation + model upload + deployment)...")
+ logger.info("Step 3: Deploying model (endpoint creation + model upload + deployment)...")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```scala
// Step 3: Deploy model
logger.info("\n" + "=" * 80)
logger.info("Step 3: Deploying model (endpoint creation + model upload + deployment)...")
logger.info("=" * 80)
```
🤖 Prompt for AI Agents
In
cloud_gcp/src/test/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestrationTest.scala
around lines 206-209, the logger message incorrectly reads "Step 2: Deploying
model..." while the surrounding comment and flow indicate this is Step 3; update
the logged string to "Step 3: Deploying model (endpoint creation + model upload
+ deployment)..." (and ensure any other adjacent step messages remain correctly
numbered).
Actionable comments posted: 0
🧹 Nitpick comments (3)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (3)
44-44: ExecutionContext lifecycle not managed. The execution context is created but never shut down. For long-running usage, consider tracking lifecycle or using a managed context.
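One way to sketch a managed context (names are illustrative, not the actual VertexOrchestration API): back the `ExecutionContext` with an `ExecutorService` that `close()` shuts down, so threads don't outlive the orchestration layer.

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

class ManagedContext {
  private val executor = Executors.newFixedThreadPool(4)

  // Futures launched against this context run on the pool above.
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(executor)

  // Shut the pool down alongside the SDK clients in close().
  def close(): Unit = executor.shutdown()
}
```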
99-118: Endpoint lookup lists all endpoints. At scale, listing all endpoints to find by display name could be slow. Consider caching or indexing if endpoint count grows.
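A possible caching sketch (all names hypothetical; `listAndFind` stands in for the existing list-and-match lookup): memoize display-name to resource-name resolutions so repeated deploys don't re-list every endpoint.

```scala
import scala.collection.concurrent.TrieMap

// TrieMap gives a thread-safe cache without explicit locking; a miss falls
// through to the (slow) listing lookup and caches any hit.
class EndpointCache(listAndFind: String => Option[String]) {
  private val cache = TrieMap.empty[String, String]

  def resolve(displayName: String): Option[String] =
    cache.get(displayName).orElse {
      val found = listAndFind(displayName)
      found.foreach(name => cache.putIfAbsent(displayName, name))
      found
    }
}
```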
163-165: Consider auto-creating missing endpoints. Currently fails if endpoint not found. For better UX, could automatically call `createEndpoint` instead.

```diff
- case None =>
-   Future.failed(
-     new RuntimeException(
-       s"Endpoint ${endpointConfig.getEndpointName} not found. Please create it first using createEndpoint."))
+ case None =>
+   logger.info(s"Endpoint ${endpointConfig.getEndpointName} not found, creating...")
+   createNewEndpoint(endpointConfig.getEndpointName)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2024-10-29T15:21:58.102Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 53
File: hub/app/controllers/TimeSeriesController.scala:224-224
Timestamp: 2024-10-29T15:21:58.102Z
Learning: In the mocked data implementation in `hub/app/controllers/TimeSeriesController.scala`, potential `NumberFormatException` exceptions due to parsing errors (e.g., when using `val featureId = name.split("_").last.toInt`) are acceptable and will be addressed when adding the concrete backend.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2025-01-24T23:55:40.650Z
Learnt from: tchow-zlai
Repo: zipline-ai/chronon PR: 263
File: cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/BigQueryFormat.scala:56-57
Timestamp: 2025-01-24T23:55:40.650Z
Learning: For BigQuery table creation operations in BigQueryFormat.scala, allow exceptions to propagate directly without wrapping them in try-catch blocks, as the original BigQuery exceptions provide sufficient context.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2024-10-14T18:44:24.599Z
Learnt from: piyush-zlai
Repo: zipline-ai/chronon PR: 43
File: hub/app/controllers/TimeSeriesController.scala:320-320
Timestamp: 2024-10-14T18:44:24.599Z
Learning: In `hub/app/controllers/TimeSeriesController.scala`, the `generateMockTimeSeriesPercentilePoints` method contains placeholder code that will be replaced with the actual implementation soon.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2024-12-03T04:04:33.809Z
Learnt from: nikhil-zlai
Repo: zipline-ai/chronon PR: 70
File: service/src/main/java/ai/chronon/service/ApiProvider.java:6-6
Timestamp: 2024-12-03T04:04:33.809Z
Learning: The import `scala.util.ScalaVersionSpecificCollectionsConverter` in `service/src/main/java/ai/chronon/service/ApiProvider.java` is correct and should not be flagged in future reviews.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
📚 Learning: 2024-11-04T20:04:18.082Z
Learnt from: chewy-zlai
Repo: zipline-ai/chronon PR: 50
File: spark/src/main/scala/ai/chronon/spark/stats/drift/SummaryUploader.scala:37-40
Timestamp: 2024-11-04T20:04:18.082Z
Learning: Avoid using `Await.result` in production code; prefer handling `Future`s asynchronously when possible to prevent blocking.
Applied to files:
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala
🔇 Additional comments (6)
cloud_gcp/src/main/scala/ai/chronon/integrations/cloud_gcp/VertexOrchestration.scala (6)
48-84: LGTM. Training job submission logic is well-structured with proper validation and error handling.
182-205: LGTM. Model upload with proper async handling and logging.
210-241: LGTM. Deployment logic correctly implements the MVP 0→1 traffic rollout as intended. Future blue/green strategies acknowledged in PR objectives.
245-297: LGTM. Status checking methods handle states and errors comprehensively with appropriate fallback to UNKNOWN status.
299-447: LGTM. Helper methods cleanly construct Vertex AI resources with appropriate optional field handling and state mapping.
170-170: Verify model artifact path convention. The code assumes training outputs to `$outputDir/model`, but this depends on the training script. Ensure training jobs write artifacts to the expected `model` subdirectory.

```shell
#!/bin/bash
# Search for documentation or examples of modelArtifactUri construction patterns
rg -n -C3 'modelArtifactUri|model_artifact|training_output.*model' --type scala
```
Summary
Added a first-cut implementation of the rails to submit training jobs and deployments with the Vertex platform. The implementation is fairly basic at the moment (e.g., only 0→1 traffic rollouts; no support yet for additional deploy types like blue/green). We can flesh these out in follow-ups - the current implementation allows us to start building basic end-to-end scaffolding and an MVP.
Pulled in the Vertex SDK to trigger these. We needed the HTTP client for the Vertex predict calls (to support custom & published models - this isn't as easy to configure in the SDK), but using the SDK for training and deploy is a lot more ergonomic.
Testing
Tested using the local integration test that submits a training run, creates an endpoint, uploads a model, and deploys it to the endpoint. Validated that the model is up and can be hit for serving.
Checklist