[Proposal] Support JSON format for file-metrics-collector #1744

@tenzen-y

/kind feature

Describe the solution you'd like

Motivation

Currently, it is difficult to parse JSON-format files with file-metrics-collector using a regexp filter, since file-metrics-collector is designed for TEXT-format files.
I believe that supporting JSON-format files in file-metrics-collector would make Katib more powerful, because users could consume JSON metrics files without writing a regexp.
Therefore, I would like file-metrics-collector to support JSON format with one JSON object per line, as in the following example.

{"foo": "bar", "fiz": "buz"…}
{"foo": "bar", "fiz": "buz"…}
{"foo": "bar", "fiz": "buz"…}
{"foo": "bar", "fiz": "buz"…}
…

This JSON format is also used by cloudml-hypertune, which is recommended for use with GCP AI Platform and Vertex AI.

If you use a custom container for training or if you want to perform hyperparameter tuning with a framework other than TensorFlow, then you must use the cloudml-hypertune Python package to report your hyperparameter metric to AI Platform Training.

https://cloud.google.com/ai-platform/training/docs/using-hyperparameter-tuning#other_machine_learning_frameworks_or_custom_containers

Design

I'm thinking of the following changes to the Kubernetes API and the webhook. When FileSystemFileFormat is set to Json, file-metrics-collector collects the values whose keys match spec.objective.objectiveMetricName and spec.objective.additionalMetricNames from the metrics file.

+ type FileSystemFileFormat string
+
+ const (
+   TextFormat    FileSystemFileFormat = "Text"
+   JsonFormat    FileSystemFileFormat = "Json"
+ )

type FileSystemPath struct {
  Path string                     `json:"path,omitempty"`
  Kind FileSystemKind             `json:"kind,omitempty"`
+ FileFormat FileSystemFileFormat `json:"fileFormat,omitempty"`
}
func (g *DefaultValidator) validateMetricsCollector(inst *experimentsv1beta1.Experiment) error {
  mcSpec := inst.Spec.MetricsCollectorSpec
  mcKind := mcSpec.Collector.Kind
  ...
  switch mcKind {
  ...
  case commonapiv1beta1.FileCollector:
    ...
+     fileFormat := mcSpec.Source.FileSystemPath.FileFormat
+     if fileFormat == "" {
+       fileFormat = commonapiv1beta1.TextFormat
+     } else if fileFormat != commonapiv1beta1.TextFormat && fileFormat != commonapiv1beta1.JsonFormat {
+       return fmt.Errorf("The format of the metrics file is required by .spec.metricsCollectorSpec.source.fileSystemPath.fileFormat.")
+     }
+     }
  ...

Does it sound good to you? @kubeflow/wg-automl-leads

