Export TF/Tensorboard/TF Summaries to prometheus #722

@jlewi

In TF, tf.summary is the primary way people export metrics tracking the performance of their models during training. This data can be visualized using TensorBoard (see here).

A lot of the key signals, e.g. accuracy metrics, are just time series.

Should we export summaries in prometheus format so that they can be visualized and collected using tools in the prometheus tool chain?

Potential Use Cases

  • For hyperparameter tuning (Katib) we'd like a generic way for the HP tuning infrastructure to get model metrics; prometheus could provide a standard interface that abstracts the details of metrics in a particular framework (PyTorch vs. TF).

  • We could use this data to support features like "Run until converged"
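
To make the "Run until converged" idea concrete, here is a minimal sketch of a convergence check over a metric time series. It is one possible definition only: the function name `has_converged`, the `window` size, and the `tolerance` are hypothetical, and the values are assumed to have already been fetched from wherever the metrics are stored.

```python
from typing import Sequence


def has_converged(values: Sequence[float], window: int = 5, tolerance: float = 1e-3) -> bool:
    """Return True when the metric varied by at most `tolerance` over the last `window` samples.

    Hypothetical criterion for illustration; a real controller might instead
    compare moving averages or require a minimum number of training steps.
    """
    if len(values) < window:
        # Not enough history yet to make a decision.
        return False
    recent = values[-window:]
    return max(recent) - min(recent) <= tolerance
```

A controller could poll the metric periodically and stop the training job the first time this returns True.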

Implementation

I think implementation would be pretty straightforward; we would just need a Python server to read TF event files and export metrics to prometheus.
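
A rough sketch of what such a server might look like is below. It is not a worked-out design: it assumes scalars can be read with TensorBoard's `EventAccumulator` (depending on how the summaries were written, they may show up under a different tag category), it exports only the most recent value per tag as a gauge, and the `tf_summary_` prefix, the `_step` companion metric, and the names `read_latest_scalars`, `metric_name`, `render_metrics`, and `serve` are all made up for illustration. The output uses the Prometheus text exposition format and is served with the standard library rather than a Prometheus client library.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple


def read_latest_scalars(logdir: str) -> Dict[str, Tuple[int, float]]:
    """Return {tag: (step, value)} for the newest scalar of each tag in logdir."""
    # Imported lazily so the rest of the sketch works without TensorBoard installed.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    accumulator = EventAccumulator(logdir)
    accumulator.Reload()  # parse the event files currently on disk
    latest = {}
    for tag in accumulator.Tags()["scalars"]:
        events = accumulator.Scalars(tag)
        if events:
            latest[tag] = (events[-1].step, events[-1].value)
    return latest


def metric_name(tag: str) -> str:
    """Map a summary tag such as 'train/accuracy' to a valid Prometheus metric name."""
    # Assumption: distinct tags do not collide after sanitizing.
    return "tf_summary_" + re.sub(r"[^a-zA-Z0-9_:]", "_", tag)


def render_metrics(scalars: Dict[str, Tuple[int, float]]) -> str:
    """Render the scalars in the Prometheus text exposition format."""
    lines = []
    for tag, (step, value) in sorted(scalars.items()):
        name = metric_name(tag)
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)!r}")
        lines.append(f"# TYPE {name}_step gauge")
        lines.append(f"{name}_step {int(step)}")
    return "".join(line + "\n" for line in lines)


def make_handler(logdir: str):
    """Build a request handler that re-reads logdir on every scrape of /metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(read_latest_scalars(logdir)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(logdir: str, port: int = 8000) -> None:
    """Serve /metrics for the given log directory until interrupted."""
    HTTPServer(("", port), make_handler(logdir)).serve_forever()
```

Calling `serve("/path/to/logdir")` (the path is a placeholder) would expose the latest scalars on port 8000 at `/metrics`. Re-reading the event files on every scrape keeps the sketch short; a real exporter would probably cache and reload incrementally.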
