This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Grafana MAU metrics not working when using a background worker #14622

@janonym1

Description

When using the Grafana template, the "MAU Limits" metric does not work, because in a Synapse setup with workers the MAU is reported by the background worker instead of the Synapse master process. This became really relevant with the deprecation of the legacy metrics, and there have already been a lot of fixes for the Grafana template: a651479

Steps to reproduce

-) Install Synapse + workers with Slavi's playbook
-) Install an external Prometheus server and use the templated Prometheus config
-) Import the Grafana template and view the "MAU Limits" metric

A pretty typical Prometheus scrape config (as generated by the playbook) looks something like this:

  - job_name: 'synapse'
    metrics_path: /metrics/synapse/main-process
    scheme: https
    basic_auth:
      username: prometheus
      password_file: /path/to/passwordfile
    static_configs:
      - targets: ['matrix.domain.com:443']
        labels:
          instance: "prod"
          job: "master"
          index: "0"
[...]
  - job_name: 'matrix-synapse-worker-background-0'
    metrics_path: /metrics/synapse/worker/background-0
    scheme: https
    basic_auth:
      username: prometheus
      password_file: /path/to/passwordfile
    static_configs:
      - targets: ['matrix.domain.com:443']
        labels:
          worker_id: background-0
          job: "background"
          app: generic_worker
          instance: "prod"

Most other metrics work fine.
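As a quick sanity check (just a sketch, assuming the instance="prod" label from the scrape config above), querying the metric grouped by job in the Prometheus expression browser makes it easy to see which jobs actually export it:

max by (job) (synapse_admin_mau_current{instance="prod"})

In my setup the real value comes back under job="background" rather than anything matching job=~"(hhs_)?synapse", which is why the panel shows no data.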

Homeserver

Synapse homeserver with a lot of workers

Synapse Version

Synapse 1.71

Installation Method

Docker (matrixdotorg/synapse)

Database

dedicated PostgreSQL 14 DB

Workers

Multiple workers

Platform

Ubuntu 20.04
12 cores, 20 GB RAM, 1 Gbit/s network, dedicated PostgreSQL DB cluster
Installed with Slavi's Ansible playbook, running on a VM

Configuration

presence disabled, cache_factor 5.0

Relevant log output

"MAU Limits

No data

"

Anything else that would be useful to know?

The affected lines are L11299 and L11313.

synapse_admin_mau_current{instance="$instance", job=~"(hhs_)?synapse"} does not work in my case, but a workaround would be to drop the job matcher: synapse_admin_mau_current{instance="$instance"}. However, then I get one time series per worker, which I think is unintended.

Specifying the job label as background works better: synapse_admin_mau_current{instance="$instance", job="background"}, but I assume that breaks for most monolithic Synapse setups.

@reivilibre suggested using something like max over (job) { ... }, but I am not sure what the best way around this is.
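My reading of that suggestion (just a sketch on my part, possibly not what was meant) would be an aggregation that drops the job label, e.g.:

max without (job) (synapse_admin_mau_current{instance="$instance"})

Although, looking at the scrape config above, the duplicate series also differ in labels like worker_id, app and index, so aggregating only over job would presumably still leave one series per worker; that is what pushed me towards the plain max() below.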

One possible workaround that works nicely for me is simply taking the max over the whole expression, without a job matcher at all: max(synapse_admin_mau_current{instance="$instance"}) and max(synapse_admin_mau_max{instance="$instance"}).

However, I don't know whether that also works for monolithic Synapse instances or whether there are other drawbacks, since I am not well versed in writing Grafana templates.

Labels

O-Uncommon: Most users are unlikely to come across this or unexpected workflow
S-Minor: Blocks non-critical functionality, workarounds exist.
T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
