Skip to content

$upstream_response_time might contain multiple, comma-separated values #217

@saz

Description

@saz

Describe the bug
If there are multiple upstream backends and more than one will be tried by nginx, $upstream_response_time will contain a comma-separated list of times.
If this is happening, the log line will be parsed incorrectly

nginx documentation for $upstream_response_time:

keeps time spent on receiving the response from the upstream server; the time is kept in seconds with millisecond resolution. Times of several responses are separated by commas and colons like addresses in the $upstream_addr variable.

To Reproduce
Add multiple upstream backends (e.g. PHP-FPM) to nginx, break the first one and request a site using this backend.

Expected behavior
Log line is parsed properly and proper metrics are available

Configuration file (remove section, if not applicable):

namespaces:
  - name: nginxlog
    format: "$server_name $remote_addr - $remote_user [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\" $request_time $upstream_response_time $pipe $ssl_protocol/$ssl_cipher $request_id $host"
    source:
      syslog:
        listen_address: "udp://127.0.0.1:8514"
        format: "rfc3164"
        tags: ["nginx"]
    relabel_configs:
      - target_label: "vhost"
        from: "server_name"
      - target_label: "ssl_protocol"
        from: "ssl_protocol"
      - target_label: "user_agent"
        from: "http_user_agent"
        whitelist: ["KeepAliveClient"]

Example log file (remove section, if not applicable):

www.example.com 10.0.0.1 - - [27/Oct/2021:06:00:14 +0200] "GET /websocket HTTP/1.1" 502 552 "-" "SomeUserAgentString" 0.000 0.000, 0.000, 0.000, 0.000 . TLSv1.3/TLS_AES_256_GCM_SHA384 1234567890 www.example.com
www.example.com 10.0.0.2 - - [27/Oct/2021:06:00:15 +0200] "GET /websocket HTTP/1.1" 502 552 "-" "AnotherUserAgent" 0.000 0.000, 0.000, 0.000, 0.000 . TLSv1.3/TLS_AES_256_GCM_SHA384 1234567890 www.example.com
www.example.com 10.0.0.3 - - [27/Oct/2021:06:00:15 +0200] "GET /websocket HTTP/1.1" 502 150 "-" "SomeUserAgentString" 0.001 0.000, 0.000, 0.000, 0.000 . TLSv1.3/TLS_AES_256_GCM_SHA384 1234567890 www.example.com

Metrics output (remove section, if not applicable):
Not exactly matching the log lines above, but the error is clearly visible, looking at ssl_protocol, which should have a value like TLSv1.3 only

nginxlog_http_response_time_seconds_hist_bucket{method="GET",ssl_protocol="0.000, 0.000 . TLSv1.3",status="502",user_agent="other",vhost="www.example.com",le="0.1"} 1116
nginxlog_http_response_time_seconds_hist_bucket{method="GET",ssl_protocol="0.000, 0.000 . TLSv1.3",status="502",user_agent="other",vhost="www.example.com",le="0.25"} 1116
nginxlog_http_response_time_seconds_hist_bucket{method="GET",ssl_protocol="0.000, 0.000 . TLSv1.3",status="502",user_agent="other",vhost="www.example.com",le="0.5"} 1116
nginxlog_http_response_time_seconds_hist_bucket{method="GET",ssl_protocol="0.000, 0.000 . TLSv1.3",status="502",user_agent="other",vhost="www.example.com",le="1"} 1116

Environment:

  • Exporter version: 1.9.0
  • OS (e.g. from /etc/os-release): Ubuntu 20.04.3 LTS
  • Deployment method (e.g. Docker image, deb/rpm package, self-compiled, ...): binary
  • Others:

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions