-
-
Notifications
You must be signed in to change notification settings - Fork 184
Description
Describe the bug
If there are multiple upstream backends and more than one will be tried by nginx, $upstream_response_time
will contain a comma-separated list of times.
If this is happening, the log line will be parsed incorrectly
nginx documentation for $upstream_response_time
:
keeps time spent on receiving the response from the upstream server; the time is kept in seconds with millisecond resolution. Times of several responses are separated by commas and colons like addresses in the $upstream_addr variable.
To Reproduce
Add multiple upstream backends (e.g. PHP-FPM) to nginx, break the first one and request a site using this backend.
Expected behavior
Log line is parsed properly and proper metrics are available
Configuration file (remove section, if not applicable):
namespaces:
- name: nginxlog
format: "$server_name $remote_addr - $remote_user [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\" $request_time $upstream_response_time $pipe $ssl_protocol/$ssl_cipher $request_id $host"
source:
syslog:
listen_address: "udp://127.0.0.1:8514"
format: "rfc3164"
tags: ["nginx"]
relabel_configs:
- target_label: "vhost"
from: "server_name"
- target_label: "ssl_protocol"
from: "ssl_protocol"
- target_label: "user_agent"
from: "http_user_agent"
whitelist: ["KeepAliveClient"]
Example log file (remove section, if not applicable):
www.example.com 10.0.0.1 - - [27/Oct/2021:06:00:14 +0200] "GET /websocket HTTP/1.1" 502 552 "-" "SomeUserAgentString" 0.000 0.000, 0.000, 0.000, 0.000 . TLSv1.3/TLS_AES_256_GCM_SHA384 1234567890 www.example.com
www.example.com 10.0.0.2 - - [27/Oct/2021:06:00:15 +0200] "GET /websocket HTTP/1.1" 502 552 "-" "AnotherUserAgent" 0.000 0.000, 0.000, 0.000, 0.000 . TLSv1.3/TLS_AES_256_GCM_SHA384 1234567890 www.example.com
www.example.com 10.0.0.3 - - [27/Oct/2021:06:00:15 +0200] "GET /websocket HTTP/1.1" 502 150 "-" "SomeUserAgentString" 0.001 0.000, 0.000, 0.000, 0.000 . TLSv1.3/TLS_AES_256_GCM_SHA384 1234567890 www.example.com
Metrics output (remove section, if not applicable):
Not exactly matching the log lines above, but the error is clearly visible, looking at ssl_protocol
, which should have a value like TLSv1.3
only
nginxlog_http_response_time_seconds_hist_bucket{method="GET",ssl_protocol="0.000, 0.000 . TLSv1.3",status="502",user_agent="other",vhost="www.example.com",le="0.1"} 1116
nginxlog_http_response_time_seconds_hist_bucket{method="GET",ssl_protocol="0.000, 0.000 . TLSv1.3",status="502",user_agent="other",vhost="www.example.com",le="0.25"} 1116
nginxlog_http_response_time_seconds_hist_bucket{method="GET",ssl_protocol="0.000, 0.000 . TLSv1.3",status="502",user_agent="other",vhost="www.example.com",le="0.5"} 1116
nginxlog_http_response_time_seconds_hist_bucket{method="GET",ssl_protocol="0.000, 0.000 . TLSv1.3",status="502",user_agent="other",vhost="www.example.com",le="1"} 1116
Environment:
- Exporter version: 1.9.0
- OS (e.g. from /etc/os-release): Ubuntu 20.04.3 LTS
- Deployment method (e.g. Docker image, deb/rpm package, self-compiled, ...): binary
- Others: