Tempesta & backend servers health statistics #1454

Linked with #726 

# Scope

## Backends health status

Currently Tempesta reports the number of non-2xx responses from the backend servers only if [health checking](https://github.com/tempesta-tech/tempesta/wiki/Health-monitor) is switched on.

For troubleshooting it's extremely useful to know how many non-2xx responses were generated by each backend server as well as by Tempesta FW itself (e.g. @i-rinat recently observed many error responses from Tempesta in automated tests, probably due to #940).

The configuration is TBD, but I propose introducing a new configuration option
```
health_stat 400 5*;
```
to monitor `400` and all `5xx` error responses produced by Tempesta FW. The statistics for each response code matching the list must appear in `/proc/tempesta/perfstat`.
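Since the option syntax is still TBD, the following is only a sketch of how a list such as `400 5*` could be interpreted, assuming that a token is either a full three-digit code or a `<digit>*` class wildcard. The function names (`parse_code_patterns`, `code_matches`, `count_matching`) are hypothetical and exist only for illustration; they are not part of Tempesta FW.

```python
from collections import Counter


def parse_code_patterns(spec: str) -> list[str]:
    """Split a directive argument such as "400 5*" into validated patterns."""
    patterns = []
    for token in spec.split():
        # Assumption: only "NNN" (full code) and "N*" (class wildcard) are valid.
        is_full_code = len(token) == 3 and token.isascii() and token.isdigit()
        is_class = len(token) == 2 and token[0] in "12345" and token[1] == "*"
        if not (is_full_code or is_class):
            raise ValueError(f"bad status code pattern: {token!r}")
        patterns.append(token)
    return patterns


def code_matches(status: int, patterns: list[str]) -> bool:
    """Return True if the status code matches any full code or wildcard."""
    text = str(status)
    for pattern in patterns:
        if pattern.endswith("*"):
            # "5*" matches every three-digit code starting with "5".
            if len(text) == 3 and text[0] == pattern[0]:
                return True
        elif text == pattern:
            return True
    return False


def count_matching(statuses, spec: str) -> Counter:
    """Count only the responses whose status matches the configured list."""
    patterns = parse_code_patterns(spec)
    return Counter(s for s in statuses if code_matches(s, patterns))
```

With this interpretation, `count_matching([200, 400, 404, 500, 502, 502], "400 5*")` keeps one counter each for `400`, `500` and `502`, and ignores `200` and `404`.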

There should also be a global and/or per-backend configuration option
```
 health_stat_server 400 5*;
```
which shows similar statistics in the per-server procfs file. If health monitoring is enabled and uses the same error codes, the statistics must not be duplicated.

I'm placing the task in 0.9, just like #940, because the monitoring will be very useful for debugging and testing that problem.
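One possible reading of the no-duplication requirement is that each response increments a single per-server counter, regardless of whether the code was requested by health monitoring, by `health_stat_server`, or by both. The sketch below illustrates that idea under this assumption, using explicit code sets only (no wildcards). `ServerCodeStats` is a hypothetical name, not an existing Tempesta FW structure.

```python
from collections import Counter


class ServerCodeStats:
    """Per-server counters shared by health monitoring and health_stat_server."""

    def __init__(self, health_monitor_codes, health_stat_codes):
        # Assumption: both features track the union of their code sets, so a
        # code listed by both still has exactly one counter.
        self.tracked = set(health_monitor_codes) | set(health_stat_codes)
        self.counts = Counter()

    def account(self, status: int) -> None:
        """Account one response; each response is counted at most once."""
        if status in self.tracked:
            self.counts[status] += 1


# Usage: 502 is listed by both features but is counted once per response.
example = ServerCodeStats({502, 504}, {400, 502})
for code in (502, 502, 400, 200):
    example.account(code)
```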

## Backend performance statistics

~Per vhost cache misses/hits.~ We agreed at the meeting that we cannot account cache hits per server (upstream) and should only count 200 responses per server. We'll probably implement per-vhost performance statistics, but they will cover more than cache counters, so we need a separate feature request for that.

## ~Tempesta TLS connection errors~

~https://www.ssllabs.com/ssltest/analyze.html?d=tempesta-tech.com reports some TLS issues, e.g. for iOS 6. However, with checking using a real device from the faulty list, I didn't reveal any issues.~

~We need to account TLS connection errors for the better observability.~

(we have https://github.com/tempesta-tech/tempesta/issues/1914 for TLS traceability)

## Tempesta FW performance statistics

At the moment we gather response time percentiles for each backend server, but not for Tempesta FW itself. We need to provide the same response time statistics for Tempesta FW as we provide for backend servers.

We also need to provide the average, 90th percentile, and maximum duration of client TCP connections.
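To make the three requested values concrete, here is a small sketch that computes them from a list of connection durations. It assumes that "90%" means the 90th percentile and uses the nearest-rank definition; the function name `connection_duration_summary` and the millisecond unit are illustrative assumptions, and a real implementation would likely use a streaming estimator rather than sorting all samples.

```python
def connection_duration_summary(durations_ms):
    """Return average, 90th percentile (nearest rank) and maximum duration."""
    if not durations_ms:
        raise ValueError("no connection durations recorded")
    ordered = sorted(durations_ms)
    n = len(ordered)
    # Nearest-rank percentile: 1-based rank = ceil(0.9 * n), in integer math.
    rank = (9 * n + 9) // 10
    return {
        "avg": sum(ordered) / n,
        "p90": ordered[rank - 1],
        "max": ordered[-1],
    }
```

For example, for the durations 10, 20, ..., 100 ms the 1-based rank is 9, so the 90th percentile is the ninth smallest sample.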

# Other issues

## Negative values in statistics

On the current master as of the date of this issue, I observe negative values in the statistics when only a couple of requests have been processed by Tempesta (observed for the first line only):
```
# cat /proc/tempesta/servers/default/127.0.0.1\:9090 
Minimal response time		: -1ms
Average response time		: 0ms
Median  response time		: 0ms
Maximum response time		: 0ms
```
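A regression test for this could scan the per-server statistics for negative time values. The helper below is a hypothetical sketch that assumes the `name : <value>ms` line format shown above; it works on text already read from the procfs file and is not part of the existing test suite.

```python
import re

# Matches lines such as "Minimal response time\t\t: -1ms".
_TIME_LINE = re.compile(r"^(?P<name>[^:]+?)\s*:\s*(?P<value>-?\d+)ms\s*$")


def find_negative_times(procfs_text):
    """Return (name, value) pairs for every time value below zero."""
    negatives = []
    for line in procfs_text.splitlines():
        match = _TIME_LINE.match(line)
        if match and int(match.group("value")) < 0:
            # Collapse repeated whitespace inside the statistic name.
            name = " ".join(match.group("name").split())
            negatives.append((name, int(match.group("value"))))
    return negatives


# Sample taken from the output quoted in this issue.
SAMPLE = (
    "Minimal response time\t\t: -1ms\n"
    "Average response time\t\t: 0ms\n"
    "Median  response time\t\t: 0ms\n"
    "Maximum response time\t\t: 0ms\n"
)
```

A test would then assert that `find_negative_times()` returns an empty list for a freshly started server.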

## Hung socket buffers

The same number of socket buffers may appear in the statistics for a relatively long time, which looks fishy (statistics from our website):
```
# for i in `seq 1 10`; do grep 'Socket buffers in flight' perfstat ; sleep 1; done
Socket buffers in flight		: 49
Socket buffers in flight		: 49
Socket buffers in flight		: 49
Socket buffers in flight		: 49
Socket buffers in flight		: 49
Socket buffers in flight		: 49
Socket buffers in flight		: 49
Socket buffers in flight		: 49
Socket buffers in flight		: 49
Socket buffers in flight		: 49
```

# Testing

- [ ] Check the error statistics for Tempesta using wildcard and full error code matchers
- [ ] Check the error statistics for per-backend servers using wildcard and full error code matchers
- [ ] Check the error statistics for default backend servers using wildcard and full error code matchers
- [ ] Check overlapping statistics with health monitoring

