Skip to content

Conversation

@yousukseung
Copy link
Contributor

No description provided.

@yousukseung yousukseung requested a review from markdroth February 8, 2023 00:03
@markdroth markdroth changed the title Update A58 with ServerMetricRecorder and QPS. A51 update: add ServerMetricRecorder API and QPS field Feb 8, 2023
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! Comments are all minor.

I'll let @ejona86 be the approver for this PR, since he was the approver for this gRFC.

Please let me know if you have any questions. Thanks!

@markdroth markdroth marked this pull request as ready for review February 13, 2023 19:03
@yousukseung yousukseung requested a review from ejona86 February 17, 2023 01:47
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! @ejona86 should also approve before merging.

@yousukseung
Copy link
Contributor Author

Updated after today's sync:

  • Use "per-request" and "per-server" everywhere. This doc already was using per-RPC|Query|Request interchangeably and I updated all to "per-request".
  • Java ServerMetricRecorder -> MetricRecorder
  • Moved "per-request recording" to before "per-server recording" to keep the current order.
  • Made "per-XXX reporting on client" into "per-XX reporting" to cover both the client and server, and restored back what's already there in Java sections on server-side reporting.

function recordUtilizationMetric(String name, double value);
// Records an opaque named metric measurement for the call.
function recordNamedMetricsMetric(String name, double value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The named_metrics and eps changes do not belong in this PR. We'll make those changes in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. I think I accidentally rebased on top of the follow-up change (eps/named metrics).

Comment on lines +503 to +504
// Records number of queries per second to the server in the range [0, infy).
// Values outside of the valid range are rejected.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the WIP grpc java implementation, they are ignored, not rejected. Should we clarify here if by "rejected" we mean "ignored" - because rejected may imply the method could throw.

On the other hand, I'm a bit afraid if we just ignore the value. I agree that throwing is a bad idea, but just swallowing an error leads to reporting incorrect data. I don't think we can judge if skewing a particular metric is critical to a customer.

One example that come to mind: a customer is storing qps internally in an int, the service gets a huge number of requests, and the int gets overflowed. In this case, they'll call setQps() with a negative value, when in fact there was a huge spike. If another service goes down because of a qps spike, but our service doesn't report the qps increase, an engineer responding to the incident won't be able to identify the source of the request spike correctly.

I wonder if it'd make sense require logging a warning when an incorrect value is presented to the MetricRecorder/CallMetricRecorder. However, depending on how often the data is reported, it might lead to the logs getting flooded. Obviously, we can restrict how often to produce these warnings, but this may be going to far out-of-scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is for C++ and it's from comments in code so I don't think "rejected" may confuse. Java snippet does not have this word.

Regarding the range validation, I believe the caller can validate or log in such cases. Note that recorder APIs already ignore with no feedback today (before this change) and this PR just clarifies that. Today the value will be rejected during proto's range validation and/or when consumed by the client where neither is explicit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants