
Conversation

@kostasrim (Contributor) commented Jan 12, 2024

The issue is that within Service::InvokeCmd, we copy ServerState::tlocal() to a local variable before we invoke the command. After the invocation, the connection may have migrated to a different thread. If that happened, accessing the local variable creates a data race, because the thread-local pointer has leaked to another thread and the underlying ServerState can now be accessed concurrently.
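
To illustrate, here is a simplified sketch of the pattern described above (not the actual Dragonfly code; the surrounding signatures and body are assumptions):

```cpp
// Simplified sketch of the race described above; names follow the description,
// but the signature and body are illustrative assumptions.
void Service::InvokeCmd(const CommandId* cid, CmdArgList args, ConnectionContext* cntx) {
  ServerState& etl = *ServerState::tlocal();  // pointer to *this* thread's state, cached locally

  cid->Invoke(args, cntx);  // the command may migrate the connection to another thread

  // If a migration happened, this fiber now runs on a different thread, but `etl`
  // still refers to the previous thread's ServerState, which that thread keeps
  // using concurrently -> data race.
  if (!etl.Monitors().Empty()) {
    // ...
  }
}
```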

@kostasrim kostasrim self-assigned this Jan 12, 2024
@kostasrim kostasrim requested review from romange and dranikpg January 12, 2024 13:03
@kostasrim (Contributor, Author) commented Jan 12, 2024

@romange @dranikpg IMO, I don't like the idea of introducing a server_state variable within connection_context, for the two reasons I described above. It's better to just enforce explicit use of ServerState::tlocal() on execution paths that invoke commands. That way, even if the command migrates the connection, the code after it will still use the correct thread-local state.

P.S. The very reason I am not happy with server_state in connection_context is that it allows the exact same problem as before (store it in a local variable, migrate the connection, and race).
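
For contrast, the pattern this PR converges on looks roughly like the sketch below (again simplified, with assumed signatures; SafeTLocal() is the accessor shown in the diffs further down):

```cpp
// Simplified sketch of the enforced pattern; signatures are illustrative assumptions.
void Service::InvokeCmd(const CommandId* cid, CmdArgList args, ConnectionContext* cntx) {
  cid->Invoke(args, cntx);  // may migrate the connection to another thread

  // Re-fetch the thread-local state *after* the invocation instead of reusing a
  // reference captured before it, so we always touch the state of the thread we
  // are currently running on.
  if (!ServerState::SafeTLocal()->Monitors().Empty()) {
    // ...
  }
}
```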

@kostasrim kostasrim marked this pull request as ready for review January 12, 2024 13:06
@kostasrim kostasrim requested a review from romange January 16, 2024 08:57
  // We are not sending any admin command in the monitor, and we do not want to
  // do any processing if we don't have any waiting connections with monitor
  // enabled on them - see https://redis.io/commands/monitor/
- const MonitorsRepo& monitors = etl.Monitors();
- if (!monitors.Empty() && (cid->opt_mask() & CO::ADMIN) == 0) {
+ if (!ServerState::SafeTLocal()->Monitors().Empty() && (cid->opt_mask() & CO::ADMIN) == 0) {
Collaborator:

please add a comment explaining the context behind the SafeTLocal call

static ServerState* tlocal() {
  return state_;
}

// Safe version. https://stackoverflow.com/a/75622732
Collaborator:

please explain here what Safe means. I do not think this Stack Overflow link applies here (asm volatile is not needed), because we always use it after a call to a function that cannot be inlined (Invoke()), hence the compiler won't be able to reorder a call to SafeTLocal() to before Invoke().

Contributor:

I think we should keep the link only in the implementation; the answer is too long and sophisticated for anyone reading the code (or is it just me 😆). Instead, we can describe in one sentence what Safe means.
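
To summarize the thread, a minimal sketch of the idea under discussion, assuming SafeTLocal() is declared in the header but defined out of line in the .cc file so the compiler cannot inline it and cache the thread-local load across a call such as Invoke() (the exact layout here is an assumption, not the verbatim implementation):

```cpp
// server_state.h (sketch)
class ServerState {
 public:
  // May be inlined at the call site; the compiler is then free to cache the
  // loaded thread-local pointer in a register across subsequent uses.
  static ServerState* tlocal() { return state_; }

  // Defined in server_state.cc: callers see only an opaque, non-inlinable call,
  // so the thread-local pointer is re-loaded on every use.
  static ServerState* SafeTLocal();

 private:
  static thread_local ServerState* state_;
};

// server_state.cc (sketch)
thread_local ServerState* ServerState::state_ = nullptr;

ServerState* ServerState::SafeTLocal() {
  return state_;
}
```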

@kostasrim kostasrim requested a review from romange January 16, 2024 17:18
-     invoke_time_usec >= etl.log_slower_than_usec) {
  if (!(cid->opt_mask() & CO::BLOCKING) && conn != nullptr &&
+     // Use SafeTLocal() to avoid accessing the wrong thread local instance
+     ServerState::SafeTLocal()->GetSlowLog().IsEnabled() &&
@romange (Collaborator) commented Jan 16, 2024:

One last nit: I think it would be more elegant and more performant if, instead of checking both IsEnabled() and invoke_time_usec >= ServerState::SafeTLocal()->log_slower_than_usec, we introduce bool ServerState::ShouldLogSlowCmd(unsigned latency_usec) const that does both at the same time.
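
A rough sketch of what such a helper could look like (member names follow the diff above; the const-qualification and exact fields are assumptions):

```cpp
// Possible shape of the suggested helper (illustrative only).
bool ServerState::ShouldLogSlowCmd(unsigned latency_usec) const {
  return GetSlowLog().IsEnabled() && latency_usec >= log_slower_than_usec;
}

// The call site then needs a single SafeTLocal() access:
// if (!(cid->opt_mask() & CO::BLOCKING) && conn != nullptr &&
//     ServerState::SafeTLocal()->ShouldLogSlowCmd(invoke_time_usec)) { ... }
```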

@romange previously approved these changes Jan 16, 2024
@romange romange merged commit c887189 into main Jan 17, 2024
@romange romange deleted the fix_migration branch January 17, 2024 06:19