feature(server): Bring back inline scheduling #1130
Conversation
If possible, please add a benchmark. When we discussed inline scheduling, we actually saw that the time for accessing a thread local has an impact on performance; this is why we store the coordinator index instead of accessing it with ProactorBase::me() (or maybe it was some other side effect 🤷🏻♂️). Though we changed thread_local to the simpler thread specifier in some cases (I don't remember fully).
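To illustrate the point about thread-local access cost, here is a hedged sketch (all names are invented for the example, not the actual Dragonfly/helio types): the coordinator's thread index is captured once at construction and compared cheaply on the hot path, instead of going through a thread-local accessor on every use.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Stand-in for the per-thread index a proactor loop would publish.
inline thread_local uint32_t tl_thread_index = 0;

class Transaction {
 public:
  // Capture the coordinator's thread index once, at construction time.
  Transaction() : coordinator_index_(tl_thread_index) {}

  // Hot-path check: a plain integer comparison against the cached index,
  // with no thread-local lookup involved.
  bool RunsOnShardThread(uint32_t shard_thread) const {
    return coordinator_index_ == shard_thread;
  }

 private:
  uint32_t coordinator_index_;
};

}  // namespace sketch
```

This only models the caching idea; whether the thread-local lookup itself was the measured cost (versus some other side effect) is left open, as in the comment above.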
src/server/transaction.cc
// will be scheduled before RdbLoader::LoadItemsBuffer is finished. We can't use the regular
// locking mechanism because RdbLoader is not using transactions.
if (coordinator_index_ == unique_shard_id_ &&
    ServerState::tlocal()->gstate() != GlobalState::LOADING) {
It would be clearer if you checked !is_master and sync_in_progress. See the code in ServerFamily::Role.
I don't think I have a good way to get a reference to the global Replica object - it's stored in ServerFamily, and it's not thread safe to read either. And if I check gstate() anyway, then checking is_master is a bit redundant, no?
Looks good, I think.
Force-pushed from b8e3392 to 309b186.
I do not think we can merge this PR yet. See #1036 (comment)
…908)

The optimization is applied within the ScheduleSingleHop call.

Signed-off-by: Roman Gershman <[email protected]>
I fixed the other preemption bug. It was pretty hard to even confirm it exists; in the end I managed to create inconsistent replication states with a script. I don't think that script is going into CI anytime soon 😅 but at least I could check that the problem is real and fixed by the patch.
Also, a nice benefit I didn't realize before: this is pretty helpful when we replicate between instances with the same number of shards, because we always send the data to the correct thread.
Force-pushed from 309b186 to 2969be8, then from 2969be8 to 9453f72.
@royjacobson just make sure you merge only after Roman creates a tag for the new version.
* feat: run tx-schedule inline if the dest shard is on the same thread (#908). The optimization is applied within the ScheduleSingleHop call.
* fix(server): Don't inline schedule when in LOADING
* Fix another pre-emption bug with inline scheduling
* Better locking around journal callbacks

Signed-off-by: Roman Gershman <[email protected]>
Co-authored-by: Roman Gershman <[email protected]>
#908 added a path to schedule transactions immediately if the operator fiber was on the same thread as the shard of the relevant keys.
This caused a replication bug: when the replica received a command from the journal, it could schedule it immediately, before the matching RDB record was loaded. Since the RDB reader does not use the regular transaction mechanisms, it's tricky to fix the issue with the usual locking. Instead of locking, I fixed it by completely disabling inline scheduling while the DB is in the LOADING state.
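The resulting decision can be sketched as follows. This is a hedged, simplified model with invented names (the real logic lives inside the transaction scheduling path, and the real GlobalState has more values): the inline fast path is taken only when the coordinator is already on the target shard's thread AND the server is not loading a snapshot; otherwise the transaction goes through the regular cross-thread scheduling queue, so journal commands cannot overtake the RDB load.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified stand-in for the server's global state.
enum class GlobalState { ACTIVE, LOADING };

// Illustrative scheduling decision, not the actual Dragonfly code.
std::string ChooseSchedulePath(uint32_t coordinator_index, uint32_t unique_shard_id,
                               GlobalState gstate) {
  bool same_thread = coordinator_index == unique_shard_id;
  if (same_thread && gstate != GlobalState::LOADING)
    return "inline";  // run immediately on the current thread
  return "queued";    // dispatch through the shard's regular task queue
}
```

The design choice here is to trade a little performance during replica load (inline scheduling is disabled wholesale) for correctness, rather than trying to make RdbLoader participate in the transaction locking scheme.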
Close #1036