
Conversation

Contributor

@royjacobson royjacobson commented Apr 24, 2023

#908 added a path to schedule transactions immediately when the coordinating fiber runs on the same thread as the shard of the relevant keys.

This caused a replication bug: when the replica received a command from the journal, it could schedule it immediately, before the matching RDB record was loaded. Since the RDB reader does not go through the regular transaction mechanism, it is tricky to fix the issue with the usual locking. Instead of locking, I fixed it by completely disabling inline scheduling while the DB is in the LOADING state.
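
For reference, the guard that implements this ends up looking roughly like the sketch below (simplified; the member and enum names match the diff discussed further down, and the surrounding scheduling logic is elided):

// Inline scheduling is only safe when the coordinator fiber already runs on the
// target shard's thread AND we are not loading an RDB snapshot. While in
// GlobalState::LOADING, RdbLoader bypasses the transaction framework, so an
// inlined journal command could run before the matching RDB record is applied.
bool can_run_inline = coordinator_index_ == unique_shard_id_ &&
                      ServerState::tlocal()->gstate() != GlobalState::LOADING;
if (can_run_inline) {
  // ... schedule and run the hop directly on this thread ...
} else {
  // ... fall back to dispatching the hop to the shard's task queue ...
}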

Close #1036

Contributor

dranikpg commented Apr 24, 2023

If possible, please make a benchmark

When we discussed inline scheduling, we saw that the time for accessing a thread local has an impact on performance - that is why we store the coordinator index instead of reading it through ProactorBase::me() (or was it some other side effect 🤷🏻‍♂️). We also changed thread_local to the simpler __thread specifier in some cases (I don't remember fully).
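
As a rough illustration of the trade-off (not Dragonfly code - apart from coordinator_index_ and unique_shard_id_, the names below are made up for the example):

// Option A - capture the coordinator's thread index once when the transaction
// is created; later checks are a plain member read:
bool same_thread = coordinator_index_ == unique_shard_id_;

// Option B - resolve the current thread through a thread-local lookup on every
// check; each access goes through the TLS machinery on the hot path.
thread_local unsigned tl_thread_index = 0;  // hypothetical stand-in for ProactorBase::me()
bool same_thread_tls = tl_thread_index == unique_shard_id_;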

// will be scheduled before RdbLoader::LoadItemsBuffer is finished. We can't use the regular
// locking mechanism because RdbLoader is not using transactions.
if (coordinator_index_ == unique_shard_id_ &&
    ServerState::tlocal()->gstate() != GlobalState::LOADING) {
Contributor


It would be clearer if you checked !is_master and sync_in_progress. See the code in ServerFamily::Role.

Contributor Author

@royjacobson royjacobson Apr 24, 2023


I don't think I have a good way to get a reference to the global Replica object - it's stored in ServerFamily, and it's not thread-safe to read either. And if I check gstate() anyway, then checking is_master is a bit redundant, no?

@royjacobson
Contributor Author

> If possible, please make a benchmark
>
> When we discussed inline scheduling, we saw that the time for accessing a thread local has an impact on performance - that is why we store the coordinator index instead of reading it through ProactorBase::me() (or was it some other side effect 🤷🏻‍♂️). We also changed thread_local to the simpler __thread specifier in some cases (I don't remember fully).

looks good, I think -

SingleHopBench.txt

@royjacobson royjacobson force-pushed the InlineSchedulingRoy branch from b8e3392 to 309b186 on April 24, 2023 13:22
dranikpg previously approved these changes Apr 24, 2023
Collaborator

romange commented Apr 26, 2023

I do not think we can merge this PR yet. See #1036 (comment)

romange and others added 2 commits May 9, 2023 16:38
@royjacobson
Contributor Author

I fixed the other preemption bug. It was pretty hard to even confirm it exists; in the end I managed to create inconsistent replication states by
1 - purposefully inserting random sleeps into the journal callbacks loop, and
2 - running the attached script ->

trigger.txt

I don't think the script is going into CI anytime soon 😅 But at least I could verify that the problem is real and that the patch fixes it.
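
(For a feel of the repro trick: the sleeps simply widen the race window inside the journal-callback loop, roughly like the illustrative helper below. This is not the real code; in the fiber-based server a fiber-aware sleep would be used instead of std::this_thread::sleep_for.)

#include <chrono>
#include <random>
#include <thread>

// Debug-only hack: call this before handling each journal entry so the RDB
// loader and the journal consumer interleave in many different orders,
// which makes the inconsistency much easier to hit.
void MaybeRandomSleep() {
  static thread_local std::mt19937 rng{std::random_device{}()};
  std::uniform_int_distribution<int> dist(0, 5);
  std::this_thread::sleep_for(std::chrono::milliseconds(dist(rng)));
}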

@royjacobson
Contributor Author

Also, a nice benefit I didn't realize before: this is pretty helpful when we replicate between instances with the same number of shards, because we always send the data to the correct thread.

@royjacobson royjacobson force-pushed the InlineSchedulingRoy branch from 2969be8 to 9453f72 on May 10, 2023 09:57
@adiholden
Contributor

@royjacobson just make sure you merge only after Roman creates a tag for the new version

@royjacobson royjacobson merged commit 7adf379 into main May 21, 2023
romange added a commit that referenced this pull request Jun 1, 2023
* feat: run tx-schedule inline if the dest shard is on the same thread (#908)

The optimization is applied within the ScheduleSingleHop call.

Signed-off-by: Roman Gershman <[email protected]>

* fix(server): Don't inline schedule when in LOADING

* Fix another pre-emption bug with inline scheduling

* Better locking around journal callbacks

---------

Signed-off-by: Roman Gershman <[email protected]>
Co-authored-by: Roman Gershman <[email protected]>
@romange romange deleted the InlineSchedulingRoy branch June 7, 2023 09:45
Successfully merging this pull request may close these issues.

Replication - journal change sometimes serialised before entry snapshot if we run tx-schedule inline