
Conversation

Contributor

@royjacobson royjacobson commented Apr 24, 2023

#908 added a path to schedule transactions immediately when the coordinating fiber runs on the same thread as the shard of the relevant keys.

This caused a replication bug: when the replica received a command from the journal, it could schedule it immediately, before the matching RDB record was loaded. Since the RDB reader does not go through the regular transaction mechanism, it is tricky to fix the issue with the usual locking. Instead of locking, I fixed it by completely disabling inline scheduling while the DB is in the LOADING state.
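
For reference, the guard that implements this ends up looking roughly like the sketch below (simplified; the member and enum names match the diff discussed further down, and the surrounding scheduling logic is elided):

// Inline scheduling is only safe when the coordinator fiber already runs on the
// target shard's thread AND we are not loading an RDB snapshot. While in
// GlobalState::LOADING, RdbLoader bypasses the transaction framework, so an
// inlined journal command could run before the matching RDB record is applied.
bool can_run_inline = coordinator_index_ == unique_shard_id_ &&
                      ServerState::tlocal()->gstate() != GlobalState::LOADING;
if (can_run_inline) {
  // ... schedule and run the hop directly on this thread ...
} else {
  // ... fall back to dispatching the hop to the shard's task queue ...
}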

Close #1036

Contributor

dranikpg commented Apr 24, 2023

If possible, please make a benchmark

When we discussed inline scheduling, we saw that the time for accessing a thread local has an impact on performance - that is why we store the coordinator index instead of reading it through ProactorBase::me() (or was it some other side effect 🤷🏻‍♂️). We also changed thread_local to the simpler __thread specifier in some cases (I don't remember fully).
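
As a rough illustration of the trade-off (not Dragonfly code - apart from coordinator_index_ and unique_shard_id_, the names below are made up for the example):

// Option A - capture the coordinator's thread index once when the transaction
// is created; later checks are a plain member read:
bool same_thread = coordinator_index_ == unique_shard_id_;

// Option B - resolve the current thread through a thread-local lookup on every
// check; each access goes through the TLS machinery on the hot path.
thread_local unsigned tl_thread_index = 0;  // hypothetical stand-in for ProactorBase::me()
bool same_thread_tls = tl_thread_index == unique_shard_id_;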

// will be scheduled before RdbLoader::LoadItemsBuffer is finished. We can't use the regular
// locking mechanism because RdbLoader is not using transactions.
if (coordinator_index_ == unique_shard_id_ &&
    ServerState::tlocal()->gstate() != GlobalState::LOADING) {
Contributor


It would be clearer if you checked !is_master and sync_in_progress. See the code in ServerFamily::Role.

Contributor Author

@royjacobson royjacobson Apr 24, 2023


I don't think I have a good way to get a reference to the global Replica object - it's stored in ServerFamily, and it's not thread-safe to read either. And if I check gstate() anyway, then checking is_master is a bit redundant, no?

@royjacobson
Contributor Author

> If possible, please make a benchmark
>
> When we discussed inline scheduling, we saw that the time for accessing a thread local has an impact on performance - that is why we store the coordinator index instead of reading it through ProactorBase::me() (or was it some other side effect 🤷🏻‍♂️). We also changed thread_local to the simpler __thread specifier in some cases (I don't remember fully).

looks good, I think -

SingleHopBench.txt

@royjacobson royjacobson force-pushed the InlineSchedulingRoy branch from b8e3392 to 309b186 on April 24, 2023 13:22
dranikpg previously approved these changes Apr 24, 2023
Collaborator

romange commented Apr 26, 2023

I do not think we can merge this PR yet. See #1036 (comment)

romange and others added 2 commits May 9, 2023 16:38
@royjacobson
Contributor Author

I fixed the other preemption bug. It was pretty hard to even confirm it exists; in the end I managed to create inconsistent replication states by
1 - purposefully inserting random sleeps into the journal callbacks loop, and
2 - running the attached script ->

trigger.txt

I don't think the script is going into CI anytime soon 😅 But at least I could verify that the problem is real and that the patch fixes it.
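
(For a feel of the repro trick: the sleeps simply widen the race window inside the journal-callback loop, roughly like the illustrative helper below. This is not the real code; in the fiber-based server a fiber-aware sleep would be used instead of std::this_thread::sleep_for.)

#include <chrono>
#include <random>
#include <thread>

// Debug-only hack: call this before handling each journal entry so the RDB
// loader and the journal consumer interleave in many different orders,
// which makes the inconsistency much easier to hit.
void MaybeRandomSleep() {
  static thread_local std::mt19937 rng{std::random_device{}()};
  std::uniform_int_distribution<int> dist(0, 5);
  std::this_thread::sleep_for(std::chrono::milliseconds(dist(rng)));
}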

@royjacobson
Contributor Author

Also, a nice benefit I didn't realize before: this is pretty helpful when we replicate between instances with the same number of shards, because we always send the data to the correct thread.

@royjacobson royjacobson force-pushed the InlineSchedulingRoy branch from 2969be8 to 9453f72 on May 10, 2023 09:57
@adiholden
Contributor

@royjacobson just make sure you merge only after Roman creates a tag for the new version

@royjacobson royjacobson merged commit 7adf379 into main May 21, 2023
romange added a commit that referenced this pull request Jun 1, 2023
* feat: run tx-schedule inline if the dest shard is on the same thread (#908)

The optimization is applied within the ScheduleSingleHop call.

Signed-off-by: Roman Gershman <[email protected]>

* fix(server): Don't inline schedule when in LOADING

* Fix another pre-emption bug with inline scheduling

* Better locking around journal callbacks

---------

Signed-off-by: Roman Gershman <[email protected]>
Co-authored-by: Roman Gershman <[email protected]>
@romange romange deleted the InlineSchedulingRoy branch June 7, 2023 09:45
Successfully merging this pull request may close these issues.

Replication - journal change sometimes serialised before entry snapshot if we run tx-schedule inline