fix(replication): Correctly replicate commands even when OOM #2428
Conversation
Before this change, OOM in shard callbacks could lead to data inconsistency between the master and the replica, for example with commands that mutated data on one shard but failed on another, like `LMOVE`.

After this change, callbacks that result in an OOM will correctly replicate their work (none, partial, or complete) to replicas.

Note that `MSET` and `MSETNX` required special handling: they are the only commands that can _create_ multiple keys, so some of those keys can fail while others succeed.

Fixes #2381
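To make the failure mode concrete, here is a minimal, self-contained sketch (not Dragonfly's real API; the `ShardJournal` and `Entry` types are invented for illustration) of why every shard of a multi-shard transaction has to record an entry, even an empty one, when its callback fails:

```cpp
// Toy model, not Dragonfly's actual journal API: shows why a shard whose
// callback failed with OOM must still record an entry (a NOOP), so that
// multi-shard transactions stay aligned between master and replica.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

enum class Op { COMMAND, NOOP };

struct Entry {
  uint64_t txid;
  Op op;
  std::string payload;  // serialized command; empty for NOOP
};

// One journal per shard, mimicking a sharded store.
struct ShardJournal {
  std::vector<Entry> entries;
  void Record(uint64_t txid, Op op, std::string payload = "") {
    entries.push_back({txid, op, std::move(payload)});
  }
};

int main() {
  ShardJournal shard0, shard1;
  const uint64_t txid = 42;

  // An LMOVE-like transaction: the source-key shard succeeded,
  // the destination-key shard hit OOM.
  shard0.Record(txid, Op::COMMAND, "LPOP src");
  // Before the fix, the failing shard journaled nothing, so the replica
  // applied a half-transaction. Recording a NOOP keeps both sides in step.
  shard1.Record(txid, Op::NOOP);

  std::cout << "shard0: " << shard0.entries.size()
            << " entries, shard1: " << shard1.entries.size() << " entries\n";
  return 0;
}
```

As the review below points out, a bare NOOP is not actually forwarded by `JournalStreamer`, which is why the thread converges on sending a `PING` marked as `COMMAND` instead.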
```cpp
if (result.status != OpStatus::OK) {
  // We log NOOP even for NO_AUTOJOURNAL commands because the non-success status could have been
  // due to OOM in a single shard, while other shards succeeded
  journal->RecordEntry(txid_, journal::Op::NOOP, db_index_, unique_shard_cnt_,
```
Check out `JournalStreamer::Start`. It does not replicate NOOP; today we use it only to trigger journal writes to the sink, so this will not get to the replica. Did you run your tests with the flag `enable_multi_shard_sync` set to true? I think this will fail.
Nice catch. How about I send a `PING`, but mark the opcode as `COMMAND` instead? Do you see any problem with that?
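If I follow that suggestion, the call from the diff above would become something like this (the payload and trailing arguments are my assumption; I haven't checked `RecordEntry`'s exact signature):

```cpp
// Sketch of the PING idea, not verified against the real signature:
// record an actual command that JournalStreamer will serialize. PING is
// harmless on the replica, unlike Op::NOOP, which is never forwarded.
journal->RecordEntry(txid_, journal::Op::COMMAND, db_index_, unique_shard_cnt_,
                     make_pair("PING", ArgSlice{}));  // payload form assumed
```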
BTW, it does pass with `--df enable_multi_shard_sync=true`.
I think `PING` is OK.
Oh, I see that the test is using `proactor_threads=1`, so there is only one shard. Let's change the test to run on 2 shards; you will need to increase the max memory size as well.
I still see the test with `proactor_threads=1`. Can you change it? See my comment above.
Sorry, I forgot about it :(
Done now.
```python
@pytest.mark.asyncio
async def test_policy_based_eviction_propagation(df_local_factory, df_seeder_factory):
```
We have this test above; I asked Yue on his PR to add it and mark it as skip. You can remove his test.
Note that his test passes `unsupported_types` to the seeder because of another bug that Kostas fixed.
It messes up the diff, but the change is really only adding the test below.
src/server/string_family.cc
```cpp
  // journal [0, i)
  payload = make_pair("MSET", ArgSlice(&args[0], i));
}
tx->LogJournalOnShard(op_args.shard, std::move(payload), tx->GetUniqueShardCnt(), false, false);
```
I see that you still don't use the `RecordJournal` function. Why?
I left this for later and did it now :)
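For reference, a sketch of what the switch might look like, assuming `RecordJournal` is a thin helper over the per-shard logging call (the signature here is guessed from this thread, not taken from the code):

```cpp
// Hypothetical: replace the direct LogJournalOnShard call with the
// RecordJournal helper the reviewer mentions. Signature assumed, not
// verified; it would journal the successful prefix args[0..i) of MSET.
RecordJournal(op_args, "MSET", ArgSlice(&args[0], i), tx->GetUniqueShardCnt());
```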