Skip to content

Conversation

chakaz
Copy link
Contributor

@chakaz chakaz commented Nov 19, 2024

We have an internal utility tool that we use to deserialize values in some use cases:

  • RESTORE
  • Cluster slot migration
  • RENAME, if the source and target shards are different

We recently changed this area of the code, which caused this regression as it only handled RDB / replication streams.

Fixes #4143

We have an internal utility tool that we use to deserialize values in
some use cases:

* `RESTORE`
* Cluster slot migration
* `RENAME`, if the source and target shards are different

We [recently](#3760)
changed this area of the code, which caused this regression as it only
handled RDB / replication streams.

Fixes #4143
@romange
Copy link
Collaborator

romange commented Nov 20, 2024

We will have to release a patch release for that. I would also want to add all the INFO improvements for 1.25.2

@chakaz chakaz requested a review from adiholden November 20, 2024 06:15

await push_config(json.dumps(generate_config(nodes)), [node.admin_client for node in nodes])

logging.debug(f"Generating huge {type}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check out test_big_containers how it uses StaticSeeder
if we can do the same here and use the capture to compare
also this will be one test case and not running for each data type

@pytest.mark.parametrize("type", ["list", "hash", "string", "set", "zset"])
@dfly_args({"proactor_threads": 2, "cluster_mode": "yes"})
@pytest.mark.asyncio
async def test_cluster_migration_huge_list(df_factory: DflyInstanceFactory, type):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename test_cluster_migration_huge_list -> test_cluster_migration_huge_container
I see you test string data type above in params but its not clear to me what is
debug populate 1 k 10000 RAND TYPE string ELEMENTS 1000
what is the affect of elements param here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also lets add streams to tested types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the affect of elements param here?

It doesn't have any effect, I thought it's a "free test" so why not, but I removed it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also lets add streams to tested types

I don't have support for breaking streams, nor does StaticSeeder nor debug populate support streams :|

@@ -168,3 +169,41 @@ async def test_denyoom_commands(df_factory):

# mget should not be rejected
await client.execute_command("mget x")


@pytest.mark.parametrize("type", ["list", "hash", "string", "set", "zset"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can do this test in unit test
see TEST_F(RdbTest, LoadHugeSet) for example
btw we can check the dump and restore command directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cleaner and with the seeder (your suggestion) also easier to do in pytests :)

config.streamed = true;
}

if (auto ec = FromOpaque(*opaque_res, config, &pv); ec) {
Copy link
Contributor

@adiholden adiholden Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove (std::error_code RdbLoaderBase::FromOpaque(const OpaqueObj& opaque, CompactObj* pv)) function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

@chakaz chakaz requested a review from adiholden November 20, 2024 13:09
collection_size=10_000,
variance=1,
samples=1,
types=[type],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest instead of running a test case each data type you can pass all the types here
this will be a little faster ci and regression

@chakaz chakaz requested a review from adiholden November 20, 2024 13:29
@chakaz chakaz enabled auto-merge (squash) November 20, 2024 13:34
@chakaz chakaz merged commit 24a1ec6 into main Nov 20, 2024
9 checks passed
@chakaz chakaz deleted the chakaz/migration-huge-restore branch November 20, 2024 14:00
adiholden pushed a commit that referenced this pull request Nov 20, 2024
* fix: Huge entries fail to load outside RDB / replication

We have an internal utility tool that we use to deserialize values in
some use cases:

* `RESTORE`
* Cluster slot migration
* `RENAME`, if the source and target shards are different

We [recently](#3760)
changed this area of the code, which caused this regression as it only
handled RDB / replication streams.

Fixes #4143
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Huge lists do not migrate fully between cluster nodes
3 participants