Skip to content

Conversation

vyavdoshenko
Copy link
Contributor

@vyavdoshenko vyavdoshenko commented Mar 26, 2025

Copy link
Contributor

@BagritsevichStepan BagritsevichStepan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some comments:

@BagritsevichStepan
Copy link
Contributor

Misclick

const auto& group = index->GetSynonyms().UpdateGroup(group_id, terms);

// Rebuild indices only for documents containing terms from the updated group
index->RebuildForGroup(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that inorder to support SKIPINITIALSCAN we just need to skip this if the flag is set.
Why did you decide to ignore it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this flag with redis_version:6.2.13.
It always updates the index.
Originally, I skipped rebuilding the index with this flag passed. But I decided to skip this flag after redis behavior investigation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might cause some bugs, so it would be better at least to create a separate PR that implements this behavior correctly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is not enough to skip it here, we need to save this in the field index and clear all data during remove

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, good that we discussed it here

Copy link
Contributor

@BagritsevichStepan BagritsevichStepan Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a bug

index_not_found.store(false, std::memory_order_relaxed);

// Update synonym group in this shard
const auto& group = index->GetSynonyms().UpdateGroup(group_id, terms);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are calling update here and then doing rebuild. Take into account that remove operation during rebuild will use this new group in Tokenize method. So you will not remove some entries from text index. So, you need first to remove, then call UpdateGroup, then Add

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please confirm that you reproduced this bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior is correct because synonyms update as insert way.
FT.SYNUPDATE enlarges the existing group. It doesn't remove any entities; it inserts new ones and creates a superset of previous entities and new ones.
That's why it's the correct behavior. I updated the group, and after that, I can rebuild all existing documents, including the added synonyms.

Copy link
Contributor

@BagritsevichStepan BagritsevichStepan Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already checked this on your branch. Use this snippet:

127.0.0.1:6379> json.set j1 . '{"text":"word"}'
OK
127.0.0.1:6379> json.set j2 . '{"text":"another"}'
OK
127.0.0.1:6379> ft.create index on json schema $.text AS text TEXT
OK
127.0.0.1:6379> ft.synupdate index new_group word another
OK

You can print the entries from entries_ using GetTerms method. And you will see that after rebuilding we have following entries inside the entries_ :

{" new_group", set<DocId>}, {"word", set<DocId>}, {"another", set<DocId>}

but it should contain only {" new_group", set<DocId>}.

You can reproduce the right behavior with this snippet:

127.0.0.1:6379> ft.create index on json schema $.text AS text TEXT
OK
127.0.0.1:6379> ft.synupdate index new_group word another
OK
127.0.0.1:6379> json.set j1 . '{"text":"word"}'
OK
127.0.0.1:6379> json.set j2 . '{"text":"another"}'
OK

And then after all json.set check that entries_ map contains only this:

{" new_group", set<DocId>}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@BagritsevichStepan BagritsevichStepan self-requested a review April 1, 2025 07:23
Copy link
Contributor

@BagritsevichStepan BagritsevichStepan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to fix rebuilding

@vyavdoshenko vyavdoshenko merged commit be11fa0 into main Apr 1, 2025
10 checks passed
@vyavdoshenko vyavdoshenko deleted the bobik/add_search_synonyms branch April 1, 2025 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants