feat: Support synonyms in queries. Add FT.SYNUPDATE, FT.SYNDUMP #4837
…ensitive search for synonyms was fixed.
LGTM, some comments:
Misclick

```cpp
const auto& group = index->GetSynonyms().UpdateGroup(group_id, terms);

// Rebuild indices only for documents containing terms from the updated group
index->RebuildForGroup(
```
I believe that in order to support SKIPINITIALSCAN we just need to skip this if the flag is set.
Why did you decide to ignore it?
I tested this flag with redis_version:6.2.13, and it always updates the index.
Originally, I skipped rebuilding the index when this flag was passed, but after investigating Redis's behavior I decided to ignore the flag.
I think this might cause some bugs, so it would be better to at least create a separate PR that implements this behavior correctly.
Also, skipping it here is not enough: we need to save this in the field index and clear all data during remove.
Btw, it's good that we discussed it here.
I see a bug
src/server/search/search_family.cc
```cpp
index_not_found.store(false, std::memory_order_relaxed);

// Update synonym group in this shard
const auto& group = index->GetSynonyms().UpdateGroup(group_id, terms);
```
You are calling update here and then doing the rebuild. Take into account that the remove operation during the rebuild will already use this new group in the Tokenize method, so some entries will not be removed from the text index. You need to remove first, then call UpdateGroup, then Add.
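A minimal, self-contained sketch of the ordering problem described above, assuming a drastically simplified model: `Synonyms`, `TextIndex`, `BuggyUpdate`, `FixedUpdate` and `Keys` are hypothetical stand-ins, not Dragonfly's actual classes; each document holds a single term; and group tokens carry a leading space only to mirror the `" new_group"` key quoted in this thread. It shows how a `Remove` that tokenizes with the already-updated group leaves behind the entries stored under the raw terms.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

using DocId = uint32_t;
using Docs = std::map<DocId, std::string>;  // hypothetical: one term per document

// Hypothetical synonym store: maps a term to its group token.
struct Synonyms {
  std::map<std::string, std::string> term_to_group;

  void UpdateGroup(const std::string& group_id, const std::vector<std::string>& terms) {
    for (const auto& term : terms) term_to_group[term] = " " + group_id;
  }

  // Replaces a term by its group token if the term belongs to a group.
  std::string Tokenize(const std::string& term) const {
    auto it = term_to_group.find(term);
    return it == term_to_group.end() ? term : it->second;
  }
};

// Hypothetical text index: token -> documents containing it.
struct TextIndex {
  explicit TextIndex(const Synonyms& s) : synonyms(&s) {}

  void Add(DocId id, const std::string& text) {
    entries_[synonyms->Tokenize(text)].insert(id);
  }

  void Remove(DocId id, const std::string& text) {
    auto it = entries_.find(synonyms->Tokenize(text));
    if (it == entries_.end()) return;  // token not indexed: nothing is removed
    it->second.erase(id);
    if (it->second.empty()) entries_.erase(it);
  }

  std::map<std::string, std::set<DocId>> entries_;
  const Synonyms* synonyms;
};

// Order questioned in review: the group is updated first, so Remove() tokenizes
// with the NEW group and misses entries stored under the raw terms.
void BuggyUpdate(Synonyms& syn, TextIndex& idx, const Docs& docs,
                 const std::string& group_id, const std::vector<std::string>& terms) {
  syn.UpdateGroup(group_id, terms);
  for (const auto& [id, text] : docs) {
    idx.Remove(id, text);
    idx.Add(id, text);
  }
}

// Order suggested in review: remove with the OLD group, update it, then re-add.
void FixedUpdate(Synonyms& syn, TextIndex& idx, const Docs& docs,
                 const std::string& group_id, const std::vector<std::string>& terms) {
  for (const auto& [id, text] : docs) idx.Remove(id, text);
  syn.UpdateGroup(group_id, terms);
  for (const auto& [id, text] : docs) idx.Add(id, text);
}

// Collects the tokens currently present in the index.
std::set<std::string> Keys(const TextIndex& idx) {
  std::set<std::string> keys;
  for (const auto& entry : idx.entries_) keys.insert(entry.first);
  return keys;
}
```

With documents `word` and `another` indexed before the update, `BuggyUpdate` leaves the keys `" new_group"`, `"another"` and `"word"`, while `FixedUpdate` leaves only `" new_group"`, which is the same state as indexing the documents after the group already exists.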
Please confirm that you reproduced this bug
The current behavior is correct because synonym updates are insert-only.
FT.SYNUPDATE enlarges the existing group. It doesn't remove any terms; it inserts new ones, so the result is a superset of the previous terms and the new ones.
That's why the behavior is correct: I update the group first, and after that I rebuild all existing documents, including the added synonyms.
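A sketch of the insert-only semantics described above, using a hypothetical `SynonymStore` (not the PR's actual class): `UpdateGroup` only inserts, so each update yields a superset of the previous group. Note that this superset property covers the group itself; on its own it does not clean up index entries that were stored under the raw terms before the update.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical insert-only synonym store: group id -> terms.
class SynonymStore {
 public:
  // Adds terms to the group (creating it on first use) and never removes any,
  // so the returned group is always a superset of the previous one.
  const std::set<std::string>& UpdateGroup(const std::string& group_id,
                                           const std::vector<std::string>& terms) {
    auto& group = groups_[group_id];
    group.insert(terms.begin(), terms.end());
    return group;
  }

 private:
  std::map<std::string, std::set<std::string>> groups_;
};
```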
I have already checked this on your branch. Use this snippet:
```
127.0.0.1:6379> json.set j1 . '{"text":"word"}'
OK
127.0.0.1:6379> json.set j2 . '{"text":"another"}'
OK
127.0.0.1:6379> ft.create index on json schema $.text AS text TEXT
OK
127.0.0.1:6379> ft.synupdate index new_group word another
OK
```
You can print the entries from `entries_` using the `GetTerms` method, and you will see that after rebuilding, `entries_` contains the following entries:
{" new_group", set<DocId>}, {"word", set<DocId>}, {"another", set<DocId>}
but it should contain only {" new_group", set<DocId>}.
You can reproduce the correct behavior with this snippet:
```
127.0.0.1:6379> ft.create index on json schema $.text AS text TEXT
OK
127.0.0.1:6379> ft.synupdate index new_group word another
OK
127.0.0.1:6379> json.set j1 . '{"text":"word"}'
OK
127.0.0.1:6379> json.set j2 . '{"text":"another"}'
OK
```
Then, after all the `json.set` calls, check that the `entries_` map contains only this:
{" new_group", set<DocId>}
Fixed.
We need to fix rebuilding
feat: Added FT.SYNUPDATE: https://redis.io/docs/latest/commands/ft.synupdate/
feat: Added FT.SYNDUMP: https://redis.io/docs/latest/commands/ft.syndump/
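For reference, Redis documents FT.SYNDUMP as returning each synonym term together with the ids of the synonym groups that contain it. The following is a hypothetical sketch of building that term-to-groups view from a group-id-to-terms map; the `Groups` and `Dump` aliases and the `SynDump` function are illustrative, not the PR's code, and terms are sorted here only to make the output deterministic.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical synonym storage: group id -> terms in that group.
using Groups = std::map<std::string, std::set<std::string>>;
// SYNDUMP-style result: (term, ids of the groups containing the term).
using Dump = std::vector<std::pair<std::string, std::vector<std::string>>>;

Dump SynDump(const Groups& groups) {
  // Invert the mapping. Groups are visited in sorted id order, so each
  // term's list of group ids comes out sorted as well.
  std::map<std::string, std::vector<std::string>> by_term;
  for (const auto& [group_id, terms] : groups) {
    for (const auto& term : terms) by_term[term].push_back(group_id);
  }
  return Dump(by_term.begin(), by_term.end());
}
```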