[WIP] Support for auto-completion and spelling correction in suggestions #994
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## suggestions_cleanup #994 +/- ##
=======================================================
- Coverage 58.13% 58.10% -0.03%
=======================================================
Files 101 102 +1
Lines 5384 5519 +135
Branches 2197 2263 +66
=======================================================
+ Hits 3130 3207 +77
- Misses 795 804 +9
- Partials 1459 1508 +49
e6e7d2c to 84ad890
@veloman-yunkan I made a first quick run; here are my remarks:
I keep thinking testing this on the command line with
It is in the list as enttauschen (without diacritics) because suggestions are based on indexed data from which diacritics have been removed.
@kelson42 What do you mean by that? It works for me. If a title suggestion is clicked, the linked page is opened. Clicking on a completion or spelling suggestion changes the text in the search box.
Do you mean automated testing? Otherwise, how can command-line usage of the suggestion functionality be more efficient than interactive instant feedback?
In a multiword query only the last word is spell-checked.
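To make the last-word rule concrete, here is a minimal sketch. `correctLastWord` and the `correctWord` callback are hypothetical names for illustration, not libzim API, and treating a trailing space as "nothing to correct" is an assumption rather than confirmed behaviour of this PR.

```cpp
#include <functional>
#include <string>

// Hypothetical helper (not libzim API): applies `correctWord` to the last
// space-separated word of `query` and leaves the preceding words untouched.
std::string correctLastWord(
    const std::string& query,
    const std::function<std::string(const std::string&)>& correctWord)
{
  // Assumption: a trailing space means the last word is complete, so the
  // query is returned unchanged.
  if (query.empty() || query.back() == ' ')
    return query;

  const std::string::size_type pos = query.find_last_of(' ');
  const std::string::size_type start =
      (pos == std::string::npos) ? 0 : pos + 1;

  // Everything before `start` is kept verbatim; only the tail is corrected.
  return query.substr(0, start) + correctWord(query.substr(start));
}
```

With a callback that maps "histroy" to "history", the query "world histroy" would become "world history", while "histroy of" would be left as it is because only the last word ("of") is examined.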
bfdae6d to 5a90b35
@kelson42 I guess some of the "issues" that you've experienced with spelling corrections could be caused by the lag in spelling corrections on large ZIM files (with the title vocabulary exceeding a few hundred words), since the temporary spell-checker database was being created on every suggestion request. I have now committed an optimization so that the (still temporary, in-memory) spell-checker database is cached in the suggestion searcher.
@veloman-yunkan No, I mean
We never display suggestions without diacritics. The indexing is done without diacritics, but what we display always has diacritics. So I don't understand the argument here.
This is an additional reason why we should implement this first in
6df44d2 to 8d72969
This is just a starting point for beginning the TDD process of implementing autocompletion and spelling correction support in libzim.
But the stated autocompletion functionality is not yet implemented. SuggestionSearch::getAutocompletionResults() doesn't even get called contrary to expectations. Need to debug it.
It turned out that the SuggestionSearch::getAutocompletionResults() stub introduced in the previous commit was actually being called. However, instead of returning an empty result set, it delivered all entries because of a bug in the usage of zim::Archive::Entry::offset() that was copied from SuggestionSearch::getResults(). That bug has to be fixed too.
This commit somewhat reverts an earlier commit "Added result count limit to SuggestionSearcher::suggest()", and introduces support for auto-completion suggestions (and a similar door to spelling correction suggestions) in a backward compatible way.
Implemented SuggestionSearch::getAutocompletionSuggestions() via a fake implementation of some of the newly introduced helper code that only works for the sole unit test addressing the new functionality.
Added more titles to the Suggestion.autoCompletionAndSpellingCorrection unit-test so that it can be enhanced with more checks.
But still works only on unit-test data.
Renamed the unit-test `Suggestion.autoCompletionAndSpellingCorrection` to `Suggestion.smartSuggestions`. Also renamed the macro `EXPECT_SUGGESTION_RESULTS()` to `EXPECT_SMART_SUGGESTION_RESULTS()`.
The new unit-test covers SuggestionSearch::getAutocompletionSuggestions() and demonstrates its various shortcomings by passing (rather than failing) due to usage of the results returned by the current implementation as the expected outcome.
Filtered out stemmed terms from the autocompletion database
Autocompletion query is now case & diacritics insensitive. Ignoring diacritics in the query during autocompletion is probably wrong. However, in the current implementation it cannot be easily fixed, since autocompletion works off of the terms recorded in the title index database, and diacritics info is removed during indexing.
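A rough sketch of the two ideas above (dropping stemmed terms, then completing a case-folded prefix against the remaining terms), using only the standard library. It relies on the Xapian convention that stemmed forms added by the term generator carry a `Z` prefix. The helper names are hypothetical and this is not the code from this PR. Diacritics folding is deliberately left out, so only ASCII case is normalized here.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lowercases ASCII letters only; removing diacritics is out of scope here.
std::string asciiLower(std::string s)
{
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Hypothetical helper: drops terms carrying the "Z" prefix that marks
// stemmed forms, preserving the order of the remaining terms.
std::vector<std::string> dropStemmedTerms(const std::vector<std::string>& terms)
{
  std::vector<std::string> kept;
  for (const auto& term : terms) {
    if (term.empty() || term[0] != 'Z')
      kept.push_back(term);
  }
  return kept;
}

// Hypothetical helper: returns up to `maxResults` terms starting with the
// case-folded prefix. `sortedTerms` must be sorted in ascending byte order.
std::vector<std::string> completePrefix(
    const std::vector<std::string>& sortedTerms,
    const std::string& rawPrefix,
    std::size_t maxResults)
{
  const std::string prefix = asciiLower(rawPrefix);
  std::vector<std::string> results;

  // Binary search to the first term that is not less than the prefix, then
  // walk forward while terms still start with it.
  auto it = std::lower_bound(sortedTerms.begin(), sortedTerms.end(), prefix);
  for (; it != sortedTerms.end() && results.size() < maxResults; ++it) {
    if (it->compare(0, prefix.size(), prefix) != 0)
      break;
    results.push_back(*it);
  }
  return results;
}
```

Since the terms themselves are stored without diacritics, any completion produced this way is also diacritics-free, which is the limitation discussed in the conversation above.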
This was supposed to be a draft implementation but it doesn't work since support for spelling correction is not implemented in the InMemory backend of Xapian.
Made SpellingsDB work using a Glass Xapian database in an in-memory filesystem. This is only a temporary workaround for the proof-of-concept stage. The plan is to use a better spelling suggestion engine (nuspell) but that will require significant additional effort.
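For readers unfamiliar with how an edit-distance based spelling suggestion behaves, here is a small stand-in written with the standard library only. It is neither Xapian's implementation nor the `SpellingsDB` code from this PR: it computes plain Levenshtein distance on bytes and picks the closest vocabulary word within a caller-supplied limit. All names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance on bytes: insertions, deletions and substitutions
// each cost 1. Uses two rolling rows instead of the full matrix.
std::size_t editDistance(const std::string& a, const std::string& b)
{
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    prev[j] = j;

  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical helper: returns the vocabulary word closest to `word` whose
// distance does not exceed `maxDistance`, or an empty string if none does.
// Ties are resolved in favour of the earliest vocabulary entry.
std::string suggestSpelling(const std::vector<std::string>& vocabulary,
                            const std::string& word,
                            std::size_t maxDistance)
{
  std::string best;
  std::size_t bestDistance = maxDistance + 1;
  for (const auto& candidate : vocabulary) {
    const std::size_t d = editDistance(word, candidate);
    if (d < bestDistance) {
      bestDistance = d;
      best = candidate;
    }
  }
  return best;
}
```

A linear scan like this costs one distance computation per vocabulary word on every request, which is one reason a dedicated engine is the longer-term plan.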
But that may result in duplicated suggestions as demonstrated by some test points in the Suggestion.smartSuggestions unit-test.
The results of obtaining all suggestion terms are memoized in SuggestionDataBase.
Made the spellings DB a member of SuggestionDataBase so that it is not recreated on every call of getSpellingCorrections().
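The caching described in the last two commit messages follows the usual "build once, reuse afterwards" pattern. Below is a minimal sketch of that pattern. `SpellingIndex` and `SuggestionDataBaseSketch` are hypothetical stand-ins, not the actual libzim classes, and the mutex is an assumption about how concurrent suggestion requests might be guarded.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expensive-to-build spelling database.
struct SpellingIndex {
  std::vector<std::string> words;
};

// Hypothetical owner playing the role of the suggestion database: the index
// is built on first use and then reused for every later request.
class SuggestionDataBaseSketch {
public:
  using Builder = std::function<SpellingIndex()>;

  explicit SuggestionDataBaseSketch(Builder builder)
    : m_builder(std::move(builder))
  {}

  const SpellingIndex& spellingIndex()
  {
    // Assumption: requests may arrive from several threads, so the lazy
    // initialization is serialized with a mutex.
    std::lock_guard<std::mutex> lock(m_mutex);
    if (!m_index)
      m_index = std::make_unique<SpellingIndex>(m_builder());
    return *m_index;
  }

private:
  Builder m_builder;
  std::mutex m_mutex;
  std::unique_ptr<SpellingIndex> m_index;
};
```

However many times `spellingIndex()` is called, the builder runs at most once per object, so the construction cost is paid on the first suggestion request rather than on each one.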
cf894d8 to 5d00100
8d72969 to 2da49eb
Fixes #731