Include digits and update unicode regex generation #115

dhdaines · 2024-07-06T21:14:59Z

The unicode-8.0.0 package has been deprecated for a while. Its README also recommends using regenerate to build regexes, which is much nicer than the way we were doing it before.

Also, a persistent annoyance with lunr-languages was that digits were missing from wordCharacters in all the Latin- and Cyrillic-based languages, even though they are present in the default wordCharacters (and Indic-Arabic numerals are present for Arabic, Hindi, etc.). This PR adds them back, fixing #66 and maybe some other bugs.

The problem of the trimmer not being run in the search pipeline persists, but that's a lunr.js bug :) At least now things like "HAL9000" will get indexed.
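To make the effect of the missing digits concrete, here is a minimal, self-contained sketch of a trimmer built from a wordCharacters string. It is modelled loosely on how lunr-languages builds its trimmers, but `makeTrimmer`, `lettersOnly` and `lettersAndDigits` are hypothetical names, and the character ranges are heavily simplified assumptions rather than the ranges generated in this PR.

```javascript
// Simplified, illustrative character classes (NOT the real generated ranges).
const lettersOnly = "A-Za-z\u00C0-\u024F";
const lettersAndDigits = lettersOnly + "0-9";

// Build a trimmer that strips leading and trailing characters
// that are not in the given character class.
function makeTrimmer(wordCharacters) {
  const startRegex = new RegExp("^[^" + wordCharacters + "]+");
  const endRegex = new RegExp("[^" + wordCharacters + "]+$");
  return (token) => token.replace(startRegex, "").replace(endRegex, "");
}

const trimWithoutDigits = makeTrimmer(lettersOnly);
const trimWithDigits = makeTrimmer(lettersAndDigits);

console.log(trimWithoutDigits("HAL9000")); // "HAL": trailing digits are trimmed away
console.log(trimWithoutDigits("2024"));    // "": a purely numeric token disappears
console.log(trimWithDigits("HAL9000"));    // "HAL9000"
console.log(trimWithDigits("(2024)"));     // "2024": punctuation is still trimmed
```

With the letters-only class, a purely numeric token is trimmed down to an empty string, which is consistent with the symptom reported in #66.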

dhdaines · 2024-07-06T21:22:10Z

For some reason we weren't including combining diacritics in wordCharacters, yet the tests depended on them, so those get added too.
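As a rough illustration of why combining diacritics matter for a trimmer, the sketch below uses a decomposed (NFD) Russian word. This is only an assumed scenario, not necessarily what the actual test does. The names `makeTrimmer`, `cyrillicOnly` and `cyrillicWithMarks` are hypothetical, and the ranges (the basic Cyrillic block and the Combining Diacritical Marks block) are simplifications.

```javascript
// Simplified, illustrative character classes (NOT the real generated ranges).
const cyrillicOnly = "\u0400-\u04FF";
const cyrillicWithMarks = cyrillicOnly + "\u0300-\u036F";

// Strip leading and trailing characters outside the given class.
function makeTrimmer(wordCharacters) {
  const startRegex = new RegExp("^[^" + wordCharacters + "]+");
  const endRegex = new RegExp("[^" + wordCharacters + "]+$");
  return (token) => token.replace(startRegex, "").replace(endRegex, "");
}

// In NFD form the final letter is U+0438 followed by the combining breve U+0306.
const decomposed = "край".normalize("NFD");

const trimmedWithoutMarks = makeTrimmer(cyrillicOnly)(decomposed);
const trimmedWithMarks = makeTrimmer(cyrillicWithMarks)(decomposed);

// Without combining marks in the class, the trailing breve is trimmed off
// and the word silently changes; with them, the word survives intact.
console.log(trimmedWithoutMarks.normalize("NFC") === "край"); // false
console.log(trimmedWithMarks.normalize("NFC") === "край");    // true
```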

McShelby · 2025-03-02T21:55:06Z

Your input here and in the Lunr repo is really helpful. I wish this were available in an official release. Thanks a lot!

dhdaines · 2025-03-03T01:23:22Z

> Your input here and in the Lunr repo is really helpful. I wish this were available in an official release. Thanks a lot!

I have contemplated forking lunr, to just include (corrected) lunr-languages... and yet, there are only 24 hours in a day and 7 days in a week :-(

fix: replace deprecated unicode-8.0.0 with @unicode-8.0.0

bfd38a6

dhdaines added 5 commits July 6, 2024 17:23

chore: install and audit fix

087eb0f

fix: add numerals and use regenerate to make regex

68404df

chore: rebuild all the things

40d6347

fix: add combining diacritics to fix russian test

f49f79c

chore: rebuild all the things

a4a2c3d

dhdaines force-pushed the include_digits branch from 39cb0d4 to a4a2c3d Compare July 6, 2024 21:23

This was referenced Jul 6, 2024

Not able to search for just numbers in lunr.de #66

Open

Indexing and search pipelines are mismatched with language support yeraydiazdiaz/lunr.py#149

Open

McShelby mentioned this pull request Mar 2, 2025

search: improve utilization of Lunr lib for better results McShelby/hugo-theme-relearn#890

Open
