@farshidz farshidz commented Aug 21, 2025

Change Summary

Implement typeahead search (Phase 1).

The typeahead API can:

  • Index popular queries to suggest from
  • Generate suggestions from partial or full user input

Suggestions are generated in two stages:

  1. Retrieval: Match all indexed queries in which at least one input word matches a prefix of at least one query word, within a configured edit distance.
  2. Ranking: Order the matches by a score function that combines query popularity with the BM25 match score.
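The two stages above can be illustrated with a minimal in-memory sketch. This is not the code in this PR: `IndexedQuery`, `suggest`, the weight parameters, and the linear combination of the two score components are all hypothetical, and `matched_fraction` is only a stand-in for the BM25 match score, which the real search backend would compute.

```python
from dataclasses import dataclass


@dataclass
class IndexedQuery:
    text: str          # normalised query text (hypothetical field name)
    popularity: float  # e.g. how often the query was issued


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fuzzy_prefix_match(input_word: str, query_word: str, max_edits: int) -> bool:
    """True if input_word is within max_edits of some non-empty prefix of query_word."""
    return any(
        edit_distance(input_word, query_word[:k]) <= max_edits
        for k in range(1, len(query_word) + 1)
    )


def matched_fraction(words, query: IndexedQuery, max_edits: int) -> float:
    """Stand-in for the BM25 match score: share of input words that matched."""
    query_words = query.text.split()
    hits = sum(
        1 for w in words
        if any(fuzzy_prefix_match(w, qw, max_edits) for qw in query_words)
    )
    return hits / len(words)


def suggest(user_input, index, max_edits=1, text_weight=1.0, popularity_weight=0.01):
    words = user_input.lower().split()
    if not words:
        return []
    scored = []
    for query in index:
        fraction = matched_fraction(words, query, max_edits)
        if fraction == 0:
            continue  # Stage 1 (retrieval): no input word matched any prefix
        # Stage 2 (ranking): assumed weighted sum of text score and popularity
        score = text_weight * fraction + popularity_weight * query.popularity
        scored.append((score, query.text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


index = [
    IndexedQuery("running shoes", 10),
    IndexedQuery("rain jacket", 5),
    IndexedQuery("blue jeans", 50),
    IndexedQuery("running shorts", 100),
]
suggestions = suggest("runing sh", index)
```

In this sketch the misspelt input "runing sh" still retrieves both "running" queries because "runing" is one edit away from "running", and the more popular query is ranked first.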

We will implement this feature in two phases, and this PR only covers phase 1.

Phase 1 (in this PR): Basic features

  • Add a new schema for typeahead at index creation time
  • Indexing with query normalisation
  • Prefix query with fuzziness support and tunable weights on the BM25 score and popularity
  • Support arbitrary numeric fields in a metadata map field (for display purposes only, e.g. hit count)
  • Timestamp field to track the last-updated epoch time
  • Add the schema to existing indexes at bootstrapping time
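As a rough illustration of what an indexed typeahead entry could look like, the sketch below normalises a query and builds a record with a numeric metadata map and a last-updated timestamp. The field names (`query`, `popularity`, `metadata`, `last_updated`), the helper names, and the specific normalisation steps (Unicode NFKC, lowercasing, whitespace collapsing) are assumptions for illustration; the PR description does not specify them.

```python
import time
import unicodedata


def normalise_query(raw: str) -> str:
    """Assumed normalisation: NFKC, lowercase, collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    return " ".join(text.lower().split())


def build_record(raw_query: str, popularity: float, metadata: dict, now=None) -> dict:
    """Build a hypothetical typeahead record; metadata values must be numeric."""
    for key, value in metadata.items():
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"metadata[{key!r}] must be a number, got {type(value).__name__}")
    return {
        "query": normalise_query(raw_query),
        "popularity": float(popularity),
        "metadata": dict(metadata),  # display-only values such as hit count
        "last_updated": int(time.time() if now is None else now),  # epoch seconds
    }


record = build_record("  Blue   JEANS ", 3, {"hit_count": 42}, now=1_700_000_000)
```

Normalising at indexing time means that differently cased or spaced variants of the same query collapse to a single entry, which keeps popularity counts from being split across near-duplicates.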

Phase 2 (future work): Advanced features

  • Support score modifiers based on the metadata field, so users can define ranking based on arbitrary fields or multiple fields.
  • Support the Vespa garbage collection mechanism to automatically delete queries based on a sliding window. This is to facilitate any ETL pipeline.
  • Support schema upgrade (just for typeahead schema) to support new typeahead features for existing indexes
  • (TBD) Configurable/Smart normalisation?
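The sliding-window deletion planned for Phase 2 can be pictured as a cutoff on the last-updated timestamp. The sketch below is only a plain-Python illustration of that retention rule, assuming each record carries a hypothetical `last_updated` field in epoch seconds; it does not show how Vespa garbage collection would be configured.

```python
def expire_old_queries(records, window_seconds, now):
    """Keep only records updated within the last window_seconds (assumed rule)."""
    cutoff = now - window_seconds
    return [r for r in records if r["last_updated"] > cutoff]


DAY = 24 * 60 * 60
now = 1_700_000_000
records = [
    {"query": "running shoes", "last_updated": now - 2 * DAY},
    {"query": "rain jacket", "last_updated": now - 40 * DAY},
]
kept = expire_old_queries(records, window_seconds=30 * DAY, now=now)
```

With a rule like this, an ETL pipeline only needs to re-index queries that are still popular; entries that stop being refreshed age out on their own.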

Related Jira Ticket

https://s2search.atlassian.net/browse/MOSD-394

Checklist

  • Tests have been added for changes
  • Documentation has been updated
  • Breaking changes are clearly identified
  • Python client changes linked or N/A

For new field types:

  • Tests cover score modifier usage of this new type
  • Test indexes updated to cover the new type for all APIs (add docs, search, partial update, etc.)

For marqo indexes created by Marqo 2.23.0+
cursor[bot]

This comment was marked as outdated.


@farshidz farshidz left a comment

lgtm

@papa99do papa99do merged commit b806c7b into mainline Sep 4, 2025
36 checks passed
@papa99do papa99do deleted the farshid/type-ahead branch September 4, 2025 07:41
4 participants