@jahorton (Contributor) commented Nov 13, 2025

This PR aims to start an internal doc on the role of SearchSpace, SearchPath, and SearchCluster in the correction-search process.

At present, I don't claim it to be complete by any measure. But, "something" is better than "nothing" here, and this provides a chance to get some eyes on things early in order to determine what works as an explanation and what doesn't. Feedback appreciated, even while in draft mode.

Build-bot: skip
Test-bot: skip

@keymanapp-test-bot

User Test Results

Test specification and instructions

User tests are not required

@keymanapp-test-bot keymanapp-test-bot bot changed the title from "docs(web): starts internal doc on SearchSpace design, requirements, and analysis" to "docs(web): starts internal doc on SearchSpace design, requirements, and analysis 🚂" Nov 13, 2025
@keymanapp-test-bot keymanapp-test-bot bot added this to the A19S16 milestone Nov 13, 2025
@keyman-server keyman-server modified the milestones: A19S16, A19S17 Nov 22, 2025

@mcdurdin (Member) left a comment

This helps a lot in understanding the SearchSpace, SearchPath, SearchCluster types.

But I do have some questions and suggestions:

  • It would help to describe the shape of these types (i.e. properties, methods, and particularly how Path differs from Cluster).
  • Include a concrete example at the top, of a short series of key events + resulting SearchSpace types to illustrate how the types are used. This should be a common case rather than a pathological one!
  • Even after reading, I am not really clear why SearchCluster exists; why do SearchPaths need grouping and how does this help?
  • I am a little unclear on the names of the types - Space vs Path. Why is a Path an implementation of a Space?
  • I am unclear how a SearchPath can 'extend' a SearchSpace given a SearchSpace is just an interface without implementation? Isn't the relationship between SearchPath and SearchSpace 'implements'?
  • I guess a Cluster could be called a PathCluster or a PathGroup to clarify the relationship?
  • It seems like a large part of the reason for these types is fat fingering at word boundaries. Is that right? It's never explicitly stated, just obliquely when defining the problem.

Formatting nit: we generally wrap our .md files at 80 chars

The `SearchSpace` interface exists to represent portions of the dynamically-generated graph used for correction-searching within the predictive-text engine. As new input is received, new extensions to previous `SearchSpace`s may be created to extend the graph's reach, appending newly-received input to the context token to be corrected. Loosely speaking, different instances of `SearchSpace` correspond to different potential tokenizations of the input and/or to different requirements for constructing and applying generated suggestions.

There are two implementations of this interface:
- `SearchPath`, which extends a `SearchSpace` by a single set of recent inputs affecting the range of represented text in the same manner.
Member

  • SearchPath is a single set of recent inputs -- how do you define the boundaries of this set? Is it 1 keystroke, 10, 100? And what does 'in the same manner' mean -- what are the actual effects?
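
One way to picture the "extends" versus "implements" relationship is the sketch below. It is not the engine's actual API: the `inputCount` property, the constructor signatures, and the assumption that each `SearchPath` adds the alternatives for exactly one keystroke are all hypothetical, chosen only to show how a path could both implement the interface and build on a parent space.

```typescript
// Hypothetical shapes for illustration only. Property names and constructor
// signatures are assumptions, not the engine's real API.
interface SearchSpace {
  /** Number of keystrokes modeled by this portion of the search graph. */
  readonly inputCount: number;
}

/** Implements SearchSpace; "extends" a parent space by one more set of inputs. */
class SearchPath implements SearchSpace {
  readonly parent: SearchSpace | null;
  readonly inputs: readonly string[];

  constructor(parent: SearchSpace | null, inputs: readonly string[]) {
    this.parent = parent;
    this.inputs = inputs;
  }

  get inputCount(): number {
    // Assumption: one SearchPath corresponds to one keystroke's alternatives.
    return (this.parent ? this.parent.inputCount : 0) + 1;
  }
}

/** Implements SearchSpace; groups paths that can be extended together. */
class SearchCluster implements SearchSpace {
  readonly paths: readonly SearchPath[];

  constructor(paths: readonly SearchPath[]) {
    if (paths.length === 0) {
      throw new Error("a cluster needs at least one path");
    }
    this.paths = paths;
  }

  get inputCount(): number {
    // All grouped paths are assumed to model the same number of keystrokes.
    return this.paths[0].inputCount;
  }
}

const root = new SearchPath(null, ["b", "v", "n"]);
const next = new SearchPath(root, ["a", "s"]);
const cluster = new SearchCluster([next]);
console.log(root.inputCount, next.inputCount, cluster.inputCount);
```

In this reading, "extends" describes the graph relationship (a path holds a parent space) while "implements" describes the type relationship.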

<!-- - To complicate matters further, note that the letters `c`, `v`, and `n` are also close to `b`.
- Suppose this leads to `van errors`, `NaN errors`, etc..., but also `cannery`, `Vannessa`, etc. -->

2. Each individual `SearchSpace` should only model correction of inputs that result in tokens of the same codepoint length as each other.
Member

This confuses me -- you talk about an individual SearchSpace but then say 'the same codepoint length as each other' -- what is the 'other' here?


2. It is not possible to guarantee that one keystroke will only extend a previous `SearchSpace` in one way.
- If the incoming keystroke produces `Transform`s that have different `insert` length without varying the left-deletion count, this _must_ result in multiple `SearchSpace`s, as the total codepoint length will vary accordingly.
- Also of note: if left-deleting, it is possible for a left-deletion to erase the token adjacent to the text insertion point.
Member

I don't understand this point -- I would assume that a left-deletion would always be deleting the token adjacent to the text insertion point?
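
The claim that one keystroke can require several `SearchSpace`s can be sketched as a grouping step. The `Transform` shape below is a minimal assumption based on the `insert` and left-deletion wording in the doc, and `groupByEffect` is a hypothetical helper, not a function from the codebase. It buckets a keystroke's alternatives by left-deletion count and inserted codepoint length; each bucket would need its own extension.

```typescript
// Assumed minimal shape of one keystroke alternative; field names follow the
// doc's wording and are not taken from the actual type definitions.
interface Transform {
  insert: string;
  deleteLeft: number;
}

/** Counts codepoints rather than UTF-16 code units. */
function codepointLength(text: string): number {
  return Array.from(text).length;
}

/**
 * Hypothetical helper: groups one keystroke's alternatives by how they affect
 * the represented text (left-deletion count plus inserted codepoint length).
 */
function groupByEffect(alternatives: Transform[]): Map<string, Transform[]> {
  const groups = new Map<string, Transform[]>();
  for (const t of alternatives) {
    const key = `${t.deleteLeft}:${codepointLength(t.insert)}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(t);
    groups.set(key, bucket);
  }
  return groups;
}

// Three alternatives for one keystroke: two insert one codepoint, one inserts two.
const alternatives: Transform[] = [
  { insert: "a", deleteLeft: 0 },
  { insert: "e", deleteLeft: 0 },
  { insert: "ae", deleteLeft: 0 },
];
const groups = groupByEffect(alternatives);
console.log(groups.size); // 2: the keystroke extends the previous space in two ways
```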

Comment on lines +74 to +76
For example, consider a case with two keystrokes, each of which has versions emitting insert strings of one and two characters. Taking two chars from one and one char from the other will result in a `SearchSpace` that models a total of two keystrokes that fully covers the two keys.

For such cases, any future keystrokes can extend both input sequences in the same manner. While the actual correction-text may differ, the net effect it has on the properties of a token necessary for correction and construction of suggestions is identical. The `SearchCluster` variant of `SearchSpace` exists for such cases, modeling the convergence of multiple `SearchPath`s and extending all of them together at once.
Member

This example doesn't make sense to me. I don't understand "each of which has versions emitting insert strings of one and two characters."
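
The two-keystroke example can be made concrete with a small enumeration. The insert strings below are invented for illustration: each keystroke is assumed to have one variant inserting one character and one inserting two. Grouping every combination by total inserted length shows where two distinct input sequences converge on the same length, which is the situation the doc assigns to `SearchCluster`.

```typescript
// Hypothetical data: each keystroke has a one-character and a two-character
// variant. The specific strings are made up for this sketch.
const keystroke1 = ["a", "ab"];
const keystroke2 = ["c", "cd"];

// Total inserted codepoint length -> the input sequences that produce it.
const byLength = new Map<number, string[][]>();
for (const first of keystroke1) {
  for (const second of keystroke2) {
    const total = Array.from(first).length + Array.from(second).length;
    const sequences = byLength.get(total) ?? [];
    sequences.push([first, second]);
    byLength.set(total, sequences);
  }
}

byLength.forEach((sequences, total) => {
  const kind = sequences.length > 1 ? "converging paths" : "single path";
  console.log(total, kind, JSON.stringify(sequences));
});
```

Lengths 2 and 4 are each reached by one sequence. Length 3 is reached by two (one character then two, or two then one), so those two paths cover the same two keystrokes with the same total length, and any later keystroke would extend both identically.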
