docs(web): starts internal doc on SearchSpace design, requirements, and analysis 🚂 #15161

jahorton · 2025-11-13T19:17:11Z

This PR aims to start an internal doc on the role of SearchSpace, SearchPath, and SearchCluster in the correction-search process.

At present, I don't claim it to be complete by any measure. But, "something" is better than "nothing" here, and this provides a chance to get some eyes on things early in order to determine what works as an explanation and what doesn't. Feedback appreciated, even while in draft mode.

Build-bot: skip
Test-bot: skip


keymanapp-test-bot · 2025-11-13T19:17:36Z

User Test Results

Test specification and instructions

User tests are not required

mcdurdin

This helps a lot in understanding the SearchSpace, SearchPath, SearchCluster types.

But I do have some questions and suggestions:

- It would help to describe the shape of these types (i.e. properties, methods, and particularly how Path differs from Cluster).
- Include a concrete example at the top, of a short series of key events + resulting SearchSpace types, to illustrate how the types are used. This should be a common case rather than a pathological one!
- Even after reading, I am not really clear why SearchCluster exists; why do SearchPaths need grouping and how does this help?
- I am a little unclear on the names of the types - Space vs Path. Why is a Path an implementation of a Space?
- I am unclear how a SearchPath can 'extend' a SearchSpace given a SearchSpace is just an interface without implementation. Isn't the relationship between SearchPath and SearchSpace 'implements'?
- I guess a Cluster could be called a PathCluster or a PathGroup to clarify the relationship?
- It seems like a large part of the reason for these types is fat fingering at word boundaries. Is that right? It's never explicitly stated, just obliquely when defining the problem.

Formatting nit: we generally wrap our .md files at 80 chars

mcdurdin · 2025-11-22T06:16:04Z

web/src/engine/predictive-text/worker-thread/docs/search-spaces.md

+The `SearchSpace` interface exists to represent portions of the dynamically-generated graph used for correction-searching within the predictive-text engine.  As new input is received, new extensions to previous `SearchSpace`s may be created to extend the graph's reach, appending newly-received input to the context token to be corrected.  Loosely speaking, different instances of `SearchSpace` correspond to different potential tokenizations of the input and/or to different requirements for constructing and applying generated suggestions.
+
+There are two implementations of this interface:
+- `SearchPath`, which extends a `SearchSpace` by a single set of recent inputs affecting the range of represented text in the same manner.


SearchPath is a single set of recent inputs -- how do you define the boundaries of this set? Is it 1 keystroke, 10, 100? And what does 'in the same manner' mean -- what are the actual effects?
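
For readers following this thread, here is one possible reading of the three types, sketched in TypeScript. It is not taken from the PR: the `Transform` fields, the `inputCount` and `codepointLength` properties, the `countCodepoints` helper, and the choice of one keystroke per `SearchPath` are all assumptions made for illustration, and the question above about where a set's boundaries lie remains open.

```typescript
// Hypothetical shapes for discussion only; every name below is an assumption.
interface Transform {
  insert: string;     // text the keystroke would insert
  deleteLeft: number; // codepoints removed to the left of the caret
}

// Counts codepoints rather than UTF-16 code units.
function countCodepoints(text: string): number {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const prev = i > 0 ? text.charCodeAt(i - 1) : 0;
    const isLowHalfOfPair =
      unit >= 0xdc00 && unit <= 0xdfff && prev >= 0xd800 && prev <= 0xdbff;
    if (!isLowHalfOfPair) {
      count++;
    }
  }
  return count;
}

interface SearchSpace {
  readonly inputCount: number;      // keystrokes modeled so far
  readonly codepointLength: number; // length of the token being corrected
}

// Assumed here: one SearchPath adds ONE keystroke's alternatives, all of
// which change the token length by the same amount.
class SearchPath implements SearchSpace {
  readonly inputCount: number;
  readonly codepointLength: number;
  readonly inputs: Transform[];

  constructor(parent: SearchSpace | null, inputs: Transform[]) {
    let delta: number | null = null;
    for (const t of inputs) {
      const change = countCodepoints(t.insert) - t.deleteLeft;
      if (delta !== null && change !== delta) {
        throw new Error("inputs in one SearchPath must change the length equally");
      }
      delta = change;
    }
    if (delta === null) {
      throw new Error("a SearchPath needs at least one input");
    }
    this.inputs = inputs;
    this.inputCount = (parent ? parent.inputCount : 0) + 1;
    this.codepointLength = (parent ? parent.codepointLength : 0) + delta;
  }
}

// Assumed here: a SearchCluster groups paths that agree on both properties.
class SearchCluster implements SearchSpace {
  readonly inputCount: number;
  readonly codepointLength: number;
  readonly paths: SearchPath[];

  constructor(paths: SearchPath[]) {
    let shared: SearchSpace | null = null;
    for (const p of paths) {
      if (shared !== null && (p.inputCount !== shared.inputCount ||
          p.codepointLength !== shared.codepointLength)) {
        throw new Error("clustered paths must agree on inputs and length");
      }
      shared = p;
    }
    if (shared === null) {
      throw new Error("a SearchCluster needs at least one path");
    }
    this.paths = paths;
    this.inputCount = shared.inputCount;
    this.codepointLength = shared.codepointLength;
  }
}

// Two routes to a three-codepoint token after two keystrokes.
const shortThenLong = new SearchPath(
  new SearchPath(null, [{ insert: "a", deleteLeft: 0 }]),
  [{ insert: "bc", deleteLeft: 0 }]);
const longThenShort = new SearchPath(
  new SearchPath(null, [{ insert: "ab", deleteLeft: 0 }]),
  [{ insert: "c", deleteLeft: 0 }]);
const cluster = new SearchCluster([shortThenLong, longThenShort]);
```

Under these assumptions, both classes implement the interface rather than extend it, and a later keystroke could be applied to `cluster` once instead of to each path separately.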

mcdurdin · 2025-11-22T06:21:36Z

web/src/engine/predictive-text/worker-thread/docs/search-spaces.md

+<!--   - To complicate matters further, note that the letters `c`, `v`, and `n` are also close to `b`.
+    - Suppose this leads to `van errors`, `NaN errors`, etc..., but also `cannery`, `Vannessa`, etc. -->
+
+2.  Each individual `SearchSpace` should only model correction of inputs that result in tokens of the same codepoint length as each other.


This confuses me -- you talk about an individual SearchSpace but then say 'the same codepoint length as each other' -- what is the 'other' here?

mcdurdin · 2025-11-22T06:23:19Z

web/src/engine/predictive-text/worker-thread/docs/search-spaces.md

+
+2.  It is not possible to guarantee that one keystroke will only extend a previous `SearchSpace` in one way.
+    - If the incoming keystroke produces `Transform`s that have different `insert` length without varying the left-deletion count, this _must_ result in multiple `SearchSpace`s, as the total codepoint length will vary accordingly.
+    - Also of note:  if left-deleting, it is possible for a left-deletion to erase the token adjacent to the text insertion point.


I don't understand this point -- I would assume that a left-deletion would always be deleting the token adjacent to the text insertion point?
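
The first bullet in the quoted hunk can be illustrated with a small sketch. It is only a guess at the intent: `Transform`, `countCodepoints`, and `groupByResultingLength` are hypothetical names, and the sketch deliberately leaves out deletions that erase the whole token, which is the case the second bullet raises.

```typescript
// Hypothetical illustration; none of these names come from the PR.
interface Transform {
  insert: string;     // text the keystroke would insert
  deleteLeft: number; // codepoints removed to the left of the caret
}

// Counts codepoints rather than UTF-16 code units.
function countCodepoints(text: string): number {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const prev = i > 0 ? text.charCodeAt(i - 1) : 0;
    const isLowHalfOfPair =
      unit >= 0xdc00 && unit <= 0xdfff && prev >= 0xd800 && prev <= 0xdbff;
    if (!isLowHalfOfPair) {
      count++;
    }
  }
  return count;
}

// Buckets one keystroke's alternatives by the token length they produce.
// Assumes deleteLeft never exceeds tokenLength.
function groupByResultingLength(
  tokenLength: number,
  alternatives: Transform[]
): Record<number, Transform[]> {
  const groups: Record<number, Transform[]> = {};
  for (const t of alternatives) {
    const resulting = tokenLength - t.deleteLeft + countCodepoints(t.insert);
    const bucket = groups[resulting] || [];
    bucket.push(t);
    groups[resulting] = bucket;
  }
  return groups;
}

// The token so far is two codepoints long; the next keystroke has three
// plausible interpretations with the same deleteLeft but different inserts.
const groups = groupByResultingLength(2, [
  { insert: "n", deleteLeft: 0 },
  { insert: "b", deleteLeft: 0 },
  { insert: "ng", deleteLeft: 0 },
]);
```

Here the single keystroke lands in two buckets (lengths 3 and 4), which is one way to read the claim that one keystroke may extend a previous `SearchSpace` in more than one way.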

mcdurdin · 2025-11-22T06:25:46Z

web/src/engine/predictive-text/worker-thread/docs/search-spaces.md

+For example, consider a case with two keystrokes, each of which has versions emitting insert strings of one and two characters.  Taking two chars from one and one char from the other will result in a `SearchSpace` that models a total of two keystrokes that fully covers the two keys.
+
+For such cases, any future keystrokes can extend both input sequences in the same manner.  While the actual correction-text may differ, the net effect it has on the properties of a token necessary for correction and construction of suggestions is identical.  The `SearchCluster` variant of `SearchSpace` exists for such cases, modeling the convergence of multiple `SearchPath`s and extending all of them together at once.


This example doesn't make sense to me. I don't understand "each of which has versions emitting insert strings of one and two characters."
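
One way to unpack the quoted example is to enumerate it. The sketch below assumes each keystroke can be read as either a one-character or a two-character insert; the sample strings and the `sequencesByTotalLength` helper are hypothetical and not part of the PR.

```typescript
// Hypothetical illustration of the convergence described in the hunk.
// Each keystroke is given as its list of alternative insert strings.
function sequencesByTotalLength(keystrokes: string[][]): Record<number, string[][]> {
  // Build every combination that takes one alternative per keystroke.
  let sequences: string[][] = [[]];
  for (const alternatives of keystrokes) {
    const extended: string[][] = [];
    for (const seq of sequences) {
      for (const insert of alternatives) {
        extended.push(seq.concat([insert]));
      }
    }
    sequences = extended;
  }

  // Group the combinations by total length. The sample data is ASCII,
  // so string length and codepoint length coincide here.
  const byLength: Record<number, string[][]> = {};
  for (const seq of sequences) {
    const total = seq.join("").length;
    const bucket = byLength[total] || [];
    bucket.push(seq);
    byLength[total] = bucket;
  }
  return byLength;
}

// Two keystrokes, each with a one-character and a two-character version.
const byLength = sequencesByTotalLength([
  ["a", "ab"],
  ["c", "cd"],
]);
```

With this input, `byLength[3]` holds both `["a", "cd"]` and `["ab", "c"]`: two different input sequences that cover the same two keystrokes and yield the same token length. That shared bucket is the situation the hunk describes a `SearchCluster` as modeling, so a later keystroke can extend both sequences at once.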

docs(web): starts internal doc on SearchSpace design, requirements, a…

5005850

…nd analysis Build-bot: skip Test-bot: skip

github-project-automation bot added this to Keyman Nov 13, 2025

github-project-automation bot moved this to Todo in Keyman Nov 13, 2025

github-actions bot added web/ web/predictive-text/ labels Nov 13, 2025

jahorton requested review from ermshiperete, markcsinclair and mcdurdin November 13, 2025 19:17

github-actions bot added the docs label Nov 13, 2025

keymanapp-test-bot bot added the epic-autocorrect label Nov 13, 2025

keymanapp-test-bot bot changed the title ~~docs(web): starts internal doc on SearchSpace design, requirements, and analysis~~ docs(web): starts internal doc on SearchSpace design, requirements, and analysis 🚂 Nov 13, 2025

keymanapp-test-bot bot added this to the A19S16 milestone Nov 13, 2025

keyman-server modified the milestones: A19S16, A19S17 Nov 22, 2025

mcdurdin reviewed Nov 22, 2025
