docs(web): starts internal doc on SearchSpace design, requirements, and analysis 🚂 #15161
base: feat/web/cluster-splitting-and-merging
User Test Results: user tests are not required.
mcdurdin
left a comment
This helps a lot in understanding the SearchSpace, SearchPath, SearchCluster types.
But I do have some questions and suggestions:
- It would help to describe the shape of these types (i.e. properties, methods, and particularly how Path differs from Cluster).
- Include a concrete example at the top, of a short series of key events + resulting SearchSpace types to illustrate how the types are used. This should be a common case rather than a pathological one!
- Even after reading, I am not really clear why SearchCluster exists; why do SearchPaths need grouping and how does this help?
- I am a little unclear on the names of the types - Space vs Path. Why is a Path an implementation of a Space?
- I am unclear how a SearchPath can 'extend' a SearchSpace given a SearchSpace is just an interface without implementation? Isn't the relationship between SearchPath and SearchSpace 'implements'?
- I guess a Cluster could be called a PathCluster or a PathGroup to clarify the relationship?
- It seems like a large part of the reason for these types is fat fingering at word boundaries. Is that right? It's never explicitly stated, just obliquely when defining the problem.
Formatting nit: we generally wrap our .md files at 80 chars
| The `SearchSpace` interface exists to represent portions of the dynamically-generated graph used for correction-searching within the predictive-text engine. As new input is received, new extensions to previous `SearchSpace`s may be created to extend the graph's reach, appending newly-received input to the context token to be corrected. Loosely speaking, different instances of `SearchSpace` correspond to different potential tokenizations of the input and/or to different requirements for constructing and applying generated suggestions.
|
| There are two implementations of this interface:
| - `SearchPath`, which extends a `SearchSpace` by a single set of recent inputs affecting the range of represented text in the same manner.
- SearchPath is a single set of recent inputs -- how do you define the boundaries of this set? Is it 1 keystroke, 10, 100? And what does 'in the same manner' mean -- what are the actual effects?
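To make the "shape of these types" question concrete, here is a minimal sketch of how the three types might relate. Only the names `SearchSpace`, `SearchPath`, and `SearchCluster` come from the doc under review; every member (`inputCount`, `codepointLength`, `parent`, `insertLength`, `deleteLeft`, `paths`) is an assumption made for illustration and may not match the actual implementation. It reads "extends" as composition: a `SearchPath` implements the interface while holding a reference to the earlier `SearchSpace` it builds on.

```typescript
// Hypothetical sketch: only the three type names come from the doc under
// review. All members below are assumptions made for illustration.
interface SearchSpace {
  /** Number of keystrokes modeled so far (assumed property). */
  readonly inputCount: number;
  /** Codepoint length of the token text represented (assumed property). */
  readonly codepointLength: number;
}

class SearchPath implements SearchSpace {
  readonly parent: SearchSpace | null;
  readonly insertLength: number; // codepoints inserted by the newest input set
  readonly deleteLeft: number;   // codepoints deleted to the left first

  constructor(parent: SearchSpace | null, insertLength: number, deleteLeft: number) {
    this.parent = parent;
    this.insertLength = insertLength;
    this.deleteLeft = deleteLeft;
  }

  get inputCount(): number {
    return (this.parent ? this.parent.inputCount : 0) + 1;
  }

  get codepointLength(): number {
    const base = this.parent ? this.parent.codepointLength : 0;
    return Math.max(0, base - this.deleteLeft) + this.insertLength;
  }
}

class SearchCluster implements SearchSpace {
  readonly paths: SearchPath[];

  constructor(paths: SearchPath[]) {
    if (paths.length === 0) {
      throw new Error("a cluster needs at least one path");
    }
    // Paths may only converge when they agree on the properties that matter.
    for (const path of paths) {
      if (
        path.inputCount !== paths[0].inputCount ||
        path.codepointLength !== paths[0].codepointLength
      ) {
        throw new Error("paths in a cluster must share input count and length");
      }
    }
    this.paths = paths;
  }

  get inputCount(): number {
    return this.paths[0].inputCount;
  }

  get codepointLength(): number {
    return this.paths[0].codepointLength;
  }
}
```

Under this reading, both classes implement the interface, and a later `SearchPath` can take either a `SearchPath` or a `SearchCluster` as its `parent`.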
| <!-- - To complicate matters further, note that the letters `c`, `v`, and `n` are also close to `b`.
| - Suppose this leads to `van errors`, `NaN errors`, etc..., but also `cannery`, `Vannessa`, etc. -->
|
| 2. Each individual `SearchSpace` should only model correction of inputs that result in tokens of the same codepoint length as each other.
This confuses me -- you talk about an individual SearchSpace but then say 'the same codepoint length as each other' -- what is the 'other' here?
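One possible reading of "the same codepoint length as each other" is that every candidate token text modeled by a single `SearchSpace` has the same number of codepoints. The sketch below illustrates that reading only. The helper names `codepointLength` and `sharesCodepointLength` are hypothetical, and the sample strings are invented for the example.

```typescript
// Counts Unicode codepoints rather than UTF-16 code units, so a surrogate
// pair (for example, an emoji) counts once. Helper names are hypothetical.
function codepointLength(text: string): number {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    // A high surrogate followed by a low surrogate forms a single codepoint.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++;
      }
    }
    count++;
  }
  return count;
}

// True when every candidate has the same codepoint length as the first one.
function sharesCodepointLength(candidates: string[]): boolean {
  return candidates.every(
    (candidate) => codepointLength(candidate) === codepointLength(candidates[0])
  );
}
```

With this reading, `["ban", "can", "van"]` could share one `SearchSpace`, while `["can", "cans"]` could not.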
|
| 2. It is not possible to guarantee that one keystroke will only extend a previous `SearchSpace` in one way.
| - If the incoming keystroke produces `Transform`s that have different `insert` length without varying the left-deletion count, this _must_ result in multiple `SearchSpace`s, as the total codepoint length will vary accordingly.
| - Also of note: if left-deleting, it is possible for a left-deletion to erase the token adjacent to the text insertion point.
I don't understand this point -- I would assume that a left-deletion would always be deleting the token adjacent to the text insertion point?
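As a rough illustration of the quoted point about `Transform`s, the sketch below groups one keystroke's alternatives by the token length they would produce. `TransformLike`, its `deleteLeft` field name, and `groupByResultLength` are assumptions made for the example, not the engine's actual types. It also assumes insert strings contain no surrogate pairs, so `insert.length` equals the codepoint count.

```typescript
// Hypothetical stand-in for one alternative produced by a keystroke.
interface TransformLike {
  insert: string;     // text inserted at the caret
  deleteLeft: number; // codepoints deleted to the left before inserting
}

// Groups alternatives by the token length they would produce. Each distinct
// length would need its own SearchSpace under the stated requirement.
// Assumption: insert strings have no surrogate pairs, so .length is the
// codepoint count.
function groupByResultLength(
  baseLength: number,
  alternatives: TransformLike[]
): Record<number, TransformLike[]> {
  const groups: Record<number, TransformLike[]> = {};
  for (const alt of alternatives) {
    // A deletion larger than the token consumes the whole token (clamped
    // to 0 here; the overflow into the previous token is not modeled).
    const resultLength = Math.max(0, baseLength - alt.deleteLeft) + alt.insert.length;
    if (!groups[resultLength]) {
      groups[resultLength] = [];
    }
    groups[resultLength].push(alt);
  }
  return groups;
}
```

For a three-codepoint token and alternatives inserting `a`, `b`, and `ab` with no left-deletion, this yields two groups (lengths 4 and 5), so one keystroke extends the previous space in two ways. The clamp in `resultLength` marks the case where a left-deletion is at least as long as the token itself, which may be what the second bullet refers to.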
| For example, consider a case with two keystrokes, each of which has versions emitting insert strings of one and two characters. Taking two chars from one and one char from the other will result in a `SearchSpace` that models a total of two keystrokes that fully covers the two keys.
|
| For such cases, any future keystrokes can extend both input sequences in the same manner. While the actual correction-text may differ, the net effect it has on the properties of a token necessary for correction and construction of suggestions is identical. The `SearchCluster` variant of `SearchSpace` exists for such cases, modeling the convergence of multiple `SearchPath`s and extending all of them together at once.
This example doesn't make sense to me. I don't understand "each of which has versions emitting insert strings of one and two characters."
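One way to read the quoted example: each keystroke has two alternative outputs, one of a single character and one of two characters. The sketch below enumerates every combination and groups them by total length. The function names and the sample alternatives (`t`/`th`, `e`/`ee`) are invented for illustration, and the strings are assumed to have no surrogate pairs.

```typescript
// Enumerates every way of choosing one alternative per keystroke.
function enumerateSequences(keystrokes: string[][]): string[][] {
  let sequences: string[][] = [[]];
  for (const alternatives of keystrokes) {
    const extended: string[][] = [];
    for (const prefix of sequences) {
      for (const alt of alternatives) {
        extended.push(prefix.concat([alt]));
      }
    }
    sequences = extended;
  }
  return sequences;
}

// Groups sequences by total inserted length. Sequences in the same group
// cover the same keystrokes and yield the same token length, so they could
// converge. Assumption: no surrogate pairs, so .length counts codepoints.
function clusterByTotalLength(sequences: string[][]): Record<number, string[][]> {
  const clusters: Record<number, string[][]> = {};
  for (const sequence of sequences) {
    const total = sequence.reduce((sum, part) => sum + part.length, 0);
    if (!clusters[total]) {
      clusters[total] = [];
    }
    clusters[total].push(sequence);
  }
  return clusters;
}

// Two keystrokes, each with a one-character and a two-character alternative.
const exampleClusters = clusterByTotalLength(
  enumerateSequences([["t", "th"], ["e", "ee"]])
);
// exampleClusters[3] holds two sequences: ["t", "ee"] and ["th", "e"].
```

Here lengths 2 and 4 are each reached by one sequence, while length 3 is reached by two. Those two sequences are the kind of case where a `SearchCluster` would group the corresponding `SearchPath`s, since later keystrokes extend both in the same way.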
This PR aims to start an internal doc on the role of `SearchSpace`, `SearchPath`, and `SearchCluster` in the correction-search process.

At present, I don't claim it to be complete by any measure. But, "something" is better than "nothing" here, and this provides a chance to get some eyes on things early in order to determine what works as an explanation and what doesn't. Feedback appreciated, even while in draft mode.
Build-bot: skip
Test-bot: skip