-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Improve diff rendering for notebooks #20036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
It is enforced, just not in a good way ;) Related issue: #14445 ruff/crates/ruff_notebook/src/notebook.rs Lines 278 to 284 in 7a44ea6
|
| // indicating a regular script file, all the lines will be in one "cell" under the `None` | ||
| // key. | ||
| let mut cells: BTreeMap<Option<OneIndexed>, TextRange> = BTreeMap::default(); | ||
| for line in source_code.line_starts() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts here:
- Iterating over all lines seems especially unfortunate if this isn't a notebook. Can't we just push a single entry if the notebook index is absent that spans the entire source code range instead of iterating over the lines?
- Are there cases where the notebook cells don't appear in source order? I'm asking because we don't need a
BTreeMapif it's guaranteed that we iterate over the cells in order anyway - This data structure seems very similar to
Notebook::cell_offsets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks, Notebook::cell_offsets does seem like what I need. I looked into adding a FileResolver method to get a CellOffsets, but it only works, or at least works easily, for ty files. Ruff's FileResolver only has access to a NotebookIndex, not the Notebook itself. Should I try to pass that down through Diagnostics, and EmitterContext?
I don't think we want to clone a whole Notebook, so I'm thinking of adding a new struct wrapping only the index and the cell offsets to store in the (renamed) Diagnostics::notebook_indexes field.
ruff/crates/ruff/src/diagnostics.rs
Lines 338 to 342 in 0b6ce1c
| let notebook_indexes = if let SourceKind::IpyNotebook(notebook) = transformed { | |
| FxHashMap::from_iter([(path.to_string_lossy().to_string(), notebook.into_index())]) | |
| } else { | |
| FxHashMap::default() | |
| }; |
I guess we could also considering moving the CellOffsets into the NotebookIndex, making it the wrapper struct. They're mostly accessed via Notebook::cell_offsets anyway.
But yes, we definitely don't need to iterate the lines for a normal file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea, I think we can reconstruct something more efficient than looping over all the lines by only looking at the unique entries in NotebookIndex::row_to_cell. That's probably the easiest thing to do.
| .last() | ||
| .map(|op| (op.old_range().start, op.new_range().start)) | ||
| .unwrap_or_default(); | ||
| for range in cells.values() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could reduce the overhead for non notebooks by extracing the shared logic for rendering fixes into a separate method and:
- We call that method directly if source isn't a notebook (in which case we can skip all/most of the operations?)
- For notebooks, do whatever mapping is necessary, call that method for each cell
| 5 |- bar1 = !pwd | ||
| 5 |+ !pwd | ||
| 6 6 | bar2: str = !pwd | ||
| 1 1 | def f(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to show the cell number somewhere because I don't think it's guaranteed that the diagnostic cell is the same as the one for fixes.
I'm also curious on how the rendering looks if a diagnostic has fixes across multiple cells. We should add a test for this.
The UP049 example that you used to evaluate the fixes rendering could be a good example here too, just extend it so that the fixes span multiple cells.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's a good point, in all of these cases the cell number matched. What do you think about the header lines I removed in this commit: 80f9b55? I was trying to match the horizontal line we draw for different sections, but I ended up ripping them out because I thought they were too noisy, especially for the single-cell cases in our tests. Another option I considered was something like the ::: in our diagnostics:
error[unused-import]: `os` imported but unused
--> notebook.ipynb:cell 1:2:8
|
1 | # cell 1
2 | import os
| ^^
|
::: notebook.ipynb:cell 2:2:8
|
1 | # cell 2
2 | import math
| ---- second cell
3 |
4 | print('hello world')
|
help: Remove unused import: `os`
I'll work on a test for these. Unfortunately I don't think we can use UP049 because the PEP-695 generics are restricted to a single class or function body, which I don't think can span cells. I'm sure I can find another rule, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like ::: notebook.ipynb:cell 2:2:8 except that I would strip everything except the cell number because it feels redundant
--> notebook.ipynb:cell 1:2:8
|
1 | # cell 1
2 | import os
| ^^
|
::: cell 2
|
1 | # cell 2
2 | import math
| ---- second cell
3 |
4 | print('hello world')
|
help: Remove unused import: `os`
Summary -- As noted in a code TODO, our `Diff` rendering code previously didn't have any special handling for notebooks. This was particularly obvious when the diffs were rendered right next to the corresponding diagnostic because the diagnostic used cell-based line numbers, while the diff was still using line numbers from the concatenated source. This PR updates the diff rendering to handle notebooks too. The first commit is a pure refactor, moving our rendering to a model more like the `DisplayList` type in `annotate-snippets`. Collecting a sequence of line structs to display later sets up for the second commit, which groups the diff operations produced by `similar` by notebook cell. This prevents output like this, which I was running into earlier: ``` ::: cell 2 1 | def bar(): 2 | "Context lines in cell 2" - n.array([1, 2, 3]) ::: cell 3 - n.array([4, 5, 6]) ::: cell 2 3 + np.array([1, 2, 3]) ::: cell 3 1 + np.array([4, 5, 6]) ``` Note the cell 2 deletion, cell 3 deletion, cell 2 insertion, cell 3 insertion pattern, instead of a combined version like this: ``` ::: cell 2 1 | def bar(): 2 | "Context lines in cell 2" - n.array([1, 2, 3]) 3 + np.array([1, 2, 3]) ::: cell 3 - n.array([4, 5, 6]) 1 + np.array([4, 5, 6]) ``` Test Plan -- Existing notebook tests
This is a good first step but is obviously incomplete because we still render context lines from other cells, which leads to repeated line numbers in the output.
last thing to do is remove the redundant headers if there are no changes
this isn't needed anymore because it's impossible for edits to span cells in this implementation
note that we can't just check ouput.is_empty(), like I tried initially. if an edit fires and deletes all of the input, output may be empty while we still need a diff
659f3d8 to
cd39932
Compare
|
I've updated the output formatting, and I believe also resolved Micha's other suggestions. As I mentioned here it seems a bit tricky to get access to a I didn't extract any code as suggested here, but I think most of the overhead for non-notebooks, and for notebooks too really, should be resolved by getting rid of the |
MichaReiser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
| | | ||
| help: Remove unused import: `os` | ||
| ℹ Safe fix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not something we have to change as part of this PR. But I only just realized that our fix infrastructure always assumes that all changes belong to the file from the primary span.
How would the rendered diagnostic look like if we had a sub-diagnostic with a span pointing to a different file? I don't think it's something we have to solve now but it might be worth documenting if the rendered output looks confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened an issue to track this here: #20080.
## Summary
As noted in a code TODO, our `Diff` rendering code previously didn't
have any
special handling for notebooks. This was particularly obvious when the
diffs
were rendered right next to the corresponding diagnostic because the
diagnostic
used cell-based line numbers, while the diff was still using line
numbers from
the concatenated source. This PR updates the diff rendering to handle
notebooks
too.
The main improvements shown in the example below are:
- Line numbers are now remapped to be relative to their cell
- Context lines from other cells are suppressed
```
error[unused-import][*]: `math` imported but unused
--> notebook.ipynb:cell 2:2:8
|
1 | # cell 2
2 | import math
| ^^^^
3 |
4 | print('hello world')
|
help: Remove unused import: `math`
ℹ Safe fix
1 1 | # cell 2
2 |-import math
3 2 |
4 3 | print('hello world')
```
I tried a few different approaches here before finally just splitting
the notebook into separate text ranges by cell and diffing each one
separately. It seems to work and passes all of our tests, but I don't
know if it's actually enforced anywhere that a single edit doesn't span
cells. Such an edit would silently be dropped right now since it would
fail the `contains_range` check. I also feel like I may have overlooked
an existing way to partition a file into cells like this.
## Test Plan
Existing notebook tests, plus a new one in `ruff_db`
Summary
As noted in a code TODO, our
Diffrendering code previously didn't have anyspecial handling for notebooks. This was particularly obvious when the diffs
were rendered right next to the corresponding diagnostic because the diagnostic
used cell-based line numbers, while the diff was still using line numbers from
the concatenated source. This PR updates the diff rendering to handle notebooks
too.
The main improvements shown in the example below are:
I tried a few different approaches here before finally just splitting the notebook into separate text ranges by cell and diffing each one separately. It seems to work and passes all of our tests, but I don't know if it's actually enforced anywhere that a single edit doesn't span cells. Such an edit would silently be dropped right now since it would fail the
contains_rangecheck. I also feel like I may have overlooked an existing way to partition a file into cells like this.Test Plan
Existing notebook tests, plus a new one in
ruff_db