Improve diff rendering for notebooks #20036

ntBre · 2025-08-21T23:39:49Z

Summary

As noted in a code TODO, our Diff rendering code previously didn't have any
special handling for notebooks. This was particularly obvious when the diffs
were rendered right next to the corresponding diagnostic because the diagnostic
used cell-based line numbers, while the diff was still using line numbers from
the concatenated source. This PR updates the diff rendering to handle notebooks
too.

The main improvements shown in the example below are:

Line numbers are now remapped to be relative to their cell
Context lines from other cells are suppressed

error[unused-import][*]: `math` imported but unused                             
 --> notebook.ipynb:cell 2:2:8                                                  
  |                                                                             
1 | # cell 2                                                                    
2 | import math                                                                 
  |        ^^^^                                                                 
3 |                                                                             
4 | print('hello world')                                                        
  |                                                                             
help: Remove unused import: `math`                                              
                                                                                
ℹ Safe fix                                                                      
1 1 | # cell 2                                                                  
2   |-import math                                                               
3 2 |                                                                           
4 3 | print('hello world')

I tried a few different approaches here before finally just splitting the notebook into separate text ranges by cell and diffing each one separately. It seems to work and passes all of our tests, but I don't know if it's actually enforced anywhere that a single edit doesn't span cells. Such an edit would silently be dropped right now since it would fail the contains_range check. I also feel like I may have overlooked an existing way to partition a file into cells like this.

Test Plan

Existing notebook tests, plus a new one in ruff_db

github-actions · 2025-08-21T23:51:14Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

dhruvmanila · 2025-08-22T05:02:46Z

It seems to work and passes all of our tests, but I don't know if it's actually enforced anywhere that a single edit doesn't span cells.

It is enforced, just not in a good way ;)

Related issue: #14445

ruff/crates/ruff_notebook/src/notebook.rs

Lines 278 to 284 in 7a44ea6

    
           let cell_content = transformed 
        
               .get(start.to_usize()..end.to_usize()) 
        
               .unwrap_or_else(|| { 
        
                   panic!( 
        
                       "Transformed content out of bounds ({start:?}..{end:?}) for cell at {idx:?}" 
        
                   ); 
        
               });

MichaReiser · 2025-08-22T07:46:46Z

crates/ruff_db/src/diagnostic/render/full.rs

+        // indicating a regular script file, all the lines will be in one "cell" under the `None`
+        // key.
+        let mut cells: BTreeMap<Option<OneIndexed>, TextRange> = BTreeMap::default();
+        for line in source_code.line_starts() {


A few thoughts here:

Iterating over all lines seems especially unfortunate if this isn't a notebook. Can't we just push a single entry if the notebook index is absent that spans the entire source code range instead of iterating over the lines?

Are there cases where the notebook cells don't appear in source order? I'm asking because we don't need a BTreeMap if it's guaranteed that we iterate over the cells in order anyway

This data structure seems very similar to Notebook::cell_offsets.

Ah thanks, Notebook::cell_offsets does seem like what I need. I looked into adding a FileResolver method to get a CellOffsets, but it only works, or at least works easily, for ty files. Ruff's FileResolver only has access to a NotebookIndex, not the Notebook itself. Should I try to pass that down through Diagnostics, and EmitterContext?

I don't think we want to clone a whole Notebook, so I'm thinking of adding a new struct wrapping only the index and the cell offsets to store in the (renamed) Diagnostics::notebook_indexes field.

ruff/crates/ruff/src/diagnostics.rs

Lines 338 to 342 in 0b6ce1c

let notebook_indexes = if let SourceKind::IpyNotebook(notebook) = transformed {

FxHashMap::from_iter([(path.to_string_lossy().to_string(), notebook.into_index())])

} else {

FxHashMap::default()

};

I guess we could also considering moving the CellOffsets into the NotebookIndex, making it the wrapper struct. They're mostly accessed via Notebook::cell_offsets anyway.

But yes, we definitely don't need to iterate the lines for a normal file.

Another idea, I think we can reconstruct something more efficient than looping over all the lines by only looking at the unique entries in NotebookIndex::row_to_cell. That's probably the easiest thing to do.

MichaReiser · 2025-08-22T07:49:45Z

crates/ruff_db/src/diagnostic/render/full.rs

-            .last()
-            .map(|op| (op.old_range().start, op.new_range().start))
-            .unwrap_or_default();
+        for range in cells.values() {


I wonder if we could reduce the overhead for non notebooks by extracing the shared logic for rendering fixes into a separate method and:

We call that method directly if source isn't a notebook (in which case we can skip all/most of the operations?)

For notebooks, do whatever mapping is necessary, call that method for each cell

MichaReiser · 2025-08-22T07:53:54Z

crates/ruff_linter/src/snapshots/ruff_linter__linter__tests__unused_variable.snap

-5   |-    bar1 = !pwd
-  5 |+    !pwd
-6 6 |     bar2: str = !pwd
+1 1 | def f():


I think we have to show the cell number somewhere because I don't think it's guaranteed that the diagnostic cell is the same as the one for fixes.

I'm also curious on how the rendering looks if a diagnostic has fixes across multiple cells. We should add a test for this.

The UP049 example that you used to evaluate the fixes rendering could be a good example here too, just extend it so that the fixes span multiple cells.

Oh that's a good point, in all of these cases the cell number matched. What do you think about the header lines I removed in this commit: 80f9b55? I was trying to match the horizontal line we draw for different sections, but I ended up ripping them out because I thought they were too noisy, especially for the single-cell cases in our tests. Another option I considered was something like the ::: in our diagnostics:

error[unused-import]: `os` imported but unused --> notebook.ipynb:cell 1:2:8 | 1 | # cell 1 2 | import os | ^^ | ::: notebook.ipynb:cell 2:2:8 | 1 | # cell 2 2 | import math | ---- second cell 3 | 4 | print('hello world') | help: Remove unused import: `os`

I'll work on a test for these. Unfortunately I don't think we can use UP049 because the PEP-695 generics are restricted to a single class or function body, which I don't think can span cells. I'm sure I can find another rule, though.

I like ::: notebook.ipynb:cell 2:2:8 except that I would strip everything except the cell number because it feels redundant

--> notebook.ipynb:cell 1:2:8 | 1 | # cell 1 2 | import os | ^^ | ::: cell 2 | 1 | # cell 2 2 | import math | ---- second cell 3 | 4 | print('hello world') | help: Remove unused import: `os`

Summary -- As noted in a code TODO, our `Diff` rendering code previously didn't have any special handling for notebooks. This was particularly obvious when the diffs were rendered right next to the corresponding diagnostic because the diagnostic used cell-based line numbers, while the diff was still using line numbers from the concatenated source. This PR updates the diff rendering to handle notebooks too. The first commit is a pure refactor, moving our rendering to a model more like the `DisplayList` type in `annotate-snippets`. Collecting a sequence of line structs to display later sets up for the second commit, which groups the diff operations produced by `similar` by notebook cell. This prevents output like this, which I was running into earlier: ``` ::: cell 2 1 | def bar(): 2 | "Context lines in cell 2" - n.array([1, 2, 3]) ::: cell 3 - n.array([4, 5, 6]) ::: cell 2 3 + np.array([1, 2, 3]) ::: cell 3 1 + np.array([4, 5, 6]) ``` Note the cell 2 deletion, cell 3 deletion, cell 2 insertion, cell 3 insertion pattern, instead of a combined version like this: ``` ::: cell 2 1 | def bar(): 2 | "Context lines in cell 2" - n.array([1, 2, 3]) 3 + np.array([1, 2, 3]) ::: cell 3 - n.array([4, 5, 6]) 1 + np.array([4, 5, 6]) ``` Test Plan -- Existing notebook tests

This is a good first step but is obviously incomplete because we still render context lines from other cells, which leads to repeated line numbers in the output.

last thing to do is remove the redundant headers if there are no changes

this isn't needed anymore because it's impossible for edits to span cells in this implementation

note that we can't just check ouput.is_empty(), like I tried initially. if an edit fires and deletes all of the input, output may be empty while we still need a diff

ntBre · 2025-08-22T22:54:36Z

I've updated the output formatting, and I believe also resolved Micha's other suggestions. As I mentioned here it seems a bit tricky to get access to a Notebook to call Notebook::cell_offsets directly, but I think I've reconstructed something similar here.

I didn't extract any code as suggested here, but I think most of the overhead for non-notebooks, and for notebooks too really, should be resolved by getting rid of the BTreeMap and the loop over all of the source lines.

MichaReiser

Nice

MichaReiser · 2025-08-23T12:47:30Z

crates/ruff_db/src/diagnostic/render/full.rs

+          |
+        help: Remove unused import: `os`
+
+        ℹ Safe fix


Not something we have to change as part of this PR. But I only just realized that our fix infrastructure always assumes that all changes belong to the file from the primary span.

How would the rendered diagnostic look like if we had a sub-diagnostic with a span pointing to a different file? I don't think it's something we have to solve now but it might be worth documenting if the rendered output looks confusing.

Opened an issue to track this here: #20080.

## Summary As noted in a code TODO, our `Diff` rendering code previously didn't have any special handling for notebooks. This was particularly obvious when the diffs were rendered right next to the corresponding diagnostic because the diagnostic used cell-based line numbers, while the diff was still using line numbers from the concatenated source. This PR updates the diff rendering to handle notebooks too. The main improvements shown in the example below are: - Line numbers are now remapped to be relative to their cell - Context lines from other cells are suppressed ``` error[unused-import][*]: `math` imported but unused --> notebook.ipynb:cell 2:2:8 | 1 | # cell 2 2 | import math | ^^^^ 3 | 4 | print('hello world') | help: Remove unused import: `math` ℹ Safe fix 1 1 | # cell 2 2 |-import math 3 2 | 4 3 | print('hello world') ``` I tried a few different approaches here before finally just splitting the notebook into separate text ranges by cell and diffing each one separately. It seems to work and passes all of our tests, but I don't know if it's actually enforced anywhere that a single edit doesn't span cells. Such an edit would silently be dropped right now since it would fail the `contains_range` check. I also feel like I may have overlooked an existing way to partition a file into cells like this. ## Test Plan Existing notebook tests, plus a new one in `ruff_db`

ntBre added diagnostics Related to reporting of diagnostics. internal An internal refactor or improvement labels Aug 21, 2025

MichaReiser reviewed Aug 22, 2025

View reviewed changes

ntBre added 23 commits August 22, 2025 18:12

factor direct writes into a sequence of DisplayLines

a909314

remove underline style

fd9ebf2

add test showing incorrect line numbers

85ece50

add Diff::notebook_index, remap cell numbers

3411598

This is a good first step but is obviously incomplete because we still render context lines from other cells, which leads to repeated line numbers in the output.

use hlines to separate cells

1496d79

partition cells

8188c7c

last thing to do is remove the redundant headers if there are no changes

only print header in a new group and cell

661c362

still apply edits at the start of input

d301a59

allow equality on the end too

5d1148d

use contains_range

2a0eeea

don't remap again in Display after partitioning cells

0cf7cce

accept snapshots

9905e07

drop the cell header

0988390

this isn't needed anymore because it's impossible for edits to span cells in this implementation

use a BTreeMap to ensure cells are sorted

e290fd4

revert unrelated changes

6dc6541

add a test for one diagnostic with edits spanning multiple cells

4c15e3c

add a cell header, update ruff_db snapshot

95b5771

accept snapshots

b2dd9c7

don't print headings, or diff, empty cells

b50436b

note that we can't just check ouput.is_empty(), like I tried initially. if an edit fires and deletes all of the input, output may be empty while we still need a diff

avoid line iteration for regular files

5f159f3

use a structure more like Notebook::cell_offsets

042dbd8

revert unused line_starts method

cd39932

ntBre force-pushed the brent/notebook-diff branch from 659f3d8 to cd39932 Compare August 22, 2025 22:44

ntBre marked this pull request as ready for review August 22, 2025 23:04

ntBre requested review from carljm, dcreager, dhruvmanila and sharkdp as code owners August 22, 2025 23:04

ntBre removed request for carljm, dcreager and sharkdp August 22, 2025 23:04

MichaReiser approved these changes Aug 23, 2025

View reviewed changes

ntBre mentioned this pull request Aug 25, 2025

Diff rendering assumptions #20080

Open

ntBre merged commit d0bcf56 into main Aug 25, 2025
35 checks passed

ntBre deleted the brent/notebook-diff branch August 25, 2025 13:20

	let notebook_indexes = if let SourceKind::IpyNotebook(notebook) = transformed {
	FxHashMap::from_iter([(path.to_string_lossy().to_string(), notebook.into_index())])
	} else {
	FxHashMap::default()
	};

Improve diff rendering for notebooks #20036

Improve diff rendering for notebooks #20036

Uh oh!

Conversation

ntBre commented Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test Plan

Uh oh!

github-actions bot commented Aug 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Uh oh!

dhruvmanila commented Aug 22, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ntBre commented Aug 22, 2025

Uh oh!

MichaReiser left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ntBre commented Aug 21, 2025 •

edited

Loading

github-actions bot commented Aug 21, 2025 •

edited

Loading

`ruff-ecosystem` results