Skip to content

Conversation

@ntBre
Copy link
Contributor

@ntBre ntBre commented Aug 21, 2025

Summary

As noted in a code TODO, our Diff rendering code previously didn't have any
special handling for notebooks. This was particularly obvious when the diffs
were rendered right next to the corresponding diagnostic because the diagnostic
used cell-based line numbers, while the diff was still using line numbers from
the concatenated source. This PR updates the diff rendering to handle notebooks
too.

The main improvements shown in the example below are:

  • Line numbers are now remapped to be relative to their cell
  • Context lines from other cells are suppressed
error[unused-import][*]: `math` imported but unused                             
 --> notebook.ipynb:cell 2:2:8                                                  
  |                                                                             
1 | # cell 2                                                                    
2 | import math                                                                 
  |        ^^^^                                                                 
3 |                                                                             
4 | print('hello world')                                                        
  |                                                                             
help: Remove unused import: `math`                                              
                                                                                
ℹ Safe fix                                                                      
1 1 | # cell 2                                                                  
2   |-import math                                                               
3 2 |                                                                           
4 3 | print('hello world')                                                      

I tried a few different approaches here before finally just splitting the notebook into separate text ranges by cell and diffing each one separately. It seems to work and passes all of our tests, but I don't know if it's actually enforced anywhere that a single edit doesn't span cells. Such an edit would silently be dropped right now since it would fail the contains_range check. I also feel like I may have overlooked an existing way to partition a file into cells like this.

Test Plan

Existing notebook tests, plus a new one in ruff_db

@ntBre ntBre added diagnostics Related to reporting of diagnostics. internal An internal refactor or improvement labels Aug 21, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Aug 21, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@dhruvmanila
Copy link
Member

It seems to work and passes all of our tests, but I don't know if it's actually enforced anywhere that a single edit doesn't span cells.

It is enforced, just not in a good way ;)

Related issue: #14445

let cell_content = transformed
.get(start.to_usize()..end.to_usize())
.unwrap_or_else(|| {
panic!(
"Transformed content out of bounds ({start:?}..{end:?}) for cell at {idx:?}"
);
});

// indicating a regular script file, all the lines will be in one "cell" under the `None`
// key.
let mut cells: BTreeMap<Option<OneIndexed>, TextRange> = BTreeMap::default();
for line in source_code.line_starts() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts here:

  • Iterating over all lines seems especially unfortunate if this isn't a notebook. Can't we just push a single entry if the notebook index is absent that spans the entire source code range instead of iterating over the lines?
  • Are there cases where the notebook cells don't appear in source order? I'm asking because we don't need a BTreeMap if it's guaranteed that we iterate over the cells in order anyway
  • This data structure seems very similar to Notebook::cell_offsets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks, Notebook::cell_offsets does seem like what I need. I looked into adding a FileResolver method to get a CellOffsets, but it only works, or at least works easily, for ty files. Ruff's FileResolver only has access to a NotebookIndex, not the Notebook itself. Should I try to pass that down through Diagnostics, and EmitterContext?

I don't think we want to clone a whole Notebook, so I'm thinking of adding a new struct wrapping only the index and the cell offsets to store in the (renamed) Diagnostics::notebook_indexes field.

let notebook_indexes = if let SourceKind::IpyNotebook(notebook) = transformed {
FxHashMap::from_iter([(path.to_string_lossy().to_string(), notebook.into_index())])
} else {
FxHashMap::default()
};

I guess we could also considering moving the CellOffsets into the NotebookIndex, making it the wrapper struct. They're mostly accessed via Notebook::cell_offsets anyway.

But yes, we definitely don't need to iterate the lines for a normal file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea, I think we can reconstruct something more efficient than looping over all the lines by only looking at the unique entries in NotebookIndex::row_to_cell. That's probably the easiest thing to do.

.last()
.map(|op| (op.old_range().start, op.new_range().start))
.unwrap_or_default();
for range in cells.values() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could reduce the overhead for non notebooks by extracing the shared logic for rendering fixes into a separate method and:

  • We call that method directly if source isn't a notebook (in which case we can skip all/most of the operations?)
  • For notebooks, do whatever mapping is necessary, call that method for each cell

5 |- bar1 = !pwd
5 |+ !pwd
6 6 | bar2: str = !pwd
1 1 | def f():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to show the cell number somewhere because I don't think it's guaranteed that the diagnostic cell is the same as the one for fixes.

I'm also curious on how the rendering looks if a diagnostic has fixes across multiple cells. We should add a test for this.

The UP049 example that you used to evaluate the fixes rendering could be a good example here too, just extend it so that the fixes span multiple cells.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's a good point, in all of these cases the cell number matched. What do you think about the header lines I removed in this commit: 80f9b55? I was trying to match the horizontal line we draw for different sections, but I ended up ripping them out because I thought they were too noisy, especially for the single-cell cases in our tests. Another option I considered was something like the ::: in our diagnostics:

error[unused-import]: `os` imported but unused
 --> notebook.ipynb:cell 1:2:8                
  |                                           
1 | # cell 1                                  
2 | import os                                 
  |        ^^                                 
  |                                           
 ::: notebook.ipynb:cell 2:2:8                
  |                                           
1 | # cell 2                                  
2 | import math                               
  |        ---- second cell                   
3 |                                           
4 | print('hello world')                      
  |                                           
help: Remove unused import: `os`              

I'll work on a test for these. Unfortunately I don't think we can use UP049 because the PEP-695 generics are restricted to a single class or function body, which I don't think can span cells. I'm sure I can find another rule, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like ::: notebook.ipynb:cell 2:2:8 except that I would strip everything except the cell number because it feels redundant

--> notebook.ipynb:cell 1:2:8                
  |                                           
1 | # cell 1                                  
2 | import os                                 
  |        ^^                                 
  |                                           
 ::: cell 2
  |                                           
1 | # cell 2                                  
2 | import math                               
  |        ---- second cell                   
3 |                                           
4 | print('hello world')                      
  |                                           
help: Remove unused import: `os`              

ntBre added 23 commits August 22, 2025 18:12
Summary
--

As noted in a code TODO, our `Diff` rendering code previously didn't have any
special handling for notebooks. This was particularly obvious when the diffs
were rendered right next to the corresponding diagnostic because the diagnostic
used cell-based line numbers, while the diff was still using line numbers from
the concatenated source. This PR updates the diff rendering to handle notebooks
too.

The first commit is a pure refactor, moving our rendering to a model more like
the `DisplayList` type in `annotate-snippets`. Collecting a sequence of line
structs to display later sets up for the second commit, which groups the diff
operations produced by `similar` by notebook cell. This prevents output like
this, which I was running into earlier:

```
 ::: cell 2
1 | def bar():
2 |     "Context lines in cell 2"
  -     n.array([1, 2, 3])
 ::: cell 3
  - n.array([4, 5, 6])
 ::: cell 2
3 +     np.array([1, 2, 3])
 ::: cell 3
1 + np.array([4, 5, 6])
```

Note the cell 2 deletion, cell 3 deletion, cell 2 insertion, cell 3 insertion
pattern, instead of a combined version like this:

```
 ::: cell 2
1 | def bar():
2 |     "Context lines in cell 2"
  -     n.array([1, 2, 3])
3 +     np.array([1, 2, 3])
 ::: cell 3
  - n.array([4, 5, 6])
1 + np.array([4, 5, 6])
```

Test Plan
--

Existing notebook tests
This is a good first step but is obviously incomplete because we still render
context lines from other cells, which leads to repeated line numbers in the
output.
last thing to do is remove the redundant headers if there are no changes
this isn't needed anymore because it's impossible for edits to span cells in
this implementation
note that we can't just check ouput.is_empty(), like I tried initially. if an
edit fires and deletes all of the input, output may be empty while we still need
a diff
@ntBre ntBre force-pushed the brent/notebook-diff branch from 659f3d8 to cd39932 Compare August 22, 2025 22:44
@ntBre
Copy link
Contributor Author

ntBre commented Aug 22, 2025

I've updated the output formatting, and I believe also resolved Micha's other suggestions. As I mentioned here it seems a bit tricky to get access to a Notebook to call Notebook::cell_offsets directly, but I think I've reconstructed something similar here.

I didn't extract any code as suggested here, but I think most of the overhead for non-notebooks, and for notebooks too really, should be resolved by getting rid of the BTreeMap and the loop over all of the source lines.

@ntBre ntBre marked this pull request as ready for review August 22, 2025 23:04
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

|
help: Remove unused import: `os`
ℹ Safe fix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something we have to change as part of this PR. But I only just realized that our fix infrastructure always assumes that all changes belong to the file from the primary span.

How would the rendered diagnostic look like if we had a sub-diagnostic with a span pointing to a different file? I don't think it's something we have to solve now but it might be worth documenting if the rendered output looks confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue to track this here: #20080.

@ntBre ntBre merged commit d0bcf56 into main Aug 25, 2025
35 checks passed
@ntBre ntBre deleted the brent/notebook-diff branch August 25, 2025 13:20
second-ed pushed a commit to second-ed/ruff that referenced this pull request Sep 9, 2025
## Summary

As noted in a code TODO, our `Diff` rendering code previously didn't
have any
special handling for notebooks. This was particularly obvious when the
diffs
were rendered right next to the corresponding diagnostic because the
diagnostic
used cell-based line numbers, while the diff was still using line
numbers from
the concatenated source. This PR updates the diff rendering to handle
notebooks
too.

The main improvements shown in the example below are:

- Line numbers are now remapped to be relative to their cell
- Context lines from other cells are suppressed

```
error[unused-import][*]: `math` imported but unused                             
 --> notebook.ipynb:cell 2:2:8                                                  
  |                                                                             
1 | # cell 2                                                                    
2 | import math                                                                 
  |        ^^^^                                                                 
3 |                                                                             
4 | print('hello world')                                                        
  |                                                                             
help: Remove unused import: `math`                                              
                                                                                
ℹ Safe fix                                                                      
1 1 | # cell 2                                                                  
2   |-import math                                                               
3 2 |                                                                           
4 3 | print('hello world')                                                      
```

I tried a few different approaches here before finally just splitting
the notebook into separate text ranges by cell and diffing each one
separately. It seems to work and passes all of our tests, but I don't
know if it's actually enforced anywhere that a single edit doesn't span
cells. Such an edit would silently be dropped right now since it would
fail the `contains_range` check. I also feel like I may have overlooked
an existing way to partition a file into cells like this.

## Test Plan

Existing notebook tests, plus a new one in `ruff_db`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

diagnostics Related to reporting of diagnostics. internal An internal refactor or improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants