ENH: for multiscale, use dask and auto-rechunk #45
Conversation
Should the plugin just reduce the dask setting and let it auto-choose then? This still isn't great, because it keeps a superwide shape, much wider than any display. This is better than using native chunks, but still not great.
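A minimal sketch of that idea, lowering dask's array.chunk-size config and letting chunks='auto' pick the shape (the store path here is a hypothetical placeholder, and as noted above, auto-chunking still tends to follow the superwide native layout):

```python
import dask
import dask.array as da
import zarr

# Lower dask's auto-chunking target from its 128 MiB default to 1 MiB,
# then let chunks='auto' choose chunk shapes within that budget.
with dask.config.set({"array.chunk-size": "1MiB"}):
    level = zarr.open("level0.zarr", mode="r")  # hypothetical store for one pyramid level
    arr = da.from_array(level, chunks="auto")   # honors the 1 MiB target set above

print(arr.chunksize)
```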
After some discussion on zulip: https://napari.zulipchat.com/#narrow/channel/212875-general/topic/lazy.20reading.20tiff-based.20WSI/near/518694288
For CMU-1 with ...
One issue is that dask array performance for getting a single chunk out is quite bad when the overall task graph has many (say millions of) chunks; AFAIK that issue remains unfixed. So I would probably err on the side of slightly bigger chunks, say 4 MiB. How does that work for you performance-wise @psobolewskiPhD? Otherwise I think it's a good idea to merge and release this quickly.
I'll try to test ASAP, but I don't have a good idea for a real test other than eyeballing.
I like your eyeball tests! You have a good eye! 😊
(At any rate, I'm happy for you to self-merge as-is to get you unblocked. We can always iterate!)
(And you could turn my comment into an issue. 😊)
I only have the smaller sample svs and ndpi. 4 MiB ... The question is how likely is a 3D multiscale? Because my understanding is that for 3D you want smaller chunks. The full-size ones are ~60 GiB uncompressed, so at 1 MiB that's 60K chunks, and at 3 MiB it would be 20K chunks...
Anything above 200ish GB I start to consider 'bigger than we can expect to work smoothly out of the box', so even with any of the chunk sizes mentioned it would still be at most 200k chunks (with 1 MiB), so I think whatever y'all deem reasonable would find me content.
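A quick back-of-the-envelope check of those chunk counts (plain arithmetic; the 60 GiB and 200 GB figures come from the comments above, and treating "200ish GB" as 200 GiB is my approximation):

```python
GiB = 1024**3
MiB = 1024**2

cases = [
    (60 * GiB, 1 * MiB),   # ~60 GiB uncompressed NDPI, 1 MiB chunks
    (60 * GiB, 3 * MiB),   # same image, 3 MiB chunks
    (200 * GiB, 1 * MiB),  # ~200 GB "out of the box" threshold, 1 MiB chunks
]
for total, chunk in cases:
    print(f"{total // GiB} GiB / {chunk // MiB} MiB chunks -> ~{total // chunk:,} chunks")

# 60 GiB / 1 MiB chunks -> ~61,440 chunks    (the ~60K above)
# 60 GiB / 3 MiB chunks -> ~20,480 chunks    (the ~20K above)
# 200 GiB / 1 MiB chunks -> ~204,800 chunks  (the ~200k above)
```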
I think this is the direction light sheet is going, but I have very little experience.
I would assume light sheets are not using TIFF variants, but something like zarr or n5? I'm tempted to leave this at 1 MiB for now, which was recommended by multiple people much smarter than me. And we can always adjust it upwards if the need arises. This might all become moot if we want to move to a zarr v3 world.
Agreed! Thanks for all your work on this @psobolewskiPhD! I'm on my phone, but I'll try to remember to merge and release in the morning.
Making this a draft until after #48 is merged.
Ok, we're going to need this. It's probably at least partially related to: :freeze:
With wrapping the arrays in a dask array, it's now pretty alright with zarr 3. Still worse than the equivalent with zarr 2, but the user experience is passable for sure.
@psobolewskiPhD I'm trying to work on these zarr 3 performance issues. Just so I understand, the simplest reproducer here is to download a large tiff from openslide and open it up with the napari-tiff main branch prior to this PR?
Hi @ianhi, thanks for reaching out! You may also want to look at the reproducers from cgohlke here:
Implementing this is a workaround for:
Of course, merging back doesn't work: |
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff            @@
##            main      #45     +/-  ##
========================================
+ Coverage  79.95%   80.00%   +0.04%
========================================
  Files          8        8
  Lines        459      460       +1
========================================
+ Hits         367      368       +1
  Misses        92       92

View full report in Codecov by Sentry.
Bumping this. I did some tests with the latest napari, tifffile, zarr, and dask and still get the same behaviors, where using dask with 1 MiB chunks is massively more performant for the very large NDPI I have access to, with the "pathological" native chunks. Oddly, just doing some ...
Currently the plugin uses the zarr tifffile backend to read multiscale tiff variants, which include ndpi, svs, and other whole-slide image formats. (Note: this is only supported with zarr<3.)
In this PR, I wrap the arrays with dask, which has two primary advantages:
1. automatic rechunking with chunks='auto'
2. rechunking to a target size with chunks='1 MiB', i.e. a 1 MiB chunk mass, which is a major performance boost for ndpi images from Hamamatsu slide scanners, which have native chunks of (8, 3840, 3). You can download some samples at: https://openslide.cs.cmu.edu/download/openslide-testdata/Hamamatsu/
(I've also verified this with scans produced by JAX microscopy core.)
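A minimal sketch of the wrapping described here, assuming zarr<3 and tifffile's zarr store interface; the file name is an example from the openslide test data and the actual napari-tiff code may differ:

```python
import dask.array as da
import tifffile
import zarr

# Open the multiscale TIFF as a zarr group (one array per pyramid level).
store = tifffile.imread("CMU-1.ndpi", aszarr=True)  # example file; any multiscale ndpi/svs works
group = zarr.open(store, mode="r")

# Wrap each level in a dask array, rechunked to roughly 1 MiB chunks
# instead of the pathological native (8, 3840, 3) NDPI chunks.
pyramid = [
    da.from_array(group[key], chunks="1 MiB")
    for key in sorted(group.array_keys(), key=int)
]
print([(level.shape, level.chunksize) for level in pyramid])
```

The resulting list of dask arrays can then be handed to napari as a multiscale layer.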
For an svs (native chunks of 256, 256, 3), chunks='1 MiB' ends up as 512, 512, 3 and is nice and performant. The ndpis end up as either chunksize=(32, 8192, 3) or chunksize=(24, 11520, 3) and are also much improved.
Nota bene: dask by default aims for 128 MiB chunks, which is not great for visualization, hence specifying 1 MiB. We may want to make this configurable in napari: "Preferences: consider adding dask config array.chunk-size alongside the cache size" (napari#7924).
Note: with this PR, for the big ndpi I get more similar, though still slightly worse, performance than with QuPath. There is a stutter at the resolution levels (without ASYNC), and with ASYNC it takes a split second to load the 0 level.
If I manually set the chunks, e.g. chunks=(2048, 2048, 3) or chunks=(1024, 1024, 3), then performance is a bit better for the ndpi and with ASYNC pretty much identical to QuPath. But I think the auto-chunks are fine for the general use case.
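For comparison, the manual chunking mentioned above is just a different chunks= argument when wrapping a level (same assumptions as the earlier sketch; "0" is the full-resolution level in tifffile's multiscale groups):

```python
import dask.array as da
import tifffile
import zarr

group = zarr.open(tifffile.imread("CMU-1.ndpi", aszarr=True), mode="r")  # example file
level0 = da.from_array(group["0"], chunks=(2048, 2048, 3))  # or (1024, 1024, 3)
print(level0.chunksize)
```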