Description
Problem
xarray.open_mfdataset accepts a string with wildcards, and then uses this janky bit of fsspec code to glob with it.
But it's pretty fragile - in particular it confusingly raises if you try to use glob syntax with an s3 URL without using the xarray zarr backend.
VirtualiZarr currently imports that private internal to do the same kind of globbing, but VirtualiZarr doesn't even have backends in the same way, which is why I attempted to improve the situation upstream (see pydata/xarray#9930).
Solution
However, I realize now that a better way to improve xarray upstream might be to use obstore and obspec instead of fsspec, and to build a robust internal utility in xarray that doesn't raise a random exception for only one xarray backend and that virtualizarr can safely import.
Therefore I think we should:
- vendor those internals into virtualizarr instead of importing them (soon because I think globbing remote urls from open_virtual_mfdataset is broken right now because of that exception),
- iterate and improve them using obstore and obspec,
- eventually push the changes upstream so that xarray no longer needs fsspec for that.
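
To make the idea concrete, here is a rough sketch of what a backend-agnostic glob utility could look like. This is not the existing xarray or fsspec code, and none of these names exist in any library: `split_glob`, `glob_to_regex`, `glob_keys` and the `list_prefix` callable are all hypothetical. The assumption is that `list_prefix` would eventually be backed by a store's prefix listing (e.g. via obstore), and that only `*`, `?` and `**` need supporting.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable

_WILDCARD = re.compile(r"[*?]")


def split_glob(pattern: str) -> tuple[str, str]:
    """Return (literal prefix to list under, full pattern). Hypothetical helper."""
    match = _WILDCARD.search(pattern)
    if match is None:
        # No wildcard at all: the pattern is a plain key.
        return pattern, pattern
    # Cut at the last "/" before the first wildcard so we list one "directory".
    cut = pattern.rfind("/", 0, match.start()) + 1
    return pattern[:cut], pattern


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a glob into a regex where "*" and "?" never cross a "/"."""
    parts: list[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more whole path segments
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def glob_keys(
    list_prefix: Callable[[str], Iterable[str]], pattern: str
) -> list[str]:
    """List keys under the literal prefix, then filter them with the glob.

    `list_prefix` is an assumed callable returning every key that starts
    with the given prefix; it is the only backend-specific piece.
    """
    prefix, _ = split_glob(pattern)
    regex = glob_to_regex(pattern)
    return sorted(key for key in list_prefix(prefix) if regex.match(key))


# Tiny in-memory stand-in for a real object store listing.
_KEYS = [
    "data/2020/a.nc",
    "data/2020/b.nc",
    "data/2021/a.nc",
    "data/readme.txt",
]


def fake_list_prefix(prefix: str) -> list[str]:
    return [key for key in _KEYS if key.startswith(prefix)]
```

The point of the design is that the matching logic never needs to know which backend or engine is in use, so there's no reason for it to raise for one backend only. Calling `glob_keys(fake_list_prefix, "data/*/a.nc")` picks out the two `a.nc` keys.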
cc the usual suspects @maxrjones @sharkinsspatial @kylebarron
EDIT: related to #568 too.