
Use obstore / obspec for globbing in open_virtual_mfdataset #569

@TomNicholas

Problem

xarray.open_mfdataset accepts a string with wildcards, and then uses a rather janky bit of internal fsspec-based code to glob with it.

But it's pretty fragile: in particular, it confusingly raises an exception if you try to use glob syntax with an S3 URL with any backend other than xarray's zarr backend.
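For illustration, here's a minimal sketch of the first step any store-agnostic globbing has to do: split a glob URL into a store root, a wildcard-free listing prefix, and the key pattern to match. `split_glob_url` is a hypothetical helper, not an existing xarray, VirtualiZarr, or obstore function. The prefix is cut back to whole path segments, on the assumption that the store matches listing prefixes segment by segment (I believe the Rust object_store crate that obstore wraps behaves this way).

```python
def split_glob_url(url: str) -> tuple[str, str, str]:
    """Split a glob URL into (store root, listing prefix, key pattern).

    Hypothetical helper for illustration only; not an existing xarray,
    VirtualiZarr, or obstore function.
    """
    # Split by hand rather than with urllib.parse: "?" is a glob wildcard here,
    # but urlsplit would treat it as the start of a query string.
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"expected a URL with a scheme, got {url!r}")
    bucket, _, key_pattern = rest.partition("/")

    # Position of the first glob metacharacter, or the end if there is none.
    first_wild = min(
        (i for i, ch in enumerate(key_pattern) if ch in "*?["),
        default=len(key_pattern),
    )
    # Cut back to the last "/" before the wildcard so the listing prefix is
    # made of whole path segments ("" if the wildcard is in the first segment).
    cut = max(key_pattern.rfind("/", 0, first_wild), 0)
    prefix = key_pattern[:cut]
    return f"{scheme}://{bucket}", prefix, key_pattern


root, prefix, pattern = split_glob_url("s3://my-bucket/data/2024-*/file_*.nc")
print(root, prefix, pattern)
# s3://my-bucket data data/2024-*/file_*.nc
```

The store root is what you'd use to construct a store object, and the prefix keeps the listing from walking the whole bucket. Nothing in this step depends on which xarray backend will eventually open the files.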

VirtualiZarr currently imports that private xarray internal to do the same kind of globbing, but VirtualiZarr doesn't even have backends in the same way, which is why I attempted to improve the situation upstream (see pydata/xarray#9930).

Solution

However, I now realize that a better way to improve xarray upstream might be to use obstore and obspec instead of fsspec, and to build a robust internal utility in xarray that doesn't raise an exception depending on which xarray backend you use, and which VirtualiZarr can safely import.

Therefore I think we should:

  1. vendor those internals into VirtualiZarr instead of importing them (soon, because I think globbing remote URLs in open_virtual_mfdataset is currently broken due to that exception),
  2. iterate on and improve them using obstore and obspec,
  3. eventually push the changes upstream so that xarray no longer needs fsspec for that.
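To make the "robust, backend-agnostic utility" idea a bit more concrete, here's a minimal sketch of the matching half. It assumes a hypothetical `Lister` protocol with a `list_keys(prefix)` method standing in for whatever listing capability obspec exposes (this is not obspec's real interface), plus a toy in-memory store so it runs without any network access. Matching is done one path segment at a time with `fnmatch.fnmatchcase`, so `*` never crosses a `/`, and `**` isn't supported.

```python
from collections.abc import Iterable
from fnmatch import fnmatchcase
from typing import Protocol


class Lister(Protocol):
    """Hypothetical minimal listing protocol (NOT obspec's actual interface)."""

    def list_keys(self, prefix: str) -> Iterable[str]: ...


class InMemoryLister:
    """Toy store holding a fixed set of object keys, for demonstration only."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._keys = sorted(keys)

    def list_keys(self, prefix: str) -> Iterable[str]:
        # Segment-aligned prefix: "data" matches "data/x" but not "data2/x".
        if not prefix:
            return list(self._keys)
        return [k for k in self._keys if k == prefix or k.startswith(prefix + "/")]


def _segments_match(key: str, pattern: str) -> bool:
    # Match one path segment at a time so "*" never spans a "/".
    key_parts = key.split("/")
    pattern_parts = pattern.split("/")
    if len(key_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(k, p) for k, p in zip(key_parts, pattern_parts))


def glob_keys(store: Lister, pattern: str, prefix: str = "") -> list[str]:
    """Return the sorted keys in `store` under `prefix` that match `pattern`."""
    # Nothing here depends on a particular xarray backend or store type.
    return sorted(k for k in store.list_keys(prefix) if _segments_match(k, pattern))


store = InMemoryLister(
    [
        "data/2023-12/file_c.nc",
        "data/2024-01/file_a.nc",
        "data/2024-01/sub/file_e.nc",
        "data/2024-02/file_b.nc",
        "data/2024-02/notes.txt",
        "data2/2024-01/file_d.nc",
    ]
)
print(glob_keys(store, "data/2024-*/file_*.nc", prefix="data"))
# ['data/2024-01/file_a.nc', 'data/2024-02/file_b.nc']
```

Because `glob_keys` only relies on the listing protocol, the same code path runs whichever store is behind it and whichever backend ends up reading the files, which is the property I'd want from an upstream version.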

cc the usual suspects @maxrjones @sharkinsspatial @kylebarron

EDIT: related to #568 too.


Labels: bug (Something isn't working), xarray (Requires changes to xarray upstream)
