Description
Discussed in #11038
Originally posted by etienneschalk December 19, 2025
expand_dims creates object dtype for string coordinates instead of inferring string dtype
Summary
When creating string coordinates, expand_dims produces object dtype while other methods (DataArray constructor, assign_coords) correctly infer Unicode string dtype (<U*). This inconsistency is hard to detect and can cause subtle bugs.
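`<U*` and `object` are NumPy dtype labels. The NumPy-only sketch below does not call xarray. It decodes both labels, shows NumPy inferring `<U2` from a list of Python strings, and converts an object array back to Unicode in the same way as the workaround in this issue. Whether the DataArray constructor relies on exactly this inference internally is an assumption. The sketch only shows NumPy's own behaviour:

```python
import numpy as np

# "<U2" reads as: byte order "<" (little-endian), kind "U" (Unicode), 2 characters.
unicode_dt = np.dtype("<U2")
print(unicode_dt.kind)      # U
print(unicode_dt.itemsize)  # 8 (NumPy stores 4 bytes per character)

# "object" arrays hold references to arbitrary Python objects.
object_dt = np.dtype(object)
print(object_dt.kind)       # O

# From a plain list of str, NumPy infers the narrowest Unicode dtype that fits.
labels = ["b1", "b2", "b3"]
inferred = np.asarray(labels)
print(inferred.dtype.kind, inferred.dtype.itemsize)  # U 8

# An object array of str can be converted back; with dtype=str NumPy picks the width.
restored = np.array(np.asarray(labels, dtype=object), dtype=str)
print(restored.dtype.kind, restored.dtype.itemsize)  # U 8
```

Checking `dtype.kind == "U"` in a test is one way to catch this regression without comparing printed dtype strings.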
Minimal reproducible example
import xarray as xr
import numpy as np
# Method 1: DataArray constructor → correct <U2 dtype
da1 = xr.DataArray([10, 20, 30], dims=["band"], coords={"band": ["b1", "b2", "b3"]})
print(da1.coords["band"].dtype) # <U2 ✓
# Method 2: assign_coords → correct <U2 dtype
da2 = xr.DataArray([10, 20, 30], dims=["band"]).assign_coords(band=["b1", "b2", "b3"])
print(da2.coords["band"].dtype) # <U2 ✓
# Method 3: expand_dims → unexpected object dtype
da3 = xr.DataArray(10).expand_dims({"band": ["b1", "b2", "b3"]})
print(da3.coords["band"].dtype) # object ✗
Output:
<U2
<U2
object
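The three outputs differ only in dtype, not in values. The NumPy-only sketch below does not call xarray. It shows why that difference is easy to miss and why it spreads: elementwise equality ignores dtype, and NumPy's promotion rules turn any Unicode-plus-object combination into object, in either order. That xarray's comparisons and `concat` follow these same rules is an assumption consistent with the behaviour reported here. The sketch does not verify it:

```python
import numpy as np

# The two storage types from the output above, holding the same labels.
as_unicode = np.array(["b1", "b2", "b3"])               # <U2, like da1/da2
as_object = np.array(["b1", "b2", "b3"], dtype=object)  # object, like da3

# Value-based comparisons succeed, so they cannot reveal the mismatch.
print(np.array_equal(as_unicode, as_object))  # True
print((as_unicode == as_object).tolist())     # [True, True, True]
print(as_unicode.dtype == as_object.dtype)    # False

# Promotion: Unicode combined with object yields object, whatever the order.
print(np.result_type(as_unicode.dtype, as_object.dtype))  # object
print(np.result_type(as_object.dtype, as_unicode.dtype))  # object
print(np.concatenate([as_object, as_unicode]).dtype)      # object
```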
Why this is problematic
1. The mismatch is invisible
Comparing the band coordinates with equals() or identical() returns True, hiding the dtype difference:
da1.coords["band"].identical(da3.coords["band"]) # True, but dtypes differ!
Even element-wise comparison passes:
da1.coords["band"].values == da3.coords["band"].values # [True, True, True]
Only explicit dtype inspection reveals the issue:
print(da1.coords["band"].dtype) # <U2
print(da3.coords["band"].dtype) # object
2. object dtype propagates through concat
When concatenating, any object dtype "infects" the result:
da_string = xr.DataArray([1, 2, 3], dims=["x"], coords={"x": ["a", "b", "c"]}) # <U1
da_object = xr.DataArray(10).expand_dims({"x": ["d", "e", "f"]}) # object
# Concat string then object
result = xr.concat([da_string, da_object], dim="x")
print(result.coords["x"].dtype) # object: string dtype is lost!
# Concat object then string
result = xr.concat([da_object, da_string], dim="x")
print(result.coords["x"].dtype) # object: string dtype is lost!
Expected behavior
expand_dims should infer a string dtype from string inputs, consistent with the DataArray constructor and assign_coords:
da = xr.DataArray(10).expand_dims({"band": ["b1", "b2", "b3"]})
print(da.coords["band"].dtype) # Expected: <U2, Actual: object
Current workaround
Explicitly reassign coordinates after expand_dims:
da = xr.DataArray(10).expand_dims({"band": ["b1", "b2", "b3"]})
da = da.assign_coords(band=np.array(da.coords["band"].values, dtype=str))
print(da.coords["band"].dtype) # <U2
Environment
Version info
INSTALLED VERSIONS
------------------
commit: None
python: 3.13.2 (main, Mar 17 2025, 21:02:54) [Clang 20.1.0 ]
python-bits: 64
OS: Linux
OS-release: 5.15.0-139-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: 4.9.3
xarray: 2025.9.1
pandas: 2.3.3
numpy: 2.3.4
scipy: 1.16.3
netCDF4: 1.7.3
pydap: None
h5netcdf: 1.7.3
h5py: 3.15.1
zarr: None
cftime: 1.6.5
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.12.1
distributed: None
matplotlib: 3.10.7
cartopy: 0.24.1
seaborn: None
numbagg: None
fsspec: 2025.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 80.9.0
pip: None
conda: None
pytest: 8.4.2
mypy: 1.18.2
IPython: 9.7.0
sphinx: None