expand_dims creates object dtype for string coordinates instead of inferring string dtype #11061

@dcherian

Discussed in #11038

Originally posted by etienneschalk December 19, 2025

Summary

When creating string coordinates, expand_dims produces object dtype while other methods (DataArray constructor, assign_coords) correctly infer Unicode string dtype (<U*). This inconsistency is hard to detect and can cause subtle bugs.

Minimal reproducible example

import xarray as xr
import numpy as np

# Method 1: DataArray constructor → correct <U2 dtype
da1 = xr.DataArray([10, 20, 30], dims=["band"], coords={"band": ["b1", "b2", "b3"]})
print(da1.coords["band"].dtype)  # <U2 ✓

# Method 2: assign_coords → correct <U2 dtype
da2 = xr.DataArray([10, 20, 30], dims=["band"]).assign_coords(band=["b1", "b2", "b3"])
print(da2.coords["band"].dtype)  # <U2 ✓

# Method 3: expand_dims → unexpected object dtype
da3 = xr.DataArray(10).expand_dims({"band": ["b1", "b2", "b3"]})
print(da3.coords["band"].dtype)  # object ✗

Output:

<U2
<U2
object

Why this is problematic

1. The mismatch is invisible

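Comparing the coordinates with equals() or identical() returns True, hiding the dtype difference (da1 and da3 themselves hold different data, so compare the coordinates directly):

da1.coords["band"].identical(da3.coords["band"])  # True, but dtypes differ!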

Even element-wise comparison passes:

da1.coords["band"].values == da3.coords["band"].values  # [True, True, True]

Only explicit dtype inspection reveals the issue:

print(da1.coords["band"].dtype)  # <U2
print(da3.coords["band"].dtype)  # object
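
Since only dtype inspection exposes the problem, a small check can make it visible across all coordinates at once. The following is a minimal sketch; `object_dtype_coords` is a hypothetical helper name, not part of xarray, and it only reports coordinates whose dtype is `object` without checking what they contain.

```python
import numpy as np
import xarray as xr


def object_dtype_coords(obj):
    """Return the names of coordinates stored with object dtype.

    Hypothetical helper (not an xarray API); works on anything with a
    `.coords` mapping, such as a DataArray or Dataset.
    """
    return [
        name
        for name, coord in obj.coords.items()
        if coord.dtype == np.dtype(object)
    ]


da1 = xr.DataArray([10, 20, 30], dims=["band"], coords={"band": ["b1", "b2", "b3"]})
da3 = xr.DataArray(10).expand_dims({"band": ["b1", "b2", "b3"]})

print(object_dtype_coords(da1))  # no object coords for the constructor case
print(object_dtype_coords(da3))  # lists "band" on versions affected by this issue
```

On the environment listed below, the second call would be expected to report the band coordinate; on a version where expand_dims infers a string dtype, both lists should be empty.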

2. object dtype propagates through concat

When concatenating, any object dtype "infects" the result:

da_string = xr.DataArray([1, 2, 3], dims=["x"], coords={"x": ["a", "b", "c"]})  # <U1
da_object = xr.DataArray(10).expand_dims({"x": ["d", "e", "f"]})  # object

# Concat string then object 
result = xr.concat([da_string, da_object], dim="x")
print(result.coords["x"].dtype)  # object — string dtype is lost!

# Concat object then string
result = xr.concat([da_object, da_string], dim="x")
print(result.coords["x"].dtype)  # object — string dtype is lost!

Expected behavior

expand_dims should infer a string dtype from string inputs, consistent with the DataArray constructor and assign_coords:

da = xr.DataArray(10).expand_dims({"band": ["b1", "b2", "b3"]})
print(da.coords["band"].dtype)  # Expected: <U2, Actual: object

Current workaround

Explicitly reassign coordinates after expand_dims:

da = xr.DataArray(10).expand_dims({"band": ["b1", "b2", "b3"]})
da = da.assign_coords(band=np.array(da.coords["band"].values, dtype=str))
print(da.coords["band"].dtype)  # <U2
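
The same workaround can be applied to every affected coordinate before a concat. This is a sketch under assumptions: `cast_object_coords_to_str` is a hypothetical helper, not part of xarray, and it only casts coordinates whose values are all Python strings, leaving mixed-type object coordinates alone.

```python
import numpy as np
import xarray as xr


def cast_object_coords_to_str(obj):
    """Cast object-dtype coordinates holding only str values to a Unicode dtype.

    Hypothetical helper (not an xarray API). Coordinates with any non-str
    element, or with a non-object dtype, are left untouched.
    """
    updates = {}
    for name, coord in obj.coords.items():
        values = np.asarray(coord.values)
        if values.dtype != np.dtype(object):
            continue
        if all(isinstance(v, str) for v in values.ravel()):
            # Reassign with explicit dims so non-dimension coords also work.
            updates[name] = (coord.dims, values.astype(str))
    return obj.assign_coords(updates) if updates else obj


da_string = xr.DataArray([1, 2, 3], dims=["x"], coords={"x": ["a", "b", "c"]})
da_object = xr.DataArray(10).expand_dims({"x": ["d", "e", "f"]})

fixed = cast_object_coords_to_str(da_object)
result = xr.concat([da_string, fixed], dim="x")
print(fixed.coords["x"].dtype, result.coords["x"].dtype)
```

Normalizing before the concat keeps the string dtype on both inputs, so the object dtype has nothing to propagate from.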

Environment

Version info
INSTALLED VERSIONS
------------------
commit: None
python: 3.13.2 (main, Mar 17 2025, 21:02:54) [Clang 20.1.0 ]
python-bits: 64
OS: Linux
OS-release: 5.15.0-139-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: 4.9.3

xarray: 2025.9.1
pandas: 2.3.3
numpy: 2.3.4
scipy: 1.16.3
netCDF4: 1.7.3
pydap: None
h5netcdf: 1.7.3
h5py: 3.15.1
zarr: None
cftime: 1.6.5
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.12.1
distributed: None
matplotlib: 3.10.7
cartopy: 0.24.1
seaborn: None
numbagg: None
fsspec: 2025.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 80.9.0
pip: None
conda: None
pytest: 8.4.2
mypy: 1.18.2
IPython: 9.7.0
sphinx: None
