
2026.4.0 breaks pickling with backends.scipy_ #11323

@SoundDesignerToBe

Description

What happened?

Switching from 2026.2.0 to 2026.4.0 breaks pickling of scipy-backed netCDF datasets when they cross process boundaries in multiprocessing (concurrent.futures.ProcessPoolExecutor).

Quoting Claude:

The error is a classic pickle-identity mismatch: the instance's class qualname is xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file, but when pickle imports that path, it finds a different class object. Most often this happens when a class is created inside a function/method (so each call rebuilds it) or when scipy's netcdf_file shape changes the class definition path.
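For context: pickle stores a class by reference, recording its __module__ and __qualname__; on lookup it re-imports the module, walks the qualname dotted path, and requires the result to be the very same object as the class being pickled. A minimal sketch of that walk (the helper name is illustrative, not a pickle internal):

```python
import importlib
import pickle


def resolve_like_pickle(cls):
    """Mimic pickle's lookup: import __module__, then walk __qualname__."""
    obj = importlib.import_module(cls.__module__)
    for part in cls.__qualname__.split("."):
        obj = getattr(obj, part)
    return obj


class Outer:
    class Inner:
        pass


# The walk finds the identical class object, so the identity check passes
# and a class pickled by reference round-trips to the same object.
assert resolve_like_pickle(Outer.Inner) is Outer.Inner
assert pickle.loads(pickle.dumps(Outer.Inner)) is Outer.Inner
```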
Root cause

Different xarray versions ship different flush_only_netcdf_file class layouts:

  • env A: xarray==2026.2.0 — flush_only_netcdf_file is a module-level class at xarray/backends/scipy_.py:114. Its __qualname__ is just flush_only_netcdf_file. Pickle resolves it cleanly via the module attribute.
  • env B: xarray==2026.4.0 — the error message reveals it has been moved into a _PickleWorkaround class: xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file. Pickle's qualname walk (module →
    _PickleWorkaround → flush_only_netcdf_file) returns a class object that is not the same object as the instance's actual class — hence "it's not the same object as ... _PickleWorkaround.flush_only_netcdf_file".
    That identity mismatch is the textbook symptom of a class created inside a function/factory body rather than at module scope: each call to the factory produces a fresh class object with the same qualname, but instances created in one call can't be unpickled by a lookup that finds the class object from a different call. xarray 2026.4.0 evidently restructured the workaround in a way that makes the class identity unstable across process boundaries.
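The failure mode can be reproduced without xarray at all. The following standalone sketch (with made-up names — _Registry and Payload stand in for _PickleWorkaround and flush_only_netcdf_file) shows a factory that rebuilds a class on every call and re-registers it on a holder class, breaking pickling of instances from earlier calls:

```python
import pickle


class _Registry:
    """Stand-in for _PickleWorkaround: holds the most recent class object."""

    @classmethod
    def add_cls(cls, new_class):
        setattr(cls, new_class.__name__, new_class)
        new_class.__qualname__ = cls.__qualname__ + "." + new_class.__name__


def make_class():
    class Payload:  # a brand-new class object on every call
        pass

    _Registry.add_cls(Payload)
    return Payload


C1 = make_class()
obj = C1()          # instance of the first class object
C2 = make_class()   # second call overwrites _Registry.Payload with C2

err = None
try:
    pickle.dumps(obj)  # qualname walk finds C2, but obj's class is C1
except pickle.PicklingError as exc:
    err = exc
print(type(err).__name__)  # PicklingError: "... not the same object as ..."
```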

I had Claude compare the two releases and write the minimal reproduction posted below.

What did you expect to happen?

No pickling error.

Minimal Complete Verifiable Example

"""
Minimal reproduction: xarray's scipy backend builds `flush_only_netcdf_file`
inside `_open_scipy_netcdf()` and registers it on `_PickleWorkaround` via a
class attribute that gets overwritten on every call. After a second call, the
first instance's class is no longer reachable via its own __qualname__, so
pickle's class-identity check fails:

    _pickle.PicklingError: Can't pickle
    <class 'xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file'>:
    it's not the same object as
    xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file

Trigger pattern:
    1. open a scipy-backed Dataset from a file-like object   →  class C1 stored
       on _PickleWorkaround.flush_only_netcdf_file
    2. open another scipy-backed Dataset                     →  class C2
       overwrites the attribute (and rewrites C2.__qualname__)
    3. pickle the FIRST Dataset                              →  qualname walk
       returns C2; the instance's class is C1; PicklingError

Bisects to xarray v2026.04.0:

    git diff v2026.02.0..v2026.04.0 -- xarray/backends/scipy_.py

In v2026.02.0 `flush_only_netcdf_file` was a module-level class. In v2026.04.0
it moved inside `_open_scipy_netcdf()`:

    class _PickleWorkaround:
        flush_only_netcdf_file: type[scipy.io.netcdf_file]

        @classmethod
        def add_cls(cls, new_class):
            setattr(cls, new_class.__name__, new_class)
            new_class.__qualname__ = cls.__qualname__ + "." + new_class.__name__


    def _open_scipy_netcdf(...):
        import scipy.io

        class flush_only_netcdf_file(scipy.io.netcdf_file):  # NEW class every call
            ...

        _PickleWorkaround.add_cls(flush_only_netcdf_file)    # overwrites attr

The "PickleWorkaround" actively breaks pickling for any Dataset that survives
across a second open.

Environment where reproduced:
    Python 3.13, xarray 2026.04.0, scipy 1.16.3, numpy 2.4.4

Run:
    python pickling_error.py
"""

from __future__ import annotations

import io
import pickle
import sys
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import scipy as sp
import xarray as xr

print(f"{np.__version__=}")
print(f"{sp.__version__=}")
print(f"{xr.__version__=}")

def _make_payload() -> bytes:
    """Build a tiny NetCDF3 file in memory using the scipy backend."""
    ds = xr.Dataset(
        {"foo": (("x",), np.arange(4, dtype=np.float64))},
        coords={"x": np.arange(4)},
    )
    buf = io.BytesIO()
    ds.to_netcdf(buf, engine="scipy")
    return buf.getvalue()


def repro_two_opens() -> None:
    """Open two scipy datasets from file-like objects in the same process.

    The second open overwrites _PickleWorkaround.flush_only_netcdf_file, so
    pickling the FIRST dataset fails its class-identity check.
    """
    payload = _make_payload()

    ds1 = xr.open_dataset(io.BytesIO(payload), engine="scipy")
    ds2 = xr.open_dataset(io.BytesIO(payload), engine="scipy")  # noqa: F841

    pickle.dumps(ds1)


def _worker_two_opens(payload: bytes) -> xr.Dataset:
    """Same trigger inside a ProcessPool worker — matches the original
    failure site (`_sendback_result` pickling the worker's return value)."""
    ds1 = xr.open_dataset(io.BytesIO(payload), engine="scipy")
    _ds2 = xr.open_dataset(io.BytesIO(payload), engine="scipy")
    return ds1


def repro_subprocess() -> None:
    payload = _make_payload()
    with ProcessPoolExecutor(max_workers=1) as pool:
        pool.submit(_worker_two_opens, payload).result()


def main() -> int:
    print(f"xarray={xr.__version__}, python={sys.version.split()[0]}")

    for label, fn in [
        ("two opens, pickle.dumps the first", repro_two_opens),
        ("ProcessPoolExecutor, two opens, return the first", repro_subprocess),
    ]:
        print(f"\n--- {label} ---")
        try:
            fn()
        except Exception as e:
            print(f"FAIL: {type(e).__name__}: {e}")
        else:
            print("OK (no error)")

    return 0


if __name__ == "__main__":
    sys.exit(main())

Steps to reproduce

Run the script above: python pickling_error.py

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Original error:

"""                                                                                                                                                                                                               
  Traceback (most recent call last):                                                                                                                                                                                
    File "/usr/lib/python3.13/concurrent/futures/process.py", line 210, in _sendback_result                                                                                                                         
      result_queue.put(_ResultItem(work_id, result=result,                                                                                                                                                          
      ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                          
                                   exception=exception, exit_pid=exit_pid))                                                                                                                                         
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                         
    File "/usr/lib/python3.13/multiprocessing/queues.py", line 391, in put                                                                                                                                          
      obj = _ForkingPickler.dumps(obj)                                                                                                                                                                              
    File "/usr/lib/python3.13/multiprocessing/reduction.py", line 51, in dumps                                                                                                                                      
      cls(buf, protocol).dump(obj)                                                                                                                                                                                  
      ~~~~~~~~~~~~~~~~~~~~~~~^^^^^                                                                                                                                                                                  
  _pickle.PicklingError: Can't pickle <class 'xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file'>: it's not the same object as xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file        
  """

Reproduction example:

pickling_error.py 
np.__version__='2.4.4'
sp.__version__='1.16.3'
xr.__version__='2026.4.0'
xarray=2026.4.0, python=3.13.4

--- two opens, pickle.dumps the first ---
FAIL: PicklingError: Can't pickle <class 'xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file'>: it's not the same object as xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file

--- ProcessPoolExecutor, two opens, return the first ---
FAIL: PicklingError: Can't pickle <class 'xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file'>: it's not the same object as xarray.backends.scipy_._PickleWorkaround.flush_only_netcdf_file

Process finished with exit code 0

Anything else we need to know?

No response

Environment

Details
INSTALLED VERSIONS
------------------
commit: None
python: 3.13.4 (main, Jun  4 2025, 17:37:06) [Clang 20.1.4 ]
python-bits: 64
OS: Linux
OS-release: 6.19.14-200.fc43.x86_64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: 4.9.3
xarray: 2026.4.0
pandas: 3.0.2
numpy: 2.4.4
scipy: 1.16.3
netCDF4: 1.7.4
pydap: 3.5.9
h5netcdf: 1.8.1
h5py: None
zarr: 3.1.6
cftime: 1.6.5
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.10.8
cartopy: 0.25.0
seaborn: None
numbagg: None
fsspec: 2026.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 80.10.2
pip: None
conda: None
pytest: 9.0.3
mypy: 1.20.2
IPython: 9.12.0
sphinx: None
$uname -a
Linux fedora 6.19.14-200.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 23 17:34:07 UTC 2026 x86_64 GNU/Linux
