FutureCancelledError (lost dependencies) during dask.compute with optimize_graph=True when chaining Dataset.assign #11329

@maneesh29s

What happened?

The High-Level Graph (HLG) optimization appears to resolve dependencies incorrectly when a variable (new_weight in the example below) is used both as an input to a subsequent calculation and as a replacement variable in an intermediate Dataset state.

Exception raised in the failing scenario (when run with a distributed client):

---------------------------------------------------------------------------
FutureCancelledError                      Traceback (most recent call last)
Cell In[67], line 26
     22 output_gaintable = output_gaintable.assign(gain=new_gain)
     24 # trigger computation
---> 26 dask.compute(output_gaintable, optimize_graph=True) # Fail

File /lib/python3.11/site-packages/dask/base.py:685, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    682     expr = expr.optimize()
    683     keys = list(flatten(expr.__dask_keys__()))
--> 685     results = schedule(expr, keys, **kwargs)
    687 return repack(results)

File /lib/python3.11/site-packages/distributed/client.py:2431, in Client._gather(self, futures, errors, direct, local_worker)
   2429     exception = st.exception
   2430     traceback = st.traceback
-> 2431     raise exception.with_traceback(traceback)
   2432 if errors == "skip":
   2433     bad_keys.add(key)

FutureCancelledError: finalize-hlgfinalizecompute-0b5dadc6527147a1bffc7006ce7c9329 cancelled for reason: lost dependencies.

What did you expect to happen?

The computation should complete successfully, even with optimize_graph=True.

Minimal Complete Verifiable Example

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!

import xarray as xr
xr.show_versions()

import dask
import dask.array as da

rng = da.random.default_rng(seed=1234)

# Setup small dask-backed dataset
gain = rng.random((100,), chunks=10)
weight = rng.random((100,), chunks=10)
initialtable = xr.Dataset({
    "gain": (("x"), gain),
    "weight": (("x"), weight),
})
original_chunks = initialtable.chunksizes

# Update weight
new_weight = initialtable.weight * 1.1
output_gaintable = initialtable.assign(weight=new_weight)

# Update gain, filtered based on weight
new_gain = initialtable.gain.where(new_weight > 0.5, 0.0)
output_gaintable = output_gaintable.assign(gain=new_gain)

# trigger computation, which FAILs
dask.compute(output_gaintable, optimize_graph=True)

# Other ways to compute, which PASS
dask.compute(new_weight, new_gain, optimize_graph=True)
dask.compute(output_gaintable, optimize_graph=False)[0].gain
dask.persist(output_gaintable, optimize_graph=True)[0].gain.compute()
output_gaintable.compute(optimize_graph=True).gain

Steps to reproduce

Run the script above with uv run.

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

Traceback (most recent call last):
  File "/home/maneesh/Work/SKAO/ska-sdp-instrumental-calibration/compute_bug.py", line 38, in <module>
    dask.compute(output_gaintable, optimize_graph=True)
  File "/home/maneesh/.cache/uv/environments-v2/compute-bug-884655f05503df7b/lib/python3.11/site-packages/dask/base.py", line 685, in compute
    results = schedule(expr, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/maneesh/.cache/uv/environments-v2/compute-bug-884655f05503df7b/lib/python3.11/site-packages/dask/local.py", line 191, in start_state_from_dask
    raise ValueError(
ValueError: Missing dependency ('mul-e4ad8b7f030eed6eae70b41334e6993e', 6) for dependents {'finalize-hlgfinalizecompute-c00f26a73e664e208c485a28c4ea721b'}
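
The ValueError above is raised in dask/local.py (start_state_from_dask) because a task refers to a key that is not present in the graph. As a rough illustration of what that message means, here is a minimal toy sketch of such a check. It is not dask's implementation: the function find_missing_dependencies, the dict-of-sets graph layout, and the shortened key names are all hypothetical and chosen only to mirror the shape of the error.

```python
from typing import Hashable


def find_missing_dependencies(
    dependencies: dict[Hashable, set],
) -> dict[Hashable, set]:
    """Return {missing_key: {dependents}} for keys referenced but not defined.

    `dependencies` maps each task key to the set of keys it depends on.
    This is a toy model of a task graph, not dask's internal representation.
    """
    missing: dict[Hashable, set] = {}
    for key, deps in dependencies.items():
        for dep in deps:
            if dep not in dependencies:
                missing.setdefault(dep, set()).add(key)
    return missing


# Hypothetical graph: "finalize" still refers to chunk 6 of the multiply,
# but no task with that key exists in the graph.
toy_graph = {
    ("gain", 6): set(),
    ("where", 6): {("gain", 6)},
    "finalize": {("where", 6), ("mul", 6)},
}

missing = find_missing_dependencies(toy_graph)
for dep, dependents in missing.items():
    print(f"Missing dependency {dep!r} for dependents {dependents!r}")
# prints: Missing dependency ('mul', 6) for dependents {'finalize'}
```

In the real traceback the missing key is ('mul-e4ad8b7f030eed6eae70b41334e6993e', 6), which would correspond to one chunk of the initialtable.weight * 1.1 multiplication, and the dependent is the finalize task created by dask.compute.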

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.12 (main, Apr 9 2025, 08:55:54) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-65-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: 4.9.3

xarray: 2026.4.1.dev6+g757a7d42a
pandas: 3.0.2
numpy: 2.4.4
scipy: 1.17.1
netCDF4: 1.7.4
pydap: 3.5.9
h5netcdf: 1.8.1
h5py: 3.16.0
zarr: 3.1.6
cftime: 1.6.5
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.6.0
dask: 2026.3.0
distributed: 2026.3.0
matplotlib: 3.10.9
cartopy: 0.25.0
seaborn: 0.13.2
numbagg: 0.9.4
fsspec: 2026.4.0
cupy: None
pint: None
sparse: 0.18.0
flox: 0.11.2
numpy_groupies: 0.11.3
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

Labels: bug, needs triage