What happened?
It appears that High-Level Graph (HLG) optimization fails to resolve dependencies correctly when a variable (new_weight in the example below) is used both as an input to a subsequent calculation and as a replacement variable in an intermediate Dataset state.
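The core of the failing pattern, condensed from the MCVE below (ds and ds2 are abbreviations for initialtable and output_gaintable):

new_weight = ds.weight * 1.1                      # derived intermediate
ds2 = ds.assign(weight=new_weight)                # (1) replacement variable in the Dataset
new_gain = ds.gain.where(new_weight > 0.5, 0.0)   # (2) input to a further calculation
ds2 = ds2.assign(gain=new_gain)
dask.compute(ds2, optimize_graph=True)            # fails: missing dependency / cancelled futures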
Raised exception for the failure scenario (when run using a distributed client; the local scheduler instead raises the ValueError shown under "Relevant log output" below):
---------------------------------------------------------------------------
FutureCancelledError Traceback (most recent call last)
Cell In[67], line 26
22 output_gaintable = output_gaintable.assign(gain=new_gain)
24 # trigger computation
---> 26 dask.compute(output_gaintable, optimize_graph=True) # Fail
File /lib/python3.11/site-packages/dask/base.py:685, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
682 expr = expr.optimize()
683 keys = list(flatten(expr.__dask_keys__()))
--> 685 results = schedule(expr, keys, **kwargs)
687 return repack(results)
File /lib/python3.11/site-packages/distributed/client.py:2431, in Client._gather(self, futures, errors, direct, local_worker)
2429 exception = st.exception
2430 traceback = st.traceback
-> 2431 raise exception.with_traceback(traceback)
2432 if errors == "skip":
2433 bad_keys.add(key)
FutureCancelledError: finalize-hlgfinalizecompute-0b5dadc6527147a1bffc7006ce7c9329 cancelled for reason: lost dependencies.
What did you expect to happen?
The computations should have completed successfully, even with optimize_graph=True.
Minimal Complete Verifiable Example
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!
import xarray as xr
xr.show_versions()
import dask
import dask.array as da
rng = da.random.default_rng(seed=1234)
# Set up a small dask-backed dataset
gain = rng.random((100,), chunks=10)
weight = rng.random((100,), chunks=10)
initialtable = xr.Dataset({
    "gain": ("x", gain),
    "weight": ("x", weight),
})
original_chunks = initialtable.chunksizes
# Update weight
new_weight = initialtable.weight * 1.1
output_gaintable = initialtable.assign(weight=new_weight)
# Update gain, filtered based on the new weight
new_gain = initialtable.gain.where(new_weight > 0.5, 0.0)
output_gaintable = output_gaintable.assign(gain=new_gain)
# Trigger computation, which FAILS
dask.compute(output_gaintable, optimize_graph=True)
# Other ways to compute, which PASS
dask.compute(new_weight, new_gain, optimize_graph=True)
dask.compute(output_gaintable, optimize_graph=False)[0].gain
dask.persist(output_gaintable, optimize_graph=True)[0].gain.compute()
output_gaintable.compute(optimize_graph=True).gain
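For triage, one way to look for the dropped tasks is to compare each key's dependencies against the keys present in the optimized graph. This is only a sketch: it assumes the optimized collection still exposes a materializable graph via __dask_graph__, which may not hold across dask versions (the environment below uses the expression-based dask 2026.3.0):

import dask
import dask.core

# Optimize the collection the same way compute(optimize_graph=True) would
(opt,) = dask.optimize(output_gaintable)
dsk = dict(opt.__dask_graph__())

# Any dependency not present as a key in the graph is a dropped task
missing = {
    dep
    for key in dsk
    for dep in dask.core.get_dependencies(dsk, key)
    if dep not in dsk
}
print(missing)  # if the diagnosis above is right, the 'mul-...' keys should appear here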
Steps to reproduce
Run the above script with uv run (e.g. save it as compute_bug.py, matching the path in the log output below, and execute uv run compute_bug.py); the inline script metadata installs xarray from the main branch.
MVCE confirmation
Relevant log output
Traceback (most recent call last):
File "/home/maneesh/Work/SKAO/ska-sdp-instrumental-calibration/compute_bug.py", line 38, in <module>
dask.compute(output_gaintable, optimize_graph=True)
File "/home/maneesh/.cache/uv/environments-v2/compute-bug-884655f05503df7b/lib/python3.11/site-packages/dask/base.py", line 685, in compute
results = schedule(expr, keys, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/maneesh/.cache/uv/environments-v2/compute-bug-884655f05503df7b/lib/python3.11/site-packages/dask/local.py", line 191, in start_state_from_dask
raise ValueError(
ValueError: Missing dependency ('mul-e4ad8b7f030eed6eae70b41334e6993e', 6) for dependents {'finalize-hlgfinalizecompute-c00f26a73e664e208c485a28c4ea721b'}
Anything else we need to know?
No response
Environment
Details
INSTALLED VERSIONS
commit: None
python: 3.11.12 (main, Apr 9 2025, 08:55:54) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-65-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.6
libnetcdf: 4.9.3
xarray: 2026.4.1.dev6+g757a7d42a
pandas: 3.0.2
numpy: 2.4.4
scipy: 1.17.1
netCDF4: 1.7.4
pydap: 3.5.9
h5netcdf: 1.8.1
h5py: 3.16.0
zarr: 3.1.6
cftime: 1.6.5
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.6.0
dask: 2026.3.0
distributed: 2026.3.0
matplotlib: 3.10.9
cartopy: 0.25.0
seaborn: 0.13.2
numbagg: 0.9.4
fsspec: 2026.4.0
cupy: None
pint: None
sparse: 0.18.0
flox: 0.11.2
numpy_groupies: 0.11.3
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None