Add option to change file loading engine #334
Conversation
Force-pushed d720287 to b0681fc
The tests are failing on an accessing-a-read-only-value-type problem, but I changed nothing related to that, and other PRs have their tests pass! This feels like it's linked to #331, where Xarray suddenly decided it's not going to allow a particular type of coordinate assignment. This is extremely confusing, because it looks like this type of assignment hasn't been allowed for quite some time... and because Xarray seemingly complains about it only intermittently?? See the stack trace from the error as of writing this comment:
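(The stack trace itself was attached in the original comment and is not reproduced here.) As a rough, numpy-only illustration of the error class being described, assuming the failure is an in-place write into a read-only array view (this is not xbout's actual code):

```python
import numpy as np

# Hedged illustration only: reproduces the class of error
# ("assignment destination is read-only") that appears when writing into a
# read-only array view, which is what the CI failure above looks like.
a = np.arange(3)
a.setflags(write=False)  # simulate a read-only view handed back by a library

msg = ""
try:
    a[0] = 10  # in-place write into the read-only array
except ValueError as err:
    msg = str(err)

print(msg)
```

Whether xarray hands back such a read-only view for coordinates in recent versions is exactly the intermittent behaviour being puzzled over above.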
Force-pushed 522daa3 to c7babf2
From Peter: better to have the engine choice as a string. The defaults should prioritise the user not having issues; if this breaks CI, then the CI tests should be modified to override the engine. To simplify things, we could make the engine flag a global variable.
An alternative might be to change the dependency on
@ZedThree pyfive doesn't work. We are missing 8 bytes!! Gives:

Click to expand console dump:

```
(xbout-h5netcdf-pyfive) mike@P0728-Ubuntu:~/work/notebooks/xbout-dev$ python test-load.py
Traceback (most recent call last):
  File "/home/mike/pyenvs/xbout-h5netcdf-pyfive/lib/python3.12/site-packages/xarray/backends/file_manager.py", line 219, in _acquire_with_cache_info
    file = self._cache[self._key]
           ~~~~~~~~~~~^^^^^^^^^^^
  File "/home/mike/pyenvs/xbout-h5netcdf-pyfive/lib/python3.12/site-packages/xarray/backends/lru_cache.py", line 56, in __getitem__
    value = self._cache[key]
            ~~~~~~~~~~~^^^^^
KeyError: [, ('/home/mike/work/cases/devtests/neutlim-base-init_only/BOUT.dmp.0.nc',), 'r', (('decode_vlen_strings', True), ('driver', None), ('format', 'NETCDF4'), ('invalid_netcdf', None), ('phony_dims', 'access')), 'ee29f736-e928-42f2-a109-3b24c80d907d']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
```

According to my LLM of choice, this is a bug in pyfive: it doesn't fully implement HDF5 global heap reading for all edge cases, which makes it compute the wrong buffer size in this case. It even found the bug and a fix. This actually fixes the problem, believe it or not:

However, there are further issues because Xarray's h5netcdf backend hardcodes h5py in several places... so I'm going to give up and continue on making the engine user-settable.
Force-pushed 5444cc8 to 1886ed8
I changed it so that you can override the engine with a string instead of forcing netCDF. I also added a warning message with guidance on the fixes. |
Co-authored-by: David Bold <dschwoerer@users.noreply.github.com>

This is a workaround for bugs with the h5netcdf binaries: #329
If you don't install h5netcdf/h5py from your distribution package manager or Spack, and instead install them e.g. from pip, then you are likely to get an HDF5 error upon loading any dataset.
There are three possible fixes. This PR implements fix 3:

1. Install both from source against a single shared HDF5:

   ```
   sudo apt install libhdf5-dev libnetcdf-dev
   pip install --no-binary netCDF4,h5py netCDF4 h5py
   ```

2. Install both from your distribution package manager, e.g. apt, conda or Spack.

3. Switch to the netcdf4 engine:

   ```python
   import xbout
   xbout.load.file_engine = 'netcdf4'
   ```
There is also a helpful error message telling the user about these fixes if they hit the error when loading a results dataset. This is not done for grid datasets, to keep things simple.
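A hedged sketch of what such a helpful error might look like: the wrapper name `open_with_guidance` and the message text are illustrative, not the PR's exact implementation, but `xbout.load.file_engine` is the real flag from fix 3 above.

```python
# Illustrative only: wrap the failing open call and re-raise with guidance
# on the three fixes described in the PR description.

def open_with_guidance(open_func, path, engine):
    try:
        return open_func(path, engine=engine)
    except OSError as err:
        raise OSError(
            f"Failed to open {path!r} with engine={engine!r}. This may be the "
            "h5netcdf/h5py binary incompatibility (#329). Possible fixes:\n"
            "  1. pip install --no-binary netCDF4,h5py netCDF4 h5py "
            "against a shared libhdf5\n"
            "  2. install both from your distribution package manager\n"
            "  3. set xbout.load.file_engine = 'netcdf4'"
        ) from err
```

Keeping the guidance in one wrapper around the results-loading path (and not the grid-loading path) matches the simplicity trade-off described above.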