Conversation
This had a huge performance cost because of the recursive deep copies of the regions in .attrs.... which were unnecessarily copied once again manually a few lines later!! Instead, deep copy only the single required region, and copy the .attrs with the regions separately.
Significantly improves performance by reducing overhead
Big performance boost: xr.align compares all coordinates for equality. Fall back on xr.align for edge cases like limiter configuration.
We should really take out corner coords from the coordinates.
|
Sorry, I'm unlikely to have time to review this.
You say you do a copy anyway, so that should be fine, but note that some to/from regions operations (e.g. in |
|
Interesting. I would also like to know how much cost could be reduced if we need to perform some manipulations on the dataset, e.g., a simple toroidal average, before plotting? |
|
@johnomotani no worries on not being able to review. Yes, the output region does get a deep copy at the end, so I think it's fine. The tests also all pass.
@Vandoo I don't know I'm afraid, I've never touched a 3D dataset! If you give it a try, it would be interesting to know. |
plot2D_polygonis significantly faster thanpcolormesh, but it's still so slow compared to what I can do with SOLPS data that it's really frustrating in my workflow. It still takes ~6s per plot, compared to <<1s for SOLPS, so I can be waiting whole minutes for a grid of plots!Since I can't see being able to commit the time for the kind of necessary deep dive into the complexities of xBOUT to fix this, I decided to try out the Claude agentic workflow. I gave it a Python script to test how long the plot takes. I told it to make sure to run xBOUT tests periodically to make sure nothing is broken. I then asked it to find performance improvements and modify the code as needed.
Afterwards, I ran the full test suite myself and went through the changes to understand them and put it into separate commits. As far as I can see, these are reasonable changes, but xBOUT is very complicated, so I would welcome second/third pairs of eyes.
Here is what it found:
_from_regionwas making unnecessary deep copies ofds_or_da. This apparently resulted in ~7,000,000 deep copies needed per 2D plot due to all the regions stored in.attrs. This is not needed as the dataset/dataarray are being used read-only, and the new region has attributes explicitly copied by hand a few lines down from the deep copy! After the changes, we still do one deep copy but only for the region that's actually generated.plot_separatriceswas usingxr.aligncalls, which compare all existing coordinate arrays for equality, including things like cell corner coordinates. This apparently required 101 dask compute calls for the plot. Instead, we do the operation on numpy arrays, saving tons of time.xr.alignif this fails. To speed even this up, the corner coordinates are stripped from the dataset before the operation since these are 8 coordinate arrays which aren't needed. This is a bit ironic, as plotting the separatrix would ideally be done with corner coordinates... but these aren't present if it's not a Hypnotoad grid.These changes reduced my plot time from ~6s to ~0.3s. I think the obvious next step would be to remove the cell corners from the coordinates completely. I don't see any need for them to be there. We should also review if some of the other coordinates can be dropped.
This PR contains #334 so let's get that merged first.