Skip to content

20x faster 2D plots#337

Merged
dschwoerer merged 9 commits intomasterfrom
polygonplot-upgrade
Mar 10, 2026
Merged

20x faster 2D plots#337
dschwoerer merged 9 commits intomasterfrom
polygonplot-upgrade

Conversation

@mikekryjak
Copy link
Copy Markdown
Collaborator

@mikekryjak mikekryjak commented Mar 6, 2026

plot2D_polygon is significantly faster than pcolormesh, but it's still so slow compared to what I can do with SOLPS data that it's really frustrating in my workflow. It still takes ~6s per plot, compared to <<1s for SOLPS, so I can be waiting whole minutes for a grid of plots!

Since I can't see being able to commit the time for the kind of necessary deep dive into the complexities of xBOUT to fix this, I decided to try out the Claude agentic workflow. I gave it a Python script to test how long the plot takes. I told it to make sure to run xBOUT tests periodically to make sure nothing is broken. I then asked it to find performance improvements and modify the code as needed.

Afterwards, I ran the full test suite myself and went through the changes to understand them and put it into separate commits. As far as I can see, these are reasonable changes, but xBOUT is very complicated, so I would welcome second/third pairs of eyes.

Here is what it found:

  • _from_region was making unnecessary deep copies of ds_or_da. This apparently resulted in ~7,000,000 deep copies needed per 2D plot due to all the regions stored in .attrs. This is not needed as the dataset/dataarray are being used read-only, and the new region has attributes explicitly copied by hand a few lines down from the deep copy! After the changes, we still do one deep copy but only for the region that's actually generated.
  • plot_separatrices was using xr.align calls, which compare all existing coordinate arrays for equality, including things like cell corner coordinates. This apparently required 101 dask compute calls for the plot. Instead, we do the operation on numpy arrays, saving tons of time.
  • Claude found that the above can break in edge cases, so we fall back to xr.align if this fails. To speed even this up, the corner coordinates are stripped from the dataset before the operation since these are 8 coordinate arrays which aren't needed. This is a bit ironic, as plotting the separatrix would ideally be done with corner coordinates... but these aren't present if it's not a Hypnotoad grid.

These changes reduced my plot time from ~6s to ~0.3s. I think the obvious next step would be to remove the cell corners from the coordinates completely. I don't see any need for them to be there. We should also review if some of the other coordinates can be dropped.

This PR contains #334 so let's get that merged first.

This had a huge performance cost because of the recursive deep copies of the regions in .attrs.... which were unnecessarily copied once again manually a few lines later!! Instead, deep copy only the single required region, and copy the .attrs with the regions separately.
Significantly improves performance by reducing overhead
Big performance boost: xr.align compares all coordinates for equality. Fall back on xr.align for edge cases like limiter configuration.
We should really take out corner coords from the coordinates.
@mikekryjak mikekryjak added the enhancement New feature or request label Mar 6, 2026
@johnomotani
Copy link
Copy Markdown
Collaborator

Sorry, I'm unlikely to have time to review this.

This is not needed as the dataset/dataarray are being used read-only

You say you do a copy anyway, so that should be fine, but note that some to/from regions operations (e.g. in to_field_aligned() and from_field_aligned() do need to modify data in the output.

@Vandoo
Copy link
Copy Markdown
Collaborator

Vandoo commented Mar 6, 2026

Interesting. I would also like to know how much cost could be reduced if we need to perform some manipulations on the dataset, e.g., a simple toroidal average, before plotting?

@mikekryjak
Copy link
Copy Markdown
Collaborator Author

@johnomotani no worries on not being able to review. Yes, the output region does get a deep copy at the end, so I think it's fine. The tests also all pass.

Interesting. I would also like to know how much cost could be reduced if we need to perform some manipulations on the dataset, e.g., a simple toroidal average, before plotting?

@Vandoo I don't know I'm afraid, I've never touched a 3D dataset! If you give it a try, it would be interesting to know.

@mikekryjak mikekryjak requested review from dschwoerer March 6, 2026 11:09
@dschwoerer dschwoerer merged commit bcccd23 into master Mar 10, 2026
1 check passed
@dschwoerer dschwoerer deleted the polygonplot-upgrade branch March 10, 2026 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants