Skip to content

Conversation

@yuvraajnarula
Copy link

Pull Request

Description

This PR introduces a unified xarray -> NumpySample conversion function to replace the existing modality-specific numpy conversion functions in numpy_sample.

What this pr does

  • Adds a single convert_xarray_dict_to_numpy_sample function that:
  • accepts a dictionary of xarray.DataArray objects (matching the structure already used in the PVNet datasets)
  • converts all supported modalities (generation, nwp, satellite) into a single, structured NumpySample
  • Extracts the core modality conversion logic into internal helpers (_convert_generation, _convert_nwp, _convert_satellite)
  • Keeps the existing convert_*_to_numpy_sample functions as thin wrappers for now

What this PR intentionally does not do

  • Remove or refactor *SampleKey classes
  • Update PVNetDataset or other downstream consumers to the new unified API
  • Change the runtime structure of existing samples
  • Those changes are intentionally deferred until we agree on whether this new approach is the right long-term direction.

Design question for reviewers

@dfulu
The main motivation for this PR is to move away from:

  • many modality-specific numpy conversion functions
  • scattered *SampleKey classes defining implicit sample structure

and towards:

  • a single, explicit conversion entrypoint
  • a structured, self-describing NumpySample layout

Before continuing with:

  • deprecating/removing *SampleKey
  • migrating datasets and collate logic

I’d really appreciate feedback on whether this unified conversion approach feels sustainable and directionally correct for the project.

Fixes #368

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • Yes

If your changes affect data processing, have you plotted any changes? i.e. have you done a quick sanity check?

  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@dfulu
Copy link
Member

dfulu commented Jan 5, 2026

@yuvraajnarula just a heads up that your PR isn't passing linting. Also just a warning that I don't think your PR will pass the pytests since you've changed the structure of the dict. So there's probably quite a lot of changes that will be required.

Also the main point of the issue was to remove the calls to convert_X_to_numpy_sample() inside the the torch PVNetDataset class which isn't addressed here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consolidate numpy conversion functions

2 participants