Resample parameters (source_dataset_id, period_type, method) are now
passed directly to POST /processes/resample/execution instead of being
declared on a YAML template with sync_kind: derived.
Derived dataset IDs are auto-generated as {source}_{period}_{method}.
The derived sync_kind and processing validation blocks are removed from
the registry, SyncKind enum, and sync engine.
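The ID pattern above can be illustrated with a minimal sketch. The function name `derived_dataset_id` and the example values are hypothetical; only the `{source}_{period}_{method}` pattern comes from the commit message.

```python
def derived_dataset_id(source_dataset_id: str, period: str, method: str) -> str:
    """Build a derived dataset ID following the {source}_{period}_{method} pattern.

    Hypothetical helper: the real project may normalize or validate the parts.
    """
    return f"{source_dataset_id}_{period}_{method}"


# Illustrative values only
print(derived_dataset_id("chirps3", "monthly", "sum"))  # chirps3_monthly_sum
```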
Expose the raw pandas offset alias (e.g. '1D', 'W-MON', 'MS', '10D') directly in the resample request instead of mapping through a fixed set of named period types. This removes _resample_frequency(), _PERIOD_ORDER, and the period hierarchy guard, and unlocks any frequency xarray accepts (bi-weekly, dekadal, seasonal, etc.) without code changes. Coverage timestamps for derived artifacts are stored as ISO date strings via period_type="daily" on the synthetic target dataset dict.
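As a rough illustration of what accepting raw offset aliases implies, the sketch below validates an alias with pandas before resampling. The helper name `is_valid_frequency` is an assumption, not something taken from this PR, and the sketch uses a plain pandas Series rather than the project's xarray datasets.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def is_valid_frequency(alias: str) -> bool:
    """Return True if pandas can parse the alias as a frequency offset."""
    try:
        to_offset(alias)
    except ValueError:
        return False
    return True


# Aliases quoted in the commit message
for alias in ("1D", "W-MON", "MS", "10D"):
    assert is_valid_frequency(alias)

# Resample 28 daily values into weeks ending on Monday
index = pd.date_range("2024-01-01", periods=28, freq="D")
daily = pd.Series(1.0, index=index)
weekly = daily.resample("W-MON").sum()
print(weekly)
```

Validating the alias up front lets a request with an unparseable frequency be rejected before any data is read.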
… pattern
- Add climate_api/data/processes/resample.yaml as the built-in resample process definition
- Add climate_api/data_registry/services/processes.py: list_processes(), get_process(), plugin loading from plugins_dir/processes/ (same pattern as datasets/plugins_dir/datasets/)
- Route dispatches to process['execution_function'] via registry lookup, with no hardcoded process_id check
- Add services.execute_resample() as the generic entry point called by the dispatcher; it handles method/frequency validation and returns a JSON-serializable dict
- Custom processes can be added via plugins_dir/processes/*.yaml without touching core code
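The registry-driven dispatch described above might look roughly like the following sketch. All names here (`load_dotted_path`, `merge_registries`, `execute_process`) and the stand-in process definitions are hypothetical; stdlib functions play the role of execution functions so the example runs on its own.

```python
import importlib
from typing import Any, Callable

ProcessDef = dict[str, Any]


def load_dotted_path(path: str) -> Callable[..., Any]:
    """Resolve 'package.module.func' to the callable it names."""
    module_name, _, attr = path.rpartition(".")
    if not module_name or not attr:
        raise ValueError(f"not a dotted path: {path!r}")
    func = getattr(importlib.import_module(module_name), attr)
    if not callable(func):
        raise TypeError(f"{path!r} is not callable")
    return func


def merge_registries(
    builtin: dict[str, ProcessDef], plugins: dict[str, ProcessDef]
) -> dict[str, ProcessDef]:
    """Plugin definitions override built-ins that share an id."""
    return {**builtin, **plugins}


def execute_process(registry: dict[str, ProcessDef], process_id: str, **params: Any) -> Any:
    """Look up the process and call its execution function; no per-id branching."""
    process = registry[process_id]  # raises KeyError for unknown ids
    return load_dotted_path(process["execution_function"])(**params)


# Stand-in definitions (assumption: real ones are loaded from YAML files)
builtin = {"demo_mean": {"execution_function": "statistics.mean"}}
plugins = {"demo_median": {"execution_function": "statistics.median"}}
registry = merge_registries(builtin, plugins)

print(execute_process(registry, "demo_mean", data=[1, 2, 3, 6]))    # 3
print(execute_process(registry, "demo_median", data=[1, 2, 3, 6]))  # 2.5
```

Because the route only knows about `execution_function`, adding a process means adding a definition and nothing else.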
Introduces a rioxarray-backed reprojection transform that converts source datasets to the instance CRS during ingestion. The transform is a no-op when the source CRS already matches the configured instance CRS, so WGS84 instances incur no overhead.
- Add climate_api/transforms/reproject.py with reproject_to_instance_crs
- Wire the transform into chirps3, era5_land (both variables), and worldpop dataset YAMLs
- Add rioxarray>=0.17 as an explicit dependency
- Add tests using a mocked .rio accessor to avoid local PROJ database conflicts
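The no-op behaviour and the mocked-accessor testing approach can be sketched as follows. The function name `reproject_if_needed`, its signature, and the string-based CRS comparison are assumptions for illustration; the actual transform may compare parsed CRS objects. `rio.write_crs` and `rio.reproject` are real rioxarray accessor methods, but here they are only exercised through a mock.

```python
from unittest.mock import MagicMock


def reproject_if_needed(ds, source_crs: str, instance_crs: str):
    """Return ds unchanged when the CRSs match, otherwise reproject via .rio."""
    # Assumption: CRSs are compared as case-insensitive strings.
    if source_crs.upper() == instance_crs.upper():
        return ds
    return ds.rio.write_crs(source_crs).rio.reproject(instance_crs)


# Matching CRS: the dataset comes back untouched and .rio is never used
same = MagicMock()
assert reproject_if_needed(same, "EPSG:4326", "EPSG:4326") is same
same.rio.write_crs.assert_not_called()

# Differing CRS: reproject is called with the instance CRS
other = MagicMock()
reproject_if_needed(other, "EPSG:4326", "EPSG:32636")
other.rio.write_crs.assert_called_once_with("EPSG:4326")
other.rio.write_crs.return_value.rio.reproject.assert_called_once_with("EPSG:32636")
```

Mocking the accessor keeps the test independent of the PROJ database installed on the machine running it.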
feat: add reproject_to_instance_crs transform to zarr build pipeline
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a pluggable process registry (YAML-backed, plugin-overridable) and a generic processing execution route, introducing temporal resampling as the first built-in process and extending the transforms pipeline with built-in transforms (unit conversion, deaccumulation, reprojection). Also expands time-period handling to support weekly ISO week strings across sync and coverage.
Changes:
- Introduces process registry (+ plugin override support) and POST /processes/{process_id}/execution generic dispatcher.
- Implements resampling materialization workflow (derived Zarr artifacts) and adds weekly period parsing/normalization.
- Adds transforms pipeline + built-in transforms, updates dataset YAMLs to use dotted-path transforms, and adds comprehensive tests.
Reviewed changes
Copilot reviewed 30 out of 32 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/test_transforms_reproject.py | Adds tests for the reprojection transform behavior and rioxarray integration. |
| tests/test_transforms.py | Adds tests for unit conversion, deaccumulation, and the dotted-path transforms pipeline. |
| tests/test_shared_time.py | Extends coverage for weekly period normalization/parsing. |
| tests/test_processing_routes.py | Adds route-level tests for /processes/resample/execution behavior and error cases. |
| tests/test_processing_resample.py | Adds extensive tests for resample materialization, edge-period dropping, reuse/overwrite behavior, and publishing. |
| tests/test_process_registry.py | Tests built-in + plugin process registry loading and override behavior. |
| tests/test_datasets_sync.py | Updates sync tests for weekly support and refines unsupported period-type expectations. |
| tests/test_dataset_registry.py | Updates registry tests to reflect new ingestion function strings and removes some ingestion-validation tests. |
| pyproject.toml | Adds rioxarray dependency for .rio accessor support. |
| climate_api/transforms/unit_conversion.py | Implements built-in unit conversion transform (scale + offset). |
| climate_api/transforms/reproject.py | Implements built-in reprojection transform to instance CRS. |
| climate_api/transforms/deaccumulate.py | Implements built-in ERA5 deaccumulation transform. |
| climate_api/transforms/__init__.py | Exposes built-in transforms for dotted-path references. |
| climate_api/shared/time.py | Adds weekly period support and weekly handling in numpy datetime conversions. |
| climate_api/publications/services.py | Refactors managed dataset id generation into a reusable function. |
| climate_api/processing/services.py | Adds execution function for resample with validation and response formatting. |
| climate_api/processing/schemas.py | Adds Pydantic request/response schemas and supported methods constant. |
| climate_api/processing/routes.py | Adds generic process execution endpoint dispatched via process registry. |
| climate_api/processing/resample.py | Implements derived resampling materialization, artifact reuse, completeness checks, and Zarr writing. |
| climate_api/processing/__init__.py | Adds processing package module. |
| climate_api/main.py | Registers processing routes in the FastAPI app. |
| climate_api/ingestions/sync_engine.py | Adds weekly period arithmetic for sync planning. |
| climate_api/ingestions/services.py | Adds helper to store locally materialized Zarr artifacts and supports weekly default end. |
| climate_api/data_registry/services/processes.py | Adds YAML-backed process registry with plugin merging and dotted-path execution loading. |
| climate_api/data_manager/services/downloader.py | Adds transforms execution hook (_run_transforms) into dataset build flow. |
| climate_api/data_accessor/services/accessor.py | Normalizes period-string scalars when computing coverage. |
| climate_api/data/processes/resample.yaml | Registers the built-in resample process and its execution function. |
| climate_api/data/datasets/worldpop.yaml | Adds reprojection transform to dataset definition. |
| climate_api/data/datasets/era5_land.yaml | Switches preprocessing to transforms pipeline and adjusts display range after unit conversion. |
| climate_api/data/datasets/chirps3.yaml | Adds reprojection transform to dataset definition. |
| .gitignore | Ignores derived data directory (data/derived). |
Remove reproject_to_instance_crs from dataset YAML transforms lists and call it automatically in build_dataset_zarr after user-defined transforms. Source CRS defaults to EPSG:4326; datasets with a different source CRS can declare source_crs in their YAML template.
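The ordering described here (user-defined transforms first, then an automatic final step) can be sketched as below. The names `run_transforms_sketch` and `build_sketch`, the optional `kwargs` key, and the stand-in functions are hypothetical; builtins are used as transforms so the example is self-contained.

```python
import importlib
from typing import Any, Callable


def run_transforms_sketch(data: Any, transforms: list[dict[str, Any]]) -> Any:
    """Apply each dotted-path transform in order, passing the result along."""
    for entry in transforms:
        if "function" not in entry:
            raise ValueError(f"transform entry is missing required 'function' key: {entry!r}")
        module_name, _, attr = entry["function"].rpartition(".")
        func = getattr(importlib.import_module(module_name), attr)
        # 'kwargs' is an assumed optional key for per-transform parameters
        data = func(data, **entry.get("kwargs", {}))
    return data


def build_sketch(data: Any, transforms: list[dict[str, Any]], final_step: Callable[[Any], Any]) -> Any:
    """Run user-defined transforms, then always apply the final step."""
    return final_step(run_transforms_sketch(data, transforms))


transforms = [
    {"function": "builtins.abs"},
    {"function": "builtins.round", "kwargs": {"ndigits": 1}},
]
# The final step stands in for the automatic reprojection call
result = build_sketch(-3.14159, transforms, lambda value: ("reprojected", value))
print(result)  # ('reprojected', 3.1)
```

Running the final step outside the declared list means a dataset template cannot accidentally omit it.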
…ss registry
- shared/time.py: replace np.vectorize with pd.DatetimeIndex.isocalendar() for weekly period strings
- processing/resample.py: derive period_type from frequency alias instead of hardcoding daily
- processing/routes.py: catch TypeError from mismatched kwargs and return HTTP 400
- downloader.py: validate transform entries have required 'function' key with clear error message
- data_registry/services/processes.py: validate execution_function is a valid dotted path
- tests: update weekly/monthly coverage assertions to use correct period string format
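A vectorized conversion of timestamps to weekly period strings with `pd.DatetimeIndex.isocalendar()` might look like the sketch below. The `YYYY-Www` output format and the helper name `to_iso_week_strings` are assumptions; the commit message does not state the exact string format used by the project.

```python
import pandas as pd


def to_iso_week_strings(timestamps) -> list[str]:
    """Format timestamps as ISO year-week strings (assumed 'YYYY-Www' format)."""
    iso = pd.DatetimeIndex(timestamps).isocalendar()
    return [f"{int(year)}-W{int(week):02d}" for year, week in zip(iso["year"], iso["week"])]


# Dates near year boundaries belong to the ISO year of their week
print(to_iso_week_strings(["2024-01-01", "2024-12-30", "2021-01-03"]))
# ['2024-W01', '2025-W01', '2020-W53']
```

Using the ISO year rather than the calendar year matters at year boundaries, as the second and third dates show.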
… block
Replace flat top-level sync_kind, sync_execution, sync_availability fields with a nested sync: block (kind/execution/availability) in dataset YAMLs. Update all code reads and validation error messages to match.
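The shape change from flat fields to a nested block can be illustrated on a plain dict, as a parsed YAML template would be. The helpers `nest_sync_fields` and `get_sync_kind`, the error wording, and the placeholder values are hypothetical; only the field names come from the commit message.

```python
from typing import Any

_FLAT_TO_NESTED = {
    "sync_kind": "kind",
    "sync_execution": "execution",
    "sync_availability": "availability",
}


def nest_sync_fields(template: dict[str, Any]) -> dict[str, Any]:
    """Return a copy with flat sync_* keys moved under a nested 'sync' block."""
    result = {k: v for k, v in template.items() if k not in _FLAT_TO_NESTED}
    sync = dict(result.get("sync", {}))
    for flat, nested in _FLAT_TO_NESTED.items():
        if flat in template:
            sync[nested] = template[flat]
    if sync:
        result["sync"] = sync
    return result


def get_sync_kind(template: dict[str, Any]) -> Any:
    """Read sync.kind, naming the nested path in the error message."""
    try:
        return template["sync"]["kind"]
    except KeyError:
        raise ValueError("dataset template is missing sync.kind") from None


# Placeholder values; real templates use the project's own kinds
flat = {"id": "example_dataset", "sync_kind": "example_kind", "sync_execution": "example_execution"}
nested = nest_sync_fields(flat)
print(nested)
print(get_sync_kind(nested))  # example_kind
```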
…and metres_to_mm transforms
- stac/services.py: remove stale convert_units fallback (no longer exists in datasets)
- sync_engine.py: fix error message to say sync.kind instead of sync_kind
- processes.py: validate name field is present in process definitions
- processing/services.py: inline _SUPPORTED_RESAMPLE_METHODS, delete unused schemas.py
- tests: update error message assertion and add missing-name validation test
Summary
This PR consolidates three areas of new functionality — all following the same plugin pattern:
Extensible transforms pipeline
- transforms: list of dotted-path functions applied during zarr build (after download, before writing)
- Built-in transforms live in climate_api/transforms/; custom transforms can be loaded from any importable package under plugins_dir
- Built-ins: convert_units, deaccumulate_era5, reproject_to_instance_crs
Reprojection transform
- reproject_to_instance_crs (rioxarray-backed) reprojects source data to the instance CRS configured in climate-api.yaml
- Wired into the chirps3, era5_land (temperature and precipitation), and worldpop dataset templates
- Adds rioxarray>=0.17 as an explicit dependency
Process registry with plugin support
- Built-in processes live in climate_api/data/processes/; custom processes via plugins_dir/processes/
- Each process declares an execution_function dotted path; POST /processes/{id}/execution dispatches generically, with no hardcoded process-id checks
Temporal resampling (first built-in process)
- Frequency given as a pandas offset alias (1D, W-MON, MS, etc.)
- Methods: mean, sum, min, max
Plugin pattern (consistent across datasets, transforms, processes)
| Extension point | Built-ins | Plugins | Referenced via |
|---|---|---|---|
| Datasets | climate_api/data/datasets/ | plugins_dir/datasets/ | ingestion.function dotted path |
| Transforms | climate_api/transforms/ | plugins_dir | transforms: list |
| Processes | climate_api/data/processes/ | plugins_dir/processes/ | execution_function dotted path |
Key files changed
| File | Description |
|---|---|
| climate_api/transforms/__init__.py | |
| climate_api/transforms/reproject.py | reproject_to_instance_crs: rioxarray reprojection |
| climate_api/transforms/unit_conversion.py | convert_units: unit conversion via metpy |
| climate_api/transforms/deaccumulate.py | deaccumulate_era5: ERA5 accumulation fix |
| climate_api/data_registry/services/processes.py | |
| climate_api/processing/resample.py | |
| climate_api/processing/routes.py | |
| climate_api/processing/services.py | execute_resample entry point |
| climate_api/data/processes/resample.yaml | |
| climate_api/data/datasets/*.yaml | |
| pyproject.toml | rioxarray>=0.17 |
Test plan
- make lint: clean (ruff, mypy, pyright)
- pytest: all tests pass (transforms, reproject, process registry, resampling, routes)
- POST /processes/resample/execution with valid request returns 200
- POST /processes/resample/execution with invalid method/frequency returns 400
- POST /processes/unknown/execution returns 404
- Registry loads the built-in resample from data/processes/resample.yaml
- Plugin in plugins_dir/processes/ merges with built-ins
- reproject_to_instance_crs no-ops when source CRS matches instance CRS
- reproject_to_instance_crs calls rio.reproject with the correct target CRS
Supersedes #63. Incorporates #86 (transforms pipeline) and #93 (reprojection transform).
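The status codes listed in the test plan (200, 400, 404) could be modelled without a web framework as in the sketch below. Everything here is hypothetical: `dispatch`, `fake_resample`, and the response fields are stand-ins, and the actual route presumably raises HTTP exceptions rather than returning tuples.

```python
from typing import Any, Callable

SUPPORTED_METHODS = {"mean", "sum", "min", "max"}


def fake_resample(source_dataset_id: str, frequency: str, method: str) -> dict[str, Any]:
    """Stand-in execution function that validates the method and echoes the request."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    return {"source_dataset_id": source_dataset_id, "frequency": frequency, "method": method}


def dispatch(
    registry: dict[str, Callable[..., Any]], process_id: str, params: dict[str, Any]
) -> tuple[int, Any]:
    """Map registry lookup and execution outcomes to HTTP-style status codes."""
    func = registry.get(process_id)
    if func is None:
        return 404, {"detail": f"unknown process: {process_id}"}
    try:
        return 200, func(**params)
    except (TypeError, ValueError) as exc:
        # TypeError covers mismatched keyword arguments; ValueError covers invalid values
        return 400, {"detail": str(exc)}


registry = {"resample": fake_resample}
ok = {"source_dataset_id": "chirps3", "frequency": "MS", "method": "sum"}

print(dispatch(registry, "resample", ok)[0])                            # 200
print(dispatch(registry, "resample", {**ok, "method": "median"})[0])    # 400
print(dispatch(registry, "resample", {"unexpected": 1})[0])             # 400
print(dispatch(registry, "unknown", ok)[0])                             # 404
```

One caveat with this approach: catching `TypeError` around the whole call also turns type errors raised deep inside the execution function into a 400, which may hide genuine bugs.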