Installable package foundation, Python client, and User-facing docs by turban · Pull Request #59 · dhis2/climate-api

turban · 2026-05-05T13:33:53Z

Why

The Climate API was designed from the start for a single deployment scenario: clone the repo, edit files in place, run. This worked for early development but creates real problems as we move toward production deployments and want users to be able to install the package with pip install climate-api:

Instance configuration (extent, custom datasets) was stored inside the repository, making upgrades destructive.
Built-in dataset templates were found by walking directory paths relative to source files — a technique that breaks when the package is installed into site-packages/ because the project root is no longer accessible.
There was no Python client for discovering and opening datasets without constructing raw URLs.
Documentation assumed you had already cloned the repo and knew the internal structure.

This PR addresses all of these to make the package usable outside a source checkout.

What changed

Instance configuration via `CLIMATE_API_CONFIG` (closes #61)

A new CLIMATE_API_CONFIG environment variable points to a YAML file that lives outside the repository. This separates instance-specific configuration from the package itself, so the package can be upgraded without overwriting local config.

```yaml
# climate-api.yaml: lives outside the repo, not committed
extent:
  id: rwa
  name: Rwanda
  bbox: [28.8, -2.9, 30.9, -1.0]
  country_code: RWA

datasets_dir: ./my-datasets/   # optional; merged on top of built-ins
```

The extent is a single block per instance (not a list). The GET /extent endpoint returns it, or 404 if not configured. Dataset templates from datasets_dir are merged with the built-ins — a custom template with the same id overrides the built-in one.
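The override rule can be illustrated with a short sketch. It assumes templates are plain dicts carrying an `id` key, and it resolves a relative `datasets_dir` against the config file's directory, which is how the follow-up commit in this PR describes it. Both helper names (`resolve_datasets_dir`, `merge_templates`) are hypothetical and are not the registry's actual functions.

```python
import os
from pathlib import Path


def resolve_datasets_dir(config_path: Path, datasets_dir: str) -> Path:
    """Resolve datasets_dir against the config file's directory, not the cwd."""
    candidate = Path(datasets_dir).expanduser()
    if candidate.is_absolute():
        return candidate
    return Path(os.path.abspath(config_path)).parent / candidate


def merge_templates(builtins: list[dict], custom: list[dict]) -> dict[str, dict]:
    """Index templates by id; a custom template replaces a built-in with the same id."""
    merged = {template["id"]: template for template in builtins}
    for template in custom:
        merged[template["id"]] = template  # custom entries are applied last, so they win
    return merged
```

Keying by `id` means an override replaces the whole template rather than merging individual fields. That is the simplest reading of "overrides the built-in one".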

Built-in dataset templates bundled inside the package

Previously, the built-in YAML templates (chirps3.yaml, era5_land.yaml, worldpop.yaml) lived in data/datasets/ at the project root and were located by walking four directory levels up from the source file. This breaks when the package is installed with pip install, because the package ends up in site-packages/ with no path to the original project root.

The YAMLs are now bundled inside the package at src/climate_api/data/datasets/ and loaded via importlib.resources, which resolves the correct location regardless of how the package was installed.
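The sketch below shows the general `importlib.resources.files()` pattern for reading bundled files. It assumes the templates sit under `data/datasets` inside the package, as described above. The function name is hypothetical, and it returns raw text only. The real loader presumably parses each file with a YAML library, which is omitted here so the sketch stays stdlib-only.

```python
from importlib.resources import files


def read_builtin_templates(
    package: str = "climate_api", subdir: str = "data/datasets"
) -> dict[str, str]:
    """Return {file stem: raw YAML text} for each *.yaml bundled under package/subdir."""
    root = files(package)
    # Walk one segment at a time so this also works for zip-backed Traversables.
    for part in subdir.split("/"):
        root = root.joinpath(part)
    templates: dict[str, str] = {}
    for entry in root.iterdir():
        if entry.is_file() and entry.name.endswith(".yaml"):
            templates[entry.name[: -len(".yaml")]] = entry.read_text(encoding="utf-8")
    return templates
```

Nothing in this function refers to `__file__` or the project root. The package's import location decides where the files are found.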

Coordinate normalisation at write time

All Zarr datasets are now written with canonical coordinate names (time, latitude, longitude) regardless of what the upstream source uses (valid_time, lat/lon, x/y). This is enforced in build_dataset_zarr() for both flat and pyramid outputs.

Every downstream consumer — the client, the user guide, the OGC API — can now use ds.latitude, ds.longitude, ds.time without dataset-specific branching.
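The renaming step can be sketched as a pure function that builds a rename mapping from the names a dataset exposes. The alias table comes from the names listed above. Mapping `x`/`y` to `longitude`/`latitude` assumes a geographic (lon/lat) grid, and the helper is hypothetical rather than the code inside `build_dataset_zarr()`.

```python
from typing import Iterable

# Source aliases named in this PR, mapped to the canonical coordinate names.
CANONICAL_NAMES = {
    "valid_time": "time",
    "lat": "latitude",
    "y": "latitude",
    "lon": "longitude",
    "x": "longitude",
}


def canonical_rename_map(names: Iterable[str]) -> dict[str, str]:
    """Return {source name: canonical name} for aliases that need renaming."""
    present = set(names)
    mapping: dict[str, str] = {}
    for name in sorted(present):
        target = CANONICAL_NAMES.get(name)
        # Skip if the canonical name already exists or another alias claimed it.
        if target and target not in present and target not in mapping.values():
            mapping[name] = target
    return mapping
```

With xarray, this could then be applied with something like `ds.rename(canonical_rename_map(set(ds.dims) | set(ds.coords)))` before writing the store.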

Python client for dataset discovery and access (closes #60)

A new climate_api.client module makes it possible to discover and open datasets without constructing URLs manually:

```python
from climate_api.client import Client

api = Client("http://127.0.0.1:8000")
datasets = api.catalog()          # list published datasets
ds = api.open(datasets[0]["id"])  # open as xarray.Dataset
```

Module-level functions (list_datasets, open_dataset) fall back to the CLIMATE_API_BASE_URL environment variable, so scripts work without hardcoding a URL.
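Here is a sketch of the environment-variable fallback. `resolve_base_url` is a hypothetical helper. Raising `ValueError` when neither an argument nor `CLIMATE_API_BASE_URL` is available is an assumption and not documented client behaviour.

```python
import os
from typing import Optional

BASE_URL_ENV = "CLIMATE_API_BASE_URL"


def resolve_base_url(base_url: Optional[str] = None) -> str:
    """Prefer an explicit URL, else fall back to CLIMATE_API_BASE_URL."""
    url = base_url or os.environ.get(BASE_URL_ENV)
    if not url:
        raise ValueError(f"No base URL given and {BASE_URL_ENV} is not set")
    # Strip a trailing slash so paths can be joined with a single "/".
    return url.rstrip("/")
```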

`create_app()` factory function

The FastAPI application is now created via a create_app() factory, making it straightforward to embed the API in a larger application:

```python
from climate_api.main import create_app

app = create_app()
```

CORS credentials flag corrected

allow_credentials was incorrectly set to True alongside allow_origins=["*"]. This combination violates the CORS specification and is rejected by browsers. It is now set to False, which is correct for a public data API that does not use cookies or session tokens.

Dataset template field renamed: `cache_info` → `ingestion`

The cache_info block in dataset template YAMLs is renamed to ingestion. The ingestion.eo_function field is now required for all sync kinds, not just temporal ones.

Documentation

docs/setup_guide.md — step-by-step instance setup from install to first ingestion
docs/user_guide.md — consumer guide: STAC discovery, opening with xarray, subsetting
docs/adding_custom_datasets.md — how to write a custom dataset template and wire it up
examples/stac_discover_and_open.py and examples/zarr_direct_access.py — runnable examples using the client

Migration note

Existing datasets must be deleted and re-ingested. Coordinate normalisation only applies to newly written Zarr stores; stores written before this PR keep their original source coordinate names.

Rename cache_info: to ingestion: in any custom dataset YAML templates.

Test plan

make run starts the API without errors
uv run examples/stac_discover_and_open.py lists published datasets and prints dataset info
uv run examples/zarr_direct_access.py opens a Zarr store and prints a spatial mean time series
from climate_api.client import Client; print(Client("http://127.0.0.1:8000").catalog()) works in a Python session
A fresh instance configured with only climate-api.yaml serves the correct extent and built-in datasets
datasets_dir with a custom YAML adds that dataset alongside the built-ins
Setup guide is followable end to end for a new country
make test passes

uv run uvicorn resolves the uvicorn binary via PATH, which picks up conda's uvicorn when the base environment is active. Using python -m uvicorn forces the venv's interpreter and avoids the module not found error in the reload subprocess.

…stance

- Add 30s timeout to both httpx.get() calls in client.py to prevent indefinite hangs on network issues
- Set allow_credentials=False in CORSMiddleware; combining allow_origins=["*"] with allow_credentials=True is a CORS spec violation and a security footgun
- Use isinstance(x, (str, Path)) instead of the str | Path union syntax; the tuple form works on every Python version, while union types in isinstance() require Python 3.10+

…x plural in docs

- Validate href in each STAC child link before slicing the id from it
- Check that assets is a dict before calling .get("zarr") to avoid AttributeError on malformed STAC responses
- Fix "Confirm configured extents" heading to singular in the managed data guide

Previously, built-in dataset YAMLs were located by walking four directory levels up from datasets.py and appending data/datasets/. This works in a source checkout or editable install but breaks in a wheel install: the package lands in site-packages/ and the project-root data/ directory is never included in the wheel, so list_datasets() crashes with "Path is not a directory".

Move the YAMLs into the package at src/climate_api/data/datasets/ and load them via importlib.resources.files(). importlib.resources is package-aware and resolves correctly whether the package is installed as a regular directory or imported from a zip archive.

User-provided datasets_dir (from CLIMATE_API_CONFIG) continues to use regular Path objects via _load_from_dir(), since that path is always on disk.

…ts, safer conftest teardown

- Raise ValueError (not KeyError) when the Zarr asset is missing or not a dict; all other error paths in open_dataset raise ValueError, so callers catch one exception type
- Inject id into a copy of the link dict instead of mutating the parsed JSON object in-place
- Use os.environ.pop() instead of del in conftest session fixture teardown to avoid KeyError if the env var was already removed by a test's monkeypatch
- Replace next() generator in setup guide with an explicit list so an empty catalog gives an IndexError with clear context rather than StopIteration
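Together, the defensive parsing rules in the two commits above can be sketched as follows. Both helper names are hypothetical. The sketch assumes the catalog is a standard STAC JSON object whose child links use `rel: "child"`. It also skips links without a usable `href` instead of raising, which is an assumption because the commit only says the href is validated.

```python
from typing import Any


def child_datasets(catalog: dict[str, Any]) -> list[dict[str, Any]]:
    """Return copies of the catalog's child links, each with an injected "id"."""
    results = []
    for link in catalog.get("links", []):
        if not isinstance(link, dict) or link.get("rel") != "child":
            continue
        href = link.get("href")
        if not isinstance(href, str) or not href.strip("/"):
            continue  # no usable href, so there is nothing safe to slice an id from
        entry = dict(link)  # copy: never mutate the parsed JSON in place
        entry["id"] = href.rstrip("/").rsplit("/", 1)[-1]
        results.append(entry)
    return results


def zarr_asset(collection: dict[str, Any]) -> dict[str, Any]:
    """Return the collection's "zarr" asset, raising ValueError if absent or malformed."""
    assets = collection.get("assets")
    if not isinstance(assets, dict):
        raise ValueError("STAC collection has no assets mapping")
    asset = assets.get("zarr")
    if not isinstance(asset, dict):
        raise ValueError("STAC collection has no usable 'zarr' asset")
    return asset
```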

…ative path

Walking __file__ four levels up to find data/downloads/ fails when the package is installed with pip because __file__ lands in site-packages/ and the project root is not accessible. The directory may also be non-writable.

Default to $XDG_DATA_HOME/climate-api/downloads (~/.local/share/climate-api/downloads if XDG_DATA_HOME is unset), which is always user-writable. The existing CACHE_OVERRIDE env var continues to work and takes precedence, keeping Docker and dev deployments unchanged.
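The precedence described above can be captured in a small resolver. `default_data_dir` is a hypothetical name. The `CACHE_OVERRIDE/<subdir>` layout is taken from the later ARTIFACTS_DIR description in this PR, and applying it to downloads is an assumption. The sketch ignores a relative XDG_DATA_HOME, as the XDG Base Directory specification requires.

```python
import os
from pathlib import Path


def default_data_dir(subdir: str, override_var: str = "CACHE_OVERRIDE") -> Path:
    """Resolve a user-writable directory for climate-api runtime data."""
    # An explicit override (e.g. Docker or dev setups) always wins.
    override = os.environ.get(override_var)
    if override:
        return Path(override) / subdir
    # The XDG Base Directory spec says unset, empty, or relative values are ignored.
    xdg = os.environ.get("XDG_DATA_HOME", "")
    base = Path(xdg) if os.path.isabs(xdg) else Path.home() / ".local" / "share"
    return base / "climate-api" / subdir
```

A single resolver like this could back DOWNLOAD_DIR, ARTIFACTS_DIR and PYGEOAPI_DIR, so the three paths cannot drift apart.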

abyot

Review Summary

This PR is well-structured and the direction is good. The package/installability work, config model, client, docs, and supporting tests are all substantial improvements.

The main issue is that the installable-package transition is incomplete. Built-in dataset templates and the download cache were decoupled from the repo tree, but three other runtime paths still assume repo-relative writable/readable files. That breaks the new pip/wheel install story and should be fixed before merge.

Findings

src/climate_api/ingestions/services.py:50-52

The artifact store still resolves to a repo-relative path:

```python
from pathlib import Path

DATA_DIR = Path(__file__).resolve().parent.parent.parent.parent / "data"
ARTIFACTS_DIR = DATA_DIR / "artifacts"
ARTIFACTS_INDEX_PATH = ARTIFACTS_DIR / "records.json"
```

On a wheel install this will usually land inside site-packages and be non-writable. ensure_store() will fail on first ingestion or sync. This is a must-fix because the CLI can start but primary operations fail immediately.

Suggested fix:
Use the same XDG-style resolution pattern already applied to DOWNLOAD_DIR.

src/climate_api/publications/services.py:19-24 and src/climate_api/startup.py:16-19

pygeoapi still depends on repo-relative paths in two places.

There are two separate problems:

Writable output:
data/pygeoapi/pygeoapi-config.yml and pygeoapi-openapi.yml are still written into a repo/package-relative directory, which will fail under a wheel install.
Read-only input:
config/pygeoapi/base.yml is still read from a repo-relative path and is not under src/. With the current build setup, it is not guaranteed to be present in the installed wheel. _load_base_config() can fail with FileNotFoundError even if the writable output path is fixed.

Suggested fix:

move writable pygeoapi output to an XDG-writable location
move base.yml into src/climate_api/ and load it via importlib.resources

.env.example:1-5 and src/climate_api/config.py:21-25

The config bootstrap story is brittle for installed CLI usage.

The example config sets:

CLIMATE_API_CONFIG=./climate-api.yaml

get_config_path() resolves this relative to the caller’s current working directory. climate-api.yaml is a top-level repo file, not packaged runtime data. A user who installs the package and runs climate-api outside the repo root will get:

FileNotFoundError: CLIMATE_API_CONFIG not found: /current/cwd/climate-api.yaml

This breaks the intended installed-CLI workflow.

Suggested fix:

update .env.example and docs to make the path semantics explicit
preferably support a more durable bootstrap path, such as an XDG config location or a clearer example-based workflow
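One way to make the path semantics explicit is to report, in the error itself, that a relative value was resolved against the working directory. The sketch below is a hypothetical variant (`locate_config_file`) and not the project's `get_config_path()`. Returning `None` when the variable is unset is an assumption.

```python
import os
from pathlib import Path
from typing import Optional


def locate_config_file(env_var: str = "CLIMATE_API_CONFIG") -> Optional[Path]:
    """Locate the instance config file named by CLIMATE_API_CONFIG."""
    raw = os.environ.get(env_var)
    if not raw:
        return None  # no instance config supplied
    path = Path(raw).expanduser()
    hint = ""
    if not path.is_absolute():
        # Relative values depend on where the CLI happens to be launched from.
        hint = " (relative paths resolve against the current working directory; use an absolute path)"
        path = Path.cwd() / path
    if not path.is_file():
        raise FileNotFoundError(f"{env_var} not found: {path}{hint}")
    return path
```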

climate-api.yaml

Shipping a live default extent is risky.

The committed default extent is Sierra Leone. If a deployer forgets to replace it, the instance runs against the wrong spatial scope silently. This is operationally risky.

Suggested fix:
Rename to climate-api.yaml.example and ignore the live file, mirroring .env.example.

src/climate_api/extents/services.py

get_extent_or_404 may now be dead code.

The GET /extents/{extent_id} route was removed and the instance model is now single-extent. If this helper is no longer used in the ingestion path, remove it. If it is still used indirectly, add a focused test to justify keeping it.

src/climate_api/data_registry/services/datasets.py

cache_info -> ingestion is a breaking change with no migration assist.

The validator now requires ingestion.eo_function, so older custom templates using cache_info fail immediately. The breaking change itself is acceptable, but there is no migration aid.

Suggested fix:
Add a startup-time warning when custom templates contain cache_info, to make the upgrade failure easier to diagnose.
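Here is a minimal sketch of the suggested warning. The status table below marks it as deferred, so this is illustrative only. The logger name and the function are hypothetical. The function returns a flag so callers (and tests) can tell whether it fired.

```python
import logging

logger = logging.getLogger("climate_api.datasets")


def warn_legacy_fields(template: dict, source: str) -> bool:
    """Log a migration hint if a template still uses the pre-rename cache_info block."""
    if "cache_info" in template:
        logger.warning(
            "%s: 'cache_info' is no longer read; rename the block to 'ingestion'",
            source,
        )
        return True
    return False
```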

src/climate_api/data_registry/services/datasets.py

The dataset registry validation message could be more precise.

The current error message conflates:

missing ingestion block
ingestion block present but missing eo_function

This is not a correctness bug, but splitting the messages would improve operator debugging for malformed custom templates.
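Splitting the two cases could look like the sketch below. It uses the field name `function`, which is what `ingestion.eo_function` is renamed to later in this PR. The function name and message wording are hypothetical.

```python
def validate_ingestion(template: dict) -> None:
    """Raise ValueError with a message specific to what is wrong with 'ingestion'."""
    dataset_id = template.get("id", "<unknown>")
    ingestion = template.get("ingestion")
    if ingestion is None:
        raise ValueError(f"Dataset template '{dataset_id}' is missing the 'ingestion' block")
    if not isinstance(ingestion, dict):
        raise ValueError(f"Dataset template '{dataset_id}': 'ingestion' must be a mapping")
    if not ingestion.get("function"):
        raise ValueError(f"Dataset template '{dataset_id}': 'ingestion.function' is required")
```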

src/climate_api/client.py

The new client implementation has a few small design debts.

Not blockers, but worth noting:

list_datasets() derives id from href using string splitting; fragile if link shapes change
each call creates a fresh httpx request rather than reusing a client/session
the 30s timeout is hardcoded

src/climate_api/data_manager/services/downloader.py

The downloader coordinate rename block is correct but subtle.

The coordinate normalization is correct. The reassignment to longitude / latitude immediately after rename is slightly non-obvious and would benefit from a short explanatory comment.

Test Coverage

Overall coverage is strong and the new tests are useful.

Notable gaps:

no tests for artifact-store path resolution / XDG writable behavior
no tests for pygeoapi base-config packaging/runtime path
no tests for CLI bootstrap with CLIMATE_API_CONFIG outside repo root
no focused test for whether get_extent_or_404 remains live code
if validation messages are split, add a test for missing ingestion block vs missing ingestion.eo_function

…safe

Fixes four issues that would break a pip-installed deployment:

- ingestions/services.py: ARTIFACTS_DIR now resolves to XDG_DATA_HOME/climate-api/artifacts (or CACHE_OVERRIDE/artifacts) instead of a package-relative path.
- publications/services.py + startup.py: PYGEOAPI_DIR now resolves to XDG_DATA_HOME/climate-api/pygeoapi. startup.py imports the constants from publications.services rather than computing its own path.
- publications/services.py: _load_base_config() now reads base.yml via importlib.resources rather than a __file__-relative path. base.yml is moved into src/climate_api/data/pygeoapi/ so it is bundled inside the wheel.
- climate-api.yaml is renamed to climate-api.yaml.example, and climate-api.yaml is added to .gitignore, mirroring the .env.example pattern. Deployers copy the example before editing, so their live extent config never lands in version control.

Also renames ingestion.eo_function to ingestion.function throughout (dataset YAMLs, downloader, data registry validation, docs, tests), adds a note to downloader.py explaining the coordinate rename invariant, and documents that CLIMATE_API_CONFIG must be an absolute path when running the installed CLI from a directory other than the repo root.

Tests added: XDG path resolution for DOWNLOAD_DIR, ARTIFACTS_DIR, and PYGEOAPI_DIR; base.yml loadable from package; datasets_dir resolved relative to the config file location (covers pip install outside the repo).

turban · 2026-05-07T08:28:51Z

All findings from @abyot's review (4241378691) have been addressed. Here is the complete status:

| # | Finding | Resolution |
|---|---------|------------|
| 1 | `ARTIFACTS_DIR` resolves to repo-relative path, breaks pip install | Fixed: XDG resolution in `ingestions/services.py` (same pattern as `DOWNLOAD_DIR`) |
| 2 | `pygeoapi` writable output paths + `base.yml` not packaged | Fixed: `PYGEOAPI_DIR` moved to XDG; `base.yml` bundled inside the package and loaded via `importlib.resources` |
| 3 | `.env.example` path semantics brittle for installed CLI | Fixed: updated `.env.example` and `docs/setup_guide.md` to clarify that an absolute path is required when running the `climate-api` CLI from outside the repo root |
| 4 | `climate-api.yaml` ships a live default extent | Fixed: renamed to `climate-api.yaml.example`, `climate-api.yaml` added to `.gitignore`, mirroring the `.env.example` pattern |
| 5 | `get_extent_or_404` may be dead code | Not an issue: still used in `ingestions/routes.py:34` |
| 6 | No migration aid for `cache_info → ingestion` rename | Deferred: noted as a known gap; out of scope for this PR |
| 7 | Validation message conflates missing ingestion block vs missing function | Fixed: separate error messages; tests added for both cases |
| 8 | Client: `id` from string splitting, fresh httpx per call, hardcoded timeout | Fixed: `Client` now holds a persistent `httpx.Client` (connection reuse), accepts a configurable `timeout` parameter, and extracts `id` via `urlparse` instead of raw string split. `Client` also implements `__enter__`/`__exit__` as a context manager. |
| 9 | Coordinate rename block in `downloader.py` lacks a comment | Fixed: comment added explaining the invariant and why downstream readers depend on it |

Test coverage gaps (from review):

Artifact store XDG path resolution — added
pygeoapi base-config packaging — added
CLIMATE_API_CONFIG outside repo root — added (datasets_dir resolved relative to config file)
Split validation messages — added (both missing ingestion block and missing function cases)
_id_from_href — added (query string, fragment, trailing slash cases)
Client context manager and configurable timeout — added

docs: update README for current API state

5e95250

Remove DHIS2 connection string references from setup section, add /extents and /datasets to endpoint table, and expand STAC example to show catalog discovery before opening a dataset with xarray.

turban marked this pull request as draft May 5, 2026 13:34

turban added 5 commits May 5, 2026 15:44

docs: add user guide and usage examples

3c2f40d

Add docs/user_guide.md covering STAC-based dataset discovery and xarray access, two runnable example scripts in examples/, and update implementation-status.md to reflect PRs #51, #54, and #55 as merged.

fix: correct coordinate names and consolidated flag in examples

a8411cb

Datasets use x/y dimension names not latitude/longitude. Direct access example now reads open_kwargs from the STAC collection rather than hardcoding consolidated=False, which fails for Zarr v3 stores.

docs: add setup guide with Rwanda example

0b0bb22

Step-by-step guide covering extent configuration, environment setup, first ingestion, and ERA5-Land DestinE authentication. Links added from README and user_guide.md.

fix: add missing docstrings and fix formatting in examples

f32eb82

turban requested a review from Copilot May 5, 2026 14:18

Copilot started reviewing on behalf of turban May 5, 2026 14:19

turban added 10 commits May 5, 2026 21:57

docs: fix Python version requirement in setup guide

fb208ea

docs: use python -m uvicorn in pip and conda setup instructions

f97ec86

docs: open first catalog child instead of hardcoded SLE collection in…

c0eb232

… README example

fix: detect valid_time vs time dimension in examples

57b6196

fix: handle lon/lat coordinate names alongside longitude/latitude and…

0805636

… x/y

docs: document coordinate name variants and use dynamic selection in …

8e1eb1f

…user guide

docs: fix WorldPop variable name from pop to pop_total

755214d

docs: correct ERA5-Land lag to 120 hours per lag_hours config

19f5812

docs: add make as a prerequisite in setup guide

48952f5

fix: correct Freetown coordinate label from E to W

6427e90

turban requested a review from Copilot May 5, 2026 20:09

Copilot started reviewing on behalf of turban May 5, 2026 20:09

turban added 3 commits May 5, 2026 22:17

docs: replace hardcoded SLE collection URLs with catalog-discovered h…

c8952e6

…refs in user guide

fix: discover dataset from catalog instead of hardcoding SLE id in za…

0cb9c42

…rr_direct_access.py

chore: update .env.example to remove DHIS2 and CDS API vars, add Dest…

5da532a

…inE note

turban requested a review from Copilot May 5, 2026 20:21

Copilot started reviewing on behalf of turban May 5, 2026 20:21

chore: document all env vars in .env.example, grouped by purpose

a7543c8

Copilot started reviewing on behalf of turban May 6, 2026 10:21

turban requested a review from Copilot May 6, 2026 10:40

Copilot started reviewing on behalf of turban May 6, 2026 10:41

turban requested a review from Copilot May 6, 2026 10:59

Copilot started reviewing on behalf of turban May 6, 2026 11:00

turban requested a review from Copilot May 6, 2026 11:15

Copilot started reviewing on behalf of turban May 6, 2026 11:16

turban added 2 commits May 6, 2026 14:26

turban requested a review from Copilot May 6, 2026 12:43

Copilot started reviewing on behalf of turban May 6, 2026 12:43

turban requested a review from abyot May 6, 2026 12:51

turban marked this pull request as ready for review May 6, 2026 12:51

turban mentioned this pull request May 6, 2026

Implement temporal resampling for derived managed datasets [CLIM-679] #63

Open

abyot reviewed May 7, 2026

turban requested a review from abyot May 7, 2026 08:12

abyot approved these changes May 7, 2026

turban merged commit 8011eb6 into main May 7, 2026
1 check passed

This was referenced May 7, 2026

PyPI release: make climate-api installable via pip install climate-api #62

Open

fix: update Dockerfile for post-PR#59 package layout #69

Merged

turban merged 94 commits into main from CLIM-683


3 participants
