STAC catalogue: investigate pygeoapi vs dedicated /stac endpoint #50

@turban

Background

STAC (SpatioTemporal Asset Catalog) is the natural discovery and direct-access layer for GeoZarr datasets. A STAC collection entry gives clients everything they need to open a dataset directly with xarray — the Zarr URL, open_kwargs, chunk dimensions, and variable metadata — without any server-side materialisation.

dynamical.org is a good reference for the developer experience we are aiming for. Their catalog is browsable at /stac/catalog.json and via the Radiant Earth STAC Browser. Opening a dataset is a single line:

import xarray as xr

ds = xr.open_zarr("https://data.dynamical.org/noaa/gefs/forecast-35-day/latest.zarr")

We want the Climate API to offer the same simplicity.


Collection-per-dataset pattern

Because our datasets are aligned (same spatial extent, consistent time axis per variable), the right STAC pattern is one Collection per dataset with no individual Items. The Collection asset points directly at the Zarr store root. This avoids the overhead of maintaining thousands of items for what is one logical dataset.

A minimal collection entry:

GET /stac/collections/chirps3_precipitation_daily_sle
{
  "type": "Collection",
  "id": "chirps3_precipitation_daily_sle",
  "stac_extensions": [
    "https://stac-extensions.github.io/datacube/v2.2.0/schema.json",
    "https://stac-extensions.github.io/xarray-assets/v1.0.0/schema.json"
  ],
  "extent": { ... },
  "assets": {
    "zarr": {
      "href": "https://climate-api.example.org/zarr/chirps3_precipitation_daily_sle",
      "type": "application/vnd+zarr",
      "roles": ["data"],
      "xarray:open_kwargs": {"consolidated": true}
    }
  },
  "cube:dimensions": {
    "x": {"type": "spatial", "axis": "x", "extent": [-13.5, -10.1]},
    "y": {"type": "spatial", "axis": "y", "extent": [6.9, 10.0]},
    "time": {"type": "temporal", "extent": ["1981-01-01", null], "step": "P1D"}
  },
  "cube:variables": {
    "precipitation": {"type": "data", "dimensions": ["time", "y", "x"], "unit": "mm/day"}
  }
}

Note: there is an active community discussion about replacing cube:variables with a bands-style field for better alignment with the broader STAC spec — worth tracking before locking in the schema.
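
To make the client side concrete, here is a minimal sketch of how a consumer could pull the Zarr URL and open hints out of the collection entry above. The helper name zarr_open_args and the default asset key "zarr" are assumptions for illustration; the dictionary is a trimmed copy of the example entry, not a live response.

```python
from typing import Any


def zarr_open_args(
    collection: dict[str, Any], asset_key: str = "zarr"
) -> tuple[str, dict[str, Any]]:
    """Return (href, open_kwargs) for a collection-level Zarr asset."""
    try:
        asset = collection["assets"][asset_key]
    except KeyError as exc:
        raise KeyError(f"collection has no asset named {asset_key!r}") from exc
    # xarray:open_kwargs is optional; fall back to no extra arguments.
    return asset["href"], dict(asset.get("xarray:open_kwargs", {}))


# Trimmed copy of the example collection entry (hypothetical host).
collection = {
    "id": "chirps3_precipitation_daily_sle",
    "assets": {
        "zarr": {
            "href": "https://climate-api.example.org/zarr/chirps3_precipitation_daily_sle",
            "type": "application/vnd+zarr",
            "roles": ["data"],
            "xarray:open_kwargs": {"consolidated": True},
        }
    },
}

href, open_kwargs = zarr_open_args(collection)
# With xarray installed and the store reachable, the dataset would open with:
#   import xarray as xr
#   ds = xr.open_zarr(href, **open_kwargs)
```

Keeping the lookup separate from the xr.open_zarr() call means the same helper works whether the collection JSON came from an HTTP request, a file, or a STAC client library.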


openEO client compatibility

Exposing a proper STAC endpoint gives openEO client compatibility for free — no openEO server needed. Data scientists can use load_stac() locally against the Climate API's STAC endpoint:

from openeo.local import LocalConnection

# Client-side processing; requires the openeo[localprocessing] extra.
conn = LocalConnection("./")
cube = conn.load_stac(
    "https://climate.dhis2.org/stac/collections/chirps3_precipitation_daily_sle",
    spatial_extent={"west": -13.5, "east": -10.1, "south": 6.9, "north": 10.0},
    temporal_extent=["2024-01-01", "2024-12-31"],
)

The client reads the Zarr store URL from the STAC asset directly — no Climate API processing involved.


The question: pygeoapi vs dedicated /stac endpoint

Option A — pygeoapi STAC provider

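pygeoapi ships built-in STAC support, so this option needs configuration rather than new code. One thing to verify: pygeoapi's documentation describes STAC publishing through its FileSystem provider, so it is not yet clear that the xarray backend used for OGC API EDR and Coverages can serve as the STAC source.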

Pros:

  • No extra code to maintain
  • Consistent with the rest of the pygeoapi surface

Cons:

  • STAC lives at /ogcapi/stac, not /stac — less discoverable
  • pygeoapi's STAC implementation may not support collection-level Zarr assets and the xarray/datacube extensions needed for direct xr.open_zarr() access
  • Harder to control the exact STAC output shape

Option B — Dedicated /stac endpoint

Implement a lightweight STAC endpoint directly in FastAPI alongside the existing routes.

Pros:

  • Clean /stac path — matches community conventions and is browsable in Radiant Earth STAC Browser
  • Full control over STAC output, including xarray and datacube extensions
  • /zarr/{dataset_id} already exists as the natural asset href target
  • xstac can auto-generate STAC metadata from an xarray Dataset — could slot into the ingestion pipeline to publish STAC on ingest

Cons:

  • Additional code to maintain
  • Need to keep STAC entries in sync with published artifacts
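
As a rough idea of how small Option B could be, the sketch below assembles a collection entry from metadata the ingestion pipeline already knows. Everything here is an assumption for illustration: the function name build_collection, its parameters, and the xarray-assets schema URL are not taken from the existing codebase, and the fields STAC core also requires (stac_version, description, license, extent, links) are left out for brevity. xstac would derive the cube:* fields from a real xarray Dataset; this hand-rolled version only shows the shape of the output.

```python
from typing import Any

DATACUBE_EXT = "https://stac-extensions.github.io/datacube/v2.2.0/schema.json"
# Assumed schema URL for the xarray-assets extension.
XARRAY_EXT = "https://stac-extensions.github.io/xarray-assets/v1.0.0/schema.json"


def build_collection(
    dataset_id: str,
    zarr_base_url: str,
    x: list[float],
    y: list[float],
    time_start: str,
    variables: dict[str, str],
) -> dict[str, Any]:
    """Assemble a partial collection-level STAC entry for one Zarr dataset.

    x and y are coordinate values; variables maps variable name to unit.
    """
    return {
        "type": "Collection",
        "id": dataset_id,
        "stac_extensions": [DATACUBE_EXT, XARRAY_EXT],
        "assets": {
            "zarr": {
                "href": f"{zarr_base_url.rstrip('/')}/{dataset_id}",
                "type": "application/vnd+zarr",
                "roles": ["data"],
                "xarray:open_kwargs": {"consolidated": True},
            }
        },
        "cube:dimensions": {
            # Extents are taken from the coordinate values as given.
            "x": {"type": "spatial", "axis": "x", "extent": [min(x), max(x)]},
            "y": {"type": "spatial", "axis": "y", "extent": [min(y), max(y)]},
            # A null end marks the time axis as open-ended.
            "time": {"type": "temporal", "extent": [time_start, None]},
        },
        "cube:variables": {
            name: {"type": "data", "dimensions": ["time", "y", "x"], "unit": unit}
            for name, unit in variables.items()
        },
    }


entry = build_collection(
    "chirps3_precipitation_daily_sle",
    "https://climate-api.example.org/zarr/",
    x=[-13.5, -11.8, -10.1],
    y=[10.0, 8.45, 6.9],
    time_start="1981-01-01",
    variables={"precipitation": "mm/day"},
)
```

A FastAPI route for /stac/collections/{dataset_id} could return this dictionary directly. Generating the entry at ingest time, rather than storing it separately, is one way to reduce the sync problem listed under Cons.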

Key extensions to support

Extension      | Purpose
Datacube       | cube:dimensions, cube:variables (describe variables and dimensions)
xarray-assets  | xarray:open_kwargs, xarray:storage_options (direct open hints)
Link Templates | Browse multiscale pyramid subgroups (/0, /1, …) without duplicating assets
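
Whichever option is chosen, a cheap sanity check is that every prefixed field in a collection has its extension listed in stac_extensions, since validators typically only check the schemas that are declared. The sketch below is one possible way to do that; the prefix-to-slug mapping and the function names are assumptions for illustration, not part of the STAC specification.

```python
from typing import Any

# Field prefix -> slug expected in the declared extension schema URL.
# This mapping is an assumption for the sketch, not taken from the STAC spec.
PREFIX_TO_SLUG = {"cube": "datacube", "xarray": "xarray-assets"}


def used_prefixes(node: Any) -> set[str]:
    """Collect the prefix of every 'prefix:name' key in a nested JSON value."""
    found: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            prefix, sep, _ = key.partition(":")
            if sep:
                found.add(prefix)
            found |= used_prefixes(value)
    elif isinstance(node, list):
        for item in node:
            found |= used_prefixes(item)
    return found


def undeclared_extensions(collection: dict[str, Any]) -> list[str]:
    """Return slugs of known extensions that are used but not declared."""
    declared = collection.get("stac_extensions", [])
    missing = []
    for prefix in sorted(used_prefixes(collection)):
        slug = PREFIX_TO_SLUG.get(prefix)
        if slug and not any(f"/{slug}/" in url for url in declared):
            missing.append(slug)
    return missing


# Uses xarray:open_kwargs but declares only the datacube extension.
collection = {
    "stac_extensions": [
        "https://stac-extensions.github.io/datacube/v2.2.0/schema.json"
    ],
    "assets": {"zarr": {"xarray:open_kwargs": {"consolidated": True}}},
    "cube:dimensions": {},
}
missing = undeclared_extensions(collection)
```

A check like this could run in the ingestion pipeline or in a test suite, so a missing declaration is caught before a collection is published.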
