Skip to content

Add NISAR azimuth-block splitting for parallel full-frame processing#717

Open
mgovorcin wants to merge 14 commits into
isce-framework:mainfrom
mgovorcin:azimuth-block
Open

Add NISAR azimuth-block splitting for parallel full-frame processing#717
mgovorcin wants to merge 14 commits into
isce-framework:mainfrom
mgovorcin:azimuth-block

Conversation

@mgovorcin
Copy link
Copy Markdown

@mgovorcin mgovorcin commented May 22, 2026

Adds a block-as-burst pattern so NISAR full-frame GSLCs can be processed in parallel using the existing burst pool.

For burst inputs nothing changes. For NISAR (non-burst) inputs, when input_options.azimuth_blocks > 1 the frame is split into N azimuth blocks at the dispatch layer, each block is processed as a synthetic burst through ProcessPoolExecutor, and per-block outputs are stitched by the existing stitching_bursts.run step. No changes to wrapped_phase.run or the stitcher are required.

Change Summary

  • workflows/_block_split.py (new)split_frame_into_blocks() divides the pixel grid into N azimuth blocks with a halo; _min_halo_rows() auto-computes the minimum safe overlap from half-window, similarity search radius, and stride settings.
  • workflows/config/_common.pyInputOptions gains azimuth_blocks: int = 1 and halo_rows: Optional[int] = None.
  • workflows/displacement.py — extracts dispatch into a testable _prepare_grouped_inputs() function; sets HDF5_USE_FILE_LOCKING=FALSE before spawning workers when multiple processes share the same HDF5 files (prevents hangs on NFS/Lustre).
  • workflows/_utils.py_create_burst_cfg accepts an optional bounds parameter to set output_options.bounds per block.

Typical config for a 4-block run:

input_options:
  azimuth_blocks: 4
worker_settings:
  n_parallel_bursts: 4
  threads_per_worker: 2

CLAUDE NOTES

└── Process #k (one per active "burst" / NISAR block)
    │
    ├── ThreadPoolExecutor (size = workers_per_burst = n_parallel_bursts // active_bursts)
    │   └── threads doing GDAL/HDF5 read-ahead inside EagerLoader
    │
    └── Main thread: per-output-block phase-linking
        ├── JAX/XLA host pool (default = os.cpu_count())
        │   └── parallelizes vmap'd ops across pixels
        └── BLAS via OMP_NUM_THREADS = threads_per_worker
            └── multi-threads eigh, solve, matmul inside JAX kernels

Key insight: n_parallel_bursts is dual-purpose. It's the outer pool size AND a budget that gets re-allocated to inner I/O workers when there are fewer "bursts" than pool slots.

n_parallel_bursts = 8, with N bursts available:
N = 1 (FULL) → 1 active process × 8 inner I/O threads
N = 4 → 4 active processes × 2 inner I/O threads each
N = 8 (BLOCKED) → 8 active processes × 1 inner I/O thread each

FULL vs BLOCKED

Aspect FULL (azimuth_blocks=1) BLOCKED (azimuth_blocks=N)
Processes 1 N (8 typical)
GIL Locked to 1 interpreter Escaped — N interpreters
JAX kernels Parallelize across pixels via vmap (within 1 block) Same per block, run N in parallel across blocks
Python control flow Strictly serial sweep through frame N serial sweeps in parallel
HDF5 reads 1 stream N concurrent streams
JIT compile Once, ~10–30 s N times in parallel, ~30 s wall
Working set Full frame in 1 process ~1/N of frame in each process

Each worker still uses ~1 core; you just have N of them running simultaneously on different rows.


Per-Knob Impact

n_parallel_bursts

  • Bigger = more outer pool slots (capped by available bursts)
  • For NISAR: this is the number of blocks running in parallel
  • Set equal to azimuth_blocks for maximum block parallelism
  • Typical sweet spot: both at 8

threads_per_worker

Sets:

  • OMP_NUM_THREADS
  • MKL_NUM_THREADS

Controls BLAS threads inside JAX kernels:

  • eigh
  • solve
  • matmul

Separate from JAX/XLA’s own thread pool.

Typical values:

  • 2 → reasonable default
  • 1 → if CPU constrained
  • 4+ → can help in FULL mode where one process can use more parallelism

JAX / XLA

JAX’s XLA backend maintains its own thread pool, sized by default to:

os.cpu_count()

Local benchmark full frame vs blocked results

displacement.run end-to-end through PS+PL+stitch benchmark run with taskset -c 0-33

GSLCs : 7 (Track 042 Frame 070 Descending orbit)
half_window : 5
num_blocks : 8
n_parallel_bursts : 8 (both modes)
threads_per_worker : 4 (both modes)
block_shape: 2048x2048 (both modes)
active threads FULL : 4
active threads BLK : 32

FULL wall = 8387.3s (139.79 min)
BLOCKED wall = 2159.2s ( 35.99 min)
SPEEDUP (BLOCKED vs FULL) : 3.88×

Checklist

  • The pull request title is a good summary of the changes - it will be used in the changelog
  • Unit tests for the changes exist
  • Tests pass on CI
  • Documentation reflects the changes where applicable
  • My PR is ready to review

mgovorcin and others added 2 commits May 21, 2026 22:22
Splits a single non-OPERA-burst frame (e.g. NISAR GSLC) into N azimuth
blocks processed in parallel as synthetic "bursts". Each block reuses
the full input file list but applies its own output_options.bounds, so
wrapped_phase.run masks per block via _get_mask. The existing burst pool
runs the blocks concurrently and stitching_bursts.run mosaics them.

Changes
-------
- New _block_split.split_frame_into_blocks: computes per-block (bounds,
  epsg) from cfg georef with a minimum-halo formula
    max(half_window_y,
        similarity_search_radius * stride_y,
        (corr_window_y // 2) * stride_y) + 5
  so block edges aren't cropped by SHP/similarity/stitcher windows.
- displacement.py: new module-level _GroupedInputs dataclass and
  _prepare_grouped_inputs helper that detects OPERA vs non-OPERA-burst
  mode from group_by_burst behavior. In non-OPERA mode, optional file
  lists (amp_disp, amp_mean, layover_shadow) are broadcast to every
  block id so second-batch reuse works.
- _create_burst_cfg gains a bounds= kwarg that writes
  output_options.bounds/_epsg per block. OPERA path passes None and is
  unchanged.
- config.InputOptions adds azimuth_blocks: int = 1 (default = no
  splitting, OPERA path unchanged) and halo_rows: int | None.
When running azimuth-block mode with n_parallel_bursts > 1, multiple
worker processes read the same NISAR GSLC (HDF5) files concurrently
at different spatial extents. On NFS/Lustre filesystems this triggers
HDF5 file-locking hangs. Set HDF5_USE_FILE_LOCKING=FALSE (via setdefault,
so an explicit caller setting is honoured) before spawning workers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
scottstanie pushed a commit to scottstanie/dolphin that referenced this pull request May 24, 2026
Replaces the cube → per-layer tif export with a single per-layer VRT
shim that points at ``ZARR:cube.zarr:/<var>:i`` as its source. GDAL's
ZARR driver (3.10+) reads the slice natively, so the existing tif-based
downstream pipeline (interferogram formation, stitching, unwrap,
timeseries) sees ordinary 2D rasters with no pixel-data duplication.

Highlights:
- Default `zarr_format=2` for the cube (GDAL's ZARR read path supports
  it cleanly; v3's `bytes` codec isn't implemented on the GDAL read side
  yet). `zarr_format=3` is still selectable via writer kwarg.
- New `emit_layer_vrts`, `zarr_subdataset_uri`, `layer_vrt_path` helpers
  in `dolphin.io._geozarr`. VRTs use absolute paths so they survive the
  per-ministack → top-level rename in `sequential.py`.
- `_ZarrLayerStack.export_tifs` now writes VRT shims instead of dumping
  layer data into tif files. `_make_stack_output` returns `.slc.vrt`
  paths in GEOZARR mode.
- Skip the GeoTIFF-only `repack_rasters` passes in GEOZARR mode (the
  cube is already chunked and compressed; repack would create one tif
  per VRT, undoing the no-duplication win).
- `_ensure_data_array` rejects the reserved coord names (x/y/spatial_ref)
  and checks shape against any existing array of the same name to catch
  accidental writes that would silently corrupt a coord.
- `wrapped_phase.py` cached/skip path and `sequential._get_outputs_from_folder`
  also glob for `.vrt` so GEOZARR runs can resume.

Parallelism: matches PR isce-framework#717. Each "burst" (or synthetic-burst
azimuth-split block) process gets its own work directory and its own
cube — no cross-process zarr coordination. Inside a process the cube
writer drains on a single background thread, the direct analog of
`BackgroundStackWriter`'s single-thread tif feeder, so the parallel
ThreadPoolExecutor in `single.py` sees the same write-side throughput
either way.

Tests updated:
- `test_io_geozarr.py`: added VRT-readable-via-rasterio, VRT-survives-move,
  shape-mismatch rejection, reserved-coord-name rejection, and
  default-is-v2 cases.
- `test_workflows_single.py::test_sequential_geozarr` and
  `test_workflows_displacement.py::test_displacement_run_geozarr` now
  assert *no* per-date `.slc.tif` are written, that VRTs are emitted,
  and that VRT reads round-trip to the cube layer.

https://claude.ai/code/session_01UC8HNQWpNsKBmBNhdMhsLu
scottstanie and others added 12 commits May 25, 2026 14:22
`format_nc_filename` always emitted `NETCDF:"file":"//ds"`. CF-compliant
HDF5s (OPERA CSLCs, COMPASS static_layers) work fine because GDAL's
NETCDF driver accepts them, but raw HDF5 files without CF metadata
(NISAR GSLCs) hit `RuntimeError: ... No such file or directory` when
GDAL's NETCDF driver refuses to open the subdataset.

The real split isn't by extension — it's whether the file carries CF
metadata. NISAR raw HDF5 doesn't; every other .h5 we care about does.
Use a basename prefix check: files starting with NISAR_ get the HDF5
driver, everything else gets NETCDF. Extend the allowlist if/when
another non-CF raw-HDF5 source appears.

Backport of 7e8f3a3 + f134bcc onto upstream main for the
azimuth-block-fixes branch.
Splits a single non-OPERA-burst frame (e.g. NISAR GSLC) into N azimuth
blocks processed in parallel as synthetic "bursts". Each block reuses
the full input file list but applies its own output_options.bounds, so
wrapped_phase.run masks per block via _get_mask. The existing burst pool
runs the blocks concurrently and stitching_bursts.run mosaics them.

Changes
-------
- New _block_split.split_frame_into_blocks: computes per-block (bounds,
  epsg) from cfg georef with a minimum-halo formula
    max(half_window_y,
        similarity_search_radius * stride_y,
        (corr_window_y // 2) * stride_y) + 5
  so block edges aren't cropped by SHP/similarity/stitcher windows.
- displacement.py: new module-level _GroupedInputs dataclass and
  _prepare_grouped_inputs helper that detects OPERA vs non-OPERA-burst
  mode from group_by_burst behavior. In non-OPERA mode, optional file
  lists (amp_disp, amp_mean, layover_shadow) are broadcast to every
  block id so second-batch reuse works.
- _create_burst_cfg gains a bounds= kwarg that writes
  output_options.bounds/_epsg per block. OPERA path passes None and is
  unchanged.
- config.InputOptions adds azimuth_blocks: int = 1 (default = no
  splitting, OPERA path unchanged) and halo_rows: int | None.
When running azimuth-block mode with n_parallel_bursts > 1, multiple
worker processes read the same NISAR GSLC (HDF5) files concurrently
at different spatial extents. On NFS/Lustre filesystems this triggers
HDF5 file-locking hangs. Set HDF5_USE_FILE_LOCKING=FALSE (via setdefault,
so an explicit caller setting is honoured) before spawning workers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The original azimuth-block splitter set each block's
``output_options.bounds`` to its read extent (central rows + halo) and
wrote phase-linking outputs over that full extent. Adjacent blocks then
*overlapped* in the halo region. ``stitching_bursts.run`` → ``gdal_merge.py``
does last-pixel-wins on overlap with no feathering, so on every interior
boundary about half of the overlap band ended up sourced from a block's
halo (edge-degraded) rather than its central (clean) rows.

Fix: each block runs phase linking over ``read_bounds`` (central + halo)
exactly as before — the halo is still loaded so SHP/EVD have clean
neighborhood context — but after ``wrapped_phase.run`` returns, the
stitching-bound outputs are cropped to ``central_bounds`` (no halo) via
``gdal.Translate``. Adjacent centrals are disjoint by construction, so
the stitcher has nothing to choose between. ``comp_slcs`` are not
cropped — they feed back into the *same* block's next ministack and
must match its read extent.

Other cleanups in the same diff:

- ``_block_split.py``: new ``BlockBounds`` dataclass carrying both
  read/central bounds + EPSG. Replace inline NETCDF: URI construction
  and raw GDAL/OSR calls with ``io.format_nc_filename``,
  ``io.get_raster_crs``, ``io.get_raster_xysize``, ``io.get_raster_gt``.
  Drop the dead AUTHORITY-code check (was unreachable under
  ``osr.UseExceptions``) in favor of a single try/except with a clear
  "geocoded input required" message.

- ``_utils.py``: ``_create_burst_cfg`` takes ``BlockBounds | None`` and
  clears top-level ``bounds_wkt`` before stamping per-block bounds, so
  cfgs that set ``bounds_wkt`` globally don't fail pydantic validation
  on every block.

- ``displacement.py``: ``HDF5_USE_FILE_LOCKING=FALSE`` moved into the
  ``ProcessPoolExecutor`` initializer (``_worker_init``) so the env
  mutation lives in worker processes, not the parent. Drop the
  redundant ``num_workers`` shadow of ``num_parallel``.

Tests (``tests/test_workflows_block_split.py``): 8 new unit tests
covering num_blocks=1, central tiling, halo overlap math at frame
edges and interior, halo-too-large raise, missing-CRS friendly
message, ``crop_to_central`` in-place rewrite, and frozen dataclass.
On real NISAR GSLCs, ``io.get_raster_crs`` returned an empty CRS because
GDAL's HDF5 driver doesn't read the CF ``grid_mapping`` attribute — that
machinery is in the NETCDF driver, which won't open NISAR (no CF
metadata at the file level). Result: the friendly error from my last
commit ("Could not read a usable EPSG... pass num_blocks=1") fired on
every NISAR run, defeating the whole point of the splitter.

Fix: dispatch in ``_read_grid_metadata``. NISAR-prefixed .h5 files now
go through ``_read_nisar_grid_metadata``, which reads via h5py from the
``grids/frequencyA`` group:

- ``projection`` scalar -> EPSG (uint32 or bytes)
- ``xCoordinates`` / ``yCoordinates`` -> grid extent (cell centers,
  backed off by ``x/yCoordinateSpacing/2`` for the UL-edge geotransform)

Honors the dataset's ``grid_mapping`` attribute if present, falling
back to the conventional ``projection`` name. Same convention
``opera_utils.nisar._info.get_nisar_bbox`` uses.

Everything else (.tif, OPERA CSLC .h5, .nc) still routes through the
dolphin GDAL helpers with ``format_nc_filename``.

Also drops a dead defensive check for a missing ``subdataset`` on NISAR
input — pydantic already enforces this on any .h5/.nc cfg.

Tests: add ``_make_nisar_h5`` helper that builds a minimal NISAR-shaped
HDF5 (HH + projection + coord datasets + grid_mapping attr), plus two
new cases (metadata roundtrip via h5py path, full splitter on a
synthetic NISAR frame).
GDAL's HDF5 driver doesn't read NISAR's CF ``grid_mapping`` — it
returns an identity geotransform and empty projection regardless of
what's in the file. On ``main`` this is invisible because nothing
rasterizes against a NISAR VRTStack's CRS, but the moment the
azimuth-block splitter sets ``output_options.bounds`` to drive
``masking.create_bounds_mask``, the mask comes back all-zero (the
UTM polygon doesn't intersect an identity-GT pixel grid) and the
run fails with ``MaskingError: No valid pixels left in mask``.

Fix in ``VRTStack.__init__``: after the usual GDAL-based metadata
read, if the result is the identity-GT + empty-projection signature
*and* the input is NISAR-prefixed .h5, override ``self.gt`` /
``self.proj`` / ``self.srs`` from a focused h5py read. The metadata
then flows into the .vrt on disk via the existing ``_write()`` path,
so every consumer (bounds mask, nodata mask, stitching, downstream
GDAL opens) sees the right CRS and geotransform.

Mechanics:

- New ``io._core.read_nisar_grid_metadata(filename, subdataset)``
  returns ``(nx, ny, gt, epsg, wkt)``. Reads the ``projection``
  sibling dataset for the EPSG and ``xCoordinates`` /
  ``yCoordinates`` for the cell-center grid, backing off by
  ``x/yCoordinateSpacing/2`` to anchor the geotransform at the
  upper-left edge. Honors the data dataset's ``grid_mapping``
  attribute if present.
- New ``io._core._is_nisar_h5(path)`` mirrors
  ``format_nc_filename``'s NISAR detection (basename prefix).
- ``_block_split.py`` drops its private copies and uses the shared
  helpers, so the NISAR convention lives in one place.

Test: new ``test_vrtstack_patches_nisar_metadata`` builds a minimal
NISAR-shaped HDF5 and verifies ``VRTStack`` exposes real CRS+GT
both as attributes and in the resulting ``.vrt`` on disk. All 11
block-split tests + the broader workflows/io suites (119 total)
pass.
Stitching failed with ``RuntimeError: Recursion detected`` from
gdal_merge when merging per-block interferograms. Root cause: dolphin's
per-block IFGs are virtual interferograms (``.vrt``), and
``crop_to_central`` was calling ``gdal.Translate(tmp.vrt, src.vrt,
projWin=...)`` then renaming ``tmp`` over ``src``. ``gdal.Translate``
on a VRT input writes an output VRT whose ``<SourceFilename>`` is the
*input path string*, so after the rename the file references itself —
gdal_merge follows the chain and trips GDAL's recursion guard.

Fix: always materialize through ``format="GTiff"``. For ``.tif`` inputs
the in-place rewrite still works; for ``.vrt`` inputs the result is a
sibling ``.tif`` and the original ``.vrt`` is removed (no self-reference
possible). ``crop_to_central`` now returns the output ``Path`` so
callers can substitute the possibly-extension-changed path back into
their file lists. ``displacement.run`` does that via list-comprehension
reassignment over each block's stitching-bound output lists.

The cost is materializing per-block IFGs (extra disk + the
multiply-conjugate they encoded virtually). For typical NISAR runs
with ~20 ifgs per ministack and 8 blocks this is ~tens of GB — not
ideal, but small relative to the GSLC stack and acceptable as the
correctness-first fix. A leaner alternative would be a VRT-XML-level
crop (adjust ``SrcRect``/``DstRect`` in place) which preserves the
virtual nature; deferred.

Tests: new ``test_crop_to_central_materializes_vrt`` builds a VRT
pointing at a GeoTIFF (mirrors how the IFG VRTs are structured) and
asserts the cropped output is a non-recursive GTiff at the central
extent. Updated ``test_crop_to_central_replaces_in_place`` to also
check the returned path.
Azimuth block fixes. Thank you Sara
…nputs

The merge left two copies of the dataclass and its builder; the stale copy
used dict[str, tuple[Bbox, int]] for block_bounds and triggered mypy
no-redef + assignment/arg-type errors. Drop the duplicate and the now-unused
Bbox import.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mgovorcin mgovorcin marked this pull request as ready for review May 27, 2026 19:11
@mirzaees mirzaees self-requested a review May 27, 2026 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants