Add NISAR azimuth-block splitting for parallel full-frame processing by mgovorcin · Pull Request #717 · isce-framework/dolphin

mgovorcin · 2026-05-22T05:59:50Z

Adds a block-as-burst pattern so NISAR full-frame GSLCs can be processed in parallel using the existing burst pool.

For burst inputs nothing changes. For NISAR (non-burst) inputs, when input_options.azimuth_blocks > 1 the frame is split into N azimuth blocks at the dispatch layer, each block is processed as a synthetic burst through ProcessPoolExecutor, and per-block outputs are stitched by the existing stitching_bursts.run step. No changes to wrapped_phase.run or the stitcher are required.

Change Summary

workflows/_block_split.py (new) — split_frame_into_blocks() divides the pixel grid into N azimuth blocks with a halo; _min_halo_rows() auto-computes the minimum safe overlap from half-window, similarity search radius, and stride settings.
workflows/config/_common.py — InputOptions gains azimuth_blocks: int = 1 and halo_rows: Optional[int] = None.
workflows/displacement.py — extracts dispatch into a testable _prepare_grouped_inputs() function; sets HDF5_USE_FILE_LOCKING=FALSE before spawning workers when multiple processes share the same HDF5 files (prevents hangs on NFS/Lustre).
workflows/_utils.py — _create_burst_cfg accepts an optional bounds parameter to set output_options.bounds per block.

Typical config for a 4-block run:

input_options:
  azimuth_blocks: 4
worker_settings:
  n_parallel_bursts: 4
  threads_per_worker: 2

CLAUDE NOTES

└── Process #k (one per active "burst" / NISAR block)
    │
    ├── ThreadPoolExecutor (size = workers_per_burst = n_parallel_bursts // active_bursts)
    │   └── threads doing GDAL/HDF5 read-ahead inside EagerLoader
    │
    └── Main thread: per-output-block phase-linking
        ├── JAX/XLA host pool (default = os.cpu_count())
        │   └── parallelizes vmap'd ops across pixels
        └── BLAS via OMP_NUM_THREADS = threads_per_worker
            └── multi-threads eigh, solve, matmul inside JAX kernels

Key insight: n_parallel_bursts is dual-purpose. It's the outer pool size AND a budget that gets re-allocated to inner I/O workers when there are fewer "bursts" than pool slots.

n_parallel_bursts = 8, with N bursts available:
N = 1 (FULL) → 1 active process × 8 inner I/O threads
N = 4 → 4 active processes × 2 inner I/O threads each
N = 8 (BLOCKED) → 8 active processes × 1 inner I/O thread each

FULL vs BLOCKED

Aspect	FULL (`azimuth_blocks=1`)	BLOCKED (`azimuth_blocks=N`)
Processes	1	N (8 typical)
GIL	Locked to 1 interpreter	Escaped — N interpreters
JAX kernels	Parallelize across pixels via `vmap` (within 1 block)	Same per block, run N in parallel across blocks
Python control flow	Strictly serial sweep through frame	N serial sweeps in parallel
HDF5 reads	1 stream	N concurrent streams
JIT compile	Once, ~10–30 s	N times in parallel, ~30 s wall
Working set	Full frame in 1 process	~1/N of frame in each process

Each worker still uses ~1 core; you just have N of them running simultaneously on different rows.

Per-Knob Impact

`n_parallel_bursts`

Bigger = more outer pool slots (capped by available bursts)
For NISAR: this is the number of blocks running in parallel
Set equal to azimuth_blocks for maximum block parallelism
Typical sweet spot: both at 8

`threads_per_worker`

Sets:

OMP_NUM_THREADS
MKL_NUM_THREADS

Controls BLAS threads inside JAX kernels:

eigh
solve
matmul

Separate from JAX/XLA’s own thread pool.

Typical values:

2 → reasonable default
1 → if CPU constrained
4+ → can help in FULL mode where one process can use more parallelism

JAX / XLA

JAX’s XLA backend maintains its own thread pool, sized by default to:

os.cpu_count()

Local benchmark full frame vs blocked results

displacement.run end-to-end through PS+PL+stitch benchmark run with taskset -c 0-33

GSLCs : 7 (Track 042 Frame 070 Descending orbit)
half_window : 5
num_blocks : 8
n_parallel_bursts : 8 (both modes)
threads_per_worker : 4 (both modes)
block_shape: 2048x2048 (both modes)
active threads FULL : 4
active threads BLK : 32

FULL wall = 8387.3s (139.79 min)
BLOCKED wall = 2159.2s ( 35.99 min)
SPEEDUP (BLOCKED vs FULL) : 3.88×

Checklist

The pull request title is a good summary of the changes - it will be used in the changelog
Unit tests for the changes exist
Tests pass on CI
Documentation reflects the changes where applicable
My PR is ready to review

Splits a single non-OPERA-burst frame (e.g. NISAR GSLC) into N azimuth blocks processed in parallel as synthetic "bursts". Each block reuses the full input file list but applies its own output_options.bounds, so wrapped_phase.run masks per block via _get_mask. The existing burst pool runs the blocks concurrently and stitching_bursts.run mosaics them. Changes ------- - New _block_split.split_frame_into_blocks: computes per-block (bounds, epsg) from cfg georef with a minimum-halo formula max(half_window_y, similarity_search_radius * stride_y, (corr_window_y // 2) * stride_y) + 5 so block edges aren't cropped by SHP/similarity/stitcher windows. - displacement.py: new module-level _GroupedInputs dataclass and _prepare_grouped_inputs helper that detects OPERA vs non-OPERA-burst mode from group_by_burst behavior. In non-OPERA mode, optional file lists (amp_disp, amp_mean, layover_shadow) are broadcast to every block id so second-batch reuse works. - _create_burst_cfg gains a bounds= kwarg that writes output_options.bounds/_epsg per block. OPERA path passes None and is unchanged. - config.InputOptions adds azimuth_blocks: int = 1 (default = no splitting, OPERA path unchanged) and halo_rows: int | None.

When running azimuth-block mode with n_parallel_bursts > 1, multiple worker processes read the same NISAR GSLC (HDF5) files concurrently at different spatial extents. On NFS/Lustre filesystems this triggers HDF5 file-locking hangs. Set HDF5_USE_FILE_LOCKING=FALSE (via setdefault, so an explicit caller setting is honoured) before spawning workers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Replaces the cube → per-layer tif export with a single per-layer VRT shim that points at ``ZARR:cube.zarr:/<var>:i`` as its source. GDAL's ZARR driver (3.10+) reads the slice natively, so the existing tif-based downstream pipeline (interferogram formation, stitching, unwrap, timeseries) sees ordinary 2D rasters with no pixel-data duplication. Highlights: - Default `zarr_format=2` for the cube (GDAL's ZARR read path supports it cleanly; v3's `bytes` codec isn't implemented on the GDAL read side yet). `zarr_format=3` is still selectable via writer kwarg. - New `emit_layer_vrts`, `zarr_subdataset_uri`, `layer_vrt_path` helpers in `dolphin.io._geozarr`. VRTs use absolute paths so they survive the per-ministack → top-level rename in `sequential.py`. - `_ZarrLayerStack.export_tifs` now writes VRT shims instead of dumping layer data into tif files. `_make_stack_output` returns `.slc.vrt` paths in GEOZARR mode. - Skip the GeoTIFF-only `repack_rasters` passes in GEOZARR mode (the cube is already chunked and compressed; repack would create one tif per VRT, undoing the no-duplication win). - `_ensure_data_array` rejects the reserved coord names (x/y/spatial_ref) and checks shape against any existing array of the same name to catch accidental writes that would silently corrupt a coord. - `wrapped_phase.py` cached/skip path and `sequential._get_outputs_from_folder` also glob for `.vrt` so GEOZARR runs can resume. Parallelism: matches PR isce-framework#717. Each "burst" (or synthetic-burst azimuth-split block) process gets its own work directory and its own cube — no cross-process zarr coordination. Inside a process the cube writer drains on a single background thread, the direct analog of `BackgroundStackWriter`'s single-thread tif feeder, so the parallel ThreadPoolExecutor in `single.py` sees the same write-side throughput either way. Tests updated: - `test_io_geozarr.py`: added VRT-readable-via-rasterio, VRT-survives-move, shape-mismatch rejection, reserved-coord-name rejection, and default-is-v2 cases. - `test_workflows_single.py::test_sequential_geozarr` and `test_workflows_displacement.py::test_displacement_run_geozarr` now assert *no* per-date `.slc.tif` are written, that VRTs are emitted, and that VRT reads round-trip to the cube layer. https://claude.ai/code/session_01UC8HNQWpNsKBmBNhdMhsLu

`format_nc_filename` always emitted `NETCDF:"file":"//ds"`. CF-compliant HDF5s (OPERA CSLCs, COMPASS static_layers) work fine because GDAL's NETCDF driver accepts them, but raw HDF5 files without CF metadata (NISAR GSLCs) hit `RuntimeError: ... No such file or directory` when GDAL's NETCDF driver refuses to open the subdataset. The real split isn't by extension — it's whether the file carries CF metadata. NISAR raw HDF5 doesn't; every other .h5 we care about does. Use a basename prefix check: files starting with NISAR_ get the HDF5 driver, everything else gets NETCDF. Extend the allowlist if/when another non-CF raw-HDF5 source appears. Backport of 7e8f3a3 + f134bcc onto upstream main for the azimuth-block-fixes branch.

Splits a single non-OPERA-burst frame (e.g. NISAR GSLC) into N azimuth blocks processed in parallel as synthetic "bursts". Each block reuses the full input file list but applies its own output_options.bounds, so wrapped_phase.run masks per block via _get_mask. The existing burst pool runs the blocks concurrently and stitching_bursts.run mosaics them. Changes ------- - New _block_split.split_frame_into_blocks: computes per-block (bounds, epsg) from cfg georef with a minimum-halo formula max(half_window_y, similarity_search_radius * stride_y, (corr_window_y // 2) * stride_y) + 5 so block edges aren't cropped by SHP/similarity/stitcher windows. - displacement.py: new module-level _GroupedInputs dataclass and _prepare_grouped_inputs helper that detects OPERA vs non-OPERA-burst mode from group_by_burst behavior. In non-OPERA mode, optional file lists (amp_disp, amp_mean, layover_shadow) are broadcast to every block id so second-batch reuse works. - _create_burst_cfg gains a bounds= kwarg that writes output_options.bounds/_epsg per block. OPERA path passes None and is unchanged. - config.InputOptions adds azimuth_blocks: int = 1 (default = no splitting, OPERA path unchanged) and halo_rows: int | None.

When running azimuth-block mode with n_parallel_bursts > 1, multiple worker processes read the same NISAR GSLC (HDF5) files concurrently at different spatial extents. On NFS/Lustre filesystems this triggers HDF5 file-locking hangs. Set HDF5_USE_FILE_LOCKING=FALSE (via setdefault, so an explicit caller setting is honoured) before spawning workers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The original azimuth-block splitter set each block's ``output_options.bounds`` to its read extent (central rows + halo) and wrote phase-linking outputs over that full extent. Adjacent blocks then *overlapped* in the halo region. ``stitching_bursts.run`` → ``gdal_merge.py`` does last-pixel-wins on overlap with no feathering, so on every interior boundary about half of the overlap band ended up sourced from a block's halo (edge-degraded) rather than its central (clean) rows. Fix: each block runs phase linking over ``read_bounds`` (central + halo) exactly as before — the halo is still loaded so SHP/EVD have clean neighborhood context — but after ``wrapped_phase.run`` returns, the stitching-bound outputs are cropped to ``central_bounds`` (no halo) via ``gdal.Translate``. Adjacent centrals are disjoint by construction, so the stitcher has nothing to choose between. ``comp_slcs`` are not cropped — they feed back into the *same* block's next ministack and must match its read extent. Other cleanups in the same diff: - ``_block_split.py``: new ``BlockBounds`` dataclass carrying both read/central bounds + EPSG. Replace inline NETCDF: URI construction and raw GDAL/OSR calls with ``io.format_nc_filename``, ``io.get_raster_crs``, ``io.get_raster_xysize``, ``io.get_raster_gt``. Drop the dead AUTHORITY-code check (was unreachable under ``osr.UseExceptions``) in favor of a single try/except with a clear "geocoded input required" message. - ``_utils.py``: ``_create_burst_cfg`` takes ``BlockBounds | None`` and clears top-level ``bounds_wkt`` before stamping per-block bounds, so cfgs that set ``bounds_wkt`` globally don't fail pydantic validation on every block. - ``displacement.py``: ``HDF5_USE_FILE_LOCKING=FALSE`` moved into the ``ProcessPoolExecutor`` initializer (``_worker_init``) so the env mutation lives in worker processes, not the parent. Drop the redundant ``num_workers`` shadow of ``num_parallel``. Tests (``tests/test_workflows_block_split.py``): 8 new unit tests covering num_blocks=1, central tiling, halo overlap math at frame edges and interior, halo-too-large raise, missing-CRS friendly message, ``crop_to_central`` in-place rewrite, and frozen dataclass.

On real NISAR GSLCs, ``io.get_raster_crs`` returned an empty CRS because GDAL's HDF5 driver doesn't read the CF ``grid_mapping`` attribute — that machinery is in the NETCDF driver, which won't open NISAR (no CF metadata at the file level). Result: the friendly error from my last commit ("Could not read a usable EPSG... pass num_blocks=1") fired on every NISAR run, defeating the whole point of the splitter. Fix: dispatch in ``_read_grid_metadata``. NISAR-prefixed .h5 files now go through ``_read_nisar_grid_metadata``, which reads via h5py from the ``grids/frequencyA`` group: - ``projection`` scalar -> EPSG (uint32 or bytes) - ``xCoordinates`` / ``yCoordinates`` -> grid extent (cell centers, backed off by ``x/yCoordinateSpacing/2`` for the UL-edge geotransform) Honors the dataset's ``grid_mapping`` attribute if present, falling back to the conventional ``projection`` name. Same convention ``opera_utils.nisar._info.get_nisar_bbox`` uses. Everything else (.tif, OPERA CSLC .h5, .nc) still routes through the dolphin GDAL helpers with ``format_nc_filename``. Also drops a dead defensive check for a missing ``subdataset`` on NISAR input — pydantic already enforces this on any .h5/.nc cfg. Tests: add ``_make_nisar_h5`` helper that builds a minimal NISAR-shaped HDF5 (HH + projection + coord datasets + grid_mapping attr), plus two new cases (metadata roundtrip via h5py path, full splitter on a synthetic NISAR frame).

GDAL's HDF5 driver doesn't read NISAR's CF ``grid_mapping`` — it returns an identity geotransform and empty projection regardless of what's in the file. On ``main`` this is invisible because nothing rasterizes against a NISAR VRTStack's CRS, but the moment the azimuth-block splitter sets ``output_options.bounds`` to drive ``masking.create_bounds_mask``, the mask comes back all-zero (the UTM polygon doesn't intersect an identity-GT pixel grid) and the run fails with ``MaskingError: No valid pixels left in mask``. Fix in ``VRTStack.__init__``: after the usual GDAL-based metadata read, if the result is the identity-GT + empty-projection signature *and* the input is NISAR-prefixed .h5, override ``self.gt`` / ``self.proj`` / ``self.srs`` from a focused h5py read. The metadata then flows into the .vrt on disk via the existing ``_write()`` path, so every consumer (bounds mask, nodata mask, stitching, downstream GDAL opens) sees the right CRS and geotransform. Mechanics: - New ``io._core.read_nisar_grid_metadata(filename, subdataset)`` returns ``(nx, ny, gt, epsg, wkt)``. Reads the ``projection`` sibling dataset for the EPSG and ``xCoordinates`` / ``yCoordinates`` for the cell-center grid, backing off by ``x/yCoordinateSpacing/2`` to anchor the geotransform at the upper-left edge. Honors the data dataset's ``grid_mapping`` attribute if present. - New ``io._core._is_nisar_h5(path)`` mirrors ``format_nc_filename``'s NISAR detection (basename prefix). - ``_block_split.py`` drops its private copies and uses the shared helpers, so the NISAR convention lives in one place. Test: new ``test_vrtstack_patches_nisar_metadata`` builds a minimal NISAR-shaped HDF5 and verifies ``VRTStack`` exposes real CRS+GT both as attributes and in the resulting ``.vrt`` on disk. All 11 block-split tests + the broader workflows/io suites (119 total) pass.

Stitching failed with ``RuntimeError: Recursion detected`` from gdal_merge when merging per-block interferograms. Root cause: dolphin's per-block IFGs are virtual interferograms (``.vrt``), and ``crop_to_central`` was calling ``gdal.Translate(tmp.vrt, src.vrt, projWin=...)`` then renaming ``tmp`` over ``src``. ``gdal.Translate`` on a VRT input writes an output VRT whose ``<SourceFilename>`` is the *input path string*, so after the rename the file references itself — gdal_merge follows the chain and trips GDAL's recursion guard. Fix: always materialize through ``format="GTiff"``. For ``.tif`` inputs the in-place rewrite still works; for ``.vrt`` inputs the result is a sibling ``.tif`` and the original ``.vrt`` is removed (no self-reference possible). ``crop_to_central`` now returns the output ``Path`` so callers can substitute the possibly-extension-changed path back into their file lists. ``displacement.run`` does that via list-comprehension reassignment over each block's stitching-bound output lists. The cost is materializing per-block IFGs (extra disk + the multiply-conjugate they encoded virtually). For typical NISAR runs with ~20 ifgs per ministack and 8 blocks this is ~tens of GB — not ideal, but small relative to the GSLC stack and acceptable as the correctness-first fix. A leaner alternative would be a VRT-XML-level crop (adjust ``SrcRect``/``DstRect`` in place) which preserves the virtual nature; deferred. Tests: new ``test_crop_to_central_materializes_vrt`` builds a VRT pointing at a GeoTIFF (mirrors how the IFG VRTs are structured) and asserts the cropped output is a non-recursive GTiff at the central extent. Updated ``test_crop_to_central_replaces_in_place`` to also check the returned path.

Azimuth block fixes. Thank you Sara

for more information, see https://pre-commit.ci

…nputs The merge left two copies of the dataclass and its builder; the stale copy used dict[str, tuple[Bbox, int]] for block_bounds and triggered mypy no-redef + assignment/arg-type errors. Drop the duplicate and the now-unused Bbox import. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

mgovorcin and others added 2 commits May 21, 2026 22:22

scottstanie and others added 12 commits May 25, 2026 14:22

populate the raise_on_empty error or warning choice to run_displacement

653da61

Merge branch 'azimuth-block' into azimuth-block-fixes

00cfcf5

Merge pull request #1 from mirzaees/azimuth-block-fixes

5584c17

Azimuth block fixes. Thank you Sara

[pre-commit.ci] auto fixes from pre-commit.com hooks

c23ccc6

for more information, see https://pre-commit.ci

mgovorcin marked this pull request as ready for review May 27, 2026 19:11

mirzaees self-requested a review May 27, 2026 19:56

mirzaees approved these changes May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add NISAR azimuth-block splitting for parallel full-frame processing#717

Add NISAR azimuth-block splitting for parallel full-frame processing#717
mgovorcin wants to merge 14 commits into
isce-framework:mainfrom
mgovorcin:azimuth-block

mgovorcin commented May 22, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

mgovorcin commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Change Summary

CLAUDE NOTES

FULL vs BLOCKED

Per-Knob Impact

n_parallel_bursts

threads_per_worker

JAX / XLA

Local benchmark full frame vs blocked results

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mgovorcin commented May 22, 2026 •

edited

Loading

`n_parallel_bursts`

`threads_per_worker`