
Iceberg Performance/correctness improvements #34961

Merged
DAlperin merged 8 commits into MaterializeInc:main from DAlperin:dov/iceberg-perf-improvements on Feb 13, 2026

Conversation

@DAlperin (Member) commented Feb 9, 2026

This PR is a stack of changes designed to address some of the bugs and performance issues discovered in testing.

I recommend reviewing each commit separately. If GitHub had stacked PRs I'd use them here, but I'm not opening a separate PR for each of these commits :)

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

Snapshot batches can contain millions of rows, causing the DeltaWriter's
seen_rows HashMap to grow unbounded and consume excessive memory.

For snapshots, disable position delete tracking by setting max_seen_rows=0.
All deletes will use equality deletes instead, eliminating the memory
overhead at the cost of slightly slower reads (acceptable for snapshots).

Normal post-snapshot batches continue using position deletes as usual.

Requires iceberg-rust 1b01c099, which adds the ability to disable this tracking.
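
A minimal sketch of the idea (the `DeltaWriterConfig` type and its field are hypothetical names for illustration, not the actual iceberg-rust API):

```rust
// Hypothetical sketch: `DeltaWriterConfig` and `max_seen_rows` are
// illustrative names, not the real iceberg-rust API surface.
struct DeltaWriterConfig {
    /// Cap on rows tracked for position deletes. With 0, no rows are
    /// tracked and every delete is emitted as an equality delete.
    max_seen_rows: usize,
}

fn writer_config(is_snapshot: bool) -> DeltaWriterConfig {
    if is_snapshot {
        // Snapshot batches can hold millions of rows; tracking each one
        // would grow the seen_rows map without bound. Trade slightly
        // slower reads (equality deletes) for bounded memory.
        DeltaWriterConfig { max_seen_rows: 0 }
    } else {
        // Steady-state batches are small; keep position deletes for
        // faster reads. The cap here is an arbitrary placeholder.
        DeltaWriterConfig { max_seen_rows: 1_000_000 }
    }
}
```

The underlying trade-off: position deletes make reads cheaper but require remembering every row written in the batch, while equality deletes need no per-row state.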
For fresh sinks, the catch-up batch was incorrectly starting from
Timestamp::minimum() instead of as_of, causing it to cover a range
where no data exists.

Use max(resume_upper, as_of) as the batch lower bound to handle both:
- Fresh sinks: start from as_of (where data actually begins)
- Resuming sinks: start from resume_upper (where we left off)
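
A minimal sketch of the bound computation, with timestamps simplified to `u64` (Materialize's real timestamp and frontier types are richer):

```rust
/// Lower bound for the catch-up batch, simplified to u64 timestamps.
fn catch_up_lower(resume_upper: u64, as_of: u64) -> u64 {
    // Fresh sink: resume_upper is still at the minimum timestamp, so
    // as_of wins and the batch starts where data actually begins.
    // Resuming sink: resume_upper has advanced past as_of, so we pick
    // up exactly where the previous incarnation left off.
    resume_upper.max(as_of)
}

fn main() {
    assert_eq!(catch_up_lower(0, 42), 42);    // fresh sink
    assert_eq!(catch_up_lower(100, 42), 100); // resuming sink
}
```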
Add debug! and trace! logging at key points to help diagnose issues:
- Batch description minting (catch-up and future batches)
- Waiting for first batch description before processing data
- Batch descriptions received by write operator
- Stashed rows (trace level) and periodic stash size warnings
- Batch closing with frontier positions
- Files written per batch

This will help when debugging snapshot processing and frontier advancement issues.
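
As a rough illustration using the `tracing` crate (the message text, fields, and threshold below are invented for the example, not the sink's actual log lines):

```rust
use tracing::{debug, trace, warn};

// Illustrative log sites only; the real messages and fields differ.
fn on_batch_description(lower: u64, upper: u64) {
    debug!(lower, upper, "write operator received batch description");
}

fn on_rows_stashed(stash_size: usize) {
    trace!(stash_size, "stashed rows while awaiting a batch description");
    if stash_size > 1_000_000 {
        // Periodic warning so a stuck snapshot is visible in the logs.
        warn!(stash_size, "stash is large; still no batch description");
    }
}
```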
Improve frontier handling in the sink:
- Track max observed timestamps before init to synthesize an upper when a bounded input closes, and exit cleanly once the frontier is empty after init.
- Start minting once the frontier reaches as_of/resume_upper instead of waiting until it passes them.
- Close write batches when the input frontier reaches the batch upper, and only rescan when the batch or frontier advances.
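
A minimal sketch of the "reaches vs. passes" distinction, with the frontier simplified to `Option<u64>` where `None` means the input has closed (the real code uses timely frontiers):

```rust
/// Start minting batch descriptions once the frontier *reaches* the
/// start time (f >= start), rather than waiting for it to pass it.
fn should_start_minting(frontier: Option<u64>, start: u64) -> bool {
    match frontier {
        None => true,          // input closed; nothing left to wait for
        Some(f) => f >= start,
    }
}

/// A batch [lower, upper) can be closed once no future update can
/// carry a time below upper, i.e. the frontier has reached the upper.
fn batch_is_complete(frontier: Option<u64>, batch_upper: u64) -> bool {
    frontier.map_or(true, |f| f >= batch_upper)
}
```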
@DAlperin requested a review from a team as a code owner on February 9, 2026 18:46
@DAlperin requested a review from a team on February 9, 2026 18:50
@DAlperin force-pushed the dov/iceberg-perf-improvements branch from 7cb18b8 to 16f2729 on February 9, 2026 18:54
After restarts or restores we might see rows that we have already
written. We should robustly drop those at the earliest point where we
know we will not need to look at them again.
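
One plausible shape for that filter, assuming the cutoff is the resume frontier and simplifying timestamps to `u64` (`retain_unwritten` is a hypothetical helper, not the operator's actual code):

```rust
/// Drop updates already durably written by a previous incarnation:
/// anything with time < resume_upper was committed before the restart.
fn retain_unwritten<D>(updates: Vec<(D, u64)>, resume_upper: u64) -> Vec<(D, u64)> {
    updates
        .into_iter()
        .filter(|&(_, time)| time >= resume_upper)
        .collect()
}
```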
@martykulma (Contributor) left a comment:

awesome, lgtm!

@DAlperin merged commit 1468518 into MaterializeInc:main on Feb 13, 2026
133 of 134 checks passed
