Skip to content

Add tableProperties to TableAuditEvent#601

Open
james5418 wants to merge 4 commits into
linkedin:mainfrom
james5418:jamewang/table-properties-audit-event
Open

Add tableProperties to TableAuditEvent#601
james5418 wants to merge 4 commits into
linkedin:mainfrom
james5418:jamewang/table-properties-audit-event

Conversation

@james5418
Copy link
Copy Markdown

@james5418 james5418 commented May 24, 2026

Summary

Surface a snapshot of Iceberg table properties on the per-commit TableAuditEvent.

Emission is gated by allowlist:

Config Default Purpose
cluster.iceberg.tables.audit.table-properties-allowlist [] Which keys are appropriate to audit. [] = nothing emitted.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

./gradlew :services:tables:test --tests IcebergSnapshotsApiHandlerAuditTest

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

Copy link
Copy Markdown
Collaborator

@jiang95-dev jiang95-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for adding table properties?

We should cap on the size of table properties as some can be quite large. See this PR: https://github.com/linkedin-multiproduct/li-openhouse/pull/688. Large events can cause OOM and losing events.

@james5418
Copy link
Copy Markdown
Author

What is the use case for adding table properties?

  • Table properties are current-value-only. Once a property changes the prior value is gone.
  • No durability. There is no audit trail linking a property value to the snapshot that was active when it was set.
  • ACL-gated. Cannot read properties without per-table ACL grants.

We should cap on the size of table properties as some can be quite large. See this PR: https://github.com/linkedin-multiproduct/li-openhouse/pull/688. Large events can cause OOM and losing events.

Added a new configuration property cluster.iceberg.tables.audit.table-property-value-max-length, which defaults to 0 (unlimited). Values longer than N characters are dropped from the event with a warning log.

@james5418 james5418 requested review from cbb330 and jiang95-dev May 26, 2026 22:20
* Iterates the allowlist rather than the source so cost is O(|allowlist|) regardless of source
* size.
*/
private Map<String, String> filterTableProperties(Map<String, String> source) {
Copy link
Copy Markdown
Member

@abhisheknath2011 abhisheknath2011 May 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we exclude some system level operation properties such as dlo specific which can be very large? (cc: @jiang95-dev )?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also ran into OOM issues earlier while logging large events.

@cbb330
Copy link
Copy Markdown
Collaborator

cbb330 commented May 26, 2026

@jiang95-dev the table property audit is required for a partner team's observability use case which must read freshness watermarks from table properties every time a commit is made

@jiang95-dev , @abhisheknath2011 the existing table property config cluster.iceberg.tables.audit.table-properties-allowlist prevents it from being too large, by only allowing specific keys. it is default {} and only allowed keys enter

the keys we plan to add:

  • any with "watermark" in the name e.g. completionWatermark, inferredHighWatermark, etc.
  • ASL (which can be large)

@cbb330
Copy link
Copy Markdown
Collaborator

cbb330 commented May 26, 2026

@james5418 cluster.iceberg.tables.audit.table-property-value-max-length

is redundant with the other config. we can limit size already, by only allowing keys with small size. lets only have cluster.iceberg.tables.audit.table-properties-allowlist for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants