[core][flink] Support incremental clustering for append bucketed table #6835

LsomeYeah · 2025-12-19T02:02:34Z

Purpose

Linked issue: close #xxx

By default, Paimon's append bucketed tables maintain data ordering. However, this ordering requirement can be relaxed to enable additional optimizations.

This PR adds the ability to disable the ordering requirement for append bucketed tables. Once ordering is not strictly required, data can be incrementally clustered within each bucket, which can significantly improve performance for queries that filter on both the bucket key and the clustering key.

Unlike unaware-bucket append tables, which require range partitioning, bucketed tables only need to shuffle by partition + bucket and then cluster locally within each bucket, which makes this approach more efficient and resource-friendly.
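To make the two steps concrete, here is a minimal, engine-free sketch of "shuffle by partition + bucket, then cluster locally". It is not code from this PR: the `Row` class, the method names, and the use of a plain sort on a single integer key as a stand-in for the real clustering strategy are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BucketLocalClusterSketch {

    /** Hypothetical row type for illustration; not a Paimon class. */
    static final class Row {
        final String partition;
        final int bucket;
        final int clusteringKey;

        Row(String partition, int bucket, int clusteringKey) {
            this.partition = partition;
            this.bucket = bucket;
            this.clusteringKey = clusteringKey;
        }

        @Override
        public String toString() {
            return "Row(" + partition + ", " + bucket + ", " + clusteringKey + ")";
        }
    }

    /** Step 1: route every row to the group of its (partition, bucket) pair. */
    static Map<String, List<Row>> shuffleByPartitionAndBucket(List<Row> rows) {
        Map<String, List<Row>> groups = new TreeMap<>();
        for (Row row : rows) {
            String key = row.partition + "/bucket-" + row.bucket;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    /** Step 2: cluster each group on its own; a plain sort stands in for the clustering. */
    static void clusterLocally(Map<String, List<Row>> groups) {
        for (List<Row> group : groups.values()) {
            group.sort(Comparator.comparingInt((Row r) -> r.clusteringKey));
        }
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("dt=2025-12-19", 1, 42),
                new Row("dt=2025-12-19", 0, 7),
                new Row("dt=2025-12-19", 1, 3),
                new Row("dt=2025-12-20", 0, 15));
        Map<String, List<Row>> groups = shuffleByPartitionAndBucket(rows);
        clusterLocally(groups);
        groups.forEach((key, group) -> System.out.println(key + " -> " + group));
    }
}
```

No group ever needs to see another group's rows, so no global range boundaries have to be sampled or agreed on. That is the intuition behind the efficiency claim above.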

Tests

API and Format

Documentation

JingsongLi

My idea is that clustering this bucketed table should work like merging small files in an Append + bucket = -1 table: directly generate tasks so that different concurrent writers read and write separately.

JingsongLi · 2025-12-24T14:09:57Z

paimon-core/src/main/java/org/apache/paimon/append/cluster/IncrementalClusterManager.java

-                                runsInfo);
-                    });
-        }
+        partitionLevels.forEach(


The nesting is too deep. Use a for loop instead.
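As an illustration of the suggestion, the sketch below shows the same traversal written once with nested lambdas and once with plain for loops. The map shape (`partitionLevels` as partition name to a list of level numbers) and both method names are hypothetical; the real types in `IncrementalClusterManager` are not visible in this excerpt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ForLoopSketch {

    /** Nested-lambda style: each lambda adds another indentation level. */
    static List<String> describeWithLambdas(Map<String, List<Integer>> partitionLevels) {
        List<String> out = new ArrayList<>();
        partitionLevels.forEach(
                (partition, levels) ->
                        levels.forEach(
                                level -> {
                                    if (level > 0) {
                                        out.add(partition + ":L" + level);
                                    }
                                }));
        return out;
    }

    /** Same behavior with for loops: flatter, and `continue` replaces the nested if. */
    static List<String> describeWithLoops(Map<String, List<Integer>> partitionLevels) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : partitionLevels.entrySet()) {
            String partition = entry.getKey();
            for (int level : entry.getValue()) {
                if (level <= 0) {
                    continue;
                }
                out.add(partition + ":L" + level);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> partitionLevels = new LinkedHashMap<>();
        partitionLevels.put("dt=2025-12-19", Arrays.asList(0, 1, 2));
        partitionLevels.put("dt=2025-12-20", Arrays.asList(0, 3));
        System.out.println(describeWithLambdas(partitionLevels));
        System.out.println(describeWithLoops(partitionLevels));
    }
}
```

The loop version also allows checked exceptions and early exits, which lambdas passed to `forEach` do not handle cleanly.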

LsomeYeah · 2025-12-26T06:18:21Z


Good idea! I will encapsulate the local-sort operation within a single bucket into a Task and move it to paimon-core, which will also facilitate integration with multiple other engines.
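A rough sketch of what such an engine-agnostic task could look like follows. Everything here is an assumption for illustration: `BucketClusterTask`, `runAll`, and the integer keys are hypothetical names and types, not the classes that ended up in paimon-core. A thread pool stands in for the concurrent writers of a real engine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BucketTaskSketch {

    /** Hypothetical unit of work: read one bucket's records, sort them, return the result. */
    static final class BucketClusterTask implements Callable<List<Integer>> {
        final String partition;
        final int bucket;
        private final List<Integer> clusteringKeys;

        BucketClusterTask(String partition, int bucket, List<Integer> clusteringKeys) {
            this.partition = partition;
            this.bucket = bucket;
            this.clusteringKeys = clusteringKeys;
        }

        @Override
        public List<Integer> call() {
            // The local sort touches only this bucket's data, so tasks are independent.
            List<Integer> sorted = new ArrayList<>(clusteringKeys);
            Collections.sort(sorted);
            return sorted;
        }
    }

    /** Runs every task on a fixed-size pool; results come back in task order. */
    static List<List<Integer>> runAll(List<BucketClusterTask> tasks, int parallelism)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<List<Integer>> results = new ArrayList<>();
            for (Future<List<Integer>> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<BucketClusterTask> tasks = Arrays.asList(
                new BucketClusterTask("dt=2025-12-19", 0, Arrays.asList(9, 4, 7)),
                new BucketClusterTask("dt=2025-12-19", 1, Arrays.asList(3, 8, 1)));
        List<List<Integer>> results = runAll(tasks, 2);
        for (int i = 0; i < tasks.size(); i++) {
            BucketClusterTask task = tasks.get(i);
            System.out.println(task.partition + "/bucket-" + task.bucket + " -> " + results.get(i));
        }
    }
}
```

Because the task depends only on its own inputs, a Flink or Spark operator could wrap the same object without changing the sort logic, which is the integration benefit the comment above points to.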

LsomeYeah marked this pull request as draft December 19, 2025 02:02

LsomeYeah marked this pull request as ready for review December 23, 2025 12:13

LsomeYeah force-pushed the bucket-cluster branch from 1835c68 to aedeea9 December 23, 2025 12:36

LsomeYeah added 8 commits December 24, 2025 10:04

extract rangeShuffle and localSort

b99d645

support cluster for bucket table

9904b30

minor fix

d2632a0

add flink test

f6c43b3

change for spark

a6d3ec5

fix

119ef8e

fix for bucket-append-ordered

2e82e2a

add docs

692346a

LsomeYeah force-pushed the bucket-cluster branch from aedeea9 to 692346a December 24, 2025 02:04

JingsongLi reviewed Dec 24, 2025
