perf: improve performance of lpad/rpad by reusing buffers #19558

viirya · 2025-12-30T06:49:49Z

Optimized lpad and rpad functions to eliminate per-row allocations by reusing buffers for graphemes and fill characters.

The previous implementation allocated new Vec<&str> for graphemes and Vec for fill characters on every row, which was inefficient. This optimization introduces reusable buffers that are allocated once and cleared/refilled for each row.

Changes:

lpad: Added graphemes_buf and fill_chars_buf outside loops, clear and refill per row instead of allocating new Vec each time
rpad: Added graphemes_buf outside loops to reuse across iterations
Both functions now allocate buffers once and reuse them for all rows
Buffers are cleared and reused for each row via .clear() and .extend()

Optimization impact:

For lpad with fill parameter: Eliminates 2 Vec allocations per row (graphemes + fill_chars)
For lpad without fill: Eliminates 1 Vec allocation per row (graphemes)
For rpad: Eliminates 1 Vec allocation per row (graphemes)

This optimization is particularly effective for:

Large arrays with many rows
Strings with multiple graphemes (unicode characters)
Workloads with custom fill patterns

Benchmark results comparing main vs optimized branch:

lpad benchmarks:

size=1024, str_len=5, target=20: 116.53 µs -> 63.226 µs (45.7% faster)
size=1024, str_len=20, target=50: 314.07 µs -> 190.30 µs (39.4% faster)
size=4096, str_len=5, target=20: 467.35 µs -> 261.29 µs (44.1% faster)
size=4096, str_len=20, target=50: 1.2286 ms -> 754.24 µs (38.6% faster)

rpad benchmarks:

size=1024, str_len=5, target=20: 113.89 µs -> 72.645 µs (36.2% faster)
size=1024, str_len=20, target=50: 313.68 µs -> 202.98 µs (35.3% faster)
size=4096, str_len=5, target=20: 456.08 µs -> 295.57 µs (35.2% faster)
size=4096, str_len=20, target=50: 1.2523 ms -> 818.47 µs (34.6% faster)

Overall improvements: 35-46% faster across all workloads

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Optimized lpad and rpad functions to eliminate per-row allocations by reusing buffers for graphemes and fill characters. The previous implementation allocated new Vec<&str> for graphemes and Vec<char> for fill characters on every row, which was inefficient. This optimization introduces reusable buffers that are allocated once and cleared/refilled for each row. Changes: - lpad: Added graphemes_buf and fill_chars_buf outside loops, clear and refill per row instead of allocating new Vec each time - rpad: Added graphemes_buf outside loops to reuse across iterations - Both functions now allocate buffers once and reuse them for all rows - Buffers are cleared and reused for each row via .clear() and .extend() Optimization impact: - For lpad with fill parameter: Eliminates 2 Vec allocations per row (graphemes + fill_chars) - For lpad without fill: Eliminates 1 Vec allocation per row (graphemes) - For rpad: Eliminates 1 Vec allocation per row (graphemes) This optimization is particularly effective for: - Large arrays with many rows - Strings with multiple graphemes (unicode characters) - Workloads with custom fill patterns Benchmark results comparing main vs optimized branch: lpad benchmarks (size=1024): - utf8, str_len=5, target=20: 119.39 µs -> 56.297 µs (52.8% faster, 2.1x speedup) - stringview, str_len=5, target=20: 118.73 µs -> 56.254 µs (52.6% faster, 2.1x speedup) - utf8, str_len=20, target=50: 329.59 µs -> 169.95 µs (48.4% faster, 1.9x speedup) - stringview, str_len=20, target=50: 328.09 µs -> 175.37 µs (46.5% faster, 1.9x speedup) lpad benchmarks (size=4096): - utf8, str_len=5, target=20: 480.81 µs -> 225.45 µs (53.1% faster, 2.1x speedup) - stringview, str_len=5, target=20: 477.46 µs -> 227.22 µs (52.4% faster, 2.1x speedup) - utf8, str_len=20, target=50: 1.3199 ms -> 687.14 µs (47.9% faster, 1.9x speedup) - stringview, str_len=20, target=50: 1.3209 ms -> 688.33 µs (47.9% faster, 1.9x speedup) rpad benchmarks (size=1024): - utf8, str_len=5, target=20: 99.587 µs -> 61.490 µs (38.3% faster, 1.6x speedup) - stringview, str_len=5, target=20: 100.06 µs -> 61.470 µs (38.6% faster, 1.6x speedup) - utf8, str_len=20, target=50: 274.94 µs -> 170.35 µs (38.0% faster, 1.6x speedup) - stringview, str_len=20, target=50: 277.86 µs -> 171.87 µs (38.1% faster, 1.6x speedup) rpad benchmarks (size=4096): - utf8, str_len=5, target=20: 400.68 µs -> 244.85 µs (38.9% faster, 1.6x speedup) - stringview, str_len=5, target=20: 404.15 µs -> 245.60 µs (39.2% faster, 1.6x speedup) - utf8, str_len=20, target=50: 1.1168 ms -> 686.78 µs (38.5% faster, 1.6x speedup) - stringview, str_len=20, target=50: 1.1219 ms -> 695.01 µs (38.1% faster, 1.6x speedup) Overall improvements: 38-53% faster across all workloads (1.6-2.1x speedup)

Copilot

Pull request overview

This PR optimizes the performance of the lpad and rpad functions by eliminating per-row Vec allocations through buffer reuse. The optimization introduces reusable buffers for graphemes and fill characters that are allocated once outside the loops and cleared/refilled for each row, rather than allocating new vectors on every iteration.

Key changes:

Added buffer reuse optimization for graphemes in both lpad and rpad functions
Added buffer reuse optimization for fill characters in lpad function
Enhanced benchmarks with more comprehensive test cases covering different array sizes, string lengths, and data types

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
datafusion/functions/src/unicode/lpad.rs	Implements buffer reuse for graphemes and fill characters by declaring Vec buffers outside loops and using clear()/extend() pattern instead of per-row allocations
datafusion/functions/src/unicode/rpad.rs	Implements buffer reuse for graphemes using the same clear()/extend() pattern; however, missing the fill_chars_buf optimization that was implemented in lpad
datafusion/functions/benches/pad.rs	Restructures benchmarks with more comprehensive test coverage including different array sizes (1024, 4096), string lengths (5, 20), target lengths (20, 50), and data types (Utf8, StringView)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-30T09:42:31Z

datafusion/functions/src/unicode/rpad.rs

                                    fill.chars()
                                        .cycle()
-                                        .take(length - graphemes.len())
+                                        .take(length - graphemes_buf.len())
                                        .for_each(|ch| builder.write_char(ch).unwrap());


The fill characters should be pre-collected into a reusable buffer similar to how it's done in lpad.rs. Currently, fill.chars() is being called on every row which allocates a new iterator each time. This misses an optimization opportunity mentioned in the PR description: "For lpad with fill parameter: Eliminates 2 Vec allocations per row (graphemes + fill_chars)". The rpad function should also benefit from the same fill_chars_buf optimization by declaring a fill_chars_buf at line 219 and reusing it by clearing and extending with fill.chars() before this usage.

github-actions bot added the functions Changes to functions implementation label Dec 30, 2025

viirya force-pushed the pad_optimize branch 2 times, most recently from 3c3717a to aeb2911 Compare December 30, 2025 07:34

viirya force-pushed the pad_optimize branch from aeb2911 to e4cc5cb Compare December 30, 2025 07:57

Dandandan requested a review from Copilot December 30, 2025 09:39

Copilot started reviewing on behalf of Dandandan December 30, 2025 09:40 View session

Copilot AI reviewed Dec 30, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf: improve performance of lpad/rpad by reusing buffers #19558

perf: improve performance of lpad/rpad by reusing buffers #19558

viirya commented Dec 30, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

perf: improve performance of lpad/rpad by reusing buffers #19558

Are you sure you want to change the base?

perf: improve performance of lpad/rpad by reusing buffers #19558

Conversation

viirya commented Dec 30, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant