Increase copy speed by orders of magnitude #141

cdellacqua · 2025-01-31T16:22:26Z

Abstract

When T implements Copy, we can use the std/core method copy_from_slice to offload the data transfer to highly optimized and potentially platform-specific functions.

Backstory

I was troubleshooting some code that deals with a huge ringbuffer (1 million f32 values), where the most common operation is copying the last 2,000 elements.

After some profiling, I found that the slowest operation was just skipping and iterating over the elements that I needed to copy out of the buffer.

I experimented with the built-in copy_from_slice, which under the hood calls memcpy, and I got these results:

| Build profile | baseline | memcpy |
|---|---|---|
| debug | ~30ms | ~5μs |
| release | ~1ms | ~2μs |

The baseline consists of using this:

let mut out = Vec::with_capacity(2000);
out.extend(ringbuffer.iter().skip(tons_of_items_to_skip).take(2000).copied());

While the code changes in this PR allow doing this:

let mut out = vec![0.0; 2000];
ringbuffer.copy_to_slice(tons_of_items_to_skip, &mut out); // one or two memcpy depending on the readptr position

The results are less impressive when working on the entire buffer, but still noticeable (benchmarks below).

Proposed solution

I've added two methods, copy_from_slice and copy_to_slice, to the RingBuffer trait.

How it works

For ConstGeneric and Alloc buffers, copy_from_slice works by taking a pointer to the first relevant element of the ringbuffer. It then checks whether the slice fits in a contiguous region of memory. If it does, a single copy operation is performed. If it doesn't, the copy is split in two: one part up to the end of the backing storage, and the remainder from its start.

copy_to_slice works the same way, with the source and destination swapped.
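The one-or-two-copies logic described above can be sketched with safe slice operations. This is only an illustration: the PR itself works on raw pointers, and the helper name `copy_wrapping_to_slice`, its parameters, and the use of a plain slice as backing storage are assumptions made for this example.

```rust
/// Copies `dst.len()` elements out of a circular `storage`, starting at the
/// physical index `start`, using at most two `copy_from_slice` calls.
///
/// Hypothetical helper for illustration; not the crate's actual code.
fn copy_wrapping_to_slice<T: Copy>(storage: &[T], start: usize, dst: &mut [T]) {
    let cap = storage.len();
    assert!(start < cap, "start must be a valid index into storage");
    assert!(dst.len() <= cap, "destination is larger than the buffer");

    // How many elements can be read before reaching the end of the storage.
    let first_len = dst.len().min(cap - start);
    let (head, tail) = dst.split_at_mut(first_len);

    // First chunk: from `start` towards the end of the storage.
    head.copy_from_slice(&storage[start..start + first_len]);
    // Second chunk: the remainder wraps around to the front.
    // `tail` is empty when the requested region is contiguous.
    tail.copy_from_slice(&storage[..tail.len()]);
}
```

With an 8-element storage, a 4-element read starting at index 6 takes two elements from the end and two from the front, while a read that fits before the end results in a single non-empty copy.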

The VecDeque-backed buffer has a simpler (and safe) implementation based on the built-in methods as_slices()/as_mut_slices().
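A minimal sketch of the safe variant, assuming a free function over a plain `std::collections::VecDeque` rather than the crate's own wrapper type. The name `copy_deque_to_slice` and its `offset` parameter are hypothetical.

```rust
use std::collections::VecDeque;

/// Copies `dst.len()` elements out of `deque`, starting at the logical
/// index `offset`. Hypothetical helper, not the crate's actual code.
fn copy_deque_to_slice<T: Copy>(deque: &VecDeque<T>, offset: usize, dst: &mut [T]) {
    assert!(offset + dst.len() <= deque.len(), "range out of bounds");

    // `as_slices` exposes the deque contents as up to two contiguous runs.
    let (front, back) = deque.as_slices();

    if offset >= front.len() {
        // The whole range lives in the second run.
        let start = offset - front.len();
        dst.copy_from_slice(&back[start..start + dst.len()]);
    } else {
        // Take what the first run can provide, then continue in the second.
        let from_front = dst.len().min(front.len() - offset);
        let (head, tail) = dst.split_at_mut(from_front);
        head.copy_from_slice(&front[offset..offset + from_front]);
        tail.copy_from_slice(&back[..tail.len()]);
    }
}
```

The result does not depend on where the deque happens to wrap internally. Only the number of non-empty copies changes.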

Benchmark

I've added some tests and run them with criterion. Here are some relevant results:

`copy_to_slice` vs `extend` on a pre-allocated `Vec` with 1_000_000 elements

`copy_to_slice` vs `extend` on a pre-allocated `Vec` with 16 elements

I made sure to pre-allocate everything and, assuming I did it correctly, the speed-up looks quite substantial!

On this note, I added an unsafe set_len method to ConstGeneric and Alloc ring buffers that mimics what Vec::set_len does. It provides a nice way to "empty" a buffer of primitives by simply moving the buffer writeptr, without incurring the penalty of iterating over all the elements to call Drop::drop. Just like Vec::set_len this method can leak, as stated in the doc comment.
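The trade-off can be illustrated with `Vec::set_len` itself, since the exact signature of the new ring buffer method is not shown in this description. In the sketch below, `Noisy`, `DROPS`, and the two helper functions are hypothetical names used only to count destructor calls.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times `Noisy::drop` has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

#[allow(dead_code)]
struct Noisy(u32);

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Empties a vector of `n` elements with `set_len(0)` and reports how many
/// destructors ran.
fn drops_with_set_len(n: u32) -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    let mut v: Vec<Noisy> = (0..n).map(Noisy).collect();
    // SAFETY: 0 never exceeds the capacity, and a length of 0 requires no
    // initialized elements.
    unsafe { v.set_len(0) };
    drop(v); // the vector is dropped, but it no longer knows about its elements
    DROPS.load(Ordering::SeqCst) - before
}

/// Same as above, but using `clear`, which drops every element.
fn drops_with_clear(n: u32) -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    let mut v: Vec<Noisy> = (0..n).map(Noisy).collect();
    v.clear();
    DROPS.load(Ordering::SeqCst) - before
}
```

For element types without a destructor, such as f32, skipping the per-element pass loses nothing. For types that own resources, the elements are leaked, which is why the method is unsafe and documented as such.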

jdonszelmann · 2025-02-03T17:26:53Z

Seems like reasonable changes, also well tested and I like the perf bonus. lemme quickly approve CI and see if miri still passes but then it's ok by me. @NULLx76 ?

jdonszelmann · 2025-02-03T17:33:24Z

@cdellacqua I'm afraid, however nice, it fails miri. Want to take a look at it?

jdonszelmann · 2025-02-03T17:33:58Z

coverage is no prob for now, that's just a deprecated line in the ci workflow definition

cdellacqua · 2025-02-03T18:33:20Z

Good catch!

I previously used the first element in the buffer (accessible by get_unchecked) to retrieve a pointer to the base. With the latest commit I added a dedicated function that doesn't index into the const generic buffer, which avoids the retagging issue.
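A minimal sketch of the distinction, assuming the Miri failure was of the usual kind where a pointer derived from a reference to a single element is then used to reach its neighbours. The function `copy_out` and the plain `[T; CAP]` storage are hypothetical and much simpler than the crate's buffers.

```rust
/// Copies `dst.len()` elements out of `buf`, starting at `start`, through a
/// raw pointer to the base of the array. Hypothetical helper for illustration.
fn copy_out<T: Copy, const CAP: usize>(buf: &[T; CAP], start: usize, dst: &mut [T]) {
    assert!(start <= CAP && dst.len() <= CAP - start, "range out of bounds");

    // Derived from the array as a whole, so this pointer may be offset to
    // any of the `CAP` elements.
    let base: *const T = buf.as_ptr();

    // By contrast, a pointer obtained from a reference to one element
    // (for example `&buf[0] as *const T`) is only allowed to access that
    // element under Miri's default Stacked Borrows model.

    // SAFETY: the range was bounds-checked above, and `dst` is an exclusive
    // borrow, so it cannot overlap the shared borrow `buf`.
    unsafe {
        std::ptr::copy_nonoverlapping(base.add(start), dst.as_mut_ptr(), dst.len());
    }
}
```

Both forms yield the same address, which is why the problem tends to go unnoticed until the code is run under Miri.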

cdellacqua · 2025-02-07T15:39:01Z

I've been experimenting with an extend_from_slice and a drain_to_slice method. They're based on the same concept as copy_from_slice and copy_to_slice, except they move the read and write pointers accordingly. Benchmarks show similar improvements.

Would you be interested in this feature too? If so, let me know whether you'd like to slightly expand the scope of this PR or would rather have a separate PR.

cdellacqua added 3 commits January 31, 2025 15:24

feat: provide specialized methods to copy to and from slices + `set_len` method that mimics `Vec::set_len`

394f6e4

test: add benchmarks

00e654c

chore: rustfmt

70832ae

fix: add a dedicated function to get to the base of the buffer NULLx76#141

3a36251

Merge remote-tracking branch 'origin/main' into cdellacqua/main

d658362

NULLx76 changed the title ~~Increse copy speed by orders of magnitude~~ Increase copy speed by orders of magnitude Jul 11, 2025

NULLx76 merged commit 8721a85 into NULLx76:main Jul 11, 2025
7 of 8 checks passed

NULLx76 pushed a commit that referenced this pull request Jul 11, 2025

fix: add a dedicated function to get to the base of the buffer #141

109088b

NULLx76 added a commit that referenced this pull request Jul 11, 2025

Merge pull request #141 from cdellacqua/main

9f4b442

cdellacqua mentioned this pull request Aug 12, 2025

Extend and drain with slices #146

Open
