Increase copy speed by orders of magnitude #141
…en` method that mimics `Vec::set_len`
Seems like reasonable changes, also well tested, and I like the perf bonus. Let me quickly approve CI and see if Miri still passes, but then it's OK by me. @NULLx76?
@cdellacqua I'm afraid that, however nice, it fails Miri. Want to take a look at it?
Coverage is no problem for now; that's just a deprecated line in the CI workflow definition.
Good catch! I previously used the first element in the buffer (accessible by
I've been experimenting with an … Would you be interested in this feature too? If so, let me know whether you'd like to slightly expand the scope of this PR or would rather have a separate PR.
Abstract
When `T` implements `Copy`, we can use the std/core method `copy_from_slice` to offload the data transfer to highly optimized and potentially platform-specific functions.
Backstory
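As a minimal illustration of the standard-library method this relies on: `copy_from_slice` is an inherent method on slices that requires `T: Copy` and panics if the source and destination lengths differ. The helper names `bulk_copy` and `loop_copy` below are hypothetical and only contrast a single bulk copy with an element-by-element loop; they are not part of this crate.

```rust
/// Copies `src` into `dst` with one call to `copy_from_slice`.
/// Only available because `T: Copy`; panics if the lengths differ.
fn bulk_copy<T: Copy>(dst: &mut [T], src: &[T]) {
    dst.copy_from_slice(src);
}

/// Element-by-element equivalent, shown only for comparison.
fn loop_copy<T: Copy>(dst: &mut [T], src: &[T]) {
    assert_eq!(dst.len(), src.len(), "length mismatch");
    for (d, s) in dst.iter_mut().zip(src) {
        *d = *s;
    }
}

fn main() {
    let src = [1.0f32, 2.0, 3.0, 4.0];
    let mut a = [0.0f32; 4];
    let mut b = [0.0f32; 4];
    bulk_copy(&mut a, &src);
    loop_copy(&mut b, &src);
    // Both strategies produce the same result.
    assert_eq!(a, b);
}
```

For non-`Copy` element types, the standard library offers `clone_from_slice` instead, which cannot be reduced to a plain memory copy in general.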
I was troubleshooting some code that deals with a huge ring buffer (1 million `f32`s), where the most common operation is copying the last 2,000 elements. After some profiling, I found that the slowest operation was simply skipping and iterating over the elements I needed to copy out of the buffer.
I experimented with the built-in `copy_from_slice`, which calls `memcpy` under the hood, and got these results:
The baseline consists of using this:
While the code changes in this PR allow doing this:
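For illustration only, here is a sketch of the two access patterns on a plain, contiguous `Vec<f32>` rather than a ring buffer (an assumption that sidesteps wrap-around entirely): the baseline iterates to the tail and extends an output `Vec`, while the slice-based variant copies the tail with a single `copy_from_slice` call. `last_n_iter` and `last_n_slice` are hypothetical names, not this PR's API.

```rust
/// Baseline: iterate to the tail and push the elements one at a time.
/// Assumes `n <= buf.len()`.
fn last_n_iter(buf: &[f32], n: usize, out: &mut Vec<f32>) {
    out.clear();
    out.extend(buf.iter().skip(buf.len() - n).copied());
}

/// Slice-based: copy the tail in a single `copy_from_slice` call.
/// Assumes `n <= buf.len()` and `out.len() == n`.
fn last_n_slice(buf: &[f32], n: usize, out: &mut [f32]) {
    out.copy_from_slice(&buf[buf.len() - n..]);
}

fn main() {
    let buf: Vec<f32> = (0..1_000_000).map(|i| i as f32).collect();
    let n = 2_000;

    // Pre-allocate both outputs so neither variant pays for allocation.
    let mut a = Vec::with_capacity(n);
    last_n_iter(&buf, n, &mut a);

    let mut b = vec![0.0f32; n];
    last_n_slice(&buf, n, &mut b);

    assert_eq!(a, b);
    assert_eq!(b[0], 998_000.0);
}
```

On a contiguous slice both are correct; the slice version avoids per-element work and leaves the choice of copy routine to the standard library. A ring buffer additionally has to handle the case where the requested range wraps around the end of its storage.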
The results are less impressive when working on the entire buffer, but still noticeable (benchmarks below).
Proposed solution
I've added two methods, `copy_from_slice` and `copy_to_slice`, to the `RingBuffer` trait.
How it works
For ConstGeneric and Alloc buffers, `copy_from_slice` works by taking the pointer to the first relevant byte of the ring buffer. It then checks whether the `&slice` fits in a contiguous region of memory. If it does, a single copy operation is performed. If it doesn't, the copy is split into two parts: one up to the end of the backing storage and one from its start. `copy_to_slice` works the same way, with the source and destination swapped. `VecDeque` has a simpler (and safe) implementation based on the built-in methods `as_slices()`/`as_mut_slices()`.
Benchmark
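The split-copy logic can be sketched as follows. This is a hypothetical, safe variant that uses slice indexing and `split_at_mut` instead of the raw pointers described above, and it models the ring as backing storage plus a `head` index (an assumption; the real buffers track their positions differently). `ring_copy_to_slice` and `deque_copy_to_slice` are illustrative names, not the crate's API. The second helper shows the `VecDeque` approach built on `as_slices()`.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: copies `dst.len()` logical elements, starting at
/// logical index `offset`, out of a ring whose backing storage is `buf`
/// and whose oldest element sits at physical index `head`.
/// Assumes `head < buf.len()` and that `offset + dst.len()` does not
/// exceed the number of stored elements (so `dst.len() <= buf.len()`).
fn ring_copy_to_slice<T: Copy>(buf: &[T], head: usize, offset: usize, dst: &mut [T]) {
    if dst.is_empty() {
        return;
    }
    let cap = buf.len();
    // Physical index of the first element to copy.
    let start = (head + offset) % cap;
    // How many of the requested elements lie before the end of `buf`.
    let first_len = (cap - start).min(dst.len());
    let (first, second) = dst.split_at_mut(first_len);
    // Contiguous case: this single copy covers everything.
    first.copy_from_slice(&buf[start..start + first_len]);
    // Wrapped case: the remainder continues from the start of `buf`.
    // When nothing wraps, `second` is empty and this copies zero elements.
    second.copy_from_slice(&buf[..second.len()]);
}

/// Hypothetical helper: the same operation on a `VecDeque`, using the
/// two backing slices returned by `as_slices()`. No `unsafe` needed.
fn deque_copy_to_slice<T: Copy>(dq: &VecDeque<T>, offset: usize, dst: &mut [T]) {
    let end = offset + dst.len();
    assert!(end <= dq.len(), "range out of bounds");
    let (a, b) = dq.as_slices();
    // Part of the requested range that falls in the front slice `a`.
    let from_a = &a[offset.min(a.len())..end.min(a.len())];
    let (dst_a, dst_b) = dst.split_at_mut(from_a.len());
    dst_a.copy_from_slice(from_a);
    // Remaining part, re-indexed relative to the back slice `b`.
    let b_start = offset.saturating_sub(a.len());
    let b_end = end.saturating_sub(a.len());
    dst_b.copy_from_slice(&b[b_start..b_end]);
}

fn main() {
    // Capacity 4, oldest element at physical index 2,
    // so the logical order is ['c', 'd', 'e', 'f'].
    let buf = ['e', 'f', 'c', 'd'];
    let mut out = ['_'; 3];
    ring_copy_to_slice(&buf, 2, 1, &mut out); // wraps: one copy per part
    assert_eq!(out, ['d', 'e', 'f']);

    let mut dq: VecDeque<i32> = VecDeque::with_capacity(4);
    dq.extend([1, 2, 3, 4]);
    dq.pop_front();
    dq.pop_front();
    dq.push_back(5);
    dq.push_back(6); // logical contents: [3, 4, 5, 6]
    let mut out = [0; 3];
    deque_copy_to_slice(&dq, 1, &mut out);
    assert_eq!(out, [4, 5, 6]);
}
```

Whatever the internal layout of the `VecDeque`, the logical range is covered by at most two `copy_from_slice` calls, which mirrors the "one copy if contiguous, two if wrapped" rule.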
I've added some tests and run them with Criterion. Here are some relevant results:
Chart: `copy_to_slice` vs `extend` on a pre-allocated `Vec` with 1_000_000 elements
Chart: `copy_to_slice` vs `extend` on a pre-allocated `Vec` with 16 elements
I made sure to pre-allocate everything and, assuming I did it correctly, the speed-up looks quite substantial!
On this note, I added an unsafe `set_len` method to the ConstGeneric and Alloc ring buffers that mimics what `Vec::set_len` does. It provides a cheap way to "empty" a buffer of primitives by simply moving the buffer's `writeptr`, without incurring the cost of iterating over all the elements to call `Drop::drop`. Just like `Vec::set_len`, this method can leak, as stated in the doc comment.
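A minimal sketch of the `set_len` idea, assuming a hypothetical `Ring<T, N>` that keeps monotonically increasing `readptr`/`writeptr` counters and maps them to array indices with `% N`; the real buffers' layout and safety contract may differ. Setting the length only moves `writeptr`, so no element is visited or dropped.

```rust
/// Hypothetical fixed-capacity ring buffer. `readptr` and `writeptr`
/// only ever grow; the physical slot for a position is `pos % N`.
struct Ring<T, const N: usize> {
    buf: [T; N],
    readptr: usize,
    writeptr: usize,
}

impl<T: Copy + Default, const N: usize> Ring<T, N> {
    fn new() -> Self {
        Self { buf: [T::default(); N], readptr: 0, writeptr: 0 }
    }

    fn len(&self) -> usize {
        self.writeptr - self.readptr
    }

    /// Appends a value, overwriting the oldest one when full.
    fn push(&mut self, value: T) {
        if self.len() == N {
            self.readptr += 1;
        }
        self.buf[self.writeptr % N] = value;
        self.writeptr += 1;
    }

    /// Returns the `i`-th oldest element, if present.
    fn get(&self, i: usize) -> Option<T> {
        if i < self.len() {
            Some(self.buf[(self.readptr + i) % N])
        } else {
            None
        }
    }

    /// Sets the length by moving `writeptr` only: O(1), no element is
    /// visited and no destructor runs.
    ///
    /// # Safety
    /// `new_len` must be at most `N`, and every slot exposed by growing
    /// the length must hold a value the caller considers valid.
    unsafe fn set_len(&mut self, new_len: usize) {
        debug_assert!(new_len <= N);
        self.writeptr = self.readptr + new_len;
    }
}

fn main() {
    let mut r: Ring<u32, 4> = Ring::new();
    for v in 1..=5 {
        r.push(v);
    }
    assert_eq!(r.len(), 4);
    // "Empty" the buffer without touching the elements.
    unsafe { r.set_len(0) };
    assert_eq!(r.len(), 0);
    r.push(9);
    assert_eq!(r.get(0), Some(9));
}
```

Because this sketch stores a fully initialized `[T; N]`, growing the length only exposes stale values; with uninitialized storage (for example `MaybeUninit<T>`) it could expose uninitialized memory, and for types with `Drop` shrinking would leak, which is why such a method is `unsafe` like `Vec::set_len`.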