Speed up views with ICU sort keys by nickva · Pull Request #6050 · apache/couchdb

nickva · 2026-06-24T20:55:04Z

Add a sort-key ICU NIF and use it to optimize view merging and sorting on the coordinator.

There are two separate commits. The first one implements the NIF and the second one uses it to optimize fabric view handling.

ICU NIF Implementation

Add a sort key libicu NIF function. A sort key is an opaque binary representation generated by libicu from a key. It can be compared directly against other sort keys to produce the same collation order as calling the pair-wise libicu comparison function.

The idea is to use sort keys in the fabric view row "merge head" structure, where we merge together streaming rows from multiple workers. When we do that we either keep a sorted list (for map-only views), doing an insertion sort step and taking the minimum, or we keep the rows in a key/value structure (for reduce views) and find the minimum key and its grouped values. In either case we can reduce the number of libicu compare(a,b) calls from O(K^2) to just O(K) sort key generating calls, and since libicu calls are not cheap, it is worth adding an extra NIF call just for that.
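To illustrate the trade-off, here is a minimal Python sketch, not the NIF itself. It assumes a toy case-insensitive collation in place of libicu; `toy_compare` and `toy_sort_key` are hypothetical stand-ins for the pair-wise comparison call and the sort key call. The point is only that both give the same order, while the sort key function runs exactly once per key.

```python
from functools import cmp_to_key

# Call counters, so the two strategies can be compared.
calls = {"compare": 0, "sort_key": 0}

def toy_compare(a: str, b: str) -> int:
    """Hypothetical pair-wise collator call: work is redone on every call."""
    calls["compare"] += 1
    x, y = a.casefold(), b.casefold()
    return (x > y) - (x < y)

def toy_sort_key(s: str) -> bytes:
    """Hypothetical sort key call: work is done once, result compares bytewise."""
    calls["sort_key"] += 1
    return s.casefold().encode("utf-8")

keys = ["banana", "Apple", "cherry", "apple", "Banana", "date"]

# Same collation order either way (both sorts are stable).
by_compare = sorted(keys, key=cmp_to_key(toy_compare))
by_sort_key = sorted(keys, key=toy_sort_key)
assert by_compare == by_sort_key

# One sort key call per key; the number of pair-wise calls depends on
# the sorting or merging strategy and grows faster than the key count.
print(calls)
```

After the sort keys are generated, every later comparison is a plain byte comparison, which is what makes repeated merging cheap.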

As a side note: we actually implemented this once before, during the now-abandoned attempt at a FoundationDB-backed CouchDB 4.0. There we stored sort keys in the database, which the libicu developers do not recommend doing. Here we plan to use them in memory only, on the coordinator.

https://unicode-org.github.io/icu/userguide/collation/concepts#sortkeys-vs-comparison

The Optimization Per-se

On coordinators there are two separate places we optimize: reduce views and map-only views. They are implemented somewhat differently. In both cases we win by generating the sort key once per row as it comes in, paying the CPU price once, before we merge-sort the row or insert it into the gb_tree when reducing. After that we only do Erlang comparisons, avoiding expensive repeated ICU pair-wise calls.

For both map and reduce views we use a common buf_key/2 function to generate the sort key or a raw key, depending on the user's collator setting.

The map-only change is relatively straightforward. We just use {{buf_key, Id}, Row} as the sortable rows and keep the same merge-sort behavior.
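The map-only idea can be sketched roughly as follows, in Python rather than Erlang. Everything here is an assumption for illustration: `toy_sort_key` is a case-insensitive stand-in for the ICU sort key, `buf_key` only mirrors the role of buf_key/2 as described above, and the `MergeHead` class and the `"raw"`/`"default"` collation names are hypothetical.

```python
import bisect

def toy_sort_key(key: str) -> bytes:
    """Hypothetical stand-in for an ICU sort key (case-insensitive)."""
    return key.casefold().encode("utf-8")

def buf_key(key: str, collation: str) -> bytes:
    """Sort key for collated views, the key's own bytes for raw collation."""
    if collation == "raw":
        return key.encode("utf-8")
    return toy_sort_key(key)

class MergeHead:
    """Sorted buffer of rows ordered by (buf_key, doc_id)."""

    def __init__(self, collation: str = "default"):
        self.collation = collation
        self.sort_keys = []  # sorted (buf_key, doc_id) tuples
        self.rows = []       # rows, kept in the same order

    def insert(self, key: str, doc_id: str, row) -> None:
        # The sort key is computed once here; the insertion step
        # below only compares plain bytes and strings.
        k = (buf_key(key, self.collation), doc_id)
        i = bisect.bisect_right(self.sort_keys, k)
        self.sort_keys.insert(i, k)
        self.rows.insert(i, row)

    def take_min(self):
        self.sort_keys.pop(0)
        return self.rows.pop(0)

head = MergeHead()
for key, doc_id in [("b", "doc2"), ("a", "doc1"), ("B", "doc3")]:
    head.insert(key, doc_id, {"key": key, "id": doc_id})
print([head.take_min()["id"] for _ in range(3)])
```

Including the document id in the sortable tuple keeps rows with equal keys in a deterministic order without any extra collator calls.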

For reduce views we actually get a nice simplification. Previously, we had a map keyed by ejson key and searched over it (O(N)) on every emit. There were two distinct steps: 1) find the lowest key, 2) find any other keys collating equal to it. We don't have to do that any longer: with gb_trees we use the buf_key as the key and simply take the smallest (or greatest) key.

On a quick benchmark of 100k docs with Q=8 we saw a decent speedup:
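A rough Python sketch of the reduce-side idea follows. It is not the fabric code: Python has no gb_trees, so a heap plus a dict stands in for the ordered tree, and `toy_sort_key` and `ReduceBuffer` are hypothetical names using a toy case-insensitive collation. Keys that collate equal map to the same sort key, so they land in the same group on insert and no second search is needed.

```python
import heapq

def toy_sort_key(key: str) -> bytes:
    """Hypothetical stand-in for an ICU sort key (case-insensitive)."""
    return key.casefold().encode("utf-8")

class ReduceBuffer:
    """Values grouped by sort key, with cheap access to the smallest key."""

    def __init__(self):
        self.groups = {}  # sort key -> list of values
        self.heap = []    # min-heap of distinct sort keys

    def add(self, key: str, value) -> None:
        k = toy_sort_key(key)  # computed once per incoming row
        if k not in self.groups:
            self.groups[k] = []
            heapq.heappush(self.heap, k)
        self.groups[k].append(value)

    def take_smallest(self):
        # One step replaces "find the lowest key, then find its equals".
        k = heapq.heappop(self.heap)
        return k, self.groups.pop(k)

buf = ReduceBuffer()
for key, value in [("b", 1), ("A", 2), ("a", 3), ("B", 4)]:
    buf.add(key, value)
print(buf.take_smallest())
print(buf.take_smallest())
```

Taking the greatest key instead would need a structure ordered in both directions, which is what an ordered tree provides and a plain heap does not.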

  reduce (group level=3) : 5974ms -> 3294ms  (1.8x)
  maps                   : 2699ms -> 1917ms  (1.4x)

nickva added 2 commits June 26, 2026 16:19

nickva force-pushed the sortkey-collation branch from 66e3d6f to 93d2666 Compare June 26, 2026 21:12

nickva changed the title ~~Sort key libicu NIF~~ Speed up views with ICU sort keys Jun 26, 2026

nickva wants to merge 2 commits into main from sortkey-collation
