Conversation
nwf-msr left a comment:
Comments so far; posting before switching threads.
nwf-msr left a comment:
Generally looks quite nice. ISTR snmalloc of old had the ability to conditionally keep stats or not; perhaps it would be worth having an empty implementation of the Stat and MonotoneStat interfaces and either templating or having a namespace snmalloc-scoped using to pick between them?
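The suggestion could be sketched roughly as follows. This is only an illustration, not snmalloc's actual interfaces: the names `CountingStat`, `NoOpStat`, and `SNMALLOC_ENABLE_STATS` are hypothetical stand-ins. The idea is a no-op class with the same shape as the counting one, so a single namespace-scoped `using` picks the implementation at compile time and the no-op calls inline away to nothing.

```cpp
#include <atomic>
#include <cstddef>

// A stat that actually counts, using relaxed atomics.
class CountingStat
{
  std::atomic<size_t> value{0};

public:
  void increase(size_t n)
  {
    value.fetch_add(n, std::memory_order_relaxed);
  }
  void decrease(size_t n)
  {
    value.fetch_sub(n, std::memory_order_relaxed);
  }
  size_t get_curr()
  {
    return value.load(std::memory_order_relaxed);
  }
};

// Same interface, but every operation is a no-op that the
// compiler can eliminate entirely.
class NoOpStat
{
public:
  void increase(size_t) {}
  void decrease(size_t) {}
  size_t get_curr()
  {
    return 0;
  }
};

// Compile-time selection; hypothetical flag name.
#ifdef SNMALLOC_ENABLE_STATS
using Stat = CountingStat;
#else
using Stat = NoOpStat;
#endif
```

With this shape, call sites are written once against `Stat` and the cost of the statistics disappears in builds that do not define the flag.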
src/snmalloc/mem/globalalloc.h (Outdated)

    if (
      result == nullptr && RemoteDeallocCache::remote_inflight.get_curr() != 0)
Something has happened (TM) with the syntax there. Can this be a SNMALLOC_ASSERT_MSG?
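If the intent is that the quoted condition should never hold, the check could become an assertion along these lines. This is a hedged sketch only: the macro below is a stand-in for snmalloc's `SNMALLOC_ASSERT_MSG` (the real one lives in snmalloc and compiles out of release builds), and `remote_inflight_curr` stands in for `RemoteDeallocCache::remote_inflight.get_curr()`.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Stand-in for snmalloc's SNMALLOC_ASSERT_MSG, shown only to
// illustrate the suggested shape of the check.
#define SNMALLOC_ASSERT_MSG(cond, msg) \
  do \
  { \
    if (!(cond)) \
    { \
      std::fputs(msg "\n", stderr); \
      std::abort(); \
    } \
  } while (0)

// Hypothetical helper: assert that a failed allocation never leaves
// remote deallocations still in flight.
inline void check_flushed(void* result, size_t remote_inflight_curr)
{
  SNMALLOC_ASSERT_MSG(
    !(result == nullptr && remote_inflight_curr != 0),
    "Allocation failed with remote deallocations still in flight");
}
```

Whether an assertion or a runtime branch is correct depends on the intended semantics of the original code, which the diff context does not fully show.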
I was going to profile to see how much the operations cost. If they are noticeable, then I will macro it away as you suggest.
Force-pushed from bfc415a to 5bc8fd8.
So I have benchmarked this, and it has a perf regression. The worst case seems to be 3% (glibc-thread), but most tests are below 1%. I am going to investigate moving more of the statistics off the fast path. This will over-approximate the current user allocations quite a bit, but should make the overhead practically zero. Alternatively, we could look at making this a compile-time option.
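The proposed trade-off could look something like the following sketch (the names are illustrative, not snmalloc's code): instead of updating a counter on every allocation and deallocation on the fast path, only count when a whole slab of objects is acquired from or returned to the backend on the slow path. Current usage is then over-approximated, since partially used slabs count as full.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical coarse-grained per-sizeclass stat: counts slabs,
// not individual objects.
class CoarseSizeclassStat
{
  std::atomic<size_t> slabs_in_use{0};
  size_t objects_per_slab;

public:
  explicit CoarseSizeclassStat(size_t per_slab) : objects_per_slab(per_slab) {}

  // Called only on the slow path, when a slab is acquired.
  void slab_acquired()
  {
    slabs_in_use.fetch_add(1, std::memory_order_relaxed);
  }

  // Called only on the slow path, when a slab is returned.
  void slab_returned()
  {
    slabs_in_use.fetch_sub(1, std::memory_order_relaxed);
  }

  // Upper bound on live objects of this sizeclass: every held slab
  // is counted as if it were completely full.
  size_t current_usage_upper_bound()
  {
    return slabs_in_use.load(std::memory_order_relaxed) * objects_per_slab;
  }
};
```

The fast path then carries no statistics work at all; the cost is that the reported usage can exceed the true live count by up to one slab's worth of objects per partially filled slab.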
It would be great if this could be rebased onto main (even if not landed); I gave it a shot, but some of the conflicts were too weird for me to figure out. I don't care about landing or a perf regression, I just want to get some better stats around utilization for some local testing/comparisons.
This adds a collection of per-sizeclass statistics for tracking how many allocations have occurred on each thread. These are racily combined to provide basic tracking information.
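The racy combination described above could be sketched as follows (illustrative only, not snmalloc's actual code): each allocator keeps its own relaxed atomic counters per sizeclass, and a reader sums them without stopping the threads. The totals can be slightly stale, but individual counters never tear.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Small sizeclass count purely for illustration.
constexpr size_t NUM_SIZECLASSES = 4;

// Hypothetical per-allocator statistics block.
struct AllocatorStats
{
  std::array<std::atomic<size_t>, NUM_SIZECLASSES> allocated{};

  // Fast-path update: relaxed, no synchronisation with readers.
  void count_alloc(size_t sizeclass)
  {
    allocated[sizeclass].fetch_add(1, std::memory_order_relaxed);
  }
};

// Racy global snapshot: sum each allocator's counter with relaxed
// loads. Not a consistent point-in-time view, but good enough for
// coarse tracking.
size_t combined_count(const AllocatorStats* allocs, size_t n, size_t sizeclass)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    total += allocs[i].allocated[sizeclass].load(std::memory_order_relaxed);
  return total;
}
```

Because both sides use relaxed operations, the reader may miss in-flight updates, which matches the "racily combined" caveat in the description.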
@akrieger I have rebased and it seems to pass tests, but this is currently minimally tested. I'll set off a perf run to check what the regression is. Please let me know what kind of API you would like for accessing the stats. Also, how accurate do you want the statistics to be? This currently tracks individual allocations and deallocations for each sizeclass, but we might want to track the number of allocations and deallocations at a coarser granularity to reduce the performance impact. I would either make the reasonably accurate statistics available under a compile flag, or an over-approximating system which is always on. Or possibly both.
The perf results look similar to before, but now we have prettier results. Currently, I don't think the performance is good enough for an always-on feature, so either we need to add compile flags and some more CI targets, or reduce the accuracy and make it always on.
I personally would like an accurate system over a performant one, but that's because I'm doing offline evaluation of various allocator options to decide which to use :)

There are two main questions I have not gotten good answers for when comparing/evaluating the various allocators: what is my fragmentation/utilization like, and can I tune my size classes to get better results for my specific workloads (and to what specific sizes)? Right now all I can see are very high-level patterns like "on this test suite, snmalloc is consistently 0-100MB higher in RSS than mimalloc v3, but also seems to more aggressively return memory to the kernel". But that extra 100MB might come at a bad time for an old Android device and cause it to OOM instead, so I want to know whether that's usable memory that will absorb incoming allocations, or relatively permanently fragmented space.

The API doesn't have to be particularly fast to answer either question. Something like a function call which returns or prints a list of stats, say the amount of used/fragmented/free space per slab or bucket or whatever the internal unit of allocation is (apologies, I haven't dug into it that deeply), which I can then print out at my convenience and post-process in a spreadsheet app. It can take however long it needs to walk the internal structures in that case.

I wrote up this entire comment, by the way, without having reminded myself what the original PR summary was, and I see now that what I'm asking for is exactly what this PR was originally intended for :)

(For comparison, until now my memory debugging tool has been the Visual Studio memory profiler, which is... great for debugging specific allocations but not good for high-level statistics.)
@akrieger thanks. I think what is there should be fairly usable based on your description. It dumps to stderr, so hopefully not too interleaved with your output. We can move to a file, but that would be a reasonable amount of work to add for all platforms (we have a lot of them). You can then grep and post-process the output.

This doesn't stop the threads and uses relaxed reads and writes, so it isn't a technically correct snapshot, but for the kind of analysis you want, and that I used it for, it should be accurate enough.

The 100MB you mentioned: what is the overall footprint? I am interested to know how much overhead we have over mimalloc for your application.
Up to 100MB out of 1-2GB, so 5-10%, but it's rounded to 0.1GB in the UI and I haven't dug deeper into that aspect yet (I'm still mostly spending my time analyzing CPU/wall time).
This adds some statistics for tracking allocations. The per-sizeclass statistics are tracked per allocator, and a racy read is done to combine the results for display.
These statistics were used while debugging #615 to calculate the fragmentation. The displayed statistics are intended for post-processing to calculate fragmentation/utilisation.
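The kind of post-processing the dumped statistics enable could look like this sketch. The field names below are hypothetical, not the actual dump format: utilisation of a sizeclass is the bytes in live objects over the bytes held in slabs for that sizeclass, and the remainder is fragmentation/overhead.

```cpp
#include <cstddef>

// Hypothetical snapshot of one sizeclass, reconstructed from the
// dumped statistics.
struct SizeclassSnapshot
{
  size_t object_size;  // bytes per object in this sizeclass
  size_t live_objects; // current allocations minus deallocations
  size_t slab_bytes;   // bytes of slabs held for this sizeclass
};

// Returns utilisation in [0, 1]; 1 - utilisation is the
// fragmented/unused share of the memory held for this sizeclass.
double utilisation(const SizeclassSnapshot& s)
{
  if (s.slab_bytes == 0)
    return 1.0; // nothing held, nothing wasted
  return static_cast<double>(s.live_objects * s.object_size) /
    static_cast<double>(s.slab_bytes);
}
```

For example, 100 live 16-byte objects in 4096 bytes of slabs gives a utilisation of about 0.39, i.e. roughly 61% of that sizeclass's memory is currently unused or fragmented.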
The interface just prints the results to stderr via the message output; this could be improved with a better logging infrastructure.